
Keep only last line of a repeated pattern

An answer to this question on Stack Overflow.

Question

I would like to know if it is possible to delete all the lines matching a selected pattern except the last one. It is not easy to explain, so I will give an example.

I have a text file with content similar to this:

A sent (1)
A received (1)
B sent (1)
B sent (2)
B sent (3)
B received (1)

I would like to have an alternation between "sent" and "received" messages, where the "sent" line kept is the last among the sent messages with the same letter. So I need an output like:

A sent (1)
A received (1)
B sent (3)
B received (1)

Is there some program that can do something like that? I can use either Ubuntu or Windows, or build a simple C/C++ application, if necessary.

Answer

Here's a simple way:

tac FILE | uniq -w 6 | tac

We:

  1. Reverse-print the file using tac, so that the last line of each group of "sent" messages comes first.
  2. Weed out duplicate lines, basing uniqueness on only the first 6 characters (thereby ignoring the incrementing number in parentheses). uniq keeps only the first line of each run of adjacent duplicates, which is why we reversed the file first.
  3. Reverse-print the result again so it is back in the order you want.
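A minimal sketch of the same pipeline run on the sample input from the question. It assumes GNU coreutils (for tac and for uniq's -w option, as on Ubuntu); the function name keep_last_of_run is hypothetical and only used for illustration:

```shell
#!/bin/sh
# Hypothetical helper: for each run of adjacent lines whose first
# 6 characters match, keep only the LAST line of the run.
# Assumes GNU tac and GNU uniq (the -w option is not in POSIX uniq).
keep_last_of_run() {
    # reverse, keep first of each run, reverse back
    tac | uniq -w 6 | tac
}

# Feed it the sample input from the question.
printf '%s\n' \
    'A sent (1)' \
    'A received (1)' \
    'B sent (1)' \
    'B sent (2)' \
    'B sent (3)' \
    'B received (1)' | keep_last_of_run
```

This prints the four lines of the desired output. The width of 6 is fixed: it works here because "A sent" and "A rece" already differ within the first 6 characters, but it may need adjusting if the identifiers are longer than one letter.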