Practical Applications Of The uniq Command
Many of the standard Linux command-line tools are designed for working with plain text files. One such tool is the uniq command, which is commonly used to filter out duplicate lines from files or from the output of other commands. While this default behavior is helpful, uniq has a few more tricks up its sleeve that are also very handy.
To start with, let’s look at a simple file which I’ll call testfile.txt, the contents of which are as follows:
This line features once.
This line features twice.
This line features thrice.
This line features thrice.
This line features thrice.
This line features twice.
Running uniq on the file without any options predictably produces output with the duplicate lines removed:
uniq testfile.txt
This line features once.
This line features twice.
This line features thrice.
This line features twice.
Notice that only consecutive duplicate lines are removed. If you wish to deal with non-consecutive duplicates you’ll need to run the file through the sort command first and pipe the output through uniq:
sort testfile.txt | uniq
This line features once.
This line features thrice.
This line features twice.
Now let’s imagine you want to know how many times each line occurs in the text. This can be done using the -c flag, which puts a count of the number of occurrences at the start of each line:
uniq -c testfile.txt
1 This line features once.
1 This line features twice.
3 This line features thrice.
1 This line features twice.
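If you want the counts to cover the whole file regardless of line order, sort first and pipe the result through uniq -c:
sort testfile.txt | uniq -c
1 This line features once.
3 This line features thrice.
2 This line features twice.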
Sometimes you may only be interested in the duplicated lines in the file, and these can be viewed with the -D flag:
uniq -D testfile.txt
This line features thrice.
This line features thrice.
This line features thrice.
Unfortunately the lines are not condensed, so if you want to know which lines are duplicated anywhere in the file and how many times each appears, you can chain a few commands together:
sort testfile.txt | uniq -D | uniq -c
3 This line features thrice.
2 This line features twice.
By way of contrast, the -u flag only displays the unique lines in the file:
uniq -u testfile.txt
This line features once.
This line features twice.
This line features twice.
Just as before, you need to sort the file first if you want to account for duplicates that are not adjacent.
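For example, running the sorted pipeline with -u shows only the line that appears exactly once in the file:
sort testfile.txt | uniq -u
This line features once.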
Sometimes the first few columns in a file hold information that you don’t want to be considered when checking for duplicates: dates and times in log files are a common example. You can use the -f flag to skip a number of these whitespace-separated fields at the start of each line. In our example, “-f 2” skips the first two fields of each line when comparing.
Imagine our testfile now contains the following:
123 456 testing
123 789 testing
124 567 bob
124 532 bob
Next, run the following command:
uniq -f 2 testfile.txt
123 456 testing
124 567 bob
As you can see from the output, only the first line of each match is returned (as per the normal behavior), so if seeing the differing data at the front of each line is important you’d need to combine this with the -D flag:
uniq -Df 2 testfile.txt
123 456 testing
123 789 testing
124 567 bob
124 532 bob
Alternatively you can use the --group flag, which displays every line and separates each group of matching lines with a blank line. Two of the available methods are ‘prepend’ (print the blank line before each group) and ‘append’ (print it after each group):
uniq -f 2 --group=prepend testfile.txt

123 456 testing
123 789 testing

124 567 bob
124 532 bob
In a similar manner, you can also skip a set number of initial characters using the -s flag:
uniq -s 7 testfile.txt
123 456 testing
124 567 bob
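As with the other examples, this can be combined with -c to count how many lines shared each match:
uniq -cs 7 testfile.txt
2 123 456 testing
2 124 567 bob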
You may also want to compare only a set number of characters at the start of each line, which is what the -w flag does. Think back to our original file for this example:
uniq -w 20 testfile.txt
This line features once.
This line features twice.
Note that the lines containing “twice” and “thrice” are combined, because we are only comparing characters up to the first character of the final word, and both of those words begin with “t”. We can confirm this with the -c flag:
uniq -cw 20 testfile.txt
1 This line features once.
5 This line features twice.
Finally, you may wish to compare lines without regard to their case. The -i flag matches lines case-insensitively and can be combined with the flags from our previous examples.
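As a hypothetical illustration, imagine the test file contained two adjacent lines that differ only in case:
This line features twice.
THIS LINE FEATURES TWICE.
Running uniq with both the -c and -i flags would treat them as a single duplicated line:
uniq -ci testfile.txt
2 This line features twice.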
At this point we’ve really only scratched the surface of what uniq can do, but with these examples you should hopefully see how you can use it to solve your particular problem.