Added on Feb 25th, 2015 and marked as cli filesystem

Anybody who makes regular backups, uses a package manager to check out software or works in directories with several hundred files knows that at some point the directory structure becomes too vast to keep in your head. At that stage it is really easy to end up with duplicate, missing or unwanted files in your folders.

For instance, if you make a backup of a directory structure, you want an exact copy of it. Not an almost exact copy with some files missing, some files that have become corrupt and some extra files you put there by mistake or forgot to remove.

I often receive updates to (premium) WordPress plugins. Since these plugins have no version control whatsoever I like to keep them under version control myself. Most of the time the changes concern existing or new files, but sometimes an existing file is no longer needed and keeping it in the repository is not a good idea. But how do you know which files are no longer needed?

There are several tools to compare the contents of two directories. The one I like best is diff. Normally it is used to compare the contents of files, but it can also compare directories recursively (-r). And by including the brief flag (-q) you get only a list of what differs, without the line-by-line changes inside each file:

diff -rq original/ copy/

This will output something like this:

Files original/in-both.txt and copy/in-both.txt differ
Only in copy/: only-in-copy.txt
Only in original/: only-in-original.txt

As you can see, all differences between the directories original and copy are shown. In this case, both directories contain a file called in-both.txt, but its content differs. In addition, each directory contains a file that is not present in the other (only-in-original.txt and only-in-copy.txt respectively).
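
To try this out without touching real data, the example can be rebuilt in a temporary directory. This is only a sketch: the file contents and the variable names (workdir, report, status, removed) are made up for illustration, and the grep pattern assumes the English "Only in" wording shown above, which is why LC_ALL=C is set.

```shell
# Build a throwaway copy of the example tree in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir original copy
echo "version 1" > original/in-both.txt
echo "version 2" > copy/in-both.txt
echo "old file" > original/only-in-original.txt
echo "new file" > copy/only-in-copy.txt

# LC_ALL=C keeps diff's messages in English so they can be matched below.
# diff exits with 0 when nothing differs and 1 when differences are found.
report=$(LC_ALL=C diff -rq original/ copy/)
status=$?
printf '%s\n' "$report"

# Keep only the entries that exist in original/ but not in copy/.
removed=$(printf '%s\n' "$report" | grep '^Only in original/')
printf '%s\n' "$removed"
```

The last step is the one that answers the question about files that are no longer needed: if original/ is the directory under version control and copy/ is the freshly received update, the lines that start with "Only in original/" are the candidates for removal. The temporary directory can be deleted afterwards.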