A Handy Tool for Cleaning Up Disorganised Backups

During my recent trip to Germany, I took a whole bunch of photos with my smartphone. While I was out there, I experimented with a different operating system on my phone. Since my regular contacts from home were communicating with me using different channels (email, WhatsApp, etc.), I could afford to screw up my phone and not go completely off the grid.

It's always a good idea to back up anything you can't afford or don't wish to lose, even if you're pretty confident you won't screw up. So, before each experiment I lazily took a copy of the photos on my phone and dumped it on my laptop's hard drive. Naturally, I never went back and removed the previous backup afterwards.

Now that I'm back, I've been slowly migrating the files associated with the last six months of work from my laptop onto my desktop. Today was the turn of the photos, and I found myself with three separate directories full of largely the same photographs, but under different names. Removing each duplicate manually would be a pain, so I did some searching and found a nifty program called fdupes. Exactly what I needed! It was available via apt-get, which was even better. Here's how to use it:

  1. Install fdupes. If you use Ubuntu (or Debian), it's easy:

    ~$ sudo apt-get install fdupes
    
  2. Move the directories you want to deduplicate into a single parent folder (create it first, otherwise the first mv just renames the directory), e.g.:

    ~$ mkdir -p ~/tmppics
    ~$ mv CameraBackup ~/tmppics
    ~$ mv CameraBackup2 ~/tmppics
    
  3. Run fdupes like this. Be careful, since this deletes files (you should keep a backup of these files while you remove duplicates!):

    ~$ fdupes -rdN tmppics/
    

This will look for duplicates across all of the directories within tmppics, and remove every copy except the first one found. The -r flag tells fdupes to scan folders recursively, rather than just looking at the files in the top level of the directory specified. The -d flag tells fdupes to delete duplicates; on its own, it prompts you to choose which file in each set to keep. The -N flag suppresses that prompt. Combined, -d and -N preserve the first file found in each set of duplicates and delete the rest, which is the behaviour I want.
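
If you're curious how files with different names can be matched at all: it's done on content, not on filenames. Here's a rough sketch of the same idea using only standard tools. To be clear, this is not how fdupes is implemented (it compares file sizes and MD5 signatures and then does a byte-by-byte check), and list_dupes is just a name I made up for illustration. The sketch also assumes GNU sha256sum and filenames without newlines or backslashes. It only lists duplicates and never deletes anything.

```shell
#!/bin/sh
# Sketch only: list groups of files with identical content under a directory.
# list_dupes is a hypothetical helper, not part of fdupes.
# Assumes GNU sha256sum and filenames without newlines or backslashes.
list_dupes() {
    # Hash every regular file, then sort so identical hashes are adjacent.
    find "$1" -type f -exec sha256sum {} + |
        sort |
        awk '{
            hash = $1
            # sha256sum prints "<hash>  <path>": skip the hash and two separators.
            path = substr($0, length(hash) + 3)
            if (hash == prev) {
                # Same content as the previous line: print the group.
                if (!printed) print prev_path
                print path
                printed = 1
            } else {
                # New hash: close the previous group with a blank line.
                if (printed) print ""
                printed = 0
            }
            prev = hash
            prev_path = path
        }'
}
```

Calling list_dupes ~/tmppics prints each set of identical files, with a blank line between sets. In practice you don't need any of this: running fdupes -r tmppics/ without -d lists the duplicate sets without touching them, which makes a handy dry run before step 3.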