Finding duplicate files


I’ve been using PhotoSync to automatically transfer any new photos from my iPhone to my NAS. It’s a pretty wonderful app. Here’s how it describes the NAS part:

Securely backup photos & videos directly from your iPhone, iPad and Android devices to any NAS, remote server or personal cloud over (S)FTP, SMB or WebDAV.

You can also download and view photos & videos on (S)FTP, SMB and WebDAV servers. PhotoSync works seamlessly with all major NAS storage devices, servers and personal cloud services.

I have it set up so that it kicks off Autotransfer whenever my device connects to my home WiFi network.

It syncs to a folder that is in turn synced to my Nextcloud server using a Docker container I adapted. It all works fine. Except…

The way I had PhotoSync configured meant that every time I changed iPhone, it synced everything again to a new subfolder with the name of the new device. So I have a tree looking like this:

root@dusky2: /volume1/docker/nextcloud/media/iPhoneSync
$ tree -d -I '*eaDir'
.
`-- James's iPhone 8
    |-- CABG
    |-- Camera Roll
    |-- deGeo'd Photos
    |-- Dropbox
    |-- Dubsmash
    |-- Instagram
    |-- James's iPhone 8
    |   |-- Adobe Capture
    |   |-- Camera Roll
    |   |-- Dropbox
    |   |-- RAW
    |   `-- WhatsApp
    |-- James's iPhone XS
    |   |-- Camera Roll
    |   |-- Dropbox
    |   |-- James's iPhone 11 Pro
    |   |   |-- Adobe Capture
    |   |   |-- Dropbox
    |   |   |-- My Photo Stream
    |   |   |-- PSExpress
    |   |   |-- Recents
    |   |   |   `-- James's iPhone 11 Pro
    |   |   |       |-- Dropbox
    |   |   |       |-- Nextcloud
    |   |   |       |-- Recents
    |   |   |       |-- Telegram
    |   |   |       |-- Twitter
    |   |   |       `-- WhatsApp
    |   |   |-- Snapchat
    |   |   |-- Telegram
    |   |   |-- Twitter
    |   |   `-- WhatsApp
    |   |-- James's iPhone XS
    |   |   |-- Camera Roll
    |   |   |-- Snapchat
    |   |   |-- Telegram
    |   |   |-- Twitter
    |   |   `-- WhatsApp
    |   |-- Josh's handiwork
    |   |-- Telegram
    |   |-- Twitter
    |   `-- WhatsApp
    |-- Josh's handiwork
    |-- Skype
    |-- Twitter
    `-- WhatsApp

Duplicate directories and duplicate device names. On top of that, iOS 13 renamed Camera Roll to Recents.
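To get a quick list of the folder names that repeat in a tree like this, something along these lines should do. It is only a sketch: `repeated_dir_names` is a name I made up for illustration, it skips the Synology `*eaDir` folders the same way the `tree` command above does, and it assumes no directory name contains a newline.

```shell
# Print every directory name that occurs more than once below a root.
# repeated_dir_names is a hypothetical helper, not part of any tool in this post.
repeated_dir_names() (
    root=${1:-.}
    # Prune *eaDir folders, then print every remaining directory path.
    find "$root" -mindepth 1 -name '*eaDir' -prune -o -type d -print |
        awk -F/ '{ print $NF }' |   # keep only the last path component
        sort |
        uniq -d                      # names seen at least twice, printed once each
)
```

Run as `repeated_dir_names /volume1/docker/nextcloud/media/iPhoneSync`, it prints one repeated name per line.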

A bit of a mess. 63 directories, 24590 files.

I wanted to know if any files were duplicates. I have some Mac apps like Duplicates Finder and Duplicate Detective 2, but wondered if there were any Unix tools I could try as well.

Straight to Stack Overflow, which provided many methods:

find [1]

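# Method 1: hash every non-empty file, then print all lines whose first 32 characters (the MD5) repeat
find . ! -empty -type f -exec md5sum {} + | sort | uniq -w32 -dD

# Method 2: save the checksums, collect the repeated ones, then list the files behind each
find . -type f -exec md5sum {} \; > md5sums
awk '{print $1}' md5sums | sort | uniq -d > dupes
while read -r d; do echo "---"; grep -- "$d" md5sums | cut -d ' ' -f 3-; done < dupes

# Method 3: as method 1, but with long options and the checksum column cut off
find . -type f -exec md5sum '{}' + | sort | uniq --all-repeated \
--check-chars=32 | cut --characters=35-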
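For reference, here is the first one-liner unpacked stage by stage. This is a sketch rather than anything from the Stack Overflow answers: `list_md5_dupes` is a made-up name, it needs GNU find, md5sum and uniq, and it uses `--all-repeated=separate` so that each set of duplicates is followed by a blank line instead of running straight into the next.

```shell
# List groups of files with identical MD5 checksums under a directory.
# list_md5_dupes is a hypothetical wrapper; requires GNU find, md5sum, uniq.
list_md5_dupes() (
    root=${1:-.}
    # 1. Hash every non-empty regular file; "+" batches many files per md5sum call.
    find "$root" ! -empty -type f -exec md5sum {} + |
        # 2. Sort so that identical checksums end up on adjacent lines.
        sort |
        # 3. Compare only the first 32 characters (the MD5 hex digest) and print
        #    every member of each repeated group, with a blank line between groups.
        uniq -w32 --all-repeated=separate
)
```

Note that this still reads and hashes every file, so it is no faster than the original.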

fdupes [2]

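# List sets of duplicates recursively
fdupes -r .

# The same, but print each file's disk usage and sort by it (blank separator lines are skipped)
fdupes -r . | {
    while IFS= read -r file; do
        [[ $file ]] && du "$file"
    done
} | sort -n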

fslint [3]

cd /usr/share/fslint/fslint && ./fslint /path/to/directory
cd /usr/share/fslint/fslint && ./findup -t --summary /path/to/directory

Amazingly, none of them found dupes in my iPhoneSync folder. So that’s good.

They did, however, find dupes in my Photos folder:

find

root@cloud: /mnt/volume_nc/data/james/files/Photos
$ find . ! -empty -type f -exec md5sum {} + | sort | uniq -w32 -dD
0aadc1b11a8407b295008e6c9d22fc3e  ./iPhone/2015/02/Photo-27-02-2015 03-57-55-9189.jpg
0aadc1b11a8407b295008e6c9d22fc3e  ./iPhone/2015/02/Photo-27-02-2015-03-57-55-9189.jpg
0acd0d5b0c01c4bee9afe9176073b4fa  ./iPhone/2014/05/Photo-02-05-2014-01-01-36-5396.jpg
0acd0d5b0c01c4bee9afe9176073b4fa  ./iPhone/2014/05/Photo-02-05-2014-01-02-44-5398.jpg
1212fb3e6490ddfb066e6c556bc392b4  ./iPhone/2020/02/Photo-10-02-2020-06-12-40-8979.jpg
1212fb3e6490ddfb066e6c556bc392b4  ./iPhone/2020/02/Photo-10-02-2020-07-38-06-8981.jpg
155754a324daff68c8daed1044328a16  ./iPhone/2014/07/Photo-19-07-2014-09-58-33-6798.jpg
155754a324daff68c8daed1044328a16  ./iPhone/2014/07/Photo-21-07-2014-06-12-52-6824.jpg
1a82cddba43617c75d9b686002db7cb6  ./iPhone/2013/09/Photo-08-09-2013-07-28-53-2276.jpg
1a82cddba43617c75d9b686002db7cb6  ./iPhone/2013/09/Photo-08-09-2013-07-37-27-2278.jpg
2faf11e6b5c80f09fccf957c8dd2f95a  ./iPhone/2014/03/Photo-29-03-2014-09-35-58-4868.jpg
2faf11e6b5c80f09fccf957c8dd2f95a  ./iPhone/2014/03/Photo-29-03-2014-09-42-29-4870.jpg
459acd94c6128d60d16341a815c7d6e5  ./iPhone/2014/03/Photo-29-03-2014-02-47-45-4865.jpg
459acd94c6128d60d16341a815c7d6e5  ./iPhone/2014/03/Photo-29-03-2014-09-36-14-4869.jpg
5573dac4e8a2447dd4080374820a951b  ./iPhone/2014/03/Photo-07-03-2014-05-09-07-4544.jpg
5573dac4e8a2447dd4080374820a951b  ./iPhone/2014/03/Photo-07-03-2014-05-11-11-4545.jpg
5ee8ed449af66395f11021da4c7f5e65  ./iPhone/2015/03/Photo-03-03-2015 05-08-17-9238.jpg
5ee8ed449af66395f11021da4c7f5e65  ./iPhone/2015/03/Photo-24-03-2015 11-16-35-9597.jpg
634ad569036b9afb05692d330453f598  ./iPhone/2020/02/Photo-05-02-2020-06-33-35-8879.jpg
634ad569036b9afb05692d330453f598  ./iPhone/2020/02/Photo-06-02-2020-10-47-00-8888.jpg
75994932717b5a0eb9d1c9e5e096205e  ./iPhone/2011/04/Photo-15-04-2011-01-16-11-2990.png
75994932717b5a0eb9d1c9e5e096205e  ./iPhone/2011/07/Photo-21-07-2011-09-16-29-4039.png
896c522d3c4972a01abc35c91993b959  ./iPhone/2020/03/Photo-03-03-2020-09-13-17-9198.jpeg
896c522d3c4972a01abc35c91993b959  ./iPhone/2020/03/Photo-03-03-2020-09-13-17-9283.jpeg
8e5a153ee2e61eacca6ae04d790620d6  ./iPhone/2014/04/Photo-01-04-2014-02-18-24-4946.jpg
8e5a153ee2e61eacca6ae04d790620d6  ./iPhone/2014/04/Photo-01-04-2014-03-25-45-4955.jpg
95d066c26fdf046f272939ae9606e7c5  ./iPhone/2014/02/Photo-04-02-2014-02-09-43-4341.jpg
95d066c26fdf046f272939ae9606e7c5  ./iPhone/2014/02/Photo-08-02-2014-01-13-46-4364.jpg
a389e5ab5d8752884680dd7ea31ac1c3  ./iPhone/2014/05/Photo-21-05-2014-06-19-27-5844.jpg
a389e5ab5d8752884680dd7ea31ac1c3  ./iPhone/2014/05/Photo-24-05-2014-03-55-02-5891.jpg
aa37f32c86a4206739d38922544717dd  ./iPhone/2014/05/Photo-02-05-2014-01-01-37-5397.jpg
aa37f32c86a4206739d38922544717dd  ./iPhone/2014/05/Photo-02-05-2014-01-02-45-5399.jpg
b278f87efe7381facf3fd4dd75def8e9  ./iPhone/2014/08/Photo-05-08-2014-03-44-27-7051.jpg
b278f87efe7381facf3fd4dd75def8e9  ./iPhone/2014/08/Photo-05-08-2014-03-54-09-7056.jpg
b27e3d78a9c419147b9d7feb387f4ec9  ./iPhone/2014/08/Photo-09-08-2014-12-29-52-7143.jpg
b27e3d78a9c419147b9d7feb387f4ec9  ./iPhone/2014/09/Photo-04-09-2014-10-28-17-7449.jpg
dd7025344bd32d7433b16d99ce5c846e  ./iPhone/2014/04/Photo-04-04-2014-07-40-58-4998.jpg
dd7025344bd32d7433b16d99ce5c846e  ./iPhone/2014/04/Photo-05-04-2014-09-35-52-5031.jpg
e5df162f9f04c0de281bd2f5692dfffe  ./iPhone/2013/12/Photo-31-12-2013-08-26-39-4052.jpg
e5df162f9f04c0de281bd2f5692dfffe  ./iPhone/2013/12/Photo-31-12-2013-08-35-05-4054.jpg
fcbecc72fd2540829a4359a3ac620191  ./2020/03/Video-20-03-2020-06-53-01-9363.mp4
fcbecc72fd2540829a4359a3ac620191  ./2020/03/Video-21-03-2020-08-29-08-9370.mp4
ff846d763f428049dcb1b2615f9ee500  ./iPhone/2015/03/Photo-28-03-2015 04-01-07-9655.jpg
ff846d763f428049dcb1b2615f9ee500  ./iPhone/2015/03/Photo-28-03-2015 07-26-49-9657.jpg

22 dupes found.
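Rather than counting by eye, the listing can be piped through a small counter. A minimal sketch, assuming md5sum-style input (checksum in the first column) on stdin; `count_dupe_sets` is a name invented here.

```shell
# Count duplicate sets and redundant copies in md5sum-style output on stdin.
# count_dupe_sets is a hypothetical helper; only the first column is inspected.
count_dupe_sets() {
    awk '
        NF { seen[$1]++ }                 # tally each checksum, skip blank lines
        END {
            sets = 0; extra = 0
            for (h in seen)
                if (seen[h] > 1) {        # a checksum seen twice or more is one set
                    sets++
                    extra += seen[h] - 1  # every copy beyond the first is redundant
                }
            printf "%d sets, %d redundant copies\n", sets, extra
        }
    '
}
```

Appending `| count_dupe_sets` to the find pipeline above should report 22 sets for this folder, since every checksum in the listing appears exactly twice.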

fdupes

root@cloud: /mnt/volume_nc/data/james/files/Photos
$ fdupes -r . | {
>     while IFS= read -r file; do
>         [[ $file ]] && du "$file"
>     done
> } | sort -n

<snip>
64  ./iPhone/2015/03/Photo-03-03-2015 05-08-17-9238.jpg
64  ./iPhone/2015/03/Photo-24-03-2015 11-16-35-9597.jpg
80  ./iPhone/2014/05/Photo-02-05-2014-01-01-37-5397.jpg
80  ./iPhone/2014/05/Photo-02-05-2014-01-02-45-5399.jpg
84  ./iPhone/2014/02/Photo-04-02-2014-02-09-43-4341.jpg
84  ./iPhone/2014/02/Photo-08-02-2014-01-13-46-4364.jpg
88  ./iPhone/2014/04/Photo-01-04-2014-02-18-24-4946.jpg
88  ./iPhone/2014/04/Photo-01-04-2014-03-25-45-4955.jpg
92  ./iPhone/2014/05/Photo-02-05-2014-01-01-36-5396.jpg
92  ./iPhone/2014/05/Photo-02-05-2014-01-02-44-5398.jpg
96  ./iPhone/2014/03/Photo-29-03-2014-09-35-58-4868.jpg
96  ./iPhone/2014/03/Photo-29-03-2014-09-42-29-4870.jpg
100 ./iPhone/2014/04/Photo-04-04-2014-07-40-58-4998.jpg
100 ./iPhone/2014/04/Photo-05-04-2014-09-35-52-5031.jpg
108 ./iPhone/2014/05/Photo-21-05-2014-06-19-27-5844.jpg
108 ./iPhone/2014/05/Photo-24-05-2014-03-55-02-5891.jpg
108 ./iPhone/2014/08/Photo-09-08-2014-12-29-52-7143.jpg
108 ./iPhone/2014/09/Photo-04-09-2014-10-28-17-7449.jpg
108 ./iPhone/2020/02/Photo-10-02-2020-06-12-40-8979.jpg
108 ./iPhone/2020/02/Photo-10-02-2020-07-38-06-8981.jpg
112 ./iPhone/2014/08/Photo-05-08-2014-03-44-27-7051.jpg
112 ./iPhone/2014/08/Photo-05-08-2014-03-54-09-7056.jpg
116 ./iPhone/2014/03/Photo-29-03-2014-02-47-45-4865.jpg
116 ./iPhone/2014/03/Photo-29-03-2014-09-36-14-4869.jpg
124 ./iPhone/2020/03/Photo-03-03-2020-09-13-17-9198.jpeg
124 ./iPhone/2020/03/Photo-03-03-2020-09-13-17-9283.jpeg
128 ./iPhone/2014/07/Photo-19-07-2014-09-58-33-6798.jpg
128 ./iPhone/2014/07/Photo-21-07-2014-06-12-52-6824.jpg
148 ./iPhone/2013/09/Photo-08-09-2013-07-28-53-2276.jpg
148 ./iPhone/2013/09/Photo-08-09-2013-07-37-27-2278.jpg
192 ./iPhone/2014/03/Photo-07-03-2014-05-09-07-4544.jpg
192 ./iPhone/2014/03/Photo-07-03-2014-05-11-11-4545.jpg
212 ./iPhone/2020/02/Photo-05-02-2020-06-33-35-8879.jpg
212 ./iPhone/2020/02/Photo-06-02-2020-10-47-00-8888.jpg
264 ./iPhone/2015/03/Photo-28-03-2015 04-01-07-9655.jpg
264 ./iPhone/2015/03/Photo-28-03-2015 07-26-49-9657.jpg
304 ./iPhone/2015/02/Photo-27-02-2015 03-57-55-9189.jpg
304 ./iPhone/2015/02/Photo-27-02-2015-03-57-55-9189.jpg
356 ./iPhone/2013/12/Photo-31-12-2013-08-26-39-4052.jpg
356 ./iPhone/2013/12/Photo-31-12-2013-08-35-05-4054.jpg
404 ./iPhone/2011/04/Photo-15-04-2011-01-16-11-2990.png
404 ./iPhone/2011/07/Photo-21-07-2011-09-16-29-4039.png
1924    ./2020/03/Video-20-03-2020-06-53-01-9363.mp4
1924    ./2020/03/Video-21-03-2020-08-29-08-9370.mp4

Excluding empty files, 22 dupes found.
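fdupes prints each set of duplicates as a block of paths followed by a blank line, which makes it easy to work out how much space the extra copies take. The sketch below is my own addition, not something from the fdupes docs: `reclaimable_bytes` is a made-up name, and it reports bytes (via `wc -c`) rather than the disk blocks that `du` shows above.

```shell
# Sum the sizes of all but the first file in each blank-line-separated group
# read from stdin (the format "fdupes -r ." prints).
# reclaimable_bytes is a hypothetical helper.
reclaimable_bytes() (
    total=0
    first=1
    while IFS= read -r file; do
        if [ -z "$file" ]; then   # blank line: the next path starts a new group
            first=1
            continue
        fi
        if [ "$first" -eq 1 ]; then
            first=0               # keep the first file of each group
            continue
        fi
        [ -f "$file" ] || continue
        size=$(wc -c < "$file")
        total=$((total + size))
    done
    echo "$total"
)
```

Usage would be `fdupes -r . | reclaimable_bytes`.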

fslint

root@cloud: /usr/share/fslint/fslint
$ ./findup -t --summary "/mnt/volume_nc/data/james/files/Photos"
2 * 1970176	2020/03/Video-20-03-2020-06-53-01-9363.mp4 2020/03/Video-21-03-2020-08-29-08-9370.mp4
2 * 413696	iPhone/2011/04/Photo-15-04-2011-01-16-11-2990.png iPhone/2011/07/Photo-21-07-2011-09-16-29-4039.png
2 * 364544	iPhone/2013/12/Photo-31-12-2013-08-26-39-4052.jpg iPhone/2013/12/Photo-31-12-2013-08-35-05-4054.jpg
2 * 311296	iPhone/2015/02/Photo-27-02-2015 03-57-55-9189.jpg iPhone/2015/02/Photo-27-02-2015-03-57-55-9189.jpg
2 * 270336	iPhone/2015/03/Photo-28-03-2015 04-01-07-9655.jpg iPhone/2015/03/Photo-28-03-2015 07-26-49-9657.jpg
2 * 217088	iPhone/2020/02/Photo-05-02-2020-06-33-35-8879.jpg iPhone/2020/02/Photo-06-02-2020-10-47-00-8888.jpg
2 * 196608	iPhone/2014/03/Photo-07-03-2014-05-09-07-4544.jpg iPhone/2014/03/Photo-07-03-2014-05-11-11-4545.jpg
2 * 151552	iPhone/2013/09/Photo-08-09-2013-07-28-53-2276.jpg iPhone/2013/09/Photo-08-09-2013-07-37-27-2278.jpg
2 * 131072	iPhone/2014/07/Photo-19-07-2014-09-58-33-6798.jpg iPhone/2014/07/Photo-21-07-2014-06-12-52-6824.jpg
2 * 126976	iPhone/2020/03/Photo-03-03-2020-09-13-17-9198.jpeg iPhone/2020/03/Photo-03-03-2020-09-13-17-9283.jpeg
2 * 118784	iPhone/2014/03/Photo-29-03-2014-02-47-45-4865.jpg iPhone/2014/03/Photo-29-03-2014-09-36-14-4869.jpg
2 * 114688	iPhone/2014/08/Photo-05-08-2014-03-44-27-7051.jpg iPhone/2014/08/Photo-05-08-2014-03-54-09-7056.jpg
2 * 110592	iPhone/2020/02/Photo-10-02-2020-06-12-40-8979.jpg iPhone/2020/02/Photo-10-02-2020-07-38-06-8981.jpg
2 * 110592	iPhone/2014/08/Photo-09-08-2014-12-29-52-7143.jpg iPhone/2014/09/Photo-04-09-2014-10-28-17-7449.jpg
2 * 110592	iPhone/2014/05/Photo-21-05-2014-06-19-27-5844.jpg iPhone/2014/05/Photo-24-05-2014-03-55-02-5891.jpg
2 * 102400	iPhone/2014/04/Photo-04-04-2014-07-40-58-4998.jpg iPhone/2014/04/Photo-05-04-2014-09-35-52-5031.jpg
2 * 98304	iPhone/2014/03/Photo-29-03-2014-09-35-58-4868.jpg iPhone/2014/03/Photo-29-03-2014-09-42-29-4870.jpg
2 * 94208	iPhone/2014/05/Photo-02-05-2014-01-01-36-5396.jpg iPhone/2014/05/Photo-02-05-2014-01-02-44-5398.jpg
2 * 90112	iPhone/2014/04/Photo-01-04-2014-02-18-24-4946.jpg iPhone/2014/04/Photo-01-04-2014-03-25-45-4955.jpg
2 * 86016	iPhone/2014/02/Photo-04-02-2014-02-09-43-4341.jpg iPhone/2014/02/Photo-08-02-2014-01-13-46-4364.jpg
2 * 81920	iPhone/2014/05/Photo-02-05-2014-01-01-37-5397.jpg iPhone/2014/05/Photo-02-05-2014-01-02-45-5399.jpg
2 * 65536	iPhone/2015/03/Photo-03-03-2015 05-08-17-9238.jpg iPhone/2015/03/Photo-24-03-2015 11-16-35-9597.jpg

22 dupes found.

So they all find the same dupes, but how do they perform?

Performance

find

real    1m15.949s
user    0m20.144s
sys     0m4.599s

fdupes

real    0m0.759s
user    0m0.181s
sys     0m0.324s

fslint

real    0m0.220s
user    0m0.148s
sys     0m0.049s

fslint is the winner there at 0.22 seconds, with fdupes a close second at 0.76 seconds, and find miles behind at almost 76 seconds.
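One likely reason for the gap, going by their documentation, is that fdupes and fslint compare file sizes first and only checksum files that share a size, whereas the find one-liner hashes every file. The same idea can be bolted onto find. This is a rough sketch, not a benchmarked replacement: `dupes_size_then_md5` is a name I made up, it relies on GNU find's `-printf` and GNU xargs, and it assumes no file name contains a newline.

```shell
# Find duplicates by hashing only files whose size matches another file's.
# dupes_size_then_md5 is a hypothetical helper; requires GNU find, xargs,
# md5sum and uniq.
dupes_size_then_md5() (
    root=${1:-.}
    # Emit "size<TAB>path" for every non-empty regular file.
    find "$root" ! -empty -type f -printf '%s\t%p\n' |
        # Keep only paths whose size is shared with at least one other file.
        awk '
            {
                tab = index($0, "\t")
                size[NR] = substr($0, 1, tab - 1)
                path[NR] = substr($0, tab + 1)
                seen[size[NR]]++
            }
            END {
                for (i = 1; i <= NR; i++)
                    if (seen[size[i]] > 1) print path[i]
            }
        ' |
        # Hash just those candidates; NUL-delimit so spaces in names survive.
        tr '\n' '\0' |
        xargs -0 -r md5sum |
        sort |
        uniq -w32 -dD
)
```

The output has the same shape as the original find pipeline, so the two can be compared directly on the same folder.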

I compared the results with the Mac apps:

Duplicate Detective 2

Also found 22 dupes, in about 6 seconds.

Duplicate Detective 2 screenshot

Duplicates Finder

Also found 22 dupes, but took 52 seconds.

Duplicates Finder screenshot

Summary

PhotoSync works better than I thought, and while the Mac apps give you a nice preview of the duplicates, the dedicated command line tools (fdupes and fslint) are much faster.


  1. find man page

  2. fdupes home and man page

  3. fslint home and man page