Finding duplicate files
I’ve been using PhotoSync to automatically transfer any new photos from my iPhone to my NAS. It’s a pretty wonderful app. Here’s how it describes the NAS feature:
Securely backup photos & videos directly from your iPhone, iPad and Android devices to any NAS, remote server or personal cloud over (S)FTP, SMB or WebDAV.
You can also download and view photos & videos on (S)FTP, SMB and WebDAV servers. PhotoSync works seamlessly with all major NAS storage devices, servers and personal cloud services.
I have it set up so that it kicks off Autotransfer whenever my device connects to my home WiFi network.
It syncs to a folder, which is in turn synced to my Nextcloud server by a Docker container I adapted. It all works fine. Except…
The way I had PhotoSync configured meant that every time I changed iPhones, it synced everything again into a new subfolder named after the new device. So I have a tree that looks like this:
root@dusky2: /volume1/docker/nextcloud/media/iPhoneSync
$ tree -d -I '*eaDir'
.
`-- James's iPhone 8
    |-- CABG
    |-- Camera Roll
    |-- deGeo'd Photos
    |-- Dropbox
    |-- Dubsmash
    |-- Instagram
    |-- James's iPhone 8
    |   |-- Adobe Capture
    |   |-- Camera Roll
    |   |-- Dropbox
    |   |-- RAW
    |   `-- WhatsApp
    |-- James's iPhone XS
    |   |-- Camera Roll
    |   |-- Dropbox
    |   |-- James's iPhone 11 Pro
    |   |   |-- Adobe Capture
    |   |   |-- Dropbox
    |   |   |-- My Photo Stream
    |   |   |-- PSExpress
    |   |   |-- Recents
    |   |   |   `-- James's iPhone 11 Pro
    |   |   |       |-- Dropbox
    |   |   |       |-- Nextcloud
    |   |   |       |-- Recents
    |   |   |       |-- Telegram
    |   |   |       |-- Twitter
    |   |   |       `-- WhatsApp
    |   |   |-- Snapchat
    |   |   |-- Telegram
    |   |   |-- Twitter
    |   |   `-- WhatsApp
    |   |-- James's iPhone XS
    |   |   |-- Camera Roll
    |   |   |-- Snapchat
    |   |   |-- Telegram
    |   |   |-- Twitter
    |   |   `-- WhatsApp
    |   |-- Josh's handiwork
    |   |-- Telegram
    |   |-- Twitter
    |   `-- WhatsApp
    |-- Josh's handiwork
    |-- Skype
    |-- Twitter
    `-- WhatsApp
Duplicate directories, duplicate device names and, on top of that, Camera Roll was renamed to Recents in iOS 13.
A bit of a mess: 63 directories and 24,590 files.
I wanted to know if any files were duplicates. I have some Mac apps like Duplicates Finder and Duplicate Detective 2, but wondered if there were any Unix tools I could try as well.
Straight to Stack Overflow, which offered several methods:
find [1]
find . ! -empty -type f -exec md5sum {} + | sort | uniq -w32 -dD
find . -type f -exec md5sum {} + > md5sums
awk '{print $1}' md5sums | sort | uniq -d > dupes
while read -r d; do echo "---"; grep -- "^$d" md5sums | cut -c 35-; done < dupes
find . -type f -exec md5sum '{}' + | sort | uniq --all-repeated \
--check-chars=32 | cut --characters=35-
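The first one-liner hinges on uniq. As a rough illustration with made-up hashes and filenames (not output from my photos), this is what sort | uniq -w32 -dD does to md5sum-style lines: -w32 makes uniq compare only the first 32 characters, which is exactly the length of an MD5 hex digest, and -D prints every line of each repeated group rather than just one. sort has to come first because uniq only spots repeats on adjacent lines. sample_md5sums is a hypothetical helper for the demo.

```shell
#!/usr/bin/env bash
# Demo of the "sort | uniq -w32 -dD" step on md5sum-style lines.
# The hashes and filenames are invented; sample_md5sums is a hypothetical helper.
set -euo pipefail

sample_md5sums() {
  # md5sum output format: 32-char hex digest, two spaces, filename
  printf '%s  %s\n' \
    0123456789abcdef0123456789abcdef ./b.jpg \
    fedcba9876543210fedcba9876543210 ./c.jpg \
    0123456789abcdef0123456789abcdef ./a.jpg
}

# sort:  uniq only notices repeats on adjacent lines, so group them first
# -w32:  compare only the first 32 characters, i.e. the digest
# -dD:   print only repeated lines, and all of them rather than one per group
sample_md5sums | sort | uniq -w32 -dD
```

This prints only the ./a.jpg and ./b.jpg lines, because they share the same first 32 characters; ./c.jpg has a unique digest and is dropped.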
fdupes [2]
fdupes -r .
fdupes -r . | {
while IFS= read -r file; do
[[ $file ]] && du "$file"
done
} | sort -n
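Plain fdupes -r . prints each set of identical files as a run of paths, with a blank line between sets, which is why the loop above checks [[ $file ]] before calling du. To get a count rather than a list, a small awk filter over that raw output should be enough. This is a sketch: count_fdupes_sets is a hypothetical helper name, not an fdupes feature, and it assumes no filename consists only of whitespace. fdupes's own -m (--summarize) option also prints a summary.

```shell
#!/usr/bin/env bash
# Count sets and files in raw fdupes output, where each set of identical
# files is a run of paths and sets are separated by a blank line.
# count_fdupes_sets is a hypothetical helper, not an fdupes option.
set -euo pipefail

count_fdupes_sets() {
  awk '
    NF == 0 { in_set = 0; next }     # blank line: the current set has ended
    !in_set { sets++; in_set = 1 }   # first path of a new set
    { files++ }                      # every non-blank line is one file
    END { printf "%d sets, %d files\n", sets, files }'
}

# Usage (hypothetical): fdupes -r . | count_fdupes_sets
```

For a result like mine, where every set is a pair, the number of sets is the number of duplicate pairs.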
fslint [3]
cd /usr/share/fslint/fslint && ./fslint /path/to/directory
cd /usr/share/fslint/fslint && findup -t --summary /path/to/directory
Amazingly, none of them found any duplicates in my iPhoneSync folder, so that’s good.
They did, however, find dupes in my Photos folder:
find
root@cloud: /mnt/volume_nc/data/james/files/Photos
$ find . ! -empty -type f -exec md5sum {} + | sort | uniq -w32 -dD
0aadc1b11a8407b295008e6c9d22fc3e ./iPhone/2015/02/Photo-27-02-2015 03-57-55-9189.jpg
0aadc1b11a8407b295008e6c9d22fc3e ./iPhone/2015/02/Photo-27-02-2015-03-57-55-9189.jpg
0acd0d5b0c01c4bee9afe9176073b4fa ./iPhone/2014/05/Photo-02-05-2014-01-01-36-5396.jpg
0acd0d5b0c01c4bee9afe9176073b4fa ./iPhone/2014/05/Photo-02-05-2014-01-02-44-5398.jpg
1212fb3e6490ddfb066e6c556bc392b4 ./iPhone/2020/02/Photo-10-02-2020-06-12-40-8979.jpg
1212fb3e6490ddfb066e6c556bc392b4 ./iPhone/2020/02/Photo-10-02-2020-07-38-06-8981.jpg
155754a324daff68c8daed1044328a16 ./iPhone/2014/07/Photo-19-07-2014-09-58-33-6798.jpg
155754a324daff68c8daed1044328a16 ./iPhone/2014/07/Photo-21-07-2014-06-12-52-6824.jpg
1a82cddba43617c75d9b686002db7cb6 ./iPhone/2013/09/Photo-08-09-2013-07-28-53-2276.jpg
1a82cddba43617c75d9b686002db7cb6 ./iPhone/2013/09/Photo-08-09-2013-07-37-27-2278.jpg
2faf11e6b5c80f09fccf957c8dd2f95a ./iPhone/2014/03/Photo-29-03-2014-09-35-58-4868.jpg
2faf11e6b5c80f09fccf957c8dd2f95a ./iPhone/2014/03/Photo-29-03-2014-09-42-29-4870.jpg
459acd94c6128d60d16341a815c7d6e5 ./iPhone/2014/03/Photo-29-03-2014-02-47-45-4865.jpg
459acd94c6128d60d16341a815c7d6e5 ./iPhone/2014/03/Photo-29-03-2014-09-36-14-4869.jpg
5573dac4e8a2447dd4080374820a951b ./iPhone/2014/03/Photo-07-03-2014-05-09-07-4544.jpg
5573dac4e8a2447dd4080374820a951b ./iPhone/2014/03/Photo-07-03-2014-05-11-11-4545.jpg
5ee8ed449af66395f11021da4c7f5e65 ./iPhone/2015/03/Photo-03-03-2015 05-08-17-9238.jpg
5ee8ed449af66395f11021da4c7f5e65 ./iPhone/2015/03/Photo-24-03-2015 11-16-35-9597.jpg
634ad569036b9afb05692d330453f598 ./iPhone/2020/02/Photo-05-02-2020-06-33-35-8879.jpg
634ad569036b9afb05692d330453f598 ./iPhone/2020/02/Photo-06-02-2020-10-47-00-8888.jpg
75994932717b5a0eb9d1c9e5e096205e ./iPhone/2011/04/Photo-15-04-2011-01-16-11-2990.png
75994932717b5a0eb9d1c9e5e096205e ./iPhone/2011/07/Photo-21-07-2011-09-16-29-4039.png
896c522d3c4972a01abc35c91993b959 ./iPhone/2020/03/Photo-03-03-2020-09-13-17-9198.jpeg
896c522d3c4972a01abc35c91993b959 ./iPhone/2020/03/Photo-03-03-2020-09-13-17-9283.jpeg
8e5a153ee2e61eacca6ae04d790620d6 ./iPhone/2014/04/Photo-01-04-2014-02-18-24-4946.jpg
8e5a153ee2e61eacca6ae04d790620d6 ./iPhone/2014/04/Photo-01-04-2014-03-25-45-4955.jpg
95d066c26fdf046f272939ae9606e7c5 ./iPhone/2014/02/Photo-04-02-2014-02-09-43-4341.jpg
95d066c26fdf046f272939ae9606e7c5 ./iPhone/2014/02/Photo-08-02-2014-01-13-46-4364.jpg
a389e5ab5d8752884680dd7ea31ac1c3 ./iPhone/2014/05/Photo-21-05-2014-06-19-27-5844.jpg
a389e5ab5d8752884680dd7ea31ac1c3 ./iPhone/2014/05/Photo-24-05-2014-03-55-02-5891.jpg
aa37f32c86a4206739d38922544717dd ./iPhone/2014/05/Photo-02-05-2014-01-01-37-5397.jpg
aa37f32c86a4206739d38922544717dd ./iPhone/2014/05/Photo-02-05-2014-01-02-45-5399.jpg
b278f87efe7381facf3fd4dd75def8e9 ./iPhone/2014/08/Photo-05-08-2014-03-44-27-7051.jpg
b278f87efe7381facf3fd4dd75def8e9 ./iPhone/2014/08/Photo-05-08-2014-03-54-09-7056.jpg
b27e3d78a9c419147b9d7feb387f4ec9 ./iPhone/2014/08/Photo-09-08-2014-12-29-52-7143.jpg
b27e3d78a9c419147b9d7feb387f4ec9 ./iPhone/2014/09/Photo-04-09-2014-10-28-17-7449.jpg
dd7025344bd32d7433b16d99ce5c846e ./iPhone/2014/04/Photo-04-04-2014-07-40-58-4998.jpg
dd7025344bd32d7433b16d99ce5c846e ./iPhone/2014/04/Photo-05-04-2014-09-35-52-5031.jpg
e5df162f9f04c0de281bd2f5692dfffe ./iPhone/2013/12/Photo-31-12-2013-08-26-39-4052.jpg
e5df162f9f04c0de281bd2f5692dfffe ./iPhone/2013/12/Photo-31-12-2013-08-35-05-4054.jpg
fcbecc72fd2540829a4359a3ac620191 ./2020/03/Video-20-03-2020-06-53-01-9363.mp4
fcbecc72fd2540829a4359a3ac620191 ./2020/03/Video-21-03-2020-08-29-08-9370.mp4
ff846d763f428049dcb1b2615f9ee500 ./iPhone/2015/03/Photo-28-03-2015 04-01-07-9655.jpg
ff846d763f428049dcb1b2615f9ee500 ./iPhone/2015/03/Photo-28-03-2015 07-26-49-9657.jpg
22 pairs of duplicates found.
fdupes
root@cloud: /mnt/volume_nc/data/james/files/Photos
$ fdupes -r . | {
> while IFS= read -r file; do
> [[ $file ]] && du "$file"
> done
> } | sort -n
<snip>
64 ./iPhone/2015/03/Photo-03-03-2015 05-08-17-9238.jpg
64 ./iPhone/2015/03/Photo-24-03-2015 11-16-35-9597.jpg
80 ./iPhone/2014/05/Photo-02-05-2014-01-01-37-5397.jpg
80 ./iPhone/2014/05/Photo-02-05-2014-01-02-45-5399.jpg
84 ./iPhone/2014/02/Photo-04-02-2014-02-09-43-4341.jpg
84 ./iPhone/2014/02/Photo-08-02-2014-01-13-46-4364.jpg
88 ./iPhone/2014/04/Photo-01-04-2014-02-18-24-4946.jpg
88 ./iPhone/2014/04/Photo-01-04-2014-03-25-45-4955.jpg
92 ./iPhone/2014/05/Photo-02-05-2014-01-01-36-5396.jpg
92 ./iPhone/2014/05/Photo-02-05-2014-01-02-44-5398.jpg
96 ./iPhone/2014/03/Photo-29-03-2014-09-35-58-4868.jpg
96 ./iPhone/2014/03/Photo-29-03-2014-09-42-29-4870.jpg
100 ./iPhone/2014/04/Photo-04-04-2014-07-40-58-4998.jpg
100 ./iPhone/2014/04/Photo-05-04-2014-09-35-52-5031.jpg
108 ./iPhone/2014/05/Photo-21-05-2014-06-19-27-5844.jpg
108 ./iPhone/2014/05/Photo-24-05-2014-03-55-02-5891.jpg
108 ./iPhone/2014/08/Photo-09-08-2014-12-29-52-7143.jpg
108 ./iPhone/2014/09/Photo-04-09-2014-10-28-17-7449.jpg
108 ./iPhone/2020/02/Photo-10-02-2020-06-12-40-8979.jpg
108 ./iPhone/2020/02/Photo-10-02-2020-07-38-06-8981.jpg
112 ./iPhone/2014/08/Photo-05-08-2014-03-44-27-7051.jpg
112 ./iPhone/2014/08/Photo-05-08-2014-03-54-09-7056.jpg
116 ./iPhone/2014/03/Photo-29-03-2014-02-47-45-4865.jpg
116 ./iPhone/2014/03/Photo-29-03-2014-09-36-14-4869.jpg
124 ./iPhone/2020/03/Photo-03-03-2020-09-13-17-9198.jpeg
124 ./iPhone/2020/03/Photo-03-03-2020-09-13-17-9283.jpeg
128 ./iPhone/2014/07/Photo-19-07-2014-09-58-33-6798.jpg
128 ./iPhone/2014/07/Photo-21-07-2014-06-12-52-6824.jpg
148 ./iPhone/2013/09/Photo-08-09-2013-07-28-53-2276.jpg
148 ./iPhone/2013/09/Photo-08-09-2013-07-37-27-2278.jpg
192 ./iPhone/2014/03/Photo-07-03-2014-05-09-07-4544.jpg
192 ./iPhone/2014/03/Photo-07-03-2014-05-11-11-4545.jpg
212 ./iPhone/2020/02/Photo-05-02-2020-06-33-35-8879.jpg
212 ./iPhone/2020/02/Photo-06-02-2020-10-47-00-8888.jpg
264 ./iPhone/2015/03/Photo-28-03-2015 04-01-07-9655.jpg
264 ./iPhone/2015/03/Photo-28-03-2015 07-26-49-9657.jpg
304 ./iPhone/2015/02/Photo-27-02-2015 03-57-55-9189.jpg
304 ./iPhone/2015/02/Photo-27-02-2015-03-57-55-9189.jpg
356 ./iPhone/2013/12/Photo-31-12-2013-08-26-39-4052.jpg
356 ./iPhone/2013/12/Photo-31-12-2013-08-35-05-4054.jpg
404 ./iPhone/2011/04/Photo-15-04-2011-01-16-11-2990.png
404 ./iPhone/2011/07/Photo-21-07-2011-09-16-29-4039.png
1924 ./2020/03/Video-20-03-2020-06-53-01-9363.mp4
1924 ./2020/03/Video-21-03-2020-08-29-08-9370.mp4
Excluding empty files, 22 pairs of duplicates found.
fslint
root@cloud: /usr/share/fslint/fslint
$ ./findup -t --summary "/mnt/volume_nc/data/james/files/Photos"
2 * 1970176 2020/03/Video-20-03-2020-06-53-01-9363.mp4 2020/03/Video-21-03-2020-08-29-08-9370.mp4
2 * 413696 iPhone/2011/04/Photo-15-04-2011-01-16-11-2990.png iPhone/2011/07/Photo-21-07-2011-09-16-29-4039.png
2 * 364544 iPhone/2013/12/Photo-31-12-2013-08-26-39-4052.jpg iPhone/2013/12/Photo-31-12-2013-08-35-05-4054.jpg
2 * 311296 iPhone/2015/02/Photo-27-02-2015 03-57-55-9189.jpg iPhone/2015/02/Photo-27-02-2015-03-57-55-9189.jpg
2 * 270336 iPhone/2015/03/Photo-28-03-2015 04-01-07-9655.jpg iPhone/2015/03/Photo-28-03-2015 07-26-49-9657.jpg
2 * 217088 iPhone/2020/02/Photo-05-02-2020-06-33-35-8879.jpg iPhone/2020/02/Photo-06-02-2020-10-47-00-8888.jpg
2 * 196608 iPhone/2014/03/Photo-07-03-2014-05-09-07-4544.jpg iPhone/2014/03/Photo-07-03-2014-05-11-11-4545.jpg
2 * 151552 iPhone/2013/09/Photo-08-09-2013-07-28-53-2276.jpg iPhone/2013/09/Photo-08-09-2013-07-37-27-2278.jpg
2 * 131072 iPhone/2014/07/Photo-19-07-2014-09-58-33-6798.jpg iPhone/2014/07/Photo-21-07-2014-06-12-52-6824.jpg
2 * 126976 iPhone/2020/03/Photo-03-03-2020-09-13-17-9198.jpeg iPhone/2020/03/Photo-03-03-2020-09-13-17-9283.jpeg
2 * 118784 iPhone/2014/03/Photo-29-03-2014-02-47-45-4865.jpg iPhone/2014/03/Photo-29-03-2014-09-36-14-4869.jpg
2 * 114688 iPhone/2014/08/Photo-05-08-2014-03-44-27-7051.jpg iPhone/2014/08/Photo-05-08-2014-03-54-09-7056.jpg
2 * 110592 iPhone/2020/02/Photo-10-02-2020-06-12-40-8979.jpg iPhone/2020/02/Photo-10-02-2020-07-38-06-8981.jpg
2 * 110592 iPhone/2014/08/Photo-09-08-2014-12-29-52-7143.jpg iPhone/2014/09/Photo-04-09-2014-10-28-17-7449.jpg
2 * 110592 iPhone/2014/05/Photo-21-05-2014-06-19-27-5844.jpg iPhone/2014/05/Photo-24-05-2014-03-55-02-5891.jpg
2 * 102400 iPhone/2014/04/Photo-04-04-2014-07-40-58-4998.jpg iPhone/2014/04/Photo-05-04-2014-09-35-52-5031.jpg
2 * 98304 iPhone/2014/03/Photo-29-03-2014-09-35-58-4868.jpg iPhone/2014/03/Photo-29-03-2014-09-42-29-4870.jpg
2 * 94208 iPhone/2014/05/Photo-02-05-2014-01-01-36-5396.jpg iPhone/2014/05/Photo-02-05-2014-01-02-44-5398.jpg
2 * 90112 iPhone/2014/04/Photo-01-04-2014-02-18-24-4946.jpg iPhone/2014/04/Photo-01-04-2014-03-25-45-4955.jpg
2 * 86016 iPhone/2014/02/Photo-04-02-2014-02-09-43-4341.jpg iPhone/2014/02/Photo-08-02-2014-01-13-46-4364.jpg
2 * 81920 iPhone/2014/05/Photo-02-05-2014-01-01-37-5397.jpg iPhone/2014/05/Photo-02-05-2014-01-02-45-5399.jpg
2 * 65536 iPhone/2015/03/Photo-03-03-2015 05-08-17-9238.jpg iPhone/2015/03/Photo-24-03-2015 11-16-35-9597.jpg
22 pairs of duplicates found.
So all three tools find the same duplicates, but how do they perform?
Performance
find
real 1m15.949s
user 0m20.144s
sys 0m4.599s
fdupes
real 0m0.759s
user 0m0.181s
sys 0m0.324s
fslint
real 0m0.220s
user 0m0.148s
sys 0m0.049s
fslint is the clear winner at 0.22 seconds, with fdupes a close second at 0.76 seconds and find miles behind at well over a minute.
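Part of the gap is probably down to how much data each approach reads. The find one-liner computes an MD5 for every single file, and its real time is far above user plus sys, which points to time spent waiting on disk. fdupes's man page says it compares file sizes before MD5 signatures, and fslint's findup script, as I understand it, also filters by size first, so a file with a unique size is never read at all. Filesystem caching between runs may also have flattered the later timings, so treat the numbers as indicative. Below is a minimal sketch of the size-first idea in plain shell, assuming GNU find, xargs and coreutils; find_dupes_by_size is a hypothetical helper, not part of any of these tools, and unlike fdupes it skips any byte-by-byte confirmation.

```shell
#!/usr/bin/env bash
# Sketch of size-first duplicate detection: only files whose size matches
# at least one other file get hashed. Assumes GNU find, xargs and coreutils.
# Paths containing newlines or backslashes are not handled.
# find_dupes_by_size is a hypothetical helper, not part of fdupes or fslint.
set -euo pipefail

find_dupes_by_size() {
  local dir="${1:-.}"
  # Pipeline stages:
  #   find:  print "size<TAB>path" for every non-empty regular file
  #   awk:   keep only the paths whose size occurs more than once
  #   xargs: hash just those candidates (-r: skip md5sum if there are none)
  #   uniq:  print every line whose first 32 chars (the MD5 digest) repeat
  find "$dir" -type f ! -empty -printf '%s\t%p\n' |
    awk -F '\t' '
      {
        p = $0
        sub(/^[^\t]*\t/, "", p)   # strip the size field, keep the full path
        size[NR] = $1; path[NR] = p; count[$1]++
      }
      END {
        for (i = 1; i <= NR; i++)
          if (count[size[i]] > 1) print path[i]
      }' |
    xargs -d '\n' -r md5sum |
    sort |
    uniq -w32 -dD
}
```

After sourcing it, something like find_dupes_by_size /mnt/volume_nc/data/james/files/Photos should print the same hash-and-path lines as the find one-liner, while only hashing files that have a same-sized partner.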
I compared the results with the Mac apps:
Duplicate Detective 2
Also found 22 dupes, in about 6 seconds.
Duplicates Finder
Also found 22 dupes, but took 52 seconds.
Summary
PhotoSync works better than I thought: it hadn’t created any duplicate files. And while the Mac apps give you a nice preview of the duplicates, the command-line tools are much faster.