If you’ve ever run file recovery tools on a disk, you know that you can end up with multiple copies of recovered files. Well, I made a little script that can help reduce the number of duplicates for you to clean up.
https://github.com/eltopo1971/file-duplicate-nuker
fileDuplicateNuker takes a directory as an argument, then recursively walks that directory and computes a hash signature for each file it finds. When it encounters a file whose hash matches one it has already seen, it deletes that file.
Does this take care of all the duplicates? Oh heavens no. That’s a feature, not a bug — call it erring on the side of safety. The script has no idea what kind of file it’s dealing with. All it does is compute a hash signature and base the delete decision entirely on that. If there is so much as one byte of difference in the file it’s examining, it’s counted as a unique file and not deleted.
That being said, from my testing it does delete a good number of files, and when you have thousands of files to wade through, any little bit helps.
