We closed this forum 18 June 2010. It has served us well since 2005 as the ALPHA forum did before it from 2002 to 2005. New discussions are ongoing at the new URL http://forum.processing.org. You'll need to sign up and get a new user account. We're sorry about that inconvenience, but we think it's better in the long run. The content on this forum will remain online.
detect duplicate images
Apr 16th, 2009, 1:44am
 
hi,

I have a lot of images, around 500. A bunch of them are duplicates and I need to know which ones. They all look almost the same, so it's quite hard to tell by eye.

I was hoping someone could write a script for me that produces a list of which images are duplicates.
Re: detect duplicate images
Reply #1 - Apr 16th, 2009, 4:15am
 
Trying to get a script for free? ;)
I had this problem some years ago, when I was a beginner at Java. I wrote a little program that computed the MD5 signature of the images (of the files, actually) and eliminated files with the same signature.
It works only if the files are strictly identical, not if they are "similar", which is a much harder problem.
Recently I had a similar issue: I just searched for a freeware duplicate detector (there are plenty for Windows, not sure about other systems) and ran it. It did a good job, and it was fast and perhaps more reliable, with a better interface, than a home-made program... (although the latter is interesting to write).
Re: detect duplicate images
Reply #2 - Apr 16th, 2009, 4:30am
 
It can't be that hard to make, right?

Most pictures are different, so compare each pixel of img1 and img2 in a loop, and as soon as a pixel differs, end the loop for those two images.
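A minimal sketch of that idea in plain Java, assuming the files can be decoded by `javax.imageio.ImageIO` (PNG, GIF, JPEG, BMP out of the box). The class and method names are hypothetical. Returning from the method at the first differing pixel ends both nested loops at once. Inside a Processing sketch the same loop could run over `PImage.pixels` after calling `loadPixels()`.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class PixelCompare {

    // Returns true only if both images have the same dimensions and every
    // pixel has the same ARGB value. Stops at the first differing pixel.
    static boolean samePixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false; // different dimensions: cannot be pixel-identical
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return false; // first mismatch ends the whole comparison
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass two image paths on the command line.
        BufferedImage img1 = ImageIO.read(new File(args[0]));
        BufferedImage img2 = ImageIO.read(new File(args[1]));
        if (img1 == null || img2 == null) {
            // ImageIO.read returns null when no reader supports the format
            System.out.println("unsupported image format");
            return;
        }
        System.out.println(samePixels(img1, img2) ? "identical" : "different");
    }
}
```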

P.S. How can you end a loop when a certain condition is reached?
Re: detect duplicate images
Reply #3 - Apr 16th, 2009, 8:23am
 
clankill3r wrote on Apr 16th, 2009, 4:30am:
How can you end a loop if a certain condition is reached?

break
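One detail worth knowing: in Java (and so in Processing), a plain `break` only exits the innermost loop. A pixel comparison usually has two nested loops (x and y), so `break` in the inner loop would only stop scanning the current row. A labeled `break` exits the outer loop as well, as in this small sketch (the class and method names are made up for illustration):

```java
public class LabeledBreakDemo {

    // Scans a 2-D grid and returns {row, col} of the first cell equal to
    // target, or null if there is none. The labeled break leaves BOTH loops;
    // a plain break would only leave the inner one.
    static int[] findFirst(int[][] grid, int target) {
        int[] found = null;
        search:
        for (int row = 0; row < grid.length; row++) {
            for (int col = 0; col < grid[row].length; col++) {
                if (grid[row][col] == target) {
                    found = new int[] {row, col};
                    break search; // exits the outer loop too
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        int[][] grid = {{1, 2, 3}, {4, 5, 6}};
        int[] pos = findFirst(grid, 5);
        System.out.println(pos[0] + "," + pos[1]); // prints 1,1
    }
}
```

Putting the loops in their own method and using `return`, as in a typical image comparison helper, has the same effect.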
Re: detect duplicate images
Reply #4 - Apr 16th, 2009, 9:53pm
 
Your pixel comparison has some advantages and several drawbacks.
Pros: you compare the image data and ignore the metadata; you can even compare a PNG to a GIF; and you can add a tolerance for small color changes.
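The tolerance mentioned above could look like the following sketch. The rule used here (every R, G and B channel may differ by at most `tolerance`, alpha ignored) is just one simple assumption; a sum or Euclidean distance over the channels would work too. Names are hypothetical.

```java
import java.awt.image.BufferedImage;

public class TolerantCompare {

    // Returns true if the images have the same size and, for every pixel,
    // each of the R, G and B channels differs by at most `tolerance` (0-255).
    // The alpha channel is ignored in this sketch.
    static boolean similar(BufferedImage a, BufferedImage b, int tolerance) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                int p = a.getRGB(x, y);
                int q = b.getRGB(x, y);
                // extract each 8-bit channel from the packed ARGB int
                int dr = Math.abs(((p >> 16) & 0xFF) - ((q >> 16) & 0xFF));
                int dg = Math.abs(((p >> 8) & 0xFF) - ((q >> 8) & 0xFF));
                int db = Math.abs((p & 0xFF) - (q & 0xFF));
                if (dr > tolerance || dg > tolerance || db > tolerance) {
                    return false; // this pixel is too different
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BufferedImage a = new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB);
        BufferedImage b = new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB);
        a.setRGB(0, 0, 0x808080);
        b.setRGB(0, 0, 0x828080); // red channel differs by 2
        System.out.println(similar(a, b, 2)); // prints true
        System.out.println(similar(a, b, 1)); // prints false
    }
}
```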
Cons: it is slow (you have to decode the data and load it entirely into memory for each comparison), and it is more memory intensive.

With both methods, you have to read the whole files to do the comparison, although with your method you can do a pre-check on file size.
With the signature method, you compute one signature per file, loading each file only once and keeping the computed signature in memory. Then, to compare a file to all the others, you just compare a (big) number. Or let Java do that for you, using a hash map with the signature as the key.
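A sketch of the signature approach, assuming Java 8 or later: it reads each regular file in one folder once, computes its MD5 with `java.security.MessageDigest`, and groups paths in a `HashMap` keyed by the hex signature. It only finds byte-identical files, not "similar" ones, and it reads each file fully into memory with `Files.readAllBytes`, which is fine for ordinary image sizes. The class and method names are hypothetical, and subfolders are not scanned.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFinder {

    // MD5 of a file's raw bytes, as a 32-character lowercase hex string.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(file));
        return String.format("%032x", new BigInteger(1, digest));
    }

    // Reads each regular file in `dir` once and groups the paths by signature.
    // Only groups containing two or more files (the duplicates) are kept.
    static Map<String, List<Path>> findDuplicates(Path dir)
            throws IOException, NoSuchAlgorithmException {
        Map<String, List<Path>> bySignature = new HashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    String signature = md5Hex(file);
                    bySignature.computeIfAbsent(signature, k -> new ArrayList<>()).add(file);
                }
            }
        }
        bySignature.values().removeIf(group -> group.size() < 2);
        return bySignature;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: pass the image folder, defaults to the current one.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (List<Path> group : findDuplicates(dir).values()) {
            System.out.println("Duplicates: " + group);
        }
    }
}
```

The hash map does the "compare to all the other files" step for free: two files end up in the same list exactly when their signatures are equal.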
With your method, for each file you have to read all the other files and do the comparison, so you will load each file several times!
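To put numbers on it for the roughly 500 images in the question: comparing every pair once, and assuming no decoded image is kept in memory between comparisons, gives

```latex
\binom{n}{2} = \frac{n(n-1)}{2} = \frac{500 \cdot 499}{2} = 124\,750 \text{ comparisons}

\text{each file is loaded } n - 1 = 499 \text{ times, i.e. } n(n-1) = 249\,500 \text{ image loads in total}

\text{signature method: } n = 500 \text{ file reads}
```

Stopping at the first differing pixel shortens each comparison, but it does not reduce the number of loads.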

Ah, I found the software I used recently: http://www.EasyDuplicateFinder.com
Re: detect duplicate images
Reply #5 - Apr 18th, 2009, 2:50am
 
I have a Mac...

I will search for one myself later, or maybe I'll try to code something.
We'll see.