Compare two images the Python/Linux way


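Problem description:

I'm trying to solve the problem of preventing duplicate image uploads.

I have two JPGs. Looking at them, I can see that they are in fact identical. But for some reason they have different file sizes (one was pulled from a backup, the other came from another upload), so their md5 checksums differ as well.

How can I efficiently and confidently compare two images, the way a human would be able to see that they are clearly identical?

Example: http://static.peterbe.com/a.jpg and http://static.peterbe.com/b.jpg

Update

I wrote this script: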

import math, operator
from functools import reduce
from PIL import Image

def compare(file1, file2):
    image1 = Image.open(file1)
    image2 = Image.open(file2)
    h1 = image1.histogram()
    h2 = image2.histogram()
    # RMS of the bin-by-bin difference between the two histograms
    rms = math.sqrt(reduce(operator.add,
                           map(lambda a,b: (a-b)**2, h1, h2))/len(h1))
    return rms

if __name__=='__main__':
    import sys
    file1, file2 = sys.argv[1:]
    print(compare(file1, file2))

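Then I downloaded two visually identical images and ran the script. Output:

58.9830484122

Can anybody tell me what a suitable cutoff should be?

Update II

The difference between a.jpg and b.jpg is that the second one has been saved with PIL: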

b=Image.open('a.jpg')
b.save(open('b.jpg','wb'))

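This evidently applies some very slight quality modification. I have now solved my problem by applying the same PIL save to the file being uploaded, without otherwise doing anything to it, and it now works!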
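
The workaround described above can be sketched roughly as follows. This is only an illustration under assumptions: `resaved_digest` is a hypothetical helper name, the upload is assumed to arrive as raw JPEG bytes, and the sketch relies on PIL writing identical bytes for identical input and settings, which is what the asker observed.

```python
import hashlib
import io

from PIL import Image


def resaved_digest(raw_bytes):
    """Decode the upload, re-save it with PIL as JPEG, and hash the result."""
    image = Image.open(io.BytesIO(raw_bytes))
    buffer = io.BytesIO()
    # Same kind of save as in the question, with PIL's default settings.
    image.save(buffer, format="JPEG")
    return hashlib.md5(buffer.getvalue()).hexdigest()


if __name__ == "__main__":
    # Build a small in-memory JPEG to stand in for an uploaded file.
    source = io.BytesIO()
    Image.new("RGB", (32, 32), (200, 30, 30)).save(source, format="JPEG")
    upload = source.getvalue()
    print(resaved_digest(upload) == resaved_digest(upload))
```

The digest of a new upload would then be compared against digests of files that were stored through the same save step.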

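Solution 1:

There is an OSS project that uses WebDriver to take screenshots and then compares the images to see whether there are any issues (http://code.google.com/p/fighting-layout-bugs/). It does so by opening the files as streams and then comparing every bit.

You may be able to do something similar with PIL.

EDIT:

After more research I found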

import math, operator
from functools import reduce
from PIL import Image

h1 = Image.open("image1").histogram()
h2 = Image.open("image2").histogram()

rms = math.sqrt(reduce(operator.add,
    map(lambda a,b: (a-b)**2, h1, h2))/len(h1))

See http://snipplr.com/view/757/compare-two-pil-images-in-python/ and http://effbot.org/zone/pil-comparing-images.htm

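Solution 2:

From here:

The quickest way to determine if two images have exactly the same contents is to get the difference between the two images, and then calculate the bounding box of the non-zero regions in this image.
If the images are identical, all pixels in the difference image are zero, and the bounding box function returns None.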

from PIL import ImageChops


def equal(im1, im2):
    return ImageChops.difference(im1, im2).getbbox() is None

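Solution 3:

I guess you should decode the images and do a pixel-by-pixel comparison to see whether they are reasonably similar.

With PIL and Numpy you can do it quite easily: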

from PIL import Image
import numpy
import sys

def main():
    img1 = Image.open(sys.argv[1])
    img2 = Image.open(sys.argv[2])

    if img1.size != img2.size or img1.getbands() != img2.getbands():
        return -1

    s = 0
    for band in img1.getbands():
        # signed integers, so the subtraction cannot wrap around like uint8
        m1 = numpy.asarray(img1.getchannel(band), dtype=numpy.int64)
        m2 = numpy.asarray(img2.getchannel(band), dtype=numpy.int64)
        s += numpy.sum(numpy.abs(m1 - m2))
    print(s)

if __name__ == "__main__":
    sys.exit(main())

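This gives a numeric value that is very close to 0 if the images are very similar.

Note that images that are shifted/rotated will be reported as very different, since the pixels won't match one by one.

Solution 4:

Using ImageMagick, you can simply run this in your shell [or call it via the OS library from within a program]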

compare image1 image2 output

This will create an output image with the differences marked.

compare -metric AE -fuzz 5% image1 image2 output

will give you a fuzz factor of 5% to ignore minor pixel differences. More information can be obtained from here
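
For the "call it from within a program" route, a minimal sketch with Python's `subprocess` module might look like the following. The helper names `build_compare_command` and `differing_pixels` are made up for illustration. It assumes an ImageMagick installation whose `compare` executable is on the PATH (newer installations may expose it as `magick compare` instead), and it assumes the metric is printed to stderr with the number as its first token.

```python
import subprocess


def build_compare_command(image1, image2, fuzz="5%"):
    """Return the argument list for ImageMagick's compare tool."""
    # "null:" discards the difference image; only the metric is wanted.
    return ["compare", "-metric", "AE", "-fuzz", fuzz, image1, image2, "null:"]


def differing_pixels(image1, image2, fuzz="5%"):
    """Run compare and return the AE metric (count of differing pixels)."""
    result = subprocess.run(
        build_compare_command(image1, image2, fuzz),
        capture_output=True,
        text=True,
    )
    # compare exits with 0 (similar) or 1 (dissimilar); 2 signals an error.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    # Assumption: the metric is on stderr and its first token is the number.
    return float(result.stderr.split()[0])
```

A call such as `differing_pixels("a.jpg", "b.jpg")` could then be checked against whatever pixel count you decide to tolerate.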

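Solution 5:

The problem of knowing what makes some features of an image more important than others is a whole scientific program. I would suggest some alternatives, depending on the solution you want:

if your problem is to see whether there is a flipping of bits in your JPEGs, then try to image the difference image (perhaps it was a minor local edit?),
to see whether the images are globally the same, use the Kullback Leibler distance to compare the histograms,
to see whether there is some qualitative change, before applying the other answers, filter your image using the functions below to raise the importance of high-level frequencies:

Code: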

import numpy

def FTfilter(image,FTfilter):
    # multiply the centered 2-D spectrum by the filter, then transform back
    FTimage = numpy.fft.fftshift(numpy.fft.fft2(image)) * FTfilter
    return numpy.real(numpy.fft.ifft2(numpy.fft.ifftshift(FTimage)))
    #return real(ifft2(fft2(image)* FTfilter))


#### whitening
def olshausen_whitening_filt(size, f_0 = .78, alpha = 4., N = 0.01):
    """
    Returns the whitening filter used by (Olshausen, 98)

    f_0 = 200 / 512

    /! you will have some problems at dewhitening without a low-pass

    """
    # frequency grid spanning [-1, 1] along both axes
    fx, fy = numpy.mgrid[-1:1:1j*size[0],-1:1:1j*size[1]]
    rho = numpy.sqrt(fx**2+fy**2)
    K_ols = (N**2 + rho**2)**.5 * low_pass(size, f_0 = f_0, alpha = alpha)
    K_ols /= numpy.max(K_ols)

    return  K_ols

def low_pass(size, f_0, alpha):
    """
    Returns the low_pass filter used by (Olshausen, 98)

    parameters from Atick (p.240)
    f_0 = 22 c/deg in primates: the full image is approx 45 deg
    alpha makes the aspect change (1=diamond on the vert and hor, 2 = anisotropic)

    """

    fx, fy = numpy.mgrid[-1:1:1j*size[0],-1:1:1j*size[1]]
    rho = numpy.sqrt(fx**2+fy**2)
    low_pass = numpy.exp(-(rho/f_0)**alpha)

    return  low_pass

(Shamelessly copied from http://www.incm.cnrs-mrs.fr/LaurentPerrinet/Publications/Perrinet08spie)
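
The histogram comparison with the Kullback Leibler distance mentioned in this answer is not spelled out there, so here is one possible sketch. The function names, the small `epsilon` used to avoid taking the logarithm of zero, and the choice to symmetrize the divergence (KL itself is not symmetric) are all assumptions made for this illustration. Both images are assumed to have the same bands, so that their histograms have the same length.

```python
import math

from PIL import Image


def kl_divergence(hist_p, hist_q, epsilon=1e-10):
    """Kullback-Leibler divergence D(P || Q) between two histograms."""
    total_p = float(sum(hist_p))
    total_q = float(sum(hist_q))
    divergence = 0.0
    for count_p, count_q in zip(hist_p, hist_q):
        # Normalize counts to probabilities; epsilon avoids log(0) and x/0.
        p = count_p / total_p + epsilon
        q = count_q / total_q + epsilon
        divergence += p * math.log(p / q)
    return divergence


def histogram_kl(image1, image2):
    """Symmetrized KL distance between the histograms of two PIL images."""
    h1, h2 = image1.histogram(), image2.histogram()
    return kl_divergence(h1, h2) + kl_divergence(h2, h1)
```

Identical histograms give 0, and larger values mean the histograms differ more. As with the other metrics in this thread, a cutoff would still have to be chosen empirically.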

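Solution 6:

A simple and concise way to see whether two images are the same, using only PIL and a few Python math libraries. This method has only been tested on image files with the same dimensions and extension, but it avoids several errors made in other answers to this question.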

import math, operator
from PIL import Image
from PIL import ImageChops

def images_are_similar(img1, img2, error=90):
    diff = ImageChops.difference(img1, img2).histogram()
    sq = (value * (i % 256) ** 2 for i, value in enumerate(diff))
    sum_squares = sum(sq)
    rms = math.sqrt(sum_squares / float(img1.size[0] * img1.size[1]))

    # Error is an arbitrary value, based on values when
    # comparing 2 rotated images & 2 different images.
    return rms < error

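Advantages:

The % 256 in the squares calculation weights each color equally. Many of the previous answers' RMS formulas give blue pixel values 3x the weight of red values, and green pixel values 2x the weight of red values.

Easier to read. Although the RMS calculation can be written in one line using lambda and reduce, expanding it to 3 lines significantly improves at-a-glance readability.

This code correctly detects that a rotated image differs from a base image in a different orientation. This avoids a pitfall of comparing images by their histograms, as pointed out by @musicinmybrain. If histograms of the 2 images are created and then compared with each other, and one image is a rotation of the other, the comparison will report no difference between the images, because the histograms are identical. If, on the other hand, the images are compared first and a histogram of the comparison result is created afterwards, the images are compared accurately even when one is a rotation of the other.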
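
The rotation pitfall described above can be reproduced with a tiny synthetic image. This is just an illustrative sketch: the 4x4 grayscale test image and the variable names are made up for the demonstration.

```python
from PIL import Image, ImageChops

# A small grayscale test image: black with one white pixel in the top-left corner.
base = Image.new("L", (4, 4), 0)
base.putpixel((0, 0), 255)

# Rotating by 180 degrees moves the white pixel to the bottom-right corner.
rotated = base.rotate(180)

# Histogram-vs-histogram comparison: the pixel values are the same, only moved.
print("histograms equal:", base.histogram() == rotated.histogram())

# Difference first: the moved pixel shows up as a non-empty bounding box.
print("difference found:", ImageChops.difference(base, rotated).getbbox() is not None)
```

The first check cannot tell the two images apart, while the difference image can, which is why images_are_similar builds its histogram from the difference image.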

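The code used in this answer is a copy/paste from a code.activestate.com post, taking into account the 3rd comment, which corrects the heavier weighting of green and blue pixel values.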

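Solution 7:

First, I should note that they're not identical; b has been recompressed and has lost quality. You can see this if you look carefully on a good monitor.

To determine that they are subjectively "the same", you would have to do something like what Fortran suggested, although you will have to arbitrarily establish a threshold for "sameness". To make s (the pixel-by-pixel sum in Solution 3) independent of image size, and to handle the channels a little more sensibly, I would consider computing the RMS (root mean square) Euclidean distance in color space between the pixels of the two images. I don't have time to write out the code right now, but basically, for each pixel, you compute

(R_2 - R_1) ** 2 + (G_2 - G_1) ** 2 + (B_2 - B_1) ** 2

adding an (A_2 - A_1) ** 2 term if the image has an alpha channel, etc. The result is the square of the color space distance between the two images. Find the mean (average) across all pixels, then take the square root of the resulting scalar. Then decide on a reasonable threshold for this value.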
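
Since the answer leaves the code out, here is a hedged sketch of the RMS color-space distance it describes, written with numpy. The function name `rms_color_distance` is hypothetical, and the sketch assumes both images already share the same size and mode.

```python
import numpy as np
from PIL import Image


def rms_color_distance(image1, image2):
    """RMS Euclidean distance in color space between two same-sized images."""
    if image1.size != image2.size or image1.mode != image2.mode:
        raise ValueError("images must have the same size and mode")
    # Use floats so the subtraction cannot wrap around like uint8 would.
    a = np.asarray(image1, dtype=np.float64)
    b = np.asarray(image2, dtype=np.float64)
    squared = (a - b) ** 2
    if squared.ndim == 3:
        # Sum over the channel axis (R, G, B and alpha if present)
        # to get the squared distance for each pixel.
        squared = squared.sum(axis=2)
    # Mean over all pixels, then the square root of that scalar.
    return float(np.sqrt(squared.mean()))
```

The result is 0 for identical pixel data and does not grow with image size, so one threshold can be reused across images of different dimensions.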

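Alternatively, you might just decide that copies of the same original image with different lossy compression are not truly "the same", and stick with the file hash.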

Solution 8:

I tested this one, and it works the best of all the methods and is extremely fast!

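import math
from PIL import ImageChops

def rmsdiff_1997(im1, im2):
    "Calculate the root-mean-square difference between two images"

    h = ImageChops.difference(im1, im2).histogram()

    # calculate rms; i % 256 maps each histogram bin back to its pixel value,
    # because multi-band images have one run of 256 bins per band
    return math.sqrt(sum(count * ((i % 256) ** 2) for i, count in enumerate(h))
                     / (float(im1.size[0]) * im1.size[1]))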

Link here for reference

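Solution 9:

You can either compare them using PIL (iterate through the pixels/segments of the pictures and compare them), or, if you are looking for an exact-copy comparison, try comparing the MD5 hashes of both files.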
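
For the exact-copy case, the MD5 comparison could be sketched with the standard library as follows. The helper names `file_md5` and `files_identical` and the chunk size are assumptions made for this example. As the question shows, this only detects byte-for-byte duplicates, not re-saved copies.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path1, path2):
    """True only when both files hash to the same MD5 digest."""
    return file_md5(path1) == file_md5(path2)
```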

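Solution 10:

I've tried 3 of the methods mentioned above, as well as methods from elsewhere. There seem to be two main types of image comparison: pixel-by-pixel and histogram.

I've tried both, and the pixel one does fail by 100%, as it actually should: if we shift the second image by 1 pixel, none of the pixels match and we get a 100% mismatch.

Histogram comparison should work very well in theory, but it doesn't.

Here are two images with a slightly shifted viewport. The histograms look 99% similar, yet the algorithm produces a result that says "very different".

Centered

Same, but shifted by ~15º

Results of the 4 different algorithms:

Perfect match: False
Pixel difference: 115816402
Histogram Comparison: 83.69564286668303
HistComparison: 1744.8160719686186

And the same comparison of the first image (centered QR) with a 100% different image:

A totally different image and histogram

Algorithm results:

Perfect match: False
Pixel difference: 207893096
Histogram Comparison: 104.30194643642095
HistComparison: 6875.766716148522

Any suggestions on how to measure the difference between two images in a more precise and more practical way would be greatly appreciated. At the moment, none of these algorithms seems to produce usable results, because a slightly different image gives results that are very similar/close to those of a 100% different image.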

from PIL import Image
from PIL import ImageChops
from functools import reduce
import numpy
import sys
import math
import operator

# Just checking if images are 100% the same


def equal(im1, im2):
    img1 = Image.open(im1)
    img2 = Image.open(im2)
    return ImageChops.difference(img1, img2).getbbox() is None


def histCompare(im1, im2):
    h1 = Image.open(im1).histogram()
    h2 = Image.open(im2).histogram()

    rms = math.sqrt(reduce(operator.add, map(lambda a, b: (a - b)**2, h1, h2)) / len(h1))
    return rms

# To get a measure of how similar two images are, we calculate the root-mean-square (RMS)
# value of the difference between the images. If the images are exactly identical,
# this value is zero. The following function uses the difference function,
# and then calculates the RMS value from the histogram of the resulting image.


def rmsdiff_1997(im1, im2):
    # "Calculate the root-mean-square difference between two images"
    img1 = Image.open(im1)
    img2 = Image.open(im2)

    h = ImageChops.difference(img1, img2).histogram()

    # calculate rms
    # note: map() stops at the shorter input, so for a multi-band image
    # only the first 256 histogram bins (the first band) are used here
    return math.sqrt(reduce(operator.add,
                            map(lambda h, i: h * (i**2), h, range(256))
                            ) / (float(img1.size[0]) * img1.size[1]))

# Pixel by pixel comparison to see if images are reasonably similar.


def countDiff(im1, im2):
    s = 0
    img1 = Image.open(im1)
    img2 = Image.open(im2)

    if img1.size != img2.size or img1.getbands() != img2.getbands():
        return -1

    for band_index, band in enumerate(img1.getbands()):
        m1 = numpy.array([p[band_index] for p in img1.getdata()]).reshape(*img1.size)
        m2 = numpy.array([p[band_index] for p in img2.getdata()]).reshape(*img2.size)
        s += numpy.sum(numpy.abs(m1 - m2))

    return s


# (label, image to compare against data/start.jpg)
cases = [
    ("[Same Image]", "data/start.jpg"),
    ("\n[Same Position]", "data/end.jpg"),
    ("\n[~5º off]", "data/end2.jpg"),
    ("\n[~15º off]", "data/end3.jpg"),
    ("\n[100% different]", "data/end4.jpg"),
]

for label, other in cases:
    print(label)
    print("Perfect match:", equal("data/start.jpg", other))
    print("Pixel difference:", countDiff("data/start.jpg", other))
    print("Histogram Comparison:", rmsdiff_1997("data/start.jpg", other))
    print("HistComparison:", histCompare("data/start.jpg", other))

Solution 11:

A simple solution with numpy

import numpy as np
from PIL import Image

img1 = Image.open(img1_path)
img2 = Image.open(img2_path)

Then use array_equal

np.array_equal(img1, img2)

It returns True if all channels are exactly identical

Solution 12:

import cv2
import numpy as np

original = cv2.imread("Test.jpg")
duplicate = cv2.imread("Test1.jpg")

# 1) Check if 2 images are equals
if original.shape == duplicate.shape:
    print("The images have same size and channels")
    # absdiff, unlike subtract, does not clip negative differences to 0
    difference = cv2.absdiff(original, duplicate)
    b, g, r = cv2.split(difference)

    if cv2.countNonZero(b) == 0 and cv2.countNonZero(g) == 0 and cv2.countNonZero(r) == 0:
        print("The images are completely Equal")
        
cv2.imshow(&quot;Original&quot;, original)
cv2.imshow(&quot;Duplicate&quot;, duplicate)
cv2.waitKey(0)
cv2.destroyAllWindows()

Solution 13:

If you want to check whether two images are exactly the same, this code may help you.

# compare the raw bytes of the two files
with open('pic1.png', 'rb') as f:
    content1 = f.read()
with open('pic2.png', 'rb') as f:
    content2 = f.read()
if content1 == content2:
    print("same")
else:
    print("not same")