How to compare two JSON objects with the same elements in a different order as equal?

Problem description:

How can I test whether two JSON objects are equal in Python, disregarding the order of lists?

For example ...

JSON document a:

{
    "errors": [
        {"error": "invalid", "field": "email"},
        {"error": "required", "field": "name"}
    ],
    "success": false
}

JSON document b:

{
    "success": false,
    "errors": [
        {"error": "required", "field": "name"},
        {"error": "invalid", "field": "email"}
    ]
}

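a and b should compare equal, even though the order of the "errors" lists is different.

Solution 1:

If you want two objects with the same elements but in a different order to compare equal, then the obvious thing to do is compare sorted copies of them - for instance, for the dictionaries represented by your JSON strings a and b: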

import json

a = json.loads("""
{
    "errors": [
        {"error": "invalid", "field": "email"},
        {"error": "required", "field": "name"}
    ],
    "success": false
}
""")

b = json.loads("""
{
    "success": false,
    "errors": [
        {"error": "required", "field": "name"},
        {"error": "invalid", "field": "email"}
    ]
}
""")

>>> sorted(a.items()) == sorted(b.items())
False

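... but that doesn't work, because in each case the "errors" item of the top-level dict is a list with the same elements in a different order, and sorted() doesn't try to sort anything beyond the "top" level of an iterable.

To fix that, we can define an ordered function which will recursively sort any lists it finds (and convert dictionaries to lists of (key, value) pairs so that they're orderable):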

def ordered(obj):
    if isinstance(obj, dict):
        return sorted((k, ordered(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return sorted(ordered(x) for x in obj)
    else:
        return obj

If we apply this function to a and b, the results compare equal:

>>> ordered(a) == ordered(b)
True
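
One caveat, added here rather than taken from the original answer: in Python 3, sorted() raises TypeError when a list mixes values that cannot be ordered against each other (for example a number and a string, or None and a number), so ordered() can fail on such input. A minimal sketch of one workaround, assuming it is acceptable to order list items by their serialized JSON text; the name canonical is hypothetical:

```python
import json


def canonical(obj):
    """Return a copy of obj with every list sorted by its items' JSON text."""
    if isinstance(obj, dict):
        return {k: canonical(v) for k, v in obj.items()}
    if isinstance(obj, list):
        items = [canonical(x) for x in obj]
        # json.dumps(..., sort_keys=True) gives a string key, so items of
        # different types can still be ordered against each other
        return sorted(items, key=lambda x: json.dumps(x, sort_keys=True))
    return obj


# made-up sample data mixing numbers, strings, null and nested containers
left = json.loads('{"tags": [1, "a", null, {"k": [2, 1]}], "ok": true}')
right = json.loads('{"ok": true, "tags": [{"k": [1, 2]}, null, "a", 1]}')
print(canonical(left) == canonical(right))  # True
```

Because the result keeps dictionaries as dictionaries, the final comparison relies on ordinary dict equality, which already ignores key order.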

Solution 2:

Another way could be to use the json.dumps(X, sort_keys=True) option:

import json
a, b = json.dumps(a, sort_keys=True), json.dumps(b, sort_keys=True)
a == b # a normal string comparison

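This works for nested dictionaries and lists, as long as the list elements are in the same order; sort_keys only reorders dictionary keys.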
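
A quick check, using made-up sample data, of which differences the sort_keys comparison tolerates: key order inside nested dictionaries, but not element order inside lists.

```python
import json

# same keys in a different order, lists in the same order
x = {"b": 1, "a": {"d": [1, 2], "c": 3}}
y = {"a": {"c": 3, "d": [1, 2]}, "b": 1}
same_keys = json.dumps(x, sort_keys=True) == json.dumps(y, sort_keys=True)

# same list elements in a different order
p = {"errors": [{"field": "email"}, {"field": "name"}]}
q = {"errors": [{"field": "name"}, {"field": "email"}]}
same_lists = json.dumps(p, sort_keys=True) == json.dumps(q, sort_keys=True)

print(same_keys)   # True
print(same_lists)  # False
```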

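Solution 3:

Decode them and compare them, as mgilson's comment suggests.

For dictionaries, order doesn't matter as long as the keys and values match. (Dict equality in Python ignores key order, even though dicts preserve insertion order since Python 3.7.)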

>>> {'a': 1, 'b': 2} == {'b': 2, 'a': 1}
True

But order matters in lists; sorting will solve the problem for lists.

>>> [1, 2] == [2, 1]
False
>>> [1, 2] == sorted([2, 1])
True

>>> import json
>>> a = '{"errors": [{"error": "invalid", "field": "email"}, {"error": "required", "field": "name"}], "success": false}'
>>> b = '{"errors": [{"error": "required", "field": "name"}, {"error": "invalid", "field": "email"}], "success": false}'
>>> a, b = json.loads(a), json.loads(b)
>>> # dicts are not orderable in Python 3, so sort by their sorted (key, value) pairs
>>> a['errors'].sort(key=lambda d: sorted(d.items()))
>>> b['errors'].sort(key=lambda d: sorted(d.items()))
>>> a == b
True

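The example above works for the JSON in the question. For a general solution, see Zero Piraeus's answer.

Solution 4:

Update: see https://eggachecat.github.io/jycm-json-diff-viewer/ for a live demo! There is now a native JS implementation.

Disclosure: I am the author of this library.

Yes! You can use jycm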

from jycm.helper import make_ignore_order_func
from jycm.jycm import YouchamaJsonDiffer

a = {
    "errors": [
        {"error": "invalid", "field": "email"},
        {"error": "required", "field": "name"}
    ],
    "success": False
}
b = {
    "success": False,
    "errors": [
        {"error": "required", "field": "name"},
        {"error": "invalid", "field": "email"}
    ]
}
ycm = YouchamaJsonDiffer(a, b, ignore_order_func=make_ignore_order_func([
    "^errors",
]))
ycm.diff()
assert ycm.to_dict(no_pairs=True) == {} # aka no diff

For a more complex example (value changes in a deep structure):

from jycm.helper import make_ignore_order_func
from jycm.jycm import YouchamaJsonDiffer

a = {
    "errors": [
        {"error": "invalid", "field": "email"},
        {"error": "required", "field": "name"}
    ],
    "success": True
}

b = {
    "success": False,
    "errors": [
        {"error": "required", "field": "name-1"},
        {"error": "invalid", "field": "email"}
    ]
}
ycm = YouchamaJsonDiffer(a, b, ignore_order_func=make_ignore_order_func([
    "^errors",
]))
ycm.diff()
assert ycm.to_dict() == {
    'just4vis:pairs': [
        {'left': 'invalid', 'right': 'invalid', 'left_path': 'errors->[0]->error', 'right_path': 'errors->[1]->error'},
        {'left': {'error': 'invalid', 'field': 'email'}, 'right': {'error': 'invalid', 'field': 'email'},
         'left_path': 'errors->[0]', 'right_path': 'errors->[1]'},
        {'left': 'email', 'right': 'email', 'left_path': 'errors->[0]->field', 'right_path': 'errors->[1]->field'},
        {'left': {'error': 'invalid', 'field': 'email'}, 'right': {'error': 'invalid', 'field': 'email'},
         'left_path': 'errors->[0]', 'right_path': 'errors->[1]'},
        {'left': 'required', 'right': 'required', 'left_path': 'errors->[1]->error',
         'right_path': 'errors->[0]->error'},
        {'left': {'error': 'required', 'field': 'name'}, 'right': {'error': 'required', 'field': 'name-1'},
         'left_path': 'errors->[1]', 'right_path': 'errors->[0]'},
        {'left': 'name', 'right': 'name-1', 'left_path': 'errors->[1]->field', 'right_path': 'errors->[0]->field'},
        {'left': {'error': 'required', 'field': 'name'}, 'right': {'error': 'required', 'field': 'name-1'},
         'left_path': 'errors->[1]', 'right_path': 'errors->[0]'},
        {'left': {'error': 'required', 'field': 'name'}, 'right': {'error': 'required', 'field': 'name-1'},
         'left_path': 'errors->[1]', 'right_path': 'errors->[0]'}
    ],
    'value_changes': [
        {'left': 'name', 'right': 'name-1', 'left_path': 'errors->[1]->field', 'right_path': 'errors->[0]->field',
         'old': 'name', 'new': 'name-1'},
        {'left': True, 'right': False, 'left_path': 'success', 'right_path': 'success', 'old': True, 'new': False}
    ]
}

The result can be rendered as a visual diff (image not included).

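Solution 5:

You can write your own equals function:

dicts are equal if: 1) all keys are equal, 2) all values are equal
lists are equal if all items are equal and in the same order
primitives are equal if a == b

Because you're dealing with JSON, you'll have standard Python types: dict, list, etc., so you can do hard type checking with if type(obj) == dict: and so on.

Rough example (not tested):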

def json_equals(jsonA, jsonB):
    if type(jsonA) != type(jsonB):
        # values of different types are never equal
        return False
    if type(jsonA) == dict:
        if len(jsonA) != len(jsonB):
            return False
        for keyA in jsonA:
            if keyA not in jsonB or not json_equals(jsonA[keyA], jsonB[keyA]):
                return False
        return True
    elif type(jsonA) == list:
        if len(jsonA) != len(jsonB):
            return False
        # lists are compared position by position, so order matters here
        for itemA, itemB in zip(jsonA, jsonB):
            if not json_equals(itemA, itemB):
                return False
        return True
    else:
        return jsonA == jsonB

Solution 6:

Take the following two dicts, 'dictWithListsInValue' and 'reorderedDictWithReorderedListsInValue', which are simply reordered versions of each other:

dictObj = {"foo": "bar", "john": "doe"}
reorderedDictObj = {"john": "doe", "foo": "bar"}
dictObj2 = {"abc": "def"}
dictWithListsInValue = {'A': [{'X': [dictObj2, dictObj]}, {'Y': 2}], 'B': dictObj2}
reorderedDictWithReorderedListsInValue = {'B': dictObj2, 'A': [{'Y': 2}, {'X': [reorderedDictObj, dictObj2]}]}
a = {"L": "M", "N": dictWithListsInValue}
b = {"L": "M", "N": reorderedDictWithReorderedListsInValue}

print(sorted(a.items()) == sorted(b.items()))  # gives false

This gave me the wrong result, i.e. False.

So I created my own custom object comparator, like this:

def my_list_cmp(list1, list2):
    if len(list1) != len(list2):
        return False

    # every item of list1 must have an equal counterpart somewhere in list2
    for l in list1:
        found = False
        for m in list2:
            if my_obj_cmp(l, m):
                found = True
                break

        if not found:
            return False

    return True


def my_obj_cmp(obj1, obj2):
    if isinstance(obj1, list):
        if not isinstance(obj2, list):
            return False
        return my_list_cmp(obj1, obj2)
    elif isinstance(obj1, dict):
        if not isinstance(obj2, dict):
            return False
        if set(obj1.keys()) != set(obj2.keys()):
            return False
        # recurse on every value; this also returns False (instead of raising)
        # when one side is a list or dict and the other is not
        for k in obj1:
            if not my_obj_cmp(obj1[k], obj2[k]):
                return False
        return True
    else:
        return obj1 == obj2


dictObj = {"foo": "bar", "john": "doe"}
reorderedDictObj = {"john": "doe", "foo": "bar"}
dictObj2 = {"abc": "def"}
dictWithListsInValue = {'A': [{'X': [dictObj2, dictObj]}, {'Y': 2}], 'B': dictObj2}
reorderedDictWithReorderedListsInValue = {'B': dictObj2, 'A': [{'Y': 2}, {'X': [reorderedDictObj, dictObj2]}]}
a = {"L": "M", "N": dictWithListsInValue}
b = {"L": "M", "N": reorderedDictWithReorderedListsInValue}

print(my_obj_cmp(a, b))  # gives true

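This gave me the correct expected output!

The logic is pretty simple:

If the objects are of type 'list', compare each item of the first list with the items of the second list until a match is found. If no match is found after going through the second list, 'found' stays False and the comparison returns False.

Otherwise, if the objects to be compared are of type 'dict', compare the values for all the respective keys in both objects. (The comparison is performed recursively.)

Otherwise, simply call obj1 == obj2. By default this works fine for strings and numbers, whose __eq__() is defined appropriately.

(Note that the algorithm can be further improved by removing the items already found in object 2, so that the next item of object 1 does not compare itself with items of object 2 that have already been matched.)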
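
The improvement mentioned in the note above can be sketched as follows. This is one possible reading of it, not the original author's code; the names unordered_equal and _lists_match are hypothetical. Removing each matched item from a working copy of the second list also means duplicates are counted, so [1, 1, 2] and [1, 2, 2] are no longer treated as equal.

```python
def unordered_equal(obj1, obj2):
    """Compare decoded JSON values, ignoring the order of list elements."""
    if isinstance(obj1, dict) and isinstance(obj2, dict):
        if obj1.keys() != obj2.keys():
            return False
        return all(unordered_equal(obj1[k], obj2[k]) for k in obj1)
    if isinstance(obj1, list) and isinstance(obj2, list):
        return _lists_match(obj1, obj2)
    if isinstance(obj1, (dict, list)) or isinstance(obj2, (dict, list)):
        return False  # a container never equals a scalar or the other container type
    return obj1 == obj2


def _lists_match(list1, list2):
    if len(list1) != len(list2):
        return False
    remaining = list(list2)  # working copy, so the caller's list is untouched
    for item in list1:
        for i, candidate in enumerate(remaining):
            if unordered_equal(item, candidate):
                del remaining[i]  # each item of list2 can be matched only once
                break
        else:
            return False  # no unmatched counterpart left for this item
    return True


print(unordered_equal([1, 1, 2], [1, 2, 2]))  # False
print(unordered_equal({"a": [{"x": [1, 2]}, 3]}, {"a": [3, {"x": [2, 1]}]}))  # True
```

The comparison is still quadratic in the list length in the worst case, which is usually acceptable for small JSON documents.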

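Solution 7:

For others who would like to debug two JSON objects (usually there is a reference and a target), here is a solution you may use. It lists the "paths" at which the target differs from or does not match the reference.

The level option selects how deep you would like to look.

The show_variables option can be turned on to show the relevant values.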

def compareJson(example_json, target_json, level=-1, show_variables=False):
  _different_variables = _parseJSON(example_json, target_json, level=level, show_variables=show_variables)
  return len(_different_variables) == 0, _different_variables

def _parseJSON(reference, target, path=None, level=-1, show_variables=False):
  path = path or []  # avoid sharing a mutable default argument between calls
  if level > 0 and len(path) == level:
    return []
  
  _different_variables = list()
  # the case that the inputs is a dict (i.e. json dict)  
  if isinstance(reference, dict):
    for _key in reference:      
      _path = path+[_key]
      try:
        _different_variables += _parseJSON(reference[_key], target[_key], _path, level, show_variables)
      except (KeyError, TypeError):  # key missing, or target is not a dict
        _record = ''.join(['[%s]'%str(p) for p in _path])
        if show_variables:
          _record += ': %s <--> MISSING!!'%str(reference[_key])
        _different_variables.append(_record)
  # the case that the inputs is a list/tuple
  elif isinstance(reference, list) or isinstance(reference, tuple):
    for index, v in enumerate(reference):
      _path = path+[index]
      try:
        _target_v = target[index]
        _different_variables += _parseJSON(v, _target_v, _path, level, show_variables)
      except (IndexError, KeyError, TypeError):  # index missing, or target is not a list
        _record = ''.join(['[%s]'%str(p) for p in _path])
        if show_variables:
          _record += ': %s <--> MISSING!!'%str(v)
        _different_variables.append(_record)
  # the actual comparison about the value, if they are not the same, record it
  elif reference != target:
    _record = ''.join(['[%s]'%str(p) for p in path])
    if show_variables:
      _record += ': %s <--> %s'%(str(reference), str(target))
    _different_variables.append(_record)

  return _different_variables

Solution 8:

import json

#API response sample
# some JSON:

x = '{ "name":"John", "age":30, "city":"New York"}'

# parse x json to Python dictionary:
y = json.loads(x)

#access Python dictionary
print(y["age"])


# expected json as dictionary
thisdict = { "name":"John", "age":30, "city":"New York"}
print(thisdict)


# access Python dictionary
print(thisdict["age"])

# compare the two Python dictionaries

if thisdict == y:
    print ("dict1 is equal to dict2")
else:
    print ("dict1 is not equal to dict2")

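Solution 9:

I realise this isn't what the question asked, but I came to this page looking for a program that could show me the differences between two JSON files. I didn't find one here, so I had to write it myself. It isn't perfect: it creates two files instead of one, one for the items in the first file that are missing or different in the second file, and another for the items in the second file that are missing or different in the first. So without further ado: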

import sys
import json


def get_different_items(object1, object2):
    result = {}
    for key in object1.keys():
        if not key in object2:
            result[key] = object1.get(key)
        else:
            value1 = object1.get(key)
            value2 = object2.get(key)
            if type(value1) == dict and type(value2) == dict:
                inner_diff = get_different_items(value1, value2)
                if inner_diff is not None:
                    result[key] = inner_diff
            elif type(value1) == list and type(value2) == list:
                array_diff = get_difference_in_array(value1, value2)
                if array_diff is not None:
                    result[key] = array_diff
            elif value1 != value2:
                result[key] = value1
    if len(result) == 0:
        return None
    else:
        return result


def get_difference_in_array(array1, array2):
    result = []
    for i in range(0, len(array1)):
        value1 = array1[i]
        if i >= len(array2):
            # array2 is shorter, so this item is missing from it
            result.append(value1)
            continue
        value2 = array2[i]
        if type(value1) == dict and type(value2) == dict:
            inner_diff = get_different_items(value1, value2)
            if inner_diff is not None:
                result.append(inner_diff)
        elif value1 != value2:
            result.append(value1)
    if len(result) == 0:
        return None
    else:
        return result


if __name__ == '__main__':
    if len(sys.argv) != 4:
        print(f"Usage: {sys.argv[0]} <first-file> <second-file> <output-file-prefix>")
        print("    first-file and second-file are something like thing1 and thing2. The files read will be thing1.json and thing2.json")
        print("    output-file-prefix is something like 'res'. Two files will be created - res1.json and res2.json")
        print("    res1.json will list the items in thing1 which are not in thing2 or are different to the ones in thing2.")
        print("    res2.json will list the items in thing2 which are not in thing1 or are different to the ones in thing1.")
        sys.exit(0)

    with open(sys.argv[1] + ".json") as f:
        object1 = json.load(f)
    with open(sys.argv[2] + ".json") as f:
        object2 = json.load(f)

    # compute the diff in both directions before deciding whether the files match;
    # items that exist only in the second file show up only in the second diff
    result1 = get_different_items(object1, object2)
    result2 = get_different_items(object2, object1)
    if result1 is None and result2 is None:
        print("Files are identical")
    else:
        with open(sys.argv[3] + "1.json", 'w') as output:
            json.dump(result1, output, indent=4, sort_keys=True)
        with open(sys.argv[3] + "2.json", 'w') as output:
            json.dump(result2, output, indent=4, sort_keys=True)

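Solution 10:

I've written an algorithm to sort Python dictionaries containing nested elements, including lists and dicts. This is useful for comparing two JSON documents that are effectively the same but have nested list elements in a different order.

import json

# the `|` union syntax requires Python 3.10+
JSON_VALUE = list | dict | str | int | float | bool | None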


def rec_sort(dict_or_list_or_value: JSON_VALUE) -> JSON_VALUE:
    if type(dict_or_list_or_value) is dict:
        ret_dict = {}
        for k in sorted(dict_or_list_or_value.keys()):
            ret_dict[k] = rec_sort(dict_or_list_or_value[k])
        return ret_dict
    elif type(dict_or_list_or_value) is list:
        ret_list = []
        for x in dict_or_list_or_value:
            if type(x) is dict:
                ret_list.append(json.dumps(rec_sort(x), sort_keys=True))
            elif type(x) is list:
                ret_list.append(json.dumps(rec_sort(x), sort_keys=True))
            else:
                ret_list.append(json.dumps(x, sort_keys=True))

        ret_list = sorted(ret_list)
        return [json.loads(x) for x in ret_list]

    else:
        return dict_or_list_or_value

a == b # False
rec_sort(a) == rec_sort(b) # True