Python 中字典的深度合并

2024-12-06 08:40:00
admin
原创
80
摘要:问题描述:我需要合并多个词典,例如:dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}} dict2 = {2:{"c":{"C"}}, 3:{"d&qu...

问题描述:

我需要合并多个词典,例如:

dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}}

dict2 = {2:{"c":{"C"}}, 3:{"d":{"D"}}}

和树叶A B C一样D`{"info1":"value", "info2":"value2"}`

字典的级别(深度)未知,可能是{2:{"c":{"z":{"y":{C}}}}}

在我的例子中,它代表一个目录/文件结构,其中节点是文档,叶子是文件。

我想合并它们以获得:

 dict3 = {1:{"a":{"A"}}, 2:{"b":{"B"},"c":{"C"}}, 3:{"d":{"D"}}}

我不确定如何使用 Python 轻松地做到这一点。


解决方案 1:

这实际上非常棘手 - 特别是如果您希望在出现不一致时出现有用的错误消息,同时正确接受重复但一致的条目(这里没有其他答案可以做到这一点......)

假设您没有大量的条目,递归函数是最简单的:

def merge(a: dict, b: dict, path=[]):
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] != b[key]:
                raise Exception('Conflict at ' + '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

# works
print(merge({1:{"a":"A"},2:{"b":"B"}}, {2:{"c":"C"},3:{"d":"D"}}))
# has conflict
merge({1:{"a":"A"},2:{"b":"B"}}, {1:{"a":"A"},2:{"b":"C"}})

请注意,这会发生变化a- 的内容b被添加到a(也会返回)。如果您想保留,a可以这样调用它merge(dict(a), b)

agf 指出(如下),您可能有两个以上的字典,在这种情况下您可以使用:

from functools import reduce
reduce(merge, [dict1, dict2, dict3...])

所有内容都将被添加到dict1

注意:我编辑了我的初始答案以改变第一个参数;这使得“减少”更容易解释

解决方案 2:

您可以尝试mergedeep


安装

$ pip3 install mergedeep

用法

from mergedeep import merge

a = {"keyA": 1}
b = {"keyB": {"sub1": 10}}
c = {"keyB": {"sub2": 20}}

merge(a, b, c) 

print(a)
# {"keyA": 1, "keyB": {"sub1": 10, "sub2": 20}}

有关选项的完整列表,请查看文档!

解决方案 3:

这是一个使用生成器的简单方法:

def mergedicts(dict1, dict2):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                yield (k, dict(mergedicts(dict1[k], dict2[k])))
            else:
                # If one of the values is not a dict, you can't continue merging it.
                # Value from second dict overrides one in first and we move on.
                yield (k, dict2[k])
                # Alternatively, replace this with exception raiser to alert you of value conflicts
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}

print dict(mergedicts(dict1,dict2))

这将打印:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

解决方案 4:

这个问题的一个问题是,字典的值可以是任意复杂的数据。基于这些和其他答案,我想出了以下代码:

class YamlReaderError(Exception):
    pass

def data_merge(a, b):
    """merges b into a and return merged result

    NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
    key = None
    # ## debug output
    # sys.stderr.write("DEBUG: %s to %s
" %(b,a))
    try:
        if a is None or isinstance(a, str) or isinstance(a, unicode) or isinstance(a, int) or isinstance(a, long) or isinstance(a, float):
            # border case for first run or if a is a primitive
            a = b
        elif isinstance(a, list):
            # lists can be only appended
            if isinstance(b, list):
                # merge lists
                a.extend(b)
            else:
                # append to list
                a.append(b)
        elif isinstance(a, dict):
            # dicts must be merged
            if isinstance(b, dict):
                for key in b:
                    if key in a:
                        a[key] = data_merge(a[key], b[key])
                    else:
                        a[key] = b[key]
            else:
                raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
        else:
            raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
    except TypeError, e:
        raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
    return a

我的用例是合并 YAML 文件,其中我只需要处理可能的数据类型的子集。因此我可以忽略元组和其他对象。对我来说,合理的合并逻辑意味着

  • 替换标量

  • 附加列表

  • 通过添加缺失的键和更新现有的键来合并字典

其余一切以及不可预见的情况都会导致错误。

解决方案 5:

词典与词典的合并

因为这是一个典型问题(尽管存在某些非普遍性),所以我提供了典型的 Pythonic 方法来解决这个问题。

最简单的情况:“叶子是嵌套的字典,以空字典结尾”:

d1 = {'a': {1: {'foo': {}}, 2: {}}}
d2 = {'a': {1: {}, 2: {'bar': {}}}}
d3 = {'b': {3: {'baz': {}}}}
d4 = {'a': {1: {'quux': {}}}}

这是递归最简单的情况,我推荐两种简单的方法:

def rec_merge1(d1, d2):
    '''return new merged dict of dicts'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge1(v, d2[k])
    d3 = d1.copy()
    d3.update(d2)
    return d3

def rec_merge2(d1, d2):
    '''update first dict with second recursively'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge2(v, d2[k])
    d1.update(d2)
    return d1

我相信我更喜欢第二种,而不是第一种,但请记住,第一种的原始状态必须从其起源重建。用法如下:

>>> from functools import reduce # only required for Python 3.
>>> reduce(rec_merge1, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
>>> reduce(rec_merge2, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}

复杂情况:“叶子属于任何其他类型:”

因此,如果它们以字典结尾,那么合并末尾的空字典就是一个简单的例子。如果不是,那就没那么简单了。如果是字符串,你如何合并它们?集合可以类似地更新,所以我们可以进行这种处理,但我们会失去合并的顺序。那么顺序重要吗?

因此,如果没有更多的信息,最简单的方法就是如果两个值都不是字典,就给它们标准更新处理:即第二个字典的值将覆盖第一个字典,即使第二个字典的值是 None 而第一个字典的值是一个包含大量信息的字典。

d1 = {'a': {1: 'foo', 2: None}}
d2 = {'a': {1: None, 2: 'bar'}}
d3 = {'b': {3: 'baz'}}
d4 = {'a': {1: 'quux'}}

from collections.abc import MutableMapping

def rec_merge(d1, d2):
    '''
    Update two dicts of dicts recursively, 
    if either mapping has leaves that are non-dicts, 
    the second's leaf overwrites the first's.
    '''
    for k, v in d1.items():
        if k in d2:
            # this next check is the only difference!
            if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                d2[k] = rec_merge(v, d2[k])
            # we could further check types and merge as appropriate here.
    d3 = d1.copy()
    d3.update(d2)
    return d3

现在

from functools import reduce
reduce(rec_merge, (d1, d2, d3, d4))

返回

{'a': {1: 'quux', 2: 'bar'}, 'b': {3: 'baz'}}

应用于原始问题:

我必须删除字母周围的花括号并将它们放在单引号中以使其成为合法的 Python(否则它们将成为 Python 2.7+ 中的设置文字)并附加一个缺失的括号:

dict1 = {1:{"a":'A'}, 2:{"b":'B'}}
dict2 = {2:{"c":'C'}, 3:{"d":'D'}}

现在rec_merge(dict1, dict2)返回:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

这与原始问题的预期结果相匹配(更改后,例如将 更改{A}'A')。

解决方案 6:

基于@andrew cooke。此版本处理嵌套的字典列表,还允许更新值的选项

def merge(a, b, path=None, update=True):
    "http://stackoverflow.com/questions/7204805/python-dictionaries-of-dictionaries-merge"
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            elif isinstance(a[key], list) and isinstance(b[key], list):
                for idx, val in enumerate(b[key]):
                    a[key][idx] = merge(a[key][idx], b[key][idx], path + [str(key), str(idx)], update=update)
            elif update:
                a[key] = b[key]
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

解决方案 7:

这个简单的递归过程将把一个字典合并到另一个字典中,同时覆盖冲突的键:

#!/usr/bin/env python2.7

def merge_dicts(dict1, dict2):
    """ Recursively merges dict2 into dict1 """
    if not isinstance(dict1, dict) or not isinstance(dict2, dict):
        return dict2
    for k in dict2:
        if k in dict1:
            dict1[k] = merge_dicts(dict1[k], dict2[k])
        else:
            dict1[k] = dict2[k]
    return dict1

print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {2:{"c":"C"}, 3:{"d":"D"}}))
print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {1:{"a":"A"}, 2:{"b":"C"}}))

输出:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
{1: {'a': 'A'}, 2: {'b': 'C'}}

解决方案 8:

如果有人想用另一种方法解决这个问题,这里是我的解决方案。

优点:简短、声明性、功能性风格(递归、不变异)。

潜在缺点:这可能不是您想要的合并。请参阅文档字符串以了解语义。

def deep_merge(a, b):
    """
    Merge two values, with `b` taking precedence over `a`.

    Semantics:
    - If either `a` or `b` is not a dictionary, `a` will be returned only if
      `b` is `None`. Otherwise `b` will be returned.
    - If both values are dictionaries, they are merged as follows:
        * Each key that is found only in `a` or only in `b` will be included in
          the output collection with its value intact.
        * For any key in common between `a` and `b`, the corresponding values
          will be merged with the same semantics.
    """
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    else:
        # If we're here, both a and b must be dictionaries or subtypes thereof.

        # Compute set of all keys in both dictionaries.
        keys = set(a.keys()) | set(b.keys())

        # Build output dictionary, merging recursively values with common keys,
        # where `None` is used to mean the absence of a value.
        return {
            key: deep_merge(a.get(key), b.get(key))
            for key in keys
        }

解决方案 9:

根据@andrew cooke 的回答。它以更好的方式处理嵌套列表。

def deep_merge_lists(original, incoming):
    """
    Deep merge two lists. Modifies original.
    Recursively call deep merge on each correlated element of list. 
    If item type in both elements are
     a. dict: Call deep_merge_dicts on both values.
     b. list: Recursively call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    If length of incoming list is more that of original then extra values are appended.
    """
    common_length = min(len(original), len(incoming))
    for idx in range(common_length):
        if isinstance(original[idx], dict) and isinstance(incoming[idx], dict):
            deep_merge_dicts(original[idx], incoming[idx])

        elif isinstance(original[idx], list) and isinstance(incoming[idx], list):
            deep_merge_lists(original[idx], incoming[idx])

        else:
            original[idx] = incoming[idx]

    for idx in range(common_length, len(incoming)):
        original.append(incoming[idx])


def deep_merge_dicts(original, incoming):
    """
    Deep merge two dictionaries. Modifies original.
    For key conflicts if both values are:
     a. dict: Recursively call deep_merge_dicts on both values.
     b. list: Call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    """
    for key in incoming:
        if key in original:
            if isinstance(original[key], dict) and isinstance(incoming[key], dict):
                deep_merge_dicts(original[key], incoming[key])

            elif isinstance(original[key], list) and isinstance(incoming[key], list):
                deep_merge_lists(original[key], incoming[key])

            else:
                original[key] = incoming[key]
        else:
            original[key] = incoming[key]

解决方案 10:

简短而甜蜜:

from collections.abc import MutableMapping as Map

def nested_update(d, v):
    """
    Nested update of dict-like 'd' with dict-like 'v'.
    """

    for key in v:
        if key in d and isinstance(d[key], Map) and isinstance(v[key], Map):
            nested_update(d[key], v[key])
        else:
            d[key] = v[key]

它的工作原理类似于(并基于)Python 的方法。它会在就地更新字典时dict.update返回None(如果您愿意,您可以随时添加)。输入的键将覆盖任何现有的键(它不会尝试解释字典的内容)。return d`dvd`

它也适用于其他(“类似字典”)映射。

例子:

people = {'pete': {'gender': 'male'}, 'mary': {'age': 34}}
nested_update(people, {'pete': {'age': 41}})

# Pete's age was merged in
print(people)
{'pete': {'gender': 'male', 'age': 41}, 'mary': {'age': 34}}

Python 的常规dict.update方法得出的结果如下:

people = {'pete': {'gender': 'male'}, 'mary': {'age': 34}}
people.update({'pete': {'age': 41}})

# We lost Pete's gender here!
print(people) 
{'pete': {'age': 41}, 'mary': {'age': 34}}

解决方案 11:

如果您的字典级别未知,那么我建议使用递归函数:

def combineDicts(dictionary1, dictionary2):
    output = {}
    for item, value in dictionary1.iteritems():
        if dictionary2.has_key(item):
            if isinstance(dictionary2[item], dict):
                output[item] = combineDicts(value, dictionary2.pop(item))
        else:
            output[item] = value
    for item, value in dictionary2.iteritems():
         output[item] = value
    return output

解决方案 12:

概述

以下方法将字典深度合并的问题细分为:

  1. 参数化的浅合并函数merge(f)(a,b),使用一个函数f来合并两个字典ab

  2. f与以下函数一起使用的递归合并函数merge


执行

合并两个(非嵌套)字典的函数可以用很多种方式编写。我个人喜欢

def merge(f):
    def merge(a,b): 
        keys = a.keys() | b.keys()
        return {key:f(a.get(key), b.get(key)) for key in keys}
    return merge

定义适当的递归合并函数的一个好方法f是使用multipledispatch,它允许定义根据参数类型沿不同路径评估的函数。

from multipledispatch import dispatch

#for anything that is not a dict return
@dispatch(object, object)
def f(a, b):
    return b if b is not None else a

#for dicts recurse 
@dispatch(dict, dict)
def f(a,b):
    return merge(f)(a,b)

例子

要合并两个嵌套的字典,只需使用merge(f)例如:

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}
merge(f)(dict1, dict2)
#returns {1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}} 

笔记:

这种方法的优点是:

  • 该函数由多个较小的函数构建而成,每个函数只执行一项操作,这使得代码更易于推理和测试

  • 行为不是硬编码的,但可以根据需要进行更改和扩展,从而提高代码重用性(见下面的示例)。


定制

一些答案还考虑了包含列表的字典,例如其他(可能嵌套)字典。在这种情况下,可能需要映射列表并根据位置合并它们。这可以通过向合并函数添加另一个定义来完成f

import itertools
@dispatch(list, list)
def f(a,b):
    return [merge(f)(*arg) for arg in itertools.zip_longest(a, b)]

解决方案 13:

安德鲁·库克的回答有点问题:在某些情况下,b当你修改返回的字典时,它会修改第二个参数。具体来说,这是因为这一行:

if key in a:
    ...
else:
    a[key] = b[key]

如果b[key]dict,它将被简单地分配给a,这意味着对该的任何后续修改dict都会影响ab

a={}
b={'1':{'2':'b'}}
c={'1':{'3':'c'}}
merge(merge(a,b), c) # {'1': {'3': 'c', '2': 'b'}}
a # {'1': {'3': 'c', '2': 'b'}} (as expected)
b # {'1': {'3': 'c', '2': 'b'}} <----
c # {'1': {'3': 'c'}} (unmodified)

为了修复这个问题,该行必须替换为:

if isinstance(b[key], dict):
    a[key] = clone_dict(b[key])
else:
    a[key] = b[key]

在哪里clone_dict

def clone_dict(obj):
    clone = {}
    for key, value in obj.iteritems():
        if isinstance(value, dict):
            clone[key] = clone_dict(value)
        else:
            clone[key] = value
    return

尽管如此。这显然没有考虑到listset其他东西,但我希望它能说明在尝试合并时的陷阱dicts

为了完整起见,这是我的版本,您可以传递多个dicts

def merge_dicts(*args):
    def clone_dict(obj):
        clone = {}
        for key, value in obj.iteritems():
            if isinstance(value, dict):
                clone[key] = clone_dict(value)
            else:
                clone[key] = value
        return

    def merge(a, b, path=[]):
        for key in b:
            if key in a:
                if isinstance(a[key], dict) and isinstance(b[key], dict):
                    merge(a[key], b[key], path + [str(key)])
                elif a[key] == b[key]:
                    pass
                else:
                    raise Exception('Conflict at `{path}\''.format(path='.'.join(path + [str(key)])))
            else:
                if isinstance(b[key], dict):
                    a[key] = clone_dict(b[key])
                else:
                    a[key] = b[key]
        return a
    return reduce(merge, args, {})

解决方案 14:

我有一个迭代解决方案 - 对于大字典和许多字典(例如 json 等)效果更好:

import collections


def merge_dict_with_subdicts(dict1: dict, dict2: dict) -> dict:
    """
    similar behaviour to builtin dict.update - but knows how to handle nested dicts
    """
    q = collections.deque([(dict1, dict2)])
    while len(q) > 0:
        d1, d2 = q.pop()
        for k, v in d2.items():
            if k in d1 and isinstance(d1[k], dict) and isinstance(v, dict):
                q.append((d1[k], v))
            else:
                d1[k] = v

    return dict1

请注意,如果它们不是两个字典,这将使用 d2 中的值来覆盖 d1。(与 python 的相同dict.update()

一些测试:

def test_deep_update():
    d = dict()
    merge_dict_with_subdicts(d, {"a": 4})
    assert d == {"a": 4}

    new_dict = {
        "b": {
            "c": {
                "d": 6
            }
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 4,
        "b": {
            "c": {
                "d": 6
            }
        }
    }

    new_dict = {
        "a": 3,
        "b": {
            "f": 7
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d == {
        "a": 3,
        "b": {
            "c": {
                "d": 6
            },
            "f": 7
        }
    }

    # test a case where one of the dicts has dict as value and the other has something else
    new_dict = {
        'a': {
            'b': 4
        }
    }
    merge_dict_with_subdicts(d, new_dict)
    assert d['a']['b'] == 4

我已经测试了大约 1200 个字典 - 这种方法花费了 0.4 秒,而递归解决方案花费了约 2.5 秒。

解决方案 15:

正如许多其他答案所指出的那样,递归算法在这里最有意义。通常,在使用递归时,最好创建新值,而不是尝试修改任何输入数据结构。

我们需要定义每个合并步骤会发生什么。如果两个输入都是字典,这很容易:我们从每一侧复制唯一键,并递归合并重复键的值。这是导致问题的基本情况。如果我们为此提取一个单独的函数,将更容易理解逻辑。作为占位符,我们可以将两个值包装在一个元组中:

def merge_leaves(x, y):
    return (x, y)

现在我们的逻辑核心是这样的:

def merge(x, y):
    if not(isinstance(x, dict) and isinstance(y, dict)):
        return merge_leaves(x, y)
    x_keys, y_keys = x.keys(), y.keys()
    result = { k: merge(x[k], y[k]) for k in x_keys & y_keys }
    result.update({k: x[k] for k in x_keys - y_keys})
    result.update({k: y[k] for k in y_keys - x_keys})
    return result

我们来测试一下:

>>> x = {'a': {'b': 'c', 'd': 'e'}, 'f': 1, 'g': {'h', 'i'}, 'j': None}
>>> y = {'a': {'d': 'e', 'h': 'i'}, 'f': {'b': 'c'}, 'g': 1, 'k': None}
>>> merge(x, y)
{'f': (1, {'b': 'c'}), 'g': ({'h', 'i'}, 1), 'a': {'d': ('e', 'e'), 'b': 'c', 'h': 'i'}, 'j': None, 'k': None}
>>> x # The originals are unmodified.
{'a': {'b': 'c', 'd': 'e'}, 'f': 1, 'g': {'h', 'i'}, 'j': None}
>>> y
{'a': {'d': 'e', 'h': 'i'}, 'f': {'b': 'c'}, 'g': 1, 'k': None}

我们可以轻松修改叶子合并规则,例如:

def merge_leaves(x, y):
    try:
        return x + y
    except TypeError:
        return Ellipsis

并观察效果:

>>> merge(x, y)
{'f': Ellipsis, 'g': Ellipsis, 'a': {'d': 'ee', 'b': 'c', 'h': 'i'}, 'j': None, 'k': None}

我们还可以通过使用第三方库根据输入类型进行调度来解决此问题。例如,使用multipledispatch,我们可以执行以下操作:

@dispatch(dict, dict)
def merge(x, y):
    x_keys, y_keys = x.keys(), y.keys()
    result = { k: merge(x[k], y[k]) for k in x_keys & y_keys }
    result.update({k: x[k] for k in x_keys - y_keys})
    result.update({k: y[k] for k in y_keys - x_keys})
    return result

@dispatch(str, str)
def merge(x, y):
    return x + y

@dispatch(tuple, tuple)
def merge(x, y):
    return x + y

@dispatch(list, list)
def merge(x, y):
    return x + y

@dispatch(int, int):
def merge(x, y):
    raise ValueError("integer value conflict")

@dispatch(object, object):
    return (x, y)

这使得我们无需编写自己的类型检查即可处理各种叶类型特殊情况的组合,并且还可以替换主递归函数中的类型检查。

解决方案 16:

您可以使用包merge中的函数toolz,例如:

>>> import toolz
>>> dict1 = {1: {'a': 'A'}, 2: {'b': 'B'}}
>>> dict2 = {2: {'c': 'C'}, 3: {'d': 'D'}}
>>> toolz.merge_with(toolz.merge, dict1, dict2)
{1: {'a': 'A'}, 2: {'c': 'C'}, 3: {'d': 'D'}}

解决方案 17:

此版本的函数将考虑 N 个字典,并且只考虑字典——不能传递任何不适当的参数,否则会引发 TypeError。合并本身会考虑键冲突,并且不会覆盖合并链下游字典中的数据,而是会创建一组值并附加到该值中;不会丢失任何数据。

它可能不是页面上最有效的,但它是最全面的,并且当你合并 2 到 N 个字典时你不会丢失任何信息。

def merge_dicts(*dicts):
    if not reduce(lambda x, y: isinstance(y, dict) and x, dicts, True):
        raise TypeError, "Object in *dicts not of type dict"
    if len(dicts) < 2:
        raise ValueError, "Requires 2 or more dict objects"


    def merge(a, b):
        for d in set(a.keys()).union(b.keys()):
            if d in a and d in b:
                if type(a[d]) == type(b[d]):
                    if not isinstance(a[d], dict):
                        ret = list({a[d], b[d]})
                        if len(ret) == 1: ret = ret[0]
                        yield (d, sorted(ret))
                    else:
                        yield (d, dict(merge(a[d], b[d])))
                else:
                    raise TypeError, "Conflicting key:value type assignment"
            elif d in a:
                yield (d, a[d])
            elif d in b:
                yield (d, b[d])
            else:
                raise KeyError

    return reduce(lambda x, y: dict(merge(x, y)), dicts[1:], dicts[0])

print merge_dicts({1:1,2:{1:2}},{1:2,2:{3:1}},{4:4})

输出:{1:[1,2],2:{1:2,3:1},4:4}

解决方案 18:

由于 dictviews 支持集合操作,我能够大大简化 jterrace 的答案。

def merge(dict1, dict2):
    for k in dict1.keys() - dict2.keys():
        yield (k, dict1[k])

    for k in dict2.keys() - dict1.keys():
        yield (k, dict2[k])

    for k in dict1.keys() & dict2.keys():
        yield (k, dict(merge(dict1[k], dict2[k])))

任何将字典与非字典组合的尝试(从技术上讲,具有“keys”方法的对象和不具有“keys”方法的对象)都会引发 AttributeError。这包括对函数的初始调用和递归调用。这正是我想要的,所以我保留了它。您可以轻松捕获递归调用引发的 AttributeError,然后产生您想要的任何值。

解决方案 19:

下面的函数将 b 合并到 a 中。

def mergedicts(a, b):
    for key in b:
        if isinstance(a.get(key), dict) or isinstance(b.get(key), dict):
            mergedicts(a[key], b[key])
        else:
            a[key] = b[key]

解决方案 20:

当然,代码将取决于您解决合并冲突的规则。这是一个可以接受任意数量的参数并将它们递归合并到任意深度的版本,而无需使用任何对象变异。它使用以下规则来解决合并冲突:

  • 字典优先于非字典值({"foo": {...}}优先于{"foo": "bar"}

  • 后面的参数优先于前面的参数(如果按顺序合并{"a": 1}{"a", 2}{"a": 3},结果将是{"a": 3}

try:
    from collections import Mapping
except ImportError:
    Mapping = dict

def merge_dicts(*dicts):                                                            
    """                                                                             
    Return a new dictionary that is the result of merging the arguments together.   
    In case of conflicts, later arguments take precedence over earlier arguments.   
    """                                                                             
    updated = {}                                                                    
    # grab all keys                                                                 
    keys = set()                                                                    
    for d in dicts:                                                                 
        keys = keys.union(set(d))                                                   

    for key in keys:                                                                
        values = [d[key] for d in dicts if key in d]                                
        # which ones are mapping types? (aka dict)                                  
        maps = [value for value in values if isinstance(value, Mapping)]            
        if maps:                                                                    
            # if we have any mapping types, call recursively to merge them          
            updated[key] = merge_dicts(*maps)                                       
        else:                                                                       
            # otherwise, just grab the last value we have, since later arguments    
            # take precedence over earlier arguments                                
            updated[key] = values[-1]                                               
    return updated  

解决方案 21:

还有另一个细微的变化:

这是一个纯 Python3 集合的深度更新函数。它通过一次循环一层来更新嵌套字典,并调用自身来更新下一层字典的值:

def deep_update(dict_original, dict_update):
    if isinstance(dict_original, dict) and isinstance(dict_update, dict):
        output=dict(dict_original)
        keys_original=set(dict_original.keys())
        keys_update=set(dict_update.keys())
        similar_keys=keys_original.intersection(keys_update)
        similar_dict={key:deep_update(dict_original[key], dict_update[key]) for key in similar_keys}
        new_keys=keys_update.difference(keys_original)
        new_dict={key:dict_update[key] for key in new_keys}
        output.update(similar_dict)
        output.update(new_dict)
        return output
    else:
        return dict_update

一个简单的例子:

x={'a':{'b':{'c':1, 'd':1}}}
y={'a':{'b':{'d':2, 'e':2}}, 'f':2}

print(deep_update(x, y))
>>> {'a': {'b': {'c': 1, 'd': 2, 'e': 2}}, 'f': 2}

解决方案 22:

那么其他答案怎么样?!这个答案也避免了突变/副作用:

def merge(dict1, dict2):
    output = {}

    # adds keys from `dict1` if they do not exist in `dict2` and vice-versa
    intersection = {**dict2, **dict1}

    for k_intersect, v_intersect in intersection.items():
        if k_intersect not in dict1:
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = v_dict2

        elif k_intersect not in dict2:
            output[k_intersect] = v_intersect

        elif isinstance(v_intersect, dict):
            v_dict2 = dict2[k_intersect]
            output[k_intersect] = merge(v_intersect, v_dict2)

        else:
            output[k_intersect] = v_intersect

    return output

dict1 = {1:{"a":{"A"}}, 2:{"b":{"B"}}}
dict2 = {2:{"c":{"C"}}, 3:{"d":{"D"}}}
dict3 = {1:{"a":{"A"}}, 2:{"b":{"B"},"c":{"C"}}, 3:{"d":{"D"}}}

assert dict3 == merge(dict1, dict2)

解决方案 23:

返回合并而不影响输入字典。

def _merge_dicts(dictA: Dict = {}, dictB: Dict = {}) -> Dict:
    # it suffices to pass as an argument a clone of `dictA`
    return _merge_dicts_aux(dictA, dictB, copy(dictA))


def _merge_dicts_aux(dictA: Dict = {}, dictB: Dict = {}, result: Dict = {}, path: List[str] = None) -> Dict:

    # conflict path, None if none
    if path is None:
        path = []

    for key in dictB:

        # if the key doesn't exist in A, add the B element to A
        if key not in dictA:
            result[key] = dictB[key]

        else:
            # if the key value is a dict, both in A and in B, merge the dicts
            if isinstance(dictA[key], dict) and isinstance(dictB[key], dict):
                _merge_dicts_aux(dictA[key], dictB[key], result[key], path + [str(key)])

            # if the key value is the same in A and in B, ignore
            elif dictA[key] == dictB[key]:
                pass

            # if the key value differs in A and in B, raise error
            else:
                err: str = f"Conflict at {'.'.join(path + [str(key)])}"
                raise Exception(err)

    return result

受到@andrew cooke解决方案的启发

解决方案 24:

如果您同意让第二个字典中的值优先于第一个字典中的值,那么可以像下面这样轻松地完成此操作。

def merge_dict(bottom: dict, top: dict) -> dict:

    ret = {}

    for _tmp in (bottom, top):
        for k, v in _tmp.items():
            if isinstance(v, dict):
                if k not in ret:
                    ret[k] = v
                else:
                    ret[k] = merge_dict(ret[k], v)
            else:
                ret[k] = _tmp[k]
    return ret

例子:

d_bottom = {
    'A': 'bottom-A',
    'B': 'bottom-B',
    'D': {
        'DA': 'bottom-DA',
        'DB': 'bottom-DB',
        'DC': {
            "DCA": "bottom-DCA",
            "DCB": "bottom-DCB",
            "DCC": "bottom-DCC",
            "DCD": {
                "DCDA": 'bottom-DCDA',
                "DCDB": 'bottom-DCDB'
            }
        }
    }
}

d_top = {
    'A': 'top-A',
    'B': 'top-B',
    'D': {
        'DA': 'top-DA',
        'DB': 'top-DB',
        'DC': {
            'DCA': 'top-DCA',
            "DCD": {
                "DCDA": "top-DCDA"
            }
        },
        'DD': 'top-DD'
    },
    'C': {
        'CA': 'top-CA',
        'CB': {
            'CBA': 'top-CBA',
            'CBB': 'top-CBB',
            'CBC': {
                'CBCA': 'top-CBCA',
                'CBCB': {
                    'CBCBA': 'top-CBCBA'
                }
            }
        }
    }
}

print(json.dumps(merge_dict(d_bottom, d_top), indent=4))

输出:

{
    "A": "top-A",
    "B": "top-B",
    "D": {
        "DA": "top-DA",
        "DB": "top-DB",
        "DC": {
            "DCA": "top-DCA",
            "DCB": "bottom-DCB",
            "DCC": "bottom-DCC",
            "DCD": {
                "DCDA": "top-DCDA",
                "DCDB": "bottom-DCDB"
            }
        },
        "DD": "top-DD"
    },
    "C": {
        "CA": "top-CA",
        "CB": {
            "CBA": "top-CBA",
            "CBB": "top-CBB",
            "CBC": {
                "CBCA": "top-CBCA",
                "CBCB": {
                    "CBCBA": "top-CBCBA"
                }
            }
        }
    }
}

如果有两个以上的字典需要合并,则可以将它与functools.reduce()一起使用。

解决方案 25:

我有两个字典(ab),每个字典都可以包含任意数量的嵌套字典。我想以递归方式合并它们,并使b优先于a

将嵌套字典视为树,我想要的是:

  • 进行更新a,以便到每个叶子的每条路径都b将表示在a

  • a如果在相应的路径中找到叶子,则覆盖子树b

    • 保持不变的是,所有b叶节点都保持为叶。

现有的答案对我来说有点复杂,并且遗漏了一些细节。我实现了以下内容,它通过了我的数据集的单元测试。

  def merge_map(a, b):
    if not isinstance(a, dict) or not isinstance(b, dict):
      return b

    for key in b:
      a[key] = merge_map(a[key], b[key]) if key in a else b[key]
    return a

示例(已格式化以便清晰):

 a = {
    1 : {'a': 'red', 
         'b': {'blue': 'fish', 'yellow': 'bear' },
         'c': { 'orange': 'dog'},
    },
    2 : {'d': 'green'},
    3: 'e'
  }

  b = {
    1 : {'b': 'white'},
    2 : {'d': 'black'},
    3: 'e'
  }


  >>> merge_map(a, b)
  {1: {'a': 'red', 
       'b': 'white',
       'c': {'orange': 'dog'},},
   2: {'d': 'black'},
   3: 'e'}

b需要维护的路径包括:

  • 1 -> 'b' -> 'white'

  • 2 -> 'd' -> 'black'

  • 3 -> 'e'

a具有以下独特且不冲突的路径:

  • 1 -> 'a' -> 'red'

  • 1 -> 'c' -> 'orange' -> 'dog'

所以它们仍然出现在合并的地图中。

解决方案 26:

这是我提出的一个解决方案,它以递归方式合并字典。传递给函数的第一个字典是主字典 - 其中的值将覆盖第二个字典中相同键的值。

def merge(dict1: dict, dict2: dict) -> dict:
    merged = dict1

    for key in dict2:
        if type(dict2[key]) == dict:
            merged[key] = merge(dict1[key] if key in dict1 else {}, dict2[key])
        else:
            if key not in dict1.keys():
                merged[key] = dict2[key]

    return merged

解决方案 27:

这是一个递归解决方案,类似于上面写的解决方案,但它不会引发任何异常,只是合并两个字典。

def merge(ans: dict,  a:dict ):
keys = a.keys()
for key in keys:
    if key in ans:
        ans[key] = merge(ans[key], a[key])
        return ans
    else:
        ans[key] = a[key]
return ans

解决方案 28:

我一直在测试您的解决方案,并决定在我的项目中使用这个:

def mergedicts(dict1, dict2, conflict, no_conflict):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            yield (k, conflict(dict1[k], dict2[k]))
        elif k in dict1:
            yield (k, no_conflict(dict1[k]))
        else:
            yield (k, no_conflict(dict2[k]))

dict1 = {1:{"a":"A"}, 2:{"b":"B"}}
dict2 = {2:{"c":"C"}, 3:{"d":"D"}}

#this helper function allows for recursion and the use of reduce
def f2(x, y):
    return dict(mergedicts(x, y, f2, lambda x: x))

print dict(mergedicts(dict1, dict2, f2, lambda x: x))
print dict(reduce(f2, [dict1, dict2]))

将函数作为参数传递是扩展 jterrace 解决方案使其像所有其他递归解决方案一样运行的关键。

解决方案 29:

我能想到的最简单的方法是:

#!/usr/bin/python

from copy import deepcopy
def dict_merge(a, b):
    if not isinstance(b, dict):
        return b
    result = deepcopy(a)
    for k, v in b.iteritems():
        if k in result and isinstance(result[k], dict):
                result[k] = dict_merge(result[k], v)
        else:
            result[k] = deepcopy(v)
    return result

a = {1:{"a":'A'}, 2:{"b":'B'}}
b = {2:{"c":'C'}, 3:{"d":'D'}}

print dict_merge(a,b)

输出:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

解决方案 30:

我这里有另一个稍微不同的解决方案:

def deepMerge(d1, d2, inconflict = lambda v1,v2 : v2) :
''' merge d2 into d1. using inconflict function to resolve the leaf conflicts '''
    for k in d2:
        if k in d1 : 
            if isinstance(d1[k], dict) and isinstance(d2[k], dict) :
                deepMerge(d1[k], d2[k], inconflict)
            elif d1[k] != d2[k] :
                d1[k] = inconflict(d1[k], d2[k])
        else :
            d1[k] = d2[k]
    return d1

默认情况下,它会根据第二个字典中的值来解决冲突,但您可以轻松地覆盖它,使用一些巫术您甚至可以抛出异常。:)。

相关推荐
  为什么项目管理通常仍然耗时且低效?您是否还在反复更新电子表格、淹没在便利贴中并参加每周更新会议?这确实是耗费时间和精力。借助软件工具的帮助,您可以一目了然地全面了解您的项目。如今,国内外有足够多优秀的项目管理软件可以帮助您掌控每个项目。什么是项目管理软件?项目管理软件是广泛行业用于项目规划、资源分配和调度的软件。它使项...
项目管理软件   997  
  在项目管理领域,CDCP(Certified Data Center Professional)认证评审是一个至关重要的环节,它不仅验证了项目团队的专业能力,还直接关系到项目的成功与否。在这一评审过程中,沟通技巧的运用至关重要。有效的沟通不仅能够确保信息的准确传递,还能增强团队协作,提升评审效率。本文将深入探讨CDCP...
华为IPD流程   34  
  IPD(Integrated Product Development,集成产品开发)是一种以客户需求为核心、跨部门协同的产品开发模式,旨在通过高效的资源整合和流程优化,提升产品开发的成功率和市场竞争力。在IPD培训课程中,掌握关键成功因素是确保团队能够有效实施这一模式的核心。以下将从五个关键成功因素展开讨论,帮助企业和...
IPD项目流程图   40  
  华为IPD(Integrated Product Development,集成产品开发)流程是华为公司在其全球化进程中逐步构建和完善的一套高效产品开发管理体系。这一流程不仅帮助华为在技术创新和产品交付上实现了质的飞跃,还为其在全球市场中赢得了显著的竞争优势。IPD的核心在于通过跨部门协作、阶段性评审和市场需求驱动,确保...
华为IPD   39  
  华为作为全球领先的通信技术解决方案提供商,其成功的背后离不开一套成熟的管理体系——集成产品开发(IPD)。IPD不仅是一种产品开发流程,更是一种系统化的管理思想,它通过跨职能团队的协作、阶段评审机制和市场需求驱动的开发模式,帮助华为在全球市场中脱颖而出。从最初的国内市场到如今的全球化布局,华为的IPD体系在多个领域展现...
IPD管理流程   71  
热门文章
项目管理软件有哪些?
云禅道AD
禅道项目管理软件

云端的项目管理软件

尊享禅道项目软件收费版功能

无需维护,随时随地协同办公

内置subversion和git源码管理

每天备份,随时转为私有部署

免费试用