[Python] The collections Module

Introduction

This module implements specialized container datatypes that supplement Python's general-purpose built-in containers dict, list, set, and tuple.

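Class	Purpose
namedtuple()	Factory function for creating tuple subclasses with named fields.
deque	List-like container with fast appends and pops on either end.
ChainMap	Dict-like class for creating a single view of multiple mappings.
Counter	Dict subclass for counting hashable objects.
OrderedDict	Dict subclass that remembers the order in which entries were added.
defaultdict	Dict subclass that calls a user-supplied factory function to provide values for missing keys.
UserDict	Wrapper around dictionary objects for easier dict subclassing.
UserList	Wrapper around list objects for easier list subclassing.
UserString	Wrapper around string objects for easier string subclassing.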
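
The table lists several containers that the rest of this article does not walk through. As a quick, minimal sketch (the names Point, recent, and letter_counts are made up for illustration), three of them look like this in use:

```python
from collections import Counter, deque, namedtuple

# namedtuple: a tuple subclass whose fields can be read by name
Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)
print(p.x + p.y)  # 3

# deque: cheap appends and pops at both ends
recent = deque([1, 2, 3])
recent.appendleft(0)  # add on the left
recent.pop()          # remove from the right
print(list(recent))   # [0, 1, 2]

# Counter: a dict subclass that counts hashable items
letter_counts = Counter("abracadabra")
print(letter_counts.most_common(1))  # [('a', 5)]
```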

ChainMap

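A ChainMap quickly links several mappings together so that they can be treated as a single unit. This is often much faster than creating a new dictionary and calling update() repeatedly.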

class collections.ChainMap(*maps)

Basic usage

from collections import ChainMap

dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}

cm = ChainMap(dict1, dict2)
print(cm)  # ChainMap({'a': 1, 'b': 2}, {'b': 3, 'c': 4})

Key lookup rules in ChainMap

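print(cm['a'])  # 1  (from dict1)
print(cm['b'])  # 2  (dict1's 'b' shadows dict2's 'b')
print(cm['c'])  # 4  (from dict2)

ChainMap searches the mappings from left to right, that is, from the first dictionary to the last, and returns the first match.
The key 'b' exists in both dict1 and dict2, but the value in dict1 takes priority.

Looking up a key that is in none of the mappings raises KeyError: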

print(cm['d'])  # KeyError: 'd'
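
When a missing key should not raise, the usual mapping helpers also work on a ChainMap. A small sketch, rebuilding the same two dictionaries inline so that it runs on its own:

```python
from collections import ChainMap

cm = ChainMap({'a': 1, 'b': 2}, {'b': 3, 'c': 4})

print('d' in cm)       # False
print(cm.get('d'))     # None
print(cm.get('d', 0))  # 0
print(sorted(cm))      # ['a', 'b', 'c']  (each key is listed once)
```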

Modification rules in ChainMap

Writes through a ChainMap affect only the first dictionary:

cm['b'] = 99
print(dict1)  # {'a': 1, 'b': 99}
print(dict2)  # {'b': 3, 'c': 4}

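Assigning to cm['b'] changes only dict1; dict2 is left untouched.
Writes, updates, and deletions made through a ChainMap operate on the first mapping only.
New key-value pairs are also added to the first dictionary.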
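
The rules above can be checked with a short sketch. It uses its own illustrative names (first, second, chain) so that it does not disturb dict1 and dict2 from the running example:

```python
from collections import ChainMap

first = {'a': 1, 'b': 2}
second = {'b': 3, 'c': 4}
chain = ChainMap(first, second)

# A new key is stored in the first mapping only
chain['d'] = 100
print(first)   # {'a': 1, 'b': 2, 'd': 100}
print(second)  # {'b': 3, 'c': 4}

# Assigning to a key that exists only in `second` shadows it
# instead of changing `second`
chain['c'] = 0
print(first['c'], second['c'])  # 0 4

# Deleting also looks at the first mapping only
del chain['b']     # removes first's 'b'; second's value becomes visible
print(chain['b'])  # 3
try:
    del chain['b']  # 'b' is no longer in `first`, so this fails
except KeyError:
    print("'b' is not in the first mapping, so it cannot be deleted")
```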

Dynamic behavior of ChainMap

A ChainMap stores references to the underlying dictionaries, so changes made to them show up immediately:

dict1['a'] = 42
print(cm['a'])  # 42  (dict1 changed, so the ChainMap reflects it)

ChainMap methods and attributes

maps: access all the underlying mappings

print(cm.maps)  # [{'a': 42, 'b': 99}, {'b': 3, 'c': 4}]

new_child(): create a new ChainMap

cm2 = cm.new_child({'e': 5})
print(cm2)  # ChainMap({'e': 5}, {'a': 42, 'b': 99}, {'b': 3, 'c': 4})

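Writes to cm2, such as adding or changing 'e', go only to the new dictionary at the front; cm is unaffected.
This is useful for managing nested scopes, similar to how the Python interpreter resolves variable names.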
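
To make the scope analogy concrete, here is a minimal sketch that pushes an empty child in front of an outer scope and then uses the parents property to look past it. The names global_scope, outer, and inner are illustrative only:

```python
from collections import ChainMap

global_scope = {'x': 1}        # hypothetical outer scope
outer = ChainMap(global_scope)
inner = outer.new_child()      # push an empty "local" scope in front

inner['x'] = 2                 # the write lands in the new child only
print(inner['x'])              # 2
print(outer['x'])              # 1
print(inner.parents['x'])      # 1  (parents skips the first mapping)
```

Discarding inner, or working with inner.parents, is the equivalent of leaving the nested scope.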

ChainMap use cases

Configuration management

Applications usually ship with default settings while also letting users supply their own overrides:

default_config = {'theme': 'light', 'font': 'Arial', 'timeout': 30}
user_config = {'theme': 'dark', 'timeout': 60}

config = ChainMap(user_config, default_config)

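print(config['theme'])  # dark   (the user setting overrides the default)
print(config['font'])   # Arial  (not set by the user, so the default is used)

This way the user configuration is consulted first, and the defaults are used for anything the user did not set.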

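Variable scopes

When the Python interpreter resolves a name, it searches the local, enclosing, global, and built-in scopes in that order, which resembles a ChainMap lookup: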

global_scope = {'x': 10, 'y': 20}
local_scope = {'y': 5, 'z': 30}

env = ChainMap(local_scope, global_scope)
print(env['x'])  # 10  (from the global scope)
print(env['y'])  # 5   (the local scope takes priority)
print(env['z'])  # 30  (from the local scope)

Parsing command-line arguments

import argparse

# Define the command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument('--timeout', type=int)
parser.add_argument('--theme', type=str)

args = parser.parse_args([])  # simulate passing no arguments
cli_args = {k: v for k, v in vars(args).items() if v is not None}

# Default settings
default_settings = {'theme': 'light', 'timeout': 30}
config = ChainMap(cli_args, default_settings)

print(config['theme'])  # light
print(config['timeout'])  # 30
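
For contrast, here is the same setup when an argument is actually supplied. This is a sketch that simulates the command line by passing a list to parse_args(); the option names are the ones defined above:

```python
import argparse
from collections import ChainMap

parser = argparse.ArgumentParser()
parser.add_argument('--timeout', type=int)
parser.add_argument('--theme', type=str)

# Simulate running the program with: --theme dark
args = parser.parse_args(['--theme', 'dark'])
# Drop options that were not given so they do not shadow the defaults
cli_args = {k: v for k, v in vars(args).items() if v is not None}

default_settings = {'theme': 'light', 'timeout': 30}
config = ChainMap(cli_args, default_settings)

print(config['theme'])    # dark  (from the command line)
print(config['timeout'])  # 30    (falls back to the default)
```

Filtering out the None values matters: argparse stores None for every option that was not passed, and a None in the first mapping would otherwise hide the default.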

defaultdict

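defaultdict is a dict subclass in the collections module. It behaves like a regular dict with one key difference: when a missing key is accessed with d[key], defaultdict does not raise KeyError. Instead it calls a factory function, stores the result under that key, and returns it.

Basic defaultdict usage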

from collections import defaultdict

# Create a defaultdict whose default values come from list
d = defaultdict(list)

# Accessing a missing key returns a new empty list
print(d["key"])  # Output: []

Compare with a regular dictionary:

d = {}
print(d["key"])  # raises KeyError: 'key'
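
One detail worth seeing in code: the default is created only by indexing with d[key], and the new value is then stored in the dictionary. Membership tests and get() do not trigger the factory. A minimal sketch:

```python
from collections import defaultdict

d = defaultdict(list)

print("key" in d)    # False  (membership tests do not create anything)
print(d.get("key"))  # None   (get() bypasses the factory as well)
print(len(d))        # 0

d["key"]             # plain indexing calls list() and stores the result
print(len(d))        # 1
print(dict(d))       # {'key': []}
```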

defaultdict parameters

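defaultdict(default_factory=None, /[, ...])

default_factory: a callable that produces the default value (for example list, int, set, or a lambda). If it is None, missing keys raise KeyError as in a regular dict.
The remaining arguments are the same as for dict.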
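
A short sketch of both parameters in use. The stock data and the names stock and strict are invented for illustration:

```python
from collections import defaultdict

# Extra arguments are handled like dict(): initial data can be passed in
stock = defaultdict(lambda: "unknown", {"apple": "in stock"}, pear="sold out")

print(stock["apple"])  # in stock
print(stock["pear"])   # sold out
print(stock["kiwi"])   # unknown  (produced by the lambda)

# With default_factory set to None, missing keys raise KeyError again
strict = defaultdict(None, {"a": 1})
try:
    strict["b"]
except KeyError:
    print("KeyError")  # KeyError
```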

Common default factories

Default value from int (useful for counters)

d = defaultdict(int)

d["apple"] += 1
d["banana"] += 1

print(d)  # defaultdict(<class 'int'>, {'apple': 1, 'banana': 1})
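
The same pattern scales to counting items in a loop. The sketch below counts words in a made-up sentence and cross-checks the result against Counter, which the collections module provides for exactly this job:

```python
from collections import Counter, defaultdict

words = "the quick brown fox jumps over the lazy dog the end".split()

counts = defaultdict(int)
for word in words:
    counts[word] += 1  # missing keys start at int() == 0

print(counts["the"])                         # 3
print(dict(counts) == dict(Counter(words)))  # True
```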

Default value from list (useful for grouping)

d = defaultdict(list)

d["fruits"].append("apple")
d["fruits"].append("banana")
d["vegetables"].append("carrot")

print(d)
# defaultdict(<class 'list'>, {'fruits': ['apple', 'banana'], 'vegetables': ['carrot']})
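
In practice the items to group usually arrive as a sequence. A minimal sketch, assuming hypothetical (category, item) pairs as input:

```python
from collections import defaultdict

# Hypothetical input: (category, item) pairs in arbitrary order
pairs = [("fruits", "apple"), ("vegetables", "carrot"), ("fruits", "banana")]

groups = defaultdict(list)
for category, item in pairs:
    groups[category].append(item)  # no need to check whether the key exists

print(dict(groups))
# {'fruits': ['apple', 'banana'], 'vegetables': ['carrot']}
```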

Default value from set (grouping without duplicates)

d = defaultdict(set)

d["fruits"].add("apple")
d["fruits"].add("banana")
d["fruits"].add("apple")  # duplicate; a set stores each element once

print(d)
# defaultdict(<class 'set'>, {'fruits': {'apple', 'banana'}})  (set element order may vary)

OrderedDict

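OrderedDict is the ordered dictionary in the collections module. It is a dict subclass that remembers the order in which key-value pairs were inserted. Since Python 3.7 the built-in dict also preserves insertion order, but OrderedDict still offers some extra features.

Creating an OrderedDict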

from collections import OrderedDict

# Regular dict (keeps insertion order on Python 3.7+)
d1 = {"a": 1, "b": 2, "c": 3}
print(d1)  # {'a': 1, 'b': 2, 'c': 3}

# OrderedDict (explicit)
d2 = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
print(d2)  # OrderedDict([('a', 1), ('b', 2), ('c', 3)])  (Python 3.12+ prints OrderedDict({'a': 1, 'b': 2, 'c': 3}))

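Notes:

Python 3.5 and earlier: a regular dict does not keep insertion order, but OrderedDict does. In CPython 3.6, dict keeps insertion order only as an implementation detail.
Python 3.7+: the language guarantees that a regular dict keeps insertion order, but OrderedDict still provides extra methods such as move_to_end() and popitem(last=False).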
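
Besides the extra methods, one behavioral difference remains on current Python versions: comparing two OrderedDict objects takes order into account, while comparing plain dicts does not. A small sketch:

```python
from collections import OrderedDict

a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}
print(a == b)    # True   (plain dicts ignore order when comparing)

oa = OrderedDict(a)
ob = OrderedDict(b)
print(oa == ob)  # False  (two OrderedDicts compare order-sensitively)
print(oa == b)   # True   (OrderedDict vs plain dict ignores order)
```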

Key features of OrderedDict

Iteration follows insertion order

d = OrderedDict()
d["x"] = 10
d["y"] = 20
d["z"] = 30

for k, v in d.items():
    print(k, v)

# Output:
# x 10
# y 20
# z 30  (insertion order is preserved)

move_to_end() reorders entries

move_to_end(key, last=True) moves the given key to the end, or to the beginning when last=False:

d = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

d.move_to_end("a")  # move 'a' to the end
print(d)  # OrderedDict([('b', 2), ('c', 3), ('a', 1)])

d.move_to_end("c", last=False)  # move 'c' to the beginning
print(d)  # OrderedDict([('c', 3), ('b', 2), ('a', 1)])

popitem() removes entries in LIFO or FIFO order

popitem(last=True):

last=True (default): last in, first out (LIFO)
last=False: first in, first out (FIFO)

d = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

print(d.popitem())            # ('c', 3)  default, LIFO
print(d.popitem(last=False))  # ('a', 1)  FIFO
print(d)  # OrderedDict([('b', 2)])

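OrderedDict use cases

Because OrderedDict tracks order and can move or pop entries at either end, it can be used to implement an LRU (least recently used) cache: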

class LRUCache:
    def __init__(self, capacity: int):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # mark as most recently used by moving it to the end
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry (at the front)

# Example
cache = LRUCache(2)
cache.put(1, "A")
cache.put(2, "B")
print(cache.cache)  # OrderedDict([(1, 'A'), (2, 'B')])

cache.get(1)  # access 1 so that it becomes the most recently used
cache.put(3, "C")  # evicts 2, the least recently used key
print(cache.cache)  # OrderedDict([(1, 'A'), (3, 'C')])