数据结构与内置类型

list 底层实现

Python list 是动态数组（类似 C++ vector）：

连续内存块存储对象指针
预分配额外空间（避免频繁扩容）
扩容策略：新容量约为当前的 1.125x + 6

import sys
 
lst = []
prev_size = sys.getsizeof(lst)
for i in range(20):
    lst.append(i)
    curr_size = sys.getsizeof(lst)
    if curr_size != prev_size:
        print(f"len={len(lst)}, size={curr_size}")
        prev_size = curr_size
# 可观察到扩容发生的时机：4, 8, 16, 25...

list 复杂度

操作	复杂度	说明
`append(x)`	O(1) 均摊	偶尔扩容 O(n)
`pop()`	O(1)	末尾删除
`pop(0)`	O(n)	移动所有元素！
`insert(i, x)`	O(n)	移动后续元素
`x in lst`	O(n)	线性搜索
`lst[i]`	O(1)	随机访问
`sort()`	O(n log n)	Timsort

需要频繁头部操作？用 collections.deque！

from collections import deque
dq = deque([1, 2, 3])
dq.appendleft(0)   # O(1)
dq.popleft()       # O(1)

dict 底层实现

Python dict 是哈希表（开放寻址法）：

Python 3.7+ 保证插入顺序（有序）
负载因子超过 2/3 时扩容（容量翻倍）
哈希冲突用探测序列解决

d = {}
d["key"] = "value"
# 内部过程：
# 1. hash("key") → 哈希值
# 2. 哈希值 % 表大小 → 槽位索引
# 3. 若槽位空则插入；若冲突则探测下一个槽位

dict 复杂度

操作	平均	最坏
`d[key]`	O(1)	O(n)
`d[key] = val`	O(1)	O(n)
`key in d`	O(1)	O(n)
`del d[key]`	O(1)	O(n)

最坏情况（O(n））极少发生，需大量哈希冲突才会触发。

常用 dict 技巧

from collections import defaultdict, Counter, OrderedDict
 
# defaultdict：访问不存在的 key 时自动创建默认值
graph = defaultdict(list)
graph["A"].append("B")  # 不需要先判断 key 是否存在
 
# Counter：计数器
from collections import Counter
words = ["apple", "banana", "apple", "cherry"]
count = Counter(words)
print(count.most_common(2))  # [('apple', 2), ('banana', 1)]
 
# dict 合并（Python 3.9+）
a = {"x": 1}
b = {"y": 2}
merged = a | b  # {'x': 1, 'y': 2}

set 底层实现

set 也是哈希表，只存 key 不存 value：

s = {1, 2, 3, 4}
print(2 in s)    # O(1)
 
# 集合运算
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
print(a & b)     # {3, 4}  交集
print(a | b)     # {1, 2, 3, 4, 5, 6}  并集
print(a - b)     # {1, 2}  差集
print(a ^ b)     # {1, 2, 5, 6}  对称差

内置排序：Timsort

Python 的 sort() 和 sorted() 使用 Timsort（归并 + 插入排序）：

时间复杂度：O(n log n)
空间复杂度：O(n)
稳定排序（相等元素保持原始顺序）

# key 参数
students = [("Alice", 90), ("Bob", 85), ("Charlie", 90)]
students.sort(key=lambda x: (-x[1], x[0]))  # 按分数降序，同分按姓名升序
# [('Alice', 90), ('Charlie', 90), ('Bob', 85)]
 
# operator.attrgetter / itemgetter 更高效
from operator import itemgetter
students.sort(key=itemgetter(1), reverse=True)

collections 模块

from collections import namedtuple, deque, defaultdict, Counter, OrderedDict
 
# namedtuple：轻量级对象，有字段名的 tuple
Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)
print(p.x, p.y)   # 1 2（可读性强，内存同 tuple）
 
# deque：双端队列，O(1) 两端操作
dq = deque(maxlen=3)  # 固定大小，满了自动丢弃旧元素
dq.extend([1, 2, 3])
dq.append(4)
print(dq)  # deque([2, 3, 4], maxlen=3)

heapq（优先队列）

Python 的 heapq 是最小堆：

import heapq
 
# 基本操作
heap = []
heapq.heappush(heap, 3)
heapq.heappush(heap, 1)
heapq.heappush(heap, 2)
print(heapq.heappop(heap))  # 1（最小值）
 
# 最大堆：存负数
max_heap = []
heapq.heappush(max_heap, -3)
heapq.heappush(max_heap, -1)
print(-heapq.heappop(max_heap))  # 3
 
# 实用函数
nums = [3, 1, 4, 1, 5, 9, 2, 6]
print(heapq.nlargest(3, nums))   # [9, 6, 5]
print(heapq.nsmallest(3, nums))  # [1, 1, 2]

面试高频题

Q: 为什么 list 不能作为 dict 的 key？

dict 的 key 必须是可哈希的（hashable），要求：

实现 __hash__
实现 __eq__
两个相等的对象必须有相同的哈希值

list 是可变的，如果修改后哈希值变了，dict 就找不到了。tuple 可以（如果元素也都可哈希）。

Q: list vs tuple 如何选择？

tuple：数据不应被修改（坐标、RGB值）；可作 dict key；解包更清晰
list：需要增删改；同质元素集合

Q: 如何找 list 中第 k 大的元素（不排序）？

import heapq
 
def kth_largest(nums, k):
    # O(n log k) 时间，O(k) 空间
    return heapq.nlargest(k, nums)[-1]
 
# 或者用 partition（快速选择，O(n) 平均）

Notes

Explorer

05-数据结构与内置类型