[Python] Python Regular Expressions: The re Module

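Introduction

A regular expression (regex) is a powerful text-processing tool used mainly to match, search, replace, and split strings. Python supports regular expressions through the re module in its standard library.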

Importing the re Module

To use regular expressions in Python, first import the re module:

import re

Basic Regular Expression Syntax

Regular expressions use special characters, called metacharacters, to define a matching pattern.

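Symbol	Description	Example
`.`	Matches any single character except a newline	`a.c` matches `abc`, `a+c`
`^`	Matches the start of the string	`^Hello` matches `"Hello world"`
`$`	Matches the end of the string (or just before a trailing newline)	`end$` matches `"the end"`
`*`	Matches the preceding element 0 or more times	`ab*c` matches `"ac"`, `"abc"`
`+`	Matches the preceding element 1 or more times	`ab+c` matches `"abc"`, `"abbc"`
`?`	Matches the preceding element 0 or 1 times	`ab?c` matches `"ac"`, `"abc"`
`{n}`	Matches the preceding element exactly n times	`a{3}` matches `"aaa"`
`{n,}`	Matches the preceding element at least n times	`a{2,}` matches `"aa"`, `"aaa"`
`{n,m}`	Matches the preceding element between n and m times	`a{2,4}` matches `"aa"`, `"aaa"`, `"aaaa"`
`[]`	Matches any one character in the character class	`[aeiou]` matches any lowercase vowel
`\d`	Matches a digit (0-9; for str patterns, other Unicode decimal digits too unless `re.ASCII` is set)	`\d+` matches `"123"`, `"456"`
`\D`	Matches a non-digit character	`\D+` matches `"abc"`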
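
The table rows can be checked directly in an interpreter. The sketch below is only an illustration: the list name cases and the sample strings are chosen here, not taken from the table. It uses re.fullmatch so that a pattern counts as matching only when it covers the entire string, which makes the effect of each quantifier visible.

```python
import re

# (pattern, candidate) pairs; the candidates are illustrative choices.
cases = [
    (r"a.c", "a+c"),       # . matches any single character except a newline
    (r"a.c", "a\nc"),      # so a newline in the middle does not match
    (r"ab*c", "ac"),       # * allows zero occurrences of b
    (r"ab+c", "ac"),       # + requires at least one b
    (r"ab?c", "abbc"),     # ? allows at most one b
    (r"a{2,4}", "aaaaa"),  # five a's exceed the upper bound of 4
    (r"[aeiou]", "e"),     # a character class matches one listed character
]

# fullmatch returns a match object or None; bool() turns that into True/False.
results = [bool(re.fullmatch(pattern, s)) for pattern, s in cases]

for (pattern, s), ok in zip(cases, results):
    print(f"{pattern!r:12} {s!r:10} {ok}")
```

With re.search instead of re.fullmatch, several of the failing cases would succeed, because search is satisfied by a matching substring.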

Common Predefined Character Classes

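Symbol	Purpose	Example
`\d`	Matches a digit (0-9; for str patterns, other Unicode decimal digits too unless `re.ASCII` is set)	`\d+` matches `"123"`, `"456"`
`\D`	Matches a non-digit character	`\D+` matches `"abc"`, `"hello"`
`\w`	Matches a word character (letters, digits, underscore; Unicode-aware for str patterns)	`\w+` matches `"hello_123"`
`\W`	Matches a non-word character	`\W+` matches `",.!@"`
`\s`	Matches a whitespace character (space, tab `\t`, newline `\n`, and similar)	`\s+` matches `" "`
`\S`	Matches a non-whitespace character	`\S+` matches `"hello"`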
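
One detail worth seeing in practice: with str patterns in Python 3, \d and \w are Unicode-aware by default, and the re.ASCII flag restricts them to the ASCII ranges. The following is a small sketch; the sample text is made up for illustration and uses escape sequences so the full-width digits and the accented letter are unambiguous.

```python
import re

# Illustrative text: "id_42", full-width digits U+FF14 U+FF12, and "café".
text = "id_42 \uff14\uff12 caf\u00e9"

unicode_digits = re.findall(r"\d+", text)                  # default: Unicode digits
ascii_digits = re.findall(r"\d+", text, flags=re.ASCII)    # only 0-9
unicode_words = re.findall(r"\w+", text)                   # default: Unicode letters
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)     # only a-z, A-Z, 0-9, _

print(unicode_digits)
print(ascii_digits)
print(unicode_words)
print(ascii_words)
```

If input is expected to contain only ASCII digits, passing re.ASCII or writing [0-9] explicitly avoids accepting digits from other scripts.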

Common Functions in the re Module

re.match() - Match at the start of the string

import re

pattern = r"hello"
text = "hello world"
match = re.match(pattern, text)

if match:
    print("Match found:", match.group())  # Output: hello
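
A common source of confusion is that re.match() only tries the pattern at index 0 and returns None otherwise. The sketch below contrasts it with re.fullmatch(), which additionally requires the pattern to cover the whole string; the variable names are illustrative.

```python
import re

text = "hello world"

at_start = re.match(r"hello", text)      # pattern occurs at index 0
not_at_start = re.match(r"world", text)  # "world" is not at index 0, so None
whole = re.fullmatch(r"hello", text)     # pattern does not cover the whole string, so None

print(at_start.span())   # (start, end) indices of the match
print(not_at_start)
print(whole)
```

Because a failed match returns None, calling .group() on the result without checking it first raises AttributeError, which is why the example above guards with an if statement.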

re.search() - Search the whole string for the first match

import re

pattern = r"world"
text = "hello world"
search_result = re.search(pattern, text)

if search_result:
    print("Match found:", search_result.group())  # Output: world
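
The match object returned by re.search() carries more than the matched text. As a minimal sketch, assuming a made-up sentence containing a date, the example below uses named groups to pull out the parts of the match and span() to locate it.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "released on 2024-03-15"

m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text)

if m:
    full = m.group(0)        # the entire matched substring
    year = m.group("year")   # a single named group
    parts = m.groups()       # all groups as a tuple
    where = m.span()         # (start, end) indices in text
    print(full, year, parts, where)
```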

re.findall() - Return a list of all matches

import re

pattern = r"\d+"
text = "Order number 12345, amount 67890"
matches = re.findall(pattern, text)
print(matches)  # Output: ['12345', '67890']
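
The shape of the list returned by re.findall() depends on capturing groups in the pattern: with two or more groups it returns tuples of the group contents rather than the whole match. The sketch below shows this with an illustrative key=value string that is not part of the original example.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "width=640, height=480"

# Two capturing groups: each result is a (key, value) tuple.
pairs = re.findall(r"(\w+)=(\d+)", text)

# Non-capturing groups (?:...): each result is the whole matched string.
whole = re.findall(r"(?:\w+)=(?:\d+)", text)

print(pairs)
print(whole)
```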

re.finditer() - Return an iterator over all matches

import re

pattern = r"\d+"
text = "Order number 12345, amount 67890"
matches = re.finditer(pattern, text)

for match in matches:
    print("Match:", match.group())
# Output:
# Match: 12345
# Match: 67890
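
Unlike re.findall(), re.finditer() yields match objects, so each result also knows where it was found. A minimal sketch, using an English sample string chosen for this illustration:

```python
import re

text = "Order number 12345, amount 67890"

# Collect the matched text together with its start and end indices.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

for value, start, end in positions:
    print(f"{value} found at [{start}:{end}]")
```

Since the result is a lazy iterator, it also avoids building a full list when only the first few matches of a large text are needed.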

re.sub() - Replace matches

import re

pattern = r"\d+"
text = "Price: 100 yuan"
new_text = re.sub(pattern, "200", text)
print(new_text)  # Output: Price: 200 yuan
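
Beyond a fixed replacement string, re.sub() also accepts backreferences such as \1, a callable that computes each replacement, and a count limit. The sketch below illustrates these under assumed sample inputs; the helper name double is hypothetical.

```python
import re

# Backreferences: rearrange "YYYY-MM-DD" into "DD/MM/YYYY".
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2024-03-15")

# Callable replacement: the function receives each match object
# and returns the replacement text. Here it doubles every number.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "Price: 100 yuan, shipping: 15 yuan")

# count limits how many matches are replaced.
first_only = re.sub(r"\d+", "N", "1 2 3", count=1)

# re.subn returns the new string and the number of replacements made.
replaced, n = re.subn(r"\d+", "N", "1 2 3")

print(swapped, doubled, first_only, replaced, n, sep=" | ")
```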

re.split() - Split a string on matches

import re

pattern = r"\s+"
text = "Hello   world Python"
split_text = re.split(pattern, text)
print(split_text)  # Output: ['Hello', 'world', 'Python']
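
A few behaviours of re.split() are easy to miss: maxsplit caps the number of splits, a capturing group in the pattern keeps the delimiters in the result, and a delimiter at the edge of the string produces empty strings. The sample inputs below are illustrative.

```python
import re

# maxsplit=1: split only at the first run of whitespace.
limited = re.split(r"\s+", "Hello   world Python", maxsplit=1)

# A capturing group around the delimiter keeps it in the output list.
with_delims = re.split(r"\s*([,;])\s*", "a, b;c")

# Leading and trailing delimiters yield empty strings at the ends.
padded = re.split(r"\s+", "  padded  ")

print(limited)
print(with_delims)
print(padded)
```

For plain whitespace splitting, str.split() with no arguments already discards the empty edge strings, so the regex version is mostly useful when the delimiter is more complex.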

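Improving Performance with re.compile()

re.compile() compiles a regular expression into a pattern object once so that it can be reused, which suits code that applies the same pattern many times. The re module also caches recently used patterns internally, so the speed gain is often modest; the clearer benefit is a named, reusable pattern object.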

import re

pattern = re.compile(r"\d+")
text = "Order number 12345, amount 67890"

# Reuse the compiled pattern
print(pattern.findall(text))  # Output: ['12345', '67890']
print(pattern.sub("X", text))  # Output: Order number X, amount X
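
A compiled pattern also stores its flags, so they do not have to be repeated on every call. The following sketch filters some hypothetical log lines with a case-insensitive pattern; the names lines and error_re and the log contents are made up for illustration.

```python
import re

# Hypothetical log lines, used only for illustration.
lines = ["ERROR disk full", "info started", "Error timeout", "warning low memory"]

# Compile once with the IGNORECASE flag; the flag travels with the pattern object.
error_re = re.compile(r"^error\b", re.IGNORECASE)

# Reuse the same pattern object for every line.
errors = [line for line in lines if error_re.search(line)]

print(errors)
print(error_re.pattern)  # the original pattern string is kept on the object
```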

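Greedy vs. Non-Greedy Matching

By default, quantifiers such as * and + are greedy: they match as many characters as possible. Appending ? to a quantifier (as in *? or +?) makes it non-greedy, so it matches as few characters as possible.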

import re

text = '<div>Hello</div><div>World</div>'

# Greedy matching (matches the longest possible span)
print(re.findall(r'<div>.*</div>', text))
# Output: ['<div>Hello</div><div>World</div>']

# Non-greedy matching (matches the shortest possible span)
print(re.findall(r'<div>.*?</div>', text))
# Output: ['<div>Hello</div>', '<div>World</div>']
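
Non-greedy matching combines naturally with a capturing group to extract only the inner text. One caveat is that . does not match a newline unless re.DOTALL is set, so content spanning several lines is silently skipped by default. A small sketch, assuming a sample string with a line break inside the second element:

```python
import re

# Illustrative text: the second <div> contains a newline.
text = "<div>Hello</div><div>multi\nline</div>"

# Default: . stops at the newline, so the second element is not matched.
inner_default = re.findall(r"<div>(.*?)</div>", text)

# re.DOTALL lets . match newlines as well.
inner_dotall = re.findall(r"<div>(.*?)</div>", text, flags=re.DOTALL)

print(inner_default)
print(inner_dotall)
```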