Syntax (9): Regular Expressions in Python · My Python Handbook

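# [Regular Expressions in Python][1]

[1]: http://www.runoob.com/python/python-reg-expressions.html

[TOC]

Python provides the `re` module, which contains all of the regular expression functionality.

```python
import re

s = 'ABC\\-001'  # a Python string literal
# The corresponding regex string is 'ABC\-001', because Python strings also use \ for escaping.
# Using Python's r prefix is strongly recommended, so you don't have to think about escaping.
s = r'ABC\-001'  # 'ABC\-001'

# match() checks whether the pattern matches; on success it returns a Match object, otherwise None.
re.match(r'^\d{3}\-\d{3,8}$', '010-12345')
# <re.Match object; span=(0, 9), match='010-12345'>  (Python < 3.7 shows <_sre.SRE_Match object; ...>)
re.match(r'^\d{3}\-\d{3,8}$', '010 12345')  # None

# Split a string on one or more whitespace characters
re.split(r'\s+', 'a b   c')  # ['a', 'b', 'c']
# Split on runs of commas and/or whitespace
re.split(r'[\s\,]+', 'a,b, c  d')  # ['a', 'b', 'c', 'd']

# findall() returns a list of all substrings the regex matches, or an empty list if there are none
pattern = re.compile(r'\d+')  # match runs of digits
result1 = pattern.findall('runoob 123 google 456')  # ['123', '456']
```

## Groups

Parentheses `()` mark the groups to extract. As in other languages, group 0 is the whole match, group 1 is the first group, group 2 is the second group, and so on.

```python
m = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')
m.groups()  # ('010', '12345')
m.group(0)  # '010-12345'
m.group(1)  # '010'
m.group(2)  # '12345'
```

> Note: `match()` only tries to match the pattern at the start of the string. If the match does not succeed at the start, `match()` returns `None`; use `search()` to find a match anywhere in the string.

```python
m = re.match(r'^(\d{3})-(\d{3,8})$', 'z010-12345')  # None
n = re.search(r'(\d{3})-(\d{3,8})$', 'z010-12345')
# <re.Match object; span=(1, 10), match='010-12345'>
```

## Greedy matching

As in other languages, regex matching is greedy by default: a quantifier consumes as many characters as it can.

```python
re.match(r'^(\d+)(0*)$', '102300').groups()
# ('102300', '')  the greedy \d+ takes every digit, so the second group matches an empty string
re.match(r'^(\d+?)(0*)$', '102300').groups()
# ('1023', '00')  adding ? makes \d+ non-greedy
re.match(r'^(\d+?)$', '102300').groups()
# ('102300',)  with only $ after it, the non-greedy \d+? still has to extend to the end of the string
```

## Precompiling

When you use a regular expression in Python, the `re` module does two things internally:

1. It compiles the regular expression; if the pattern string itself is invalid, an error is raised.
2. It uses the compiled regular expression to match the string.

If the same regex will be used many times, you can compile it once in advance with `re.compile()` and reuse the resulting object, so step 1 does not have to be repeated:

```python
re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')
re_telephone.match('010-12345').groups()  # ('010', '12345')
```

## Search and replace

`re.sub(pattern, repl, string)` replaces every match of `pattern` in `string` with `repl`.

```python
phone = "2004-959-559 # This is a foreign phone number"
# Remove the Python-style comment from the string
num = re.sub(r'#.*$', "", phone)
print(num)  # 2004-959-559  (the space before '#' is kept)
```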
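Revisiting the raw-string advice at the start of this page: the minimal sketch below checks that the escaped ordinary string and the r-prefixed string contain exactly the same characters, and therefore compile to the same pattern. The input `'ABC-001'` is only an illustrative sample.

```python
import re

# The same regex written two ways: an ordinary string needs the backslash doubled,
# while a raw string (r prefix) keeps the backslash as written.
plain = 'ABC\\-001'
raw = r'ABC\-001'
print(plain == raw)  # True: both are the characters A B C \ - 0 0 1

# Because the strings are identical, both match the same text.
# Outside a character class, '\-' simply means a literal '-'.
print(re.match(plain, 'ABC-001').group())  # ABC-001
print(re.match(raw, 'ABC-001').group())    # ABC-001
```

With the r prefix, what you type is exactly what the regex engine sees, which is why it is the recommended style.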
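To make the `match()` versus `search()` note from the Groups section concrete, here is a small side-by-side comparison on the same input string `'z010-12345'`. As an extra illustration, it also shows that prefixing a `search()` pattern with `^` restricts it to the start of the string (assuming the `re.MULTILINE` flag is not set).

```python
import re

pattern = r'(\d{3})-(\d{3,8})'
text = 'z010-12345'

# match() only tries position 0; 'z' is not a digit, so there is no match.
m = re.match(pattern, text)
print(m)  # None

# search() scans forward and returns the first position where the pattern matches.
s = re.search(pattern, text)
print(s.span())    # (1, 10)
print(s.groups())  # ('010', '12345')

# Anchoring the search pattern with ^ makes it behave like match() here.
anchored = re.search('^' + pattern, text)
print(anchored)  # None
```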
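One more illustration of greedy versus non-greedy matching, using `findall()` from the first example. The HTML-like input string is hypothetical, chosen only because it makes the difference easy to see.

```python
import re

html = '<b>bold</b>'

# Greedy: .+ grabs as much as possible, so the match runs to the LAST '>'.
greedy = re.findall(r'<.+>', html)
# Non-greedy: .+? stops at the FIRST '>' that lets the rest of the pattern succeed.
lazy = re.findall(r'<.+?>', html)

print(greedy)  # ['<b>bold</b>']
print(lazy)    # ['<b>', '</b>']
```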
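A sketch of precompiling in practice, assuming a hypothetical list of phone-number strings: the pattern is compiled once and the same object is reused for every lookup. It also shows that an invalid pattern fails at compile time by raising `re.error`, which corresponds to the compilation step described in the Precompiling section. The helper `parse` is an illustrative name, not part of the `re` module.

```python
import re

# Compile once; the resulting Pattern object is reused for every number below.
re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')

def parse(number):
    """Hypothetical helper: return the (area, local) groups, or None if no match."""
    m = re_telephone.match(number)
    return m.groups() if m else None

numbers = ['010-12345', '021-8086', '0755-1234', 'abc-12345']
results = [parse(n) for n in numbers]
print(results)  # [('010', '12345'), ('021', '8086'), None, None]

# An invalid pattern (unbalanced parenthesis) is rejected when it is compiled.
error_raised = False
try:
    re.compile(r'(\d{3}')
except re.error as e:
    error_raised = True
    print('invalid pattern:', e)
```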
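Going a little further with `re.sub()` from the Search and replace section: the minimal sketch below also removes the whitespace before `#`, keeps only the digits with `\D` (any non-digit character), reuses captured groups through `\1` and `\2` in the replacement string, and passes a function as the replacement. All inputs other than `phone` are illustrative samples.

```python
import re

phone = "2004-959-559 # This is a foreign phone number"

# Strip the comment together with the whitespace in front of '#'.
no_comment = re.sub(r'\s*#.*$', '', phone)
# \D matches any non-digit, so replacing it with '' keeps only the digits.
digits = re.sub(r'\D', '', phone)
# \1 and \2 in the replacement refer to the first and second captured groups.
swapped = re.sub(r'(\d{3})-(\d{3,8})', r'\2-\1', '010-12345')
# The replacement can be a function; it receives each Match object and returns a string.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1 b22 c300')

print(no_comment)  # 2004-959-559
print(digits)      # 2004959559
print(swapped)     # 12345-010
print(doubled)     # a2 b44 c600
```

A function replacement is useful when the new text depends on what was matched, which a fixed replacement string cannot express.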