Python Regular Expression Module re, Part 3 (Matching & Results)

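The previous post covered how to write a pattern.
This post continues with the ways to match once a pattern is defined, and how to read the results.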

python:3.6.1

********************************************************************************
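First, a small shortcut.
In the previous posts we always called re.compile on the pattern string first,
but the pattern string can also be passed directly to the matching functions, which compile it internally.
import re
pattern = re.compile(r'a')
match_object = re.findall(pattern,'abc')
#the above is equivalent to the following
match_object = re.findall(r'a','abc')
Some of these functions return a match object (<_sre.SRE_Match object>) rather than a string; in that case use the group method to get the matched text, as explained later in this post.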
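
As a quick check of the shortcut above, the sketch below runs the same pattern both ways and then pulls the text out of a match object with group(). The sample string and the variable names are made up for illustration.

```python
import re

text = 'abc abd'

# Compile once, then call the method on the compiled pattern.
pattern = re.compile(r'ab\w')
compiled_result = pattern.findall(text)

# Or pass the raw pattern string straight to the module-level function.
direct_result = re.findall(r'ab\w', text)

# search() returns a match object (or None), so group() is needed to get the text.
match_object = re.search(r'ab\w', text)
first_hit = match_object.group() if match_object else None

print(compiled_result, direct_result, first_hit)
```

Compiling explicitly is still worth it when the same pattern is reused many times or when you need the pos/endpos arguments described further down.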


Matching functions:

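re.escape(string): backslash-escapes every character except ASCII letters, digits and _; useful when the pattern comes from a variable; returns a string
match_object = re.escape('1234@gmail.com')
print(match_object)
#-> 1234\@gmail\.com
#(from Python 3.7 on, only characters with a special meaning in a regular expression are escaped, so @ is left as is)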
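
To see why escaping matters when the pattern comes from a variable, here is a small sketch. The variable user_text stands in for any externally supplied string and is only an assumption for the example.

```python
import re

# Hypothetical user input that happens to contain the metacharacter '.'
user_text = 'a.b'

# Without escaping, '.' matches any character, so 'axb' is found too.
unescaped = re.findall(user_text, 'a.b axb')

# With escaping, '.' only matches a literal dot.
escaped = re.findall(re.escape(user_text), 'a.b axb')

print(unescaped)
print(escaped)
```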

re.search(pattern, string[, flags=0]): scans the whole string for the first match; returns a match object or None
match_object = re.search('abc', 'bcabcabcabcabc')
print(match_object)
#-> <_sre.SRE_Match object; span=(2, 5), match='abc'>
print(match_object.group())
#-> abc

re.match(pattern, string[, flags=0]): like search, but matches only at the beginning of the string; returns a match object or None
match_object = re.match('abc', 'bcabcabcabcabc')
print(match_object)
#-> None
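
Because both search and match can return None, calling group() on the result without checking raises AttributeError. A minimal sketch of a guard follows; first_match is a hypothetical helper, not part of re.

```python
import re

def first_match(pattern, text, anchored=False):
    """Return the matched text, or None when nothing matches (hypothetical helper)."""
    # re.match only tries at index 0; re.search scans the whole string.
    finder = re.match if anchored else re.search
    match_object = finder(pattern, text)
    return match_object.group() if match_object is not None else None

found_anywhere = first_match('abc', 'bcabcabc')
found_at_start = first_match('abc', 'bcabcabc', anchored=True)
print(found_anywhere, found_at_start)
```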

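pattern.match(string[, pos[, endpos]]): pos sets the index where matching starts (PS: pos is only accepted by the methods of a compiled pattern, not by module-level functions such as re.match)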
pattern = re.compile(r'(\d+)a')
match_object = pattern.match('1234a1234a1234',pos=5)
print(match_object)
#-> <_sre.SRE_Match object; span=(5, 10), match='1234a'>
match_object = re.match(r'(\d+)a','1234a1234a1234',pos=5)
#-> TypeError: match() got an unexpected keyword argument 'pos'

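pattern.match(string[, pos[, endpos]]): endpos sets the index where matching stops, so only the characters before endpos are considered (PS: again, only the methods of a compiled pattern accept it)
pattern = re.compile(r'(\d+)a')
match_object = pattern.match('1234a1234a1234',pos=5,endpos=6)
print(match_object)
#-> None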
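
Passing pos and endpos is not quite the same as slicing the string first. The sketch below, using the same sample string, shows two differences: the reported indexes stay relative to the full string, and '^' still refers to the real start of the string. The variable names are illustrative.

```python
import re

pattern = re.compile(r'\d+a')
text = '1234a1234a1234'

# Search only the window text[5:12].
match_object = pattern.search(text, 5, 12)

# The match object remembers the window it was given ...
window = (match_object.pos, match_object.endpos)
# ... and its span is expressed in indexes of the full string.
span = match_object.span()

# '^' means the real start of the string, so it cannot match at pos=5 ...
anchored_with_pos = re.compile(r'^\d+a').search(text, 5)
# ... whereas slicing builds a new string whose start is index 0.
anchored_on_slice = re.compile(r'^\d+a').search(text[5:])

print(window, span, anchored_with_pos, anchored_on_slice.span())
```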

re.findall(pattern, string[, flags=0]): finds all non-overlapping matches; returns a list
match_object = re.findall('abc', 'bcabcabcabcabc')
print(match_object)
#-> ['abc', 'abc', 'abc', 'abc']
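
One detail worth knowing about findall is that its return shape depends on the capturing groups in the pattern. A short sketch with a made-up sample string:

```python
import re

text = 'x=1, y=22, z=333'

# No group: the list holds the whole matches.
no_group = re.findall(r'\w=\d+', text)

# One group: the list holds only what that group captured.
one_group = re.findall(r'\w=(\d+)', text)

# Several groups: the list holds one tuple per match.
two_groups = re.findall(r'(\w)=(\d+)', text)

print(no_group)
print(one_group)
print(two_groups)
```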

re.fullmatch(pattern, string, flags=0): the whole string must match the pattern, with no extra characters before or after; returns a match object or None
match_object = re.fullmatch(r'hello world', 'hello world')
print(match_object)
#-> <_sre.SRE_Match object; span=(0, 11), match='hello world'>
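
fullmatch is handy for input validation, where trailing junk should be rejected. The sketch below assumes a simple '#rrggbb' rule purely for illustration; is_hex_color is a hypothetical helper.

```python
import re

def is_hex_color(value):
    """Return True when value looks like '#rrggbb' (illustrative rule only)."""
    return re.fullmatch(r'#[0-9a-fA-F]{6}', value) is not None

# re.match only anchors the start, so trailing characters slip through.
match_accepts_junk = re.match(r'#[0-9a-fA-F]{6}', '#ff00aa-extra') is not None

print(is_hex_color('#ff00aa'), is_hex_color('#ff00aa-extra'), match_accepts_junk)
```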

re.split(pattern, string[, maxsplit=0[, flags=0]]): splits string wherever pattern matches (pattern acts as the delimiter); returns a list
match_object = re.split(r'\d', 'ab2cd5d')
print(match_object)
#-> ['ab', 'cd', 'd']
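
Two variations on the split example that often come up: limiting the number of splits with maxsplit, and keeping the delimiters by wrapping the pattern in a capturing group. A brief sketch on the same string:

```python
import re

text = 'ab2cd5d'

# Stop after the first split; the rest of the string is returned untouched.
limited = re.split(r'\d', text, maxsplit=1)

# A capturing group makes the delimiters part of the result.
with_delimiters = re.split(r'(\d)', text)

print(limited)
print(with_delimiters)
```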

re.sub(pattern, repl, string[, count=0[, flags=0]]): repl is the replacement, string is the text to search, count is the maximum number of replacements (0 means replace all); returns a string
match_object = re.sub(r'\d','_', 'ab2cd5d')
print(match_object)
#-> ab_cd_d

re.subn(pattern, repl, string[, count=0[, flags=0]]): same as sub; returns a tuple (new string, number of replacements made)
match_object = re.subn(r'\d','_', 'ab2cd5d')
print(match_object)
#->('ab_cd_d', 2)
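
The repl argument can also contain backreferences or be a function that receives each match object. The sketch below shows count, a backreference and a callable; double_digit is a hypothetical helper written for this example.

```python
import re

text = 'ab2cd5d'

# count=1 replaces only the first occurrence.
first_only = re.sub(r'\d', '_', text, count=1)

# \1 in repl refers to what group 1 captured.
wrapped = re.sub(r'(\d)', r'<\1>', text)

def double_digit(match_object):
    """Return the matched digit multiplied by two, as a string."""
    return str(int(match_object.group()) * 2)

# A function is called once per match and its return value is inserted.
doubled = re.sub(r'\d', double_digit, text)

print(first_only, wrapped, doubled)
```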



Reading a match object (<_sre.SRE_Match object>):

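match_object.expand(template): returns template with backreferences such as \1 or \g<name> replaced by the matched groups; if the match failed, match_object is None and the call raises AttributeError
match_object = re.match(r'(\w*) (\w*)','hello world!')
print(match_object.expand(r'\2 \1'))
#-> world hello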
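
A minimal sketch of expand with named groups and a guard for the failed-match case. swap_words is a hypothetical helper, and returning the input unchanged on failure is just one possible choice.

```python
import re

def swap_words(text):
    """Swap the first two words of text; return text unchanged if there are not two words."""
    match_object = re.match(r'(?P<first>\w+) (?P<second>\w+)', text)
    if match_object is None:
        # Avoid AttributeError: None has no expand().
        return text
    # \g<name> refers to a named group inside the template.
    return match_object.expand(r'\g<second> \g<first>')

swapped = swap_words('hello world!')
untouched = swap_words('!!!')
print(swapped, untouched)
```

Note that expand returns only the filled-in template, so the trailing '!' that was outside the match does not appear in the result.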

match_object.group([group1, ...]): returns the text matched by the given group(s), by number or by name; with no argument or 0 it returns the whole match
match_object = re.match(r'(\w*) (\w*)(?P<tt>.*)','hello world!!!')
print(match_object.group())
#-> hello world!!!
print(match_object.group(0))
#-> hello world!!!
print(match_object.group(1))
#-> hello
print(match_object.group(3))
#-> !!!
print(match_object.group('tt'))
#-> !!!

match_object.groups(default=None): returns all groups as a tuple; default is used for groups that did not take part in the match
match_object = re.match(r'(\w*) (\w*)(?P<tt>.*)','hello world!!!')
print(match_object.groups())
#-> ('hello', 'world', '!!!')

match_object.groupdict(default=None): returns the groups as a dict (only named groups are included)
match_object = re.match(r'(\w*) (\w*)(?P<tt>.*)','hello world!!!')
print(match_object.groupdict())
#-> {'tt': '!!!'}
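
The default argument of groups and groupdict only matters when a group is optional and did not take part in the match. A short sketch, assuming a made-up 'major.minor' version format:

```python
import re

# The minor part is optional, so its group may not participate in the match.
pattern = re.compile(r'(?P<major>\d+)(?:\.(?P<minor>\d+))?')
match_object = pattern.match('24')

without_default = match_object.groups()
with_default = match_object.groups('0')
as_dict = match_object.groupdict('0')

print(without_default, with_default, as_dict)
```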

match_object.start([group]): returns the index of the first character of group n (of the whole match if omitted)
match_object = re.match(r'(\d*) (\d*)(?P<tt>.*)','012345 789!!')
print(match_object.start())
#-> 0
print(match_object.start(2))
#-> 7

match_object.end([group]): returns the index just past the last character of group n (of the whole match if omitted)
match_object = re.match(r'(\d*) (\d*)(?P<tt>.*)','012345 789!!')
print(match_object.end())
#-> 12
print(match_object.end(1))
#-> 6

match_object.span([group]): returns the tuple (start index, end index)
match_object = re.match(r'(\d*) (\d*)(?P<tt>.*)','012345 789!!')
print(match_object.span())
#-> (0, 12)
print(match_object.span(3))
#-> (10, 12)
print(match_object.span('tt'))
#-> (10, 12)
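
Since end is the index just past the last character, start and end plug directly into a slice. The sketch below checks this on the same sample and uses the span to bracket a group in place; the bracket format is arbitrary.

```python
import re

text = '012345 789!!'
match_object = re.match(r'(\d*) (\d*)(?P<tt>.*)', text)

start, end = match_object.span(2)

# Slicing with the span gives back exactly what group(2) returns.
sliced = text[start:end]

# The span also lets you edit the original string around the group.
highlighted = text[:start] + '[' + sliced + ']' + text[end:]

print(sliced, highlighted)
```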

match_object.lastindex: the integer index of the last matched capturing group
match_object = re.search(r'.(\d)(\d)(\d)(\d)(a)','1234a1234a1234')
print(match_object.lastindex)
#-> 5
match_object = re.search(r'.(\d+)(a)','1234a1234a1234')
print(match_object.lastindex)
#-> 2

match_object.lastgroup: the name of the last matched capturing group (None if that group has no name)
match_object = re.search(r'(?P<first>\d*)(?P<second>a)','1234a1234a1234')
print(match_object.lastgroup)
#-> second
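
A common use of lastgroup is a small tokenizer: put one named group per token type in an alternation, and lastgroup tells you which alternative matched. The sketch below uses re.finditer, which this post does not cover, and the token names are made up for illustration.

```python
import re

# One named group per token type; only one alternative matches at a time.
token_pattern = re.compile(r'(?P<NUMBER>\d+)|(?P<NAME>[a-z]+)|(?P<OP>[+*])')

# finditer yields one match object per non-overlapping match.
tokens = [(m.lastgroup, m.group()) for m in token_pattern.finditer('x+42*y')]

print(tokens)
```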

match_object.string: the string that was passed to match() or search()
match_object = re.search(r'(?P<first>\d*)(?P<second>a)','1234a1234a1234')
print(match_object.string)
#-> 1234a1234a1234
print(match_object.group())
#-> 1234a
print(match_object.groups())
#-> ('1234', 'a')


That wraps up this introduction to the regular expression module re.
I hope re makes your coding a little easier.
