What is the difference between re.search and re.match?


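Question:

What is the difference between the search() and match() functions in the Python `re` module?

I have read the Python 2 documentation (and the Python 3 documentation), but I never seem to remember it.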

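Solution 1:

re.match is anchored at the beginning of the string. This has nothing to do with newlines, so it is not the same as using `^` in the pattern.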

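As the re.match documentation says:

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
Note: If you want to locate a match anywhere in string, use search() instead.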

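re.search searches the entire string, as the documentation says:

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.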

So if you need to match at the beginning of the string, or to match the entire string, use match. It is faster. Otherwise use search.

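The documentation has a dedicated section on match vs. search that also covers multiline strings:

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).
Note that match may differ from search even when using a regular expression beginning with '^': '^' matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The "match" operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.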

Now, enough talk. Time to see some example code:

# example code:
import re

string_with_newlines = """something
someotherthing"""

print(re.match('some', string_with_newlines))  # matches
print(re.match('someother',
               string_with_newlines))  # won't match
print(re.match('^someother', string_with_newlines,
               re.MULTILINE))  # also won't match
print(re.search('someother',
                string_with_newlines))  # finds something
print(re.search('^someother', string_with_newlines,
                re.MULTILINE))  # also finds something

m = re.compile('thing$', re.MULTILINE)

print(m.match(string_with_newlines))  # no match
print(m.match(string_with_newlines, pos=4))  # matches
# The flag is already part of the compiled pattern; the second
# argument of a compiled pattern's search() is pos, not flags.
print(m.search(string_with_newlines))  # also matches
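
The interaction between the optional pos argument and '^' described in the documentation excerpt can be checked with a short sketch. The strings and variable names below are illustrative assumptions, not part of the original answer.

```python
import re

text = "something"

# match() with pos: the attempt starts at index 4 and succeeds there.
plain = re.compile("thing")
m = plain.match(text, 4)
print(m.span())  # (4, 9)

# '^' still refers to the real start of the string (or, with MULTILINE,
# to the position right after a newline), not to pos.
anchored = re.compile("^thing", re.MULTILINE)
print(anchored.search(text, 4))  # None

# With a newline just before pos, '^' in MULTILINE mode does match there.
two_lines = "some\nthing"
print(anchored.search(two_lines, 5).span())  # (5, 10)
```

In other words, pos moves the starting point for match(), but it does not change what '^' considers a line start.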

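Solution 2:

search ⇒ finds something anywhere in the string and returns a match object.

match ⇒ finds something at the beginning of the string and returns a match object.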

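Solution 3:

"match is much faster than search, so instead of doing regex.search("word") you can do regex.match((.*?)word(.*?)) and gain tons of performance if you are working with millions of samples."

This comment from @ivan_bilan under the accepted answer above got me wondering whether such a hack actually speeds anything up, so let's find out how many tons of performance you will really gain.

I prepared the following test suite: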

import random
import re
import string
import time

LENGTH = 10
LIST_SIZE = 1000000

def generate_word():
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word

wordlist = [generate_word() for _ in range(LIST_SIZE)]

start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)

start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)

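I made 10 measurements (1M, 2M, ..., 10M words), which gave me the following plot:

[Figure: line plot of the match vs. search regex speed test]

As you can see, searching for the pattern 'python' is faster than matching the pattern '(.*?)python(.*?)'.

Python is smart. Avoid trying to be smarter.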
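
For readers who want to repeat the comparison on a smaller scale, here is a minimal sketch using timeit. It is not the benchmark that produced the plot above: the sample size, the fixed seed, the use of precompiled patterns, and the helper names run_search and run_match are all assumptions made for this illustration. It also checks that both approaches flag the same words, which is what makes the timing comparison meaningful.

```python
import random
import re
import string
import timeit

random.seed(0)  # fixed seed, only so repeated runs use the same sample
SAMPLE_SIZE = 10_000  # assumption: far smaller than the 1M words used above

words = ["".join(random.choices(string.ascii_lowercase, k=10))
         for _ in range(SAMPLE_SIZE)]
words.append("xxpythonxx")  # make sure at least one word contains the target

search_pat = re.compile("python")
match_pat = re.compile("(.*?)python(.*?)")

def run_search():
    # True for every word that contains 'python' anywhere
    return [search_pat.search(w) is not None for w in words]

def run_match():
    # Same question, asked with match() and a lazy wildcard prefix
    return [match_pat.match(w) is not None for w in words]

# Both approaches must agree on which words contain 'python'.
same_hits = run_search() == run_match()
print("same hits:", same_hits)

search_time = timeit.timeit(run_search, number=5)
match_time = timeit.timeit(run_match, number=5)
print(f"search: {search_time:.3f}s  match: {match_time:.3f}s")
```

Absolute timings depend on the machine and Python version, so only the relative difference between the two numbers is worth reading.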

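Solution 4:

re.search searches for the pattern throughout the string, whereas re.match does not search at all: it can only try to match the pattern at the start of the string.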

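Solution 5:

The difference is that re.match() misleads anyone accustomed to Perl, grep, or sed regular expression matching, and re.search() does not. :-)

More soberly, as John D. Cook remarks, re.match() "behaves as if every pattern has ^ prepended." In other words, re.match('pattern', s) is equivalent to re.search('^pattern', s) (when re.MULTILINE is not set). So it anchors the pattern's left side. But it does not anchor the pattern's right side: that still requires a terminating $.

Frankly, given the above, I think re.match() should be deprecated. I would be interested to know the reasons it should be retained.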
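
The left-anchored, right-open behaviour can be seen in a few lines. This is a small sketch with an arbitrary sample string, assuming no re.MULTILINE flag.

```python
import re

s = "abcdef"

# Left side: match() behaves like search() with a leading '^' (no MULTILINE flag here).
left_match = re.match("abc", s)
left_search = re.search("^abc", s)
print(left_match.span(), left_search.span())  # (0, 3) (0, 3)

# Right side is not anchored: 'abc' matches even though 'def' follows.
print(re.match("abc", s) is not None)  # True
# Adding '$' requires the string to end after 'abc', so this fails.
print(re.match("abc$", s))  # None
```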

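Solution 6:

You can refer to the example below to understand how re.match and re.search work:

import re

a = "123abc"
t = re.match("[a-z]+", a)    # None: the string starts with digits
u = re.search("[a-z]+", a)   # match object for 'abc'

re.match returns None, but re.search returns a match for abc.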

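Solution 7:

Much shorter:

search scans through the whole string.
match looks only at the beginning of the string.

The following example illustrates this:

>>> import re
>>> a = "123abc"
>>> print(re.match("[a-z]+", a))
None
>>> re.search("[a-z]+", a).group()
'abc'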

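Solution 8:

re.match attempts to match the pattern at the beginning of the string. re.search attempts to match the pattern throughout the string until it finds a match.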
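
"Until it finds a match" means that search() reports the first position where the pattern matches and then stops. A tiny sketch with an arbitrary sample string:

```python
import re

s = "abcabc"

# search() scans forward and stops at the first position where the pattern matches.
first = re.search("b", s)
print(first.span())  # (1, 2)

# match() only tries position 0, where 'a' is found instead of 'b'.
print(re.match("b", s))  # None
```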

Solution 9:

Quick answer

re.search('test', ' test')      # returns a truthy match object (the search may start at any index)

re.match('test', ' test')       # returns None (matching is attempted only at index 0)
re.match('test', 'test')        # returns a truthy match object (match at index 0)

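Solution 10:

re.match is anchored at the beginning of the string, while re.search scans through the entire string. So in the following example, x and y match the same thing.

x = re.match('pat', s)       # <--- already anchored at the beginning of string
y = re.search(r'\Apat', s)   # <--- match at the beginning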

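If a string does not contain line breaks, \A and ^ are essentially the same; the difference shows up in multiline strings. In the following example, re.match will never match the second line, while re.search can with the correct regex (and flag).

s = "1\n2"
re.match('2', s, re.M)       # no match
re.search('^2', s, re.M)     # match
re.search(r'\A2', s, re.M)   # no match  <--- mimics `re.match`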

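There is another function in re, re.fullmatch(), that matches against the entire string, so it is anchored both at the beginning and at the end of the string. So in the following example, x, y and z match the same thing.

x = re.match(r'pat\Z', s)     # <--- already anchored at the beginning; must match end
y = re.search(r'\Apat\Z', s)  # <--- match at the beginning and end of string
z = re.fullmatch('pat', s)    # <--- already anchored at the beginning and end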
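
One detail worth checking when anchoring the right side is the difference between `$` and `\Z` on a string that ends with a newline. The sketch below uses an arbitrary sample string to show it.

```python
import re

s = "pat\n"

# '$' also matches just before a trailing newline, so this succeeds.
print(re.match(r"pat$", s) is not None)  # True
# '\Z' matches only at the very end of the string.
print(re.match(r"pat\Z", s))  # None
print(re.search(r"\Apat\Z", s))  # None
# fullmatch() requires the whole string, newline included, to match.
print(re.fullmatch(r"pat", s))  # None
print(re.fullmatch(r"pat\n", s) is not None)  # True
```

So `pat$` with re.match is slightly more permissive than re.fullmatch('pat', ...), while `pat\Z` behaves the same as fullmatch here.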

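Based on Jeyekomon's answer (and using their setup), I used the perfplot library to plot the results of timeit tests that look into:

How do they compare if re.search "mimics" re.match? (first plot)
How do they compare if re.match "mimics" re.search? (second plot)
How do they compare if the same pattern is passed to both? (last plot)

Note that the last case does not produce the same output (because re.match is anchored at the beginning of the string).

[Figure: performance plots]

The first plot shows that match is faster if search is used like match. The second plot supports @Jeyekomon's answer and shows that search is faster if match is used like search. The last plot shows there is very little difference between the two if they scan for the same pattern.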

Code used to produce the performance plots:

import re
from random import choices
from string import ascii_lowercase
import matplotlib.pyplot as plt
import perfplot

# Each pair is (pattern used with search, pattern used with match).
patterns = [
    [re.compile(r'\Aword'), re.compile(r'word')],
    [re.compile(r'word'), re.compile(r'(.*?)word')],
    [re.compile(r'word')]*2
]

fig, axs = plt.subplots(1, 3, figsize=(20,6), facecolor='white')
for i, (pat1, pat2) in enumerate(patterns):
    plt.sca(axs[i])
    perfplot.plot(
        setup=lambda n: [''.join(choices(ascii_lowercase, k=10)) for _ in range(n)],
        kernels=[lambda lst: [*map(pat1.search, lst)], lambda lst: [*map(pat2.match, lst)]],
        labels= [f"re.search(r'{pat1.pattern}', w)", f"re.match(r'{pat2.pattern}', w)"],
        n_range=[2**k for k in range(24)],
        xlabel='Length of list',
        equality_check=None
    )
fig.suptitle('re.match vs re.search')
fig.tight_layout()