Case-insensitive regular expression without re.compile?

2025-01-13 08:53:00

admin

Question:

In Python, I can compile a case-insensitive regular expression using re.compile:

>>> import re
>>> s = 'TeSt'
>>> casesensitive = re.compile('test')
>>> ignorecase = re.compile('test', re.IGNORECASE)
>>> 
>>> print(casesensitive.match(s))
None
>>> print(ignorecase.match(s))
<re.Match object; span=(0, 4), match='TeSt'>

Is there a way to do the same thing without using re.compile? I can't find anything in the documentation like Perl's i suffix (e.g. m/test/i).

Solution 1:

Pass re.IGNORECASE to the flags parameter of search, match, or sub:

re.search('test', 'TeSt', re.IGNORECASE)
re.match('test', 'TeSt', re.IGNORECASE)
re.sub('test', 'xxxx', 'Testing', flags=re.IGNORECASE)
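
As a small sketch of the calls above (the sample strings here are illustrative, not from the original answer): re.sub takes count before flags, so the flag is passed by keyword, and re.I can be used as the short alias for re.IGNORECASE.

```python
import re

# Signature: re.sub(pattern, repl, string, count=0, flags=0).
# Passing the flag by keyword keeps it from being read as count.
replaced = re.sub('test', 'xxxx', 'Testing TEST', flags=re.IGNORECASE)
print(replaced)  # xxxxing xxxx

# re.I is the short alias for re.IGNORECASE.
found = re.search('test', 'TeSt', re.I)
print(found.group())  # TeSt
```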

Solution 2:

You can also perform a case-insensitive search with search/match without using the IGNORECASE flag (tested in Python 2.7.3):

re.search(r'(?i)test', 'TeSt').group()    ## returns 'TeSt'
re.match(r'(?i)test', 'TeSt').group()     ## returns 'TeSt'

Solution 3:

The case-insensitive marker (?i) can be incorporated directly into the regex pattern:

>>> import re
>>> s = 'This is one Test, another TEST, and another test.'
>>> re.findall('(?i)test', s)
['Test', 'TEST', 'test']
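
To see that the inline marker and the flag are equivalent, the sketch below compiles both forms and compares them (compiling is only used here to inspect the flags; the variable names are illustrative). Note that since Python 3.11 a global inline flag such as (?i) must appear at the very start of the pattern.

```python
import re

inline = re.compile(r'(?i)test')
flagged = re.compile(r'test', re.IGNORECASE)

# Both compiled patterns carry the IGNORECASE flag.
print(bool(inline.flags & re.IGNORECASE))   # True
print(bool(flagged.flags & re.IGNORECASE))  # True

s = 'This is one Test, another TEST, and another test.'
print(inline.findall(s) == flagged.findall(s))  # True
```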

Solution 4:

You can also specify case insensitivity when compiling the pattern:

pattern = re.compile('FIle:/+(.*)', re.IGNORECASE)

Solution 5:

Import:

import re

At run time:

RE_TEST = r'test'
if re.match(RE_TEST, 'TeSt', re.IGNORECASE):

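Note that skipping re.compile is wasteful: every call to the module-level match above has to resolve the pattern string again. (The re module does cache recently compiled patterns, so this is usually a cache lookup rather than a full recompilation, but the lookup is still repeated work.) This is also poor practice in other programming languages. A better approach is shown below.

During application initialization: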

self.RE_TEST = re.compile('test', re.IGNORECASE)

At run time:

if self.RE_TEST.match('TeSt'):
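
Put together, the compile-once pattern might look like the following minimal sketch. The class name Handler and the method is_test are hypothetical and only serve to give self.RE_TEST a home.

```python
import re


class Handler:
    """Hypothetical class; the name and method are illustrative only."""

    def __init__(self):
        # Compile once during initialization.
        self.RE_TEST = re.compile('test', re.IGNORECASE)

    def is_test(self, text):
        # Reuse the precompiled pattern on every call.
        return self.RE_TEST.match(text) is not None


handler = Handler()
print(handler.is_test('TeSt'))   # True
print(handler.is_test('other'))  # False
```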

Solution 6:

To perform case-insensitive operations, supply re.IGNORECASE:

>>> import re
>>> test = 'UPPER TEXT, lower text, Mixed Text'
>>> re.findall('text', test, flags=re.IGNORECASE)
['TEXT', 'text', 'Text']

If we want the replacement text to match the case of the text it replaces:

>>> def matchcase(word):
        def replace(m):
            text = m.group()
            if text.isupper():
                return word.upper()
            elif text.islower():
                return word.lower()
            elif text[0].isupper():
                return word.capitalize()
            else:
                return word
        return replace

>>> re.sub('text', matchcase('word'), test, flags=re.IGNORECASE)
'UPPER WORD, lower word, Mixed Word'

Solution 7:

There are two ways to make a regular expression (regex) case-insensitive in your code:

The flag flags=re.IGNORECASE:

Regx3GList = re.search(r"(WCDMA:)((\d*)(,?))*", txt, re.IGNORECASE)

The inline case-insensitive marker (?i):

Regx3GList = re.search(r"(?i)(WCDMA:)((\d*)(,?))*", txt)
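
The two forms can be compared side by side. This is only a sketch: the value of txt is a made-up sample (the answer does not show it), and the pattern assumes the digit class was meant to be \d.

```python
import re

# Hypothetical input; the original answer does not show txt.
txt = "wcdma:1,2,3"

with_flag = re.search(r"(WCDMA:)((\d*)(,?))*", txt, re.IGNORECASE)
with_inline = re.search(r"(?i)(WCDMA:)((\d*)(,?))*", txt)

# Both forms match the same span of the input.
print(with_flag.group(0) == with_inline.group(0))  # True
```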

Solution 8:

# re.IGNORECASE gives case-insensitive matching; its short form is re.I
# re.match only matches at the start of the string
# re.search scans the string and returns the first match found anywhere
# re.compile creates a regex object that can be reused for multiple matches

 >>> s = r'TeSt'   
 >>> print (re.match(s, r'test123', re.I))
 <_sre.SRE_Match object; span=(0, 4), match='test'>
 # OR
 >>> pattern = re.compile(s, re.I)
 >>> print(pattern.match(r'test123'))
 <_sre.SRE_Match object; span=(0, 4), match='test'>

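Solution 9:

If you want to replace text while still preserving the case of the original string, that is possible.

For example, to highlight "test" in the string "test asdasd TEST asd tEst asdasd":

import re

sentence = "test asdasd TEST asd tEst asdasd"
result = re.sub(
  '(test)', 
  r'<b>\1</b>',  # \1 here refers to the first matching group.
  sentence, 
  flags=re.IGNORECASE)

Result: <b>test</b> asdasd <b>TEST</b> asd <b>tEst</b> asdasd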

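Solution 10:

I suggest using (?i:string_region_to_ignore_case) rather than (?i). This approach handles case sensitivity in a more selective but clearer way, because only the enclosed region ignores case (scoped inline flags require Python 3.6 or later). For example: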

rex = re.findall (r'J(?i:ohn) S(?i:mith)',
      "John smith ; JOHN SMITH; john Smith; John Smith")
#Result:
['JOHN SMITH', 'John Smith']
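
The difference between the scoped group and the global marker can be seen in a short sketch (the sample text is made up for illustration):

```python
import re

text = 'TESTing TESTING testing'

# Only "test" is case-insensitive; "ing" must be lowercase.
scoped = re.findall(r'(?i:test)ing', text)
print(scoped)  # ['TESTing', 'testing']

# With the global flag, the whole pattern ignores case.
global_flag = re.findall(r'(?i)testing', text)
print(global_flag)  # ['TESTing', 'TESTING', 'testing']
```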

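Solution 11:

(?i) matches the remainder of the pattern with the following effective flag: the i modifier, which makes matching case-insensitive (ignores the case of [a-zA-Z]).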

>>> import pandas as pd
>>> s = pd.DataFrame({ 'a': ["TeSt"] })
>>> r = s.replace(to_replace=r'(?i)test', value=r'TEST', regex=True)
>>> print(r)
      a
0  TEST