Is there a simple way to remove multiple spaces in a string?

Problem description:

Suppose this string:

The   fox jumped   over    the log.

becomes:

The fox jumped over the log.

What is the simplest way (1-2 lines) to achieve this, without splitting and going into lists?

Solution 1:

>>> import re
>>> re.sub(' +', ' ', 'The     quick brown    fox')
'The quick brown fox'

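Solution 2:

foo is your string:

" ".join(foo.split())

Be warned, though, that this removes "all whitespace characters (space, tab, newline, return, formfeed)" (thanks to hhsaffar, see comments). I.e., `"this is a test\n"` will effectively end up as `"this is a test"`.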

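To make the difference between the first two solutions concrete, here is a small sketch that runs both on a string mixing spaces, a tab and a newline. The sample string and the variable names are made up for illustration.

```python
import re

# Hypothetical sample mixing spaces, a tab and a trailing newline
sample = "this  is\ta   test\n"

# Collapses runs of plain spaces only; the tab and the newline survive
spaces_only = re.sub(' +', ' ', sample)

# Splits on any whitespace and also drops leading/trailing whitespace
all_whitespace = " ".join(sample.split())

print(repr(spaces_only))     # 'this is\ta test\n'
print(repr(all_whitespace))  # 'this is a test'
```

Which of the two is "right" depends on whether tabs and newlines should be preserved.
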
Solution 3:

import re
s = "The   fox jumped   over    the log."
re.sub(r"\s\s+" , " ", s)

or

re.sub(r"\s\s+", " ", s)

since the space before the comma is listed as a pet peeve in PEP 8, as mentioned by user Martin Thoma in the comments.

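Solution 4:

Using regexes with "\s" and doing a simple string.split() will also remove other whitespace, such as newlines, carriage returns and tabs. Unless that is desired, and for collapsing multiple spaces only, I present these examples.

I used 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum to get realistic time tests, with random-length extra spaces throughout: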

original_string = ''.join(word + (' ' * random.randint(1, 10)) for word in lorem_ipsum.split(' '))

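The one-liner essentially strips any leading/trailing spaces, whereas proper_join below preserves a leading/trailing space (but only ONE ;-).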
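
As a self-contained sketch of how such a test string can be built, the snippet below uses a short stand-in for the Lorem Ipsum text and a fixed random seed; both are assumptions added for repeatability and are not part of the original benchmark.

```python
import random

random.seed(0)  # fixed seed (an assumption) so the sketch is repeatable
lorem_ipsum = "Lorem ipsum dolor sit amet"  # short stand-in for the real text

# Every word, including the last one, is followed by 1 to 10 spaces
original_string = ''.join(
    word + (' ' * random.randint(1, 10)) for word in lorem_ipsum.split(' ')
)

print(repr(original_string))
```

Because the last word also gets padding, the generated string always ends in at least one space, which is why trailing spaces matter for the functions being compared.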

# setup = '''

import re

def while_replace(string):
    while '  ' in string:
        string = string.replace('  ', ' ')

    return string

def re_replace(string):
    return re.sub(r' {2,}' , ' ', string)

def proper_join(string):
    split_string = string.split(' ')

    # To account for leading/trailing spaces that would simply be removed
    beg = ' ' if not split_string[ 0] else ''
    end = ' ' if not split_string[-1] else ''

    # versus simply ' '.join(item for item in string.split(' ') if item)
    return beg + ' '.join(item for item in split_string if item) + end

original_string = """Lorem    ipsum        ... no, really, it kept going...          malesuada enim feugiat.         Integer imperdiet    erat."""

assert while_replace(original_string) == re_replace(original_string) == proper_join(original_string)

#'''

# while_replace_test
new_string = original_string[:]

new_string = while_replace(new_string)

assert new_string != original_string

# re_replace_test
new_string = original_string[:]

new_string = re_replace(new_string)

assert new_string != original_string

# proper_join_test
new_string = original_string[:]

new_string = proper_join(new_string)

assert new_string != original_string

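Note: The "while version" made a copy of original_string, as I believe that once it was modified on the first run, successive runs would be faster (if only by a bit). As this adds time, I added this string copy to the other two so that the timings show the difference only in logic. Keep in mind that timeit executes the setup only once per timing run, while the main stmt is executed repeatedly; the way I originally did this, the while loop worked on the same label, original_string, so on the second run there would be nothing to do. The way it is set up now, calling a function and using two different labels, that is not a problem. I've added assert statements to all the workers to verify that we change something on every iteration (for those who may be dubious). E.g., change to this and it breaks: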

# while_replace_test
new_string = original_string[:]

new_string = while_replace(new_string)

assert new_string != original_string # will break the 2nd iteration

while '  ' in original_string:
    original_string = original_string.replace('  ', ' ')

Tests run on a laptop with an i5 processor running Windows 7 (64-bit).

timeit.Timer(stmt = test, setup = setup).repeat(7, 1000)
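
The tables below report minimum, maximum, average and median over the seven repeats. A minimal sketch of how such a summary could be produced follows; the stand-in setup and stmt strings and the summarize helper are hypothetical and are not the author's actual harness.

```python
import statistics
import timeit

# Stand-in setup and statement (assumptions, much smaller than the real ones)
setup = "s = 'The   fox jumped   over    the log.'"
stmt = "' '.join(s.split())"

def summarize(stmt, setup, repeat=7, number=1000):
    """Return (minimum, maximum, average, median) over `repeat` timing runs."""
    times = timeit.Timer(stmt=stmt, setup=setup).repeat(repeat, number)
    return min(times), max(times), statistics.mean(times), statistics.median(times)

minimum, maximum, average, median = summarize(stmt, setup)
print(f"{minimum:.6f} | {maximum:.6f} | {average:.6f} | {median:.6f}")
```

Each value is the total time for 1000 executions of the statement, so the absolute numbers depend on the machine and the Python build.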

test_string = 'The   fox jumped   over\n        the log.' # trivial

Python 2.7.3, 32-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001066 |   0.001260 |   0.001128 |   0.001092
     re_replace_test |   0.003074 |   0.003941 |   0.003357 |   0.003349
    proper_join_test |   0.002783 |   0.004829 |   0.003554 |   0.003035

Python 2.7.3, 64-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001025 |   0.001079 |   0.001052 |   0.001051
     re_replace_test |   0.003213 |   0.004512 |   0.003656 |   0.003504
    proper_join_test |   0.002760 |   0.006361 |   0.004626 |   0.004600

Python 3.2.3, 32-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001350 |   0.002302 |   0.001639 |   0.001357
     re_replace_test |   0.006797 |   0.008107 |   0.007319 |   0.007440
    proper_join_test |   0.002863 |   0.003356 |   0.003026 |   0.002975

Python 3.3.3, 64-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001444 |   0.001490 |   0.001460 |   0.001459
     re_replace_test |   0.011771 |   0.012598 |   0.012082 |   0.011910
    proper_join_test |   0.003741 |   0.005933 |   0.004341 |   0.004009

test_string = lorem_ipsum
# Thanks to http://www.lipsum.com/
# "Generated 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum"

Python 2.7.3, 32-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.342602 |   0.387803 |   0.359319 |   0.356284
     re_replace_test |   0.337571 |   0.359821 |   0.348876 |   0.348006
    proper_join_test |   0.381654 |   0.395349 |   0.388304 |   0.388193    

Python 2.7.3, 64-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.227471 |   0.268340 |   0.240884 |   0.236776
     re_replace_test |   0.301516 |   0.325730 |   0.308626 |   0.307852
    proper_join_test |   0.358766 |   0.383736 |   0.370958 |   0.371866    

Python 3.2.3, 32-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.438480 |   0.463380 |   0.447953 |   0.446646
     re_replace_test |   0.463729 |   0.490947 |   0.472496 |   0.468778
    proper_join_test |   0.397022 |   0.427817 |   0.406612 |   0.402053    

Python 3.3.3, 64-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.284495 |   0.294025 |   0.288735 |   0.289153
     re_replace_test |   0.501351 |   0.525673 |   0.511347 |   0.508467
    proper_join_test |   0.422011 |   0.448736 |   0.436196 |   0.440318

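For the trivial string, it would seem that a while loop is the fastest, followed by the Pythonic string split/join, with regex bringing up the rear.

For non-trivial strings, there seems to be a bit more to consider. 32-bit 2.7? It's regex to the rescue! 2.7 64-bit? A while loop is best, by a decent margin. 32-bit 3.2, go with the "proper" join. 64-bit 3.3, go for a while loop. Again.

In the end, one can improve performance if/where/when needed, but it is always best to remember the mantra:

Make It Work
Make It Right
Make It Fast

IANAL, YMMV, Caveat Emptor!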

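Solution 5:

I have to agree with Paul McGuire's comment. To me,

' '.join(the_string.split())

is vastly preferable to whipping out a regex.

My measurements (Linux and Python 2.5) show the split-then-join to be almost five times faster than doing the "re.sub(...)", and still three times faster if you precompile the regex once and do the operation multiple times. And it is by any measure easier to understand, and much more Pythonic.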
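
For anyone who wants to repeat the comparison on a current interpreter, here is a rough sketch. The function names and the iteration count are my own choices, and no timings are quoted because they vary by machine and Python version.

```python
import re
import timeit

the_string = "The   fox jumped   over    the log."
multi_space = re.compile(r" +")  # precompiled once, reused on every call

def with_split(s):
    # Split on any whitespace, then rejoin with single spaces
    return " ".join(s.split())

def with_compiled_regex(s):
    # Replace each run of spaces with a single space
    return multi_space.sub(" ", s)

# Both approaches agree on this input (it has no leading/trailing whitespace)
assert with_split(the_string) == with_compiled_regex(the_string)

print(timeit.timeit(lambda: with_split(the_string), number=100_000))
print(timeit.timeit(lambda: with_compiled_regex(the_string), number=100_000))
```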

Solution 6:

Similar to the previous solutions, but more specific: replace two or more whitespace characters with a single space:

>>> import re
>>> s = "The   fox jumped   over    the log."
>>> re.sub(r'\s{2,}', ' ', s)
'The fox jumped over the log.'
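
What "more specific" means in practice can be seen on a string that contains a single tab and a single newline. The sample string below is made up for illustration.

```python
import re

# Hypothetical sample: one tab, one run of two spaces, one trailing newline
s = "one\ttwo  three\n"

two_or_more = re.sub(r'\s{2,}', ' ', s)  # lone tab and newline are untouched
one_or_more = re.sub(r'\s+', ' ', s)     # every whitespace run becomes ' '

print(repr(two_or_more))  # 'one\ttwo three\n'
print(repr(one_or_more))  # 'one two three '
```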

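Solution 7:

I have tried the following method, and it even works with an extreme case like:

str1='          I   live    on    earth           '

' '.join(str1.split())

But if you prefer a regular expression, it can be done as:

re.sub(r'\s+', ' ', str1)

although some extra processing is needed to remove the leading and trailing spaces.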
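
One way to handle the leading and trailing spaces, sketched here with str.strip() (my choice, the answer does not say which step it had in mind):

```python
import re

str1 = '          I   live    on    earth           '

# Collapse every whitespace run first, then trim what is left at the edges
collapsed = re.sub(r'\s+', ' ', str1)
cleaned = collapsed.strip()

print(repr(collapsed))  # ' I live on earth '
print(repr(cleaned))    # 'I live on earth'
```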

Solution 8:

import re

Text = " You can select below trims for removing white space!!   BR Aliakbar     "
# trims all white spaces
print('Remove all space:', re.sub(r"\s+", "", Text), sep='')
# trims left space
print('Remove leading space:', re.sub(r"^\s+", "", Text), sep='')
# trims right space
print('Remove trailing spaces:', re.sub(r"\s+$", "", Text), sep='')
# trims both
print('Remove leading and trailing spaces:', re.sub(r"^\s+|\s+$", "", Text), sep='')
# replace more than one white space in the string with one white space
print('Remove more than one space:', re.sub(' +', ' ', Text), sep='')

Result (as produced by the code above):

"Remove all space:Youcanselectbelowtrimsforremovingwhitespace!!BRAliakbar"
"Remove leading space:You can select below trims for removing white space!!   BR Aliakbar"     
"Remove trailing spaces: You can select below trims for removing white space!!   BR Aliakbar"
"Remove leading and trailing spaces:You can select below trims for removing white space!!   BR Aliakbar"
"Remove more than one space: You can select below trims for removing white space!! BR Aliakbar"

Solution 9:

A simple solution

>>> import re
>>> s="The   fox jumped   over    the log."
>>> print(re.sub(r'\s+', ' ', s))
The fox jumped over the log.

Solution 10:

This regex will do the trick in Python 3.11:

re.sub(r'\s+', ' ', text)

The accepted answer of this thread does not work for me in Python 3.11 on a Mac:

re.sub(' +', ' ', 'The     quick brown    fox') # does not work for me
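
The answer does not say why ' +' failed. One possible explanation, and this is purely an assumption, is that the real text contained whitespace other than the plain space character, such as tabs or non-breaking spaces. The sketch below uses a made-up input to show how the two patterns differ in that situation.

```python
import re

# Hypothetical input: two tabs, two non-breaking spaces (U+00A0), two spaces
text = "The\t\tquick\u00a0\u00a0brown  fox"

plain = re.sub(' +', ' ', text)     # only the run of plain spaces collapses
any_ws = re.sub(r'\s+', ' ', text)  # every whitespace run collapses

print(repr(plain))   # 'The\t\tquick\xa0\xa0brown fox'
print(repr(any_ws))  # 'The quick brown fox'
```

On the literal string shown above, which contains only plain spaces, both patterns give the same result.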

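Solution 11:

You can also use the string splitting technique in a Pandas DataFrame without needing to use .apply(..), which is useful if you need to perform the operation quickly on a large number of strings. Here it is on one line: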

df['message'] = (df['message'].str.split()).str.join(' ')

Solution 12:

Solution for Python developers:

import re

text1 = 'Python      Exercises    Are   Challenging Exercises'
print("Original string: ", text1)
print("Without extra spaces: ", re.sub(' +', ' ', text1))

Output:

`Original string:  Python      Exercises    Are   Challenging Exercises
Without extra spaces:  Python Exercises Are Challenging Exercises`

Solution 13:

import re
string = re.sub('[ \t\n]+', ' ', 'The     quick brown                \n\n                         fox')

This will remove all tabs, newlines and multiple spaces, replacing each run with a single space.

Solution 14:

This one does exactly what you want

old_string = 'The   fox jumped   over    the log '
new_string = " ".join(old_string.split())
print(new_string)

Will result in

The fox jumped over the log

Solution 15:

One line of code to remove all extra spaces before, after, and within a sentence:

sentence = "  The   fox jumped   over    the log.  "
sentence = ' '.join(filter(None,sentence.split(' ')))

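Explanation:

Split the entire string into a list.
Filter the empty elements from the list.
Rejoin the remaining elements* with a single space.

*The remaining elements should be words or words with punctuation, etc. I did not test this extensively, but it should be a good starting point. All the best!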
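
The three steps above can be looked at one at a time. The intermediate variable names below are mine, added only to make each step visible.

```python
sentence = "  The   fox jumped   over    the log.  "

pieces = sentence.split(' ')        # consecutive spaces produce '' entries
words = list(filter(None, pieces))  # filter(None, ...) drops the empty strings
result = ' '.join(words)            # rejoin with exactly one space

print(pieces[:4])  # ['', '', 'The', '']
print(result)      # The fox jumped over the log.
```

Note that split(' ') only splits on the space character, so tabs and newlines would stay inside the resulting words.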

Solution 16:

The fastest you can get for user-generated strings is:

if '  ' in text:
    while '  ' in text:
        text = text.replace('  ', ' ')

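The short-circuiting makes it slightly faster than pythonlarry's comprehensive answer. Go for this if you are after efficiency and are strictly looking to weed out extra whitespace of the single-space variety.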

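Solution 17:

" ".join(foo.split()) is not quite correct for the question asked, because it also entirely removes single leading and/or trailing white spaces. So, if they should also be replaced by one space, you should do something like the following:

" ".join(('*' + foo + '*').split())[1:-1]

Of course, it's less elegant.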
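
Wrapped in a function, the sentinel trick looks like this. The function name is hypothetical and only there for the example.

```python
def collapse_keep_edges(foo):
    """Collapse whitespace runs, keeping one space at an edge that had any."""
    # The '*' sentinels make split() treat edge whitespace as interior
    # whitespace; the slice then removes the two sentinel characters again.
    return " ".join(('*' + foo + '*').split())[1:-1]

print(repr(collapse_keep_edges("  The   fox  ")))  # ' The fox '
print(repr(collapse_keep_edges("The   fox")))      # 'The fox'
```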

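Solution 18:

In some cases it is desirable to replace consecutive occurrences of every whitespace character with a single instance of that character. You would use a regular expression with backreferences to do that.

(\s)\1{1,} matches any whitespace character, followed by one or more occurrences of that same character. Now all you need to do is specify the first group (\1) as the replacement for the match.

Wrapping this in a function:

import re

def normalize_whitespace(string):
    return re.sub(r'(\s)\1{1,}', r'\1', string)

>>> normalize_whitespace('The   fox jumped   over    the log.')
'The fox jumped over the log.'
>>> normalize_whitespace('First    line\t\t\t \n\n\nSecond    line')
'First line\t \nSecond line'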

Solution 19:

Another alternative:

>>> import re
>>> str = 'this is a            string with    multiple spaces and    tabs'
>>> str = re.sub('[ \t]+', ' ', str)
>>> print(str)
this is a string with multiple spaces and tabs

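Solution 20:

Because @pythonlarry asked, here are the missing generator-based versions.

The groupby join is easy. groupby groups consecutive elements that share the same key, and returns pairs of the key and an iterator over the elements of each group. So when the key is a space, a single space is returned; otherwise the entire group is returned.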

from itertools import groupby
def group_join(string):
  return ''.join(' ' if chr==' ' else ''.join(times) for chr,times in groupby(string))
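
To see what groupby actually hands back, here is a tiny sketch on a made-up string; each group is joined into a string only so that it can be printed.

```python
from itertools import groupby

sample = "a  b   c"

# Each item is (key, iterator over the run of consecutive equal characters)
groups = [(key, ''.join(run)) for key, run in groupby(sample)]

print(groups)
# [('a', 'a'), (' ', '  '), ('b', 'b'), (' ', '   '), ('c', 'c')]
```

Each run of spaces arrives as one group, which is why emitting a single ' ' per space group collapses the run.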

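The groupby variant is simple but very slow. So now for the generator variant. Here we consume an iterator, the string, and yield all characters except spaces that follow another space.

def generator_join_generator(string):
  last=False
  for c in string:
    if c==' ':
      if not last:
        last=True
        yield ' '
    else:
      last=False
      yield c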

def generator_join(string):
  return ''.join(generator_join_generator(string))

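So I measured the timings with some other lorem ipsum.

while_replace 0.015868543065153062
re_replace 0.22579886706080288
proper_join 0.40058281796518713
group_join 5.53206754301209
generator_join 1.6673167790286243

With "Hello" and "World" separated by 64 KB of spaces:

while_replace 2.991308711003512
re_replace 0.08232860406860709
proper_join 6.294375243945979
group_join 2.4320066600339487
generator_join 6.329648651066236

Not forgetting the original sentence:

while_replace 0.002160938922315836
re_replace 0.008620491018518806
proper_join 0.005650000995956361
group_join 0.028368217987008393
generator_join 0.009435956948436797

Interesting here: for strings that are almost only spaces, the group join is not that much worse. Timings always show the median of seven runs of a thousand times each.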

Solution 21:

Quite surprisingly, no one has posted a simple function that will be much faster than all the other posted solutions. Here it goes:

def compactSpaces(s):
    os = ""
    for c in s:
        if c != " " or (os and os[-1] != " "):
            os += c 
    return os

Solution 22:

def unPretty(S):
   # Given a dictionary, JSON, list, float, int, or even a string...
   # return a string stripped of CR, LF replaced by space, with multiple spaces reduced to one.
   return ' '.join(str(S).replace('\n', ' ').replace('\r', '').split())

Solution 23:

string = 'This is a             string full of spaces          and taps'
string = string.split(' ')
while '' in string:
    string.remove('')
string = ' '.join(string)
print(string)

Result:

This is a string full of spaces and taps

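Solution 24:

To remove white space, considering leading, trailing and extra white space in between words, use:

(?<=\s) +|^ +(?=\s)| (?= +[\n])

The first alternative deals with leading white space, the second deals with leading white space at the start of the string, and the last one deals with trailing white space.

For proof of use, this link will provide you with a test.

https://regex101.com/r/meBYli/4

This is to be used with the re.split function.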

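Solution 25:

I haven't read much into the other examples, but I have just created this method for consolidating multiple consecutive space characters.

It does not use any libraries, and whilst it is relatively long in terms of script length, it is not a complex implementation:

def spaceMatcher(command):
    """
    Function defined to consolidate multiple whitespace characters in
    strings to a single space
    """
    # Length and content of the current run of consecutive spaces
    space_match = 0
    space_char = ""
    # Start from the input so a result exists even when nothing is collapsed
    new_command = command
    for char in command:
        if char == " ":
            space_match += 1
            space_char += " "
        else:
            if space_match > 1:
                # A run of two or more spaces just ended: collapse it. Earlier
                # runs are already collapsed, so the first match is this run.
                new_command = new_command.replace(space_char, " ", 1)
            space_match = 0
            space_char = ""
    # Collapse a run of spaces at the very end of the string, if any
    if space_match > 1:
        new_command = new_command.replace(space_char, " ", 1)
    return new_command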

command = None
command = str(input("Please enter a command ->"))
print(spaceMatcher(command))
print(list(spaceMatcher(command)))

Solution 26:

This does and will do: :)

# python... 3.x
import operator
...
# line: line of text
return " ".join(filter(lambda a: operator.is_not(a, ""), line.strip().split(" ")))

Solution 27:

Easiest solution ever!

a = 'The   fox jumped   over    the log.'
while '  ' in a: a = a.replace('  ', ' ')
print(a)

Output:

The fox jumped over the log.