Split a string every nth character
- 2024-11-25 08:50:00
- admin (original post)
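Problem description:
How do I split a string every nth character?
'1234567890' → ['12', '34', '56', '78', '90']
For the same question with a list, see "How do I split a list into equally-sized chunks?".
Solution 1: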
>>> line = '1234567890'
>>> n = 2
>>> [line[i:i+n] for i in range(0, len(line), n)]
['12', '34', '56', '78', '90']
Solution 2:
For the sake of completeness, you can do this with a regular expression:
>>> import re
>>> re.findall('..','1234567890')
['12', '34', '56', '78', '90']
For an odd number of characters, you can do this:
>>> import re
>>> re.findall('..?', '123456789')
['12', '34', '56', '78', '9']
You can also do the following, which simplifies the regex for longer chunks:
>>> import re
>>> re.findall('.{1,2}', '123456789')
['12', '34', '56', '78', '9']
If the string is long, you can use re.finditer to generate it chunk by chunk.
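As a minimal sketch of the re.finditer idea: the helper name `iter_chunks` is an assumption for illustration, not part of the original answer. It also passes re.DOTALL, because by default '.' does not match a newline and such characters would otherwise be silently skipped.

```python
import re

def iter_chunks(text, n):
    """Lazily yield successive chunks of up to n characters (hypothetical helper)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # re.DOTALL lets '.' match '\n' as well; without it newlines are dropped.
    pattern = re.compile(r'.{1,%d}' % n, flags=re.DOTALL)
    for match in pattern.finditer(text):
        yield match.group(0)

print(list(iter_chunks('123456789', 2)))  # ['12', '34', '56', '78', '9']
print(list(iter_chunks('ab\ncd', 2)))     # ['ab', '\nc', 'd']
```

Because it is a generator, only one chunk is held in memory at a time.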
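Solution 3:
The Python standard library already has a function that can do this, textwrap.wrap. Note that it is a text-wrapping tool: it breaks lines at whitespace and drops that whitespace, so it only matches plain slicing when the string contains no whitespace.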
>>> from textwrap import wrap
>>> s = '1234567890'
>>> wrap(s, 2)
['12', '34', '56', '78', '90']
This is what the docstring for wrap says:
>>> help(wrap)
'''
Help on function wrap in module textwrap:
wrap(text, width=70, **kwargs)
Wrap a single paragraph of text, returning a list of wrapped lines.
Reformat the single paragraph in 'text' so it fits in lines of no
more than 'width' columns, and return a list of wrapped lines. By
default, tabs in 'text' are expanded with string.expandtabs(), and
all other whitespace characters (including newline) are converted to
space. See TextWrapper class for available keyword args to customize
wrapping behaviour.
'''
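The docstring hints at a caveat worth seeing in action. The following sketch (the sample string is an assumption chosen for illustration) compares wrap() with plain slicing on input that contains spaces:

```python
from textwrap import wrap

s = '12 34 56'
n = 2

# wrap() treats the input as text: it breaks at whitespace and drops it.
wrapped = wrap(s, n)
# Plain slicing keeps every character, spaces included.
sliced = [s[i:i + n] for i in range(0, len(s), n)]

print(wrapped)  # ['12', '34', '56']
print(sliced)   # ['12', ' 3', '4 ', '56']
```

So wrap() is a reasonable choice for whitespace-free strings such as digit runs or hex dumps, but probably not for arbitrary text.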
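Solution 4:
Another common way of grouping elements into groups of length n:
>>> s = '1234567890'
>>> list(map(''.join, zip(*[iter(s)]*2)))  # list() is needed on Python 3, where map() is lazy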
['12', '34', '56', '78', '90']
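This method comes straight from the documentation for zip(). Note that zip() stops at the shortest input, so a trailing incomplete chunk is dropped.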
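The idiom is compact enough to be puzzling, so here is a sketch that unpacks it step by step. The variable names are illustrative only, and the odd-length input is chosen to show the dropped tail:

```python
s = '123456789'
n = 2

it = iter(s)
# [it] * n is a list that holds the SAME iterator object n times,
# so each tuple zip() builds consumes n consecutive characters.
pairs = list(zip(*[it] * n))

print(pairs)                        # [('1', '2'), ('3', '4'), ('5', '6'), ('7', '8')]
print([''.join(p) for p in pairs])  # ['12', '34', '56', '78']  (the trailing '9' is lost)
```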
Solution 5:
I think this is shorter and more readable than the itertools version:
def split_by_n(seq, n):
    '''A generator to divide a sequence into chunks of n units.'''
    while seq:
        yield seq[:n]
        seq = seq[n:]
print(list(split_by_n('1234567890', 2)))
Solution 6:
Using more-itertools from PyPI:
>>> from more_itertools import sliced
>>> list(sliced('1234567890', 2))
['12', '34', '56', '78', '90']
Solution 7:
I like this solution:
s = '1234567890'
o = []
while s:
    o.append(s[:2])
    s = s[2:]
Solution 8:
You can use the grouper() recipe from the itertools documentation:
Python 2.x:
from itertools import izip_longest
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
Python 3.x:
from itertools import zip_longest
def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)  # zip(strict=True) requires Python 3.10+
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')
These functions are memory-efficient and work with any iterable. They yield tuples, so join each tuple to get strings back.
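As a usage sketch for strings, the block below uses a trimmed-down grouper (only the 'fill' behaviour, an assumption made to keep the example short) with fillvalue='' so that the padding disappears when each tuple is joined:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Trimmed-down version of the recipe: always pads the last group."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# Works on a plain string...
print([''.join(g) for g in grouper('123456789', 2, fillvalue='')])
# ['12', '34', '56', '78', '9']

# ...and on any other iterable, such as a generator that has no len() or slicing.
letters = (c for c in 'abcdefg')
chunks = (''.join(g) for g in grouper(letters, 3, fillvalue=''))
print(next(chunks))  # 'abc'
print(list(chunks))  # ['def', 'g']
```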
Solution 9:
This can be done with a simple for loop.
a = '1234567890a'
result = []
for i in range(0, len(a), 2):
    result.append(a[i : i + 2])
print(result)
The output looks like ['12', '34', '56', '78', '90', 'a']
Solution 10:
I was stuck in the same situation.
This worked for me:
x = "1234567890"
n = 2
my_list = []
for i in range(0, len(x), n):
    my_list.append(x[i:i+n])
print(my_list)
Output:
['12', '34', '56', '78', '90']
Solution 11:
Try this:
s = '1234567890'
print([s[idx:idx+2] for idx in range(len(s)) if idx % 2 == 0])
Output:
['12', '34', '56', '78', '90']
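Solution 12:
Try the following code:
from itertools import islice
def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))
s = '1234567890'
# Each piece is a list of characters: [['1', '2'], ['3', '4'], ['5', '6'], ['7', '8'], ['9', '0']]
print(list(split_every(2, list(s))))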
Solution 13:
As always, for those who love one-liners:
n = 2
line = "this is a line split into n characters"
line = [line[i * n:i * n+n] for i, blah in enumerate(line[::n])]
Solution 14:
>>> from functools import reduce
>>> from operator import add
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x)]  # on Python 2, use itertools.izip instead of zip
['12', '34', '56', '78', '90']
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x, x)]  # the incomplete trailing group ('0') is dropped
['123', '456', '789']
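Solution 15:
A simple recursive solution for short strings:
def split(s, n):
    if len(s) < n:
        return []  # fewer than n characters left: stop (a trailing short chunk is dropped)
    else:
        return [s[:n]] + split(s[n:], n)
print(split('1234567890', 2))
Or in this form:
def split(s, n):
    if len(s) < n:
        return []
    elif len(s) == n:
        return [s]
    else:
        return split(s[:n], n) + split(s[n:], n)
which shows more explicitly the divide-and-conquer pattern typical of recursive approaches (though in practice there is no need to do it this way).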
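Solution 16:
more_itertools.sliced has been mentioned before. Here are four more options from the more_itertools library:
import more_itertools as mit
s = "1234567890"
["".join(c) for c in mit.grouper(s, 2)]  # older more_itertools releases took grouper(2, s)
["".join(c) for c in mit.chunked(s, 2)]
["".join(c) for c in mit.windowed(s, 2, step=2)]
["".join(c) for c in mit.split_after(s, lambda x: int(x) % 2 == 0)]
Each of these options produces the following output:
['12', '34', '56', '78', '90']
Documentation for the options discussed: grouper, chunked, windowed, split_after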
Solution 17:
Here is a solution using groupby:
from itertools import groupby, chain, repeat, cycle
text = "wwworldggggreattecchemggpwwwzaz"
n = 3
c = cycle(chain(repeat(0, n), repeat(1, n)))
res = ["".join(g) for _, g in groupby(text, lambda x: next(c))]
print(res)
Output:
['www', 'orl', 'dgg', 'ggr', 'eat', 'tec', 'che', 'mgg', 'pww', 'wza', 'z']
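To make the trick less opaque, this sketch prints the key sequence that the cycle produces and then shows an equivalent variant keyed on a running counter. The function name `split_groupby` is an assumption for illustration. Both versions rely on groupby calling the key function once per element, in order.

```python
from itertools import chain, count, cycle, groupby, islice, repeat

n = 3

# The key alternates between n zeros and n ones; groupby starts a new
# group whenever the key changes, i.e. every n characters.
keys = cycle(chain(repeat(0, n), repeat(1, n)))
print(list(islice(keys, 8)))  # [0, 0, 0, 1, 1, 1, 0, 0]

def split_groupby(text, n):
    """Same idea, keyed on (running position // n) instead of a 0/1 toggle."""
    counter = count()
    return ["".join(g) for _, g in groupby(text, key=lambda _ch: next(counter) // n)]

print(split_groupby("abcdefghij", 3))  # ['abc', 'def', 'ghi', 'j']
```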
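Solution 18:
Another solution uses groupby with index // n as the key for grouping the letters. A character has no positional index of its own, so enumerate() supplies it:
from itertools import groupby
text = "abcdefghij"
n = 3
result = []
for _, group in groupby(enumerate(text), key=lambda pair: pair[0] // n):
    result.append("".join(ch for _, ch in group))
# result = ['abc', 'def', 'ghi', 'j']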
Solution 19:
Starting with Python 3.12, the itertools library includes the batched() iterator.
>>> from itertools import batched
>>> s = '1234567890'
>>> [''.join(batch) for batch in batched(s, 2)]
['12', '34', '56', '78', '90']
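For interpreters older than 3.12, a fallback along the lines of the rough equivalent given in the itertools documentation can be sketched as follows. This is an assumption-laden stand-in, not the real implementation, and the walrus operator it uses needs Python 3.8+.

```python
import sys
from itertools import islice

if sys.version_info >= (3, 12):
    from itertools import batched
else:
    def batched(iterable, n):
        """Rough stand-in for itertools.batched on older Pythons."""
        if n < 1:
            raise ValueError('n must be at least one')
        it = iter(iterable)
        # Pull up to n items at a time until the iterator is exhausted.
        while batch := tuple(islice(it, n)):
            yield batch

print([''.join(b) for b in batched('123456789', 2)])  # ['12', '34', '56', '78', '9']
```

Unlike the zip()-based idiom, batched() keeps a shorter final batch.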
Solution 20:
These answers are all good and they work, but the syntax is so cryptic... Why not write a simple function?
def SplitEvery(string, length):
    if len(string) <= length: return [string]
    sections = -(-len(string) // length)  # ceiling division, so a trailing short chunk is kept
    lines = []
    start = 0
    for i in range(sections):
        line = string[start:start+length]
        lines.append(line)
        start += length
    return lines
And call it simply as:
text = '1234567890'
lines = SplitEvery(text, 2)
print(lines)
# output: ['12', '34', '56', '78', '90']
Solution 21:
You can find the full article with updated solutions on Github.
Note: the solutions are written for Python 3.10+
Using list comprehension and slicing: This is a simple and direct approach. We iterate over the string in steps of n with a list comprehension and slice from the current index to the current index plus n.
def split_string_into_groups(s: str, n: int) -> list[str]:
    """
    Splits a string into groups of `n` consecutive characters.
    This function uses list comprehension and slicing to split the string into groups.
    It includes error handling to check if `n` is a positive integer.
    Args:
        s (str): The input string to be split.
        n (int): The size of the groups.
    Returns:
        list[str]: A list of strings, where each string is a group of `n` consecutive characters from the input string.
    Raises:
        ValueError: If `n` is not a positive integer.
    Examples:
        >>> split_string_into_groups("HelloWorld", 3)
        ['Hel', 'loW', 'orl', 'd']
        >>> split_string_into_groups("Python", 2)
        ['Py', 'th', 'on']
    """
    # Check if `n` is a positive integer.
    if n <= 0:
        raise ValueError("The group size must be a positive integer")
    # Use list comprehension and slicing to split the string into groups of `n` characters.
    return [s[i:i + n] for i in range(0, len(s), n)]
Using the re (regex) module: re.findall() returns all non-overlapping matches of a pattern in a string. With a pattern that matches between 1 and n characters, it splits the string into chunks of n characters.
import re
def split_string_into_groups(s: str, n: int) -> list[str]:
    """
    Splits a string into groups of `n` consecutive characters.
    This function uses the `re.findall()` function from the `re` (regex) module to solve the problem.
    It includes error handling to check if `n` is a positive integer.
    Args:
        s (str): The input string to be split.
        n (int): The size of the groups.
    Returns:
        list[str]: A list of strings, where each string is a group of `n` consecutive characters from the input string.
    Raises:
        ValueError: If `n` is not a positive integer.
    Examples:
        >>> split_string_into_groups("HelloWorld", 3)
        ['Hel', 'loW', 'orl', 'd']
        >>> split_string_into_groups("Python", 2)
        ['Py', 'th', 'on']
    """
    # Check if `n` is a positive integer.
    if n <= 0:
        raise ValueError("The group size must be a positive integer")
    # Use `re.findall()` to split the string into groups of `n` characters.
    # re.DOTALL makes '.' match newlines too; without it they would be skipped.
    return re.findall(f'.{{1,{n}}}', s, flags=re.DOTALL)
Using the textwrap module: textwrap.wrap() splits a string into a list of output lines of at most a given width. We can use it to split a string into chunks of n characters, provided the string contains no whitespace, because wrap() breaks lines at whitespace and drops it.
import textwrap
def split_string_into_groups(s: str, n: int) -> list[str]:
    """
    Splits a string into groups of `n` consecutive characters.
    This function uses the `textwrap.wrap()` function from the `textwrap` module to solve the problem.
    It includes error handling to check if `n` is a positive integer.
    Args:
        s (str): The input string to be split.
        n (int): The size of the groups.
    Returns:
        list[str]: A list of strings, where each string is a group of `n` consecutive characters from the input string.
    Raises:
        ValueError: If `n` is not a positive integer.
    Examples:
        >>> split_string_into_groups("HelloWorld", 3)
        ['Hel', 'loW', 'orl', 'd']
        >>> split_string_into_groups("Python", 2)
        ['Py', 'th', 'on']
    """
    # Check if `n` is a positive integer.
    if n <= 0:
        raise ValueError("The group size must be a positive integer")
    # Use `textwrap.wrap()` to split the string into groups of `n` characters.
    # Only equivalent to slicing when `s` contains no whitespace.
    return textwrap.wrap(s, n)
Using a loop and string concatenation: We can also solve this by looping over the string manually and appending one character at a time to a new string. Once that string holds n characters, we add it to the list and reset it to an empty string.
def split_string_into_groups(s: str, n: int) -> list[str]:
    """
    Splits a string into groups of `n` consecutive characters.
    This function uses a loop and string concatenation to solve the problem.
    It includes error handling to check if `n` is a positive integer.
    Args:
        s (str): The input string to be split.
        n (int): The size of the groups.
    Returns:
        list[str]: A list of strings, where each string is a group of `n` consecutive characters from the input string.
    Raises:
        ValueError: If `n` is not a positive integer.
    Examples:
        >>> split_string_into_groups("HelloWorld", 3)
        ['Hel', 'loW', 'orl', 'd']
        >>> split_string_into_groups("Python", 2)
        ['Py', 'th', 'on']
    """
    # Check if `n` is a positive integer.
    if n <= 0:
        raise ValueError("The group size must be a positive integer")
    # Initialize an empty list to store the groups.
    result = []
    # Initialize an empty string to store the current group.
    group = ''
    # Iterate over each character in the string.
    for c in s:
        group += c  # Add the current character to the current group.
        # If the current group has `n` characters, add it to the result and reset the group.
        if len(group) == n:
            result.append(group)
            group = ''
    # If there are any remaining characters in the group, add it to the result.
    if group:
        result.append(group)
    return result
Using a generator function: We can write a generator function that takes a string and a number n and yields chunks of n characters from the string. This approach saves memory because it does not need to hold all the chunks in memory at once.
from typing import Generator
def split_string_into_groups(string: str, n: int) -> Generator[str, None, None]:
    """
    Generator function to split a string into groups of `n` consecutive characters.
    Args:
        string (str): The input string to be split.
        n (int): The size of the groups.
    Yields:
        str: The next group of `n` characters.
    Raises:
        ValueError: If `n` is not a positive integer.
    Examples:
        >>> list(split_string_into_groups("HelloWorld", 3))
        ['Hel', 'loW', 'orl', 'd']
        >>> list(split_string_into_groups("Python", 2))
        ['Py', 'th', 'on']
    """
    # Check if `n` is a positive integer.
    if n <= 0:
        raise ValueError("The group size must be a positive integer")
    # Iterate over the string with a step size of `n`.
    for i in range(0, len(string), n):
        # Yield the next group of `n` characters.
        yield string[i:i + n]
Using itertools: The itertools module provides islice(), which slices an iterable. We can use it to take n characters at a time from an iterator over the string.
from itertools import islice
from typing import Iterator
def split_string_into_groups(s: str, n: int) -> Iterator[str]:
    """
    Splits a string into groups of `n` consecutive characters using itertools.islice().
    Args:
        s (str): The input string to be split.
        n (int): The size of the groups.
    Returns:
        Iterator[str]: An iterator that yields each group of `n` consecutive characters from the input string.
    Raises:
        ValueError: If `n` is not a positive integer.
    Examples:
        >>> list(split_string_into_groups("HelloWorld", 3))
        ['Hel', 'loW', 'orl', 'd']
        >>> list(split_string_into_groups("Python", 2))
        ['Py', 'th', 'on']
    """
    # Check if `n` is a positive integer.
    if n <= 0:
        raise ValueError("The group size must be a positive integer")
    # Create an iterator from the string.
    it = iter(s)
    # Use itertools.islice() to yield groups of `n` characters from the iterator.
    while True:
        group = ''.join(islice(it, n))
        if not group:
            break
        yield group
Using numpy: We can also solve this with the numpy library. We convert the string into a numpy array of characters, pad it to a multiple of n, and use reshape() to split the array into rows of n characters.
import numpy as np
def split_string_into_groups(s: str, n: int) -> list[str]:
    """
    Splits a string into groups of `n` consecutive characters using numpy.reshape().
    Args:
        s (str): The input string to be split.
        n (int): The size of the groups.
    Returns:
        list[str]: A list of strings where each string is a group of `n` consecutive characters.
    Raises:
        ValueError: If `n` is not a positive integer.
    Examples:
        >>> split_string_into_groups("HelloWorld", 3)
        ['Hel', 'loW', 'orl', 'd']
        >>> split_string_into_groups("Python", 2)
        ['Py', 'th', 'on']
    """
    # Check if `n` is a positive integer.
    if n <= 0:
        raise ValueError("The group size must be a positive integer")
    # Convert the string to a list of characters
    chars = list(s)
    # Add extra empty strings only if the length of `s` is not a multiple of `n`
    if len(s) % n != 0:
        chars += [''] * (n - len(s) % n)
    # Reshape the array into a 2D array with the number of groups as the number of rows and n as the number of columns
    arr = np.array(chars).reshape(-1, n)
    # Convert each row of the 2D array back to a string and add it to the result list.
    # The '' padding vanishes when joined; rstrip() is not used because it would also remove real trailing whitespace.
    result = [''.join(row) for row in arr]
    return result
Using pandas: The pandas library provides groupby(), which splits data into groups by key. We can put the characters in a Series and group them by index // n to split the string into chunks of n characters.
import pandas as pd
def split_string_into_groups(s: str, n: int) -> list[str]:
    """
    Splits a given string into groups of `n` consecutive characters.
    This function uses the pandas library to convert the string into a pandas Series,
    then uses the groupby method to group the characters into groups of `n` characters.
    The groups are then converted back to a list of strings.
    Args:
        s (str): The input string to be split.
        n (int): The size of the groups.
    Returns:
        list[str]: A list of strings, where each string is a group of `n` consecutive characters from the input string.
    Raises:
        ValueError: If `n` is not a positive integer.
    Examples:
        >>> split_string_into_groups("HelloWorld", 3)
        ['Hel', 'loW', 'orl', 'd']
        >>> split_string_into_groups("Python", 2)
        ['Py', 'th', 'on']
    """
    # Check if `n` is a positive integer
    if n <= 0:
        raise ValueError("The group size must be a positive integer")
    # Convert the string to a pandas Series
    s = pd.Series(list(s))
    # Use pandas groupby to group the characters
    # The index of each character is divided by `n` using integer division,
    # which groups the characters into groups of `n` characters.
    groups = s.groupby(s.index // n).agg(''.join)
    # Convert the result back to a list and return it
    return groups.tolist()
Using more_itertools: The more_itertools library provides chunked(), which splits an iterable into chunks of a given size. We can use it to split the string into chunks of n characters.
import more_itertools
def split_string_into_groups(s: str, n: int) -> list[str]:
    """
    Splits a string into groups of `n` consecutive characters using more_itertools.chunked().
    Args:
        s (str): The input string to be split.
        n (int): The size of the groups.
    Returns:
        list[str]: A list of strings where each string is a group of `n` consecutive characters.
    Raises:
        ValueError: If `n` is not a positive integer.
    Examples:
        >>> split_string_into_groups("HelloWorld", 3)
        ['Hel', 'loW', 'orl', 'd']
        >>> split_string_into_groups("Python", 2)
        ['Py', 'th', 'on']
    """
    # Check if `n` is a positive integer.
    if n <= 0:
        raise ValueError("The group size must be a positive integer")
    # Use more_itertools.chunked() to split the string into chunks of `n` characters.
    chunks = more_itertools.chunked(s, n)
    # Convert each chunk to a string and add it to the result list.
    result = [''.join(chunk) for chunk in chunks]
    return result
Using toolz: The toolz library provides partition_all(), which splits an iterable into chunks of a given size. We can use it to split the string into chunks of n characters.
import toolz
def split_string_into_groups(s: str, n: int) -> list[str]:
    """
    Splits a string into groups of `n` consecutive characters using toolz.partition_all().
    Args:
        s (str): The input string to be split.
        n (int): The size of the groups.
    Returns:
        list[str]: A list of strings where each string is a group of `n` consecutive characters.
    Raises:
        ValueError: If `n` is not a positive integer.
    Examples:
        >>> split_string_into_groups("HelloWorld", 3)
        ['Hel', 'loW', 'orl', 'd']
        >>> split_string_into_groups("Python", 2)
        ['Py', 'th', 'on']
    """
    # Check if `n` is a positive integer.
    if n <= 0:
        raise ValueError("The group size must be a positive integer")
    # Use toolz.partition_all() to split the string into chunks of `n` characters.
    chunks = toolz.partition_all(n, s)
    # Convert each chunk to a string and add it to the result list.
    result = [''.join(chunk) for chunk in chunks]
    return result
Using cytoolz: The cytoolz library provides the same partition_all() function, which splits an iterable into chunks of a given size. We can use it to split the string into chunks of n characters.
from cytoolz import partition_all
def split_string_into_groups(s: str, n: int) -> list[str]:
    """
    Splits a string into groups of `n` consecutive characters using cytoolz.partition_all().
    Args:
        s (str): The input string to be split.
        n (int): The size of the groups.
    Returns:
        list[str]: A list of strings where each string is a group of `n` consecutive characters.
    Raises:
        ValueError: If `n` is not a positive integer.
    Examples:
        >>> split_string_into_groups("HelloWorld", 3)
        ['Hel', 'loW', 'orl', 'd']
        >>> split_string_into_groups("Python", 2)
        ['Py', 'th', 'on']
    """
    # Check if `n` is a positive integer.
    if n <= 0:
        raise ValueError("The group size must be a positive integer")
    # Use cytoolz.partition_all() to split the string into chunks of `n` characters.
    chunks = partition_all(n, s)
    # Convert each chunk to a string and add it to the result list.
    result = [''.join(chunk) for chunk in chunks]
    return result
Using itertools: The itertools module provides zip_longest(), which can group an iterable into fixed-size chunks and pad the last one. With an empty-string fill value, we can use it to split the string into chunks of n characters.
from itertools import zip_longest
def split_string_into_groups(s: str, n: int) -> list[str]:
    """
    Splits a string into groups of `n` consecutive characters using itertools.zip_longest().
    Args:
        s (str): The input string to be split.
        n (int): The size of the groups.
    Returns:
        list[str]: A list of strings where each string is a group of `n` consecutive characters.
    Raises:
        ValueError: If `n` is not a positive integer.
    Examples:
        >>> split_string_into_groups("HelloWorld", 3)
        ['Hel', 'loW', 'orl', 'd']
        >>> split_string_into_groups("Python", 2)
        ['Py', 'th', 'on']
    """
    # Check if `n` is a positive integer.
    if n <= 0:
        raise ValueError("The group size must be a positive integer")
    # Use itertools.zip_longest() to split the string into chunks of `n` characters.
    args = [iter(s)] * n
    chunks = zip_longest(*args, fillvalue='')
    # Convert each chunk to a string and add it to the result list.
    result = [''.join(chunk) for chunk in chunks]
    return result
Using zip + join: We can also solve this with zip() and str.join(). Passing the same iterator to zip() n times makes it emit tuples of n consecutive characters, and join() turns each tuple back into a string. Because zip() drops an incomplete final tuple, the leftover characters are appended separately.
def split_string_into_groups(s: str, n: int) -> list[str]:
    """
    Splits a string into groups of `n` consecutive characters using join and zip.
    Args:
        s (str): The input string to be split.
        n (int): The size of the groups.
    Returns:
        list[str]: A list of strings where each string is a group of `n` consecutive characters.
    Raises:
        ValueError: If `n` is not a positive integer.
    Examples:
        >>> split_string_into_groups("HelloWorld", 3)
        ['Hel', 'loW', 'orl', 'd']
        >>> split_string_into_groups("Python", 2)
        ['Py', 'th', 'on']
    """
    # Check if `n` is a positive integer.
    if n <= 0:
        raise ValueError("The group size must be a positive integer")
    # Use join and zip to split the string into full chunks of `n` characters.
    result = [''.join(chunk) for chunk in zip(*[iter(s)] * n)]
    # If the string length is not a multiple of `n`, add the remaining characters to the result.
    remainder = len(s) % n
    if remainder != 0:
        result.append(s[-remainder:])
    return result
Using recursion and slicing: We can also solve this with recursion and slicing. We define a recursive function that takes a string and a number n and returns a list of chunks of n characters. The function slices off the first n characters and calls itself on the rest of the string until nothing is left. This suits short strings only, since each chunk adds a level of recursion.
def split_string_into_groups(s: str, n: int) -> list[str]:
    """
    Splits a string into groups of `n` consecutive characters using recursion with slicing.
    Args:
        s (str): The input string to be split.
        n (int): The size of the groups.
    Returns:
        list[str]: A list of strings where each string is a group of `n` consecutive characters.
    Raises:
        ValueError: If `n` is not a positive integer.
    Examples:
        >>> split_string_into_groups("HelloWorld", 3)
        ['Hel', 'loW', 'orl', 'd']
        >>> split_string_into_groups("Python", 2)
        ['Py', 'th', 'on']
    """
    # Check if `n` is a positive integer.
    if n <= 0:
        raise ValueError("The group size must be a positive integer")
    # Base case: an empty string produces no groups (consistent with the slicing version).
    if not s:
        return []
    # Base case: if the length of the string is less than or equal to `n`, return a list containing `s`.
    if len(s) <= n:
        return [s]
    # Recursive case: take the first `n` characters and recursively call `split_string_into_groups` on the rest of the string.
    return [s[:n]] + split_string_into_groups(s[n:], n)
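With this many variants, a quick cross-check helps confirm that they agree on edge cases such as an empty string, a length that is not a multiple of n, and embedded newlines. The sketch below covers a few standard-library approaches only. The `by_*` names are assumptions for illustration, and slicing is treated as the reference behaviour.

```python
import re
from itertools import islice, zip_longest

def by_slicing(s, n):
    return [s[i:i + n] for i in range(0, len(s), n)]

def by_regex(s, n):
    # re.DOTALL so that '.' also matches newlines.
    return re.findall(f'.{{1,{n}}}', s, flags=re.DOTALL)

def by_islice(s, n):
    it = iter(s)
    # iter(callable, sentinel) keeps calling until an empty chunk comes back.
    return list(iter(lambda: ''.join(islice(it, n)), ''))

def by_zip_longest(s, n):
    return [''.join(g) for g in zip_longest(*[iter(s)] * n, fillvalue='')]

cases = ['', 'a', 'HelloWorld', 'Python', 'line one\nline two']
for s in cases:
    for n in (1, 2, 3, 7):
        expected = by_slicing(s, n)
        for fn in (by_regex, by_islice, by_zip_longest):
            assert fn(s, n) == expected, (fn.__name__, s, n)
print('all variants agree')
```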