How do I do sed-like text replacement with Python?
- 2024-11-07 08:55:00
- admin (original post)
Problem description:
I want to enable all apt repositories in this file:
cat /etc/apt/sources.list
## Note, this file is written by cloud-init on first boot of an instance
## modifications made here will not survive a re-bundle.
## if you wish to make changes you can:
## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
## or do the same in user-data
## b.) add sources in /etc/apt/sources.list.d
#
# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
# newer versions of the distribution.
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main
## Major bug fix updates produced after the final release of the
## distribution.
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team. Also, please note that software in universe WILL NOT receive any
## review or updates from the Ubuntu security team.
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse
## Uncomment the following two lines to add software from the 'backports'
## repository.
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse
## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
# deb http://archive.canonical.com/ubuntu maverick partner
# deb-src http://archive.canonical.com/ubuntu maverick partner
deb http://security.ubuntu.com/ubuntu maverick-security main
deb-src http://security.ubuntu.com/ubuntu maverick-security main
deb http://security.ubuntu.com/ubuntu maverick-security universe
deb-src http://security.ubuntu.com/ubuntu maverick-security universe
# deb http://security.ubuntu.com/ubuntu maverick-security multiverse
# deb-src http://security.ubuntu.com/ubuntu maverick-security multiverse
With sed this is a simple sed -i 's/^# deb/deb/' /etc/apt/sources.list
What is the most elegant ("pythonic") way to do this?
Solution 1:
You can do it like this:
import re

# Read every line first, then reopen the same file for writing.
with open("/etc/apt/sources.list", "r") as sources:
    lines = sources.readlines()
with open("/etc/apt/sources.list", "w") as sources:
    for line in lines:
        sources.write(re.sub(r'^# deb', 'deb', line))
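The with statement ensures that the file is closed correctly, and reopening the file in "w" mode empties it before you write to it. re.sub(pattern, replace, string) is the equivalent of s/pattern/replace/ in sed/perl.
Edit: fixed the syntax in the example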
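One detail worth knowing when porting sed expressions: re.sub replaces every match by default, which corresponds to sed's s/pattern/replace/g, while passing count=1 mirrors sed's default of replacing only the first match on each line. A minimal sketch on in-memory lines (the uncomment helper and the sample lines are illustrative, not part of the answer):

```python
import re

def uncomment(line):
    # '^# deb' is anchored at the line start, so it can match at most once;
    # count=1 is shown only to mirror sed's "first match per line" default.
    return re.sub(r'^# deb', 'deb', line, count=1)

sample = [
    "# deb http://archive.canonical.com/ubuntu maverick partner\n",
    "## Uncomment the following two lines\n",
    "deb http://security.ubuntu.com/ubuntu maverick-security main\n",
]
result = [uncomment(line) for line in sample]

# Unanchored patterns show the difference between the two modes.
all_matches = re.sub("a", "b", "aaa")             # like s/a/b/g
first_match = re.sub("a", "b", "aaa", count=1)    # like s/a/b/
```

Only the first sample line changes; the doubled "##" comment and the already enabled entry pass through untouched.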
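Solution 2:
Authoring a homegrown sed replacement in pure Python, with no external commands or additional dependencies, is a noble task laden with noble landmines. Who would have thought?
Nonetheless, it is feasible. It is also desirable. We have all been there: "I need to munge some plaintext files, but I only have Python, two plastic shoelaces, and a moldy can of bunker-grade Maraschino cherries. Help."
In this answer, we offer a best-of-breed solution that cobbles together the strengths of the prior answers without their unpleasant drawbacks. As plundra notes, David Miller's otherwise top-notch answer writes the desired file non-atomically and hence invites race conditions (e.g., from other threads and/or processes attempting to read that file concurrently). That's bad. Plundra's otherwise excellent answer solves that issue while introducing yet more, including numerous fatal encoding errors, a critical security vulnerability (failing to preserve the permissions and other metadata of the original file), and premature optimization that replaces regular expressions with low-level character indexing. That's also bad.
Awesomeness, unite!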
import re, shutil, tempfile

def sed_inplace(filename, pattern, repl):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e 's/'${pattern}'/'${repl}' "${filename}"`.
    '''
    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            for line in src_file:
                tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)

# Do it for Johnny.
sed_inplace('/etc/apt/sources.list', r'^# deb', 'deb')
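One caveat about the function above: NamedTemporaryFile() creates its file in the system temporary directory by default, and shutil.move falls back to copy-and-delete when source and destination sit on different file systems, which is not atomic. A minimal sketch of a variant that creates the temporary file next to the target and swaps it in with os.replace (the name sed_inplace_atomic and the demo file are hypothetical; this illustrates the idea under those assumptions and is not the answer's code):

```python
import os
import re
import shutil
import tempfile

def sed_inplace_atomic(filename, pattern, repl):
    pattern_compiled = re.compile(pattern)
    target_dir = os.path.dirname(os.path.abspath(filename))
    # A temp file in the target's own directory lives on the same file
    # system, so the final os.replace() is a plain rename.
    with tempfile.NamedTemporaryFile(mode='w', dir=target_dir,
                                     delete=False) as tmp_file:
        with open(filename) as src_file:
            for line in src_file:
                tmp_file.write(pattern_compiled.sub(repl, line))
    # Copy permissions and timestamps, then swap the new file into place.
    shutil.copystat(filename, tmp_file.name)
    os.replace(tmp_file.name, filename)

# Demo on a throwaway file rather than the real /etc/apt/sources.list.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "sources.list")
    with open(demo_path, "w") as f:
        f.write("# deb http://example.invalid/ubuntu maverick partner\n## note\n")
    sed_inplace_atomic(demo_path, r'^# deb', 'deb')
    with open(demo_path) as f:
        demo_result = f.read()
```

The trade-off is that the process needs write permission on the target's directory, not only on the file itself.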
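Solution 3:
massedit.py ( http://github.com/elmotec/massedit ) does the scaffolding for you, leaving just the regular expression to write. It is still in beta, but we are looking for feedback.
python -m massedit -e "re.sub(r'^# deb', 'deb', line)" /etc/apt/sources.list
will show the differences (before/after) in diff format.
Add the -w option to write the changes to the original file:
python -m massedit -e "re.sub(r'^# deb', 'deb', line)" -w /etc/apt/sources.list
Alternatively, you can now use the API: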
>>> import massedit
>>> filenames = ['/etc/apt/sources.list']
>>> massedit.edit_files(filenames, ["re.sub(r'^# deb', 'deb', line)"], dry_run=True)
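Solution 4:
This is such a different approach that I don't want to edit my other answer. The with statements are nested because I don't use 3.1 (where with A() as a, B() as b: works).
Changing sources.list this way might be a bit overkill, but I want to put it out there for future searches.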
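#!/usr/bin/env python
from shutil import move
from tempfile import NamedTemporaryFile

# mode="w" is needed on Python 3: the default "w+b" only accepts bytes.
with NamedTemporaryFile(mode="w", delete=False) as tmp_sources:
    with open("sources.list") as sources_file:
        for line in sources_file:
            if line.startswith("# deb"):
                # Drop the leading "# " to enable the entry.
                tmp_sources.write(line[2:])
            else:
                tmp_sources.write(line)

# Both files are closed here; swap the edited copy into place.
move(tmp_sources.name, sources_file.name)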
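This should ensure there are no race conditions with other people reading the file. Oh, and I prefer str.startswith(...) when you can do without a regular expression.
Solution 5:
If I want something like sed, I usually just call sed itself using the sh library.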
from sh import sed
sed(['-i', 's/^# deb/deb/', '/etc/apt/sources.list'])
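Of course, there are downsides. For example, the locally installed version of sed might differ from the one you tested with. In my cases, this kind of thing is easily handled at another layer (for example by examining the target environment beforehand, or by deploying in a docker image with a known version of sed).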
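Examining the target environment beforehand could look roughly like the sketch below. It uses only the standard library; the sed_version helper is hypothetical, and note that --version is a GNU sed option, so other implementations may reject it (treated here as "unknown").

```python
import shutil
import subprocess

def sed_version():
    """Return the first line of `sed --version`, or None if unavailable."""
    sed_path = shutil.which("sed")
    if sed_path is None:
        return None  # sed is not on PATH at all
    # A non-GNU sed may reject --version; report that as None, not an error.
    proc = subprocess.run([sed_path, "--version"], stdin=subprocess.DEVNULL,
                          capture_output=True, text=True)
    if proc.returncode != 0 or not proc.stdout:
        return None
    return proc.stdout.splitlines()[0]

version = sed_version()
```

A deployment script could refuse to continue when the result is None or does not name the version it was tested against.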
Solution 6:
Try pysed:
pysed -r '# deb' 'deb' /etc/apt/sources.list
Solution 7:
You could do something like this:
import re

# re.MULTILINE lets ^ match at the start of every line of the file.
p = re.compile("^# *deb", re.MULTILINE)
with open("sources.list", "r") as f:
    text = f.read()
with open("sources.list", "w") as f:
    f.write(p.sub("deb", text))
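Alternatively (and IMHO this is better from an organizational standpoint), you can split your sources.list into several parts (one entry per repository) and place them under /etc/apt/sources.list.d/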
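The re.MULTILINE flag is what makes the whole-file substitution above work: without it, ^ matches only at the very start of the string. A small sketch on an in-memory string (the sample text is made up from entries in the question's file):

```python
import re

text = (
    "## Canonical's 'partner' repository.\n"
    "# deb http://archive.canonical.com/ubuntu maverick partner\n"
    "#   deb-src http://archive.canonical.com/ubuntu maverick partner\n"
)

# Without re.MULTILINE, ^ matches only at the start of the whole string,
# and the string starts with "##", so nothing is replaced.
unchanged = re.sub(r"^# *deb", "deb", text)

# With re.MULTILINE, ^ matches at the start of every line.
changed = re.sub(r"^# *deb", "deb", text, flags=re.MULTILINE)
```

Because the pattern is "# *deb", it also absorbs any run of spaces between the hash and "deb", which the stricter "^# deb" from the question does not.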
Solution 8:
If you really want to use the sed command without installing a new Python module, you could simply do the following:
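import subprocess
# Pass the command as a list of arguments so no shell quoting is needed.
# The arguments shown are the sed command from the question.
subprocess.call(["sed", "-i", "s/^# deb/deb/", "/etc/apt/sources.list"])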
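Solution 9:
Cecil Curry has a great answer, but it works only for single-line regular expressions, because it applies the pattern to one line at a time. Multi-line regular expressions are used less often, but they are sometimes handy.
Here is an improvement on his sed_inplace function that lets it work with multi-line regular expressions when asked to.
WARNING: In multi-line mode, it reads the entire file and then performs the regular expression substitution, so you will only want to use this mode on smallish files. Don't try to run it on gigabyte-sized files in multi-line mode.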
import re, shutil, tempfile

def sed_inplace(filename, pattern, repl, multiline = False):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e 's/'${pattern}'/'${repl}' "${filename}"`.
    '''
    re_flags = 0
    if multiline:
        re_flags = re.M

    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern, re_flags)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            if multiline:
                content = src_file.read()
                tmp_file.write(pattern_compiled.sub(repl, content))
            else:
                for line in src_file:
                    tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)
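from os.path import expanduser
# The regex escapes (\[, \n, \t) and the \1 / \2 backreferences were lost in
# this copy of the answer; they are restored here from the two capture groups.
sed_inplace('%s/.gitconfig' % expanduser("~"), r'^(\[user\]$\n[ \t]*name = ).*$(\n[ \t]*email = ).*', r'\1John Doe\2jdoe@example.com', multiline=True)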
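To see why line-by-line processing cannot handle a pattern that spans lines, the sketch below applies one pattern both ways to an in-memory string (the sample config text and variable names are illustrative assumptions):

```python
import re

config = "[user]\n\tname = Old Name\n\temail = old@example.com\n"

# The pattern needs "[user]" on one line and "name = " on the next.
pattern = re.compile(r'^(\[user\]$\n[ \t]*name = ).*$', re.M)

# Line by line: no single line contains both parts, so nothing matches.
per_line = "".join(pattern.sub(r'\1John Doe', line)
                   for line in config.splitlines(keepends=True))

# Whole text at once: the match can cross the newline.
whole = pattern.sub(r'\1John Doe', config)
```

Only the whole-text substitution changes the name, which is the behaviour the multiline=True branch enables.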
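Solution 10:
Not sure about elegant, but this ought to be pretty readable at least. For a sources.list it is fine to read all the lines beforehand; for something larger you might want to change it "in place" while looping through it.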
#!/usr/bin/env python
# Open file for reading and writing
with open("sources.list", "r+") as sources_file:
    # Read all the lines
    lines = sources_file.readlines()
    # Rewind and truncate
    sources_file.seek(0)
    sources_file.truncate()
    # Loop through the lines, adding them back to the file.
    for line in lines:
        if line.startswith("# deb"):
            sources_file.write(line[2:])
        else:
            sources_file.write(line)
Edit: use a with statement for better file handling. I also forgot to rewind before truncating.
Solution 11:
Here is a one-module Python replacement for perl -p:
# Provide compatibility with `perl -p`
# Usage:
#
# python -mloop_over_stdin_lines '<program>'
# In `<program>`, use the variable `line` to read and change the current line.
# Example:
#
# python -mloop_over_stdin_lines 'line = re.sub("pattern", "replacement", line)'
# From the perlrun documentation:
#
# -p causes Perl to assume the following loop around your
# program, which makes it iterate over filename arguments
# somewhat like sed:
#
# LINE:
# while (<>) {
# ... # your program goes here
# } continue {
# print or die "-p destination: $!\n";
# }
#
# If a file named by an argument cannot be opened for some
# reason, Perl warns you about it, and moves on to the next
# file. Note that the lines are printed automatically. An
# error occurring during printing is treated as fatal. To
# suppress printing use the -n switch. A -p overrides a -n
# switch.
#
# "BEGIN" and "END" blocks may be used to capture control
# before or after the implicit loop, just as in awk.
#
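import re
import sys

for line in sys.stdin:
    # The program may rebind `line`; at module level locals() is globals().
    exec(sys.argv[1], globals(), locals())
    try:
        # Python 3 form of the original Python 2 `print line,`.
        print(line, end='')
    except Exception:
        sys.exit('-p destination: $!\n')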
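The module above depends on exec() rebinding line in the module's own globals. A minimal sketch of the same loop with an explicit namespace, which makes it usable (and testable) as a function; the name run_per_line is hypothetical and, as with the module, the program string must come from a trusted source:

```python
import re

def run_per_line(program, lines):
    """Run `program` once per line; the program reads and rebinds `line`."""
    output = []
    for line in lines:
        # Give the program its own namespace with `re` and `line` available.
        scope = {"re": re, "line": line}
        exec(program, scope)  # executes arbitrary code: trusted input only
        output.append(scope["line"])
    return output

result = run_per_line(
    'line = re.sub(r"^# deb", "deb", line)',
    ["# deb http://archive.canonical.com/ubuntu maverick partner\n", "## note\n"],
)
```

Each line gets a fresh namespace here, so a program cannot carry state from one line to the next, unlike the module-level version.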
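Solution 12:
I wanted to be able to find and replace text while also including matched groups in the content I insert. I wrote this short script to do that:
https://gist.github.com/turtlemonvh/0743a1c63d1d27df3f17
The key part of it looks like this:
print(re.sub(pattern, template, text).rstrip("\n"))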
Here is an example of how it works:
# Find everything that looks like 'dog' or 'cat' followed by a space and a number
pattern = r"((cat|dog) (\d+))"
# Replace with 'turtle' and the number. '3' because the number is the 3rd matched group.
# The double '\' is needed because you need to escape '\' when running this in a python shell
template = "turtle \\3"
# The text to operate on
text = "cat 976 is my favorite"
Calling the function above with these values gives the following result:
turtle 976 is my favorite
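Raw strings and the \g<...> spelling of group references avoid both the doubled backslash and having to count parentheses. A small sketch built on the same pattern (the longer sample text and the named-group variant are illustrative additions):

```python
import re

pattern = r"((cat|dog) (\d+))"
text = "cat 976 is my favorite, dog 12 is not"

# Group 3 is the number; \g<3> is an unambiguous spelling of \3.
by_number = re.sub(pattern, r"turtle \g<3>", text)

# A named group avoids counting parentheses altogether.
by_name = re.sub(r"(?:cat|dog) (?P<num>\d+)", r"turtle \g<num>", text)
```

Both calls produce the same string, with every cat or dog replaced by turtle and the number kept.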
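Solution 13:
[None of the answers above worked for me!]
I had a case with multiple key-value replacements in a file of about 1000 lines. The file structure must stay the same after the replacement. For example: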
key1=value_tobe_replaced1
key2=value_tobe_replaced1
. .
. .
key1000=value_tobe_replaced1000
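I tried:
The massedit answer from @elmotec.
The answer from @Cecil Curry.
The answer from @Keithel.
These three answers really helped me a lot, but after testing I found that the first and second take close to 40-50 seconds. The third is not suited to multiple replacements, so I fixed it.
Note: please read those answers before continuing.
Here is my code:
Line replacement mode: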
import datetime, re, shutil, tempfile

# abs_keypair_file (a path) and tuple_list (a list of (key, value) pairs)
# are defined elsewhere.
start_time = datetime.datetime.now()
with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
    with open(abs_keypair_file) as kf:
        for line in kf:
            line_to_write = ''
            match_flag = False
            for (key, value) in tuple_list:
                # Assumption: the original never defined its search pattern.
                # It is taken to be the per-key pattern from the re.sub() call,
                # with the backslashes that seem to have been lost restored.
                pattern = r'\$\({}\)'.format(key)
                if not re.search(pattern, line, flags=re.I):
                    continue
                line_to_write = re.sub(pattern, value, line, flags=re.I)
                match_flag = True
            if not match_flag:
                line_to_write = line
            tmp_file.write(line_to_write)
shutil.copystat(abs_keypair_file, tmp_file.name)
shutil.move(tmp_file.name, abs_keypair_file)
time_costs = datetime.datetime.now() - start_time
print('time costs: %s' % time_costs)
time costs: 0:00:42.533879
File replacement mode:
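import datetime, re, shutil, tempfile

start_time = datetime.datetime.now()
with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
    with open(abs_keypair_file) as kf:
        # Read the whole file once, then apply every replacement to it.
        text = kf.read()
        for (key, value) in tuple_list:
            # Assumption: the original never defined its search pattern; a
            # per-key "$(key)" placeholder pattern is assumed here.
            pattern = r'\$\({}\)'.format(key)
            text = re.sub(pattern, value, text, flags=re.M|re.I)
        tmp_file.write(text)
shutil.copystat(abs_keypair_file, tmp_file.name)
shutil.move(tmp_file.name, abs_keypair_file)
time_costs = datetime.datetime.now() - start_time
print('time costs: %s' % time_costs)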
time costs: 0:00:00.348458
So if your situation matches mine and your file is not too large, I suggest you use the file replacement mode.
What if the file is large? I don't know.
Hope this helps.
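For larger files, one possible approach (a sketch that the answer did not test) is to compile all keys into a single alternation and stream the file line by line, so each line is scanned once no matter how many keys there are. It assumes the goal is to rewrite the value on each key=value line, as in the sample file above; the function name and sample data are hypothetical.

```python
import re

def replace_values(lines, replacements):
    """Yield lines with `key=old` rewritten to `key=new` for known keys."""
    # One compiled regex covering every key; re.escape guards special chars.
    pattern = re.compile(
        r'^(%s)=.*$' % '|'.join(re.escape(key) for key in replacements))
    for line in lines:
        # A function replacement avoids backslash surprises in the values.
        yield pattern.sub(
            lambda m: '%s=%s' % (m.group(1), replacements[m.group(1)]), line)

sample = [
    "key1=value_tobe_replaced1\n",
    "other=keep\n",
    "key10=value_tobe_replaced10\n",
]
result = list(replace_values(sample, {"key1": "a", "key10": "b"}))
```

Because the function is a generator, it can be fed an open file object and written out to a temporary file without holding the whole file in memory.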
Solution 14:
cat test.txt | python -c "import sys,re;[sys.stdout.write(re.sub('S', 's', line)) for line in sys.stdin]"