String formatting: % vs. .format vs. f-string literal


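Question:

There are various string formatting methods:

Python <2.6: "Hello %s" % name
Python 2.6+: "Hello {}".format(name) (uses str.format)
Python 3.6+: f"{name}" (uses f-strings)

Which is better, and for what situations?

The following methods have the same outcome, so what is the difference?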

name = "Alice"

"Hello %s" % name
"Hello {0}".format(name)
f"Hello {name}"

# Using named arguments:
"Hello %(kwarg)s" % {'kwarg': name}
"Hello {kwarg}".format(kwarg=name)
f"Hello {name}"

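When does string formatting run, and how do I avoid a runtime performance penalty?

If you are trying to close a duplicate question that is just looking for a way to format a string, please use "How do I put a variable's value inside a string?".

Answer 1:

To answer your first question: .format just seems more sophisticated in many ways. An annoying thing about % is that it can take either a single variable or a tuple. You'd think the following would always work: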

"Hello %s" % name

Yet if name happens to be (1, 2, 3), it will throw a TypeError. To guarantee that it always prints, you'd need to do

"Hello %s" % (name,)   # supply the single argument as a single-item tuple

which is just ugly. .format doesn't have those issues. Also, in the second example you gave, the .format version looks much cleaner.
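To make the tuple pitfall concrete, here is a minimal sketch. The helper names greet_unsafe and greet_safe are made up for illustration and are not from the question.

```python
def greet_unsafe(name):
    # A tuple on the right-hand side is unpacked as the argument list.
    return "Hello %s" % name


def greet_safe(name):
    # Wrapping the value in a one-item tuple keeps it a single argument.
    return "Hello %s" % (name,)


print(greet_unsafe("Alice"))         # Hello Alice
try:
    greet_unsafe((1, 2, 3))
except TypeError as exc:
    print("TypeError:", exc)
print(greet_safe((1, 2, 3)))         # Hello (1, 2, 3)
print("Hello {}".format((1, 2, 3)))  # Hello (1, 2, 3)
```

With .format the tuple is treated as one value, so no wrapping is needed.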

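Only use % for backwards compatibility with Python 2.5.

To answer your second question, string formatting happens at the same time as any other operation: when the string formatting expression is evaluated. Python is not a lazy language, so it evaluates expressions before calling functions. The expression log.debug("some debug info: %s" % some_info) will therefore first evaluate the string to, e.g., "some debug info: roflcopters are active", and then that string will be passed to log.debug().

Answer 2:

As far as I know, the modulo operator (%) can't do this: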

tu = (12,45,22222,103,6)
print '{0} {2} {1} {2} {3} {2} {4} {2}'.format(*tu)

Result:

12 22222 45 22222 103 22222 6 22222

Very useful.
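For comparison, a small sketch in Python 3 syntax. Positional reuse is not available with %, but named keys can repeat if the values are supplied as a mapping. The key names a to e are invented for this example.

```python
tu = (12, 45, 22222, 103, 6)

# str.format: positional indexes can repeat and appear in any order.
by_index = '{0} {2} {1} {2} {3} {2} {4} {2}'.format(*tu)

# %-formatting: a mapping lets a named key be used more than once.
keys = dict(zip('abcde', tu))
by_name = '%(a)s %(c)s %(b)s %(c)s %(d)s %(c)s %(e)s %(c)s' % keys

print(by_index)  # 12 22222 45 22222 103 22222 6 22222
print(by_name)   # 12 22222 45 22222 103 22222 6 22222
```

So the mapping form is a partial workaround, at the cost of naming every value.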

Another point: format(), being a function, can be used as an argument to other functions:

li = [12,45,78,784,2,69,1254,4785,984]
print map('the number is {}'.format,li)   

print

from datetime import datetime,timedelta

once_upon_a_time = datetime(2010, 7, 1, 12, 0, 0)
delta = timedelta(days=13, hours=8,  minutes=20)

gen =(once_upon_a_time +x*delta for x in xrange(20))

print '\n'.join(map('{:%Y-%m-%d %H:%M:%S}'.format, gen))

Result:

['the number is 12', 'the number is 45', 'the number is 78', 'the number is 784', 'the number is 2', 'the number is 69', 'the number is 1254', 'the number is 4785', 'the number is 984']

2010-07-01 12:00:00
2010-07-14 20:20:00
2010-07-28 04:40:00
2010-08-10 13:00:00
2010-08-23 21:20:00
2010-09-06 05:40:00
2010-09-19 14:00:00
2010-10-02 22:20:00
2010-10-16 06:40:00
2010-10-29 15:00:00
2010-11-11 23:20:00
2010-11-25 07:40:00
2010-12-08 16:00:00
2010-12-22 00:20:00
2011-01-04 08:40:00
2011-01-17 17:00:00
2011-01-31 01:20:00
2011-02-13 09:40:00
2011-02-26 18:00:00
2011-03-12 02:20:00

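Answer 3:

Assuming you're using Python's logging module, you can pass the string formatting arguments as arguments to the .debug() method rather than doing the formatting yourself: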

log.debug("some debug info: %s", some_info)

This avoids doing the formatting unless the logger actually logs something.
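A rough way to see the difference is to count how often the argument is converted to a string. This is only a sketch: the Noisy class and the logger name "format_demo" are hypothetical, and the logger is set to INFO so that DEBUG messages are dropped.

```python
import logging


class Noisy:
    """Hypothetical helper that counts how often it is converted to str."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "roflcopters are active"


log = logging.getLogger("format_demo")
log.addHandler(logging.NullHandler())
log.setLevel(logging.INFO)  # DEBUG records are discarded
log.propagate = False

eager = Noisy()
log.debug("some debug info: %s" % eager)  # formatted before the call

lazy = Noisy()
log.debug("some debug info: %s", lazy)    # formatting deferred to logging

print(eager.str_calls, lazy.str_calls)    # 1 0
```

The eager call pays for the formatting even though nothing is logged; the deferred call never touches the argument.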

Answer 4:

As of Python 3.6 (2016) you can use f-strings to substitute variables:

>>> origin = "London"
>>> destination = "Paris"
>>> f"from {origin} to {destination}"
'from London to Paris'

Note the f" prefix. If you try this in Python 3.5 or earlier, you'll get a SyntaxError.

See https://docs.python.org/3.6/reference/lexical_analysis.html#f-strings
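Beyond plain names, f-strings accept expressions, conversion flags and the same format specs as str.format. A short sketch follows; the distance_km value is made up for the example.

```python
origin = "London"
destination = "Paris"
distance_km = 343.5  # illustrative value only

# Format spec after the colon, as with str.format.
summary = f"from {origin} to {destination}: {distance_km:.1f} km"

# Arbitrary expressions and the !r conversion flag.
shouted = f"{origin.upper()} -> {destination!r}"

# Alignment and width.
padded = f"[{origin:>10}]"

print(summary)  # from London to Paris: 343.5 km
print(shouted)  # LONDON -> 'Paris'
print(padded)   # [    London]
```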

Answer 5:

PEP 3101 proposes replacing the % operator with the new, advanced string formatting in Python 3, where it would be the default.

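Answer 6:

But please be careful: just now I discovered one issue when trying to replace all % with .format in existing code. '{}'.format(unicode_string) will try to encode unicode_string and will probably fail.

Just look at this Python interactive session log: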

Python 2.7.2 (default, Aug 27 2012, 19:52:55) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
; s='й'
; u=u'й'
; s
'\xd0\xb9'
; u
u'\u0439'

s is just a string (called a 'byte array' in Python 3) and u is a Unicode string (called a 'string' in Python 3):

; '%s' % s
'\xd0\xb9'
; '%s' % u
u'\u0439'

When you give a Unicode object as a parameter to the % operator, it produces a Unicode string even if the original string wasn't Unicode:

; '{}'.format(s)
'\xd0\xb9'
; '{}'.format(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0439' in position 0: ordinal not in range(256)

but the .format function will raise a UnicodeEncodeError:

; u'{}'.format(s)
u'\xd0\xb9'
; u'{}'.format(u)
u'\u0439'

and it works with a Unicode argument only if the original string was Unicode.

; '{}'.format(u'i')
'i'

or if the argument string can be converted to a string (a so-called 'byte array').

Answer 7:

% gives better performance than format in my tests.

Test code:

Python 2.7.2:

import timeit
print 'format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')")
print '%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')")

Result:

> format: 0.470329046249
> %: 0.357107877731

Python 3.5.2

import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))

Result:

> format: 0.5864730989560485
> %: 0.013593495357781649

In Python 2 the difference is small, whereas in Python 3, % is much faster than format.

Thanks @Chris Cogdon for the sample code.

Edit 1:

Tested again in Python 3.7.2 in July 2019.

Result:

> format: 0.86600608
> %: 0.630180146

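There is not much difference. I guess Python is improving gradually.

Edit 2:

After someone mentioned Python 3's f-strings in a comment, I tested the following code under Python 3.7.2: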

import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))
print('f-string:', timeit.timeit("f'{1}{1.23}{\"hello\"}'"))

Result:

format: 0.8331376779999999
%: 0.6314778750000001
f-string: 0.766649943

It seems f-string is still slower than % but better than format.
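The timings above format literal constants, which an interpreter may be able to evaluate ahead of time. That is my assumption about the unusually low Python 3.5.2 figure for %, not something the measurements confirm. A sketch that formats variables from a setup string instead, so each statement has to do the work on every run (the loop count and repeat count are arbitrary choices):

```python
import timeit

# Values come from variables so the expressions cannot be precomputed.
setup = "a, b, c = 1, 1.23, 'hello'"

statements = {
    "format": "'{}{}{}'.format(a, b, c)",
    "%": "'%s%s%s' % (a, b, c)",
    "f-string": "f'{a}{b}{c}'",
}

results = {}
for label, stmt in statements.items():
    # Take the best of three runs to reduce noise from other processes.
    timings = timeit.repeat(stmt, setup=setup, number=100000, repeat=3)
    results[label] = min(timings)
    print(f"{label}: {results[label]:.4f} s")
```

The numbers depend heavily on the interpreter version and the machine, so treat them as relative rather than absolute.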

Answer 8:

Yet another advantage of .format (which I don't see in the other answers): it can take object attributes.

In [12]: class A(object):
   ....:     def __init__(self, x, y):
   ....:         self.x = x
   ....:         self.y = y
   ....:         

In [13]: a = A(2,3)

In [14]: 'x is {0.x}, y is {0.y}'.format(a)
Out[14]: 'x is 2, y is 3'

Or, as a keyword argument:

In [15]: 'x is {a.x}, y is {a.y}'.format(a=a)
Out[15]: 'x is 2, y is 3'

This is not possible with % as far as I know.
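A partial workaround for % is to pass the instance's attribute dictionary as a mapping. A sketch, assuming plain instance attributes stored in __dict__ (it would not cover properties or slots):

```python
class A(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


a = A(2, 3)

via_format = 'x is {0.x}, y is {0.y}'.format(a)
# vars(a) returns the instance __dict__, usable as a % mapping.
via_percent = 'x is %(x)s, y is %(y)s' % vars(a)
# On Python 3.6+ an f-string can read the attributes directly.
via_fstring = f'x is {a.x}, y is {a.y}'

print(via_format)   # x is 2, y is 3
print(via_percent)  # x is 2, y is 3
print(via_fstring)  # x is 2, y is 3
```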

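Answer 9:

As I discovered today, the old way of formatting strings via % doesn't support Decimal, Python's module for decimal fixed point and floating point arithmetic, out of the box.

Example (using Python 3.3.5):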

#!/usr/bin/env python3

from decimal import *

getcontext().prec = 50
d = Decimal('3.12375239e-24') # no magic number, I rather produced it by banging my head on my keyboard

print('%.50f' % d)
print('{0:.50f}'.format(d))

Output:

0.00000000000000000000000312375239000000009907464850
0.00000000000000000000000312375239000000000000000000

There might well be workarounds, but you may still want to use the format() method right away.
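The difference comes from %f converting the Decimal to a binary float first, while format() goes through the Decimal's own formatting. A sketch that checks this by round-tripping the formatted text back into Decimal (the variable names are mine):

```python
from decimal import Decimal, getcontext

getcontext().prec = 50
d = Decimal('3.12375239e-24')

via_percent = '%.50f' % d          # d is converted to a float first
via_format = '{0:.50f}'.format(d)  # uses Decimal's own formatting
via_fstring = f'{d:.50f}'          # same formatting protocol as format()

print(via_percent)
print(via_format)

# Parsing the text back shows which variants kept the exact value.
print(Decimal(via_percent) == d)   # False
print(Decimal(via_format) == d)    # True
```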

Answer 10:

If your Python is >= 3.6, the f-string formatted literal is your new friend.

It is simpler, cleaner, and performs better.

In [1]: params=['Hello', 'adam', 42]

In [2]: %timeit "%s %s, the answer to everything is %d."%(params[0],params[1],params[2])
448 ns ± 1.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %timeit "{} {}, the answer to everything is {}.".format(*params)
449 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit f"{params[0]} {params[1]}, the answer to everything is {params[2]}."
12.7 ns ± 0.0129 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

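Answer 11:

As a side note, you do not have to take a performance hit to use new-style formatting with logging. You can pass any object to logging.debug, logging.info, etc. that implements the __str__ magic method. When the logging module has decided that it must emit your message object (whatever it is), it calls str(message_object) before doing so. So you could do something like this: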

import logging


class NewStyleLogMessage(object):
    def __init__(self, message, *args, **kwargs):
        self.message = message
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        args = (i() if callable(i) else i for i in self.args)
        kwargs = dict((k, v() if callable(v) else v) for k, v in self.kwargs.items())

        return self.message.format(*args, **kwargs)

N = NewStyleLogMessage

# Neither one of these messages are formatted (or calculated) until they're
# needed

# Emits "Lazily formatted log entry: 123 foo" in log
logging.debug(N('Lazily formatted log entry: {0} {keyword}', 123, keyword='foo'))


def expensive_func():
    # Do something that takes a long time...
    return 'foo'

# Emits "Expensive log entry: foo" in log
logging.debug(N('Expensive log entry: {keyword}', keyword=expensive_func))

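This is all described in the Python 3 documentation (https://docs.python.org/3/howto/logging-cookbook.html#formatting-styles). However, it will work with Python 2.6 as well (https://docs.python.org/2.6/library/logging.html#using-arbitrary-objects-as-messages).

One of the advantages of using this technique, other than the fact that it is formatting-style agnostic, is that it allows for lazy values, e.g. the function expensive_func above. This provides a more elegant alternative to the advice given in the Python docs here: https://docs.python.org/2.6/library/logging.html#optimization.

Answer 12:

% may help when formatting regex expressions. For example,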

'{type_names} [a-z]{2}'.format(type_names='triangle|square')

raises IndexError. In this situation, you can use:

'%(type_names)s [a-z]{2}' % {'type_names': 'triangle|square'}

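This avoids writing the regex as '{type_names} [a-z]{{2}}'. This can be useful when you have two regexes, where one is used alone without formatting, but the concatenation of both is formatted.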
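Putting the three variants side by side, as a sketch in Python 3. The sample input 'square ab' is made up to show that the resulting pattern still works.

```python
import re

type_names = 'triangle|square'

# {2} is read as "positional argument 2", which does not exist.
try:
    '{type_names} [a-z]{2}'.format(type_names=type_names)
except IndexError as exc:
    print('IndexError:', exc)

# str.format needs the quantifier braces doubled.
escaped = '{type_names} [a-z]{{2}}'.format(type_names=type_names)

# %-formatting leaves the regex braces alone.
percent = '%(type_names)s [a-z]{2}' % {'type_names': type_names}

print(escaped)  # triangle|square [a-z]{2}
print(percent)  # triangle|square [a-z]{2}
print(re.fullmatch(percent, 'square ab') is not None)  # True
```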

Answer 13:

I would add that since version 3.6, we can use f-strings like the following:

foo = "john"
bar = "smith"
print(f"My name is {foo} {bar}")

This gives

My name is john smith

Everything is converted to strings:

mylist = ["foo", "bar"]
print(f"mylist = {mylist}")

Result:

mylist = ['foo', 'bar']

You can call functions, as with the other formatting methods:

import time
print(f'Hello, here is the date : {time.strftime("%d/%m/%Y")}')

Giving, for example:

Hello, here is the date : 16/04/2018

Answer 14:

Comparison on Python 3.6.7:

#!/usr/bin/env python
import timeit

def time_it(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{0:.10f} seconds".format(t1 - t0))
    return wrapper


@time_it
def new_new_format(s):
    print("new_new_format:", f"{s[0]} {s[1]} {s[2]} {s[3]} {s[4]}")


@time_it
def new_format(s):
    print("new_format:", "{0} {1} {2} {3} {4}".format(*s))


@time_it
def old_format(s):
    print("old_format:", "%s %s %s %s %s" % s)


def main():
    samples = (("uno", "dos", "tres", "cuatro", "cinco"), (1,2,3,4,5), (1.1, 2.1, 3.1, 4.1, 5.1), ("uno", 2, 3.14, "cuatro", 5.5),) 
    for s in samples:
        new_new_format(s)
        new_format(s)
        old_format(s)
        print("-----")


if __name__ == '__main__':
    main()

Output:

new_new_format: uno dos tres cuatro cinco
0.0000170280 seconds
new_format: uno dos tres cuatro cinco
0.0000046750 seconds
old_format: uno dos tres cuatro cinco
0.0000034820 seconds
-----
new_new_format: 1 2 3 4 5
0.0000043980 seconds
new_format: 1 2 3 4 5
0.0000062590 seconds
old_format: 1 2 3 4 5
0.0000041730 seconds
-----
new_new_format: 1.1 2.1 3.1 4.1 5.1
0.0000092650 seconds
new_format: 1.1 2.1 3.1 4.1 5.1
0.0000055340 seconds
old_format: 1.1 2.1 3.1 4.1 5.1
0.0000052130 seconds
-----
new_new_format: uno 2 3.14 cuatro 5.5
0.0000053380 seconds
new_format: uno 2 3.14 cuatro 5.5
0.0000047570 seconds
old_format: uno 2 3.14 cuatro 5.5
0.0000045320 seconds
-----

Answer 15:

For Python version >= 3.6 (see PEP 498):

s1='albha'
s2='beta'

f'{s1}{s2:>10}'

#output
'albha      beta'

Answer 16:

One thing, though: if you have nested curly braces, it won't work with format, but % will work.

Example:

>>> '{{0}, {1}}'.format(1,2)
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    '{{0}, {1}}'.format(1,2)
ValueError: Single '}' encountered in format string
>>> '{%s, %s}'%(1,2)
'{1, 2}'
>>>
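For completeness, format can produce the same output if the literal braces are doubled. A small sketch; the names a and b are just for the f-string variant:

```python
# Doubled braces are emitted as literal braces by str.format.
via_format = '{{{0}, {1}}}'.format(1, 2)

# %-formatting treats braces as ordinary characters.
via_percent = '{%s, %s}' % (1, 2)

# f-strings follow the same doubling rule as str.format.
a, b = 1, 2
via_fstring = f'{{{a}, {b}}}'

print(via_format)   # {1, 2}
print(via_percent)  # {1, 2}
print(via_fstring)  # {1, 2}
```

So the limitation is about readability rather than capability: the % version needs no escaping.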