Add a string prefix to each value in a pandas string column

2025-02-14 09:49:00
admin

Problem description:

I would like to prepend a string to the start of each value in a given column of a pandas DataFrame. I am currently using:

df.ix[(df['col'] != False), 'col'] = 'str' + df[(df['col'] != False), 'col']

This seems inelegant. Is there another way (ideally one that also adds the prefix to rows where that column is 0 or NaN)?

As an example, I would like to turn:

    col
1     a
2     0

into:

       col
1     stra
2     str0

Solution 1:

df['col'] = 'str' + df['col'].astype(str)

Example:

>>> df = pd.DataFrame({'col':['a',0]})
>>> df
  col
0   a
1   0
>>> df['col'] = 'str' + df['col'].astype(str)
>>> df
    col
0  stra
1  str0
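
The question also asks about rows holding NaN. How astype(str) renders a missing value can depend on the pandas version (older releases turn it into the text 'nan'), so one option is to fill missing values before converting. The following is a minimal sketch, assuming an empty string is an acceptable stand-in for NaN; the fill value '' is an assumption and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['a', 0, np.nan]})

# Assumption: missing values should become an empty string before prefixing.
filled = df['col'].fillna('').astype(str)
df['col'] = 'str' + filled

print(df['col'].tolist())
```

With this approach the NaN row ends up holding just the prefix.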

Solution 2:

As an alternative, you can use apply combined with format (or, better, f-strings), which I find slightly more readable when you also want to add a suffix or manipulate the element itself:

df = pd.DataFrame({'col':['a', 0]})

df['col'] = df['col'].apply(lambda x: "{}{}".format('str', x))

which also yields the desired output:

    col
0  stra
1  str0

If you are using Python 3.6+, you can also use f-strings:

df['col'] = df['col'].apply(lambda x: f"str{x}")

yielding the same output.
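
The remark about adding a suffix or manipulating the element itself could look roughly like this. It is only a sketch: the suffix '_end' and the upper-casing step are hypothetical choices made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col': ['a', 0]})

# Hypothetical prefix and suffix chosen for illustration only.
prefix, suffix = 'str', '_end'

# str(x) keeps the transformation safe for non-string values such as 0.
df['col'] = df['col'].apply(lambda x: f"{prefix}{str(x).upper()}{suffix}")

print(df['col'].tolist())
```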

The f-string version is almost as fast as @RomanPekar's solution ('str' + df['col'].astype(str)), measured on Python 3.6.4:

df = pd.DataFrame({'col':['a', 0]*200000})

%timeit df['col'].apply(lambda x: f"str{x}")
117 ms ± 451 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit 'str' + df['col'].astype(str)
112 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using format, however, is noticeably slower (roughly 60% slower than the f-string version in this run):

%timeit df['col'].apply(lambda x: "{}{}".format('str', x))
185 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
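
%timeit is an IPython magic, so the measurements above cannot be pasted into a plain script. A rough equivalent with the standard-library timeit module might look like the sketch below; the smaller frame size and number=5 are arbitrary assumptions, and absolute timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Smaller frame than in the answer so the script finishes quickly.
df = pd.DataFrame({'col': ['a', 0] * 20000})

candidates = {
    'f-string apply': lambda: df['col'].apply(lambda x: f"str{x}"),
    'astype + concat': lambda: 'str' + df['col'].astype(str),
    'format apply': lambda: df['col'].apply(lambda x: "{}{}".format('str', x)),
}

# Time each candidate; number=5 is an arbitrary choice for this sketch.
runs = 5
timings = {name: timeit.timeit(func, number=runs) for name, func in candidates.items()}

for name, seconds in timings.items():
    print(f"{name}: {seconds / runs * 1000:.2f} ms per call")
```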

Solution 3:

You can use pandas.Series.map:

df['col'] = df['col'].map('str{}'.format)

In this example, it prepends the string str to every value in the column.
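
Because map formats every element, a NaN is formatted like any other value. If missing values should stay missing, Series.map accepts na_action='ignore'. A small sketch, using a made-up series with one NaN:

```python
import numpy as np
import pandas as pd

s = pd.Series(['a', 0, np.nan])

# Without na_action, NaN is formatted like any other value.
with_nan_formatted = s.map('str{}'.format)

# na_action='ignore' skips missing values, so they stay NaN.
with_nan_kept = s.map('str{}'.format, na_action='ignore')

print(with_nan_formatted.tolist())
print(with_nan_kept.tolist())
```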

Solution 4:

If you load your table file with dtype=str

or convert the column to strings with df['a'] = df['a'].astype(str)

then you can use this approach:

df['a'] = 'col' + df['a'].str[:]

This approach lets you prepend to, append to, and slice the strings in a column.

Works on pandas v0.23.4 and v0.24.1. I don't know about earlier versions.
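
To make the prepend, append, and slice variants concrete, here is a short sketch. The sample values and the '_end' suffix are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['apple', 'banana', 42]})
df['a'] = df['a'].astype(str)  # make sure every value is a string first

prepended = 'col' + df['a']        # prefix
appended = df['a'] + '_end'        # suffix ('_end' is a made-up example)
sliced = 'col' + df['a'].str[:3]   # prefix plus the first three characters

print(prepended.tolist())
print(appended.tolist())
print(sliced.tolist())
```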

Solution 5:

Another solution with .loc:

df = pd.DataFrame({'col': ['a', 0]})
df.loc[df.index, 'col'] = 'string' + df['col'].astype(str)

This is not as quick as the solutions above (more than 1 ms per loop slower), but it may be useful when you need a conditional change, such as:

mask = (df['col'] == 0)
df.loc[mask, 'col'] = 'string' + df['col'].astype(str)
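
The same pattern works with other conditions. As a sketch, the mask below prefixes only the rows that hold a value and leaves NaN untouched; choosing notna() as the condition is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['a', 0, np.nan]})

# Select only the rows that actually hold a value.
mask = df['col'].notna()

# Build the prefixed strings from the masked rows and write them back.
df.loc[mask, 'col'] = 'string' + df.loc[mask, 'col'].astype(str)

print(df['col'].tolist())
```

Applying the mask on the right-hand side as well avoids converting rows that will not be written.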

Solution 6:

This adds a way to prefix a column while controlling how NaNs are rendered, which is useful for things like human-readable values in a CSV export.

"_" + df['col1'].replace(np.nan,'').astype(str)

Example:

import platform
import pandas as pd
import numpy as np

print("python {}".format(platform.python_version()))
print("pandas {}".format(pd.__version__))
print("numpy {}".format(np.__version__))

df = pd.DataFrame({
    'col1':["1a","1b","1c",np.nan],
    'col2':["2a","2b",np.nan,"2d"], 
    'col3':[31,32,33,34],
    'col4':[np.nan,42,43,np.nan]})

df['col1_prefixed'] = "_" + df['col1'].replace(np.nan,'no value').astype(str)
df['col4_prefixed'] = "_" + df['col4'].replace(np.nan,'no value').astype(str)

print(df)

Output:

python 3.7.3
pandas 1.2.3
numpy 1.18.5
  col1 col2  col3  col4 col1_prefixed col4_prefixed
0   1a   2a    31   NaN           _1a     _no value
1   1b   2b    32  42.0           _1b         _42.0
2   1c  NaN    33  43.0           _1c         _43.0
3  NaN   2d    34   NaN     _no value     _no value

(Sorry for the verbosity, I found this Q while working on an unrelated column type issue and this is my reproduction code)
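
In the output above, col4 is a float column, so its values are rendered as _42.0. If whole numbers are wanted, one possible variation is sketched below. It assumes col4 only ever holds whole numbers; the helper name render is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ["1a", "1b", "1c", np.nan],
    'col4': [np.nan, 42, 43, np.nan]})

# fillna is a more direct spelling of replace(np.nan, ...).
df['col1_prefixed'] = "_" + df['col1'].fillna('no value').astype(str)

# Assumption: col4 holds whole numbers, so int(v) loses nothing.
def render(v):
    return 'no value' if pd.isna(v) else str(int(v))

df['col4_prefixed'] = "_" + df['col4'].map(render)

print(df[['col1_prefixed', 'col4_prefixed']])
```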

Solution 7:

You can use radd() to add a string element-wise to each value in a column (N.B. if the column contains mixed types, first convert it to a string column with astype()). An example:

df = pd.DataFrame({'col': ['a', 0]})
df['col'] = df['col'].astype('string').radd('str')

which outputs

    col
0  stra
1  str0

It has two advantages over concatenation via +:

  1. Null handling: If the column contains NaN values, + simply returns NaN for those rows. For example:

df = pd.DataFrame({'col': ['a', float('nan')]})
df['col'] = 'str' + df['col']

which outputs

    col
0  stra
1   NaN

which forces you to handle the NaN later using fillna() etc.

However, with radd(), you can pass the fill_value keyword argument to handle the NaN values in the same call. For the above example, passing fill_value='' treats NaN values as an empty string, so adding the prefix yields a column of strings:

df = pd.DataFrame({'col': ['a', float('nan')]})
df['col'] = df['col'].radd('str', fill_value='')

which outputs

    col
0  stra
1   str

As a side note, astype(str) and astype('string') are not equivalent; one important difference is how they handle null values.

  2. Method chaining: If you add prefixes to a column as one step in a pipeline, it can be important to do so with method chaining. + forces you to break out of the pipeline, whereas radd() makes it clear that the prefix is prepended after the preceding chain of methods. For example, we can do the following:

df = pd.DataFrame({'col': ['a', 0]})
df.reset_index().astype({'col': 'string'}).radd({'index': 0, 'col': 'str'})

which outputs

  index   col
0     0  stra
1     1  str0
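
The side note on null handling with astype(str) versus astype('string') can be explored with a short sketch like the one below. The astype(str) result is only printed, not relied upon, because how it renders NaN may differ between pandas versions.

```python
import numpy as np
import pandas as pd

s = pd.Series(['a', np.nan])

# Older pandas releases stringify NaN to the text 'nan' here;
# the exact result may differ between versions, so it is only printed.
print(s.astype(str).tolist())

# The nullable 'string' dtype keeps the missing value missing.
as_string = s.astype('string')
print(as_string.isna().tolist())

# radd with fill_value replaces the missing value before prefixing.
prefixed = as_string.radd('str', fill_value='')
print(prefixed.tolist())
```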