Selecting from a pandas DataFrame using startswith

2025-02-13 08:36:00

admin

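Question:

This works (using pandas 0.12 dev):

table2 = table[table['SUBDIVISION'] == 'INVERNESS']

Then I realized I needed to select the field using "starts with", since the exact match was missing a bunch of rows. So, following the pandas docs as closely as I could, I tried: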

criteria = table['SUBDIVISION'].map(lambda x: x.startswith('INVERNESS'))
table2 = table[criteria]

and got AttributeError: 'float' object has no attribute 'startswith'

So I tried an alternate syntax, with the same result:

table[[x.startswith('INVERNESS') for x in table['SUBDIVISION']]]

Reference: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
Section 4: "List comprehensions and the map method of Series can also be used to produce more complex criteria."

What am I missing?

Solution 1:

You can use the Series method str.startswith to get more consistent results:

In [11]: s = pd.Series(['a', 'ab', 'c', 11, np.nan])

In [12]: s
Out[12]:
0      a
1     ab
2      c
3     11
4    NaN
dtype: object

In [13]: s.str.startswith('a', na=False)
Out[13]:
0     True
1     True
2    False
3    False
4    False
dtype: bool

Boolean indexing then works just fine (I prefer to use loc, but it works just the same without it):

In [14]: s.loc[s.str.startswith('a', na=False)]
Out[14]:
0     a
1    ab
dtype: object

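It looks like at least one element in the Series/column is a float (most likely a missing value, since NaN is a float). A float has no startswith method, hence the AttributeError. The list comprehension raises the same error for the same reason.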
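To make the diagnosis concrete, here is a minimal sketch that reproduces the error and then applies the fix. The `table` contents are invented for illustration (the original data is not shown in the question); the only assumption is that the real column contains at least one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's table; NaN marks a missing value.
table = pd.DataFrame(
    {"SUBDIVISION": ["INVERNESS", "INVERNESS POINT", np.nan, "HIGHLANDS"]}
)

# The missing value is stored as a float, which has no string methods.
print(type(table["SUBDIVISION"].iloc[2]))

# Calling startswith element by element fails as soon as it reaches the NaN.
try:
    table["SUBDIVISION"].map(lambda x: x.startswith("INVERNESS"))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# The vectorized .str accessor tolerates non-strings; na=False maps them to False.
mask = table["SUBDIVISION"].str.startswith("INVERNESS", na=False)
table2 = table[mask]
print(table2)
```

Without `na=False` the mask would contain a missing entry for the NaN row, and indexing with such a mask can itself raise an error, so passing it explicitly is the safer habit.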

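Solution 2:

To retrieve all rows whose value starts with the desired string (note that str.match treats the string as a regular expression anchored at the start of the value):

dataFrameOut = dataFrame[dataFrame['column name'].str.match('string')]

To retrieve all rows whose value contains the desired string:

dataFrameOut = dataFrame[dataFrame['column name'].str.contains('string')]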
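One caveat worth illustrating: both methods interpret the string as a regular expression by default, and both return missing entries for NaN values unless `na=False` is passed. The sketch below uses a made-up `name` column and a made-up search string `a.b` to show the difference; neither comes from the original answer.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical data; the last row is a missing value.
df = pd.DataFrame({"name": ["a.b", "axb", "xa.b", np.nan]})
prefix = "a.b"  # illustrative search string; "." is a regex metacharacter

# str.match anchors a regex at the start, so the unescaped "." matches any character.
regex_hits = df[df["name"].str.match(prefix, na=False)]

# Escaping the pattern turns match into a literal "starts with" test.
literal_hits = df[df["name"].str.match(re.escape(prefix), na=False)]

# regex=False makes contains a plain substring search anywhere in the value.
substring_hits = df[df["name"].str.contains(prefix, regex=False, na=False)]

print(regex_hits["name"].tolist())
print(literal_hits["name"].tolist())
print(substring_hits["name"].tolist())
```

For a plain prefix with no pattern matching, `str.startswith` avoids the escaping question altogether, since it always treats its argument literally.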

Solution 3:

Use startswith to select rows by the value of a specific column:

df = df.loc[df["SUBDIVISION"].str.startswith('INVERNESS', na=False)]

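Solution 4:

You can use apply to run any string-matching function over your column, element by element.

table2 = table[table['SUBDIVISION'].apply(lambda x: x.startswith('INVERNESS'))]

This assumes every value in your "SUBDIVISION" column is of the correct type (string); otherwise it fails with the same AttributeError as in the question.

Edit: fixed a missing parenthesis.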
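If the column may hold non-string values, as it does in the question, the apply approach can be kept by guarding the call with a type check. This is a sketch rather than part of the original answer: the helper name `starts_with_inverness` and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing strings, a number, and a missing value.
table = pd.DataFrame({"SUBDIVISION": ["INVERNESS EAST", 11, np.nan, "OTHER"]})


def starts_with_inverness(value):
    # Only strings have startswith; anything else counts as "no match".
    return isinstance(value, str) and value.startswith("INVERNESS")


# apply returns a boolean Series that can be used directly as a mask.
table2 = table[table["SUBDIVISION"].apply(starts_with_inverness)]
print(table2)
```

The guard gives the same rows as `str.startswith('INVERNESS', na=False)`; the apply form mainly pays off when the matching logic is more involved than a single prefix test.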

Solution 5:

This can also be done with query:

table.query('SUBDIVISION.str.startswith("INVERNESS").values')