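Problem description:

I am trying to make a simple scatter plot in pyplot using a pandas DataFrame object, and I want an efficient way of plotting two variables while having the symbols determined by a third column (key). I have tried various approaches using df.groupby, without success. A sample df script is below. It colors the markers according to 'key1', but I'd like to see a legend with the 'key1' categories. Am I close? Thanks.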

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), index = pd.date_range('2010-01-01', freq = 'M', periods = 10), columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)
fig1 = plt.figure(1)
ax1 = fig1.add_subplot(111)
ax1.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)
plt.show()

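Solution 1:

You can use scatter for this, but that requires numerical values for your key1, and you won't get a legend, as you noticed.

It's better to just use plot for discrete categories like this. For example: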

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

groups = df.groupby('label')

# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend()

plt.show()
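
Applied to the DataFrame from the question, the same groupby-and-plot loop might look like the sketch below. It assumes that key1 is the grouping column, drops the date index because the plot does not use it, and adds a legend title purely for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1974)

# Same shape of data as in the question (date index omitted; it is not plotted)
df = pd.DataFrame(np.random.normal(10, 1, 30).reshape(10, 3),
                  columns=('one', 'two', 'three'))
df['key1'] = (4, 4, 4, 6, 6, 6, 8, 8, 8, 8)

fig, ax = plt.subplots()
ax.margins(0.05)
# One plot call per category, so each category gets its own legend entry
for name, group in df.groupby('key1'):
    ax.plot(group['one'], group['two'], marker='o', linestyle='', ms=12,
            label=str(name))
ax.legend(title='key1')
```

Call plt.show() afterwards to display the figure.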

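If you'd like the plot to look like the default pandas style, just update rcParams with the pandas stylesheet and use its color generator. (I'm also tweaking the legend slightly):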

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

groups = df.groupby('label')

# Plot
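# Note: pd.tools.plotting is a legacy module path that recent pandas releases no longer provide
plt.rcParams.update(pd.tools.plotting.mpl_stylesheet)
colors = pd.tools.plotting._get_standard_colors(len(groups), color_type='random')

fig, ax = plt.subplots()
# set_prop_cycle replaces set_color_cycle, which was deprecated in Matplotlib 1.5 and later removed
ax.set_prop_cycle(color=colors)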
ax.margins(0.05)
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend(numpoints=1, loc='upper left')

plt.show()
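
On installations where pd.tools.plotting is not available, a roughly similar look can be approximated with one of Matplotlib's built-in style sheets. The sketch below is not the pandas stylesheet itself: it assumes the 'ggplot' style as a stand-in and relies on that style's own color cycle instead of pandas' color generator.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

# Assumption: the built-in 'ggplot' style stands in for the old pandas style
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.margins(0.05)
    for name, group in df.groupby('label'):
        ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
    ax.legend(numpoints=1, loc='upper left')
```

Keeping plt.show() or fig.savefig(...) inside the with block is the safer choice, so that the style is still active when the figure is drawn.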

Solution 2:

This is easy to do with Seaborn (pip install seaborn) as a one-liner:

sns.scatterplot(x="one", y="two", data=df, hue="key1")

Full example:

import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(1974)

df = pd.DataFrame(
    np.random.normal(10, 1, 30).reshape(10, 3),
    index=pd.date_range('2010-01-01', freq='M', periods=10),
    columns=('one', 'two', 'three'))
df['key1'] = (4, 4, 4, 6, 6, 6, 8, 8, 8, 8)

sns.scatterplot(x="one", y="two", data=df, hue="key1")

Since your data has three variable columns, you may want to plot all pairwise dimensions with:

sns.pairplot(vars=["one","two","three"], data=df, hue="key1")

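The category_scatter function from mlxtend (https://rasbt.github.io/mlxtend/user_guide/plotting/category_scatter/) is another option.

Solution 3:

With plt.scatter, I can only think of one approach: using proxy artists: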

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), index = pd.date_range('2010-01-01', freq = 'M', periods = 10), columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)
fig1 = plt.figure(1)
ax1 = fig1.add_subplot(111)
x=ax1.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)

# Proxy artists: one marker per key, colored via the scatter's colormap (keys 4..8 rescaled to 0..1)
ccm=x.get_cmap()
circles=[Line2D(range(1), range(1), color='w', marker='o', markersize=10, markerfacecolor=item) for item in ccm((np.array([4,6,8])-4.0)/4)]
leg = plt.legend(circles, ['4','6','8'], loc = "center left", bbox_to_anchor = (1, 0.5), numpoints = 1)
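
The hard-coded rescaling of the keys can be avoided by asking the scatter collection itself for the color of each key. The following is a minimal sketch of that variant, assuming the keys are numeric and that one legend entry per unique key is wanted; the random data and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

np.random.seed(1974)
df = pd.DataFrame(np.random.normal(10, 1, 30).reshape(10, 3),
                  columns=('one', 'two', 'three'))
df['key1'] = (4, 4, 4, 6, 6, 6, 8, 8, 8, 8)

fig, ax = plt.subplots()
sc = ax.scatter(df['one'], df['two'], marker='o', c=df['key1'], alpha=0.8)

# sc.to_rgba applies the scatter's own norm and colormap to a key value,
# so each proxy marker gets the color of the points it stands for
keys = sorted(df['key1'].unique())
handles = [Line2D([], [], linestyle='', marker='o', markersize=10,
                  markerfacecolor=sc.to_rgba(k), markeredgecolor='none')
           for k in keys]
leg = ax.legend(handles, [str(k) for k in keys], loc='center left',
                bbox_to_anchor=(1, 0.5), numpoints=1)
```

Call plt.show() afterwards to display the figure.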

Solution 4:

You can use df.plot.scatter and pass an array to the c= argument that defines the color of each point:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), index = pd.date_range('2010-01-01', freq = 'M', periods = 10), columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)
colors = np.where(df["key1"]==4,'r','-')
colors[df["key1"]==6] = 'g'
colors[df["key1"]==8] = 'b'
print(colors)
df.plot.scatter(x="one",y="two",c=colors)
plt.show()
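
The per-point color array can also be built from a dictionary with Series.map, and a legend can then be added by hand, since the scatter call itself does not create one. This is only a sketch: the color_map dictionary and the proxy handles are illustrative choices rather than part of the pandas API.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

np.random.seed(1974)
df = pd.DataFrame(np.random.normal(10, 1, 30).reshape(10, 3),
                  columns=('one', 'two', 'three'))
df['key1'] = (4, 4, 4, 6, 6, 6, 8, 8, 8, 8)

# Hypothetical key -> color mapping; any valid Matplotlib colors would do
color_map = {4: 'r', 6: 'g', 8: 'b'}
colors = df['key1'].map(color_map).to_numpy()

ax = df.plot.scatter(x='one', y='two', c=colors)

# One proxy marker per key, because the scatter has no per-category handles
handles = [Line2D([], [], linestyle='', marker='o', color=col)
           for col in color_map.values()]
ax.legend(handles, [str(k) for k in color_map], title='key1')
```

Call plt.show() afterwards to display the figure.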

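Solution 5:

As of matplotlib 3.1 you can use .legend_elements(). An example is shown in the Matplotlib documentation under "Automated legend creation". The advantage is that a single scatter call can be used.

In this case: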

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), 
                  index = pd.date_range('2010-01-01', freq = 'M', periods = 10), 
                  columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)


fig, ax = plt.subplots()
sc = ax.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)
ax.legend(*sc.legend_elements())
plt.show()

If the keys are not given directly as numbers, it would look like this:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), 
                  index = pd.date_range('2010-01-01', freq = 'M', periods = 10), 
                  columns = ('one', 'two', 'three'))
df['key1'] = list("AAABBBCCCC")

labels, index = np.unique(df["key1"], return_inverse=True)

fig, ax = plt.subplots()
sc = ax.scatter(df['one'], df['two'], marker = 'o', c = index, alpha = 0.8)
ax.legend(sc.legend_elements()[0], labels)
plt.show()
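
For non-numeric keys, pandas' own factorize can play the role of np.unique(..., return_inverse=True). The sketch below assumes matplotlib 3.1 or newer for legend_elements() and uses sort=True so that the integer codes follow the sorted category order; the legend title is an illustrative addition.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1974)
df = pd.DataFrame(np.random.normal(10, 1, 30).reshape(10, 3),
                  columns=('one', 'two', 'three'))
df['key1'] = list("AAABBBCCCC")

# codes are integers 0..n-1; uniques holds the matching category labels
codes, uniques = pd.factorize(df['key1'], sort=True)

fig, ax = plt.subplots()
sc = ax.scatter(df['one'], df['two'], marker='o', c=codes, alpha=0.8)

# legend_elements() returns one handle per distinct code, in ascending order
handles, _ = sc.legend_elements()
ax.legend(handles, list(uniques), title='key1')
```

Call plt.show() afterwards to display the figure.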

Solution 6:

You can also try Altair or ggplot, both of which focus on declarative visualization.

import numpy as np
import pandas as pd
np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

Altair code

from altair import Chart
c = Chart(df)
c.mark_circle().encode(x='x', y='y', color='label')

ggplot code

from ggplot import *
ggplot(aes(x='x', y='y', color='label'), data=df) +\
geom_point(size=50) +\
theme_bw()

Solution 7:

This is rather hacky, but you can use the one column as a Float64Index to do everything in one go:

df.set_index('one').sort_index().groupby('key1')['two'].plot(style='--o', legend=True)

Note that as of 0.20.3, sorting the index is necessary, and the legend is a bit wonky.

Solution 8:

seaborn has a wrapper function, scatterplot, that does this more efficiently.

sns.scatterplot(data = df, x = 'one', y = 'two', hue = 'key1')
