How to access the ith column of a NumPy multidimensional array?


Problem description:

Given:

test = np.array([[1, 2], [3, 4], [5, 6]])

test[i] gives the ith row (e.g. [1, 2]). How do I access the ith column (e.g. [1, 3, 5])? Also, would this be an expensive operation?

Solution 1:

With:

test = np.array([[1, 2], [3, 4], [5, 6]])

To access column 0:

>>> test[:, 0]
array([1, 3, 5])

To access row 0:

>>> test[0, :]
array([1, 2])

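This is covered in Section 1.4 (Indexing) of the NumPy reference. In my experience it is fast, and it is certainly much faster than accessing each element in a loop.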
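To make "fast" a bit more concrete, here is a minimal sketch showing that test[:, 0] is a view onto the same buffer as test, so taking a column copies no element data. np.shares_memory and ndarray.base are standard NumPy; the names col and row are just illustrative.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

col = test[:, 0]  # basic slicing: a view, only new shape/strides are created
row = test[0, :]

# The column view shares its data buffer with test
print(np.shares_memory(col, test))  # True
print(col.base is test)             # True
print(col, row)
```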

Solution 2:

>>> test[:,0]
array([1, 3, 5])

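This command gives you a 1-D array of shape (3,) (a "row vector"). That is fine if you just want to loop over it, but if you want to hstack it with some other array of shape 3xN, you will get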

ValueError: all the input arrays must have same number of dimensions

whereas

>>> test[:,[0]]
array([[1],
       [3],
       [5]])

gives you a column vector of shape (3, 1), so that you can do concatenate or hstack operations.

e.g.

>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
       [3, 4, 3],
       [5, 6, 5]])
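A hedged addition: test[:, [0]] uses advanced indexing and therefore returns a copy. If you want the (3, 1) shape but without copying, basic indexing also works, either with a length-1 slice or by re-inserting the dropped axis with np.newaxis. A minimal sketch:

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

# Basic-indexing alternatives that keep the column 2-D, shape (3, 1)
a = test[:, 0:1]            # a length-1 slice keeps the column axis
b = test[:, 0, np.newaxis]  # the integer drops the axis, np.newaxis adds it back

print(a.shape, b.shape)  # (3, 1) (3, 1)

# Unlike test[:, [0]] (advanced indexing, a copy), both are views of test
print(np.shares_memory(a, test), np.shares_memory(b, test))  # True True
print(np.shares_memory(test[:, [0]], test))                  # False

# Either one can be hstacked just like test[:, [0]]
print(np.hstack((test, a)))
```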

Solution 3:

If you want to access more than one column at a time, you can do:

>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
       [3, 5],
       [6, 8]])
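For comparison, a small sketch of two equivalent spellings. np.take with axis=1 gives the same result as the list index (also a copy), and when the wanted columns are evenly spaced, as 0 and 2 are here, a slice with a step returns the same values as a view instead of a copy:

```python
import numpy as np

test = np.arange(9).reshape((3, 3))

by_list = test[:, [0, 2]]                # advanced indexing: a copy
by_take = np.take(test, [0, 2], axis=1)  # same values via np.take
by_step = test[:, ::2]                   # step-2 slice: columns 0 and 2, as a view

print(np.array_equal(by_list, by_take), np.array_equal(by_list, by_step))  # True True
print(np.shares_memory(by_list, test), np.shares_memory(by_step, test))    # False True
```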

Solution 4:

You could also transpose and return a row:

In [4]: test.T[0]
Out[4]: array([1, 3, 5])
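Since test.T is itself a view with the axes swapped, this also gives a convenient way to loop over the columns one at a time. A minimal sketch (the list name columns is illustrative):

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

# test.T is a view, so test.T[i] holds the same data as test[:, i]
print(np.array_equal(test.T[0], test[:, 0]))  # True
print(np.shares_memory(test.T, test))         # True

# Iterating over the transpose yields one column per step
columns = [col for col in test.T]
print(columns)
```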

Solution 5:

Although the question has already been answered, let me mention some nuances.

Say you are interested in column 1 of the array (the second column, since indexing starts at 0):

import numpy as np

arr = np.array([[1, 2],
                [3, 4],
                [5, 6]], dtype='int32')

As you already know from the other answers, to get it as a "row vector" (an array of shape (3,)), you use slicing:

arr_col1_view = arr[:, 1]         # creates a view of column 1 of arr
arr_col1_copy = arr[:, 1].copy()  # creates a copy of column 1 of arr

To check whether an array is a view of another array or a copy, you can do the following:

arr_col1_view.base is arr  # True
arr_col1_copy.base is arr  # False

See ndarray.base.

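Besides the obvious difference between the two (modifying arr_col1_view affects arr), the number of bytes you have to step over to traverse each of them is different: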

arr_col1_view.strides[0]  # 8 bytes (assuming 4-byte int32 elements: one row of 2)
arr_col1_copy.strides[0]  # 4 bytes (one int32 element)

See ndarray.strides and this answer.

Why does this matter? Imagine you have a very large array A instead of arr:

A = np.random.randint(2, size=(10000, 10000), dtype='int32')
A_col1_view = A[:, 1] 
A_col1_copy = A[:, 1].copy()

and you want to compute the sum of all elements of column 1, i.e. A_col1_view.sum() or A_col1_copy.sum(). Using the copied version is much faster:

%timeit A_col1_view.sum()  # ~248 µs
%timeit A_col1_copy.sum()  # ~12.8 µs

This is due to the different strides mentioned earlier:

A_col1_view.strides[0]  # 40000 bytes
A_col1_copy.strides[0]  # 4 bytes

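Although using column copies seems better, that is not always the case, because making a copy also takes time and uses more memory (in this case it took me approx. 200 µs to create A_col1_copy). However, if we need a copy in the first place, or if we need to perform many different operations on one specific column of the array and can trade memory for speed, then making a copy is the way to go.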
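If you do decide to copy, np.ascontiguousarray is an alternative to .copy() that only copies when the input is not already C-contiguous. The sketch below assumes a smaller 1000x1000 int32 array (not the 10000x10000 one above) so it runs quickly; the names col_view and col_copy are illustrative:

```python
import numpy as np

A = np.random.randint(2, size=(1000, 1000), dtype='int32')

col_view = A[:, 1]                        # strided view: one full row (4000 bytes) per step
col_copy = np.ascontiguousarray(A[:, 1])  # contiguous copy: 4 bytes per step

print(col_view.flags['C_CONTIGUOUS'], col_view.strides)  # False (4000,)
print(col_copy.flags['C_CONTIGUOUS'], col_copy.strides)  # True (4,)
print(col_view.sum() == col_copy.sum())                  # True: same values, different layout
```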

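If we are mostly interested in working with columns, it may be a good idea to create the array in column-major ("F") order instead of row-major ("C") order (the default), and then slice as before to get a column without copying it: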

A = np.asfortranarray(A)   # or np.array(A, order='F')
A_col1_view = A[:, 1]
A_col1_view.strides[0]     # 4 bytes

%timeit A_col1_view.sum()  # ~12.6 µs vs ~248 µs

Now, performing the sum (or any other operation) on the column view is as fast as performing it on the column copy.
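A quick way to confirm this without timing anything is to compare strides directly. This sketch again assumes a smaller 1000x1000 int32 array; note that np.asfortranarray itself makes one copy of the whole array, after which column slices are contiguous views:

```python
import numpy as np

A = np.random.randint(2, size=(1000, 1000), dtype='int32')
F = np.asfortranarray(A)  # same values, column-major layout (one full copy here)

print(A[:, 1].strides, F[:, 1].strides)  # (4000,) (4,)
print(F[:, 1].base is F)                 # True: still a view, no per-column copy
print(np.array_equal(A, F))              # True
```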

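Finally, note that transposing an array and taking a row slice is the same as taking a column slice of the original array, because transposing just swaps the shape and strides of the original array.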

A[:, 1].strides[0]    # 40000 bytes
A.T[1, :].strides[0]  # 40000 bytes
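The stride swap can be seen on a small array. This sketch uses a hypothetical 3x5 int32 array, so each row is 5 × 4 = 20 bytes:

```python
import numpy as np

A = np.zeros((3, 5), dtype='int32')

print(A.strides, A.T.strides)                # (20, 4) (4, 20): reversed, no data moved
print(np.shares_memory(A, A.T))              # True
print(A[:, 1].strides == A.T[1, :].strides)  # True
```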

Solution 6:

To get several independent columns, just:

>>> test[:,[0,2]]

and you will get columns 0 and 2.

Solution 7:

>>> test
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

>>> ncol = test.shape[1]
>>> ncol
5

Then you can select the 2nd through 4th columns (indices 1 to 3) with:

>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
       [6, 7, 8]])
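As a small aside, a negative stop index counts from the end, so the same selection can be written without looking up shape[1] first. A minimal sketch:

```python
import numpy as np

test = np.arange(10).reshape(2, 5)

ncol = test.shape[1]
a = test[0:, 1:(ncol - 1)]  # the version above
b = test[:, 1:-1]           # same columns: stop at one before the last

print(np.array_equal(a, b))  # True
print(b)
```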

Solution 8:

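Strictly speaking, this is not a general multidimensional case; it is a 2-dimensional array, and you can access the columns you want with a slice:

import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])
test[:, a:b]  # replace a and b with the start (inclusive) and stop (exclusive) column indices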

Solution 9:

This question has already been answered, but it is worth noting the difference between a view and a copy.

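If the array is indexed with a scalar (regular indexing), the result is a view (x below), which means any change made to x will be reflected in test, because x is just a different view of test.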

test = np.array([[1, 2], [3, 4], [5, 6]])
# select second column
x = test[:, 1]
x[:] = 100        # <---- this does affect test

test
array([[  1, 100],
       [  3, 100],
       [  5, 100]])

However, if the array is indexed with a list or array-like (advanced indexing), the result is a copy, which means any change to x will not affect test.

test = np.array([[1, 2], [3, 4], [5, 6]])
# select second column
x = test[:, [1]]
x[:] = 100        # <---- this does not affect test

test
array([[1, 2],
       [3, 4],
       [5, 6]])
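One nuance worth adding: the copy only happens when advanced indexing is used to read. When the same index appears on the left-hand side of an assignment, NumPy writes into the original array. A minimal sketch:

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

x = test[:, [1]]  # advanced indexing on read: x is a copy
x[:] = 100        # test is unchanged
print(test[0, 1])  # 2

test[:, [1]] = 100  # advanced indexing as an assignment target writes into test
print(test)
```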

In general, indexing with a slice returns a view:

test = np.array([[1, 2], [3, 4], [5, 6]])
x = test[:, :2]
x[:] = 100

test
array([[100, 100],
       [100, 100],
       [100, 100]])

But indexing with an array returns a copy:

test = np.array([[1, 2], [3, 4], [5, 6]])
x = test[:, np.r_[:2]]
x[:] = 100

test
array([[1, 2],
       [3, 4],
       [5, 6]])

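Regular indexing is extremely fast, while advanced indexing is much slower (that said, it is still almost instantaneous and is unlikely to be the bottleneck of your program).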
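To check the speed difference on your own machine, a rough sketch with the standard timeit module; the array size and repeat count are arbitrary assumptions, and the printed timings will vary between machines and NumPy versions:

```python
import timeit

import numpy as np

test = np.random.rand(1000, 1000)

basic = test[:, 1]       # regular indexing: a view
advanced = test[:, [1]]  # advanced indexing: a (1000, 1) copy
assert np.array_equal(basic, advanced.ravel())  # same values either way

# Total seconds for 10000 repetitions of each indexing operation
t_basic = timeit.timeit(lambda: test[:, 1], number=10000)
t_adv = timeit.timeit(lambda: test[:, [1]], number=10000)
print(f"basic: {t_basic:.4f} s, advanced: {t_adv:.4f} s (10000 calls each)")
```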

Solution 10:

I just want to clarify that harmand's comment under mtrw's top-voted answer is misleading. He says:

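This creates a copy. Is it possible to get a reference instead, so that when I get a reference to a column, any change made through this reference is reflected in the original array?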

whereas in fact this code

import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])

barr = arr[:, 1]

print(barr)

barr[1] = 8

print(arr)

prints

[2 4 6]
[[1 2]
 [3 8]
 [5 6]]

I would appreciate it if someone could point this out in a comment under mtrw's answer, since my reputation is still too low to comment.