Find the nearest value in a numpy array


Question:

How do I find the nearest value in a numpy array? For example:

np.find_nearest(array, value)

Solution 1:

import numpy as np
def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]

Example usage:

array = np.random.random(10)
print(array)
# [ 0.21069679  0.61290182  0.63425412  0.84635244  0.91599191  0.00213826
#   0.17104965  0.56874386  0.57319379  0.28719469]

print(find_nearest(array, value=0.5))
# 0.568743859261
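
Because the random array above changes on every run, a fixed input makes the behaviour easier to check. The following is a minimal sketch of a variant that also returns the index; the name `find_nearest_with_index` is made up for this example and is not part of the answer. Note that `argmin` returns the first minimum, so on an exact tie the earlier element wins.

```python
import numpy as np

def find_nearest_with_index(array, value):
    """Return (index, element) of the entry of `array` closest to `value`."""
    array = np.asarray(array)
    idx = int(np.abs(array - value).argmin())  # first minimum wins on ties
    return idx, array[idx]

data = [1.0, 4.0, 6.0, 9.0]
idx, nearest = find_nearest_with_index(data, 5.0)
print(idx, nearest)  # 4.0 and 6.0 are equally close; the earlier one is returned
```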

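Solution 2:

If your array is sorted and very large, this is a much faster solution:

import math
import numpy as np

def find_nearest(array,value):
    # `array` must be sorted in ascending order
    idx = np.searchsorted(array, value, side="left")
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
        return array[idx-1]
    else:
        return array[idx]

This scales to very large arrays. If you can't assume the array is already sorted, you can easily modify the above to sort it first. It's overkill for small arrays, but once the array gets large this is much faster.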
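
As a sanity check, the sketch below sorts an unsorted array once and confirms that the `searchsorted` approach returns the same element as the brute-force `argmin` approach. The helper names (`nearest_bruteforce`, `nearest_sorted`) and the random test data are assumptions made for this example only.

```python
import math
import numpy as np

def nearest_bruteforce(array, value):
    array = np.asarray(array)
    return array[np.abs(array - value).argmin()]

def nearest_sorted(array, value):
    # same logic as the searchsorted version; `array` must be sorted ascending
    idx = np.searchsorted(array, value, side="left")
    if idx > 0 and (idx == len(array) or
                    math.fabs(value - array[idx - 1]) < math.fabs(value - array[idx])):
        return array[idx - 1]
    return array[idx]

rng = np.random.default_rng(0)
unsorted = rng.random(1000)
sorted_array = np.sort(unsorted)        # sort once, then reuse for every query
queries = rng.random(50) * 1.2 - 0.1    # includes values outside [0, 1)

agree = all(nearest_sorted(sorted_array, q) == nearest_bruteforce(unsorted, q)
            for q in queries)
print(agree)
```

The two versions can differ only on exact ties, where `argmin` keeps the first match and the sorted version keeps the larger neighbour.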

Solution 3:

With a slight modification, the answer above works with arrays of arbitrary dimension (1d, 2d, 3d, ...):

def find_nearest(a, a0):
    "Element in nd array `a` closest to the scalar value `a0`"
    idx = np.abs(a - a0).argmin()
    return a.flat[idx]

Or, written as a single line:

a.flat[np.abs(a - a0).argmin()]
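
A short usage sketch on a 2-D array; the sample values in `grid` are made up for illustration.

```python
import numpy as np

def find_nearest(a, a0):
    "Element in nd array `a` closest to the scalar value `a0`"
    idx = np.abs(a - a0).argmin()  # argmin without an axis indexes the flattened array
    return a.flat[idx]

grid = np.array([[1.0, 8.0, 3.0],
                 [6.5, 2.0, 9.0]])
result = find_nearest(grid, 7.0)
print(result)
```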

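Solution 4:

Summary of answer: if you have a sorted array, the bisection code (given below) performs the fastest: about 100-1000 times faster for large arrays and about 2-100 times faster for small arrays. It does not require numpy either. If you have an unsorted array, then for a large array consider first using an O(n log n) sort and then bisection; for a small array, method 2 seems to be the fastest.

First you should clarify what you mean by nearest value. Often one wants the interval on an abscissa, e.g. array=[0,0.7,2.1], value=1.95, where the answer would be idx=1. This is the case I suspect you need (otherwise the following can be modified very easily with a follow-up conditional statement once you have found the interval). I will note that the optimal way to do this is with bisection (which I provide first; note that it does not require numpy at all and is faster than using numpy functions because they perform redundant operations). Then I provide a timing comparison against the other methods presented here by other users.

Bisection: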

def bisection(array,value):
    '''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j]
    and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned
    to indicate that ``value`` is out of range below and above respectively.'''
    n = len(array)
    if (value < array[0]):
        return -1
    elif (value > array[n-1]):
        return n
    jl = 0  # Initialize lower
    ju = n-1  # and upper limits.
    while (ju-jl > 1):  # If we are not yet done,
        jm = (ju+jl) >> 1  # compute a midpoint with a bitshift
        if (value >= array[jm]):
            jl = jm  # and replace either the lower limit
        else:
            ju = jm  # or the upper limit, as appropriate.
        # Repeat until the test condition is satisfied.
    if (value == array[0]):  # edge cases at bottom
        return 0
    elif (value == array[n-1]):  # and top
        return n-1
    else:
        return jl
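
For comparison, the same interval lookup can be sketched with the standard-library `bisect` module. This is not the author's code: `interval_index` is a name invented here, and its behaviour differs from `bisection()` for values above the last element, as the docstring says.

```python
import bisect

def interval_index(array, value):
    """Index j of the last element <= value in an ascending sequence.

    Returns -1 if value < array[0]. Unlike bisection() above, a value beyond
    the last element gives len(array) - 1 rather than len(array).
    """
    return bisect.bisect_right(array, value) - 1

array = [0, 0.7, 2.1]
print(interval_index(array, 1.95))  # falls in the interval starting at index 1
print(interval_index(array, -3))    # below the range
```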

Now I'll define the code from the other answers; each of them returns an index:

import math
import numpy as np

def find_nearest1(array,value):
    idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value))
    return idx

def find_nearest2(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return indices

def find_nearest3(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
    out = array[indices]
    return indices

def find_nearest4(array,value):
    idx = (np.abs(array-value)).argmin()
    return idx


def find_nearest5(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

def find_nearest6(array,value):
    xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
    return xi

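Now I'll time the codes.
Note that methods 1, 2, 4 and 5 don't give the interval. As written above, they round to the nearest point in the array (e.g. >=1.5 -> 2). Only methods 3 and 6, and of course bisection, give the interval properly.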

array = np.arange(100000)
val = array[50000]+0.55
print( bisection(array,val))
%timeit bisection(array,val)
print( find_nearest1(array,val))
%timeit find_nearest1(array,val)
print( find_nearest2(array,val))
%timeit find_nearest2(array,val)
print( find_nearest3(array,val))
%timeit find_nearest3(array,val)
print( find_nearest4(array,val))
%timeit find_nearest4(array,val)
print( find_nearest5(array,val))
%timeit find_nearest5(array,val)
print( find_nearest6(array,val))
%timeit find_nearest6(array,val)

(50000, 50000)
100000 loops, best of 3: 4.4 µs per loop
50001
1 loop, best of 3: 180 ms per loop
50001
1000 loops, best of 3: 267 µs per loop
[50000]
1000 loops, best of 3: 390 µs per loop
50001
1000 loops, best of 3: 259 µs per loop
50001
1000 loops, best of 3: 1.21 ms per loop
[50000]
1000 loops, best of 3: 746 µs per loop

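In the run above on the large array, bisection takes 4.4 µs, against 259 µs for the fastest numpy variant (about 60 times slower) and 1.21 ms for the slowest numpy variant (about 275 times slower); the pure-Python find_nearest1 takes 180 ms. For smaller arrays bisection is about 2-100 times faster.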
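
`%timeit` only works inside IPython. The sketch below shows one way to reproduce this kind of comparison in a plain script with the standard `timeit` module, and makes the "nearest point" versus "interval" distinction concrete. The two helper functions are simplified stand-ins written for this example, not the methods timed above, and no timings are quoted because they depend on the machine.

```python
import timeit
import numpy as np

array = np.arange(100000)
val = array[50000] + 0.55

def nearest_index(array, value):
    # rounds to the closest element
    return int(np.abs(array - value).argmin())

def interval_index(array, value):
    # index of the last element <= value (array must be sorted ascending)
    return int(np.searchsorted(array, value, side="right")) - 1

print(nearest_index(array, val), interval_index(array, val))

for fn in (nearest_index, interval_index):
    seconds = timeit.timeit(lambda: fn(array, val), number=200) / 200
    print(f"{fn.__name__}: {seconds * 1e6:.1f} µs per call")
```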

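Solution 5:

Here is a fast vectorized version of @Demitri's solution for when you have many `values` to search for (`values` can be a multi-dimensional array):

# `array` must be sorted in ascending order (np.searchsorted requires it)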
def get_closest(array, values):
    # make sure array is a numpy array
    array = np.array(array)

    # get insert positions
    idxs = np.searchsorted(array, values, side="left")
    
    # find indexes where previous index is closer
    prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)])))
    idxs[prev_idx_is_less] -= 1
    
    return array[idxs]

Benchmark

More than 100 times faster than using a for loop with @Demitri's solution:

>>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000)))
139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)]
took 21.4 seconds
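
A small correctness check to go with the benchmark: the sketch below compares a vectorized lookup with a plain loop on a small input. It is a variant written for this example, under the assumption that `array` is sorted ascending; it selects with `np.where` instead of decrementing `idxs` in place, so it also accepts a scalar `values`.

```python
import numpy as np

def get_closest(array, values):
    # `array` sorted ascending, `values` any shape
    array = np.asarray(array)
    values = np.asarray(values)
    idxs = np.searchsorted(array, values, side="left")
    left = array[np.maximum(idxs - 1, 0)]                # neighbour below (clamped)
    right = array[np.minimum(idxs, len(array) - 1)]      # neighbour above (clamped)
    use_left = (idxs == len(array)) | (np.abs(values - left) < np.abs(values - right))
    return np.where(use_left, left, right)

grid = np.linspace(1, 1000, 100)
rng = np.random.default_rng(1)
values = rng.integers(0, 1050, size=(20, 30))

vectorized = get_closest(grid, values)
looped = np.array([grid[np.abs(grid - v).argmin()] for v in values.ravel()])
looped = looped.reshape(values.shape)
print(np.array_equal(vectorized, looped))
```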

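Solution 6:

Here is an extension that finds the nearest vector in an array of vectors.

import numpy as np

def find_nearest_vector(array, value):
  # Euclidean distance from every row of `array` to `value`
  idx = np.linalg.norm(np.asarray(array) - np.asarray(value), axis=1).argmin()
  return array[idx]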

A = np.random.random((10,2))*100
""" A = array([[ 34.19762933,  43.14534123],
   [ 48.79558706,  47.79243283],
   [ 38.42774411,  84.87155478],
   [ 63.64371943,  50.7722317 ],
   [ 73.56362857,  27.87895698],
   [ 96.67790593,  77.76150486],
   [ 68.86202147,  21.38735169],
   [  5.21796467,  59.17051276],
   [ 82.92389467,  99.90387851],
   [  6.76626539,  30.50661753]])"""
pt = [6, 30]  
print(find_nearest_vector(A, pt))
# array([  6.76626539,  30.50661753])
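
If there are several query points, the same Euclidean-distance idea can be applied to all of them at once with broadcasting. This is a sketch only: `find_nearest_vectors` and the sample coordinates are made up for this example, and the intermediate array has shape (n_queries, n_points, dim), so it suits moderately sized inputs.

```python
import numpy as np

def find_nearest_vectors(points, queries):
    """For each row of `queries`, return the row of `points` at the smallest Euclidean distance."""
    points = np.asarray(points, dtype=float)
    queries = np.atleast_2d(np.asarray(queries, dtype=float))
    # pairwise differences: shape (n_queries, n_points, dim)
    diff = queries[:, None, :] - points[None, :, :]
    idx = np.linalg.norm(diff, axis=-1).argmin(axis=1)
    return points[idx]

A = np.array([[34.2, 43.1], [5.2, 59.2], [6.8, 30.5], [82.9, 99.9]])
nearest = find_nearest_vectors(A, [[6, 30], [80, 90]])
print(nearest)
```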

Solution 7:

If you don't want to use numpy, this will do it:

def find_nearest(array, value):
    n = [abs(i-value) for i in array]
    idx = n.index(min(n))
    return array[idx]
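
The same pure-Python idea can also be written with the built-in `min` and a `key` function, which avoids building the intermediate list of distances. A minimal sketch (the sample numbers are arbitrary):

```python
def find_nearest(values, target):
    """Closest element of any iterable of numbers, without numpy."""
    return min(values, key=lambda x: abs(x - target))

closest = find_nearest([4, 1, 88, 44, 3], 5)
print(closest)
```

Like the list-based version, this returns the first of several equally close elements.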

Solution 8:

Here is a version that handles a non-scalar "values" array:

import numpy as np

def find_nearest(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return array[indices]

Or a version that returns a numeric type (e.g. int, float) if the input is a scalar:

def find_nearest(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    out = array[indices]
    return out if len(out) > 1 else out[0]
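
A usage sketch for the second variant, showing the scalar and sequence cases side by side. The sample data is made up. Keep in mind that `np.subtract.outer` builds a `len(array)` by `len(values)` intermediate, so memory grows with both inputs.

```python
import numpy as np

def find_nearest(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    out = array[indices]
    return out if len(out) > 1 else out[0]

array = np.array([10, 20, 30, 40])
single = find_nearest(array, 26)         # scalar in, scalar out
several = find_nearest(array, [12, 37])  # sequence in, array out
print(single, several)
```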

Solution 9:

Here is a scipy version of @Ari Onasafari's answer, "find the nearest vector in an array of vectors":

In [1]: from scipy import spatial

In [2]: import numpy as np

In [3]: A = np.random.random((10,2))*100

In [4]: A
Out[4]:
array([[ 68.83402637,  38.07632221],
       [ 76.84704074,  24.9395109 ],
       [ 16.26715795,  98.52763827],
       [ 70.99411985,  67.31740151],
       [ 71.72452181,  24.13516764],
       [ 17.22707611,  20.65425362],
       [ 43.85122458,  21.50624882],
       [ 76.71987125,  44.95031274],
       [ 63.77341073,  78.87417774],
       [  8.45828909,  30.18426696]])

In [5]: pt = [6, 30]  # <-- the point to find

In [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point 
Out[6]: array([  8.45828909,  30.18426696])

#how it works!
In [7]: distance,index = spatial.KDTree(A).query(pt)

In [8]: distance # <-- The distances to the nearest neighbors
Out[8]: 2.4651855048258393

In [9]: index # <-- The locations of the neighbors
Out[9]: 9

#then 
In [10]: A[index]
Out[10]: array([  8.45828909,  30.18426696])

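Solution 10:

For large arrays, the (excellent) answer given by @Demitri is far faster than the answer currently marked as best. I've adapted his exact algorithm in the following two ways:

1. The function below works whether or not the input array is sorted.
2. The function below returns the index of the input array corresponding to the closest value, which is somewhat more general.

Note that the function below also handles a specific edge case that would lead to a bug in the original function written by @Demitri. Otherwise, my algorithm is identical to his.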

def find_idx_nearest_val(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

Solution 11:

I think the most Pythonic way would be:

import numpy as np

num = 65 # Input number
array = np.random.random((10))*100 # Given array
nearest_idx = np.where(abs(array-num)==abs(array-num).min())[0] # If you want the index of the element of array (array) nearest to the given number (num)
nearest_val = array[abs(array-num)==abs(array-num).min()] # If you directly want the element of array (array) nearest to the given number (num)

This is the basic code. You can wrap it in a function if you want.
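
Wrapped as a function, the approach might look like the sketch below. The name `nearest_indices` and the sample data are assumptions for this example. One property worth noting is that the `==` comparison returns every index that ties for the minimum distance, not just the first.

```python
import numpy as np

def nearest_indices(array, num):
    """Indices of every element of `array` at the minimum distance from `num`."""
    dist = np.abs(np.asarray(array) - num)   # compute the distances once
    return np.where(dist == dist.min())[0]

array = np.array([60, 70, 10, 65, 64])
print(nearest_indices(array, 65))  # a single closest element
print(nearest_indices(array, 62))  # 60 and 64 are equally close
```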

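Solution 12:

All the answers are helpful for gathering the information needed to write efficient code. However, I wrote a small Python script to optimize for various cases. The best case is when the provided array is sorted. If you are searching for the index of the nearest point to a single specified value, the bisect module is the most time efficient. When searching for the indices corresponding to an array of values, numpy searchsorted is the most efficient.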

import numpy as np
import bisect
xarr = np.random.rand(int(1e7))

srt_ind = xarr.argsort()
xar = xarr.copy()[srt_ind]
xlist = xar.tolist()
bisect.bisect_left(xlist, 0.3)

In [63]: %time bisect.bisect_left(xlist, 0.3)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 22.2 µs

np.searchsorted(xar, 0.3, side="left")

In [64]: %time np.searchsorted(xar, 0.3, side="left")
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 98.9 µs

randpts = np.random.rand(1000)
np.searchsorted(xar, randpts, side="left")

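%time np.searchsorted(xar, randpts, side="left")
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 1.2 ms

If we follow the multiplicative rule, numpy should take about 100 ms for the 1000 points (1000 × 98.9 µs). It actually takes 1.2 ms, which is about 83 times faster.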
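
`bisect_left` only gives an insertion position, so one more comparison is needed to turn it into the index of the nearest element. The following is a minimal sketch of that step for a sorted Python list; `nearest_index_sorted` and the sample list are invented for this example.

```python
import bisect

def nearest_index_sorted(sorted_values, value):
    """Index of the element of an ascending list closest to `value` (lower index on ties)."""
    pos = bisect.bisect_left(sorted_values, value)
    if pos == 0:
        return 0
    if pos == len(sorted_values):
        return len(sorted_values) - 1
    before, after = sorted_values[pos - 1], sorted_values[pos]
    return pos - 1 if value - before <= after - value else pos

xlist = [0.05, 0.2, 0.29, 0.33, 0.9]
print(nearest_index_sorted(xlist, 0.3))
```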

Solution 13:

Here is a vectorized version of unutbu's answer:

import numpy as np
import matplotlib.pyplot as plt

def find_nearest(array, values):
    array = np.asarray(array)

    # the last dim must be 1 to broadcast in (array - values) below.
    values = np.expand_dims(values, axis=-1)

    # subtract in float64 so unsigned inputs (e.g. uint8 images) don't wrap around
    indices = np.abs(array.astype(np.float64) - values).argmin(axis=-1)

    return array[indices]


image = plt.imread('example_3_band_image.jpg')

print(image.shape) # should be (nrows, ncols, 3)

quantiles = np.linspace(0, 255, num=2 ** 2, dtype=np.uint8)

quantiled_image = find_nearest(quantiles, image)

print(quantiled_image.shape) # should be (nrows, ncols, 3)
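
The image file is not needed to try the idea. The sketch below runs the same function on a tiny hand-made uint8 array. One assumption is added here: the subtraction is done in float64, because subtracting uint8 arrays directly wraps around modulo 256 and can select the wrong level.

```python
import numpy as np

def find_nearest(array, values):
    array = np.asarray(array)
    values = np.expand_dims(values, axis=-1)  # trailing axis of length 1 for broadcasting
    # subtract in float64 so unsigned integers cannot wrap around
    indices = np.abs(array.astype(np.float64) - values).argmin(axis=-1)
    return array[indices]

levels = np.linspace(0, 255, num=4, dtype=np.uint8)   # 0, 85, 170, 255
pixels = np.array([[10, 100], [200, 250]], dtype=np.uint8)
quantized = find_nearest(levels, pixels)
print(quantized)
```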

Solution 14:

This may help for ndarrays:

def find_nearest(X, value):
    return X[np.unravel_index(np.argmin(np.abs(X - value)), X.shape)]

Solution 15:

For anyone looking for the k nearest values, here is a modification of the accepted answer:

import numpy as np
def find_nearest(array, value, k):
    array = np.asarray(array)
    idx = np.argsort(abs(array - value))[:k]
    return array[idx]

See also:
https://stackoverflow.com/a/66937734/11671779
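
For large arrays and small `k`, a full `argsort` does more work than necessary. The sketch below swaps in `np.argpartition`, which only separates the k smallest distances, and then sorts just those k. This is an alternative technique, not the code from the answer; `find_k_nearest` is a made-up name, and the sketch assumes a 1-D input and `k >= 1`.

```python
import numpy as np

def find_k_nearest(array, value, k):
    """The k elements of a 1-D `array` closest to `value`, nearest first (k >= 1)."""
    array = np.asarray(array)
    dist = np.abs(array - value)
    k = min(k, array.size)
    candidates = np.argpartition(dist, k - 1)[:k]        # k smallest distances, unordered
    ordered = candidates[np.argsort(dist[candidates])]   # sort only those k
    return array[ordered]

result = find_k_nearest([1.0, 9.0, 4.0, 6.5, 5.2], 5.0, k=3)
print(result)
```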

Solution 16:

For a 2d array, to determine the i, j position of the nearest element:

import numpy as np
def find_nearest(a, a0):
    idx = (np.abs(a - a0)).argmin()
    w = a.shape[1]
    i = idx // w
    j = idx - i * w
    return a[i,j], i, j

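Solution 17:

Here is a version that works with 2D arrays. It uses scipy's cdist function if the user has it, and a simpler distance calculation if not.

By default the output is the index closest to the value you input, but you can change that with the output keyword to be one of 'index', 'value', or 'both', where 'value' outputs array[index] and 'both' outputs index, array[index].

For very large arrays you may need to use kind='euclidean', because the default scipy cdist function may run out of memory.

This may not be the absolute fastest solution, but it is quite close.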

def find_nearest_2d(array, value, kind='cdist', output='index'):
    # 'array' must be a 2D array
    # 'value' must be a 1D array with 2 elements
    # 'kind' defines what method to use to calculate the distances. Can choose one
    #    of 'cdist' (default) or 'euclidean'. Choose 'euclidean' for very large
    #    arrays. Otherwise, cdist is much faster.
    # 'output' defines what the output should be. Can be 'index' (default) to return
    #    the index of the array that is closest to the value, 'value' to return the
    #    value that is closest, or 'both' to return index,value
    import numpy as np
    if kind == 'cdist':
        try: from scipy.spatial.distance import cdist
        except ImportError:
            print("Warning (find_nearest_2d): Could not import cdist. Reverting to simpler distance calculation")
            kind = 'euclidean'
    index = np.where(np.all(np.asarray(array) == np.asarray(value), axis=1))[0] # Rows that exactly equal 'value', if any
    if index.size == 0:
        if kind == 'cdist': index = np.argmin(cdist([value],array)[0])
        elif kind == 'euclidean': index = np.argmin(np.sum((np.array(array)-np.array(value))**2.,axis=1))
        else: raise ValueError("Keyword 'kind' must be one of 'cdist' or 'euclidean'")
    if output == 'index': return index
    elif output == 'value': return array[index]
    elif output == 'both': return index,array[index]
    else: raise ValueError("Keyword 'output' must be one of 'index', 'value', or 'both'")

Solution 18:

import numpy as np
def find_nearest(array, value):
    array = np.array(array)
    z=np.abs(array-value)
    y= np.where(z == z.min())
    m=np.array(y)
    x=m[0,0]
    y=m[1,0]
    near_value=array[x,y]

    return near_value

array =np.array([[60,200,30],[3,30,50],[20,1,-50],[20,-500,11]])
print(array)
value = 0
print(find_nearest(array, value))

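Solution 19:

This one uses numpy searchsorted and handles any number of queries, so once the input arrays are sorted it is just as fast. It also works on regular grids in 2d, 3d, and so on: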

#!/usr/bin/env python3
# keywords: nearest-neighbor regular-grid python numpy searchsorted Voronoi

import numpy as np

#...............................................................................
class Near_rgrid( object ):
    """ nearest neighbors on a Manhattan aka regular grid
    1d:
    near = Near_rgrid( x: sorted 1d array )
    nearix = near.query( q: 1d ) -> indices of the points x_i nearest each q_i
        x[nearix[0]] is the nearest to q[0]
        x[nearix[1]] is the nearest to q[1] ...
        nearpoints = x[nearix] is near q
    If A is an array of e.g. colors at x[0] x[1] ...,
    A[nearix] are the values near q[0] q[1] ...
    Query points < x[0] snap to x[0], similarly > x[-1].

    2d: on a Manhattan aka regular grid,
        streets running east-west at y_i, avenues north-south at x_j,
    near = Near_rgrid( y, x: sorted 1d arrays, e.g. latitude longitude )
    I, J = near.query( q: nq × 2 array, columns qy qx )
    -> nq × 2 indices of the gridpoints y_i x_j nearest each query point
        gridpoints = np.column_stack(( y[I], x[J] ))  # e.g. street corners
        diff = gridpoints - querypoints
        distances = norm( diff, axis=1, ord= )
    Values at an array A defined at the gridpoints y_i x_j nearest q: A[I,J]

    3d: Near_rgrid( z, y, x: 1d axis arrays ) .query( q: nq × 3 array )

    See Howitworks below, and the plot Voronoi-random-regular-grid.
    """

    def __init__( self, *axes: "1d arrays" ):
        axarrays = []
        for ax in axes:
            axarray = np.asarray( ax ).squeeze()
            assert axarray.ndim == 1, "each axis should be 1d, not %s " % (
                    str( axarray.shape ))
            axarrays += [axarray]
        self.midpoints = [_midpoints( ax ) for ax in axarrays]
        self.axes = axarrays
        self.ndim = len(axes)

    def query( self, queries: "nq × dim points" ) -> "nq × dim indices":
        """ -> the indices of the nearest points in the grid """
        queries = np.asarray( queries ).squeeze()  # or list x y z ?
        if self.ndim == 1:
            assert queries.ndim <= 1, queries.shape
            return np.searchsorted( self.midpoints[0], queries )  # scalar, 0d ?
        queries = np.atleast_2d( queries )
        assert queries.shape[1] == self.ndim, [
                queries.shape, self.ndim]
        return [np.searchsorted( mid, q )  # parallel: k axes, k processors
                for mid, q in zip( self.midpoints, queries.T )]

    def snaptogrid( self, queries: "nq × dim points" ):
        """ -> the nearest points in the grid, 2d [[y_j x_i] ...] """
        ix = self.query( queries )
        if self.ndim == 1:
            return self.axes[0][ix]
        else:
            axix = [ax[j] for ax, j in zip( self.axes, ix )]
            return np.array( axix )


def _midpoints( points: "array-like 1d, *must be sorted*" ) -> "1d":
    points = np.asarray( points ).squeeze()
    assert points.ndim == 1, points.shape
    diffs = np.diff( points )
    assert np.nanmin( diffs ) > 0, "the input array must be sorted, not %s " % (
            points.round( 2 ))
    return (points[:-1] + points[1:]) / 2  # floats

#...............................................................................
Howitworks = """
How Near_rgrid works in 1d:
Consider the midpoints halfway between fenceposts | | |
The interval [left midpoint .. | .. right midpoint] is what's nearest each post --

    |   |       |                     |   points
    | . |   .   |          .          |   midpoints
      ^^^^^^               .            nearest points[1]
            ^^^^^^^^^^^^^^^             nearest points[2]  etc.

2d:
    I, J = Near_rgrid( y, x ).query( q )
    I = nearest in `y`
    J = nearest in `x` independently / in parallel.
    The points nearest [yi xj] in a regular grid (its Voronoi cell)
    form a rectangle [left mid x .. right mid x] × [left mid y .. right mid y]
    (in any norm ?)
    See the plot Voronoi-random-regular-grid.

Notes
-----
If a query point is exactly halfway between two data points,
e.g. on a grid of ints, the lines (x + 1/2) U (y + 1/2),
which "nearest" you get is implementation-dependent, unpredictable.

"""

Murky = """ NaNs in points, in queries ?
"""

__version__ = "2021-10-25 oct  denis-bz-py"
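
The core of the class is the midpoint trick in 1d, which can be sketched in a few lines. The grid and query values below are made up for illustration; the only requirement is that the grid is sorted in ascending order.

```python
import numpy as np

x = np.array([0.0, 1.0, 4.0, 10.0])     # sorted grid points
midpoints = (x[:-1] + x[1:]) / 2        # 0.5, 2.5, 7.0
queries = np.array([-3.0, 0.4, 2.6, 6.9, 7.1, 50.0])

# A query between two consecutive midpoints is nearest to the grid point
# that lies between them, so one searchsorted call gives the nearest index.
nearest_ix = np.searchsorted(midpoints, queries)
print(nearest_ix)
print(x[nearest_ix])
```

Queries below the first grid point or above the last one snap to the ends, matching the docstring of `Near_rgrid`.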

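Solution 20:

Here is a version that works with sorted inputs and finds, for the values in A, the indices of the closest elements in B:

from math import inf

import numba
import numpy as np
import numpy.typing as npt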


@numba.njit
def get_indices_of_closest_questioned_points(
    interogators: npt.NDArray,
    questioned: npt.NDArray,
) -> npt.NDArray:
    """For each element in `interogators` get the index of the closest element in set `questioned`.
    """
    res = np.empty(shape=interogators.shape, dtype=np.uint32)
    N = len(interogators)
    M = len(questioned)
    n = m = 0
    closest_left_to_x = -inf
    while n < N and m < M:
        x = interogators[n]
        y = questioned[m]
        if y < x:
            closest_left_to_x = y
            m += 1
        else:
            res[n] = m - (x - closest_left_to_x < y - x)
            n += 1
    while n < N:
        res[n] = M - 1
        n += 1
    return res

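Sorting is a highly optimized operation that runs in O(n log n) or O(n), depending on the input and the algorithm used. The code above is clearly O(n) as well, and numba makes it run faster than numpy.

Example usage: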

In [12]: get_indices_of_closest_questioned_points(np.array([0,5,10]), np.array([-1,2,6,8,9,10]))
Out[12]: array([0, 2, 5], dtype=uint32)

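The result is 0 2 5 because 0 is closest to -1, which is the 0th element of the second array; 5 is closest to 6, which is the 2nd element of the second array; and so on.

On an exact tie the strict comparison keeps the right-hand neighbour: for the inputs [0] and [-1, 1], the function returns index 1, i.e. the element 1.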
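
To follow the logic without installing numba, here is a plain Python/NumPy rendering of the same two-pointer pass. It is a sketch for illustration only: `closest_indices` is an invented name, both inputs are assumed to be sorted ascending, and `targets` is assumed to be non-empty. Without compilation the loop is slow on large inputs.

```python
import numpy as np

def closest_indices(queries, targets):
    """For each query, the index of the closest target (both arrays sorted ascending)."""
    res = np.empty(len(queries), dtype=np.int64)
    n = m = 0
    left = -np.inf                     # largest target seen so far that is < current query
    while n < len(queries) and m < len(targets):
        x, y = queries[n], targets[m]
        if y < x:
            left = y
            m += 1
        else:
            # step back one index only if the left neighbour is strictly closer
            res[n] = m - 1 if (x - left) < (y - x) else m
            n += 1
    res[n:] = len(targets) - 1         # remaining queries lie beyond the last target
    return res

print(closest_indices(np.array([0, 5, 10]), np.array([-1, 2, 6, 8, 9, 10])))
```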


Solution 21:

Here is a very simple answer that works for arrays of different shapes:

def round_to_nearest_in(a, b):
    n = len(b)
    shp = list(a.shape) + [n]

    broad = np.repeat(a, n).reshape((shp))
    diffs = np.abs(broad - b)

    return b[diffs.argmin(-1)]

For example:

a = np.random.rand(10).reshape((5,2))
b = np.array([-1, 0, 1])
round_to_nearest_in(a, b)

The only caveat is that it creates an intermediate array of size a.size*len(b), which can get quite large for big `a` and `b`.
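
The same result can be obtained with broadcasting instead of `np.repeat` and `reshape`. The sketch below is a variant written for this example with fixed sample data. It still materialises the `a.shape + (len(b),)` array of differences, so the memory caveat applies to it as well.

```python
import numpy as np

def round_to_nearest_in(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    diffs = np.abs(a[..., None] - b)   # shape a.shape + (len(b),)
    return b[diffs.argmin(axis=-1)]

a = np.array([[0.1, 0.8], [-0.7, 0.4], [2.0, -3.0]])
b = np.array([-1, 0, 1])
rounded = round_to_nearest_in(a, b)
print(rounded)
```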