Non-alphanumeric list order from os.listdir()


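Problem description:

I often use Python to process directories of data. Recently, I noticed that the default order of the lists has become almost meaningless. For example, if my current directory contains the subdirectories run01, run02, ... run19, run20, and I generate a list with the following command:

dir = os.listdir(os.getcwd())

then I usually get a list in this order:

dir = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08', ... ]

And so on. The order used to be alphanumeric, but this new order has stuck with me ever since.

What determines the (displayed) order of these lists?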

Solution 1:

You can use the built-in sorted function to sort the strings however you want. Based on your description,

sorted(os.listdir(whatever_directory))

Alternatively, you can use the list's .sort method:

lst = os.listdir(whatever_directory)
lst.sort()

I think that should do the trick.

Note that the order in which os.listdir returns file names probably depends entirely on your file system.
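A minimal check of the advice above, assuming a throwaway directory created with tempfile (the runNN names mirror the question; sorted_listing is a hypothetical helper name). The raw listing may come back in any order, but sorted() always gives plain lexicographic order, which matches numeric order here only because the numbers are zero-padded to the same width.

```python
import os
import tempfile

def sorted_listing(path):
    """Return the entries of `path` in lexicographic (code point) order."""
    return sorted(os.listdir(path))

with tempfile.TemporaryDirectory() as tmp:
    # Create run01 .. run20 as subdirectories, mirroring the question.
    for i in range(1, 21):
        os.mkdir(os.path.join(tmp, f"run{i:02d}"))
    raw = os.listdir(tmp)          # order depends on the file system
    ordered = sorted_listing(tmp)  # run01, run02, ..., run20
    print(ordered[:3])  # ['run01', 'run02', 'run03']
```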

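Solution 2:

I think the order has to do with the way files are indexed on your file system. If you really want it to follow a particular order, you can always sort the list after getting the files.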

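Solution 3:

According to the documentation:

os.listdir(path)
Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory.

The order is not reliable; it is an artifact of the file system.

To sort the result, use sorted(os.listdir(path)).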

Solution 4:

For whatever reason, Python does not come with a built-in natural sort (i.e. 1, 2, 10 rather than 1, 10, 2), so you have to write it yourself:

import re
def sorted_alphanumeric(data):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
    return sorted(data, key=alphanum_key)

You can now use this function to sort your list:

dirlist = sorted_alphanumeric(os.listdir(...))
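For example, here is a self-contained sketch with the function from above, its lambdas turned into named inner helpers (the sample names are made up), showing that unpadded numbers now sort numerically and case is ignored, unlike plain sorted():

```python
import re

def sorted_alphanumeric(data):
    """Sort strings so that embedded digit runs compare as integers."""
    def convert(text):
        # Digit runs become ints; everything else is compared lowercased.
        return int(text) if text.isdigit() else text.lower()

    def alphanum_key(key):
        # The capturing group keeps the digit runs: 'run10' -> ['run', '10', ''].
        return [convert(c) for c in re.split('([0-9]+)', key)]

    return sorted(data, key=alphanum_key)

names = ['run10', 'run2', 'run1', 'Run3']
print(sorted_alphanumeric(names))  # ['run1', 'run2', 'Run3', 'run10']
print(sorted(names))               # ['Run3', 'run1', 'run10', 'run2']
```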

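Problem:
If you use the function above to sort strings (for example, folder names) and want them sorted the way Windows Explorer sorts them, it will not work correctly in some edge cases.

If your folder names contain certain "special" characters, this sort function will return incorrect results on Windows. For example, it sorts 1, !1, !a, a, whereas Windows Explorer sorts !1, 1, !a, a.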
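The edge case is easy to reproduce. This sketch re-declares the same key used by sorted_alphanumeric() so it runs on its own; printing the keys shows why '1' comes first: its split starts with an empty string, which sorts before '!'.

```python
import re

def alphanum_key(key):
    # Same key as sorted_alphanumeric(): digit runs as ints, the rest lowercased.
    return [int(c) if c.isdigit() else c.lower() for c in re.split('([0-9]+)', key)]

items = ['a', '!a', '!1', '1']
print([alphanum_key(s) for s in items])
# [['a'], ['!a'], ['!', 1, ''], ['', 1, '']]
print(sorted(items, key=alphanum_key))  # ['1', '!1', '!a', 'a']
```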

So if you want to sort the way Windows Explorer does from Python, you have to call the built-in Windows function StrCmpLogicalW via ctypes (this of course does not work on Unix):

from ctypes import wintypes, windll
from functools import cmp_to_key

def winsort(data):
    _StrCmpLogicalW = windll.Shlwapi.StrCmpLogicalW
    _StrCmpLogicalW.argtypes = [wintypes.LPWSTR, wintypes.LPWSTR]
    _StrCmpLogicalW.restype  = wintypes.INT

    cmp_fnc = lambda psz1, psz2: _StrCmpLogicalW(psz1, psz2)
    return sorted(data, key=cmp_to_key(cmp_fnc))

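This function is slightly slower than sorted_alphanumeric().

Bonus: winsort can also sort full paths on Windows.

Alternatively, especially if you use Unix, you can use the natsort library (pip install natsort) to sort full paths the correct way (meaning subfolders end up in the right place).

You can use it like this to sort full paths:

from natsort import natsorted, ns
dirlist = natsorted(dirlist, alg=ns.PATH | ns.IGNORECASE)

Since version 7.1.0, natsort provides os_sorted, which internally uses either the Windows API mentioned above or Linux sorting; it should be used instead of natsorted().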

Solution 5:

I think the order is determined by ASCII values by default. A workaround for this problem is:

dir = sorted(os.listdir(os.getcwd()), key=len)
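Note that key=len on its own only separates names by length. Python's sort is stable, so names of equal length (run01 through run20 are all five characters) keep whatever order os.listdir returned. A sketch of the usual refinement, assuming the names share a common prefix and differ only in an unpadded number: sort by (length, name). The helper name sort_by_length_then_name is hypothetical.

```python
def sort_by_length_then_name(names):
    # Shorter names first, ties broken lexicographically. This matches numeric
    # order only when names share a prefix and the number is the only variable part.
    return sorted(names, key=lambda name: (len(name), name))

names = ['run10', 'run2', 'run1', 'run11', 'run9']
print(sorted(names, key=len))           # ['run2', 'run1', 'run9', 'run10', 'run11']
print(sort_by_length_then_name(names))  # ['run1', 'run2', 'run9', 'run10', 'run11']
```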

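Solution 6:

Use the natsort library:

On Ubuntu and other Debian-based distributions, install the library with:

Python 2

sudo pip install natsort

Python 3

sudo pip3 install natsort

Details on how to use this library can be found in its documentation.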

from natsort import natsorted

files = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08']
natsorted(files)

[out]:
['run01', 'run08', 'run11', 'run12', 'run13', 'run14', 'run18']

This is not a duplicate of the other answer: natsort was added to that answer as an edit on 2020-01-27.

Solution 7:

aaa = ['row_163.pkl', 'row_394.pkl', 'row_679.pkl', 'row_202.pkl', 'row_1449.pkl', 'row_247.pkl', 'row_1353.pkl', 'row_749.pkl', 'row_1293.pkl', 'row_1304.pkl', 'row_78.pkl', 'row_532.pkl', 'row_9.pkl', 'row_1435.pkl']
sorted(aaa, key=lambda x: int(os.path.splitext(x.split('_')[1])[0]))

In my case the names look like row_163.pkl, so besides os.path.splitext('row_163.pkl'), which returns ('row_163', '.pkl'), I also need to split on '_'.
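The same key, written as a named function so it is easier to read and test. row_number is a hypothetical helper name, and it assumes every name has the form prefix_NUMBER.ext with exactly one underscore; it applies splitext before split, which gives the same result as the one-liner above.

```python
import os

def row_number(filename):
    # 'row_163.pkl' -> 'row_163' (splitext) -> '163' (split on '_') -> 163
    stem, _ext = os.path.splitext(filename)
    return int(stem.split('_')[1])

files = ['row_163.pkl', 'row_1449.pkl', 'row_9.pkl', 'row_78.pkl']
print(sorted(files, key=row_number))
# ['row_9.pkl', 'row_78.pkl', 'row_163.pkl', 'row_1449.pkl']
```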

But depending on your requirements, you can do something like

sorted(aa, key=lambda x: (int(re.sub(r'\D', '', x)), x))

where

aa = ['run01', 'run08', 'run11', 'run12', 'run13', 'run14', 'run18']
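A runnable version of that (number, name) key. The pattern has to be r'\D' (any non-digit); a bare 'D' would only remove capital D characters. The sketch assumes every name contains at least one digit (int('') raises ValueError), and because it concatenates all digits in a name, it suits names with a single number. digits_then_name is a hypothetical helper name.

```python
import re

def digits_then_name(name):
    # Strip every non-digit, compare the remaining number first,
    # and fall back to the full name to break ties deterministically.
    return (int(re.sub(r'\D', '', name)), name)

aa = ['run18', 'run01', 'run8', 'run11']
print(sorted(aa, key=digits_then_name))  # ['run01', 'run8', 'run11', 'run18']
```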

You can also get the directory listing itself with sorted(os.listdir(path)).

For names like 'run01.txt' or 'run01.csv', you can do:

sorted(files, key=lambda x: int(re.sub(r'\D', '', os.path.splitext(x)[0])))

Solution 8:

This may just be the order that C's readdir() returns. Try running this C program:

#include <dirent.h>
#include <stdio.h>

int main(void){
   DIR *dirp;
   struct dirent* de;
   dirp = opendir(".");
   while(de = readdir(dirp)) // Yes, one '='.
        printf("%s\n", de->d_name);
   closedir(dirp);
   return 0;
}

The build command should be something like gcc -o foo foo.c.

PS: I just ran this and your Python code, and both gave me sorted output, so I can't reproduce what you're seeing.
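For comparison, the same unsorted listing can be produced from Python with os.scandir, which on POSIX systems also iterates the directory with readdir(), so it should report entries in the same raw order as the C program (assuming the directory is not modified in between). raw_listing is a hypothetical helper name.

```python
import os

def raw_listing(path="."):
    # Names in whatever order the OS returns them; no sorting is applied.
    # Like os.listdir, scandir skips the '.' and '..' entries.
    with os.scandir(path) as it:
        return [entry.name for entry in it]

if __name__ == "__main__":
    for name in raw_listing("."):
        print(name)
```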

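Solution 9:

From the documentation:

The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.

In other words, the order depends on the OS and file system, carries no particular meaning, and is not guaranteed to be anything specific. As many answers point out, you can sort the retrieved list if you need a particular order.

Cheers :)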

Solution 10:

The suggested combination of os.listdir and sorted produces the same result as the ls -l command on Linux. The following example verifies this assumption:

user@user-PC:/tmp/test$ touch 3a 4a 5a b c d1 d2 d3 k l p0 p1 p3 q 410a 409a 408a 407a
user@user-PC:/tmp/test$ ls -l
total 0
-rw-rw-r-- 1 user user 0 Feb  15 10:31 3a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 407a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 408a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 409a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 410a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 4a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 5a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 b
-rw-rw-r-- 1 user user 0 Feb  15 10:31 c
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d1
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d2
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d3
-rw-rw-r-- 1 user user 0 Feb  15 10:31 k
-rw-rw-r-- 1 user user 0 Feb  15 10:31 l
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p0
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p1
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p3
-rw-rw-r-- 1 user user 0 Feb  15 10:31 q

user@user-PC:/tmp/test$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.listdir( './' )
['d3', 'k', 'p1', 'b', '410a', '5a', 'l', 'p0', '407a', '409a', '408a', 'd2', '4a', 'p3', '3a', 'q', 'c', 'd1']
>>> sorted( os.listdir( './' ) )
['3a', '407a', '408a', '409a', '410a', '4a', '5a', 'b', 'c', 'd1', 'd2', 'd3', 'k', 'l', 'p0', 'p1', 'p3', 'q']
>>> exit()
user@user-PC:/tmp/test$

So for anyone who wants to reproduce the output of the well-known ls -l command in their Python code, sorted(os.listdir(DIR)) works quite well.

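Solution 11:

I found that sorted does not always behave the way I expected. For example, with a directory like the one below, sorted gave me a very strange result: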

>>> os.listdir(pathon)
['2', '3', '4', '5', '403', '404', '407', '408', '410', '411', '412', '413', '414', '415', '416', '472']
>>> sorted([ f for f in os.listdir(pathon)])
['2', '3', '4', '403', '404', '407', '408', '410', '411', '412', '413', '414', '415', '416', '472', '5']

It seems to compare the first character first, and whichever name has the largest first character ends up last.
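That is ordinary lexicographic string comparison: '403' < '5' because '4' < '5', regardless of length. When every name is a plain integer, as in this directory, converting in the key fixes it. This is a sketch on a subset of the names above; int() raises ValueError on any non-numeric name.

```python
names = ['5', '403', '2', '472', '4', '3', '410']
print(sorted(names))           # ['2', '3', '4', '403', '410', '472', '5']
print(sorted(names, key=int))  # ['2', '3', '4', '5', '403', '410', '472']
```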

Solution 12:

In [6]: os.listdir?

Type:       builtin_function_or_method
String Form:<built-in function listdir>
Docstring:
listdir(path) -> list_of_strings
Return a list containing the names of the entries in the directory.
path: path of directory to list
The list is in **arbitrary order**.  It does not include the special
entries '.' and '..' even if they are present in the directory.

Solution 13:

To answer the question directly, you can use the following code:

dir = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08']
for file in sorted(dir, key=lambda x:int(x.replace('run', ''))):
    print(file)

It will print:

run01
run08
run11
run12
run13
run14
run18

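This uses Python's built-in sorted function, with the key argument specifying the sort criterion: each list item with "run" removed, converted to an integer.

Solution 14:

The following shows how to sort a directory the way Windows does

(using StrCmpLogicalW, the comparator function built into Windows):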

import os, ctypes, functools

natSort = functools.cmp_to_key(ctypes.windll.shlwapi.StrCmpLogicalW) # Natural Sorting

def listDir(path): # list a directory exactly like Windows Explorer
    items = os.listdir(path) # unsorted listdir
    items.sort(key=natSort) # sorted listdir (folders just need grouping before files)
    folders,files = [],[] # separate folders and files
    for i in items: (folders if os.path.isdir(os.path.join(path,i)) else files).append(i)
    return folders + files # put folders before files

items = listDir(".")
for i in items: print(i)

All you need to do is define the natSort key and use it.

In Python 2, you can pass cmp=ctypes.windll.shlwapi.StrCmpLogicalW directly.
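StrCmpLogicalW is a three-way comparator (negative, zero, or positive), which is why Python 3 needs functools.cmp_to_key to turn it into a key. The same wrapping works for any Python comparator. The sketch below uses a hypothetical, portable natural_cmp so the mechanism can be tried on any OS; it is not the Windows algorithm, so results can differ for punctuation.

```python
import functools
import re

def natural_cmp(a, b):
    """Three-way compare: negative if a < b, zero if equal, positive if a > b."""
    def key(s):
        # Digit runs as ints, other text lowercased (types alternate str/int).
        return [int(c) if c.isdecimal() else c.lower() for c in re.split(r'(\d+)', s)]
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)

natSort = functools.cmp_to_key(natural_cmp)  # plays the same role as in listDir()

items = ['file10.txt', 'File2.txt', 'file1.txt']
items.sort(key=natSort)
print(items)  # ['file1.txt', 'File2.txt', 'file10.txt']
```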

Here is a script that reads all subdirectories:

import os, ctypes, functools

def cleanPath(path): return os.path.normpath(path).replace("\\", "/")
natSort = functools.cmp_to_key(ctypes.windll.shlwapi.StrCmpLogicalW)
noFilter = lambda i: True

def listDir(folder='.',fileFilter=noFilter): # returns folders and files in folder
    folder = cleanPath(folder)+"/" # clean path with slash for joining
    folders,files = [],[]; items = os.listdir(folder); items.sort(key=natSort)
    if folder=="./": folder="" # don't need to start paths with ./
    for i in items:
        fi = folder+i # path to our item
        if os.path.isdir(fi): folders+=[fi] # folders are pathed
        elif fileFilter(i): files+=[i] # files aren't pathed
    return folders,files # return folders and files separately

def listDirs(root,fileFilter=noFilter): # returns all dirs containing files
    allDirs = []; walk = [cleanPath(root)] # clean path (no ending slash)
    while walk:
        folder = walk.pop(0)+"/"; # add slash so we can join with files
        folders,files = listDir(folder,fileFilter)
        if folders: walk = folders+walk # walk subdirs first
        if files: allDirs+=[(folder,files)] # folder path and its files
    return allDirs

# get all mp4 files in videos folder and subfolders
folders = listDirs("videos",lambda i: i.endswith(".mp4"))

for folder,files in folders:
    print("--- %s ---"%folder)
    for file in files: print(file) # pathedFile = folder+file
    print("")
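On systems without windll, a similar traversal can be sketched with os.walk. With the default topdown=True, the walk descends into subdirectories in the order of the dirnames list, so sorting that list in place controls the traversal order, giving the same parent-first, depth-first order as the script above. The natural_key and list_files_sorted names are hypothetical, and natural ordering only approximates Explorer's StrCmpLogicalW.

```python
import os
import re

def natural_key(name):
    # Digit runs compare as integers, other text case-insensitively.
    return [int(c) if c.isdecimal() else c.lower() for c in re.split(r'(\d+)', name)]

def list_files_sorted(root, file_filter=lambda name: True):
    """Return (folder, [files]) pairs, with folders and files in natural order."""
    result = []
    for folder, dirnames, filenames in os.walk(root):
        dirnames.sort(key=natural_key)  # in-place sort steers the top-down walk
        files = sorted((f for f in filenames if file_filter(f)), key=natural_key)
        if files:
            result.append((folder, files))
    return result

# e.g. all .mp4 files under "videos" (os.walk yields nothing if it is missing)
for folder, files in list_files_sorted("videos", lambda f: f.endswith(".mp4")):
    print("--- %s ---" % folder)
    for file in files:
        print(file)
    print("")
```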

Solution 15:

By default, ls lists files sorted by name. (With options, ls can sort by date, size, etc.)

import os
files = list(os.popen("ls"))
files = [file.strip("\n") for file in files]

Using ls gives better performance when the directory contains many files.
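A sketch of the same idea using subprocess.run instead of os.popen, which avoids a shell and lets you pass the directory explicitly. Assumptions: a POSIX ls is on PATH, and no file name contains a newline (the output is split on line breaks). ls sorts according to the current locale's collation (LC_COLLATE), so its order can differ from Python's sorted() for mixed case or punctuation. ls_names is a hypothetical helper name.

```python
import subprocess

def ls_names(path="."):
    # -1 forces one name per line; cwd makes ls list `path` itself.
    result = subprocess.run(["ls", "-1"], cwd=path, capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()

print(ls_names("."))
```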