Recursively counting files in a Linux directory


Question:

How can I recursively count the files in a Linux directory?

I found this:

find DIR_NAME -type f ¦ wc -l

But when I run it, it returns the following error.

find: paths must precede expression: ¦

Solution 1:

This should work:

find DIR_NAME -type f | wc -l

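Explanation:

-type f includes only files.
| (and not ¦) redirects the find command's standard output to the wc command's standard input.
wc (short for word count) counts newlines, words, and bytes in its input.
-l counts only newlines.

Notes:

Replace DIR_NAME with . to execute the command in the current folder.
You can also remove the -type f to include directories (and symlinks) in the count.
This command may overcount if filenames can contain newline characters.

Explanation of why your example does not work:

In the command you showed, you do not use the "pipe" (|) to connect two commands, but the broken bar (¦), which the shell does not recognize as a command or anything similar. That is why you get that error message.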
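
The note about -type f can be made concrete with a small sketch that counts each kind of entry separately. The function name count_type is hypothetical, chosen only for this illustration; the type letters f, d and l are the standard find ones for regular files, directories and symlinks.

```shell
# count_type DIR TYPE: count entries of one find type under DIR.
# TYPE is a find type letter: f (regular file), d (directory), l (symlink).
# Like the command above, this overcounts names that contain newlines.
count_type() {
    find "$1" -type "$2" | wc -l
}

# Possible usage: report each kind for the current directory.
# for t in f d l; do printf '%s: %s\n' "$t" "$(count_type . "$t")"; done
```

Note that -type d also counts the starting directory itself.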

Solution 2:

For the current directory:

find -type f | wc -l

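Solution 3:

If you want a breakdown of how many files are in each directory under the current directory:

for i in */ .*/ ; do
    echo -n "$i: " ;
    (find "$i" -type f | wc -l) ;
done

That can go all on one line, of course. The parentheses clarify whose output wc -l is supposed to be watching (find "$i" -type f in this case). Note that, depending on the shell, .*/ can also match ./ and ../.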
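
A variant of the loop above that avoids matching ./ and ../ might look like the following. It is only a sketch: the function name count_per_dir is made up for this example, and the pattern .[!.]* deliberately skips names beginning with two dots.

```shell
# count_per_dir [DIR]: print "NAME: COUNT" for each immediate subdirectory
# of DIR (default: current directory), including hidden ones.
count_per_dir() {
    local base=${1:-.} d
    for d in "$base"/*/ "$base"/.[!.]*/; do
        [ -d "$d" ] || continue   # skip a pattern that matched nothing
        printf '%s: %s\n' "$d" "$(find "$d" -type f | wc -l)"
    done
}
```

Calling count_per_dir with no argument reports on the current directory, like the original loop.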

Solution 4:

On my computer, rsync is a little faster than the find | wc -l in the accepted answer:

$ rsync --stats --dry-run -ax /path/to/dir /tmp

Number of files: 173076
Number of files transferred: 150481
Total file size: 8414946241 bytes
Total transferred file size: 8414932602 bytes

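The second line has the number of files, 150,481 in the above example. As a bonus you get the total size as well (in bytes).

Remarks:

The first line is a count of files, directories, symlinks, etc. all together, which is why it is bigger than the second line.
The --dry-run (or -n for short) option is important so that no files are actually transferred!
I used the -x option ("don't cross filesystem boundaries"), which means that if you run it on / and you have external hard disks attached, it will only count the files on the root partition.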

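Solution 5:

You can use

$ tree

after installing the tree package with

$ sudo apt-get install tree

(on a Debian / Mint / Ubuntu Linux machine).

The command shows not only the count of files but also, separately, the count of directories. The option -L can be used to specify the maximum display depth (which, by default, is the maximum depth of the directory tree).

Hidden files can be included too by supplying the -a option.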

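Solution 6:

Since filenames in UNIX may contain newlines (yes, newlines), wc -l might count too many files. I would print a dot for every file and then count the dots:

find DIR_NAME -type f -printf "." | wc -c

Note: the -printf option only works with find from GNU findutils. You may need to install it, on a Mac for instance.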
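
Where -printf is not available, the same dot-counting idea can be sketched with the standard -exec ... {} + form and the printf utility. The function name count_files_portable is hypothetical. The format '%.0s.' prints zero characters of each path argument followed by one dot, and printf reuses the format for every argument it receives.

```shell
# count_files_portable DIR: emit one "." per regular file, then count bytes.
# Newlines in filenames cannot inflate the count because no part of any
# name is ever printed.
count_files_portable() {
    find "$1" -type f -exec printf '%.0s.' {} + | wc -c
}
```

Using + instead of \; lets find pass many paths to each printf invocation, so far fewer processes are started.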

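Solution 7:

Combining several of the answers here, the most useful solution seems to be:

find . -maxdepth 1 -type d -print0 |
xargs -0 -I {} sh -c 'echo -e $(find "{}" -printf "\n" | wc -l) "{}"' |
sort -n

It can handle odd things such as filenames that include spaces, parentheses and even newlines. It also sorts the output by the number of files.

You can increase the number after -maxdepth to get subdirectories counted too. Keep in mind that this can take a long time, particularly if you have a deeply nested directory structure in combination with a high -maxdepth number.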

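Solution 8:

If you want to know how many files and subdirectories exist below the current working directory, you can use this one-liner:

find . -maxdepth 1 -type d -print0 | xargs -0 -I {} sh -c 'echo -e $(find {} | wc -l) {}' | sort -n

This works with the GNU tools; on BSD-based systems (e.g. macOS), just omit the -e from the echo command.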
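
One way to sidestep the echo -e difference altogether is to use printf, and to pass each directory to sh as a positional argument rather than splicing {} into the command string. The following is a rough sketch under those assumptions; the function name count_entries_per_dir is invented for this example.

```shell
# count_entries_per_dir [DIR]: for DIR and each immediate subdirectory,
# print "COUNT PATH" (COUNT includes the directory itself), sorted by COUNT.
count_entries_per_dir() {
    find "${1:-.}" -maxdepth 1 -type d -exec sh -c '
        printf "%s %s\n" "$(find "$1" | wc -l)" "$1"
    ' sh {} \; | sort -n
}
```

Because the path arrives as "$1", directory names containing spaces or quote characters are not reinterpreted by the inner shell.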

Solution 9:

If you need to count a specific file type recursively, you can do:

find YOUR_PATH -name '*.html' -type f | wc -l

wc -l simply prints the number of lines in its input.

If you need to exclude certain folders, use -not -path:

find . -not -path './node_modules/*' -name '*.js' -type f | wc -l
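
With -not -path, find still walks the excluded folder and merely filters out what it finds there. An alternative, sketched below, is -prune, which stops find from descending into the folder at all. The function name count_js_pruned is hypothetical, and unlike the command above this version skips every directory named node_modules at any depth, not only the top-level one.

```shell
# count_js_pruned DIR: count *.js files under DIR without descending into
# any directory named node_modules.
count_js_pruned() {
    find "$1" -name node_modules -type d -prune \
        -o -name '*.js' -type f -print | wc -l
}
```

The explicit -print matters here: without it, the pruned directory names themselves would also be printed and counted.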

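Solution 10:

You can use the command ncdu. It recursively counts how many files a Linux directory contains. Here is an example of the output:

[screenshot: sample ncdu output]

It has a progress bar, which is convenient if you have many files.

To install it on Ubuntu:

sudo apt-get install -y ncdu

Benchmark: I used https://archive.org/details/cv_corpus_v1.tar (380390 files, 11 GB) as the folder whose files had to be counted.

find . -type f | wc -l: around 1m20s to complete
ncdu: around 1m20s to complete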

Solution 11:

tree "$DIR_PATH" | tail -1

Sample output:

5309 directories, 2122 files

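Solution 12:

If you want to avoid error cases, do not let wc -l see filenames containing newlines (it will count each such file as 2 or more files).

For example, consider a case where we have a single file with a single EOL character in its name:

> mkdir emptydir && cd emptydir
> touch $'file with EOL(\n) character in it'
> find -type f
./file with EOL(?) character in it
> find -type f | wc -l
2

Since at least GNU wc does not appear to have an option to read/count a null-terminated list (except from a file), the easiest solution is to not pass it filenames at all, but a static output each time a file is found, e.g. in the same directory as above:

> find -type f -exec printf '\n' \; | wc -l
1

Or, if your find supports it:

> find -type f -printf '\n' | wc -l
1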
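
Although wc cannot count NUL terminators directly, a NUL-terminated list can still be counted by deleting every other byte first. This is a minimal sketch, assuming a find that supports -print0 and a tr that understands the \0 escape; the function name count_files_nul is hypothetical.

```shell
# count_files_nul DIR: count regular files by counting NUL terminators.
# tr -cd '\0' deletes every byte except NUL, so wc -c sees exactly one
# byte per file, whatever characters the names contain.
count_files_nul() {
    find "$1" -type f -print0 | tr -cd '\0' | wc -c
}
```

This sends the full path names through the pipe, so it moves more data than the printf approaches above, but it works with any find that has -print0.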

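Solution 13:

To determine how many files there are in the current directory, enter ls -1 | wc -l. This uses wc to count the number of lines (-l) in the output of ls -1. It does not count dotfiles. Please note that ls -l (that is an "L" rather than a "1" as in the previous examples), which I used in previous versions of this HOWTO, will actually give you a file count one greater than the actual count. Thanks to Kam Nejad for this point.

If you want to count only files and NOT include symbolic links (just an example of what else you could do), you could use ls -l | grep -v ^l | wc -l (that is an "L", not a "1", this time; we want a "long" listing here). grep checks for any line beginning with "l" (indicating a link) and discards that line (-v).

Relative speed: "ls -1 /usr/bin/ | wc -l" takes about 1.03 seconds on an unloaded 486SX25 (/usr/bin/ on this machine has 355 files). "ls -l /usr/bin/ | grep -v ^l | wc -l" takes about 1.19 seconds.

Source: http://www.tldp.org/HOWTO/Bash-Prompt-HOWTO/x700.html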

Solution 14:

find . -type f -name '*.fileextension' | wc -l

Replace . with the directory path and fileextension with the real extension. For example, to find all png files, use '*.png'.

Solution 15:

For directories with spaces in the name ... (based on various answers above), recursively print each directory name along with the number of files within it:

find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n "$i: " ; ls -p "$i" | grep -v / | wc -l ; done

Example (formatted for readability):

pwd
  /mnt/Vancouver/Programming/scripts/claws/corpus

ls -l
  total 8
  drwxr-xr-x 2 victoria victoria 4096 Mar 28 15:02 'Catabolism - Autophagy; Phagosomes; Mitophagy'
  drwxr-xr-x 3 victoria victoria 4096 Mar 29 16:04 'Catabolism - Lysosomes'

ls 'Catabolism - Autophagy; Phagosomes; Mitophagy'/ | wc -l
  138

## 2 dir (one with 28 files; other with 1 file):
ls 'Catabolism - Lysosomes'/ | wc -l
  29

The directory structure is better visualized using tree:

tree -L 3 -F .
  .
  ├── Catabolism - Autophagy; Phagosomes; Mitophagy/
  │   ├── 1
  │   ├── 10
  │   ├── [ ... SNIP! (138 files, total) ... ]
  │   ├── 98
  │   └── 99
  └── Catabolism - Lysosomes/
      ├── 1
      ├── 10
      ├── [ ... SNIP! (28 files, total) ... ]
      ├── 8
      ├── 9
      └── aaa/
          └── bbb

  3 directories, 167 files

man find | grep mindep
  -mindepth levels
    Do not apply any tests or actions at levels less than levels
    (a non-negative integer).  -mindepth 1 means process all files
    except the starting-points.

ls -p | grep -v / (used below) is from answer 2 at https://unix.stackexchange.com/questions/48492/list-only-regular-files-but-not-directories-in-current-directory

find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n "$i: " ; ls -p "$i" | grep -v / | wc -l ; done
./Catabolism - Autophagy; Phagosomes; Mitophagy: 138
./Catabolism - Lysosomes: 28
./Catabolism - Lysosomes/aaa: 1

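Application: I want to find the maximum number of files among several hundred directories (all at depth = 1) [output below again formatted for readability]: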

date; pwd
    Fri Mar 29 20:08:08 PDT 2019
    /home/victoria/Mail/2_RESEARCH - NEWS

time find . -mindepth 1 -type d -print0 | while IFS= read -r -d '' i ; do echo -n "$i: " ; ls -p "$i" | grep -v / | wc -l ; done > ../../aaa
    0:00.03

[victoria@victoria 2_RESEARCH - NEWS]$ head -n5 ../../aaa
    ./RNA - Exosomes: 26
    ./Cellular Signaling - Receptors: 213
    ./Catabolism - Autophagy; Phagosomes; Mitophagy: 138
    ./Stress - Physiological, Cellular - General: 261
    ./Ancient DNA; Ancient Protein: 34

[victoria@victoria 2_RESEARCH - NEWS]$ sed -r 's/(^.*): ([0-9]{1,8}$)/\2: \1/g' ../../aaa | sort -V | (head; echo ''; tail)

    0: ./Genomics - Gene Drive
    1: ./Causality; Causal Relationships
    1: ./Cloning
    1: ./GenMAPP 2
    1: ./Pathway Interaction Database
    1: ./Wasps
    2: ./Cellular Signaling - Ras-MAPK Pathway
    2: ./Cell Death - Ferroptosis
    2: ./Diet - Apples
    2: ./Environment - Waste Management

    988: ./Genomics - PPM (Personalized & Precision Medicine)
    1113: ./Microbes - Pathogens, Parasites
    1418: ./Health - Female
    1420: ./Immunity, Inflammation - General
    1522: ./Science, Research - Miscellaneous
    1797: ./Genomics
    1910: ./Neuroscience, Neurobiology
    2740: ./Genomics - Functional
    3943: ./Cancer
    4375: ./Health - Disease

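sort -V is a natural sort. ... So, the maximum number of files in any of those (Claws Mail) directories is 4375. If I left-pad (https://stackoverflow.com/a/55409116/1904943) those filenames (they are all named numerically, starting with 1, in each directory) to 5 total digits, I should be OK.

Addendum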

$ date; pwd
Tue 14 May 2019 04:08:31 PM PDT
/home/victoria/Mail/2_RESEARCH - NEWS

$ ls | head; echo; ls | tail
Acoustics
Ageing
Ageing - Calorie (Dietary) Restriction
Ageing - Senescence
Agriculture, Aquaculture, Fisheries
Ancient DNA; Ancient Protein
Anthropology, Archaeology
Ants
Archaeology
ARO-Relevant Literature, News

Transcriptome - CAGE
Transcriptome - FISSEQ
Transcriptome - RNA-seq
Translational Science, Medicine
Transposons
USACEHR-Relevant Literature
Vaccines
Vision, Eyes, Sight
Wasps
Women in Science, Medicine

$ find . -type f | wc -l
70214    ## files

$ find . -type d | wc -l
417      ## subdirectories

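Solution 16:

With bash:

Create an array of entries with ( ) and get the count with #.

FILES=(./*); echo ${#FILES[@]}

OK, that does not recursively count files, but I wanted to show the simple option first. A common use case might be creating rollover backups of a file. This will create logfile.1, logfile.2, logfile.3, etc.

CNT=(./logfile*); mv logfile logfile.${#CNT[@]}

Recursive count with bash 4+ globstar enabled (as mentioned by @tripleee). Note that this counts directories as well as files:

shopt -s globstar; FILES=(**/*); echo ${#FILES[@]}

To get the count of files recursively, we can still use find in the same way. Note that the result is split on whitespace, so a name containing spaces is counted more than once.

FILES=(`find . -type f`); echo ${#FILES[@]}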
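
Filling the array from a NUL-delimited stream avoids word splitting on spaces and newlines in filenames. The following is a sketch that assumes bash 4.4 or later (for mapfile -d) and a find with -print0; the function name count_files_array is hypothetical.

```shell
# count_files_array DIR: read find's NUL-delimited output into a bash array
# and print the number of elements. Requires bash 4.4+ for "mapfile -d".
count_files_array() {
    local -a files
    # -d '' makes NUL the delimiter; -t strips it from each element.
    mapfile -d '' -t files < <(find "$1" -type f -print0)
    echo "${#files[@]}"
}
```

Holding every path in memory is only worthwhile if the array is needed afterwards; for a bare count, the pipeline forms above are lighter.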

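Solution 17:

We can use the tree command, which displays all files and folders recursively. It also shows the count of folders and files in the last line of its output.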

$ tree path/to/folder/
path/to/folder/
├── a-first.html
├── b-second.html
├── subfolder
│   ├── readme.html
│   ├── code.cpp
│   └── code.h
└── z-last-file.html

1 directories, 6 files

To get only the last line of the tree output, we can pipe it through tail:

$ tree path/to/folder/ | tail -1
1 directories, 6 files

To install tree, we can use the following command:

$ sudo apt-get install tree

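Solution 18:

There are many correct answers here. Here's another!

find . -type f | sort | uniq -w 10 -c

where . is the folder to look in and 10 is the number of leading characters by which paths are grouped into directories.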
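
Grouping by a fixed number of characters only lines up when the directory names happen to share that length. A possible variant, sketched here, groups by the first path component instead, using cut. The function name count_by_top_dir is hypothetical, files sitting directly in the starting folder are left out, and filenames containing newlines would still distort the result.

```shell
# count_by_top_dir DIR: count regular files per top-level subdirectory of DIR.
# Paths look like ./top/rest..., so field 2 (split on "/") is the
# top-level directory name.
count_by_top_dir() {
    ( cd "$1" && find . -mindepth 2 -type f | cut -d/ -f2 | sort | uniq -c )
}
```

Appending | sort -n to a call orders the directories by file count.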

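Solution 19:

I have written ffcnt to speed up recursive file counting under specific circumstances: rotational disks and filesystems that support extent mapping.

It can be an order of magnitude faster than ls- or find-based approaches, but YMMV.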

Solution 20:

Suppose you want a total file count per directory; try:

for d in `find YOUR_SUBDIR_HERE -type d`; do
   printf '%s - files > ' "$d"
   find "$d" -type f | wc -l
done

For the current directory, try this:

for d in `find . -type d`; do printf '%s - files > ' "$d"; find "$d" -type f | wc -l; done;

If you have names containing spaces, you need to change IFS, like this:

OIFS=$IFS; IFS=$'\n'
for d in `find . -type d`; do printf '%s - files > ' "$d"; find "$d" -type f | wc -l; done
IFS=$OIFS

Solution 21:

This alternative approach, filtering by name pattern, counts all available grub kernel modules:

ls -l /boot/grub/*.mod | wc -l

Solution 22:

Based on the responses and comments above, I have come up with the following to get a listing of file counts. In particular, it combines the solution provided by @Greg Bell with comments from @Arch Stanton and @Schneems.

function countit { find . -maxdepth 1000000 -type d -print0 | while IFS= read -r -d '' i ; do file_count=$(find "$i" -type f | wc -l) ; echo "$file_count: $i" ; done; }; countit | sort -n -r >file-count.txt

Count all files with a given name in the current directory and its subdirectories (replace <enter_filename_here> with the name to match):

function countit { find . -maxdepth 1000000 -type d -print0 | while IFS= read -r -d '' i ; do file_count=$(find "$i" -type f | grep <enter_filename_here> | wc -l) ; echo "$file_count: $i" ; done; }; countit | sort -n -r >file-with-name-count.txt

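Solution 23:

The following solution is particularly useful on SSDs (since it is designed to run fast on them):

You can use gdu. It recursively counts how many files a Linux directory contains. Here is an example of the output (dundee's demo):

[screenshot: sample gdu output]

To install it on Ubuntu:

sudo add-apt-repository ppa:daniel-milde/gdu
sudo apt-get update
sudo apt-get install gdu

See the installation page for other operating systems and other ways to install Gdu.

From the README:

Gdu is intended primarily for SSD disks, where it can fully utilize parallel processing. However, HDDs work as well, but the performance gain is not so huge.

The README points to similar programs:

ncdu - NCurses-based tool written in pure C (LTS) or zig (Stable)
godu - Analyzer with a carousel-like user interface
dua - Tool written in Rust with an interface similar to gdu (and ncdu)
diskus - Very simple but very fast tool written in Rust
duc - Collection of tools with many possibilities for inspecting and visualising disk usage
dust - Tool written in Rust showing tree-like structures of disk usage
pdu - Tool written in Rust showing tree-like structures of disk usage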

Solution 24:

find -type f | wc -l

find . -type f | wc -l

Solution 25:

This works perfectly fine. Simple and short, if you want to count the number of files present in a folder (non-recursively):

ls | wc -l

Solution 26:

ls -l | grep -e -x -e -dr | wc -l

long listing
filter files and directories
count the filtered lines
