Problem description:

I converted a Pandas dataframe to HTML output using the DataFrame.to_html function. When I save the result to a separate HTML file, the file shows truncated output.

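For example, in my TEXT column,

df.head(1) will show

The film was an excellent effort...

instead of

The film was an excellent effort in deconstructing the complex social sentiments of this period.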

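This rendering is fine as a screen-friendly format for a large Pandas dataframe, but I need an HTML file that shows the complete tabular data contained in the dataframe, that is, the latter text element rather than the former snippet.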

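How can I show the complete, non-truncated text for each element of my TEXT column in the HTML version of the data? I imagine the HTML table would have to display long cells to show the complete data, but as far as I understand, only column-width parameters can be passed into the DataFrame.to_html function.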

Solution 1:

Set the display.max_colwidth option to None (or -1 before version 1.0):

pd.set_option('display.max_colwidth', None)

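See the set_option documentation.

For example, in IPython, we see that the information is truncated to 50 characters. Anything beyond that is replaced with an ellipsis:

[Image: truncated result]

If you set the display.max_colwidth option, the information is displayed in full:

[Image: non-truncated result]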
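
Applied to the original question, the same option can be scoped to the to_html call so that the saved file holds the full text. The following is a minimal sketch, not a definitive recipe; the example dataframe and the output file name full_table.html are assumptions made for illustration and do not come from the original post.

```python
import pandas as pd

# Hypothetical example data: the TEXT cell is longer than the default 50-character limit.
long_text = ("The film was an excellent effort in deconstructing the complex "
             "social sentiments of this period.")
df = pd.DataFrame({"TEXT": [long_text]})

# Lift the column-width limit only while rendering; the previous value
# is restored automatically when the with block ends.
with pd.option_context("display.max_colwidth", None):
    html = df.to_html()

# Hypothetical output path.
with open("full_table.html", "w", encoding="utf-8") as f:
    f.write(html)
```

Using option_context here rather than set_option keeps the change local, so other output in the same session keeps its usual formatting.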

Solution 2:

pd.set_option('display.max_columns', None)

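Passing None as the second argument removes the limit on the number of columns, so every column is shown. Note that this controls how many columns are displayed, not the width of each cell.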

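Solution 3:

While pd.set_option('display.max_columns', None) sets the maximum number of columns shown, pd.set_option('display.max_colwidth', None) (-1 before version 1.0) sets the maximum width of each individual field.

For my purposes I wrote a small helper function that prints huge dataframes in full without affecting the rest of the code. It also reformats floats and sets a virtual display width. You can adapt it to your own use case.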

import pandas as pd

def print_full(x):
    # Temporarily lift the display limits
    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.width', 2000)
    pd.set_option('display.float_format', '{:20,.2f}'.format)
    pd.set_option('display.max_colwidth', None)
    try:
        print(x)
    finally:
        # Reset each option to its pandas default, even if printing fails
        pd.reset_option('display.max_rows')
        pd.reset_option('display.max_columns')
        pd.reset_option('display.width')
        pd.reset_option('display.float_format')
        pd.reset_option('display.max_colwidth')

Solution 4:

Jupyter users

Whenever I need this for just one cell, I use the following:

with pd.option_context('display.max_colwidth', None):
    display(df)
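
To see why this only affects one cell, it may help to check the option before, during, and after the with block. This is a small sketch for illustration; the variable names are my own.

```python
import pandas as pd

# Value in effect before the block (50 unless it was changed earlier).
before = pd.get_option("display.max_colwidth")

with pd.option_context("display.max_colwidth", None):
    # Inside the block the limit is lifted.
    inside = pd.get_option("display.max_colwidth")

# After the block the earlier value is back in place.
after = pd.get_option("display.max_colwidth")

print(before, inside, after)
```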

Solution 5:

Try this too:

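# Full option names are used because short names such as "max_columns" can be ambiguous in recent pandas versions
pd.set_option("display.max_columns", None)  # show all columns
pd.set_option("display.max_colwidth", None)  # show the full width of each displayed column
pd.set_option("display.expand_frame_repr", False)  # print columns side by side instead of wrapping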

Solution 6:

Display the full dataframe for a specific cell:

import pandas as pd
from IPython.display import display
with pd.option_context('display.max_colwidth', None,
                       'display.max_columns', None,
                       'display.max_rows', None):
    display(df)

The approach above can be extended with more options.

Updated helper function from Karl Adler:

def display_full(x):
    with pd.option_context('display.max_rows', None,
                           'display.max_columns', None,
                           'display.width', 2000,
                           'display.float_format', '{:20,.2f}'.format,
                           'display.max_colwidth', None):
        display(x)

Change the display options for all cells:

pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
display(df)

Solution 7:

The following code triggers the warning below:

pd.set_option('display.max_colwidth', -1)

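FutureWarning: Passing a negative integer is deprecated in version 1.0 and will not be supported in a future version. Instead, use None to not limit the column width.

Instead, use: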

pd.set_option('display.max_colwidth', None)

This does the job and works with Pandas 1.0 and later.
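
If the same code has to run on both older and newer Pandas releases, one possible approach is to choose the value from the installed version. This is only a sketch; the names major and no_limit are my own, and it assumes the version string starts with a numeric major version.

```python
import pandas as pd

# Major version parsed from the version string, e.g. "2.2.1" gives 2.
major = int(pd.__version__.split(".")[0])

# None is the "no limit" value from 1.0 on; -1 was the older spelling.
no_limit = None if major >= 1 else -1

pd.set_option("display.max_colwidth", no_limit)
```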

Solution 8:

Another way to view the full content of the cells in a Pandas dataframe is to use IPython's display functions:

from IPython.display import HTML

HTML(df.to_html())

Solution 9:

For those looking to do this in Dask:

I could not find a similar option in Dask, but if I simply set the Pandas option in the same notebook, it works for Dask too.

import pandas as pd
import dask.dataframe as dd
pd.set_option('display.max_colwidth', None)  # Disables truncation for Pandas and, in practice, for Dask too (use -1 before Pandas 1.0). I am not sure how this carries over to Dask, but it works.

train_data = dd.read_csv('./data/train.csv')
train_data.head(5)

Solution 10:

For those who like to reduce typing (i.e., everyone!): pd.set_option('max_colwidth', None) does the same thing.

Solution 11:

I would like to offer some other methods, in case you do not want to change the default permanently.

# First method
list(df.itertuples())  # Forces pandas to show every value explicitly, although the output is not pretty

# Second method
from tabulate import tabulate  # the function must be imported; the module itself is not callable
print(tabulate(df, tablefmt='psql', headers='keys'))
# headers='keys' uses the dataframe's column names as the table headers
# 'psql' is one of tabulate's table formats; other formats are listed in its documentation
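
A further option that leaves the global settings alone is DataFrame.to_string, which accepts a max_colwidth argument in pandas 1.0 and later. The sketch below assumes such a version; the example data is made up for illustration.

```python
import pandas as pd

# Hypothetical cell content, longer than the default 50-character limit.
long_text = "x" * 120
df = pd.DataFrame({"TEXT": [long_text]})

# max_colwidth=None asks to_string not to truncate cells, for this call only.
full = df.to_string(max_colwidth=None)
print(full)
```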

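Solution 12:

Colab/Notebook printing utility

I created some functions to help me print in Colab. For this problem, I use dfprint_wide from the utility snippet below:

How printing the full df works

It prints 8 columns at a time, which ensures that no columns are cut off.

You can optionally set pd.set_option('display.max_colwidth', None) to ensure that cells are not truncated either, but this makes the output less pretty (as in the other answers).

There is also an option to print to an HTML file.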
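
The column chunking described above can be sketched on its own, separately from the full utility. The 20-column dataframe and the names cols_per_chunk and chunks are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical wide frame with 20 columns named c0..c19.
df = pd.DataFrame([list(range(20))], columns=[f"c{i}" for i in range(20)])

cols_per_chunk = 8

# Slice the column index into consecutive groups of at most 8 columns.
chunks = [
    df[df.columns[i:i + cols_per_chunk]]
    for i in range(0, len(df.columns), cols_per_chunk)
]

# Print each group separately so that no column is elided.
for chunk in chunks:
    print(chunk.to_string())
```

With 20 columns this yields three groups, the last one holding the remaining 4 columns.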

Usage

dfprint_wide(df)

Utility code

import os
import pandas as pd
from pathlib import Path

try:
    from IPython.display import display, HTML
except ImportError:
    pass

def hprint(text='', tag="p", export_file=None):
    html = f"<{tag}>{text}</{tag}>"
    if export_file:
        Path(export_file).parent.mkdir(parents=True, exist_ok=True)  # also works when the path has no directory part
        with open(export_file, 'a') as f:
            f.write(html)
    else:
        try:
            display(HTML(html))
        except NameError:
            print(text)

def dfprint_wide(df, cols_per_chunk=8, export_file=None):
    num_cols = len(df.columns)
    for i in range(0, num_cols, cols_per_chunk):
        chunk_cols = df.columns[i:i+cols_per_chunk]
        dfprint(df[chunk_cols], export_file)
        hprint('', export_file=export_file)  # empty line for readability

def dfprint(data, export_file=None):
    if isinstance(data, dict):
        data = pd.DataFrame([data])
    df = pd.DataFrame(data)
    if export_file:
        Path(export_file).parent.mkdir(parents=True, exist_ok=True)  # also works when the path has no directory part
        with open(export_file, 'a') as f:
            f.write(df.to_html())
    else:
        try:
            display(df)
        except NameError:
            print(df.to_string())