How can I write to an existing Excel file without overwriting data (using pandas)?

2024-11-28 08:37:00
admin
Original

Problem description:

I use pandas to write to an Excel file in the following way:

import pandas

writer = pandas.ExcelWriter('Masterfile.xlsx') 

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

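Masterfile.xlsx already contains a number of different tabs. However, it does not yet contain "Main".

Pandas correctly writes to the "Main" sheet, but unfortunately it also deletes all the other tabs.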
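
The behaviour described above can be reproduced with a small, self-contained sketch. It assumes a reasonably recent pandas with openpyxl installed; the temporary path, the sheet names "First" and "Second", and the sample data are hypothetical stand-ins for the asker's Masterfile.xlsx and data_filtered.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Build a throwaway workbook with two tabs (stand-in for Masterfile.xlsx)
path = os.path.join(tempfile.mkdtemp(), "Masterfile.xlsx")
wb = Workbook()
wb.active.title = "First"
wb.create_sheet("Second")
wb.save(path)

# Hypothetical stand-in for the asker's data_filtered
data_filtered = pd.DataFrame({"Diff1": [1, 2], "Diff2": [3, 4]})

# Without mode="a" the writer opens the file in write mode,
# so the file is recreated from scratch
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    data_filtered.to_excel(writer, sheet_name="Main", columns=["Diff1", "Diff2"])

# Only "Main" is left; "First" and "Second" are gone
sheets_after = load_workbook(path).sheetnames
print(sheets_after)
```

The writer never reads the existing file in write mode, which is why the other tabs disappear rather than being merged.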


Solution 1:

The pandas documentation says it uses openpyxl for xlsx files. A quick look through the code of ExcelWriter suggests that something like the following might work:

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book

## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.

writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

# older pandas releases called the `columns` parameter `cols`
data_filtered.to_excel(writer, sheet_name="Main", columns=['Diff1', 'Diff2'])

writer.save()

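Solution 2:

UPDATE: Starting with Pandas 1.3.0, the following function will not work properly, because DataFrame.to_excel() and pd.ExcelWriter() have changed: a new if_sheet_exists parameter was introduced, which invalidates the function below.

You can find an updated version of append_df_to_excel() that works with Pandas 1.3.0+ here.


This is a helper function: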

import os

import pandas as pd
from openpyxl import load_workbook


def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False, 
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None

    Usage examples:

    >>> append_df_to_excel('d:/temp/test.xlsx', df)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', 
                           index=False, startrow=25)

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    # Excel file doesn't exist - saving and exiting
    if not os.path.isfile(filename):
        df.to_excel(
            filename,
            sheet_name=sheet_name, 
            startrow=startrow if startrow is not None else 0, 
            **to_excel_kwargs)
        return
    
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')

    # try to open an existing workbook
    writer.book = load_workbook(filename)
    
    # get the last row in the existing Excel sheet
    # if it was not specified explicitly
    if startrow is None and sheet_name in writer.book.sheetnames:
        startrow = writer.book[sheet_name].max_row

    # truncate sheet
    if truncate_sheet and sheet_name in writer.book.sheetnames:
        # index of [sheet_name] sheet
        idx = writer.book.sheetnames.index(sheet_name)
        # remove [sheet_name]
        writer.book.remove(writer.book.worksheets[idx])
        # create an empty sheet [sheet_name] using old index
        writer.book.create_sheet(sheet_name, idx)
    
    # copy existing sheets
    writer.sheets = {ws.title:ws for ws in writer.book.worksheets}

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()

Tested with the following versions:

  • Pandas 1.2.3

  • Openpyxl 3.0.5
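
For newer pandas releases, the same idea can be sketched with if_sheet_exists instead of assigning writer.book and writer.sheets. This is a minimal sketch, not the updated version referred to above: it assumes pandas 1.4.0 or later (for the "overlay" option) with openpyxl installed, drops truncate_sheet for brevity, and uses a temporary file path chosen only for illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook


def append_df_to_excel(filename, df, sheet_name="Sheet1", startrow=None,
                       **to_excel_kwargs):
    """Append df below the existing rows of sheet_name (sketch, pandas 1.4+)."""
    # The engine is fixed to openpyxl below, so drop any caller-supplied value
    to_excel_kwargs.pop("engine", None)

    # File does not exist yet: a plain write creates it
    if not os.path.isfile(filename):
        df.to_excel(filename, sheet_name=sheet_name,
                    startrow=0 if startrow is None else startrow,
                    **to_excel_kwargs)
        return

    # "overlay" writes into the existing sheet instead of raising or replacing it
    with pd.ExcelWriter(filename, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        if startrow is None:
            book = writer.book
            # continue right below the last used row, or start at the top of a new sheet
            startrow = book[sheet_name].max_row if sheet_name in book.sheetnames else 0
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    **to_excel_kwargs)


# Usage: the first call creates the file, the second appends below it
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
df = pd.DataFrame({"a": [1, 2, 3]})
append_df_to_excel(path, df, index=False)
append_df_to_excel(path, df, index=False, header=False)

sheet = load_workbook(path)["Sheet1"]
rows = sheet.max_row
first_appended = sheet["A5"].value
print(rows, first_appended)
```

Using the context manager also avoids calling writer.save() explicitly, since the workbook is saved when the with block exits.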

Solution 3:

With openpyxl version 2.4.0 and pandas version 0.19.2, the process @ski came up with gets a bit simpler:

import pandas
from openpyxl import load_workbook

with pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer:
    writer.book = load_workbook('Masterfile.xlsx')
    data_filtered.to_excel(writer, sheet_name="Main", columns=['Diff1', 'Diff2'])
# That's it!

Solution 4:

Starting with pandas 0.24, you can simplify this with the mode keyword argument of ExcelWriter:

import pandas as pd

with pd.ExcelWriter('the_file.xlsx', engine='openpyxl', mode='a') as writer: 
     data_filtered.to_excel(writer) 
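
To see that append mode keeps the existing tabs, here is a small end-to-end sketch. It assumes pandas 0.24 or later with openpyxl installed; the temporary path, the sheet names "First" and "Second", and the sample data are hypothetical.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Build a throwaway workbook with two existing tabs
path = os.path.join(tempfile.mkdtemp(), "the_file.xlsx")
wb = Workbook()
wb.active.title = "First"
wb.create_sheet("Second")
wb.save(path)

# Hypothetical stand-in for data_filtered
data_filtered = pd.DataFrame({"Diff1": [1, 2], "Diff2": [3, 4]})

# mode="a" loads the existing workbook and adds the new sheet to it
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    data_filtered.to_excel(writer, sheet_name="Main")

# The original tabs are still there, with "Main" added at the end
sheets_after = load_workbook(path).sheetnames
print(sheets_after)
```

Append mode requires that the file already exists; for a file that may be missing, check for it first and fall back to a plain write.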

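Solution 5:

I know this is an older thread, but it is the first item you find when searching, and the solutions above do not work if you need to retain charts in a workbook that has already been created. In that case, xlwings is a better option: it lets you write to the Excel workbook and keeps the charts and chart data.

Simple example: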

import xlwings as xw
import pandas as pd

#create DF
months = ['2017-01','2017-02','2017-03','2017-04','2017-05','2017-06','2017-07','2017-08','2017-09','2017-10','2017-11','2017-12']
value1 = [x * 5+5 for x in range(len(months))]
df = pd.DataFrame(value1, index = months, columns = ['value1'])
df['value2'] = df['value1']+5
df['value3'] = df['value2']+5

#load workbook that has a chart in it
wb = xw.Book('C:\\data\\bookwithChart.xlsx')

ws = wb.sheets['chartData']

ws.range('A1').options(index=False).value = df

# save the updated workbook under a new name
wb.save('C:\\data\\bookwithChart_updated.xlsx')

xw.apps[0].quit()

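Solution 6:

Old question, but I am guessing some people still search for this, so...

I find this method nice because all worksheets are loaded into a dictionary of sheet name and dataframe pairs, which pandas creates with the sheetname=None option. It is simple to add, delete, or modify worksheets between reading the spreadsheet into the dict format and writing it back from the dict. For me, xlsxwriter works better than openpyxl for this particular task in terms of speed and format.

Note: pandas 0.21.0 and later rename the "sheetname" parameter to "sheet_name".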

import pandas as pd

# read a single or multi-sheet excel file
# (returns dict of sheetname(s), dataframe(s))
ws_dict = pd.read_excel(excel_file_path,
                        sheet_name=None)  # spelled "sheetname" before pandas 0.21.0

# all worksheets are accessible as dataframes.

# easy to change a worksheet as a dataframe:
mod_df = ws_dict['existing_worksheet']

# do work on mod_df...then reassign
ws_dict['existing_worksheet'] = mod_df

# add a dataframe to the workbook as a new worksheet with
# ws name, df as dict key, value:
ws_dict['new_worksheet'] = some_other_dataframe

# when done, write dictionary back to excel...
# xlsxwriter honors datetime and date formats
# (only included as example)...
with pd.ExcelWriter(excel_file_path,
                    engine='xlsxwriter',
                    datetime_format='yyyy-mm-dd',
                    date_format='yyyy-mm-dd') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

For the example in the 2013 question:

ws_dict = pd.read_excel('Masterfile.xlsx',
                        sheet_name=None)  # spelled "sheetname" before pandas 0.21.0

ws_dict['Main'] = data_filtered[['Diff1', 'Diff2']]

with pd.ExcelWriter('Masterfile.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

Solution 7:

There is a better solution in pandas 0.24:

with pd.ExcelWriter(path, mode='a') as writer:
    s.to_excel(writer, sheet_name='another sheet', index=False)

So upgrade your pandas now:

pip install --upgrade pandas

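Solution 8:

2023: Short answer (pandas 1.3.0+; the "overlay" option used below requires pandas 1.4.0+)

Updates cell values only. Formatting and other cells are left unchanged.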

with pd.ExcelWriter('my-existing-file.xlsx', engine='openpyxl', mode="a",
                    if_sheet_exists="overlay"  # => update value only
                    ) as writer:
    df.to_excel(writer, sheet_name="my-existing-sheet", index=None, startcol=2, startrow=3)  # Start at C4 
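
A quick way to check the overlay behaviour is to write next to a cell that already holds a value and read the sheet back. This is a minimal sketch, assuming pandas 1.4.0 or later with openpyxl installed; the temporary path and the cell contents are hypothetical.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Build a throwaway workbook whose sheet already has a value in A1
path = os.path.join(tempfile.mkdtemp(), "my-existing-file.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "my-existing-sheet"
ws["A1"] = "keep me"
wb.save(path)

df = pd.DataFrame({"x": [10, 20]})

# "overlay" writes into the existing sheet without clearing it first
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="overlay") as writer:
    # startrow and startcol are 0-based, so the header lands in C4
    df.to_excel(writer, sheet_name="my-existing-sheet", index=False,
                startrow=3, startcol=2)

ws_after = load_workbook(path)["my-existing-sheet"]
print(ws_after["A1"].value, ws_after["C4"].value, ws_after["C5"].value)
```

Cells outside the written range, such as A1 here, keep their previous contents; cells inside the range are overwritten.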

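Solution 9:

@MaxU's solution does not work with updated versions of Python and related packages. It raises the error:
"zipfile.BadZipFile: File is not a zip file"

I wrote a new version of the function that works with updated versions of Python and related packages, tested with python: 3.9 | openpyxl: 3.0.6 | pandas: 1.2.3

In addition, I added more features to the helper function:

  1. It now resizes all columns based on cell content width, so all variables are visible (see "resizeColumns")

  2. You can choose how NaN is handled: displayed as NaN or as empty cells (see "na_rep")

  3. Added "startcol", so you can start writing from a specific column; otherwise writing starts from col = 0

Here is the function: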

import pandas as pd

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, startcol=None,
    truncate_sheet=False, resizeColumns=True, na_rep = 'NA', **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file

      resizeColumns: default = True. It resizes all columns based on cell content width
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
      na_rep: default = 'NA'. If, instead of NaN, you want blank cells, just edit as follows: na_rep=''


    Returns: None

    *******************

    CONTRIBUTION:
    Current helper function generated by [Baggio]: https://stackoverflow.com/users/14302009/baggio?tab=profile
    Contributions to the current helper function: https://stackoverflow.com/users/4046632/buran?tab=profile
    Original helper function: (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)


    Features of the new helper function:
    1) Now it works with python 3.9 and latest versions of pandas and openpyxl
    ---> Fixed the error: "zipfile.BadZipFile: File is not a zip file".
    2) Now it resizes all columns based on cell content width AND all variables will be visible (SEE "resizeColumns")
    3) You can handle NaN, if you want NaN to be displayed as NaN or as empty cells (SEE "na_rep")
    4) Added "startcol", you can decide to start writing from a specific column, otherwise it will start from col = 0

    *******************



    """
    from openpyxl import load_workbook
    from string import ascii_uppercase
    from openpyxl.utils import get_column_letter
    from openpyxl import Workbook

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    try:
        # check that the file exists and is readable, closing the handle right away
        with open(filename):
            pass
    except IOError:
        # file not accessible: create an empty workbook containing [sheet_name]
        wb = Workbook()
        ws = wb.active
        ws.title = sheet_name
        wb.save(filename)

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')


    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError


    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)

        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        # startrow = -1
        startrow = 0

    if startcol is None:
        startcol = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, startcol=startcol, na_rep=na_rep, **to_excel_kwargs)


    if resizeColumns:

        ws = writer.book[sheet_name]

        def auto_format_cell_width(ws):
            # column indexes are 1-based and max_column is inclusive
            for letter in range(1, ws.max_column + 1):
                maximum_value = 0
                for cell in ws[get_column_letter(letter)]:
                    val_to_check = len(str(cell.value))
                    if val_to_check > maximum_value:
                        maximum_value = val_to_check
                ws.column_dimensions[get_column_letter(letter)].width = maximum_value + 2

        auto_format_cell_width(ws)

    # save the workbook
    writer.save()

Usage examples:

# Create a sample dataframe
df = pd.DataFrame({'numbers': [1, 2, 3],
                    'colors': ['red', 'white', 'blue'],
                    'colorsTwo': ['yellow', 'white', 'blue'],
                    'NaNcheck': [float('NaN'), 1, float('NaN')],
                    })

# EDIT YOUR PATH FOR THE EXPORT 
filename = r"C:\DataScience\df.xlsx"

# RUN ONE BY ONE IN ROW THE FOLLOWING LINES, TO SEE THE DIFFERENT UPDATES TO THE EXCELFILE 
  
append_df_to_excel(filename, df, index=False, startrow=0) # Basic Export of df in default sheet (Sheet1)
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0) # Append the sheet "Cool" where "df" is written
append_df_to_excel(filename, df, sheet_name="Cool", index=False) # Append another "df" to the sheet "Cool", just below the other "df" instance
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0, startcol=5) # Append another "df" to the sheet "Cool" starting from col 5
append_df_to_excel(filename, df, index=False, truncate_sheet=True, startrow=10, na_rep = '') # Override (truncate) the "Sheet1", writing the df from row 10, and showing blank cells instead of NaN

Solution 10:

I used the answer described here

import pandas as pd
from openpyxl import load_workbook
writer = pd.ExcelWriter(p_file_name, engine='openpyxl', mode='a')
writer.book = load_workbook(p_file_name)
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
df.to_excel(writer, 'Data', startrow=10, startcol=20)
writer.save()

Solution 11:

def append_sheet_to_master(self, master_file_path, current_file_path, sheet_name):
    try:
        master_book = load_workbook(master_file_path)
        master_writer = pandas.ExcelWriter(master_file_path, engine='openpyxl')
        master_writer.book = master_book
        master_writer.sheets = dict((ws.title, ws) for ws in master_book.worksheets)
        current_frames = pandas.ExcelFile(current_file_path).parse(pandas.ExcelFile(current_file_path).sheet_names[0],
                                                               header=None,
                                                               index_col=None)
        current_frames.to_excel(master_writer, sheet_name, index=None, header=False)

        master_writer.save()
    except Exception as e:
        raise e

This all works fine; the only problem is that the formatting of the master file (the file to which we add the new sheet) is lost.

Solution 12:

writer = pd.ExcelWriter('prueba1.xlsx', engine='openpyxl', keep_date_col=True)

I hope "keep_date_col" helps you.

Solution 13:

book = load_workbook(xlsFilename)
writer = pd.ExcelWriter(xlsFilename, engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name=sheetName, index=False)
writer.save()

Solution 14:

@MaxU's solution works very well. I have just one suggestion:

If truncate_sheet=True is specified, "startrow" should not be taken from the existing sheet. I suggest:

        if startrow is None and sheet_name in writer.book.sheetnames:
            if not truncate_sheet: # truncate_sheet would use startrow if provided (or zero below)
                startrow = writer.book[sheet_name].max_row

Solution 15:

I suggest using xlwings ( https://docs.xlwings.org/en/stable/api.html ); it is really powerful for this application. Here is how I use it:

import xlwings as xw
import pandas as pd
import xlsxwriter

# function to get the active workbook
def getActiveWorkbook():
    try:
        # logic from xlwings to grab the current excel file
        activeWb = xw.books.active
    except Exception:
        # report the failure and stop the script
        print('Unable to grab the current Workbook')
        raise SystemExit(1)
    else:
        return activeWb

# function that returns the last row number and last cell of a sheet
def getLastRow(myBook, sheetName):
    lastRow = myBook.sheets[sheetName].range("A1").current_region.last_cell.row
    lastCol = str(xlsxwriter.utility.xl_col_to_name(myBook.sheets[sheetName].range("A1").current_region.last_cell.column))
    return str(lastRow), lastCol + str(lastRow)

activeWb = getActiveWorkbook()
df = pd.DataFrame(data=[1,2,3])

# look at worksheet = Part Number Status
sheetName = "Sheet1"
ws = activeWb.sheets[sheetName]
lastRow, lastCell = getLastRow(activeWb, sheetName)
if int(lastRow) > 1:
    ws.range("A1:" + lastCell).clear()
ws.range("A1").options(index=False, header=False).value = df.fillna('')

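This seems to work very well for my application, because .xlsm workbooks can be very tricky. You can run it as a Python script, or turn it into an executable with pyinstaller and then run the .exe through an Excel macro. You can also call VBA macros from Python using xlwings, which is very useful.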

Solution 16:

Method:

  • Creates the file if it does not exist

  • Appends to an existing Excel file according to the sheet name

import pandas as pd
from openpyxl import load_workbook

def write_to_excel(df, file, **kwds):
    try:
        book = load_workbook(file)
        writer = pd.ExcelWriter(file, engine='openpyxl') 
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, **kwds)
        writer.save()
    except FileNotFoundError as e:
        df.to_excel(file, **kwds)

Usage:

df_a = pd.DataFrame(range(10), columns=["a"])
df_b = pd.DataFrame(range(10, 20), columns=["b"])
write_to_excel(df_a, "test.xlsx", sheet_name="Sheet a", columns=['a'], index=False)
write_to_excel(df_b, "test.xlsx", sheet_name="Sheet b", columns=['b'])

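Solution 17:

You can write data to an existing Excel file without overwriting it by opening the file with pandas.ExcelWriter in append mode (mode='a') and passing that writer to pandas.DataFrame.to_excel(). Note that mode is a parameter of ExcelWriter; to_excel() itself does not accept it.

Here is an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Write the DataFrame to an existing Excel file in append mode.
# The sheet name must not already exist in the file unless
# if_sheet_exists is also passed (pandas 1.3.0+).
with pd.ExcelWriter('existing_file.xlsx', engine='openpyxl', mode='a') as writer:
    df.to_excel(writer, index=False, sheet_name='Sheet1')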