如何写入现有的 excel 文件而不覆盖数据(使用 pandas)?

2024-11-28 08:37:00
admin
原创
253
摘要:问题描述:我使用 pandas 以以下方式写入 excel 文件:import pandas writer = pandas.ExcelWriter('Masterfile.xlsx') data_filtered.to_excel(writer, "Main", cols=['Di...

问题描述:

我使用 pandas 以以下方式写入 excel 文件:

import pandas

writer = pandas.ExcelWriter('Masterfile.xlsx') 

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

Masterfile.xlsx 已包含多个不同的选项卡。但是,它尚未包含“Main”。

Pandas 正确地写入“主”表,但不幸的是它也会删除所有其他选项卡。


解决方案 1:

Pandas 文档称它使用 openpyxl 来处理 xlsx 文件。快速浏览一下代码,ExcelWriter你会发现类似下面的方法可能会奏效:

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book

## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.

writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

解决方案 2:

更新:从 Pandas 1.3.0 开始,以下函数将无法正常工作,因为函数DataFrame.to_excel()已经pd.ExcelWriter()发生更改 -if_sheet_exists引入了一个新参数,这使得下面的函数无效。

您可以在这里找到适用于 Pandas 1.3.0+的更新版本。append_df_to_excel()


这是一个辅助函数:

import os
from openpyxl import load_workbook


def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False, 
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    @param filename: File path or existing ExcelWriter
                     (Example: '/path/to/file.xlsx')
    @param df: DataFrame to save to workbook
    @param sheet_name: Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
    @param startrow: upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
    @param truncate_sheet: truncate (remove and recreate) [sheet_name]
                           before writing DataFrame to Excel file
    @param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
                            [can be a dictionary]
    @return: None

    Usage examples:

    >>> append_df_to_excel('d:/temp/test.xlsx', df)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
                           index=False)

    >>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', 
                           index=False, startrow=25)

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    # Excel file doesn't exist - saving and exiting
    if not os.path.isfile(filename):
        df.to_excel(
            filename,
            sheet_name=sheet_name, 
            startrow=startrow if startrow is not None else 0, 
            **to_excel_kwargs)
        return
    
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')

    # try to open an existing workbook
    writer.book = load_workbook(filename)
    
    # get the last row in the existing Excel sheet
    # if it was not specified explicitly
    if startrow is None and sheet_name in writer.book.sheetnames:
        startrow = writer.book[sheet_name].max_row

    # truncate sheet
    if truncate_sheet and sheet_name in writer.book.sheetnames:
        # index of [sheet_name] sheet
        idx = writer.book.sheetnames.index(sheet_name)
        # remove [sheet_name]
        writer.book.remove(writer.book.worksheets[idx])
        # create an empty sheet [sheet_name] using old index
        writer.book.create_sheet(sheet_name, idx)
    
    # copy existing sheets
    writer.sheets = {ws.title:ws for ws in writer.book.worksheets}

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()

已使用以下版本进行测试:

  • Pandas 1.2.3

  • Openpyxl 3.0.5

解决方案 3:

有了openpyxl版本2.4.0pandas版本0.19.2,@ski 提出的流程变得更简单一些:

import pandas
from openpyxl import load_workbook

with pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer:
    writer.book = load_workbook('Masterfile.xlsx')
    data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
#That's it!

解决方案 4:

从 pandas 0.24 开始,你可以使用mode关键字参数简化这一过程ExcelWriter

import pandas as pd

with pd.ExcelWriter('the_file.xlsx', engine='openpyxl', mode='a') as writer: 
     data_filtered.to_excel(writer) 

解决方案 5:

我知道这是一个较旧的线程,但这是您在搜索时找到的第一个项目,如果您需要保留已创建的工作簿中的图表,则上述解决方案不起作用。在这种情况下,xlwings 是一个更好的选择 - 它允许您写入 excel 簿并保留图表/图表数据。

简单示例:

import xlwings as xw
import pandas as pd

#create DF
months = ['2017-01','2017-02','2017-03','2017-04','2017-05','2017-06','2017-07','2017-08','2017-09','2017-10','2017-11','2017-12']
value1 = [x * 5+5 for x in range(len(months))]
df = pd.DataFrame(value1, index = months, columns = ['value1'])
df['value2'] = df['value1']+5
df['value3'] = df['value2']+5

#load workbook that has a chart in it
wb = xw.Book('C:\\data\\bookwithChart.xlsx')

ws = wb.sheets['chartData']

ws.range('A1').options(index=False).value = df

wb = xw.Book('C:\\data\\bookwithChart_updated.xlsx')

xw.apps[0].quit()

解决方案 6:

老问题了,但我猜有些人仍在搜索这个问题 - 所以......

我觉得这种方法很好,因为所有工作表都加载到工作表名称和数据框对的字典中,该字典由 pandas 使用 sheetname=None 选项创建。在将电子表格读入字典格式并将其从字典写回之间,添加、删除或修改工作表非常简单。对我来说,就速度和格式而言,xlsxwriter 比 openpyxl 更适合这项特定任务。

注意:未来版本的 pandas(0.21.0+)将把“sheetname”参数更改为“sheet_name”。

# read a single or multi-sheet excel file
# (returns dict of sheetname(s), dataframe(s))
ws_dict = pd.read_excel(excel_file_path,
                        sheetname=None)

# all worksheets are accessible as dataframes.

# easy to change a worksheet as a dataframe:
mod_df = ws_dict['existing_worksheet']

# do work on mod_df...then reassign
ws_dict['existing_worksheet'] = mod_df

# add a dataframe to the workbook as a new worksheet with
# ws name, df as dict key, value:
ws_dict['new_worksheet'] = some_other_dataframe

# when done, write dictionary back to excel...
# xlsxwriter honors datetime and date formats
# (only included as example)...
with pd.ExcelWriter(excel_file_path,
                    engine='xlsxwriter',
                    datetime_format='yyyy-mm-dd',
                    date_format='yyyy-mm-dd') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

以 2013 年的问题为例:

ws_dict = pd.read_excel('Masterfile.xlsx',
                        sheetname=None)

ws_dict['Main'] = data_filtered[['Diff1', 'Diff2']]

with pd.ExcelWriter('Masterfile.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

解决方案 7:

pandas 0.24中有一个更好的解决方案:

with pd.ExcelWriter(path, mode='a') as writer:
    s.to_excel(writer, sheet_name='another sheet', index=False)

前:

在此处输入图片描述

后:

在此处输入图片描述

所以现在就升级你的熊猫吧:

pip install --upgrade pandas

解决方案 8:

2023:简短回答(pandas 1.3.0+)

仅更新单元格值。保持格式和其他单元格不变。

with pd.ExcelWriter('my-existing-file.xlsx', engine='openpyxl', mode="a",
                    if_sheet_exists="overlay"  # => update value only
                    ) as writer:
    df.to_excel(writer, sheet_name="my-existing-sheet", index=None, startcol=2, startrow=3)  # Start at C4 

解决方案 9:

@MaxU 的解决方案不适用于更新版本的 python 和相关软件包。它引发错误:
“zipfile.BadZipFile:文件不是 zip 文件”

我生成了该函数的新版本,该版本可与更新版本的 python 和相关软件包配合使用,并使用 python 进行了测试:3.9 | openpyxl:3.0.6 | pandas:1.2.3

此外,我还为辅助函数添加了更多功能:

  1. 现在它根据单元格内容宽度调整所有列的大小,并且所有变量都将可见(参见“resizeColumns”)

  2. 您可以处理 NaN,如果您希望 NaN 显示为 NaN 或空单元格(参见“na_rep”)

  3. 添加了“startcol”,您可以决定从特定列开始写入,否则将从 col = 0 开始

这里是函数:

import pandas as pd

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, startcol=None,
    truncate_sheet=False, resizeColumns=True, na_rep = 'NA', **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file

      resizeColumns: default = True . It resize all columns based on cell content width
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
      na_rep: default = 'NA'. If, instead of NaN, you want blank cells, just edit as follows: na_rep=''


    Returns: None

    *******************

    CONTRIBUTION:
    Current helper function generated by [Baggio]: https://stackoverflow.com/users/14302009/baggio?tab=profile
    Contributions to the current helper function: https://stackoverflow.com/users/4046632/buran?tab=profile
    Original helper function: (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)


    Features of the new helper function:
    1) Now it works with python 3.9 and latest versions of pandas and openpxl
    ---> Fixed the error: "zipfile.BadZipFile: File is not a zip file".
    2) Now It resize all columns based on cell content width AND all variables will be visible (SEE "resizeColumns")
    3) You can handle NaN,  if you want that NaN are displayed as NaN or as empty cells (SEE "na_rep")
    4) Added "startcol", you can decide to start to write from specific column, oterwise will start from col = 0

    *******************



    """
    from openpyxl import load_workbook
    from string import ascii_uppercase
    from openpyxl.utils import get_column_letter
    from openpyxl import Workbook

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    try:
        f = open(filename)
        # Do something with the file
    except IOError:
        # print("File not accessible")
        wb = Workbook()
        ws = wb.active
        ws.title = sheet_name
        wb.save(filename)

    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')


    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError


    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)

        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        # startrow = -1
        startrow = 0

    if startcol is None:
        startcol = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, startcol=startcol, na_rep=na_rep, **to_excel_kwargs)


    if resizeColumns:

        ws = writer.book[sheet_name]

        def auto_format_cell_width(ws):
            for letter in range(1,ws.max_column):
                maximum_value = 0
                for cell in ws[get_column_letter(letter)]:
                    val_to_check = len(str(cell.value))
                    if val_to_check > maximum_value:
                        maximum_value = val_to_check
                ws.column_dimensions[get_column_letter(letter)].width = maximum_value + 2

        auto_format_cell_width(ws)

    # save the workbook
    writer.save()

用法示例:

# Create a sample dataframe
df = pd.DataFrame({'numbers': [1, 2, 3],
                    'colors': ['red', 'white', 'blue'],
                    'colorsTwo': ['yellow', 'white', 'blue'],
                    'NaNcheck': [float('NaN'), 1, float('NaN')],
                    })

# EDIT YOUR PATH FOR THE EXPORT 
filename = r"C:DataSciencedf.xlsx"   

# RUN ONE BY ONE IN ROW THE FOLLOWING LINES, TO SEE THE DIFFERENT UPDATES TO THE EXCELFILE 
  
append_df_to_excel(filename, df, index=False, startrow=0) # Basic Export of df in default sheet (Sheet1)
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0) # Append the sheet "Cool" where "df" is written
append_df_to_excel(filename, df, sheet_name="Cool", index=False) # Append another "df" to the sheet "Cool", just below the other "df" instance
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0, startcol=5) # Append another "df" to the sheet "Cool" starting from col 5
append_df_to_excel(filename, df, index=False, truncate_sheet=True, startrow=10, na_rep = '') # Override (truncate) the "Sheet1", writing the df from row 10, and showing blank cells instead of NaN

解决方案 10:

我使用了这里描述的答案

from openpyxl import load_workbook
writer = pd.ExcelWriter(p_file_name, engine='openpyxl', mode='a')
writer.book = load_workbook(p_file_name)
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
df.to_excel(writer, 'Data', startrow=10, startcol=20)
writer.save()

解决方案 11:

def append_sheet_to_master(self, master_file_path, current_file_path, sheet_name):
    try:
        master_book = load_workbook(master_file_path)
        master_writer = pandas.ExcelWriter(master_file_path, engine='openpyxl')
        master_writer.book = master_book
        master_writer.sheets = dict((ws.title, ws) for ws in master_book.worksheets)
        current_frames = pandas.ExcelFile(current_file_path).parse(pandas.ExcelFile(current_file_path).sheet_names[0],
                                                               header=None,
                                                               index_col=None)
        current_frames.to_excel(master_writer, sheet_name, index=None, header=False)

        master_writer.save()
    except Exception as e:
        raise e

这一切都很好,唯一的问题是主文件(我们添加新工作表的文件)的格式丢失了。

解决方案 12:

writer = pd.ExcelWriter('prueba1.xlsx'engine='openpyxl',keep_date_col=True)

“keep_date_col”希望对你有帮助

解决方案 13:

book = load_workbook(xlsFilename)
writer = pd.ExcelWriter(self.xlsFilename)
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name=sheetName, index=False)
writer.save()

解决方案 14:

@MaxU 的解决方案非常有效。我只有一个建议:

如果指定了 truncate_sheet=True,则不应从现有工作表中保留“startrow”。我建议:

        if startrow is None and sheet_name in writer.book.sheetnames:
            if not truncate_sheet: # truncate_sheet would use startrow if provided (or zero below)
                startrow = writer.book[sheet_name].max_row

解决方案 15:

我建议使用 xlwings ( https://docs.xlwings.org/en/stable/api.html ),它对于这个应用程序来说确实非常强大...这是我使用它的方式:

import xlwings as xw
import pandas as pd
import xlsxwriter

# function to get the active workbook
def getActiveWorkbook():
    try:
        # logic from xlwings to grab the current excel file
        activeWb = xw.books.active
    except:
        # print error message if unable to get the current workbook
        print('Unable to grab the current Workbook')
        pause()
        exitProgram()
    else:
        return activeWb

# function that returns the last row number and last cell of a sheet
def getLastRow(myBook, sheetName):
    lastRow = myBook.sheets[sheetName].range("A1").current_region.last_cell.row
    lastCol = str(xlsxwriter.utility.xl_col_to_name(myBook.sheets[sheetName].range("A1").current_region.last_cell.column))
    return str(lastRow), lastCol + str(lastRow)

activeWb = getActiveWorkbook()
df = pd.DataFrame(data=[1,2,3])

# look at worksheet = Part Number Status
sheetName = "Sheet1"
ws = activeWb.sheets[sheetName]
lastRow, lastCell = getLastRow(activeWb, sheetName)
if int(lastRow) > 1:
    ws.range("A1:" + lastCell).clear()
ws.range("A1").options(index=False, header=False).value = df.fillna('')

这似乎对我的应用程序非常有效,因为 .xlsm 工作簿可能非常棘手。您可以将其作为 Python 脚本执行,或使用 pyinstaller 将其转换为可执行文件,然后通过 excel 宏运行 .exe。您还可以使用 xlwings 从 Python 调用 VBA 宏,这非常有用。

解决方案 16:

方法:

  • 如果不存在则可以创建文件

  • 根据工作表名称附加到现有 Excel

import pandas as pd
from openpyxl import load_workbook

def write_to_excel(df, file):
    try:
        book = load_workbook(file)
        writer = pd.ExcelWriter(file, engine='openpyxl') 
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
        df.to_excel(writer, **kwds)
        writer.save()
    except FileNotFoundError as e:
        df.to_excel(file, **kwds)

用法:

df_a = pd.DataFrame(range(10), columns=["a"])
df_b = pd.DataFrame(range(10, 20), columns=["b"])
write_to_excel(df_a, "test.xlsx", sheet_name="Sheet a", columns=['a'], index=False)
write_to_excel(df_b, "test.xlsx", sheet_name="Sheet b", columns=['b'])

解决方案 17:

您可以使用 pandas.DataFrame.to_excel() 方法并将模式参数指定为“a”(附加模式),将数据写入现有 Excel 文件,而无需使用 pandas 覆盖数据。

以下是一个例子:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Write the DataFrame to an existing Excel file in append mode
df.to_excel('existing_file.xlsx', engine='openpyxl', mode='a', index=False, sheet_name='Sheet1')
相关推荐
  为什么项目管理通常仍然耗时且低效?您是否还在反复更新电子表格、淹没在便利贴中并参加每周更新会议?这确实是耗费时间和精力。借助软件工具的帮助,您可以一目了然地全面了解您的项目。如今,国内外有足够多优秀的项目管理软件可以帮助您掌控每个项目。什么是项目管理软件?项目管理软件是广泛行业用于项目规划、资源分配和调度的软件。它使项...
项目管理软件   1247  
  IPD(Integrated Product Development)研发管理体系作为一种先进的产品开发管理模式,在众多企业中得到了广泛应用,助力企业提升产品开发效率、降低成本、增强市场竞争力。深入理解其核心要素,对于企业成功实施 IPD 研发管理体系至关重要。以下将对 IPD 研发管理体系的四大核心要素进行详细解析。...
IPD流程中的charter   19  
  IPD(Integrated Product Development)研发管理体系强调将产品开发视为一个完整的流程,从市场需求出发,整合企业的各种资源,实现产品的快速、高质量交付。在这个过程中,成本控制是至关重要的一环,它直接关系到产品的竞争力和企业的盈利能力。有效的成本控制能够确保在不牺牲产品质量和性能的前提下,降低...
IPD开发流程管理   22  
  IPD(Integrated Product Development)项目管理作为一种先进的产品开发管理模式,在众多企业中得到了广泛应用。它通过整合跨部门团队,实现从概念到产品上市的全流程高效管理,提升产品竞争力。深入探讨IPD项目管理的六个关键阶段,对于企业理解和运用这一模式,优化产品开发流程具有重要意义。概念阶段概...
IPD概念阶段   29  
  IPD(Integrated Product Development)流程管理作为一种先进的产品开发管理模式,旨在通过整合各种资源,实现产品开发的高效、协同与创新。在这一流程管理体系下,产品质量保障成为企业关注的核心要点之一。有效的产品质量保障策略不仅能够提升产品的市场竞争力,还能为企业赢得良好的声誉和客户忠诚度。接下...
华为IPD   24  
热门文章
项目管理软件有哪些?
云禅道AD
禅道项目管理软件

云端的项目管理软件

尊享禅道项目软件收费版功能

无需维护,随时随地协同办公

内置subversion和git源码管理

每天备份,随时转为私有部署

免费试用