Bookkeeping workflow

Tools used

Computer
Actual
Python
Java
Excel

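Payment methods

Bank of China card
ICBC card
Bank of Communications credit card
Alipay: balance, Bank of China card, Bank of Communications credit card
WeChat: balance, Bank of China card, Bank of Communications credit card
Meituan: Meituan Wallet, digital RMB (e-CNY), Bank of China card, Alipay, WeChat, Bank of Communications credit card
Meituan Waimai: Meituan Wallet, digital RMB (e-CNY), Bank of China card
Dianping: Meituan Wallet
Ele.me: Alipay
Pinduoduo: wallet, Bank of China card
JD.com: WeChat balance, Bank of China card
Digital RMB (e-CNY): Bank of China card
12306: Alipay, WeChat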

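Bookkeeping approach

Set up a separate account for each payment channel and export the transaction records of each account.

Rely on bank card records as little as possible and prefer first-hand transaction records, so that the notes attached to each transaction are not lost.

Use one central hub (the bank card) wherever possible, with each payment channel connected to it one-to-one. Any payment tool that has its own means of payment (a balance) gets its own account.

For example, for a debit on the Bank of China card, select the debit entry and record it as a transfer to the corresponding spending account. The spending account, such as Meituan Wallet, then reconciles to zero.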
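The clearing idea can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how Actual stores its data: the `Account` class, the helper functions, and the 35.50 amount are all made up for the example.

```python
from decimal import Decimal


class Account:
    """Hypothetical in-memory account that only tracks a running balance."""

    def __init__(self, name):
        self.name = name
        self.balance = Decimal("0")


def transfer(src, dst, amount):
    """Move money from src to dst (the bank debit behind an app purchase)."""
    src.balance -= amount
    dst.balance += amount


def spend(account, amount):
    """Record a purchase taken from the app's own transaction export."""
    account.balance -= amount


bank = Account("Bank of China card")
meituan = Account("Meituan Wallet")

# The bank statement shows a debit to Meituan: book it as a transfer.
transfer(bank, meituan, Decimal("35.50"))
# The Meituan export shows the matching order, with its detailed note.
spend(meituan, Decimal("35.50"))

print(bank.balance, meituan.balance)
```

If a transfer has no matching purchase (or the other way round), the spending account is left with a non-zero balance, which is the signal that something still needs to be reconciled.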

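One case breaks this, however: Meituan is paid through WeChat, and WeChat deducts from its balance when there is one and from the bank card when there is not. Once the statements are imported, there is no way to tell which entries should be transferred to Meituan Wallet for reconciliation.

There are two workarounds. The first is to change the WeChat deduction order to bank card first and to select the balance manually when paying by QR code. Meituan purchases may still end up being paid in different ways, but it should happen less often. The second is to manually merge the payment method column of the exported WeChat Excel file into the note column, import both together, and then use search to decide whether an entry should be transferred to Meituan Wallet for reconciliation.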
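The second workaround could also be scripted instead of done by hand. The sketch below is a minimal example under stated assumptions: it treats each bill row as a dictionary (as `csv.DictReader` would yield), and it assumes the export has columns named 支付方式 (payment method) and 备注 (note) and that balance payments are labeled 零钱. Check these names against your own export; the sample rows are invented.

```python
def merge_payment_method(rows, method_col="支付方式", note_col="备注"):
    """Return copies of the rows with the payment method prefixed to the note."""
    merged = []
    for row in rows:
        new_row = dict(row)  # leave the caller's rows untouched
        method = (row.get(method_col) or "").strip()
        note = (row.get(note_col) or "").strip()
        new_row[note_col] = f"[{method}] {note}".strip()
        merged.append(new_row)
    return merged


def paid_from_balance(row, note_col="备注", balance_label="零钱"):
    """True if the merged note says the payment came from the balance."""
    return f"[{balance_label}]" in row[note_col]


# Invented sample rows, for illustration only.
rows = [
    {"交易对方": "美团", "支付方式": "零钱", "备注": "/"},
    {"交易对方": "美团", "支付方式": "中国银行储蓄卡", "备注": "/"},
]
merged = merge_payment_method(rows)
for row in merged:
    print(row["备注"], paid_from_balance(row))
```

After import, searching the notes for the bracketed payment method separates balance payments from bank card payments.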

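Final accounts

Bank of China card (records removed: Alipay, WeChat, Meituan Wallet)
ICBC card
Bank of Communications credit card
Alipay (records removed: Meituan)
WeChat (records removed: 美团商户平台, the Meituan merchant platform)
Meituan Wallet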

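Getting transaction records

Each payment method has its own way of exporting transaction records, and the exported file formats differ.

Bank of China card

Log in to mobile banking, search for 交易流水打印 (print transaction statement), and have it sent to your email. This gives the records as a PDF.

Alipay

In the mobile app, go to 我的 (Me) > 账单 (Bills) > the ··· icon in the top-right corner > 开具交易流水证明 (issue transaction statement) > 申请 (Apply). This gives the records as an Excel file.

WeChat

In the mobile app, go to 我 (Me) > 服务 (Services) > 钱包 (Wallet) > 客服中心 (Help Center) > 下载账单 (Download bill, under common tools) > 用于个人对账 (For personal reconciliation). This gives the records as an Excel file.

Meituan Wallet

In the Meituan mobile app, go to 钱包 (Wallet) > 银行卡 (Bank cards) > 银行卡交易记录 (Bank card transaction records) to view the records.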

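Extracting Excel from the PDF

Decrypting the PDF

The PDF provided by Bank of China is encrypted. Open it in a browser first, then use the browser's print function to save it as a new PDF.

Extracting tables from the PDF with Python

Install the dependencies (tabula-py also needs a Java runtime)

pip install tabula-py pandas jpype1 xlsxwriter -i https://mirrors.aliyun.com/pypi/simple/

Create a script file and set pdf_path to the path of the PDF. After the script runs, an output.xlsx Excel file appears in the current working directory (the script's own directory if you run it from there).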

import tabula
import pandas as pd

# Path to the PDF file
pdf_path = "C:\\Files\\个人\\0001-2.pdf"

# Extract the tables with tabula; this returns a list of DataFrames
tables = tabula.read_pdf(pdf_path, pages='all', multiple_tables=True)

# Merge all tables into a single DataFrame
combined_table = pd.concat(tables, ignore_index=True)

# Path of the Excel file to write
excel_path = "output.xlsx"

# Create an Excel writer object
with pd.ExcelWriter(excel_path, engine='xlsxwriter') as writer:
    # Write the merged table to a single worksheet
    combined_table.to_excel(writer, sheet_name='CombinedSheet', index=False)

print(f"Tables from the PDF were merged and saved to {excel_path}")

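Problems

Bank card remarks for purchases made through third-party apps are vague. Alipay purchases, for example, are only marked "支付宝" (Alipay).

Set the Alipay account remark to "支付宝-账号" (Alipay-account), the WeChat account remark to "微信-账号" (WeChat-account), and the Meituan account remark to "美团-账号" (Meituan-account). When the bank card data is imported, the remark then shows whether a row should be recorded as a transfer to the corresponding spending account for reconciliation.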
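Once the remarks follow this naming scheme, deciding where a bank row goes becomes a simple lookup. The sketch below assumes the bank export keeps the remark text intact; the `REMARK_TO_ACCOUNT` table, the `transfer_target` function, and the sample remarks are hypothetical names chosen for the example.

```python
# Remark prefixes from the naming scheme above, mapped to spending accounts.
REMARK_TO_ACCOUNT = {
    "支付宝-": "Alipay",
    "微信-": "WeChat",
    "美团-": "Meituan Wallet",
}


def transfer_target(remark):
    """Return the account a bank row should be transferred to, or None."""
    for prefix, account in REMARK_TO_ACCOUNT.items():
        if prefix in remark:
            return account
    return None  # an ordinary bank card transaction, not a transfer


print(transfer_target("支付宝-example_account"))
print(transfer_target("ATM取款"))
```

Rows that return an account name are booked as transfers; everything else stays as a regular bank card transaction.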

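WeChat, Alipay, and Meituan sometimes pay from their balance, and those payments never appear in the bank card records.

When the balance is used there is no bank card transfer, so the amount is simply deducted from the balance of that app's account.

Refunds sometimes go back to the app balance, so counting only bank card records misses them. If these refunds are not recorded, "expenses" come out larger than "income"; if they are recorded, "identical" payment entries appear.

Give every app that holds a balance its own account and record the balance changes there. The balance changes then show whether an entry needs to be transferred to the corresponding spending account for reconciliation.

The Bank of China card merges multiple small payments. Several Alipay payments, for example, appear on the bank card as a single row whose amount has already been totaled.

Work out the amount of each small payment by hand from the Alipay records, then record the merged bank row as a transfer to the corresponding spending account, which then reconciles to zero.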
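Finding which Alipay payments make up one merged bank row can be automated for small lists. The following is a brute-force sketch, not part of the original workflow: `find_matching_payments` is a hypothetical helper that tries every combination of candidate payments, so it is only practical for a handful of payments per day. The amounts are invented.

```python
from decimal import Decimal
from itertools import combinations


def find_matching_payments(bank_amount, app_amounts):
    """Return the first group of app payments that sums to the merged
    bank debit, or None if no group matches."""
    for size in range(1, len(app_amounts) + 1):
        for group in combinations(app_amounts, size):
            if sum(group) == bank_amount:
                return list(group)
    return None


# One merged bank debit and that day's Alipay payments (made-up numbers).
bank_debit = Decimal("23.80")
alipay_payments = [Decimal("3.00"), Decimal("12.50"),
                   Decimal("8.30"), Decimal("40.00")]

match = find_matching_payments(bank_debit, alipay_payments)
print(match)
```

Using `Decimal` rather than `float` keeps the comparison exact, which matters when amounts must add up to the cent.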

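About the script for processing exported bill archives

This is an AI-generated Python script that automatically processes several kinds of financial data files:

PDF files (tables extracted with tabula-py)
Excel files (WeChat Pay bills)
CSV files (Alipay and Meituan bills)

The script converts these files to a standard CSV format and cleans the output directory so that only the four final result files remain. ka is the result for the Bank of China card PDF, wechat for the WeChat bill, alipay for the Alipay bill, and meituan for the Meituan bill.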

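Usage

Create a password.txt file containing the passwords for the archives and the PDF. Keep the keys exactly as shown, because the script looks the passwords up by these names: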

支付宝: your_alipay_password
微信: your_wechat_password
美团: your_meituan_password
KA: your_ka_password

Run the script:
```
python convert_financial_data.py
```
When processing finishes, the output directory contains only four CSV files:
- ka_processed.csv (PDF result)
- wechat_processed.csv (WeChat result)
- alipay_processed.csv (Alipay result)
- meituan_processed.csv (Meituan result)

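Requirements

Python 3.7+ (the script calls subprocess.run with capture_output, which was added in 3.7)
Java runtime (needed by tabula-py for PDF extraction)
Supported operating systems: Windows/Linux/macOS (the 7z command-line fallback goes through PowerShell, so that path only works on Windows)

Dependencies to install

Install the following Python packages with pip. The script as listed also imports tabula-py, py7zr, rarfile, and pyxlsb, and pandas needs openpyxl to read .xlsx files, so they are included here:

pip install pandas openpyxl pyxlsb tabula-py py7zr rarfile pyunpack patool chardet

Dependency notes

pandas: data processing and CSV handling
openpyxl, pyxlsb: Excel engines used by pandas
tabula-py: PDF table extraction
py7zr, rarfile: 7z and rar archive support
pyunpack: archive extraction
patool: archive file handling
chardet: encoding detection

Notes

Make sure the input files are in the expected format
The password file must exist and be correctly formatted
The script automatically cleans up intermediate files and keeps only the final results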
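A quick check of password.txt before running the script can catch a missing entry early. This is a small sketch, separate from the script itself: `missing_password_keys` is a hypothetical helper, and the four required keys are taken from the password.txt example above.

```python
REQUIRED_KEYS = ("支付宝", "微信", "美团", "KA")


def missing_password_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that have no "key: password" line in text."""
    found = set()
    for line in text.splitlines():
        if ":" in line:
            key, _ = line.split(":", 1)
            found.add(key.strip())
    return [key for key in required if key not in found]


# Example content with two entries missing (placeholder passwords).
sample = "支付宝: your_alipay_password\n微信: your_wechat_password\n"
print(missing_password_keys(sample))
```

To check a real file, read it with `open("password.txt", encoding="utf-8").read()` and pass the text to the function; an empty list means every key is present.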

Code

import os
import zipfile
import pandas as pd
import pyxlsb
import tabula
import csv
import py7zr  # 7z support
import rarfile  # rar support
from pyunpack import Archive

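def read_passwords(password_file):
    """Read "key: password" lines from the password file into a dict."""
    passwords = {}
    with open(password_file, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if ':' in line:
                key, pwd = line.split(':', 1)
                # Strip both parts so "支付宝: secret" does not yield " secret"
                passwords[key.strip()] = pwd.strip()
    return passwords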

def process_meituan_csv(file_path, output_dir):
    """Process the Meituan CSV: keep only the rows after the detail-list marker."""
    output_path = os.path.join(output_dir, 'meituan_processed.csv')
    if os.path.exists(output_path):
        print(f"Meituan output already exists, skipping: {output_path}")
        return

    # Rows in the Meituan CSV end with ",/", so the file needs special handling
    with open(file_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    # Find the 【美团交易账单明细列表】 (transaction detail list) marker line
    start_line_idx = 0
    for i, line in enumerate(lines):
        if '【美团交易账单明细列表】' in line:
            start_line_idx = i + 1  # the header row is on the next line
            break

    # Clean every line from the header row onward
    csv_lines = []
    for line in lines[start_line_idx:]:
        line = line.strip()
        if line.endswith(',/'):
            line = line[:-2]  # drop the trailing ",/"
        if line:  # skip empty lines
            csv_lines.append(line)

    # Join the lines back into CSV text
    csv_content = '\n'.join(csv_lines)

    import io
    df = pd.read_csv(io.StringIO(csv_content))

    df.to_csv(output_path, index=False, encoding='utf-8')
    print(f"Meituan file processed and saved to: {output_path}")


def process_wechat_xlsx(file_path, output_dir):
    """Process the WeChat Excel file: skip the first 17 rows and convert to CSV."""
    # Read the Excel file with pandas, skipping the first 17 rows
    df = pd.read_excel(file_path, skiprows=17)
    output_path = os.path.join(output_dir, 'wechat_processed.csv')
    df.to_csv(output_path, index=False)
    print(f"WeChat file processed and saved to: {output_path}")

def process_alipay_csv(file_path, output_dir):
    """Process the Alipay CSV: read as GBK, skip the first 24 rows, save as UTF-8."""
    df = pd.read_csv(file_path, encoding='gbk', skiprows=24)
    output_path = os.path.join(output_dir, 'alipay_processed.csv')
    df.to_csv(output_path, index=False, encoding='utf-8')
    print(f"Alipay file processed and saved to: {output_path}")

def process_pdf(pdf_path, password, output_dir):
    """Process the PDF: open it with the password and extract its tables."""
    try:
        # Check that Java is available
        import subprocess
        subprocess.run(['java', '-version'], capture_output=True, text=True)
    except FileNotFoundError:
        print("Warning: Java was not found, so the PDF cannot be processed. Install a Java runtime (JRE).")
        return

    try:
        # Extract every table in the PDF with tabula
        tables = tabula.read_pdf(pdf_path, password=password, pages='all', multiple_tables=True)

        # Merge all tables into a single DataFrame
        if isinstance(tables, list) and len(tables) > 0:
            combined_df = pd.concat(tables, ignore_index=True)
            output_path = os.path.join(output_dir, 'ka_processed.csv')
            combined_df.to_csv(output_path, index=False)
            print(f"PDF file processed and saved to: {output_path}")
        elif isinstance(tables, pd.DataFrame):
            output_path = os.path.join(output_dir, 'ka_processed.csv')
            tables.to_csv(output_path, index=False)
            print(f"PDF file processed and saved to: {output_path}")
        else:
            print("No data could be extracted from the PDF")
    except Exception as e:
        print(f"Error while processing the PDF: {e}")
        print("Make sure a compatible Java runtime (JRE) is installed.")

def extract_archive(file_path, password, output_dir):
    """
    Generic extraction helper: try zipfile, then py7zr, then the 7z command line.
    Returns True as soon as one method succeeds, otherwise False.
    """
    import shutil
    import subprocess
    import tempfile

    # 1) Standard zip (zipfile cannot decrypt AES-encrypted archives)
    try:
        with zipfile.ZipFile(file_path, 'r') as zip_ref:
            zip_ref.extractall(output_dir, pwd=password.encode())
        return True
    except Exception as e:
        print(f"Standard zip extraction failed: {e}")

    # 2) py7zr
    try:
        with py7zr.SevenZipFile(file_path, mode='r', password=password) as archive:
            archive.extractall(path=output_dir)
        return True
    except Exception as e:
        print(f"7z extraction failed: {e}")

    # 3) 7z command-line tool, called through PowerShell (Windows).
    #    Paths and password travel in environment variables, so special
    #    characters never have to be escaped inside the script text.
    ps_script = '''
    try {
        $output = & "7z" "x" "-p$env:ARCHIVE_PWD" "-y" "$env:FILE_PATH" "-o$env:OUTPUT_DIR"
        exit $LASTEXITCODE
    } catch {
        Write-Error $_
        exit 1
    }
    '''

    def run_7z(archive_path):
        """Run 7z on archive_path and return the CompletedProcess."""
        env = os.environ.copy()
        env['FILE_PATH'] = os.path.abspath(archive_path)
        env['OUTPUT_DIR'] = os.path.abspath(output_dir)
        env['ARCHIVE_PWD'] = password
        return subprocess.run(
            ['powershell', '-Command', ps_script],
            capture_output=True,
            text=True,
            check=False,
            env=env
        )

    try:
        result = run_7z(file_path)
        if result.returncode == 0:
            return True
        print(f"7z extraction failed, return code: {result.returncode}, error: {result.stderr}")

        # If the original path fails, retry from a copy with a plain
        # ASCII file name inside a temporary directory
        with tempfile.TemporaryDirectory() as temp_dir:
            temp_zip_path = os.path.join(temp_dir, "temp.zip")
            shutil.copy2(file_path, temp_zip_path)
            result2 = run_7z(temp_zip_path)
            if result2.returncode == 0:
                return True
            print(f"7z extraction from the temp path also failed, return code: {result2.returncode}, error: {result2.stderr}")
    except FileNotFoundError:
        # PowerShell's built-in Expand-Archive cannot open encrypted
        # archives, so there is no further fallback
        print("PowerShell was not found, so the 7z command could not be run")
    except Exception as e:
        print(f"Error while running 7z: {e}")

    # Every method failed
    return False


def main():
    current_dir = '.'
    output_dir = os.path.join(current_dir, 'output')

    # Create the output directory
    os.makedirs(output_dir, exist_ok=True)

    # Read the passwords
    passwords = read_passwords('password.txt')

    # Find and process each input file
    for filename in os.listdir(current_dir):
        if filename.endswith(('.zip', '.7z', '.rar')):
            if '美团' in filename:
                # Extract the Meituan archive
                zip_path = os.path.join(current_dir, filename)
                try:
                    success = extract_archive(zip_path, passwords['美团'], output_dir)

                    if not success:
                        print(f"Could not extract the Meituan archive: {filename}")
                        # Other handling could go here, for example asking
                        # the user to extract the archive manually
                        continue
                except Exception as e:
                    print(f"Serious error while extracting the Meituan archive: {e}")
                    print("Skipping the Meituan file and continuing with the others...")
                    continue

                # Find the extracted CSV file
                extracted_files = os.listdir(output_dir)
                meituan_found = False
                for extracted_file in extracted_files:
                    if extracted_file.endswith('.csv'):
                        if '美团' in extracted_file or 'meituan' in extracted_file.lower():
                            process_meituan_csv(os.path.join(output_dir, extracted_file), output_dir)
                            meituan_found = True
                            break

                if not meituan_found:
                    # No CSV clearly named after Meituan: fall back to any CSV
                    # in the directory that is not one of the other results
                    for extracted_file in extracted_files:
                        if extracted_file.endswith('.csv') and not any(name in extracted_file for name in ['alipay', 'wechat', 'ka']):
                            process_meituan_csv(os.path.join(output_dir, extracted_file), output_dir)
                            meituan_found = True
                            break

            elif '微信' in filename:
                # Extract the WeChat archive
                zip_path = os.path.join(current_dir, filename)
                success = extract_archive(zip_path, passwords['微信'], output_dir)

                if not success:
                    print(f"Could not extract the WeChat archive: {filename}")
                    continue

                # Find the extracted Excel file
                extracted_files = os.listdir(output_dir)
                for extracted_file in extracted_files:
                    if extracted_file.endswith(('.xlsx', '.xls', '.xlsb')) and '微信' in extracted_file:
                        process_wechat_xlsx(os.path.join(output_dir, extracted_file), output_dir)
                        break

            elif '支付宝' in filename:
                # Extract the Alipay archive
                zip_path = os.path.join(current_dir, filename)
                success = extract_archive(zip_path, passwords['支付宝'], output_dir)

                if not success:
                    print(f"Could not extract the Alipay archive: {filename}")
                    continue

                # Find the extracted CSV file
                extracted_files = os.listdir(output_dir)
                for extracted_file in extracted_files:
                    if extracted_file.endswith('.csv') and '支付宝' in extracted_file:
                        process_alipay_csv(os.path.join(output_dir, extracted_file), output_dir)
                        break

        elif '美团' in filename and filename.endswith('.csv'):
            # The Meituan file is already a plain CSV, no extraction needed.
            # (This branch sits outside the archive check; nested inside it,
            # a .csv file name could never reach it.)
            process_meituan_csv(os.path.join(current_dir, filename), output_dir)

        elif filename.startswith('KA') and filename.endswith('.pdf'):
            pdf_path = os.path.join(current_dir, filename)
            process_pdf(pdf_path, passwords['KA'], output_dir)

    # Clean the output directory, keeping only the four final CSV files
    final_files = {'ka_processed.csv', 'wechat_processed.csv', 'alipay_processed.csv', 'meituan_processed.csv'}
    for file in os.listdir(output_dir):
        if file not in final_files:
            file_path = os.path.join(output_dir, file)
            try:
                os.remove(file_path)
                print(f"Removed intermediate file: {file}")
            except Exception as e:
                print(f"Failed to remove {file}: {e}")

if __name__ == "__main__":
    main()