py
This commit is contained in:
parent
a156fff7f4
commit
41feafbd2b
203
README_Python.md
Normal file
203
README_Python.md
Normal file
@ -0,0 +1,203 @@
|
||||
# PDF书签合并工具 - Python版
|
||||
|
||||
## 📋 简介
|
||||
|
||||
这是PDF书签合并工具的Python版本,使用Python + Tkinter开发,具有以下特点:
|
||||
|
||||
- 🚀 **运行速度快**:Python优化算法,处理效率高
|
||||
- 📦 **绿色软件**:无需安装,打包后独立运行
|
||||
- 🎯 **智能分组**:自动按文件名前缀合并相关文件
|
||||
- 📊 **实时日志**:详细的处理进度和错误信息
|
||||
- 🌍 **编码兼容**:支持UTF-8、GBK、GB2312等多种编码
|
||||
- 💾 **独立程序**:打包后无任何依赖,复制即用
|
||||
|
||||
## 🔧 环境要求
|
||||
|
||||
### 开发环境
|
||||
- Python 3.7 或更高版本
|
||||
- 标准库:tkinter, pathlib, re, os, sys, logging
|
||||
|
||||
### 运行环境(打包后)
|
||||
- Windows 7 SP1 或更高版本
|
||||
- **无需安装Python**(打包后独立运行)
|
||||
|
||||
## 📁 文件结构
|
||||
|
||||
```
|
||||
SlideCombine/
|
||||
├── slide_combine.py # 主程序文件
|
||||
├── build_exe.py # 自动打包脚本
|
||||
├── requirements.txt # 依赖包列表(仅开发需要)
|
||||
├── README_Python.md # Python版说明文档
|
||||
├── app.ico # 程序图标(可选)
|
||||
└── dist/ # 打包输出文件夹
|
||||
└── SlideCombine.exe # 最终可执行文件
|
||||
```
|
||||
|
||||
## 🚀 使用方法
|
||||
|
||||
### 方法一:直接运行Python脚本
|
||||
```bash
|
||||
python slide_combine.py
|
||||
```
|
||||
|
||||
### 方法二:自动打包成exe
|
||||
```bash
|
||||
python build_exe.py
|
||||
```
|
||||
|
||||
打包完成后,会在 `dist` 文件夹中生成 `SlideCombine.exe`,可以直接运行。
|
||||
|
||||
## 🎯 功能特点
|
||||
|
||||
### 1. 智能文件分组
|
||||
- 自动识别文件名前缀
|
||||
- 例如:`CH-875 1-3` 和 `CH-875 4-6` → `CH-875.txt`
|
||||
|
||||
### 2. 多格式书签解析
|
||||
- 支持阿拉伯数字页码:1, 2, 3...
|
||||
- 支持罗马数字页码:I, II, III...
|
||||
- 自动识别页码位置
|
||||
|
||||
### 3. 编码自动检测
|
||||
- 优先尝试:UTF-8, GBK, GB2312
|
||||
- 兜底方案:Latin-1 确保文件可读
|
||||
- 输出统一使用UTF-8 with BOM
|
||||
|
||||
### 4. 友好的用户界面
|
||||
- 现代化Tkinter界面设计
|
||||
- 实时日志显示,支持颜色区分
|
||||
- 进度跟踪和错误提示
|
||||
|
||||
## 📦 部署说明
|
||||
|
||||
### 绿色软件特性
|
||||
- 无需安装任何软件
|
||||
- 不写入注册表
|
||||
- 不依赖系统框架
|
||||
- 可以直接在U盘运行
|
||||
|
||||
### 系统兼容性
|
||||
- ✅ Windows 7 SP1
|
||||
- ✅ Windows 8/8.1
|
||||
- ✅ Windows 10/11
|
||||
- ✅ 支持32位和64位系统
|
||||
|
||||
### 文件大小
|
||||
- 压缩前:约15-20 MB(包含Python运行时)
|
||||
- 解压后内存占用:约30-50 MB
|
||||
|
||||
## 🔄 与C#版本对比
|
||||
|
||||
| 特性 | Python版 | C#版 |
|
||||
|------|----------|-------|
|
||||
| 运行环境 | Windows全系统 | 需要.NET Framework |
|
||||
| 打包大小 | 较大(15-20MB) | 较小(30-50KB) |
|
||||
| 启动速度 | 稍慢 | 很快 |
|
||||
| 开发效率 | 高 | 中 |
|
||||
| 维护性 | 高 | 中 |
|
||||
| 跨平台 | 支持 | 仅Windows |
|
||||
|
||||
## 🛠️ 开发说明
|
||||
|
||||
### 代码架构
|
||||
- **BookmarkItem**: 书签数据模型
|
||||
- **DocumentMetadata**: 文档元数据模型
|
||||
- **BookmarkExtractor**: 书签提取器
|
||||
- **FileProcessor**: 文件处理器
|
||||
- **SlideCombineGUI**: 图形用户界面
|
||||
|
||||
### 打包技术
|
||||
- 使用PyInstaller进行打包
|
||||
- 单文件模式,所有依赖打包在一起
|
||||
- 无窗口控制台,纯GUI应用
|
||||
- 支持UPX压缩减小体积
|
||||
|
||||
### 性能优化
|
||||
- 文件I/O操作优化
|
||||
- 内存使用优化
|
||||
- 编码检测缓存
|
||||
- 界面响应优化
|
||||
|
||||
## 📝 使用示例
|
||||
|
||||
### 目录结构示例
|
||||
```
|
||||
PDF文件夹/
|
||||
├── CH-875 1-3/
|
||||
│ └── FreePic2Pdf_bkmk.txt
|
||||
├── CH-875 4-6/
|
||||
│ └── FreePic2Pdf_bkmk.txt
|
||||
|
||||
TXT源文件/
|
||||
├── CH-875 1-3.txt
|
||||
├── CH-875 4-6.txt
|
||||
|
||||
输出路径/
|
||||
└── CH-875.txt # 合并后的文件
|
||||
```
|
||||
|
||||
### 输出格式示例
|
||||
```
|
||||
title:文档标题
|
||||
Other titles:其他标题
|
||||
Volume:卷期信息
|
||||
ISBN:ISBN号码
|
||||
creator:创作者
|
||||
contributor:贡献者
|
||||
issuedDate:发行日期
|
||||
publisher:出版社
|
||||
place:出版地
|
||||
Classification number:分类号
|
||||
page:页数
|
||||
tableOfContents:
|
||||
书签标题1---------------页码1<br/>
|
||||
书签标题2---------------页码2<br/>
|
||||
subject:主题
|
||||
date:日期范围
|
||||
spatial:地理信息
|
||||
Other ISBN:其他ISBN
|
||||
Other time:其他时间
|
||||
url:链接地址
|
||||
```
|
||||
|
||||
## 🔍 故障排除
|
||||
|
||||
### 常见问题
|
||||
|
||||
**Q1:程序无法启动**
|
||||
- 检查是否有杀毒软件阻止运行
|
||||
- 确认文件没有损坏
|
||||
- 尝试以管理员身份运行
|
||||
|
||||
**Q2:找不到文件**
|
||||
- 检查输入路径是否正确
|
||||
- 确认文件权限
|
||||
- 查看详细日志输出
|
||||
|
||||
**Q3:编码问题**
|
||||
- 程序已支持多种编码,会自动检测
|
||||
- 如果仍有问题,请检查源文件编码
|
||||
- 输出统一使用UTF-8 with BOM
|
||||
|
||||
**Q4:打包失败**
|
||||
- 确保Python环境正常
|
||||
- 安装最新版PyInstaller:`pip install --upgrade pyinstaller`
|
||||
- 检查是否有足够的磁盘空间
|
||||
|
||||
## 📞 技术支持
|
||||
|
||||
- 开源项目,代码完全透明
|
||||
- 使用Python标准库,无第三方依赖
|
||||
- 遵循PEP 8编码规范
|
||||
- 完整的错误处理机制
|
||||
|
||||
## 📜 许可证
|
||||
|
||||
本项目采用MIT许可证,可自由使用和分发。
|
||||
|
||||
---
|
||||
|
||||
**版本:v2.0**
|
||||
**更新日期:2024-11-24**
|
||||
**开发语言:Python 3.7+**
|
||||
387
build_exe.py
Normal file
387
build_exe.py
Normal file
@ -0,0 +1,387 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
PDF书签合并工具 - 自动打包脚本
|
||||
使用PyInstaller将Python程序打包成独立的exe文件
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import subprocess
|
||||
import shutil
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
def check_pyinstaller():
|
||||
"""检查是否安装了PyInstaller"""
|
||||
try:
|
||||
subprocess.run([sys.executable, "-c", "import PyInstaller"],
|
||||
check=True, capture_output=True)
|
||||
print("✅ PyInstaller 已安装")
|
||||
return True
|
||||
except subprocess.CalledProcessError:
|
||||
print("❌ PyInstaller 未安装")
|
||||
return False
|
||||
|
||||
|
||||
def install_pyinstaller():
|
||||
"""安装PyInstaller"""
|
||||
print("正在安装 PyInstaller...")
|
||||
try:
|
||||
subprocess.run([sys.executable, "-m", "pip", "install", "pyinstaller"],
|
||||
check=True)
|
||||
print("✅ PyInstaller 安装成功")
|
||||
return True
|
||||
except subprocess.CalledProcessError as ex:
|
||||
print(f"❌ PyInstaller 安装失败: {ex}")
|
||||
return False
|
||||
|
||||
|
||||
def clean_build():
|
||||
"""清理之前的构建"""
|
||||
print("🧹 清理之前的构建...")
|
||||
|
||||
folders_to_remove = ['build', 'dist', '__pycache__']
|
||||
files_to_remove = ['SlideCombine.spec']
|
||||
|
||||
for folder in folders_to_remove:
|
||||
if os.path.exists(folder):
|
||||
shutil.rmtree(folder)
|
||||
print(f" 已删除文件夹: {folder}")
|
||||
|
||||
for file in files_to_remove:
|
||||
if os.path.exists(file):
|
||||
os.remove(file)
|
||||
print(f" 已删除文件: {file}")
|
||||
|
||||
|
||||
def create_spec_file():
|
||||
"""创建PyInstaller配置文件"""
|
||||
spec_content = '''
|
||||
# -*- mode: python ; coding: utf-8 -*-
|
||||
|
||||
block_cipher = None
|
||||
|
||||
a = Analysis(
|
||||
['slide_combine.py'],
|
||||
pathex=[],
|
||||
binaries=[],
|
||||
datas=[],
|
||||
hiddenimports=[],
|
||||
hookspath=[],
|
||||
hooksconfig={},
|
||||
runtime_hooks=[],
|
||||
excludes=[],
|
||||
win_no_prefer_redirects=False,
|
||||
win_private_assemblies=False,
|
||||
cipher=block_cipher,
|
||||
noarchive=False,
|
||||
)
|
||||
|
||||
pyz = PYZ(a.pure, a.zipped_data, cipher=block_cipher)
|
||||
|
||||
exe = EXE(
|
||||
pyz,
|
||||
a.scripts,
|
||||
a.binaries,
|
||||
a.zipfiles,
|
||||
a.datas,
|
||||
[],
|
||||
name='SlideCombine',
|
||||
debug=False,
|
||||
bootloader_ignore_signals=False,
|
||||
strip=False,
|
||||
upx=True,
|
||||
upx_exclude=[],
|
||||
runtime_tmpdir=None,
|
||||
console=False,
|
||||
disable_windowed_traceback=False,
|
||||
argv_emulation=False,
|
||||
target_arch=None,
|
||||
codesign_identity=None,
|
||||
entitlements_file=None,
|
||||
icon='app.ico', # 图标文件(如果存在)
|
||||
version='version_info.txt' # 版本信息(如果存在)
|
||||
)
|
||||
'''
|
||||
|
||||
with open('SlideCombine.spec', 'w', encoding='utf-8') as f:
|
||||
f.write(spec_content)
|
||||
print("✅ 已创建 SlideCombine.spec 配置文件")
|
||||
|
||||
|
||||
def create_version_info():
|
||||
"""创建版本信息文件"""
|
||||
version_info = '''
|
||||
# UTF-8
|
||||
#
|
||||
# For more details about fixed file info 'ffi' see:
|
||||
# http://msdn.microsoft.com/en-us/library/ms646997.aspx
|
||||
VSVersionInfo(
|
||||
ffi=FixedFileInfo(
|
||||
# filevers and prodvers should be always a tuple with four items: (1, 2, 3, 4)
|
||||
# Set not needed items to zero 0.
|
||||
filevers=(2,0,0,0),
|
||||
prodvers=(2,0,0,0),
|
||||
# Contains a bitmask that specifies the valid bits 'flags'r
|
||||
mask=0x3f,
|
||||
# Contains a bitmask that specifies the Boolean attributes of the file.
|
||||
flags=0x0,
|
||||
# The operating system for which this file was designed.
|
||||
# 0x4 - NT and there is no need to change it.
|
||||
OS=0x4,
|
||||
# The general type of file.
|
||||
# 0x1 - the file is an application.
|
||||
fileType=0x1,
|
||||
# The function of the file.
|
||||
# 0x0 - the function is not defined for this fileType
|
||||
subtype=0x0,
|
||||
# Creation date and time stamp.
|
||||
date=(0, 0)
|
||||
),
|
||||
kids=[
|
||||
StringFileInfo(
|
||||
[
|
||||
StringTable(
|
||||
u'080404B0',
|
||||
[StringStruct(u'CompanyName', u'PDF书签合并工具'),
|
||||
StringStruct(u'FileDescription', u'PDF书签合并工具 - 用于从PDF文件夹中提取书签信息,与TXT元数据文件合并'),
|
||||
StringStruct(u'FileVersion', u'2.0.0.0'),
|
||||
StringStruct(u'InternalName', u'SlideCombine'),
|
||||
StringStruct(u'LegalCopyright', u'Copyright (C) 2024'),
|
||||
StringStruct(u'OriginalFilename', u'SlideCombine.exe'),
|
||||
StringStruct(u'ProductName', u'PDF书签合并工具'),
|
||||
StringStruct(u'ProductVersion', u'2.0.0.0')])
|
||||
]),
|
||||
VarFileInfo([VarStruct(u'Translation', [2052, 1200])])
|
||||
]
|
||||
)
|
||||
'''
|
||||
|
||||
with open('version_info.txt', 'w', encoding='utf-8') as f:
|
||||
f.write(version_info)
|
||||
print("✅ 已创建 version_info.txt 版本信息文件")
|
||||
|
||||
|
||||
def build_exe():
|
||||
"""构建exe文件"""
|
||||
print("🔨 开始构建 exe 文件...")
|
||||
|
||||
try:
|
||||
# 构建命令
|
||||
cmd = [
|
||||
sys.executable, '-m', 'PyInstaller',
|
||||
'--clean',
|
||||
'--onefile',
|
||||
'--windowed',
|
||||
'--name=SlideCombine',
|
||||
'slide_combine.py'
|
||||
]
|
||||
|
||||
# 如果有图标文件,添加图标参数
|
||||
if os.path.exists('app.ico'):
|
||||
cmd.insert(-1, '--icon=app.ico')
|
||||
print("✅ 已找到图标文件: app.ico")
|
||||
else:
|
||||
print("⚠️ 未找到图标文件,将使用默认图标")
|
||||
|
||||
print(f"执行命令: {' '.join(cmd)}")
|
||||
|
||||
result = subprocess.run(cmd, check=True, capture_output=True, text=True)
|
||||
print("✅ 构建成功!")
|
||||
return True
|
||||
|
||||
except subprocess.CalledProcessError as ex:
|
||||
print(f"❌ 构建失败: {ex}")
|
||||
if ex.stderr:
|
||||
print(f"错误信息: {ex.stderr}")
|
||||
return False
|
||||
|
||||
|
||||
def create_release_package():
|
||||
"""创建发布包"""
|
||||
print("📦 创建发布包...")
|
||||
|
||||
# 检查dist文件夹
|
||||
dist_path = Path('dist')
|
||||
if not dist_path.exists():
|
||||
print("❌ 未找到 dist 文件夹")
|
||||
return False
|
||||
|
||||
# 查找生成的exe文件
|
||||
exe_files = list(dist_path.glob('*.exe'))
|
||||
if not exe_files:
|
||||
print("❌ 未找到生成的 exe 文件")
|
||||
return False
|
||||
|
||||
exe_file = exe_files[0]
|
||||
print(f"✅ 找到 exe 文件: {exe_file}")
|
||||
|
||||
# 创建发布包文件夹
|
||||
import datetime
|
||||
version = "2.0.0"
|
||||
date = datetime.datetime.now().strftime("%Y%m%d")
|
||||
package_name = f"SlideCombine_v{version}_{date}"
|
||||
|
||||
package_path = Path(package_name)
|
||||
if package_path.exists():
|
||||
shutil.rmtree(package_path)
|
||||
|
||||
package_path.mkdir()
|
||||
print(f"✅ 创建发布包文件夹: {package_name}")
|
||||
|
||||
# 复制exe文件
|
||||
shutil.copy2(exe_file, package_path / 'SlideCombine.exe')
|
||||
print("✅ 复制主程序文件")
|
||||
|
||||
# 创建使用说明
|
||||
readme_content = f"""PDF书签合并工具 v{version} 使用说明
|
||||
=====================================
|
||||
|
||||
系统要求:
|
||||
- Windows 7 SP1 或更高版本
|
||||
- 无需额外安装软件(绿色软件)
|
||||
|
||||
使用方法:
|
||||
1. 双击运行 SlideCombine.exe
|
||||
2. 选择三个路径:
|
||||
- PDF文件夹路径:包含 FreePic2Pdf_bkmk.txt 文件的文件夹
|
||||
- TXT源文件路径:包含元数据 TXT 文件的路径
|
||||
- 输出路径:合并后文件的保存位置
|
||||
3. 点击"开始合并"按钮
|
||||
4. 等待处理完成
|
||||
|
||||
示例目录结构:
|
||||
PDF文件夹/
|
||||
├─ CH-875 1-3/FreePic2Pdf_bkmk.txt
|
||||
├─ CH-875 4-6/FreePic2Pdf_bkmk.txt
|
||||
|
||||
TXT源文件/
|
||||
├─ CH-875 1-3.txt
|
||||
├─ CH-875 4-6.txt
|
||||
|
||||
输出结果:
|
||||
输出路径/CH-875.txt (合并后的文件)
|
||||
|
||||
特点:
|
||||
- 🚀 运行速度快
|
||||
- 📦 绿色软件,无需安装
|
||||
- 🎯 智能文件分组
|
||||
- 📊 详细处理日志
|
||||
- 🌍 支持多种文件编码
|
||||
- 💾 独立程序,无依赖
|
||||
|
||||
故障排除:
|
||||
- 如果程序无法启动,请检查是否有杀毒软件阻止运行
|
||||
- 确保输入的路径存在且有访问权限
|
||||
- 查看日志输出了解详细的处理信息
|
||||
|
||||
版本信息:
|
||||
- 程序版本:v{version}
|
||||
- 构建日期:{date}
|
||||
- 开发语言:Python
|
||||
- 界面框架:Tkinter
|
||||
|
||||
技术支持:
|
||||
这是一个绿色软件,解压即用,无需安装任何依赖项。
|
||||
"""
|
||||
|
||||
with open(package_path / '使用说明.txt', 'w', encoding='utf-8') as f:
|
||||
f.write(readme_content)
|
||||
print("✅ 创建使用说明")
|
||||
|
||||
# 创建启动脚本
|
||||
bat_content = '''@echo off
|
||||
title PDF书签合并工具 v2.0
|
||||
echo 启动 PDF书签合并工具...
|
||||
echo.
|
||||
|
||||
if exist "SlideCombine.exe" (
|
||||
start "" "SlideCombine.exe"
|
||||
echo ✅ 程序已启动
|
||||
) else (
|
||||
echo ❌ 错误:未找到 SlideCombine.exe
|
||||
echo 请确保在正确的目录中运行此脚本
|
||||
pause
|
||||
)
|
||||
|
||||
timeout /t 2 >nul
|
||||
'''
|
||||
|
||||
with open(package_path / '启动程序.bat', 'w', encoding='gbk') as f:
|
||||
f.write(bat_content)
|
||||
print("✅ 创建启动脚本")
|
||||
|
||||
# 获取文件大小
|
||||
exe_size = (package_path / 'SlideCombine.exe').stat().st_size
|
||||
size_mb = exe_size / (1024 * 1024)
|
||||
size_kb = exe_size / 1024
|
||||
|
||||
print(f"""
|
||||
✅ 发布包创建完成!
|
||||
📁 发布包位置: {package_path.absolute()}
|
||||
💾 主程序大小: {size_kb:.1f} KB ({size_mb:.1f} MB)
|
||||
📋 包含内容:
|
||||
├─ SlideCombine.exe (主程序)
|
||||
├─ 使用说明.txt (用户指南)
|
||||
└─ 启动程序.bat (快捷启动)
|
||||
|
||||
🎉 部署说明:
|
||||
1. 将整个 {package_name} 文件夹复制到目标电脑
|
||||
2. 双击"启动程序.bat"或直接运行"SlideCombine.exe"
|
||||
3. 无需安装任何软件,绿色环保
|
||||
""")
|
||||
|
||||
return True
|
||||
|
||||
|
||||
def main():
|
||||
"""主函数"""
|
||||
print("="*50)
|
||||
print(" PDF书签合并工具 - 自动打包脚本")
|
||||
print("="*50)
|
||||
print()
|
||||
|
||||
# 检查Python版本
|
||||
if sys.version_info < (3, 7):
|
||||
print("❌ 需要 Python 3.7 或更高版本")
|
||||
sys.exit(1)
|
||||
|
||||
print(f"✅ Python版本: {sys.version}")
|
||||
|
||||
# 检查主程序文件
|
||||
if not os.path.exists('slide_combine.py'):
|
||||
print("❌ 未找到 slide_combine.py 文件")
|
||||
print("请确保在项目根目录下运行此脚本")
|
||||
sys.exit(1)
|
||||
|
||||
# 检查/安装PyInstaller
|
||||
if not check_pyinstaller():
|
||||
print("尝试自动安装 PyInstaller...")
|
||||
if not install_pyinstaller():
|
||||
print("请手动安装 PyInstaller: pip install pyinstaller")
|
||||
sys.exit(1)
|
||||
|
||||
# 清理之前的构建
|
||||
clean_build()
|
||||
|
||||
# 创建配置文件
|
||||
create_spec_file()
|
||||
create_version_info()
|
||||
|
||||
# 构建exe文件
|
||||
if not build_exe():
|
||||
print("❌ 构建失败,请检查错误信息")
|
||||
sys.exit(1)
|
||||
|
||||
# 创建发布包
|
||||
if not create_release_package():
|
||||
print("❌ 创建发布包失败")
|
||||
sys.exit(1)
|
||||
|
||||
print("🎉 打包完成!")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
9
requirements.txt
Normal file
9
requirements.txt
Normal file
@ -0,0 +1,9 @@
|
||||
# PDF书签合并工具依赖包
|
||||
# 核心依赖:使用Python标准库,无需额外依赖
|
||||
|
||||
# 如果需要更好的界面美化,可以安装以下包(可选)
|
||||
# ttkthemes==3.2.2 # 提供更多主题选择
|
||||
# pillow==10.0.0 # 图标处理支持
|
||||
|
||||
# 打包工具
|
||||
# pyinstaller==5.13.2 # 用于打包成独立exe文件
|
||||
608
slide_combine.py
Normal file
608
slide_combine.py
Normal file
@ -0,0 +1,608 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
PDF书签合并工具 - Python版本
|
||||
用于从PDF文件夹中提取书签信息,与TXT元数据文件合并
|
||||
"""
|
||||
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
import tkinter as tk
|
||||
from tkinter import ttk, filedialog, messagebox, scrolledtext
|
||||
from pathlib import Path
|
||||
from typing import List, Dict, Optional, Tuple
|
||||
from datetime import datetime
|
||||
import logging
|
||||
|
||||
|
||||
class BookmarkItem:
|
||||
"""书签项数据模型"""
|
||||
|
||||
def __init__(self, title: str = "", page: str = ""):
|
||||
self.title = title
|
||||
self.page = page
|
||||
|
||||
def __str__(self):
|
||||
return f"BookmarkItem(title='{self.title}', page='{self.page}')"
|
||||
|
||||
|
||||
class DocumentMetadata:
|
||||
"""文档元数据模型"""
|
||||
|
||||
def __init__(self):
|
||||
# 基本信息
|
||||
self.title = ""
|
||||
self.other_titles = ""
|
||||
self.volume = ""
|
||||
self.isbn = ""
|
||||
|
||||
# 创建和出版信息
|
||||
self.creator = ""
|
||||
self.contributor = ""
|
||||
self.issued_date = ""
|
||||
self.publisher = ""
|
||||
self.place = ""
|
||||
|
||||
# 分类和页码信息
|
||||
self.classification_number = ""
|
||||
self.page = ""
|
||||
|
||||
# 书签目录
|
||||
self.table_of_contents: List[BookmarkItem] = []
|
||||
|
||||
# 扩展信息
|
||||
self.subject = ""
|
||||
self.date = ""
|
||||
self.spatial = ""
|
||||
self.other_isbn = ""
|
||||
self.other_time = ""
|
||||
self.url = ""
|
||||
|
||||
def to_formatted_string(self) -> str:
|
||||
"""转换为格式化的字符串输出"""
|
||||
result = []
|
||||
|
||||
# 基本信息
|
||||
result.append(f"title:{self.title}")
|
||||
if self.other_titles:
|
||||
result.append(f"Other titles:{self.other_titles}")
|
||||
result.append(f"Volume:{self.volume}")
|
||||
result.append(f"ISBN:{self.isbn}")
|
||||
result.append(f"creator:{self.creator}")
|
||||
result.append(f"contributor:{self.contributor}")
|
||||
result.append(f"issuedDate:{self.issued_date}")
|
||||
result.append(f"publisher:{self.publisher}")
|
||||
result.append(f"place:{self.place}")
|
||||
result.append(f"Classification number:{self.classification_number}")
|
||||
result.append(f"page:{self.page}")
|
||||
|
||||
# 书签目录
|
||||
result.append("tableOfContents:")
|
||||
for bookmark in self.table_of_contents:
|
||||
if bookmark.title:
|
||||
result.append(f"{bookmark.title}---------------{bookmark.page}<br/>")
|
||||
|
||||
# 扩展信息
|
||||
result.append(f"subject:{self.subject}")
|
||||
result.append(f"date:{self.date}")
|
||||
result.append(f"spatial:{self.spatial}")
|
||||
result.append(f"Other ISBN:{self.other_isbn}")
|
||||
result.append(f"Other time:{self.other_time}")
|
||||
result.append(f"url:{self.url}")
|
||||
|
||||
return "\\n".join(result)
|
||||
|
||||
|
||||
class BookmarkExtractor:
|
||||
"""书签提取器"""
|
||||
|
||||
@staticmethod
|
||||
def extract_bookmarks_from_bkmk(bkmk_file_path: str) -> List[BookmarkItem]:
|
||||
"""从bkmk文件提取书签列表"""
|
||||
if not os.path.exists(bkmk_file_path):
|
||||
raise FileNotFoundError(f"FreePic2Pdf_bkmk文件不存在: {bkmk_file_path}")
|
||||
|
||||
bookmarks = []
|
||||
|
||||
try:
|
||||
# 尝试不同编码
|
||||
content = None
|
||||
for encoding in ['utf-8', 'gbk', 'gb2312', 'utf-8-sig']:
|
||||
try:
|
||||
with open(bkmk_file_path, 'r', encoding=encoding) as f:
|
||||
content = f.read()
|
||||
break
|
||||
except UnicodeDecodeError:
|
||||
continue
|
||||
|
||||
if content is None:
|
||||
content = open(bkmk_file_path, 'r', encoding='latin-1').read()
|
||||
|
||||
# 按行分割内容
|
||||
lines = content.split('\\n')
|
||||
|
||||
for line in lines:
|
||||
line = line.strip()
|
||||
if not line:
|
||||
continue
|
||||
|
||||
# 解析书签行
|
||||
bookmark = BookmarkExtractor._parse_bookmark_line(line)
|
||||
if bookmark:
|
||||
bookmarks.append(bookmark)
|
||||
|
||||
except Exception as ex:
|
||||
raise Exception(f"读取书签文件失败: {ex}")
|
||||
|
||||
return bookmarks
|
||||
|
||||
@staticmethod
|
||||
def _parse_bookmark_line(line: str) -> Optional[BookmarkItem]:
|
||||
"""解析单行书签数据"""
|
||||
# 分割行内容,最后一部分作为页码
|
||||
parts = re.split(r'[\\s\\t:]+', line)
|
||||
parts = [p for p in parts if p.strip()]
|
||||
|
||||
if len(parts) < 2:
|
||||
return None
|
||||
|
||||
bookmark = BookmarkItem()
|
||||
page_part = parts[-1]
|
||||
|
||||
# 验证页码格式(支持阿拉伯数字和罗马数字)
|
||||
if BookmarkExtractor._is_page_number(page_part):
|
||||
bookmark.page = page_part
|
||||
bookmark.title = " ".join(parts[:-1])
|
||||
return bookmark
|
||||
|
||||
return None
|
||||
|
||||
@staticmethod
|
||||
def _is_page_number(text: str) -> bool:
|
||||
"""验证是否为页码格式"""
|
||||
# 支持阿拉伯数字
|
||||
if re.match(r'^\\d+$', text):
|
||||
return True
|
||||
|
||||
# 支持罗马数字
|
||||
if re.match(r'^[IVXLCDMivxlcdm]+$', text):
|
||||
return True
|
||||
|
||||
return False
|
||||
|
||||
|
||||
class FileProcessor:
|
||||
"""文件处理器"""
|
||||
|
||||
@staticmethod
|
||||
def process_all_folders(pdf_root_path: str, txt_source_path: str) -> Dict[str, Dict]:
|
||||
"""处理所有文件夹"""
|
||||
results = {}
|
||||
|
||||
# 查找所有bkmk文件
|
||||
bkmk_files = []
|
||||
for root, dirs, files in os.walk(pdf_root_path):
|
||||
for file in files:
|
||||
if file.startswith('FreePic2Pdf_bkmk'):
|
||||
bkmk_files.append(os.path.join(root, file))
|
||||
|
||||
if not bkmk_files:
|
||||
raise Exception(f"在路径 {pdf_root_path} 下未找到任何 FreePic2Pdf_bkmk 文件")
|
||||
|
||||
# 按基础文件名分组
|
||||
file_groups = {}
|
||||
for bkmk_file in bkmk_files:
|
||||
folder_name = os.path.basename(os.path.dirname(bkmk_file))
|
||||
base_name = FileProcessor._get_base_filename(folder_name)
|
||||
|
||||
if base_name not in file_groups:
|
||||
file_groups[base_name] = []
|
||||
file_groups[base_name].append(bkmk_file)
|
||||
|
||||
# 处理每个分组
|
||||
for base_name, bkmk_files in file_groups.items():
|
||||
try:
|
||||
bkmk_files.sort() # 按文件名排序
|
||||
metadata_documents = []
|
||||
|
||||
for bkmk_file in bkmk_files:
|
||||
# 查找对应的TXT文件
|
||||
folder_name = os.path.basename(os.path.dirname(bkmk_file))
|
||||
txt_file = os.path.join(txt_source_path, f"{folder_name}.txt")
|
||||
|
||||
# 创建元数据文档
|
||||
metadata = FileProcessor._create_metadata_from_files(txt_file, bkmk_file)
|
||||
if metadata:
|
||||
metadata_documents.append(metadata)
|
||||
|
||||
# 合并所有元数据文档
|
||||
combined_content = FileProcessor._combine_metadata_documents(metadata_documents)
|
||||
|
||||
results[base_name] = {
|
||||
'success': True,
|
||||
'base_filename': base_name,
|
||||
'source_files': bkmk_files,
|
||||
'metadata_documents': metadata_documents,
|
||||
'output_content': combined_content
|
||||
}
|
||||
|
||||
except Exception as ex:
|
||||
results[base_name] = {
|
||||
'success': False,
|
||||
'base_filename': base_name,
|
||||
'error_message': str(ex)
|
||||
}
|
||||
|
||||
return results
|
||||
|
||||
@staticmethod
|
||||
def _get_base_filename(folder_name: str) -> str:
|
||||
"""获取基础文件名"""
|
||||
# 获取空格前的部分作为基础名称
|
||||
space_index = folder_name.find(' ')
|
||||
return folder_name[:space_index] if space_index > 0 else folder_name
|
||||
|
||||
@staticmethod
|
||||
def _create_metadata_from_files(txt_file: str, bkmk_file: str) -> Optional[DocumentMetadata]:
|
||||
"""从文件创建元数据"""
|
||||
metadata = DocumentMetadata()
|
||||
|
||||
# 从TXT文件读取元数据
|
||||
if os.path.exists(txt_file):
|
||||
FileProcessor._read_metadata_from_txt(txt_file, metadata)
|
||||
|
||||
# 从bkmk文件提取书签
|
||||
if os.path.exists(bkmk_file):
|
||||
metadata.table_of_contents = BookmarkExtractor.extract_bookmarks_from_bkmk(bkmk_file)
|
||||
|
||||
return metadata
|
||||
|
||||
@staticmethod
|
||||
def _read_metadata_from_txt(txt_file: str, metadata: DocumentMetadata):
|
||||
"""从TXT文件读取元数据"""
|
||||
try:
|
||||
# 尝试不同编码
|
||||
lines = None
|
||||
for encoding in ['gb2312', 'gbk', 'utf-8', 'utf-8-sig']:
|
||||
try:
|
||||
with open(txt_file, 'r', encoding=encoding) as f:
|
||||
lines = f.readlines()
|
||||
break
|
||||
except UnicodeDecodeError:
|
||||
continue
|
||||
|
||||
if lines is None:
|
||||
lines = open(txt_file, 'r', encoding='latin-1').readlines()
|
||||
|
||||
for line in lines:
|
||||
line = line.strip()
|
||||
if not line:
|
||||
continue
|
||||
|
||||
# 按冒号分割,最多分割成两部分
|
||||
parts = line.split(':', 1)
|
||||
if len(parts) == 2:
|
||||
key = parts[0].strip()
|
||||
value = parts[1].strip()
|
||||
|
||||
# 根据字段名设置对应的属性
|
||||
if key == "title":
|
||||
metadata.title = value
|
||||
elif key == "Other titles":
|
||||
metadata.other_titles = value
|
||||
elif key == "Volume":
|
||||
metadata.volume = value
|
||||
elif key == "ISBN":
|
||||
metadata.isbn = value
|
||||
elif key == "creator":
|
||||
metadata.creator = value
|
||||
elif key == "contributor":
|
||||
metadata.contributor = value
|
||||
elif key == "issuedDate":
|
||||
metadata.issued_date = value
|
||||
elif key == "publisher":
|
||||
metadata.publisher = value
|
||||
elif key == "place":
|
||||
metadata.place = value
|
||||
elif key == "Classification number":
|
||||
metadata.classification_number = value
|
||||
elif key == "page":
|
||||
metadata.page = value
|
||||
elif key == "subject":
|
||||
metadata.subject = value
|
||||
elif key == "date":
|
||||
metadata.date = value
|
||||
elif key == "spatial":
|
||||
metadata.spatial = value
|
||||
elif key == "Other ISBN":
|
||||
metadata.other_isbn = value
|
||||
elif key == "Other time":
|
||||
metadata.other_time = value
|
||||
elif key == "url":
|
||||
metadata.url = value
|
||||
|
||||
except Exception as ex:
|
||||
raise Exception(f"读取TXT文件 {txt_file} 失败: {ex}")
|
||||
|
||||
@staticmethod
|
||||
def _combine_metadata_documents(metadata_documents: List[DocumentMetadata]) -> str:
|
||||
"""合并多个元数据文档"""
|
||||
if not metadata_documents:
|
||||
return ""
|
||||
|
||||
if len(metadata_documents) == 1:
|
||||
return metadata_documents[0].to_formatted_string()
|
||||
|
||||
# 多个文档用 <> 分隔
|
||||
formatted_docs = [doc.to_formatted_string() for doc in metadata_documents]
|
||||
return " <>\\n".join(formatted_docs)
|
||||
|
||||
|
||||
class SlideCombineGUI:
|
||||
"""PDF书签合并工具图形界面"""
|
||||
|
||||
def __init__(self):
|
||||
self.root = tk.Tk()
|
||||
self.root.title("PDF书签合并工具 v2.0 - Python版")
|
||||
self.root.geometry("800x600")
|
||||
self.root.configure(bg='#f0f0f0')
|
||||
|
||||
# 设置窗口居中
|
||||
self.center_window()
|
||||
|
||||
# 创建界面
|
||||
self.create_widgets()
|
||||
|
||||
# 设置日志
|
||||
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
|
||||
|
||||
def center_window(self):
|
||||
"""窗口居中"""
|
||||
self.root.update_idletasks()
|
||||
width = self.root.winfo_width()
|
||||
height = self.root.winfo_height()
|
||||
x = (self.root.winfo_screenwidth() // 2) - (width // 2)
|
||||
y = (self.root.winfo_screenheight() // 2) - (height // 2)
|
||||
self.root.geometry(f'{width}x{height}+{x}+{y}')
|
||||
|
||||
def create_widgets(self):
|
||||
"""创建界面组件"""
|
||||
|
||||
# 主标题
|
||||
title_frame = tk.Frame(self.root, bg='#f0f0f0')
|
||||
title_frame.pack(pady=10)
|
||||
|
||||
tk.Label(title_frame, text="📄 PDF书签合并工具",
|
||||
font=("微软雅黑", 16, "bold"), bg='#f0f0f0').pack()
|
||||
tk.Label(title_frame, text="v2.0 - Python版",
|
||||
font=("微软雅黑", 10), bg='#f0f0f0', fg='#666').pack()
|
||||
|
||||
# 路径选择区域
|
||||
path_frame = tk.LabelFrame(self.root, text="📁 路径选择",
|
||||
font=("微软雅黑", 12), bg='#f0f0f0')
|
||||
path_frame.pack(fill='x', padx=20, pady=10)
|
||||
|
||||
# PDF路径
|
||||
tk.Label(path_frame, text="PDF文件夹路径(含FreePic2Pdf_bkmk.txt文件):",
|
||||
font=("微软雅黑", 10), bg='#f0f0f0').grid(row=0, column=0, columnspan=2,
|
||||
sticky='w', padx=10, pady=5)
|
||||
|
||||
self.pdf_path_var = tk.StringVar()
|
||||
tk.Entry(path_frame, textvariable=self.pdf_path_var, width=60,
|
||||
font=("微软雅黑", 9)).grid(row=1, column=0, padx=10, sticky='ew')
|
||||
tk.Button(path_frame, text="浏览", command=self.browse_pdf_path,
|
||||
bg='#4285f4', fg='white', font=("微软雅黑", 9),
|
||||
relief='flat', padx=15).grid(row=1, column=1, padx=10, pady=5)
|
||||
|
||||
# TXT路径
|
||||
tk.Label(path_frame, text="TXT源文件路径:",
|
||||
font=("微软雅黑", 10), bg='#f0f0f0').grid(row=2, column=0, columnspan=2,
|
||||
sticky='w', padx=10, pady=5)
|
||||
|
||||
self.txt_path_var = tk.StringVar()
|
||||
tk.Entry(path_frame, textvariable=self.txt_path_var, width=60,
|
||||
font=("微软雅黑", 9)).grid(row=3, column=0, padx=10, sticky='ew')
|
||||
tk.Button(path_frame, text="浏览", command=self.browse_txt_path,
|
||||
bg='#4285f4', fg='white', font=("微软雅黑", 9),
|
||||
relief='flat', padx=15).grid(row=3, column=1, padx=10, pady=5)
|
||||
|
||||
# 输出路径
|
||||
tk.Label(path_frame, text="输出路径:",
|
||||
font=("微软雅黑", 10), bg='#f0f0f0').grid(row=4, column=0, columnspan=2,
|
||||
sticky='w', padx=10, pady=5)
|
||||
|
||||
self.output_path_var = tk.StringVar()
|
||||
tk.Entry(path_frame, textvariable=self.output_path_var, width=60,
|
||||
font=("微软雅黑", 9)).grid(row=5, column=0, padx=10, sticky='ew')
|
||||
tk.Button(path_frame, text="浏览", command=self.browse_output_path,
|
||||
bg='#4285f4', fg='white', font=("微软雅黑", 9),
|
||||
relief='flat', padx=15).grid(row=5, column=1, padx=10, pady=5)
|
||||
|
||||
path_frame.columnconfigure(0, weight=1)
|
||||
|
||||
# 操作按钮
|
||||
button_frame = tk.Frame(self.root, bg='#f0f0f0')
|
||||
button_frame.pack(fill='x', padx=20, pady=10)
|
||||
|
||||
tk.Button(button_frame, text="🚀 开始合并", command=self.start_merge,
|
||||
bg='#34a853', fg='white', font=("微软雅黑", 11, "bold"),
|
||||
relief='flat', padx=20, pady=10).pack(side='left', padx=5)
|
||||
|
||||
tk.Button(button_frame, text="🔄 清空", command=self.clear_all,
|
||||
bg='#fbbc04', fg='white', font=("微软雅黑", 10),
|
||||
relief='flat', padx=15, pady=10).pack(side='left', padx=5)
|
||||
|
||||
tk.Button(button_frame, text="❌ 退出", command=self.root.quit,
|
||||
bg='#ea4335', fg='white', font=("微软雅黑", 10),
|
||||
relief='flat', padx=15, pady=10).pack(side='left', padx=5)
|
||||
|
||||
# 日志显示区域
|
||||
log_frame = tk.LabelFrame(self.root, text="📊 处理日志",
|
||||
font=("微软雅黑", 12), bg='#f0f0f0')
|
||||
log_frame.pack(fill='both', expand=True, padx=20, pady=10)
|
||||
|
||||
self.log_text = scrolledtext.ScrolledText(log_frame, height=15,
|
||||
font=("Consolas", 9),
|
||||
bg='#2d2d2d', fg='#00ff00')
|
||||
self.log_text.pack(fill='both', expand=True, padx=10, pady=10)
|
||||
|
||||
self.log("PDF书签合并工具已启动", "info")
|
||||
|
||||
def browse_pdf_path(self):
|
||||
"""浏览PDF路径"""
|
||||
path = filedialog.askdirectory(title="选择包含PDF文件夹的路径")
|
||||
if path:
|
||||
self.pdf_path_var.set(path)
|
||||
self.log(f"已选择PDF路径: {path}", "info")
|
||||
|
||||
def browse_txt_path(self):
|
||||
"""浏览TXT路径"""
|
||||
path = filedialog.askdirectory(title="选择包含TXT源文件的路径")
|
||||
if path:
|
||||
self.txt_path_var.set(path)
|
||||
self.log(f"已选择TXT源路径: {path}", "info")
|
||||
|
||||
def browse_output_path(self):
|
||||
"""浏览输出路径"""
|
||||
path = filedialog.askdirectory(title="选择输出路径")
|
||||
if path:
|
||||
self.output_path_var.set(path)
|
||||
self.log(f"已选择输出路径: {path}", "info")
|
||||
|
||||
def clear_all(self):
|
||||
"""清空所有输入"""
|
||||
self.pdf_path_var.set("")
|
||||
self.txt_path_var.set("")
|
||||
self.output_path_var.set("")
|
||||
self.log_text.delete(1.0, tk.END)
|
||||
self.log("界面已清空", "info")
|
||||
|
||||
def log(self, message: str, level: str = "info"):
|
||||
"""添加日志"""
|
||||
timestamp = datetime.now().strftime("%H:%M:%S")
|
||||
|
||||
if level == "info":
|
||||
prefix = "ℹ️"
|
||||
color = "white"
|
||||
elif level == "success":
|
||||
prefix = "✅"
|
||||
color = "#00ff00"
|
||||
elif level == "error":
|
||||
prefix = "❌"
|
||||
color = "#ff6b6b"
|
||||
elif level == "warning":
|
||||
prefix = "⚠️"
|
||||
color = "#feca57"
|
||||
else:
|
||||
prefix = "ℹ️"
|
||||
color = "white"
|
||||
|
||||
log_message = f"[{timestamp}] {prefix} {message}\\n"
|
||||
|
||||
self.log_text.insert(tk.END, log_message)
|
||||
self.log_text.see(tk.END)
|
||||
self.root.update_idletasks()
|
||||
|
||||
# 控制台输出
|
||||
print(log_message.rstrip())
|
||||
|
||||
def start_merge(self):
|
||||
"""开始合并"""
|
||||
# 验证输入
|
||||
pdf_path = self.pdf_path_var.get().strip()
|
||||
txt_path = self.txt_path_var.get().strip()
|
||||
output_path = self.output_path_var.get().strip()
|
||||
|
||||
if not pdf_path or not txt_path or not output_path:
|
||||
messagebox.showwarning("提示", "请选择所有三个路径:PDF路径、TXT源路径和输出路径")
|
||||
return
|
||||
|
||||
if not os.path.exists(pdf_path):
|
||||
messagebox.showerror("错误", f"指定的PDF文件夹路径不存在\\n{pdf_path}")
|
||||
return
|
||||
|
||||
if not os.path.exists(txt_path):
|
||||
messagebox.showerror("错误", f"指定的TXT源文件路径不存在\\n{txt_path}")
|
||||
return
|
||||
|
||||
# 禁用按钮
|
||||
for widget in self.root.winfo_children():
|
||||
if isinstance(widget, tk.Frame):
|
||||
for child in widget.winfo_children():
|
||||
if isinstance(child, tk.Button):
|
||||
child.config(state='disabled')
|
||||
|
||||
# 开始处理
|
||||
self.log("开始处理PDF书签文件...", "info")
|
||||
|
||||
try:
|
||||
# 处理文件
|
||||
results = FileProcessor.process_all_folders(pdf_path, txt_path)
|
||||
|
||||
self.log(f"找到 {len(results)} 个文件组需要处理", "info")
|
||||
|
||||
# 保存结果
|
||||
success_count = 0
|
||||
fail_count = 0
|
||||
|
||||
for base_name, result in results.items():
|
||||
if result['success']:
|
||||
try:
|
||||
# 确保输出目录存在
|
||||
os.makedirs(output_path, exist_ok=True)
|
||||
|
||||
# 保存文件
|
||||
output_file = os.path.join(output_path, f"{base_name}.txt")
|
||||
with open(output_file, 'w', encoding='utf-8-sig') as f:
|
||||
f.write(result['output_content'])
|
||||
|
||||
success_count += 1
|
||||
source_count = len(result['source_files'])
|
||||
self.log(f"✓ 成功处理: {base_name} (合并了 {source_count} 个文件)", "success")
|
||||
|
||||
except Exception as ex:
|
||||
fail_count += 1
|
||||
self.log(f"✗ 保存文件失败: {base_name} - {ex}", "error")
|
||||
else:
|
||||
fail_count += 1
|
||||
self.log(f"✗ 处理失败: {result['base_filename']} - {result['error_message']}", "error")
|
||||
|
||||
self.log(f"处理完成! 成功: {success_count}, 失败: {fail_count}", "info")
|
||||
|
||||
if success_count > 0:
|
||||
messagebox.showinfo("处理完成",
|
||||
f"书签合并完成!\\n成功处理 {success_count} 个文件\\n输出路径: {output_path}")
|
||||
else:
|
||||
messagebox.showwarning("处理失败", "没有成功处理任何文件,请检查输入路径和文件格式。")
|
||||
|
||||
except Exception as ex:
|
||||
self.log(f"处理过程中发生错误: {ex}", "error")
|
||||
messagebox.showerror("错误", f"处理过程中发生错误:\\n{ex}")
|
||||
|
||||
finally:
|
||||
# 重新启用按钮
|
||||
for widget in self.root.winfo_children():
|
||||
if isinstance(widget, tk.Frame):
|
||||
for child in widget.winfo_children():
|
||||
if isinstance(child, tk.Button):
|
||||
child.config(state='normal')
|
||||
|
||||
def run(self):
|
||||
"""运行应用"""
|
||||
self.root.mainloop()
|
||||
|
||||
|
||||
def main():
|
||||
"""主函数"""
|
||||
try:
|
||||
app = SlideCombineGUI()
|
||||
app.run()
|
||||
except Exception as ex:
|
||||
print(f"程序启动失败: {ex}")
|
||||
messagebox.showerror("启动错误", f"程序启动失败:\\n{ex}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
Loading…
x
Reference in New Issue
Block a user