init

4 weeks ago · 7570e1314d
commit 7570e1314d
34 changed files with 3175 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@ -0,0 +1,48 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# Virtual Environment
+venv/
+ENV/
+env/
+.venv/
+
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+
+# Model archives (keep extracted folders only)
+models/*.tar
+
+# Logs
+*.log
+log.txt
+
+# OCR output
+ocr_*.json
+*_result.json
+
+# OS
+.DS_Store
+Thumbs.db
--- a/README.md
+++ b/README.md
@ -0,0 +1,605 @@
+# Vision-OCR: 图片 OCR 识别系统
+
+基于 PaddleOCR 的图片 OCR 识别系统，支持单张图片、批量图片和目录扫描，提供文本检测、识别、方向分类，输出结构化识别结果。
+
+## 功能特性
+
+- **多输入模式**: 单张图片、多张图片列表、目录批量扫描
+- **完整 OCR 能力**: 文本检测 + 文本识别 + 方向分类
+- **结构化输出**: 文字内容、置信度、位置信息（4 点坐标）
+- **快递单解析**: 自动合并分散文本块，提取运单号、收寄件人等结构化信息
+- **可视化展示**: 在图片上绘制文本框和识别结果
+- **结果导出**: 支持 JSON 结果导出和标注图片保存
+- **ROI 裁剪**: 支持只识别图片指定区域
+- **模块化设计**: 图片加载与 OCR 逻辑完全解耦，便于扩展
+- **全本地运行**: 不依赖任何云服务
+
+## 项目结构
+
+```
+vision-ocr/
+├── input/                  # 图片输入模块
+│   ├── __init__.py
+│   └── loader.py           # 图片加载器
+├── ocr/                    # OCR 处理模块
+│   ├── __init__.py
+│   ├── engine.py           # PaddleOCR 引擎封装
+│   ├── pipeline.py         # OCR 处理管道
+│   └── express_parser.py   # 快递单解析器
+├── visualize/              # 可视化模块
+│   ├── __init__.py
+│   └── draw.py             # 结果绘制器
+├── utils/                  # 工具模块
+│   ├── __init__.py
+│   └── config.py           # 配置管理
+├── models/                 # 模型文件目录（运行 download_models.py 后生成）
+├── main.py                 # 主入口
+├── download_models.py      # 模型下载脚本
+├── requirements.txt        # 依赖清单
+└── README.md
+```
+
+## 环境要求
+
+- Python 3.9+
+- 支持的操作系统: Windows / Linux / macOS
+
+## 安装
+
+### 1. 克隆项目
+
+```bash
+git clone <repository-url>
+cd vision-ocr
+```
+
+### 2. 创建虚拟环境（推荐）
+
+```bash
+python -m venv venv
+
+# Windows
+venv\Scripts\activate
+
+# Linux/macOS
+source venv/bin/activate
+```
+
+### 3. 安装依赖
+
+```bash
+pip install -r requirements.txt
+```
+
+### 4. 模型说明
+
+本项目已内置 PaddleOCR 模型文件（位于 `models/` 目录），clone 后即可直接使用，无需额外下载。
+
+> **备用方案**：如果模型文件缺失或需要更新，可运行 `python download_models.py` 重新下载。
+
+#### 模型详情
+
+本项目使用 PaddleOCR 的 PP-OCRv4 系列模型，包含 3 个模型协同工作：
+
+| 模型类型 | 模型名称 | 作用 | 大小 |
+|---------|---------|------|------|
+| **det (检测模型)** | ch_PP-OCRv4_det_infer | 定位图像中所有文本区域的位置，输出每个文本块的 4 点边界框坐标 | ~4.7MB |
+| **rec (识别模型)** | ch_PP-OCRv4_rec_infer | 将检测到的文本区域图像转换为实际文字内容，输出文本和置信度 | ~10MB |
+| **cls (方向分类模型)** | ch_ppocr_mobile_v2.0_cls_infer | 判断文本是正向(0度)还是倒置(180度)，用于矫正倒置文本后再识别 | ~1.4MB |
+
+**OCR 处理流程：**
+
+```
+输入图像 -> [det 检测] -> 文本区域 -> [cls 分类] -> 方向矫正 -> [rec 识别] -> 文字结果
+```
+
+**模型下载地址：**
+
+- 检测模型: https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_det_infer.tar
+- 识别模型: https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_infer.tar
+- 方向分类模型: https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar
+
+**注意：** 可通过 `--no-angle-cls` 参数禁用方向分类模型，适用于文本方向固定的场景，可略微提升处理速度。
+
+### 5. GPU 加速（可选）
+
+如需使用 GPU 加速，请安装对应 CUDA 版本的 PaddlePaddle：
+
+```bash
+# CUDA 11.8
+pip install paddlepaddle-gpu==2.5.2.post118 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
+
+# CUDA 12.0
+pip install paddlepaddle-gpu==2.5.2.post120 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
+```
+
+## 使用方法
+
+### 基础用法
+
+```bash
+# 识别单张图片
+python main.py --image path/to/image.jpg
+
+# 识别目录中的所有图片
+python main.py --dir path/to/images/
+
+# 识别目录中的特定格式图片
+python main.py --dir path/to/images/ --pattern "*.png"
+
+# 递归搜索子目录
+python main.py --dir path/to/images/ --recursive
+```
+
+### 高级选项
+
+```bash
+# 启用 GPU 加速
+python main.py --image test.jpg --gpu
+
+# 启用 ROI 区域裁剪（只识别画面中央 60% 区域）
+python main.py --image test.jpg --roi 0.2 0.2 0.6 0.6
+
+# 调整置信度阈值（过滤低置信度结果）
+python main.py --image test.jpg --drop-score 0.7
+
+# 切换识别语言
+python main.py --image test.jpg --lang en    # 英文
+python main.py --image test.jpg --lang ch    # 中文（默认）
+
+# 禁用方向分类（轻微提升速度）
+python main.py --image test.jpg --no-angle-cls
+
+# 显示可视化窗口
+python main.py --image test.jpg --show-window
+
+# 保存标注后的图片
+python main.py --image test.jpg --save-image
+
+# 指定输出目录
+python main.py --dir images/ --output-dir results/
+```
+
+### 完整参数列表
+
+| 参数 | 简写 | 说明 | 默认值 |
+|------|------|------|--------|
+| `--image` | `-i` | 单张图片路径 | - |
+| `--dir` | `-d` | 图片目录路径 | - |
+| `--pattern` | `-p` | 文件匹配模式 | - |
+| `--recursive` | `-r` | 递归搜索子目录 | False |
+| `--lang` | `-l` | OCR 语言 | ch |
+| `--gpu` | - | 启用 GPU 加速 | False |
+| `--no-angle-cls` | - | 禁用方向分类 | False |
+| `--drop-score` | - | 置信度阈值 | 0.5 |
+| `--roi` | - | ROI 区域 (x y w h) | - |
+| `--show-window` | - | 显示可视化窗口 | False |
+| `--no-confidence` | - | 不显示置信度 | False |
+| `--output-dir` | `-o` | 输出目录路径 | - |
+| `--save-image` | - | 保存标注后的图片 | False |
+| `--no-json` | - | 不保存 JSON 结果 | False |
+| `--json-filename` | - | JSON 结果文件名 | ocr_result.json |
+| `--express` | `-e` | 启用快递单解析模式 | False |
+
+### 运行时快捷键
+
+| 按键 | 功能 |
+|------|------|
+| `q` | 退出程序 |
+| 任意键 | 处理下一张图片 |
+
+### 结果导出
+
+程序处理完成后会自动将所有识别结果导出到 JSON 文件：
+
+- 默认输出文件：`ocr_result.json`
+- 可通过 `--json-filename` 参数指定文件名
+- 可通过 `--output-dir` 参数指定输出目录
+
+**汇总 JSON 格式：**
+
+```json
+{
+  "total_images": 10,
+  "total_text_blocks": 45,
+  "results": [
+    {
+      "image_index": 1,
+      "image_path": "path/to/image.jpg",
+      "timestamp": 1704355200.123,
+      "processing_time_ms": 45.2,
+      "text_count": 3,
+      "average_confidence": 0.92,
+      "roi_applied": false,
+      "roi_rect": null,
+      "text_blocks": [
+        {
+          "text": "识别的文本",
+          "confidence": 0.95,
+          "bbox": [[100, 50], [200, 50], [200, 80], [100, 80]],
+          "bbox_with_offset": [[100, 50], [200, 50], [200, 80], [100, 80]],
+          "center": [150, 65],
+          "width": 100,
+          "height": 30
+        }
+      ]
+    }
+  ]
+}
+```
+
+#### JSON 字段说明
+
+**顶层字段**
+
+| 字段 | 类型 | 说明 |
+|------|------|------|
+| `total_images` | int | 处理的图片总数 |
+| `total_text_blocks` | int | 所有图片识别出的文本块总数 |
+| `results` | array | 每张图片的识别结果数组 |
+
+**单张图片结果字段**
+
+| 字段 | 类型 | 说明 |
+|------|------|------|
+| `image_index` | int | 图片索引，从 1 开始递增 |
+| `image_path` | string | 图片文件的完整路径 |
+| `timestamp` | float | 处理完成时的 Unix 时间戳（秒） |
+| `processing_time_ms` | float | OCR 处理耗时（毫秒） |
+| `text_count` | int | 该图片识别出的文本块数量 |
+| `average_confidence` | float | 所有文本块的平均置信度 (0.0 ~ 1.0) |
+| `roi_applied` | bool | 是否应用了 ROI 区域裁剪 |
+| `roi_rect` | array\|null | ROI 矩形区域 `[x, y, width, height]`，未应用时为 `null` |
+| `text_blocks` | array | 识别出的文本块数组 |
+
+**文本块 (text_blocks) 字段**
+
+| 字段 | 类型 | 说明 |
+|------|------|------|
+| `text` | string | 识别出的文本内容 |
+| `confidence` | float | 识别置信度 (0.0 ~ 1.0)，越高表示识别结果越可靠 |
+| `bbox` | array | 文本边界框的 4 个顶点坐标 `[[x1,y1], [x2,y2], [x3,y3], [x4,y4]]`，顺序为左上、右上、右下、左下。如果启用了 ROI，坐标相对于 ROI 区域 |
+| `bbox_with_offset` | array | 带偏移的边界框坐标，已还原到原图坐标系。格式同 `bbox` |
+| `center` | array | 文本块中心点坐标 `[cx, cy]` |
+| `width` | float | 文本块宽度（像素） |
+| `height` | float | 文本块高度（像素） |
+
+## 快递单解析模式
+
+使用 `--express` 参数启用快递单解析模式，系统会自动：
+
+1. **合并分散文本块**: 基于位置信息将同一行的文本块合并
+2. **提取结构化信息**: 运单号、快递公司、收/寄件人姓名、电话、地址
+
+### 使用方式
+
+```bash
+# 单张快递单图片
+python main.py --image express.jpg --express
+
+# 批量处理快递单图片
+python main.py --dir express_images/ --express --output-dir results/
+```
+
+### 输出格式
+
+快递单模式下的 JSON 输出格式：
+
+```json
+{
+  "total_images": 5,
+  "total_text_blocks": 50,
+  "results": [
+    {
+      "image_index": 1,
+      "image_path": "express.jpg",
+      "processing_time_ms": 45.2,
+      "express_info": {
+        "tracking_number": "SF1234567890",
+        "sender": {
+          "name": "张三",
+          "phone": "13800138000",
+          "address": "北京市朝阳区xxx路"
+        },
+        "receiver": {
+          "name": "李四",
+          "phone": "13900139000",
+          "address": "上海市浦东新区xxx路"
+        },
+        "courier_company": "顺丰速运",
+        "confidence": 0.95,
+        "extra_fields": {},
+        "raw_text": "顺丰速运\n运单号：SF1234567890\n..."
+      },
+      "merged_text": "顺丰速运\n运单号：SF1234567890\n收件人：李四 13900139000\n..."
+    }
+  ]
+}
+```
+
+#### 快递单模式 JSON 字段说明
+
+**单张图片结果字段（快递单模式）**
+
+| 字段 | 类型 | 说明 |
+|------|------|------|
+| `image_index` | int | 图片索引，从 1 开始递增 |
+| `image_path` | string | 图片文件的完整路径 |
+| `processing_time_ms` | float | OCR 处理耗时（毫秒） |
+| `express_info` | object | 解析出的快递单结构化信息 |
+| `merged_text` | string | 基于位置信息智能合并后的完整文本，同一行的文本块会被合并 |
+
+**快递单信息 (express_info) 字段**
+
+| 字段 | 类型 | 说明 |
+|------|------|------|
+| `tracking_number` | string\|null | 运单号/快递单号 |
+| `sender` | object | 寄件人信息 |
+| `sender.name` | string\|null | 寄件人姓名 |
+| `sender.phone` | string\|null | 寄件人电话（11位手机号） |
+| `sender.address` | string\|null | 寄件人地址 |
+| `receiver` | object | 收件人信息 |
+| `receiver.name` | string\|null | 收件人姓名 |
+| `receiver.phone` | string\|null | 收件人电话（11位手机号） |
+| `receiver.address` | string\|null | 收件人地址 |
+| `courier_company` | string\|null | 快递公司名称（如：顺丰速运、圆通速递等） |
+| `confidence` | float | 所有文本块的平均置信度 (0.0 ~ 1.0) |
+| `extra_fields` | object | 其他识别到的额外字段（键值对形式） |
+| `raw_text` | string | 原始合并文本，用于调试和验证 |
+
+### 支持的快递公司
+
+顺丰、圆通、中通、韵达、申通、极兔、京东、邮政、EMS、百世、德邦、天天、宅急送
+
+### 编程接口
+
+```python
+from ocr import OCRPipeline, ExpressParser
+
+# 使用 OCRResult 的 parse_express() 方法
+result = pipeline.process(image)
+if result and result.text_count > 0:
+    # 解析快递单信息
+    express_info = result.parse_express()
+    print(f"运单号: {express_info.tracking_number}")
+    print(f"收件人: {express_info.receiver_name}")
+    print(f"收件电话: {express_info.receiver_phone}")
+
+    # 获取合并后的完整文本
+    merged_text = result.merge_text()
+    print(f"合并文本: {merged_text}")
+```
+
+## 编程接口
+
+### 作为模块使用
+
+```python
+from input import ImageLoader
+from ocr import OCRPipeline
+from visualize import OCRVisualizer
+from utils import Config, InputConfig, InputMode
+
+# 创建配置
+config = Config.for_single_image("path/to/image.jpg")
+
+# 创建组件
+loader = ImageLoader()
+pipeline = OCRPipeline(config.ocr, config.pipeline)
+visualizer = OCRVisualizer(config.visualize)
+
+# 初始化
+pipeline.initialize()
+
+# 加载并处理图片
+image_info = loader.load("path/to/image.jpg")
+if image_info:
+    result = pipeline.process(image_info.image, image_info.path)
+
+    if result and result.text_count > 0:
+        # 获取识别结果
+        for block in result.text_blocks:
+            print(f"文本: {block.text}")
+            print(f"置信度: {block.confidence}")
+            print(f"位置: {block.bbox}")
+
+        # 导出为 JSON
+        json_data = result.to_dict()
+
+    # 可视化
+    display_image = visualizer.draw_result(image_info.image, result)
+    visualizer.show(display_image, wait_key=0)
+
+# 清理资源
+visualizer.close()
+```
+
+### 批量处理
+
+```python
+from input import ImageLoader
+from ocr import OCRPipeline
+from utils import Config
+
+# 创建配置
+config = Config.for_directory("path/to/images/", pattern="*.jpg")
+
+# 创建组件
+loader = ImageLoader()
+pipeline = OCRPipeline(config.ocr)
+pipeline.initialize()
+
+# 批量处理
+for image_info in loader.load_directory("path/to/images/"):
+    result = pipeline.process(image_info.image, image_info.path)
+    print(f"{image_info.filename}: 识别到 {result.text_count} 个文本块")
+```
+
+### OCRResult 数据结构
+
+```python
+{
+    "image_index": 1,
+    "image_path": "path/to/image.jpg",
+    "timestamp": 1704355200.123,
+    "processing_time_ms": 45.6,
+    "text_count": 3,
+    "average_confidence": 0.92,
+    "roi_applied": False,
+    "roi_rect": None,
+    "text_blocks": [
+        {
+            "text": "识别的文本",
+            "confidence": 0.95,
+            "bbox": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
+            "bbox_with_offset": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
+            "center": [cx, cy],
+            "width": 120.0,
+            "height": 30.0
+        }
+    ]
+}
+```
+
+## 模块说明
+
+### input/loader.py - 图片加载模块
+
+提供图片加载功能，支持单张、批量和目录加载。
+
+- `ImageLoader`: 图片加载器类
+- `ImageInfo`: 图片信息数据结构
+- `load_image()`: 便捷函数，加载单张图片
+- `load_images()`: 便捷函数，批量加载图片
+
+### ocr/engine.py - OCR 引擎模块
+
+封装 PaddleOCR，提供简洁的 OCR 调用接口。
+
+- `OCREngine`: OCR 引擎类
+- `TextBlock`: 文本块数据结构
+
+### ocr/pipeline.py - OCR 处理管道
+
+串联图片加载、ROI 裁剪、OCR 识别、结果封装。
+
+- `OCRPipeline`: 处理管道类
+- `OCRResult`: OCR 结果数据结构
+
+### visualize/draw.py - 可视化模块
+
+在图像上绘制 OCR 识别结果。
+
+- `OCRVisualizer`: 可视化器类
+
+### utils/config.py - 配置管理模块
+
+集中管理所有可配置参数。
+
+- `Config`: 全局配置聚合类
+- `InputConfig`: 输入配置
+- `OCRConfig`: OCR 引擎配置
+- `PipelineConfig`: 管道配置
+- `VisualizeConfig`: 可视化配置
+- `OutputConfig`: 输出配置
+- `ROIConfig`: ROI 区域配置
+
+## 扩展开发
+
+### 添加图片预处理器
+
+```python
+import cv2
+
+def denoise_preprocessor(image):
+    """降噪预处理"""
+    return cv2.fastNlMeansDenoisingColored(image, None, 10, 10, 7, 21)
+
+pipeline.add_preprocessor(denoise_preprocessor)
+```
+
+### 添加结果后处理器
+
+```python
+def filter_short_text(result):
+    """过滤短文本"""
+    result.text_blocks = [
+        block for block in result.text_blocks
+        if len(block.text) >= 3
+    ]
+    return result
+
+pipeline.add_postprocessor(filter_short_text)
+```
+
+## 性能优化建议
+
+1. **启用 ROI**: 使用 `--roi` 参数只处理感兴趣区域
+2. **使用 GPU**: 使用 `--gpu` 参数启用 GPU 加速
+3. **禁用方向分类**: 如果文本方向固定，使用 `--no-angle-cls`
+4. **提高置信度阈值**: 使用 `--drop-score` 过滤低质量结果
+5. **批量处理**: 使用目录模式批量处理多张图片
+
+## 常见问题
+
+### Q: Windows 中文用户名导致模型加载失败？
+
+A: PaddlePaddle 的 C++ 推理引擎无法正确处理包含中文字符的路径。请运行以下命令将模型下载到项目目录：
+
+```bash
+python download_models.py
+```
+
+程序会自动检测并使用 `models/` 目录中的模型。
+
+### Q: 中文无法正常显示？
+
+A: OpenCV 默认字体不支持中文。可以在 `VisualizeConfig` 中配置 `font_path` 指向系统中文字体文件：
+
+```python
+config.visualize.font_path = "C:/Windows/Fonts/simhei.ttf"  # Windows
+config.visualize.font_path = "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc"  # Linux
+```
+
+### Q: OCR 速度慢？
+
+A: 参考上方「性能优化建议」部分。
+
+### Q: 支持哪些图片格式？
+
+A: 支持以下格式：`.jpg`, `.jpeg`, `.png`, `.bmp`, `.tiff`, `.tif`, `.webp`
+
+## 贡献指南
+
+欢迎提交 Issue 和 Pull Request。
+
+### 开发流程
+
+1. Fork 本仓库
+2. 创建功能分支: `git checkout -b feature/your-feature`
+3. 提交更改: `git commit -m "Add your feature"`
+4. 推送分支: `git push origin feature/your-feature`
+5. 创建 Pull Request
+
+### 代码规范
+
+- 遵循 PEP 8 代码风格
+- 所有公共类和函数需要添加文档字符串
+- 新功能需要添加相应的类型注解
+- 提交前确保代码可正常运行
+
+### 目录结构规范
+
+- `input/`: 仅包含图片加载相关代码
+- `ocr/`: 仅包含 OCR 处理相关代码
+- `visualize/`: 仅包含可视化相关代码
+- `utils/`: 通用工具和配置
+
+## 许可证
+
+MIT License
+
+## 致谢
+
+- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) - 强大的 OCR 工具库
+- [OpenCV](https://opencv.org/) - 计算机视觉库
--- a/data/csb.png
+++ b/data/csb.png
--- a/data/ems.png
+++ b/data/ems.png
--- a/data/img.png
+++ b/data/img.png
--- a/data/invert.png
+++ b/data/invert.png
--- a/data/jd.png
+++ b/data/jd.png
--- a/data/sf.png
+++ b/data/sf.png
--- a/data/st.png
+++ b/data/st.png
--- a/data/test.png
+++ b/data/test.png
--- a/data/zt.png
+++ b/data/zt.png
--- a/download_models.py
+++ b/download_models.py
@ -0,0 +1,82 @@
+# -*- coding: utf-8 -*-
+"""
+模型下载脚本
+将 PaddleOCR 模型下载到项目目录，避免中文路径问题
+"""
+
+import os
+import tarfile
+import urllib.request
+from pathlib import Path
+
+# 模型下载地址
+MODELS = {
+    "det": {
+        "url": "https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_det_infer.tar",
+        "dir": "ch_PP-OCRv4_det_infer"
+    },
+    "rec": {
+        "url": "https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_infer.tar",
+        "dir": "ch_PP-OCRv4_rec_infer"
+    },
+    "cls": {
+        "url": "https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar",
+        "dir": "ch_ppocr_mobile_v2.0_cls_infer"
+    }
+}
+
+def download_and_extract(url: str, save_dir: Path, model_name: str) -> Path:
+    """
+    下载并解压模型
+
+    Args:
+        url: 下载地址
+        save_dir: 保存目录
+        model_name: 模型名称
+
+    Returns:
+        解压后的模型目录路径
+    """
+    save_dir.mkdir(parents=True, exist_ok=True)
+    tar_path = save_dir / f"{model_name}.tar"
+
+    # 下载
+    if not tar_path.exists():
+        print(f"[INFO] Downloading {model_name}...")
+        urllib.request.urlretrieve(url, tar_path)
+        print(f"[INFO] Downloaded to {tar_path}")
+    else:
+        print(f"[INFO] {model_name} already exists, skipping download")
+
+    # 解压
+    extract_dir = save_dir / model_name
+    if not extract_dir.exists():
+        print(f"[INFO] Extracting {model_name}...")
+        with tarfile.open(tar_path, "r") as tar:
+            tar.extractall(save_dir)
+        print(f"[INFO] Extracted to {extract_dir}")
+
+    return extract_dir
+
+
+def main():
+    """下载所有模型"""
+    project_root = Path(__file__).parent
+    models_dir = project_root / "models"
+
+    print(f"[INFO] Models will be saved to: {models_dir}")
+
+    for model_type, info in MODELS.items():
+        model_dir = download_and_extract(
+            url=info["url"],
+            save_dir=models_dir,
+            model_name=info["dir"]
+        )
+        print(f"[INFO] {model_type} model ready at: {model_dir}")
+
+    print("\n[INFO] All models downloaded successfully!")
+    print("[INFO] You can now run: python main.py")
+
+
+if __name__ == "__main__":
+    main()
--- a/input/init.py
+++ b/input/init.py
@ -0,0 +1,9 @@
+# -*- coding: utf-8 -*-
+"""
+图片输入模块
+提供图片加载功能，支持单张图片、多张图片和目录批量加载
+"""
+
+from .loader import ImageLoader, ImageInfo, load_image, load_images
+
+__all__ = ["ImageLoader", "ImageInfo", "load_image", "load_images"]
--- a/input/loader.py
+++ b/input/loader.py
@ -0,0 +1,278 @@
+# -*- coding: utf-8 -*-
+"""
+图片加载模块
+提供图片加载功能，支持单张图片、多张图片和目录批量加载
+"""
+
+import cv2
+import numpy as np
+from pathlib import Path
+from typing import List, Optional, Generator, Union, Tuple
+from dataclasses import dataclass
+
+
+# 支持的图片格式
+SUPPORTED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff', '.tif', '.webp'}
+
+
+@dataclass
+class ImageInfo:
+    """
+    图片信息数据结构
+
+    Attributes:
+        path: 图片文件路径
+        image: 图片数据 (numpy array, BGR 格式)
+        width: 图片宽度
+        height: 图片高度
+        filename: 文件名（不含路径）
+    """
+    path: str
+    image: np.ndarray
+    width: int
+    height: int
+    filename: str
+
+    @classmethod
+    def from_image(cls, path: str, image: np.ndarray) -> "ImageInfo":
+        """
+        从图片数据创建 ImageInfo
+
+        Args:
+            path: 图片路径
+            image: 图片数据
+
+        Returns:
+            ImageInfo 实例
+        """
+        h, w = image.shape[:2]
+        return cls(
+            path=path,
+            image=image,
+            width=w,
+            height=h,
+            filename=Path(path).name
+        )
+
+
+class ImageLoader:
+    """
+    图片加载器
+    支持单张图片、多张图片列表和目录批量加载
+    """
+
+    def __init__(self, supported_extensions: Optional[set] = None):
+        """
+        初始化图片加载器
+
+        Args:
+            supported_extensions: 支持的图片扩展名集合，默认使用 SUPPORTED_EXTENSIONS
+        """
+        self._extensions = supported_extensions or SUPPORTED_EXTENSIONS
+
+    def load(self, path: Union[str, Path]) -> Optional[ImageInfo]:
+        """
+        加载单张图片
+
+        Args:
+            path: 图片文件路径
+
+        Returns:
+            ImageInfo 对象，加载失败返回 None
+        """
+        path = Path(path)
+
+        if not path.exists():
+            print(f"[ERROR] 文件不存在: {path}")
+            return None
+
+        if not path.is_file():
+            print(f"[ERROR] 路径不是文件: {path}")
+            return None
+
+        if path.suffix.lower() not in self._extensions:
+            print(f"[ERROR] 不支持的图片格式: {path.suffix}")
+            return None
+
+        # 使用 cv2.imdecode 处理中文路径
+        image = self._read_image(str(path))
+
+        if image is None:
+            print(f"[ERROR] 无法读取图片: {path}")
+            return None
+
+        return ImageInfo.from_image(str(path), image)
+
+    def load_batch(self, paths: List[Union[str, Path]]) -> Generator[ImageInfo, None, None]:
+        """
+        批量加载多张图片
+
+        Args:
+            paths: 图片路径列表
+
+        Yields:
+            ImageInfo 对象
+        """
+        for path in paths:
+            info = self.load(path)
+            if info is not None:
+                yield info
+
+    def load_directory(
+        self,
+        dir_path: Union[str, Path],
+        pattern: Optional[str] = None,
+        recursive: bool = False
+    ) -> Generator[ImageInfo, None, None]:
+        """
+        加载目录中的所有图片
+
+        Args:
+            dir_path: 目录路径
+            pattern: 文件匹配模式（如 "*.jpg"），None 表示加载所有支持的格式
+            recursive: 是否递归搜索子目录
+
+        Yields:
+            ImageInfo 对象
+        """
+        dir_path = Path(dir_path)
+
+        if not dir_path.exists():
+            print(f"[ERROR] 目录不存在: {dir_path}")
+            return
+
+        if not dir_path.is_dir():
+            print(f"[ERROR] 路径不是目录: {dir_path}")
+            return
+
+        # 获取文件列表
+        if pattern:
+            if recursive:
+                files = list(dir_path.rglob(pattern))
+            else:
+                files = list(dir_path.glob(pattern))
+        else:
+            # 加载所有支持的格式
+            files = []
+            for ext in self._extensions:
+                if recursive:
+                    files.extend(dir_path.rglob(f"*{ext}"))
+                    files.extend(dir_path.rglob(f"*{ext.upper()}"))
+                else:
+                    files.extend(dir_path.glob(f"*{ext}"))
+                    files.extend(dir_path.glob(f"*{ext.upper()}"))
+
+        # 按文件名排序
+        files = sorted(set(files), key=lambda p: p.name)
+
+        print(f"[INFO] 在目录 {dir_path} 中找到 {len(files)} 张图片")
+
+        for file_path in files:
+            info = self.load(file_path)
+            if info is not None:
+                yield info
+
+    def get_image_paths(
+        self,
+        dir_path: Union[str, Path],
+        pattern: Optional[str] = None,
+        recursive: bool = False
+    ) -> List[str]:
+        """
+        获取目录中所有图片的路径列表（不加载图片）
+
+        Args:
+            dir_path: 目录路径
+            pattern: 文件匹配模式
+            recursive: 是否递归搜索
+
+        Returns:
+            图片路径列表
+        """
+        dir_path = Path(dir_path)
+
+        if not dir_path.exists() or not dir_path.is_dir():
+            return []
+
+        if pattern:
+            if recursive:
+                files = list(dir_path.rglob(pattern))
+            else:
+                files = list(dir_path.glob(pattern))
+        else:
+            files = []
+            for ext in self._extensions:
+                if recursive:
+                    files.extend(dir_path.rglob(f"*{ext}"))
+                    files.extend(dir_path.rglob(f"*{ext.upper()}"))
+                else:
+                    files.extend(dir_path.glob(f"*{ext}"))
+                    files.extend(dir_path.glob(f"*{ext.upper()}"))
+
+        return sorted([str(f) for f in set(files)], key=lambda p: Path(p).name)
+
+    def _read_image(self, path: str) -> Optional[np.ndarray]:
+        """
+        读取图片，支持中文路径
+
+        Args:
+            path: 图片路径
+
+        Returns:
+            图片数据，读取失败返回 None
+        """
+        # 使用 numpy 和 imdecode 处理中文路径
+        try:
+            with open(path, 'rb') as f:
+                data = np.frombuffer(f.read(), dtype=np.uint8)
+            image = cv2.imdecode(data, cv2.IMREAD_COLOR)
+            return image
+        except Exception as e:
+            print(f"[ERROR] 读取图片失败: {path}, 错误: {e}")
+            return None
+
+    @property
+    def supported_extensions(self) -> set:
+        """获取支持的图片扩展名"""
+        return self._extensions.copy()
+
+
+def load_image(path: Union[str, Path]) -> Optional[np.ndarray]:
+    """
+    便捷函数：加载单张图片
+
+    Args:
+        path: 图片路径
+
+    Returns:
+        图片数据 (numpy array, BGR 格式)，加载失败返回 None
+    """
+    loader = ImageLoader()
+    info = loader.load(path)
+    return info.image if info else None
+
+
+def load_images(
+    paths: Optional[List[Union[str, Path]]] = None,
+    directory: Optional[Union[str, Path]] = None,
+    pattern: Optional[str] = None,
+    recursive: bool = False
+) -> Generator[ImageInfo, None, None]:
+    """
+    便捷函数：批量加载图片
+
+    Args:
+        paths: 图片路径列表
+        directory: 图片目录
+        pattern: 文件匹配模式
+        recursive: 是否递归搜索
+
+    Yields:
+        ImageInfo 对象
+    """
+    loader = ImageLoader()
+
+    if paths:
+        yield from loader.load_batch(paths)
+    elif directory:
+        yield from loader.load_directory(directory, pattern, recursive)
--- a/main.py
+++ b/main.py
@ -0,0 +1,551 @@
+# -*- coding: utf-8 -*-
+"""
+OCR 图片识别系统 - 主入口
+支持单张图片、多张图片和目录批量处理
+"""
+
+import os
+from pathlib import Path
+
+# 在所有其他导入之前设置 PaddleOCR 模型路径
+# 解决 Windows 中文用户名路径问题
+_PROJECT_ROOT = Path(__file__).parent
+_MODELS_DIR = _PROJECT_ROOT / "models"
+_MODELS_DIR.mkdir(exist_ok=True)
+os.environ["PADDLEOCR_HOME"] = str(_MODELS_DIR)
+
+import argparse
+import json
+import sys
+import cv2
+from typing import Optional, List, Generator
+
+from input.loader import ImageLoader, ImageInfo
+from ocr.pipeline import OCRPipeline, OCRResult
+from visualize.draw import OCRVisualizer
+from utils.config import (
+    Config,
+    InputConfig,
+    InputMode,
+    OCRConfig,
+    PipelineConfig,
+    VisualizeConfig,
+    OutputConfig,
+    ROIConfig
+)
+
+
+class OCRApplication:
+    """
+    OCR 应用主类
+    协调各模块完成图片 OCR 识别
+    """
+
+    def __init__(
+        self,
+        config: Config,
+        express_mode: bool = False
+    ):
+        """
+        初始化应用
+
+        Args:
+            config: 全局配置
+            express_mode: 是否启用快递单解析模式
+        """
+        self._config = config
+        self._loader: Optional[ImageLoader] = None
+        self._pipeline: Optional[OCRPipeline] = None
+        self._visualizer: Optional[OCRVisualizer] = None
+        self._express_mode = express_mode
+        self._all_results: List[dict] = []
+
+    def initialize(self) -> bool:
+        """
+        初始化所有组件
+
+        Returns:
+            是否初始化成功
+        """
+        print("[INFO] 正在初始化 OCR 系统...")
+
+        # 创建图片加载器
+        self._loader = ImageLoader()
+
+        # 创建 OCR 管道
+        self._pipeline = OCRPipeline(
+            ocr_config=self._config.ocr,
+            pipeline_config=self._config.pipeline
+        )
+
+        # 创建可视化器
+        self._visualizer = OCRVisualizer(self._config.visualize)
+
+        # 初始化 OCR 管道（预加载模型）
+        print("[INFO] 正在加载 OCR 模型...")
+        self._pipeline.initialize()
+        print("[INFO] OCR 模型加载完成")
+
+        return True
+
+    def _get_images(self) -> Generator[ImageInfo, None, None]:
+        """
+        根据配置获取图片
+
+        Yields:
+            ImageInfo 对象
+        """
+        input_config = self._config.input
+        if input_config is None:
+            return
+
+        if input_config.mode == InputMode.SINGLE:
+            info = self._loader.load(input_config.image_path)
+            if info:
+                yield info
+
+        elif input_config.mode == InputMode.BATCH:
+            yield from self._loader.load_batch(input_config.image_paths)
+
+        elif input_config.mode == InputMode.DIRECTORY:
+            yield from self._loader.load_directory(
+                input_config.directory,
+                input_config.pattern,
+                input_config.recursive
+            )
+
+    def run(self) -> None:
+        """运行图片处理"""
+        if self._loader is None or self._pipeline is None or self._visualizer is None:
+            print("[ERROR] 系统未初始化")
+            return
+
+        self._all_results = []
+        print("[INFO] 开始 OCR 识别...")
+
+        try:
+            for image_info in self._get_images():
+                print(f"\n[INFO] 处理图片: {image_info.filename}")
+
+                # OCR 处理
+                result = self._pipeline.process(image_info.image, image_info.path)
+
+                # 收集结果
+                if result:
+                    if self._express_mode:
+                        # 快递单模式：解析并收集结构化结果
+                        express_info = result.parse_express()
+                        self._all_results.append({
+                            "image_index": result.image_index,
+                            "image_path": result.image_path,
+                            "processing_time_ms": result.processing_time_ms,
+                            "express_info": express_info.to_dict(),
+                            "merged_text": result.merge_text()
+                        })
+                    else:
+                        self._all_results.append(result.to_dict())
+
+                # 打印结果
+                if result and result.text_count > 0:
+                    if self._express_mode:
+                        self._print_express_result(result)
+                    else:
+                        self._print_result(result)
+
+                # 可视化并保存
+                if result:
+                    self._handle_visualization(image_info, result)
+
+        except KeyboardInterrupt:
+            print("\n[INFO] 收到中断信号，正在退出...")
+
+        finally:
+            # 导出汇总结果
+            self._export_summary()
+            self.cleanup()
+
+    def _print_result(self, result: OCRResult) -> None:
+        """
+        打印 OCR 结果到控制台
+
+        Args:
+            result: OCR 结果
+        """
+        print(f"  识别到 {result.text_count} 个文本块 (耗时: {result.processing_time_ms:.1f}ms)")
+        for i, block in enumerate(result.text_blocks):
+            print(f"    [{i+1}] {block.text} (置信度: {block.confidence:.3f})")
+
+    def _print_express_result(self, result: OCRResult) -> None:
+        """
+        打印快递单解析结果到控制台
+
+        Args:
+            result: OCR 结果
+        """
+        express_info = result.parse_express()
+        print(f"  快递单解析结果 (耗时: {result.processing_time_ms:.1f}ms)")
+
+        if express_info.courier_company:
+            print(f"    快递公司: {express_info.courier_company}")
+        if express_info.tracking_number:
+            print(f"    运单号: {express_info.tracking_number}")
+        if express_info.receiver_name:
+            print(f"    收件人: {express_info.receiver_name}")
+        if express_info.receiver_phone:
+            print(f"    收件电话: {express_info.receiver_phone}")
+        if express_info.receiver_address:
+            print(f"    收件地址: {express_info.receiver_address}")
+        if express_info.sender_name:
+            print(f"    寄件人: {express_info.sender_name}")
+        if express_info.sender_phone:
+            print(f"    寄件电话: {express_info.sender_phone}")
+        if express_info.sender_address:
+            print(f"    寄件地址: {express_info.sender_address}")
+
+        if not express_info.is_valid:
+            print("    [未识别到有效快递单信息]")
+            print(f"    合并文本: {result.merge_text()}")
+
+    def _handle_visualization(self, image_info: ImageInfo, result: OCRResult) -> None:
+        """
+        处理可视化和图片保存
+
+        Args:
+            image_info: 图片信息
+            result: OCR 结果
+        """
+        # 绘制结果
+        display_image = self._visualizer.draw_result(image_info.image, result)
+
+        # 显示窗口
+        if self._config.visualize.show_window:
+            key = self._visualizer.show(display_image, wait_key=0)
+            if key == ord('q') or key == ord('Q'):
+                raise KeyboardInterrupt()
+
+        # 保存标注后的图片
+        if self._config.output.save_image:
+            self._save_annotated_image(image_info, display_image)
+
+    def _save_annotated_image(self, image_info: ImageInfo, annotated_image) -> None:
+        """
+        保存标注后的图片
+
+        Args:
+            image_info: 原始图片信息
+            annotated_image: 标注后的图片
+        """
+        output_config = self._config.output
+
+        # 确定输出目录
+        if output_config.output_dir:
+            output_dir = Path(output_config.output_dir)
+        else:
+            output_dir = Path(image_info.path).parent
+
+        output_dir.mkdir(parents=True, exist_ok=True)
+
+        # 生成输出文件名
+        original_path = Path(image_info.path)
+        output_filename = f"{original_path.stem}{output_config.image_suffix}{original_path.suffix}"
+        output_path = output_dir / output_filename
+
+        # 保存图片（支持中文路径）
+        _, ext = os.path.splitext(str(output_path))
+        success, encoded = cv2.imencode(ext, annotated_image)
+        if success:
+            with open(output_path, 'wb') as f:
+                f.write(encoded.tobytes())
+            print(f"  [INFO] 标注图片已保存: {output_path}")
+
+    def _export_summary(self) -> None:
+        """
+        导出所有识别结果到汇总 JSON 文件
+        """
+        if not self._all_results:
+            print("[INFO] 没有识别结果需要导出")
+            return
+
+        output_config = self._config.output
+        if not output_config.save_json:
+            return
+
+        # 确定输出路径
+        if output_config.output_dir:
+            output_dir = Path(output_config.output_dir)
+            output_dir.mkdir(parents=True, exist_ok=True)
+            output_path = output_dir / output_config.json_filename
+        else:
+            output_path = Path(output_config.json_filename)
+
+        # 构建汇总数据
+        summary = {
+            "total_images": len(self._all_results),
+            "total_text_blocks": sum(r.get("text_count", 0) for r in self._all_results),
+            "results": self._all_results
+        }
+
+        # 写入文件
+        with open(output_path, 'w', encoding='utf-8') as f:
+            json.dump(summary, f, ensure_ascii=False, indent=2)
+
+        print(f"[INFO] 汇总结果已导出到 {output_path}，共 {summary['total_images']} 张图片")
+
+    def cleanup(self) -> None:
+        """清理资源"""
+        if self._visualizer:
+            self._visualizer.close()
+        print("[INFO] 资源已释放")
+
+
+def parse_args() -> argparse.Namespace:
+    """解析命令行参数"""
+    parser = argparse.ArgumentParser(
+        description="OCR 图片识别系统",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+示例:
+  # 识别单张图片
+  python main.py --image path/to/image.jpg
+
+  # 识别目录中的所有图片
+  python main.py --dir path/to/images/
+
+  # 识别目录中的特定格式图片
+  python main.py --dir path/to/images/ --pattern "*.png"
+
+  # 递归搜索子目录
+  python main.py --dir path/to/images/ --recursive
+
+  # 启用快递单解析模式
+  python main.py --image express.jpg --express
+
+  # 保存标注后的图片
+  python main.py --image test.jpg --save-image
+
+  # 指定输出目录
+  python main.py --dir images/ --output-dir results/
+
+  # 启用 ROI 裁剪（画面中央 60% 区域）
+  python main.py --image test.jpg --roi 0.2 0.2 0.6 0.6
+
+  # 使用 GPU 加速
+  python main.py --image test.jpg --gpu
+        """
+    )
+
+    # 输入源（互斥）
+    input_group = parser.add_mutually_exclusive_group(required=True)
+    input_group.add_argument(
+        "--image", "-i",
+        type=str,
+        help="单张图片路径"
+    )
+    input_group.add_argument(
+        "--dir", "-d",
+        type=str,
+        help="图片目录路径"
+    )
+
+    # 目录模式选项
+    parser.add_argument(
+        "--pattern", "-p",
+        type=str,
+        default=None,
+        help="文件匹配模式（如 '*.jpg'）"
+    )
+    parser.add_argument(
+        "--recursive", "-r",
+        action="store_true",
+        help="递归搜索子目录"
+    )
+
+    # OCR 配置
+    parser.add_argument(
+        "--lang", "-l",
+        type=str,
+        default="ch",
+        help="OCR 语言（默认: ch）"
+    )
+    parser.add_argument(
+        "--gpu",
+        action="store_true",
+        help="启用 GPU 加速"
+    )
+    parser.add_argument(
+        "--no-angle-cls",
+        action="store_true",
+        help="禁用方向分类"
+    )
+    parser.add_argument(
+        "--drop-score",
+        type=float,
+        default=0.5,
+        help="置信度阈值（默认: 0.5）"
+    )
+
+    # ROI 配置
+    parser.add_argument(
+        "--roi",
+        type=float,
+        nargs=4,
+        metavar=("X", "Y", "W", "H"),
+        help="ROI 区域（归一化坐标: x y width height）"
+    )
+
+    # 可视化配置
+    parser.add_argument(
+        "--show-window",
+        action="store_true",
+        help="显示可视化窗口（默认不显示）"
+    )
+    parser.add_argument(
+        "--no-confidence",
+        action="store_true",
+        help="不显示置信度"
+    )
+
+    # 输出配置
+    parser.add_argument(
+        "--output-dir", "-o",
+        type=str,
+        default=None,
+        help="输出目录路径"
+    )
+    parser.add_argument(
+        "--save-image",
+        action="store_true",
+        help="保存标注后的图片"
+    )
+    parser.add_argument(
+        "--no-json",
+        action="store_true",
+        help="不保存 JSON 结果"
+    )
+    parser.add_argument(
+        "--json-filename",
+        type=str,
+        default="ocr_result.json",
+        help="JSON 结果文件名（默认: ocr_result.json）"
+    )
+
+    # 快递单解析模式
+    parser.add_argument(
+        "--express", "-e",
+        action="store_true",
+        help="启用快递单解析模式，自动合并文本并提取结构化信息"
+    )
+
+    return parser.parse_args()
+
+
+def build_config(args: argparse.Namespace) -> Config:
+    """
+    根据命令行参数构建配置
+
+    Args:
+        args: 命令行参数
+
+    Returns:
+        配置对象
+    """
+    # 输入配置
+    if args.image:
+        input_config = InputConfig(
+            mode=InputMode.SINGLE,
+            image_path=args.image
+        )
+    else:
+        input_config = InputConfig(
+            mode=InputMode.DIRECTORY,
+            directory=args.dir,
+            pattern=args.pattern,
+            recursive=args.recursive
+        )
+
+    # OCR 配置
+    # 设置模型目录（解决 Windows 中文用户名路径问题）
+    det_model_dir = str(_MODELS_DIR / "ch_PP-OCRv4_det_infer")
+    rec_model_dir = str(_MODELS_DIR / "ch_PP-OCRv4_rec_infer")
+    cls_model_dir = str(_MODELS_DIR / "ch_ppocr_mobile_v2.0_cls_infer")
+
+    # 检查模型是否已下载
+    models_exist = (
+        Path(det_model_dir).exists() and
+        Path(rec_model_dir).exists() and
+        Path(cls_model_dir).exists()
+    )
+
+    ocr_config = OCRConfig(
+        lang=args.lang,
+        use_angle_cls=not args.no_angle_cls,
+        use_gpu=args.gpu,
+        drop_score=args.drop_score,
+        det_model_dir=det_model_dir if models_exist else None,
+        rec_model_dir=rec_model_dir if models_exist else None,
+        cls_model_dir=cls_model_dir if models_exist else None
+    )
+
+    # ROI 配置
+    roi_config = ROIConfig(enabled=False)
+    if args.roi:
+        roi_config = ROIConfig(
+            enabled=True,
+            x_ratio=args.roi[0],
+            y_ratio=args.roi[1],
+            width_ratio=args.roi[2],
+            height_ratio=args.roi[3]
+        )
+
+    # 管道配置
+    pipeline_config = PipelineConfig(roi=roi_config)
+
+    # 可视化配置
+    visualize_config = VisualizeConfig(
+        show_window=args.show_window,
+        show_confidence=not args.no_confidence
+    )
+
+    # 输出配置
+    output_config = OutputConfig(
+        output_dir=args.output_dir,
+        save_json=not args.no_json,
+        save_image=args.save_image,
+        json_filename=args.json_filename
+    )
+
+    return Config(
+        input=input_config,
+        ocr=ocr_config,
+        pipeline=pipeline_config,
+        visualize=visualize_config,
+        output=output_config
+    )
+
+
+def main() -> int:
+    """主函数"""
+    args = parse_args()
+    config = build_config(args)
+
+    # 检查模型是否已下载
+    if config.ocr.det_model_dir is None:
+        print("[WARN] 模型未在项目目录中找到")
+        print("[WARN] 对于 Windows 中文用户名用户，请先运行:")
+        print("[WARN]   python download_models.py")
+        print("[INFO] 回退到默认 PaddleOCR 模型路径...")
+
+    app = OCRApplication(
+        config,
+        express_mode=args.express
+    )
+
+    if not app.initialize():
+        return 1
+
+    app.run()
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
--- a/models/ch_PP-OCRv4_det_infer/inference.pdiparams
+++ b/models/ch_PP-OCRv4_det_infer/inference.pdiparams
--- a/models/ch_PP-OCRv4_det_infer/inference.pdiparams.info
+++ b/models/ch_PP-OCRv4_det_infer/inference.pdiparams.info
--- a/models/ch_PP-OCRv4_det_infer/inference.pdmodel
+++ b/models/ch_PP-OCRv4_det_infer/inference.pdmodel
--- a/models/ch_PP-OCRv4_rec_infer/inference.pdiparams
+++ b/models/ch_PP-OCRv4_rec_infer/inference.pdiparams
--- a/models/ch_PP-OCRv4_rec_infer/inference.pdiparams.info
+++ b/models/ch_PP-OCRv4_rec_infer/inference.pdiparams.info
--- a/models/ch_PP-OCRv4_rec_infer/inference.pdmodel
+++ b/models/ch_PP-OCRv4_rec_infer/inference.pdmodel
--- a/models/ch_ppocr_mobile_v2.0_cls_infer/._inference.pdmodel
+++ b/models/ch_ppocr_mobile_v2.0_cls_infer/._inference.pdmodel
--- a/models/ch_ppocr_mobile_v2.0_cls_infer/inference.pdiparams
+++ b/models/ch_ppocr_mobile_v2.0_cls_infer/inference.pdiparams
--- a/models/ch_ppocr_mobile_v2.0_cls_infer/inference.pdiparams.info
+++ b/models/ch_ppocr_mobile_v2.0_cls_infer/inference.pdiparams.info
--- a/models/ch_ppocr_mobile_v2.0_cls_infer/inference.pdmodel
+++ b/models/ch_ppocr_mobile_v2.0_cls_infer/inference.pdmodel
--- a/ocr/init.py
+++ b/ocr/init.py
@ -0,0 +1,19 @@
+# -*- coding: utf-8 -*-
+"""
+OCR 模块
+提供 OCR 引擎和处理管道
+"""
+
+from .engine import OCREngine, TextBlock
+from .pipeline import OCRPipeline, OCRResult
+from .express_parser import ExpressParser, ExpressInfo, TextLine
+
+__all__ = [
+    "OCREngine",
+    "TextBlock",
+    "OCRPipeline",
+    "OCRResult",
+    "ExpressParser",
+    "ExpressInfo",
+    "TextLine"
+]
--- a/ocr/engine.py
+++ b/ocr/engine.py
@ -0,0 +1,220 @@
+# -*- coding: utf-8 -*-
+"""
+OCR 引擎模块
+封装 PaddleOCR，提供统一的 OCR 接口
+"""
+
+import os
+from pathlib import Path
+
+# 在导入 PaddleOCR 之前设置环境变量
+# 解决 Windows 中文用户名路径问题
+_PROJECT_ROOT = Path(__file__).parent.parent
+_MODELS_DIR = _PROJECT_ROOT / "models"
+_MODELS_DIR.mkdir(exist_ok=True)
+os.environ["PADDLEOCR_HOME"] = str(_MODELS_DIR)
+
+import numpy as np
+from typing import List, Optional, Any
+from dataclasses import dataclass
+from paddleocr import PaddleOCR
+
+from utils.config import OCRConfig
+
+
+@dataclass
+class TextBlock:
+    """
+    文本块数据结构
+    表示 OCR 识别出的单个文本区域
+
+    Attributes:
+        text: 识别出的文本内容
+        confidence: 置信度 (0.0 ~ 1.0)
+        bbox: 边界框，4 个点的坐标 [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
+        bbox_offset: ROI 偏移量，用于还原到原图坐标
+    """
+    text: str
+    confidence: float
+    bbox: List[List[float]]
+    bbox_offset: tuple = (0, 0)
+
+    @property
+    def bbox_with_offset(self) -> List[List[float]]:
+        """获取带偏移的边界框（还原到原图坐标）"""
+        offset_x, offset_y = self.bbox_offset
+        return [[p[0] + offset_x, p[1] + offset_y] for p in self.bbox]
+
+    @property
+    def center(self) -> tuple:
+        """获取文本块中心点"""
+        x_coords = [p[0] for p in self.bbox]
+        y_coords = [p[1] for p in self.bbox]
+        return (sum(x_coords) / 4, sum(y_coords) / 4)
+
+    @property
+    def width(self) -> float:
+        """获取文本块宽度"""
+        x_coords = [p[0] for p in self.bbox]
+        return max(x_coords) - min(x_coords)
+
+    @property
+    def height(self) -> float:
+        """获取文本块高度"""
+        y_coords = [p[1] for p in self.bbox]
+        return max(y_coords) - min(y_coords)
+
+    def to_dict(self) -> dict:
+        """转换为字典格式"""
+        return {
+            "text": self.text,
+            "confidence": self.confidence,
+            "bbox": self.bbox,
+            "bbox_with_offset": self.bbox_with_offset,
+            "center": self.center,
+            "width": self.width,
+            "height": self.height
+        }
+
+
+class OCREngine:
+    """
+    OCR 引擎类
+    封装 PaddleOCR，提供简洁的 OCR 调用接口
+    """
+
+    def __init__(self, config: OCRConfig):
+        """
+        初始化 OCR 引擎
+
+        Args:
+            config: OCR 配置
+        """
+        self._config = config
+        self._ocr: Optional[PaddleOCR] = None
+
+    def initialize(self) -> None:
+        """
+        初始化 PaddleOCR 实例
+        延迟初始化，避免在导入时加载模型
+        适配 PaddleOCR 2.x API
+        """
+        if self._ocr is not None:
+            return
+
+        # 构建参数
+        params = {
+            "lang": self._config.lang,
+            "use_angle_cls": self._config.use_angle_cls,
+            "use_gpu": self._config.use_gpu,
+            "det_db_thresh": self._config.det_db_thresh,
+            "det_db_box_thresh": self._config.det_db_box_thresh,
+            "drop_score": self._config.drop_score,
+            "show_log": self._config.show_log
+        }
+
+        # 如果指定了模型目录，则使用自定义路径（解决中文路径问题）
+        if self._config.det_model_dir:
+            params["det_model_dir"] = self._config.det_model_dir
+        if self._config.rec_model_dir:
+            params["rec_model_dir"] = self._config.rec_model_dir
+        if self._config.cls_model_dir:
+            params["cls_model_dir"] = self._config.cls_model_dir
+
+        # PaddleOCR 2.x API
+        self._ocr = PaddleOCR(**params)
+
+    def recognize(
+        self,
+        image: np.ndarray,
+        roi_offset: tuple = (0, 0)
+    ) -> List[TextBlock]:
+        """
+        对图像进行 OCR 识别
+
+        Args:
+            image: 输入图像 (numpy array, BGR 或灰度图)
+            roi_offset: ROI 偏移量 (x, y)，用于还原坐标
+
+        Returns:
+            识别结果列表
+        """
+        # 确保引擎已初始化
+        if self._ocr is None:
+            self.initialize()
+
+        # 执行 OCR (PaddleOCR 2.x API)
+        result = self._ocr.ocr(image, cls=self._config.use_angle_cls)
+
+        # 解析结果
+        text_blocks: List[TextBlock] = []
+
+        # PaddleOCR 返回格式: [[line1, line2, ...]] 或 None
+        if result is None or len(result) == 0:
+            return text_blocks
+
+        # 遍历每一行结果
+        for line in result:
+            if line is None:
+                continue
+            for item in line:
+                if item is None or len(item) < 2:
+                    continue
+
+                bbox = item[0]  # [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
+                text_info = item[1]  # (text, confidence)
+
+                if len(text_info) < 2:
+                    continue
+
+                text = text_info[0]
+                confidence = float(text_info[1])
+
+                # 过滤低置信度结果
+                if confidence < self._config.drop_score:
+                    continue
+
+                text_block = TextBlock(
+                    text=text,
+                    confidence=confidence,
+                    bbox=bbox,
+                    bbox_offset=roi_offset
+                )
+                text_blocks.append(text_block)
+
+        return text_blocks
+
+    def recognize_batch(
+        self,
+        images: List[np.ndarray]
+    ) -> List[List[TextBlock]]:
+        """
+        批量 OCR 识别
+
+        Args:
+            images: 输入图像列表
+
+        Returns:
+            每张图像的识别结果列表
+        """
+        return [self.recognize(img) for img in images]
+
+    @property
+    def config(self) -> OCRConfig:
+        """获取当前配置"""
+        return self._config
+
+    def update_config(self, **kwargs) -> None:
+        """
+        更新配置并重新初始化引擎
+
+        Args:
+            **kwargs: 要更新的配置项
+        """
+        for key, value in kwargs.items():
+            if hasattr(self._config, key):
+                setattr(self._config, key, value)
+
+        # 重新初始化
+        self._ocr = None
+        self.initialize()
--- a/ocr/express_parser.py
+++ b/ocr/express_parser.py
@ -0,0 +1,404 @@
+# -*- coding: utf-8 -*-
+"""
+快递单解析模块
+将分散的 OCR 文本块合并并解析成结构化的快递单信息
+"""
+
+import re
+from dataclasses import dataclass, field
+from typing import List, Optional, Dict, Any
+from .engine import TextBlock
+
+
+@dataclass
+class ExpressInfo:
+    """
+    快递单结构化信息
+
+    Attributes:
+        tracking_number: 运单号
+        sender_name: 寄件人姓名
+        sender_phone: 寄件人电话
+        sender_address: 寄件人地址
+        receiver_name: 收件人姓名
+        receiver_phone: 收件人电话
+        receiver_address: 收件人地址
+        courier_company: 快递公司
+        raw_text: 原始合并文本（用于调试）
+        confidence: 平均置信度
+        extra_fields: 其他识别到的字段
+    """
+    tracking_number: Optional[str] = None
+    sender_name: Optional[str] = None
+    sender_phone: Optional[str] = None
+    sender_address: Optional[str] = None
+    receiver_name: Optional[str] = None
+    receiver_phone: Optional[str] = None
+    receiver_address: Optional[str] = None
+    courier_company: Optional[str] = None
+    raw_text: str = ""
+    confidence: float = 0.0
+    extra_fields: Dict[str, str] = field(default_factory=dict)
+
+    def to_dict(self) -> Dict[str, Any]:
+        """转换为字典格式"""
+        return {
+            "tracking_number": self.tracking_number,
+            "sender": {
+                "name": self.sender_name,
+                "phone": self.sender_phone,
+                "address": self.sender_address
+            },
+            "receiver": {
+                "name": self.receiver_name,
+                "phone": self.receiver_phone,
+                "address": self.receiver_address
+            },
+            "courier_company": self.courier_company,
+            "confidence": self.confidence,
+            "extra_fields": self.extra_fields,
+            "raw_text": self.raw_text
+        }
+
+    @property
+    def is_valid(self) -> bool:
+        """检查是否包含有效的快递单信息"""
+        # 至少需要运单号或收件人信息
+        return bool(self.tracking_number or self.receiver_name or self.receiver_phone)
+
+
+@dataclass
+class TextLine:
+    """
+    合并后的文本行
+
+    Attributes:
+        text: 合并后的文本
+        blocks: 原始文本块列表
+        y_center: 行中心 Y 坐标
+        x_min: 行起始 X 坐标
+    """
+    text: str
+    blocks: List[TextBlock]
+    y_center: float
+    x_min: float
+
+    @property
+    def confidence(self) -> float:
+        """计算平均置信度"""
+        if not self.blocks:
+            return 0.0
+        return sum(b.confidence for b in self.blocks) / len(self.blocks)
+
+
+class ExpressParser:
+    """
+    快递单解析器
+    将分散的文本块合并成行，并提取结构化信息
+    """
+
+    # 快递公司关键词
+    COURIER_KEYWORDS = {
+        "顺丰": "顺丰速运",
+        "SF": "顺丰速运",
+        "圆通": "圆通速递",
+        "中通": "中通快递",
+        "韵达": "韵达快递",
+        "申通": "申通快递",
+        "极兔": "极兔速递",
+        "京东": "京东物流",
+        "JD": "京东物流",
+        "邮政": "中国邮政",
+        "EMS": "中国邮政EMS",
+        "百世": "百世快递",
+        "德邦": "德邦快递",
+        "天天": "天天快递",
+        "宅急送": "宅急送",
+    }
+
+    # 字段关键词模式
+    FIELD_PATTERNS = {
+        "tracking_number": [
+            r"运单号[：:]\s*(\w+)",
+            r"单号[：:]\s*(\w+)",
+            r"快递单号[：:]\s*(\w+)",
+            r"物流单号[：:]\s*(\w+)",
+            r"^(\d{10,20})$",  # 纯数字运单号
+            r"^([A-Z]{2}\d{9,13}[A-Z]{2})$",  # 国际快递单号格式
+        ],
+        "receiver_name": [
+            r"收件人[：:]\s*(.+?)(?:\s|电话|手机|地址|$)",
+            r"收货人[：:]\s*(.+?)(?:\s|电话|手机|地址|$)",
+            r"收[：:]\s*(.+?)(?:\s|电话|手机|地址|$)",
+        ],
+        "receiver_phone": [
+            r"收件人.*?电话[：:]\s*(\d{11})",
+            r"收件人.*?手机[：:]\s*(\d{11})",
+            r"收.*?(\d{11})",
+            r"电话[：:]\s*(\d{11})",
+            r"手机[：:]\s*(\d{11})",
+            r"(?<![0-9])(\d{11})(?![0-9])",  # 独立的11位手机号
+        ],
+        "receiver_address": [
+            r"收件地址[：:]\s*(.+?)(?:寄件|发件|$)",
+            r"收货地址[：:]\s*(.+?)(?:寄件|发件|$)",
+            r"地址[：:]\s*(.+?)(?:寄件|发件|电话|$)",
+        ],
+        "sender_name": [
+            r"寄件人[：:]\s*(.+?)(?:\s|电话|手机|地址|$)",
+            r"发件人[：:]\s*(.+?)(?:\s|电话|手机|地址|$)",
+            r"寄[：:]\s*(.+?)(?:\s|电话|手机|地址|$)",
+        ],
+        "sender_phone": [
+            r"寄件人.*?电话[：:]\s*(\d{11})",
+            r"寄件人.*?手机[：:]\s*(\d{11})",
+        ],
+        "sender_address": [
+            r"寄件地址[：:]\s*(.+?)(?:收件|$)",
+            r"发件地址[：:]\s*(.+?)(?:收件|$)",
+        ],
+    }
+
+    def __init__(
+        self,
+        line_merge_threshold: float = 0.6,
+        horizontal_gap_threshold: float = 2.0
+    ):
+        """
+        初始化解析器
+
+        Args:
+            line_merge_threshold: 行合并阈值（相对于文本高度的比例）
+            horizontal_gap_threshold: 水平间距阈值（相对于平均字符宽度的比例）
+        """
+        self._line_merge_threshold = line_merge_threshold
+        self._horizontal_gap_threshold = horizontal_gap_threshold
+
+    def parse(self, text_blocks: List[TextBlock]) -> ExpressInfo:
+        """
+        解析文本块列表，提取快递单信息
+
+        Args:
+            text_blocks: OCR 识别的文本块列表
+
+        Returns:
+            结构化的快递单信息
+        """
+        if not text_blocks:
+            return ExpressInfo()
+
+        # 1. 合并文本块为行
+        lines = self._merge_blocks_to_lines(text_blocks)
+
+        # 2. 生成完整文本（用于正则匹配）
+        full_text = self._lines_to_text(lines)
+
+        # 3. 提取结构化信息
+        info = self._extract_info(full_text, lines)
+
+        # 4. 计算平均置信度
+        info.confidence = sum(b.confidence for b in text_blocks) / len(text_blocks)
+        info.raw_text = full_text
+
+        return info
+
+    def _merge_blocks_to_lines(self, blocks: List[TextBlock]) -> List[TextLine]:
+        """
+        将文本块按位置合并为行
+
+        基于 Y 坐标将相近的文本块合并到同一行，
+        然后按 X 坐标排序合并文本
+        """
+        if not blocks:
+            return []
+
+        # 按 Y 坐标排序
+        sorted_blocks = sorted(blocks, key=lambda b: b.center[1])
+
+        lines: List[TextLine] = []
+        current_line_blocks: List[TextBlock] = [sorted_blocks[0]]
+        current_y = sorted_blocks[0].center[1]
+
+        for block in sorted_blocks[1:]:
+            block_y = block.center[1]
+            block_height = block.height
+
+            # 判断是否属于同一行（Y 坐标差值小于阈值）
+            threshold = block_height * self._line_merge_threshold
+            if abs(block_y - current_y) <= threshold:
+                current_line_blocks.append(block)
+            else:
+                # 完成当前行，开始新行
+                line = self._create_line(current_line_blocks)
+                lines.append(line)
+                current_line_blocks = [block]
+                current_y = block_y
+
+        # 处理最后一行
+        if current_line_blocks:
+            line = self._create_line(current_line_blocks)
+            lines.append(line)
+
+        return lines
+
+    def _create_line(self, blocks: List[TextBlock]) -> TextLine:
+        """
+        从文本块列表创建文本行
+
+        按 X 坐标排序，根据间距决定是否添加空格
+        """
+        # 按 X 坐标排序
+        sorted_blocks = sorted(blocks, key=lambda b: b.center[0])
+
+        # 合并文本
+        text_parts = []
+        prev_block = None
+
+        for block in sorted_blocks:
+            if prev_block is not None:
+                # 计算水平间距
+                prev_right = max(p[0] for p in prev_block.bbox)
+                curr_left = min(p[0] for p in block.bbox)
+                gap = curr_left - prev_right
+
+                # 计算平均字符宽度
+                avg_char_width = prev_block.width / max(len(prev_block.text), 1)
+
+                # 如果间距较大，添加空格
+                if gap > avg_char_width * self._horizontal_gap_threshold:
+                    text_parts.append(" ")
+
+            text_parts.append(block.text)
+            prev_block = block
+
+        merged_text = "".join(text_parts)
+        y_center = sum(b.center[1] for b in sorted_blocks) / len(sorted_blocks)
+        x_min = min(min(p[0] for p in b.bbox) for b in sorted_blocks)
+
+        return TextLine(
+            text=merged_text,
+            blocks=sorted_blocks,
+            y_center=y_center,
+            x_min=x_min
+        )
+
+    def _lines_to_text(self, lines: List[TextLine]) -> str:
+        """将文本行列表转换为完整文本"""
+        return "\n".join(line.text for line in lines)
+
+    def _extract_info(self, full_text: str, lines: List[TextLine]) -> ExpressInfo:
+        """
+        从文本中提取快递单信息
+
+        Args:
+            full_text: 完整文本
+            lines: 文本行列表
+
+        Returns:
+            结构化的快递单信息
+        """
+        info = ExpressInfo()
+
+        # 提取快递公司
+        info.courier_company = self._extract_courier_company(full_text)
+
+        # 提取各字段
+        for field_name, patterns in self.FIELD_PATTERNS.items():
+            value = self._extract_field(full_text, patterns)
+            if value:
+                setattr(info, field_name, value)
+
+        # 尝试从上下文推断地址
+        if not info.receiver_address:
+            info.receiver_address = self._extract_address_from_context(lines, "收")
+
+        if not info.sender_address:
+            info.sender_address = self._extract_address_from_context(lines, "寄")
+
+        return info
+
+    def _extract_courier_company(self, text: str) -> Optional[str]:
+        """提取快递公司名称"""
+        text_upper = text.upper()
+        for keyword, company in self.COURIER_KEYWORDS.items():
+            if keyword.upper() in text_upper:
+                return company
+        return None
+
+    def _extract_field(self, text: str, patterns: List[str]) -> Optional[str]:
+        """
+        使用正则表达式列表提取字段值
+
+        Args:
+            text: 待匹配文本
+            patterns: 正则表达式列表
+
+        Returns:
+            匹配到的字段值，或 None
+        """
+        for pattern in patterns:
+            match = re.search(pattern, text, re.MULTILINE | re.IGNORECASE)
+            if match:
+                value = match.group(1).strip()
+                # 清理常见的干扰字符
+                value = re.sub(r'[【】\[\]()（）]', '', value)
+                if value:
+                    return value
+        return None
+
+    def _extract_address_from_context(
+        self,
+        lines: List[TextLine],
+        context_keyword: str
+    ) -> Optional[str]:
+        """
+        从上下文中提取地址
+
+        查找包含省/市/区/县/街/路等关键词的行
+        """
+        address_keywords = ["省", "市", "区", "县", "镇", "村", "街", "路", "号", "栋", "楼", "室"]
+
+        # 查找包含上下文关键词的行索引
+        context_line_idx = -1
+        for i, line in enumerate(lines):
+            if context_keyword in line.text:
+                context_line_idx = i
+                break
+
+        # 在上下文行附近查找地址
+        search_range = range(
+            max(0, context_line_idx),
+            min(len(lines), context_line_idx + 3 if context_line_idx >= 0 else len(lines))
+        )
+
+        address_parts = []
+        for i in search_range:
+            line_text = lines[i].text
+            # 检查是否包含地址关键词
+            if any(kw in line_text for kw in address_keywords):
+                # 清理行首的标签（如 "地址："）
+                cleaned = re.sub(r'^[^：:]*[：:]\s*', '', line_text)
+                if cleaned and cleaned != line_text:
+                    address_parts.append(cleaned)
+                elif any(kw in line_text for kw in address_keywords[:4]):  # 省/市/区/县
+                    address_parts.append(line_text)
+
+        if address_parts:
+            return "".join(address_parts)
+
+        return None
+
+    def merge_text_blocks(self, text_blocks: List[TextBlock]) -> str:
+        """
+        仅合并文本块，不进行字段提取
+
+        用于获取完整的合并文本
+
+        Args:
+            text_blocks: 文本块列表
+
+        Returns:
+            合并后的完整文本
+        """
+        lines = self._merge_blocks_to_lines(text_blocks)
+        return self._lines_to_text(lines)
--- a/ocr/pipeline.py
+++ b/ocr/pipeline.py
@ -0,0 +1,303 @@
+# -*- coding: utf-8 -*-
+"""
+OCR 处理管道模块
+提供图片 OCR 识别和结果解析的完整处理流程
+"""
+
+import time
+import numpy as np
+from typing import List, Optional, Dict, Any, Callable, TYPE_CHECKING
+from dataclasses import dataclass, field
+
+from ocr.engine import OCREngine, TextBlock
+from utils.config import PipelineConfig, OCRConfig
+
+if TYPE_CHECKING:
+    from ocr.express_parser import ExpressInfo, ExpressParser
+
+
+@dataclass
+class OCRResult:
+    """
+    OCR 处理结果数据结构
+
+    Attributes:
+        image_index: 图片索引（批量处理时使用）
+        image_path: 图片路径
+        timestamp: 处理时间戳
+        processing_time_ms: 处理耗时（毫秒）
+        text_blocks: 识别出的文本块列表
+        roi_applied: 是否应用了 ROI 裁剪
+        roi_rect: ROI 矩形 (x, y, w, h)，如果应用了 ROI
+    """
+    image_index: int
+    image_path: Optional[str]
+    timestamp: float
+    processing_time_ms: float
+    text_blocks: List[TextBlock]
+    roi_applied: bool = False
+    roi_rect: Optional[tuple] = None
+
+    @property
+    def text_count(self) -> int:
+        """识别出的文本数量"""
+        return len(self.text_blocks)
+
+    @property
+    def all_texts(self) -> List[str]:
+        """获取所有识别出的文本"""
+        return [block.text for block in self.text_blocks]
+
+    @property
+    def full_text(self) -> str:
+        """获取所有文本拼接结果"""
+        return "\n".join(self.all_texts)
+
+    @property
+    def average_confidence(self) -> float:
+        """获取平均置信度"""
+        if not self.text_blocks:
+            return 0.0
+        return sum(b.confidence for b in self.text_blocks) / len(self.text_blocks)
+
+    def to_dict(self) -> Dict[str, Any]:
+        """转换为字典格式，便于 JSON 序列化"""
+        return {
+            "image_index": self.image_index,
+            "image_path": self.image_path,
+            "timestamp": self.timestamp,
+            "processing_time_ms": self.processing_time_ms,
+            "text_count": self.text_count,
+            "average_confidence": self.average_confidence,
+            "roi_applied": self.roi_applied,
+            "roi_rect": self.roi_rect,
+            "text_blocks": [block.to_dict() for block in self.text_blocks]
+        }
+
+    def filter_by_confidence(self, min_confidence: float) -> "OCRResult":
+        """
+        按置信度过滤结果
+
+        Args:
+            min_confidence: 最小置信度阈值
+
+        Returns:
+            过滤后的 OCRResult
+        """
+        filtered_blocks = [
+            block for block in self.text_blocks
+            if block.confidence >= min_confidence
+        ]
+        return OCRResult(
+            image_index=self.image_index,
+            image_path=self.image_path,
+            timestamp=self.timestamp,
+            processing_time_ms=self.processing_time_ms,
+            text_blocks=filtered_blocks,
+            roi_applied=self.roi_applied,
+            roi_rect=self.roi_rect
+        )
+
+    def parse_express(self) -> "ExpressInfo":
+        """
+        解析快递单信息
+
+        将分散的文本块合并并提取结构化的快递单信息
+
+        Returns:
+            结构化的快递单信息
+        """
+        from ocr.express_parser import ExpressParser
+        parser = ExpressParser()
+        return parser.parse(self.text_blocks)
+
+    def merge_text(self) -> str:
+        """
+        合并文本块为完整文本
+
+        基于位置信息智能合并，同一行的文本会被合并
+
+        Returns:
+            合并后的完整文本
+        """
+        from ocr.express_parser import ExpressParser
+        parser = ExpressParser()
+        return parser.merge_text_blocks(self.text_blocks)
+
+
+class OCRPipeline:
+    """
+    OCR 处理管道
+    负责 ROI 裁剪、OCR 调用、结果封装
+    """
+
+    def __init__(
+        self,
+        ocr_config: OCRConfig,
+        pipeline_config: Optional[PipelineConfig] = None
+    ):
+        """
+        初始化 OCR 管道
+
+        Args:
+            ocr_config: OCR 引擎配置
+            pipeline_config: 管道配置（可选）
+        """
+        self._ocr_config = ocr_config
+        self._pipeline_config = pipeline_config or PipelineConfig()
+        self._engine = OCREngine(ocr_config)
+        self._image_counter: int = 0
+
+        # 预留扩展点：图片预处理回调
+        self._image_preprocessors: List[Callable[[np.ndarray], np.ndarray]] = []
+
+        # 预留扩展点：结果后处理回调
+        self._result_postprocessors: List[Callable[[OCRResult], OCRResult]] = []
+
+    def initialize(self) -> None:
+        """初始化管道（预加载 OCR 模型）"""
+        self._engine.initialize()
+
+    def add_preprocessor(
+        self,
+        preprocessor: Callable[[np.ndarray], np.ndarray]
+    ) -> None:
+        """
+        添加图片预处理器
+
+        Args:
+            preprocessor: 预处理函数，接收图像返回处理后的图像
+        """
+        self._image_preprocessors.append(preprocessor)
+
+    def add_postprocessor(
+        self,
+        postprocessor: Callable[[OCRResult], OCRResult]
+    ) -> None:
+        """
+        添加结果后处理器
+
+        Args:
+            postprocessor: 后处理函数，接收 OCRResult 返回处理后的结果
+        """
+        self._result_postprocessors.append(postprocessor)
+
+    def _apply_roi(
+        self,
+        image: np.ndarray
+    ) -> tuple:
+        """
+        应用 ROI 裁剪
+
+        Args:
+            image: 原始图片
+
+        Returns:
+            (裁剪后的图像, ROI 偏移量, ROI 矩形)
+        """
+        roi_config = self._pipeline_config.roi
+
+        if not roi_config.enabled:
+            return image, (0, 0), None
+
+        h, w = image.shape[:2]
+        x, y, roi_w, roi_h = roi_config.get_roi_rect(w, h)
+
+        # 边界检查
+        x = max(0, min(x, w - 1))
+        y = max(0, min(y, h - 1))
+        roi_w = min(roi_w, w - x)
+        roi_h = min(roi_h, h - y)
+
+        cropped = image[y:y + roi_h, x:x + roi_w]
+        return cropped, (x, y), (x, y, roi_w, roi_h)
+
+    def _preprocess_image(self, image: np.ndarray) -> np.ndarray:
+        """
+        执行图片预处理
+
+        Args:
+            image: 原始图片
+
+        Returns:
+            预处理后的图片
+        """
+        processed = image
+        for preprocessor in self._image_preprocessors:
+            processed = preprocessor(processed)
+        return processed
+
+    def _postprocess_result(self, result: OCRResult) -> OCRResult:
+        """
+        执行结果后处理
+
+        Args:
+            result: 原始结果
+
+        Returns:
+            后处理后的结果
+        """
+        processed = result
+        for postprocessor in self._result_postprocessors:
+            processed = postprocessor(processed)
+        return processed
+
+    def process(
+        self,
+        image: np.ndarray,
+        image_path: Optional[str] = None
+    ) -> OCRResult:
+        """
+        处理单张图片
+
+        Args:
+            image: 输入图片 (numpy array, BGR 格式)
+            image_path: 图片路径（可选，用于结果记录）
+
+        Returns:
+            OCR 结果
+        """
+        self._image_counter += 1
+        start_time = time.time()
+
+        # 应用 ROI 裁剪
+        cropped_image, roi_offset, roi_rect = self._apply_roi(image)
+
+        # 图片预处理
+        processed_image = self._preprocess_image(cropped_image)
+
+        # 执行 OCR
+        text_blocks = self._engine.recognize(processed_image, roi_offset)
+
+        # 计算处理耗时
+        processing_time_ms = (time.time() - start_time) * 1000
+
+        # 构建结果
+        result = OCRResult(
+            image_index=self._image_counter,
+            image_path=image_path,
+            timestamp=time.time(),
+            processing_time_ms=processing_time_ms,
+            text_blocks=text_blocks,
+            roi_applied=self._pipeline_config.roi.enabled,
+            roi_rect=roi_rect
+        )
+
+        # 结果后处理
+        result = self._postprocess_result(result)
+
+        return result
+
+    def reset_counter(self) -> None:
+        """重置图片计数器"""
+        self._image_counter = 0
+
+    @property
+    def image_counter(self) -> int:
+        """获取已处理的图片计数"""
+        return self._image_counter
+
+    @property
+    def config(self) -> PipelineConfig:
+        """获取管道配置"""
+        return self._pipeline_config
--- a/requirements.txt
+++ b/requirements.txt
@ -0,0 +1,10 @@
+# Vision-OCR Dependencies
+
+# Core dependencies
+paddlepaddle>=2.5.0,<3.0.0
+paddleocr>=2.7.0,<3.0.0
+opencv-python>=4.8.0
+numpy>=1.24.0,<2.0
+
+# Optional dependencies (for Chinese font rendering)
+Pillow>=10.0.0
--- a/utils/init.py
+++ b/utils/init.py
@ -0,0 +1,27 @@
+# -*- coding: utf-8 -*-
+"""
+工具模块
+提供配置管理等通用功能
+"""
+
+from .config import (
+    Config,
+    OCRConfig,
+    InputConfig,
+    VisualizeConfig,
+    PipelineConfig,
+    ROIConfig,
+    OutputConfig,
+    InputMode
+)
+
+__all__ = [
+    "Config",
+    "OCRConfig",
+    "InputConfig",
+    "VisualizeConfig",
+    "PipelineConfig",
+    "ROIConfig",
+    "OutputConfig",
+    "InputMode"
+]
--- a/utils/config.py
+++ b/utils/config.py
@ -0,0 +1,242 @@
+# -*- coding: utf-8 -*-
+"""
+配置管理模块
+集中管理所有可配置参数，便于维护和扩展
+"""
+
+from dataclasses import dataclass, field
+from typing import Optional, Tuple, List
+from enum import Enum
+
+
+class InputMode(Enum):
+    """输入模式枚举"""
+    SINGLE = "single"      # 单张图片
+    BATCH = "batch"        # 多张图片列表
+    DIRECTORY = "directory"  # 目录批量
+
+
+@dataclass
+class InputConfig:
+    """
+    图片输入配置
+
+    Attributes:
+        mode: 输入模式
+        image_path: 单张图片路径
+        image_paths: 多张图片路径列表
+        directory: 图片目录路径
+        pattern: 文件匹配模式（如 "*.jpg"）
+        recursive: 是否递归搜索子目录
+    """
+    mode: InputMode = InputMode.SINGLE
+    image_path: Optional[str] = None
+    image_paths: Optional[List[str]] = None
+    directory: Optional[str] = None
+    pattern: Optional[str] = None
+    recursive: bool = False
+
+    def __post_init__(self):
+        """参数校验"""
+        if self.mode == InputMode.SINGLE and not self.image_path:
+            raise ValueError("单张图片模式下必须指定 image_path")
+        if self.mode == InputMode.BATCH and not self.image_paths:
+            raise ValueError("批量模式下必须指定 image_paths")
+        if self.mode == InputMode.DIRECTORY and not self.directory:
+            raise ValueError("目录模式下必须指定 directory")
+
+
+@dataclass
+class ROIConfig:
+    """
+    感兴趣区域(ROI)配置
+    使用归一化坐标 (0.0 ~ 1.0)，便于适配不同分辨率
+
+    Attributes:
+        enabled: 是否启用 ROI 裁剪
+        x_ratio: ROI 左上角 x 坐标比例
+        y_ratio: ROI 左上角 y 坐标比例
+        width_ratio: ROI 宽度比例
+        height_ratio: ROI 高度比例
+    """
+    enabled: bool = False
+    x_ratio: float = 0.1
+    y_ratio: float = 0.1
+    width_ratio: float = 0.8
+    height_ratio: float = 0.8
+
+    def get_roi_rect(self, frame_width: int, frame_height: int) -> Tuple[int, int, int, int]:
+        """
+        根据帧尺寸计算实际 ROI 矩形
+
+        Args:
+            frame_width: 帧宽度
+            frame_height: 帧高度
+
+        Returns:
+            (x, y, width, height) 像素坐标
+        """
+        x = int(frame_width * self.x_ratio)
+        y = int(frame_height * self.y_ratio)
+        width = int(frame_width * self.width_ratio)
+        height = int(frame_height * self.height_ratio)
+        return x, y, width, height
+
+
+@dataclass
+class OCRConfig:
+    """
+    OCR 引擎配置 (适配 PaddleOCR 2.x API)
+
+    Attributes:
+        lang: 识别语言，支持 "ch"(中文), "en"(英文) 等
+        use_angle_cls: 是否启用方向分类器
+        use_gpu: 是否使用 GPU 加速
+        det_db_thresh: 文本检测阈值
+        det_db_box_thresh: 检测框阈值
+        drop_score: 低于此置信度的结果将被过滤
+        show_log: 是否显示 PaddleOCR 日志
+        det_model_dir: 检测模型目录路径
+        rec_model_dir: 识别模型目录路径
+        cls_model_dir: 分类模型目录路径
+    """
+    lang: str = "ch"
+    use_angle_cls: bool = True
+    use_gpu: bool = False
+    det_db_thresh: float = 0.3
+    det_db_box_thresh: float = 0.5
+    drop_score: float = 0.5
+    show_log: bool = False
+    det_model_dir: Optional[str] = None
+    rec_model_dir: Optional[str] = None
+    cls_model_dir: Optional[str] = None
+
+
+@dataclass
+class PipelineConfig:
+    """
+    OCR 处理管道配置
+
+    Attributes:
+        roi: ROI 配置
+    """
+    roi: ROIConfig = field(default_factory=ROIConfig)
+
+
+@dataclass
+class VisualizeConfig:
+    """
+    可视化配置
+
+    Attributes:
+        show_window: 是否显示可视化窗口
+        window_name: 窗口名称
+        box_color: 文本框颜色 (B, G, R)
+        box_thickness: 文本框线宽
+        text_color: 文字颜色 (B, G, R)
+        text_scale: 文字缩放比例
+        text_thickness: 文字线宽
+        show_confidence: 是否在文字旁显示置信度
+        font_path: 中文字体路径，None 则使用 OpenCV 默认字体
+    """
+    show_window: bool = False
+    window_name: str = "OCR Result"
+    box_color: Tuple[int, int, int] = (0, 255, 0)
+    box_thickness: int = 2
+    text_color: Tuple[int, int, int] = (0, 0, 255)
+    text_scale: float = 0.6
+    text_thickness: int = 1
+    show_confidence: bool = True
+    font_path: Optional[str] = None
+
+
+@dataclass
+class OutputConfig:
+    """
+    输出配置
+
+    Attributes:
+        output_dir: 输出目录路径
+        save_json: 是否保存 JSON 结果
+        save_image: 是否保存标注后的图片
+        json_filename: JSON 文件名模板
+        image_suffix: 标注图片后缀
+    """
+    output_dir: Optional[str] = None
+    save_json: bool = True
+    save_image: bool = False
+    json_filename: str = "ocr_result.json"
+    image_suffix: str = "_ocr"
+
+
+@dataclass
+class Config:
+    """
+    全局配置类，聚合所有配置模块
+
+    Attributes:
+        input: 输入配置
+        ocr: OCR 引擎配置
+        pipeline: 处理管道配置
+        visualize: 可视化配置
+        output: 输出配置
+    """
+    input: Optional[InputConfig] = None
+    ocr: OCRConfig = field(default_factory=OCRConfig)
+    pipeline: PipelineConfig = field(default_factory=PipelineConfig)
+    visualize: VisualizeConfig = field(default_factory=VisualizeConfig)
+    output: OutputConfig = field(default_factory=OutputConfig)
+
+    @classmethod
+    def default(cls) -> "Config":
+        """创建默认配置"""
+        return cls()
+
+    @classmethod
+    def for_single_image(cls, image_path: str) -> "Config":
+        """
+        创建单张图片模式的配置
+
+        Args:
+            image_path: 图片路径
+        """
+        config = cls()
+        config.input = InputConfig(
+            mode=InputMode.SINGLE,
+            image_path=image_path
+        )
+        return config
+
+    @classmethod
+    def for_directory(cls, directory: str, pattern: Optional[str] = None, recursive: bool = False) -> "Config":
+        """
+        创建目录批量模式的配置
+
+        Args:
+            directory: 目录路径
+            pattern: 文件匹配模式
+            recursive: 是否递归搜索
+        """
+        config = cls()
+        config.input = InputConfig(
+            mode=InputMode.DIRECTORY,
+            directory=directory,
+            pattern=pattern,
+            recursive=recursive
+        )
+        return config
+
+    @classmethod
+    def for_batch(cls, image_paths: List[str]) -> "Config":
+        """
+        创建批量图片模式的配置
+
+        Args:
+            image_paths: 图片路径列表
+        """
+        config = cls()
+        config.input = InputConfig(
+            mode=InputMode.BATCH,
+            image_paths=image_paths
+        )
+        return config
--- a/visualize/init.py
+++ b/visualize/init.py
@ -0,0 +1,9 @@
+# -*- coding: utf-8 -*-
+"""
+可视化模块
+提供 OCR 结果的可视化绘制功能
+"""
+
+from .draw import OCRVisualizer
+
+__all__ = ["OCRVisualizer"]
--- a/visualize/draw.py
+++ b/visualize/draw.py
@ -0,0 +1,368 @@
+# -*- coding: utf-8 -*-
+"""
+可视化模块
+在图像上绘制 OCR 识别结果
+"""
+
+import os
+import sys
+import cv2
+import numpy as np
+from typing import List, Optional, Tuple
+
+from ocr.engine import TextBlock
+from ocr.pipeline import OCRResult
+from utils.config import VisualizeConfig
+
+
+# Windows 系统常用中文字体列表（按优先级排序）
+_WINDOWS_CHINESE_FONTS = [
+    "msyh.ttc",      # 微软雅黑
+    "msyhbd.ttc",    # 微软雅黑粗体
+    "simhei.ttf",    # 黑体
+    "simsun.ttc",    # 宋体
+    "simkai.ttf",    # 楷体
+]
+
+# Linux 系统常用中文字体路径
+_LINUX_CHINESE_FONTS = [
+    "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",
+    "/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc",
+    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",
+    "/usr/share/fonts/truetype/droid/DroidSansFallbackFull.ttf",
+]
+
+
+def _find_system_chinese_font() -> Optional[str]:
+    """
+    自动查找系统中文字体
+
+    Returns:
+        字体文件路径，未找到返回 None
+    """
+    if sys.platform == "win32":
+        # Windows 字体目录
+        fonts_dir = os.path.join(os.environ.get("WINDIR", "C:\\Windows"), "Fonts")
+        for font_name in _WINDOWS_CHINESE_FONTS:
+            font_path = os.path.join(fonts_dir, font_name)
+            if os.path.exists(font_path):
+                return font_path
+    else:
+        # Linux/macOS
+        for font_path in _LINUX_CHINESE_FONTS:
+            if os.path.exists(font_path):
+                return font_path
+
+    return None
+
+
+class OCRVisualizer:
+    """
+    OCR 结果可视化器
+    在图像上绘制文本框和识别结果
+    """
+
+    def __init__(self, config: VisualizeConfig):
+        """
+        初始化可视化器
+
+        Args:
+            config: 可视化配置
+        """
+        self._config = config
+        self._font = cv2.FONT_HERSHEY_SIMPLEX
+
+        # 尝试加载中文字体
+        self._pil_font = None
+        self._use_pil = False
+
+        # 确定字体路径：优先使用配置的路径，否则自动检测系统字体
+        font_path = config.font_path or _find_system_chinese_font()
+
+        if font_path:
+            try:
+                from PIL import ImageFont
+                self._pil_font = ImageFont.truetype(font_path, 20)
+                self._use_pil = True
+            except Exception:
+                # 字体加载失败，使用 OpenCV 默认字体
+                pass
+
+    def draw_text_blocks(
+        self,
+        frame: np.ndarray,
+        text_blocks: List[TextBlock],
+        copy: bool = True
+    ) -> np.ndarray:
+        """
+        在帧上绘制文本块
+
+        Args:
+            frame: 输入帧
+            text_blocks: 文本块列表
+            copy: 是否复制帧（避免修改原帧）
+
+        Returns:
+            绘制后的帧
+        """
+        if copy:
+            frame = frame.copy()
+
+        for block in text_blocks:
+            self._draw_single_block(frame, block)
+
+        return frame
+
+    def draw_result(
+        self,
+        frame: np.ndarray,
+        result: Optional[OCRResult],
+        copy: bool = True
+    ) -> np.ndarray:
+        """
+        在帧上绘制 OCR 结果
+
+        Args:
+            frame: 输入帧
+            result: OCR 结果
+            copy: 是否复制帧
+
+        Returns:
+            绘制后的帧
+        """
+        if copy:
+            frame = frame.copy()
+
+        if result is None:
+            return frame
+
+        # 绘制 ROI 区域（如果启用）
+        if result.roi_applied and result.roi_rect:
+            self._draw_roi(frame, result.roi_rect)
+
+        # 绘制所有文本块
+        for block in result.text_blocks:
+            self._draw_single_block(frame, block)
+
+        # 绘制状态信息
+        self._draw_status(frame, result)
+
+        return frame
+
+    def _draw_single_block(
+        self,
+        frame: np.ndarray,
+        block: TextBlock
+    ) -> None:
+        """
+        绘制单个文本块
+
+        Args:
+            frame: 帧
+            block: 文本块
+        """
+        # 获取带偏移的边界框坐标
+        bbox = block.bbox_with_offset
+        points = np.array(bbox, dtype=np.int32)
+
+        # 绘制多边形边框
+        cv2.polylines(
+            frame,
+            [points],
+            isClosed=True,
+            color=self._config.box_color,
+            thickness=self._config.box_thickness
+        )
+
+        # 准备显示文本
+        display_text = block.text
+        if self._config.show_confidence:
+            display_text = f"{block.text} ({block.confidence:.2f})"
+
+        # 计算文本位置（在边界框左上角上方）
+        text_x = int(min(p[0] for p in bbox))
+        text_y = int(min(p[1] for p in bbox)) - 5
+
+        # 确保文本不超出画面
+        text_y = max(text_y, 20)
+
+        # 绘制文本
+        if self._use_pil and self._pil_font:
+            self._draw_text_pil(frame, display_text, (text_x, text_y))
+        else:
+            self._draw_text_cv2(frame, display_text, (text_x, text_y))
+
+    def _draw_text_cv2(
+        self,
+        frame: np.ndarray,
+        text: str,
+        position: Tuple[int, int]
+    ) -> None:
+        """
+        使用 OpenCV 绘制文本（不支持中文，会显示方块）
+
+        Args:
+            frame: 帧
+            text: 文本
+            position: 位置 (x, y)
+        """
+        # 绘制文本背景（提高可读性）
+        (text_width, text_height), baseline = cv2.getTextSize(
+            text,
+            self._font,
+            self._config.text_scale,
+            self._config.text_thickness
+        )
+
+        x, y = position
+        cv2.rectangle(
+            frame,
+            (x, y - text_height - 5),
+            (x + text_width + 5, y + 5),
+            (255, 255, 255),
+            -1
+        )
+
+        # 绘制文本
+        cv2.putText(
+            frame,
+            text,
+            position,
+            self._font,
+            self._config.text_scale,
+            self._config.text_color,
+            self._config.text_thickness,
+            cv2.LINE_AA
+        )
+
+    def _draw_text_pil(
+        self,
+        frame: np.ndarray,
+        text: str,
+        position: Tuple[int, int]
+    ) -> None:
+        """
+        使用 PIL 绘制文本（支持中文）
+
+        Args:
+            frame: 帧
+            text: 文本
+            position: 位置 (x, y)
+        """
+        from PIL import Image, ImageDraw
+
+        # OpenCV 图像转 PIL
+        pil_image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
+        draw = ImageDraw.Draw(pil_image)
+
+        # 获取文本尺寸
+        bbox = draw.textbbox(position, text, font=self._pil_font)
+        text_width = bbox[2] - bbox[0]
+        text_height = bbox[3] - bbox[1]
+
+        x, y = position
+
+        # 绘制背景
+        draw.rectangle(
+            [x - 2, y - text_height - 2, x + text_width + 2, y + 2],
+            fill=(255, 255, 255)
+        )
+
+        # 绘制文本
+        text_color_rgb = (
+            self._config.text_color[2],
+            self._config.text_color[1],
+            self._config.text_color[0]
+        )
+        draw.text(position, text, font=self._pil_font, fill=text_color_rgb)
+
+        # PIL 图像转回 OpenCV
+        result = cv2.cvtColor(np.array(pil_image), cv2.COLOR_RGB2BGR)
+        np.copyto(frame, result)
+
+    def _draw_roi(
+        self,
+        frame: np.ndarray,
+        roi_rect: Tuple[int, int, int, int]
+    ) -> None:
+        """
+        绘制 ROI 区域
+
+        Args:
+            frame: 帧
+            roi_rect: ROI 矩形 (x, y, width, height)
+        """
+        x, y, w, h = roi_rect
+        cv2.rectangle(
+            frame,
+            (x, y),
+            (x + w, y + h),
+            (255, 255, 0),  # 青色
+            2,
+            cv2.LINE_AA
+        )
+
+    def _draw_status(
+        self,
+        frame: np.ndarray,
+        result: OCRResult
+    ) -> None:
+        """
+        绘制状态信息
+
+        Args:
+            frame: 帧
+            result: OCR 结果
+        """
+        h, w = frame.shape[:2]
+
+        # 状态文本
+        status_lines = [
+            f"Image: {result.image_index}",
+            f"Texts: {result.text_count}",
+            f"Time: {result.processing_time_ms:.1f}ms"
+        ]
+
+        y_offset = 25
+        for line in status_lines:
+            cv2.putText(
+                frame,
+                line,
+                (10, y_offset),
+                self._font,
+                0.5,
+                (0, 255, 0),
+                1,
+                cv2.LINE_AA
+            )
+            y_offset += 20
+
+    def show(
+        self,
+        frame: np.ndarray,
+        wait_key: int = 1
+    ) -> int:
+        """
+        显示帧并等待按键
+
+        Args:
+            frame: 帧
+            wait_key: 等待时间（毫秒）
+
+        Returns:
+            按下的键码
+        """
+        if not self._config.show_window:
+            return -1
+
+        cv2.imshow(self._config.window_name, frame)
+        return cv2.waitKey(wait_key)
+
+    def close(self) -> None:
+        """关闭所有窗口"""
+        cv2.destroyAllWindows()
+
+    @property
+    def config(self) -> VisualizeConfig:
+        """获取配置"""
+        return self._config