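# Vision-OCR: Image OCR Recognition System

An image OCR system built on PaddleOCR. It accepts a single image, a batch of images, or a scanned directory; performs text detection, recognition, and orientation classification; and outputs structured recognition results.
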
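## Features

- **Multiple input modes**: single image, list of images, or batch directory scan
- **Full OCR capability**: text detection + text recognition + orientation classification
- **Structured output**: text content, confidence, and position (4-point coordinates)
- **Express waybill parsing**: automatically merges scattered text blocks and extracts structured fields such as the tracking number, sender, and receiver
- **Visualization**: draws text boxes and recognition results on the image
- **Result export**: JSON result export and annotated image saving
- **ROI cropping**: recognize only a specified region of the image
- **Modular design**: image loading is fully decoupled from OCR logic, which makes the system easy to extend
- **Fully local**: no dependency on any cloud service
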
## Project Structure

```
vision-ocr/
├── input/                  # Image input module
│   ├── __init__.py
│   └── loader.py           # Image loader
├── ocr/                    # OCR processing module
│   ├── __init__.py
│   ├── engine.py           # PaddleOCR engine wrapper
│   ├── pipeline.py         # OCR processing pipeline
│   └── express_parser.py   # Express waybill parser
├── visualize/              # Visualization module
│   ├── __init__.py
│   └── draw.py             # Result renderer
├── utils/                  # Utility module
│   ├── __init__.py
│   └── config.py           # Configuration management
├── models/                 # Model files (bundled; re-download with download_models.py if missing)
├── main.py                 # Main entry point
├── download_models.py      # Model download script
├── requirements.txt        # Dependency list
└── README.md
```

## Requirements

- Python 3.9+
- Supported operating systems: Windows / Linux / macOS

## Installation

### 1. Clone the project

```bash
git clone <repository-url>
cd vision-ocr
```

### 2. Create a virtual environment (recommended)

```bash
python -m venv venv

# Windows
venv\Scripts\activate

# Linux/macOS
source venv/bin/activate
```

### 3. Install dependencies

```bash
pip install -r requirements.txt
```

### 4. Models

The PaddleOCR model files are bundled with this project (in the `models/` directory), so the project works right after cloning with no extra download.

> **Fallback**: if the model files are missing or need updating, run `python download_models.py` to download them again.

#### Model details

This project uses the PP-OCRv4 model series from PaddleOCR. Three models work together:

| Model type | Model name | Role | Size |
|---------|---------|------|------|
| **det (detection)** | ch_PP-OCRv4_det_infer | Locates every text region in the image and outputs the 4-point bounding box of each text block | ~4.7MB |
| **rec (recognition)** | ch_PP-OCRv4_rec_infer | Converts each detected text region into actual text and outputs the text with its confidence | ~10MB |
| **cls (orientation classification)** | ch_ppocr_mobile_v2.0_cls_infer | Decides whether text is upright (0 degrees) or upside down (180 degrees) so that inverted text is corrected before recognition | ~1.4MB |

**OCR processing flow:**

```
input image -> [det detection] -> text regions -> [cls classification] -> orientation correction -> [rec recognition] -> text results
```

**Model download URLs:**

- Detection model: https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_det_infer.tar
- Recognition model: https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_infer.tar
- Orientation classification model: https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar

**Note:** The orientation classification model can be disabled with the `--no-angle-cls` flag. This suits scenarios where the text orientation is fixed and slightly improves processing speed.

### 5. GPU acceleration (optional)

To use GPU acceleration, install the PaddlePaddle build that matches your CUDA version (the commands below use the Linux wheel index):

```bash
# CUDA 11.8
pip install paddlepaddle-gpu==2.5.2.post118 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

# CUDA 12.0
pip install paddlepaddle-gpu==2.5.2.post120 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```

## Usage

### Basic usage

```bash
# Recognize a single image
python main.py --image path/to/image.jpg

# Recognize every image in a directory
python main.py --dir path/to/images/

# Recognize only images of a given format in a directory
python main.py --dir path/to/images/ --pattern "*.png"

# Search subdirectories recursively
python main.py --dir path/to/images/ --recursive
```

### Advanced options

```bash
# Enable GPU acceleration
python main.py --image test.jpg --gpu

# Enable ROI cropping (recognize only the central region, 60% of the width and height)
python main.py --image test.jpg --roi 0.2 0.2 0.6 0.6

# Adjust the confidence threshold (filter out low-confidence results)
python main.py --image test.jpg --drop-score 0.7

# Switch the recognition language
python main.py --image test.jpg --lang en  # English
python main.py --image test.jpg --lang ch  # Chinese (default)

# Disable orientation classification (slightly faster)
python main.py --image test.jpg --no-angle-cls

# Show the visualization window
python main.py --image test.jpg --show-window

# Save the annotated image
python main.py --image test.jpg --save-image

# Specify the output directory
python main.py --dir images/ --output-dir results/
```

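
The `--roi 0.2 0.2 0.6 0.6` example suggests that the four ROI values are fractions of the image width and height. Under that assumption (the actual conversion in `ocr/pipeline.py` is not shown here and may differ), the mapping to a pixel rectangle could be sketched as follows. `roi_to_pixels` is a hypothetical helper name, not part of the project.

```python
from typing import Tuple


def roi_to_pixels(
    roi: Tuple[float, float, float, float],
    image_width: int,
    image_height: int,
) -> Tuple[int, int, int, int]:
    """Convert a normalized (x, y, w, h) ROI into a pixel rectangle.

    Hypothetical helper: assumes all four values are fractions in [0, 1].
    """
    x, y, w, h = roi
    if not all(0.0 <= value <= 1.0 for value in roi):
        raise ValueError("ROI values must lie in [0, 1]")
    if w <= 0 or h <= 0:
        raise ValueError("ROI width and height must be positive")
    left = int(round(x * image_width))
    top = int(round(y * image_height))
    # Clamp the far edges so the rectangle never leaves the image.
    right = min(image_width, int(round((x + w) * image_width)))
    bottom = min(image_height, int(round((y + h) * image_height)))
    return left, top, right - left, bottom - top


# Central region of a 1000x500 image.
print(roi_to_pixels((0.2, 0.2, 0.6, 0.6), 1000, 500))  # (200, 100, 600, 300)
```

Note that a 60% by 60% window covers 36% of the image area, which is where most of the speed-up from ROI cropping would come from.
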
### Full parameter list

| Parameter | Short | Description | Default |
|------|------|------|--------|
| `--image` | `-i` | Path to a single image | - |
| `--dir` | `-d` | Path to an image directory | - |
| `--pattern` | `-p` | File matching pattern | - |
| `--recursive` | `-r` | Search subdirectories recursively | False |
| `--lang` | `-l` | OCR language | ch |
| `--gpu` | - | Enable GPU acceleration | False |
| `--no-angle-cls` | - | Disable orientation classification | False |
| `--drop-score` | - | Confidence threshold | 0.5 |
| `--roi` | - | ROI region (x y w h) | - |
| `--show-window` | - | Show the visualization window | False |
| `--no-confidence` | - | Do not display confidence values | False |
| `--output-dir` | `-o` | Output directory path | - |
| `--save-image` | - | Save the annotated image | False |
| `--no-json` | - | Do not save the JSON result | False |
| `--json-filename` | - | JSON result file name | ocr_result.json |
| `--express` | `-e` | Enable express waybill parsing mode | False |

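
For readers who want to see how these options fit together, here is a minimal `argparse` sketch that mirrors the table. It is an illustration only: the real definitions live in `main.py` and may differ. In particular, treating `--image` and `--dir` as mutually exclusive and `--roi` as four floats are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options table (illustrative sketch only)."""
    parser = argparse.ArgumentParser(prog="main.py", description="Vision-OCR")

    # Assumption: exactly one input source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--image", "-i", help="path to a single image")
    source.add_argument("--dir", "-d", help="path to an image directory")

    parser.add_argument("--pattern", "-p", help="file matching pattern")
    parser.add_argument("--recursive", "-r", action="store_true")
    parser.add_argument("--lang", "-l", default="ch")
    parser.add_argument("--gpu", action="store_true")
    parser.add_argument("--no-angle-cls", action="store_true")
    parser.add_argument("--drop-score", type=float, default=0.5)
    parser.add_argument("--roi", type=float, nargs=4,
                        metavar=("X", "Y", "W", "H"))
    parser.add_argument("--show-window", action="store_true")
    parser.add_argument("--no-confidence", action="store_true")
    parser.add_argument("--output-dir", "-o")
    parser.add_argument("--save-image", action="store_true")
    parser.add_argument("--no-json", action="store_true")
    parser.add_argument("--json-filename", default="ocr_result.json")
    parser.add_argument("--express", "-e", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--image", "test.jpg", "--roi", "0.2", "0.2", "0.6", "0.6"]
)
print(args.image, args.roi, args.drop_score)
```
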
### Runtime hotkeys

| Key | Action |
|------|------|
| `q` | Quit the program |
| Any other key | Process the next image |

### Result export

When processing finishes, the program automatically exports all recognition results to a JSON file:

- Default output file: `ocr_result.json`
- The file name can be set with `--json-filename`
- The output directory can be set with `--output-dir`

**Summary JSON format:**

```json
{
  "total_images": 10,
  "total_text_blocks": 45,
  "results": [
    {
      "image_index": 1,
      "image_path": "path/to/image.jpg",
      "timestamp": 1704355200.123,
      "processing_time_ms": 45.2,
      "text_count": 3,
      "average_confidence": 0.92,
      "roi_applied": false,
      "roi_rect": null,
      "text_blocks": [
        {
          "text": "recognized text",
          "confidence": 0.95,
          "bbox": [[100, 50], [200, 50], [200, 80], [100, 80]],
          "bbox_with_offset": [[100, 50], [200, 50], [200, 80], [100, 80]],
          "center": [150, 65],
          "width": 100,
          "height": 30
        }
      ]
    }
  ]
}
```

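
As a rough illustration of consuming this file downstream, the sketch below collects the text of every block at or above a confidence threshold. The function name `confident_texts` and the inline sample are made up for the example; real code would call `json.load()` on the exported file instead.

```python
import json
from typing import Any, Dict, List


def confident_texts(summary: Dict[str, Any], min_confidence: float = 0.8) -> List[str]:
    """Collect text from every block whose confidence reaches the threshold."""
    texts = []
    for image_result in summary.get("results", []):
        for block in image_result.get("text_blocks", []):
            if block["confidence"] >= min_confidence:
                texts.append(block["text"])
    return texts


# Inline sample that follows the documented layout (values are invented).
sample = json.loads("""
{
  "total_images": 1,
  "total_text_blocks": 2,
  "results": [
    {
      "image_index": 1,
      "image_path": "path/to/image.jpg",
      "text_blocks": [
        {"text": "INVOICE", "confidence": 0.95},
        {"text": "blurry", "confidence": 0.41}
      ]
    }
  ]
}
""")

print(confident_texts(sample))  # ['INVOICE']
```
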
#### JSON field reference

**Top-level fields**

| Field | Type | Description |
|------|------|------|
| `total_images` | int | Total number of images processed |
| `total_text_blocks` | int | Total number of text blocks recognized across all images |
| `results` | array | Array of per-image recognition results |

**Per-image result fields**

| Field | Type | Description |
|------|------|------|
| `image_index` | int | Image index, counting up from 1 |
| `image_path` | string | Full path of the image file |
| `timestamp` | float | Unix timestamp (seconds) at which processing finished |
| `processing_time_ms` | float | OCR processing time (milliseconds) |
| `text_count` | int | Number of text blocks recognized in this image |
| `average_confidence` | float | Average confidence of all text blocks (0.0 ~ 1.0) |
| `roi_applied` | bool | Whether ROI cropping was applied |
| `roi_rect` | array\|null | ROI rectangle `[x, y, width, height]`; `null` when no ROI is applied |
| `text_blocks` | array | Array of recognized text blocks |

**Text block (text_blocks) fields**

| Field | Type | Description |
|------|------|------|
| `text` | string | Recognized text content |
| `confidence` | float | Recognition confidence (0.0 ~ 1.0); higher means a more reliable result |
| `bbox` | array | The 4 vertices of the text bounding box, `[[x1,y1], [x2,y2], [x3,y3], [x4,y4]]`, ordered top-left, top-right, bottom-right, bottom-left. When ROI is enabled, the coordinates are relative to the ROI region |
| `bbox_with_offset` | array | Bounding box with the ROI offset applied, mapped back to the original image coordinate system. Same format as `bbox` |
| `center` | array | Center point of the text block, `[cx, cy]` |
| `width` | float | Text block width (pixels) |
| `height` | float | Text block height (pixels) |

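
The derived fields can be related to `bbox` with a few lines of arithmetic. The sketch below assumes that `center` is the mean of the four vertices, that `width` and `height` are the extents of the axis-aligned rectangle around them, and that `bbox_with_offset` adds the ROI's top-left corner to every vertex. These rules reproduce the sample values above, but the project's exact definitions are not shown here and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def describe_bbox(bbox: Sequence[Point]) -> Tuple[List[float], float, float]:
    """Return (center, width, height) for a 4-point bounding box."""
    xs = [point[0] for point in bbox]
    ys = [point[1] for point in bbox]
    center = [sum(xs) / len(xs), sum(ys) / len(ys)]
    return center, max(xs) - min(xs), max(ys) - min(ys)


def apply_offset(bbox: Sequence[Point], offset_x: float, offset_y: float) -> List[List[float]]:
    """Shift ROI-relative vertices back into original-image coordinates."""
    return [[x + offset_x, y + offset_y] for x, y in bbox]


bbox = [[100, 50], [200, 50], [200, 80], [100, 80]]
print(describe_bbox(bbox))         # ([150.0, 65.0], 100, 30)
print(apply_offset(bbox, 20, 10))  # vertices shifted by an ROI origin of (20, 10)
```
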
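## Express Waybill Parsing Mode

Pass `--express` to enable express waybill parsing mode. The system then automatically:

1. **Merges scattered text blocks**: text blocks on the same line are merged based on their positions
2. **Extracts structured information**: tracking number, courier company, and the sender's and receiver's name, phone, and address
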
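
The position-based merge in step 1 can be pictured with a small sketch: sort blocks by vertical center, start a new line whenever the gap to the previous block exceeds a tolerance, then join each line from left to right. This is one plausible approach under stated assumptions, not the project's actual algorithm; `merge_lines` and the `y_tolerance` parameter are hypothetical.

```python
from typing import Dict, List


def merge_lines(blocks: List[Dict], y_tolerance: float = 10.0) -> str:
    """Group blocks with nearby vertical centers, then join them left to right."""
    lines: List[List[Dict]] = []
    for block in sorted(blocks, key=lambda b: b["center"][1]):
        previous = lines[-1][-1] if lines else None
        if previous and abs(block["center"][1] - previous["center"][1]) <= y_tolerance:
            lines[-1].append(block)  # same visual line as the previous block
        else:
            lines.append([block])    # too far below: start a new line
    return "\n".join(
        " ".join(b["text"] for b in sorted(line, key=lambda b: b["center"][0]))
        for line in lines
    )


blocks = [
    {"text": "13900139000", "center": [260, 102]},
    {"text": "收件人:李四", "center": [80, 100]},
    {"text": "顺丰速运", "center": [90, 40]},
]
print(merge_lines(blocks))
```

A fixed pixel tolerance is the simplest choice; scaling it by the block height would likely cope better with mixed font sizes.
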
### Usage

```bash
# A single waybill image
python main.py --image express.jpg --express

# Batch-process waybill images
python main.py --dir express_images/ --express --output-dir results/
```

### Output format

JSON output format in express waybill mode:

```json
{
  "total_images": 5,
  "total_text_blocks": 50,
  "results": [
    {
      "image_index": 1,
      "image_path": "express.jpg",
      "processing_time_ms": 45.2,
      "express_info": {
        "tracking_number": "SF1234567890",
        "sender": {
          "name": "张三",
          "phone": "13800138000",
          "address": "北京市朝阳区xxx路"
        },
        "receiver": {
          "name": "李四",
          "phone": "13900139000",
          "address": "上海市浦东新区xxx路"
        },
        "courier_company": "顺丰速运",
        "confidence": 0.95,
        "extra_fields": {},
        "raw_text": "顺丰速运\n运单号:SF1234567890\n..."
      },
      "merged_text": "顺丰速运\n运单号:SF1234567890\n收件人:李四 13900139000\n..."
    }
  ]
}
```

#### JSON field reference (express waybill mode)

**Per-image result fields (express waybill mode)**

| Field | Type | Description |
|------|------|------|
| `image_index` | int | Image index, counting up from 1 |
| `image_path` | string | Full path of the image file |
| `processing_time_ms` | float | OCR processing time (milliseconds) |
| `express_info` | object | Structured information parsed from the waybill |
| `merged_text` | string | Full text after position-based merging; text blocks on the same line are merged |

**Waybill information (express_info) fields**

| Field | Type | Description |
|------|------|------|
| `tracking_number` | string\|null | Tracking number / waybill number |
| `sender` | object | Sender information |
| `sender.name` | string\|null | Sender name |
| `sender.phone` | string\|null | Sender phone (11-digit mobile number) |
| `sender.address` | string\|null | Sender address |
| `receiver` | object | Receiver information |
| `receiver.name` | string\|null | Receiver name |
| `receiver.phone` | string\|null | Receiver phone (11-digit mobile number) |
| `receiver.address` | string\|null | Receiver address |
| `courier_company` | string\|null | Courier company name (for example 顺丰速运 or 圆通速递) |
| `confidence` | float | Average confidence of all text blocks (0.0 ~ 1.0) |
| `extra_fields` | object | Any other fields that were recognized (key-value pairs) |
| `raw_text` | string | Raw merged text, for debugging and verification |

### Supported courier companies

SF Express (顺丰), YTO (圆通), ZTO (中通), Yunda (韵达), STO (申通), J&T Express (极兔), JD (京东), China Post (邮政), EMS, Best Express (百世), Deppon (德邦), TTK Express (天天), ZJS Express (宅急送)

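
To give a feel for how fields might be pulled out of the merged text, here is a hedged sketch that uses regular expressions and a keyword lookup. The patterns, the `extract_fields` helper, and the three-entry keyword table are assumptions for illustration; the actual rules in `ocr/express_parser.py` are not shown in this README and are likely more thorough.

```python
import re
from typing import Any, Dict

# Keyword -> display name. A small illustrative subset, not the parser's real table.
COURIER_KEYWORDS = {"顺丰": "顺丰速运", "圆通": "圆通速递", "中通": "中通快递"}

# 11-digit mainland mobile number that is not part of a longer digit run.
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")
# "运单号" label, optional colon, then up to 4 letters followed by 8-20 digits.
TRACKING_RE = re.compile(r"运单号[::]?\s*([A-Z]{0,4}\d{8,20})")


def extract_fields(text: str) -> Dict[str, Any]:
    """Extract a tracking number, phone numbers, and courier name from text."""
    tracking = TRACKING_RE.search(text)
    courier = next(
        (name for keyword, name in COURIER_KEYWORDS.items() if keyword in text),
        None,
    )
    return {
        "tracking_number": tracking.group(1) if tracking else None,
        "phones": PHONE_RE.findall(text),
        "courier_company": courier,
    }


merged = "顺丰速运\n运单号:SF1234567890\n收件人:李四 13900139000"
print(extract_fields(merged))
```

Deciding which phone number belongs to the sender and which to the receiver needs the surrounding labels and is left out of this sketch.
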
### Programmatic usage

```python
from ocr import OCRPipeline, ExpressParser

# `pipeline` is assumed to be an initialized OCRPipeline and `image`
# an image that has already been loaded.
# Use the parse_express() method of OCRResult
result = pipeline.process(image)
if result and result.text_count > 0:
    # Parse the waybill information
    express_info = result.parse_express()
    print(f"Tracking number: {express_info.tracking_number}")
    print(f"Receiver: {express_info.receiver_name}")
    print(f"Receiver phone: {express_info.receiver_phone}")

    # Get the merged full text
    merged_text = result.merge_text()
    print(f"Merged text: {merged_text}")
```

## Programming Interface

### Using as a module

```python
from input import ImageLoader
from ocr import OCRPipeline
from visualize import OCRVisualizer
from utils import Config, InputConfig, InputMode

# Create the configuration
config = Config.for_single_image("path/to/image.jpg")

# Create the components
loader = ImageLoader()
pipeline = OCRPipeline(config.ocr, config.pipeline)
visualizer = OCRVisualizer(config.visualize)

# Initialize
pipeline.initialize()

# Load and process the image
image_info = loader.load("path/to/image.jpg")
if image_info:
    result = pipeline.process(image_info.image, image_info.path)

    if result and result.text_count > 0:
        # Read the recognition results
        for block in result.text_blocks:
            print(f"Text: {block.text}")
            print(f"Confidence: {block.confidence}")
            print(f"Position: {block.bbox}")

        # Export as JSON
        json_data = result.to_dict()

        # Visualize
        display_image = visualizer.draw_result(image_info.image, result)
        visualizer.show(display_image, wait_key=0)

# Release resources
visualizer.close()
```

### Batch processing

```python
from input import ImageLoader
from ocr import OCRPipeline
from utils import Config

# Create the configuration
config = Config.for_directory("path/to/images/", pattern="*.jpg")

# Create the components
loader = ImageLoader()
pipeline = OCRPipeline(config.ocr)
pipeline.initialize()

# Batch processing
for image_info in loader.load_directory("path/to/images/"):
    result = pipeline.process(image_info.image, image_info.path)
    if not result:
        # Guard against an empty result, as the single-image example does
        continue
    print(f"{image_info.filename}: recognized {result.text_count} text blocks")
```

### OCRResult data structure

```python
# x1, y1, cx, cy and similar names are placeholders for numeric coordinates.
{
    "image_index": 1,
    "image_path": "path/to/image.jpg",
    "timestamp": 1704355200.123,
    "processing_time_ms": 45.6,
    "text_count": 3,
    "average_confidence": 0.92,
    "roi_applied": False,
    "roi_rect": None,
    "text_blocks": [
        {
            "text": "recognized text",
            "confidence": 0.95,
            "bbox": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
            "bbox_with_offset": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
            "center": [cx, cy],
            "width": 120.0,
            "height": 30.0
        }
    ]
}
```

## Modules

### input/loader.py - Image loading module

Provides image loading for single images, batches, and directories.

- `ImageLoader`: image loader class
- `ImageInfo`: image information data structure
- `load_image()`: convenience function that loads a single image
- `load_images()`: convenience function that loads images in batch

### ocr/engine.py - OCR engine module

Wraps PaddleOCR behind a concise OCR call interface.

- `OCREngine`: OCR engine class
- `TextBlock`: text block data structure

### ocr/pipeline.py - OCR processing pipeline

Chains image loading, ROI cropping, OCR recognition, and result packaging.

- `OCRPipeline`: processing pipeline class
- `OCRResult`: OCR result data structure

### visualize/draw.py - Visualization module

Draws OCR recognition results on the image.

- `OCRVisualizer`: visualizer class

### utils/config.py - Configuration management module

Manages all configurable parameters in one place.

- `Config`: global configuration aggregate
- `InputConfig`: input configuration
- `OCRConfig`: OCR engine configuration
- `PipelineConfig`: pipeline configuration
- `VisualizeConfig`: visualization configuration
- `OutputConfig`: output configuration
- `ROIConfig`: ROI region configuration

## Extending

### Adding an image preprocessor

```python
import cv2

def denoise_preprocessor(image):
    """Denoising preprocessor."""
    return cv2.fastNlMeansDenoisingColored(image, None, 10, 10, 7, 21)

# `pipeline` is an existing OCRPipeline instance
pipeline.add_preprocessor(denoise_preprocessor)
```

### Adding a result postprocessor

```python
def filter_short_text(result):
    """Drop text blocks shorter than 3 characters."""
    result.text_blocks = [
        block for block in result.text_blocks
        if len(block.text) >= 3
    ]
    return result

# `pipeline` is an existing OCRPipeline instance
pipeline.add_postprocessor(filter_short_text)
```

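
The two hooks above imply a simple order of operations: preprocessors run on the image before recognition, and postprocessors run on the result afterwards. The toy class below sketches that ordering on plain strings, under the assumption that hooks run in the order they were registered. `HookedPipeline` is a hypothetical stand-in and does not reflect how `OCRPipeline` is actually implemented.

```python
from typing import Any, Callable, List

Hook = Callable[[Any], Any]


class HookedPipeline:
    """Toy stand-in for OCRPipeline that only demonstrates hook ordering."""

    def __init__(self, recognize: Hook) -> None:
        self._recognize = recognize
        self._preprocessors: List[Hook] = []
        self._postprocessors: List[Hook] = []

    def add_preprocessor(self, hook: Hook) -> None:
        self._preprocessors.append(hook)

    def add_postprocessor(self, hook: Hook) -> None:
        self._postprocessors.append(hook)

    def process(self, image: Any) -> Any:
        # 1. Each preprocessor transforms the input, in registration order.
        for hook in self._preprocessors:
            image = hook(image)
        # 2. The core recognition step runs on the preprocessed input.
        result = self._recognize(image)
        # 3. Each postprocessor transforms the result, in registration order.
        for hook in self._postprocessors:
            result = hook(result)
        return result


# Strings stand in for images, and str.split stands in for OCR.
demo = HookedPipeline(recognize=lambda image: image.split())
demo.add_preprocessor(str.strip)
demo.add_postprocessor(lambda words: [w for w in words if len(w) >= 3])
print(demo.process("  to be or not  "))  # ['not']
```
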
## Performance Tips

1. **Enable ROI**: use `--roi` to process only the region of interest
2. **Use the GPU**: use `--gpu` to enable GPU acceleration
3. **Disable orientation classification**: if the text orientation is fixed, use `--no-angle-cls`
4. **Raise the confidence threshold**: use `--drop-score` to filter out low-quality results
5. **Process in batches**: use directory mode to process many images in one run

## FAQ

### Q: Model loading fails on Windows when the user name contains Chinese characters?

A: PaddlePaddle's C++ inference engine cannot correctly handle paths that contain Chinese characters. Run the following command to download the models into the project directory:

```bash
python download_models.py
```

The program automatically detects and uses the models in the `models/` directory.

### Q: Chinese text is not displayed correctly?

A: The default OpenCV font does not support Chinese. Set `font_path` in `VisualizeConfig` to a Chinese font file on your system:

```python
config.visualize.font_path = "C:/Windows/Fonts/simhei.ttf"  # Windows
config.visualize.font_path = "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc"  # Linux
```

### Q: OCR is slow?

A: See the performance tips section above.

### Q: Which image formats are supported?

A: The following formats are supported: `.jpg`, `.jpeg`, `.png`, `.bmp`, `.tiff`, `.tif`, `.webp`

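
A directory scan that honors this extension list, together with the `--pattern` and `--recursive` options, could look roughly like the sketch below. It uses only `pathlib`; `find_images` is a hypothetical helper, and matching extensions case-insensitively is an assumption rather than documented behavior of `ImageLoader`.

```python
from pathlib import Path
from typing import List

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif", ".webp"}


def find_images(directory: str, pattern: str = "*", recursive: bool = False) -> List[Path]:
    """List supported image files in a directory, optionally recursing."""
    root = Path(directory)
    candidates = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(
        path
        for path in candidates
        # Skip directories and anything outside the supported formats.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

For example, `find_images("path/to/images/", "*.png", recursive=True)` would correspond to `--dir path/to/images/ --pattern "*.png" --recursive` under these assumptions.
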
## Contributing

Issues and pull requests are welcome.

### Development workflow

1. Fork this repository
2. Create a feature branch: `git checkout -b feature/your-feature`
3. Commit your changes: `git commit -m "Add your feature"`
4. Push the branch: `git push origin feature/your-feature`
5. Open a pull request

### Code conventions

- Follow the PEP 8 code style
- Add docstrings to all public classes and functions
- Add type annotations to new features
- Make sure the code runs correctly before committing

### Directory conventions

- `input/`: image loading code only
- `ocr/`: OCR processing code only
- `visualize/`: visualization code only
- `utils/`: general utilities and configuration

## License

MIT License

## Acknowledgments

- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) - a powerful OCR toolkit
- [OpenCV](https://opencv.org/) - computer vision library