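This document is the API integration guide for the parsing service. It covers three core capabilities:

- Document parsing: convert PDF documents to Word, PPT, Markdown, JSON, or HTML
- Financial statement parsing: extract data from financial statements
- Invoice recognition: extract structured fields from invoices

📚 Want to learn more? See the product white paper and integration guide for a detailed product introduction, use cases, and deployment options.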
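Base URL: https://insightdoc.memect.cn

All three capabilities are asynchronous. Submitting a file returns a `task_id`, which you then use to query the result and download the output.

Every request must carry the API Key in the HTTP header:

```
Authorization: Bearer YOUR_API_KEY
```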
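To avoid hard-coding the key in scripts, you can read it from an environment variable and build the header once. The following is a minimal sketch; the variable name `INSIGHTDOC_API_KEY` and the helper `auth_headers` are hypothetical choices, not part of the API:

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build the Authorization header required by every endpoint.

    Falls back to the (hypothetical) INSIGHTDOC_API_KEY environment variable
    when no key is passed explicitly.
    """
    key = api_key or os.environ.get("INSIGHTDOC_API_KEY")
    if not key:
        raise RuntimeError("API key missing: pass it or set INSIGHTDOC_API_KEY")
    return {"Authorization": f"Bearer {key}"}
```

The returned dict can be passed directly as the `headers=` argument in the `requests` examples in this guide.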
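Document parsing converts PDF documents into Word, PPT, Markdown, JSON, or HTML.

The following endpoints accept identical request parameters and return identical response formats. They differ only in path and output format: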
| Endpoint | Function | Output format |
|---|---|---|
| POST /api/parse/pdf2word | PDF to Word | .docx |
| POST /api/parse/pdf2ppt | PDF to PPT | .pptx |
| POST /api/parse/pdf2markdown | PDF to Markdown | .md |
| POST /api/parse/pdf2json | PDF to JSON | .json |
| POST /api/parse/pdf2html | PDF to HTML | .html |
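Because the five endpoints differ only in path, client code can pick the path from the desired output format. A small sketch; the format keys and the `parse_endpoint` helper are illustrative names, while the paths and extensions come from the endpoint table:

```python
from typing import Dict, Tuple

# Output format -> (endpoint path, file extension), per the endpoint table.
PARSE_ENDPOINTS: Dict[str, Tuple[str, str]] = {
    "word": ("/api/parse/pdf2word", ".docx"),
    "ppt": ("/api/parse/pdf2ppt", ".pptx"),
    "markdown": ("/api/parse/pdf2markdown", ".md"),
    "json": ("/api/parse/pdf2json", ".json"),
    "html": ("/api/parse/pdf2html", ".html"),
}


def parse_endpoint(fmt: str) -> Tuple[str, str]:
    """Return (path, extension) for an output format such as "word" or "Markdown"."""
    try:
        return PARSE_ENDPOINTS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported output format: {fmt!r}") from None
```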
Request parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | PDF file, max 50 MB |

Example request:

```bash
curl -X POST "https://insightdoc.memect.cn/api/parse/pdf2word" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@document.pdf"
```

Example response:

```json
{
  "task_id": "abc123def456",
  "status": "pending"
}
```
Query the task result with GET /api/parse/result/{task_id}:

```bash
curl "https://insightdoc.memect.cn/api/parse/result/abc123def456" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

While processing:

```json
{
  "task_id": "abc123def456",
  "status": "processing",
  "created_at": "2026-03-04T06:00:00"
}
```

When complete:

```json
{
  "task_id": "abc123def456",
  "status": "done",
  "created_at": "2026-03-04T06:00:00",
  "duration": 12.5,
  "download_url": "https://cdn.example.com/files/result.docx?sign=xxx",
  "filename": "document.docx"
}
```
If a Markdown, HTML, or JSON result contains images, the response also includes an `images` field:

```json
{
  "task_id": "abc123def456",
  "status": "done",
  "duration": 15.2,
  "download_url": "https://cdn.example.com/files/result.md?sign=xxx",
  "filename": "doc.md",
  "images": {
    "images/page_0_img_0.png": "https://cdn.example.com/page_0_img_0.png?sign=xxx",
    "images/page_0_img_1.png": "https://cdn.example.com/page_0_img_1.png?sign=xxx"
  }
}
```
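In `images`, each key is the relative path the document uses to reference the image, and each value is the actual download link (valid for 1 hour). You can replace the relative paths with these CDN URLs to use the images online, or download the images and replace the paths with local paths.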
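Both options amount to a find-and-replace over the result file. The sketch below assumes the result text has already been downloaded and that the keys of `images` appear verbatim in it; `rewrite_image_refs` and its `fetch` callback are hypothetical helpers, not part of the API:

```python
import os
import re
from typing import Callable, Dict, Optional


def rewrite_image_refs(text: str, images: Dict[str, str],
                       out_dir: Optional[str] = None,
                       fetch: Optional[Callable[[str], bytes]] = None) -> str:
    """Replace the relative image paths (the keys of `images`) in a result file.

    Without out_dir, each path becomes its signed CDN URL (valid for 1 hour).
    With out_dir and fetch, each image is downloaded via fetch(url), saved
    under out_dir, and the path becomes the local file path.
    """
    mapping: Dict[str, str] = {}
    for rel_path, url in images.items():
        if out_dir is None:
            mapping[rel_path] = url
            continue
        if fetch is None:
            raise ValueError("fetch is required when out_dir is given")
        # Refuse keys that would escape out_dir.
        if os.path.isabs(rel_path) or ".." in rel_path.replace("\\", "/").split("/"):
            raise ValueError(f"unsafe image path: {rel_path!r}")
        local_path = os.path.join(out_dir, rel_path)
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        with open(local_path, "wb") as f:
            f.write(fetch(url))
        mapping[rel_path] = local_path
    if not mapping:
        return text
    # One regex pass over all keys (longest first) so no path is rewritten twice.
    pattern = "|".join(re.escape(k) for k in sorted(mapping, key=len, reverse=True))
    return re.sub(pattern, lambda m: mapping[m.group(0)], text)
```

With `requests`, a suitable fetch callback is `lambda url: requests.get(url, timeout=60).content`. Prefer the local option for anything long-lived, since the signed URLs expire after an hour.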
On failure:

```json
{
  "task_id": "abc123def456",
  "status": "failed",
  "created_at": "2026-03-04T06:00:00",
  "error": "文件格式不支持或文件损坏"
}
```

Download the result file with GET /api/parse/download/{task_id}. It returns the file directly as a stream, supports Chinese filenames, and is only available when `status` is `done`.

```bash
curl "https://insightdoc.memect.cn/api/parse/download/abc123def456" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -o result.docx
```
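When saving the stream yourself rather than with `curl -o`, the server-supplied filename usually arrives in the `Content-Disposition` response header. This guide does not say which form the service uses; non-ASCII (e.g. Chinese) names are commonly sent in the RFC 5987 `filename*=UTF-8''...` form, so the sketch below handles both that and the plain `filename="..."` form. `filename_from_disposition` and `save_download` are hypothetical helpers:

```python
import os
import re
from typing import Dict, Optional
from urllib.parse import unquote


def filename_from_disposition(header: Optional[str], default: str) -> str:
    """Extract a safe filename from a Content-Disposition header value."""
    if header:
        # Extended form (RFC 5987): filename*=UTF-8''%E6%96%87...
        m = re.search(r"filename\*\s*=\s*([\w-]*)'[^']*'([^;]+)", header, re.I)
        if m:
            name = unquote(m.group(2).strip(), encoding=m.group(1) or "utf-8")
        else:
            # Plain form: filename="report.xlsx" or filename=report.xlsx
            m = re.search(r'filename\s*=\s*(?:"([^"]*)"|([^;]+))', header, re.I)
            if m:
                name = (m.group(1) if m.group(1) is not None else m.group(2)).strip()
            else:
                name = ""
        # Keep only the base name so a crafted header cannot write elsewhere.
        name = os.path.basename(name)
        if name:
            return name
    return default


def save_download(url: str, headers: Dict[str, str], default_name: str) -> str:
    """Stream GET url to disk and return the filename used."""
    import requests  # third-party; imported here so the parser above needs no extra dependency

    with requests.get(url, headers=headers, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        name = filename_from_disposition(resp.headers.get("Content-Disposition"), default_name)
        with open(name, "wb") as f:
            for chunk in resp.iter_content(chunk_size=64 * 1024):
                f.write(chunk)
    return name
```

Streaming with `iter_content` avoids holding a large result file in memory, which matters more for the 50 MB document and financial statement inputs than for invoices.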
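Python example covering submission, polling, and download:

```python
import time

import requests

API_BASE = "https://insightdoc.memect.cn"
API_KEY = "your_api_key_here"
headers = {"Authorization": f"Bearer {API_KEY}"}

# 1. Submit the task
with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{API_BASE}/api/parse/pdf2word",
        headers=headers,
        files={"file": f},
        timeout=30,
    )
response.raise_for_status()  # fail fast on 400/401/413/429 and similar
task_id = response.json()["task_id"]
print(f"Task submitted: {task_id}")

# 2. Poll for the result (the 10-minute cap is an arbitrary example value)
deadline = time.monotonic() + 600
while True:
    response = requests.get(
        f"{API_BASE}/api/parse/result/{task_id}",
        headers=headers,
        timeout=60,
    )
    response.raise_for_status()
    result = response.json()
    if result["status"] == "done":
        print(f"Done in {result['duration']:.1f}s")
        # 3. Download the file from the signed download_url
        file_response = requests.get(result["download_url"], timeout=60)
        file_response.raise_for_status()
        with open(result["filename"], "wb") as f:
            f.write(file_response.content)
        print(f"Saved: {result['filename']}")
        break
    elif result["status"] == "failed":
        print(f"Failed: {result.get('error')}")
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"Task {task_id} did not finish within 10 minutes")
    time.sleep(3)
```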
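Financial statement parsing automatically recognizes and extracts data from financial statements, including balance sheets, income statements, and cash flow statements. Results are returned both as structured JSON and as an Excel file.

Submit a task with POST /api/finance/extract.

Request parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | PDF or image (JPG/PNG), max 50 MB |

Example request:

```bash
curl -X POST "https://insightdoc.memect.cn/api/finance/extract" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@financial_report.pdf"
```

Example response:

```json
{
  "task_id": "fin789xyz012",
  "status": "pending"
}
```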
Query the task result with GET /api/finance/result/{task_id}:

```bash
curl "https://insightdoc.memect.cn/api/finance/result/fin789xyz012" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

When complete:

```json
{
  "task_id": "fin789xyz012",
  "status": "done",
  "created_at": "2026-03-04T06:00:00",
  "duration": 18.3,
  "result": [
    {
      "tables": [
        [
          ["", "附注", "2024年12月31日", "2023年12月31日"],
          ["流动资产:", "", "", ""],
          ["货币资金", "六、(一)", "141,579,324.33", "275,237,984.55"]
        ]
      ],
      "type_index": 0,
      "angle": 0
    }
  ],
  "download_url": "https://cdn.example.com/files/result.xlsx?sign=xxx",
  "filename": "financial_data.xlsx"
}
```
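`result` is an array organized by page (or by table). In each item, `tables` is a list of tables, and each table is a 2-D array of rows and cells that maps directly onto Excel rows and columns.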
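Since each table is just a list of rows, it can be written out with Python's standard `csv` module, and amount strings such as "141,579,324.33" can be converted to `Decimal` for exact arithmetic. A minimal sketch assuming the field layout of the example response; `tables_to_csv` and `parse_amount` are hypothetical helpers:

```python
import csv
import io
from decimal import Decimal, InvalidOperation
from typing import Any, Dict, List, Optional


def tables_to_csv(result: List[Dict[str, Any]]) -> List[str]:
    """Render every table in a finance `result` array as CSV text."""
    out = []
    for item in result:
        for table in item.get("tables", []):
            buf = io.StringIO()
            csv.writer(buf).writerows(table)  # one CSV row per table row
            out.append(buf.getvalue())
    return out


def parse_amount(cell: str) -> Optional[Decimal]:
    """Turn "141,579,324.33" into a Decimal; return None for non-numeric cells."""
    text = cell.replace(",", "").strip()
    if not text:
        return None
    try:
        return Decimal(text)
    except InvalidOperation:
        return None
```

Cells that are labels or note references (such as "六、(一)") come back as `None` from `parse_amount`, so they can be skipped when summing a column.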
On failure:

```json
{
  "task_id": "fin789xyz012",
  "status": "failed",
  "created_at": "2026-03-04T06:00:00",
  "error": "文件格式不支持或文件损坏"
}
```

Download the Excel file with GET /api/finance/download/{task_id}. It returns the Excel file directly as a stream, supports Chinese filenames, and is only available when `status` is `done`.

```bash
curl "https://insightdoc.memect.cn/api/finance/download/fin789xyz012" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -o result.xlsx
```
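Python example:

```python
import time

import requests

API_BASE = "https://insightdoc.memect.cn"
API_KEY = "your_api_key_here"
headers = {"Authorization": f"Bearer {API_KEY}"}

# 1. Submit the task
with open("financial_report.pdf", "rb") as f:
    response = requests.post(
        f"{API_BASE}/api/finance/extract",
        headers=headers,
        files={"file": f},
        timeout=30,
    )
response.raise_for_status()
task_id = response.json()["task_id"]
print(f"Task submitted: {task_id}")

# 2. Poll for the result (the 10-minute cap is an arbitrary example value)
deadline = time.monotonic() + 600
while True:
    response = requests.get(
        f"{API_BASE}/api/finance/result/{task_id}",
        headers=headers,
        timeout=60,
    )
    response.raise_for_status()
    result = response.json()
    if result["status"] == "done":
        print(f"Done in {result['duration']:.1f}s")
        # Each result item can hold several tables, so count them all
        n_tables = sum(len(item.get("tables", [])) for item in result["result"])
        print(f"Recognized {n_tables} table(s)")
        # 3. Download the Excel file (this endpoint requires the API key)
        dl = requests.get(
            f"{API_BASE}/api/finance/download/{task_id}",
            headers=headers,
            timeout=60,
        )
        dl.raise_for_status()
        with open(f"finance_{task_id}.xlsx", "wb") as f:
            f.write(dl.content)
        print("Excel file saved")
        break
    elif result["status"] == "failed":
        print(f"Failed: {result.get('error')}")
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"Task {task_id} did not finish within 10 minutes")
    time.sleep(3)
```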
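Invoice recognition identifies VAT invoices, general invoices, electronic invoices, and other types of receipts, and outputs their fields in structured form.

Submit a task with POST /api/invoice/recognize.

Request parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | PDF or image (JPG/PNG), max 10 MB |

Example request:

```bash
curl -X POST "https://insightdoc.memect.cn/api/invoice/recognize" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice.pdf"
```

Example response:

```json
{
  "task_id": "inv456abc789",
  "status": "pending"
}
```

Query the task result with GET /api/invoice/result/{task_id}:

```bash
curl "https://insightdoc.memect.cn/api/invoice/result/inv456abc789" \
  -H "Authorization: Bearer YOUR_API_KEY"
```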
When complete:

Results are organized by page; each page contains the list of structured fields recognized from the invoice. When a PDF is uploaded, each page is treated as one invoice.

```json
{
  "task_id": "inv456abc789",
  "status": "done",
  "created_at": "2026-03-04T06:00:00",
  "duration": 55.0,
  "result": [
    {
      "page": 1,
      "total_tokens": 512,
      "fields": [
        {"key": "invoice_type", "value": "增值税专用发票"},
        {"key": "invoice_code", "value": "044002000211"},
        {"key": "invoice_number", "value": "12458796"},
        {"key": "invoice_date", "value": "2024-03-15"},
        {"key": "buyer_name", "value": "某某科技有限公司"},
        {"key": "buyer_tax_id", "value": "91310115XXXXXXXX2G"},
        {"key": "seller_name", "value": "某某供应商有限公司"},
        {"key": "seller_tax_id", "value": "91310115YYYYYYYY3H"},
        {"key": "amount", "value": "10000.00"},
        {"key": "tax_rate", "value": "13%"},
        {"key": "tax_amount", "value": "1300.00"},
        {"key": "total_amount", "value": "11300.00"}
      ]
    }
  ]
}
```
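The key/value list is easier to work with as a dict, and the example amounts allow a basic sanity check (10000.00 + 1300.00 = 11300.00). A minimal sketch; `fields_to_dict` and `totals_consistent` are hypothetical helpers, and it assumes the amount values are plain decimal strings as in the example (values with currency symbols or thousands separators would fail the check):

```python
from decimal import Decimal
from typing import Any, Dict


def fields_to_dict(page: Dict[str, Any]) -> Dict[str, str]:
    """Collapse one page's [{"key": ..., "value": ...}] list into a dict.

    If a key appears more than once, the last occurrence wins.
    """
    return {f["key"]: f["value"] for f in page.get("fields", [])}


def totals_consistent(inv: Dict[str, str]) -> bool:
    """Check amount + tax_amount == total_amount using exact decimals."""
    try:
        amount = Decimal(inv["amount"])
        tax = Decimal(inv["tax_amount"])
        total = Decimal(inv["total_amount"])
    except (KeyError, ArithmeticError):  # missing field or non-numeric value
        return False
    return amount + tax == total
```

Using `Decimal` instead of `float` keeps the comparison exact for two-decimal currency values.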
On failure:

```json
{
  "task_id": "inv456abc789",
  "status": "failed",
  "created_at": "2026-03-04T06:00:00",
  "error": "文件格式不支持或文件损坏"
}
```
Python example:

```python
import time

import requests

API_BASE = "https://insightdoc.memect.cn"
API_KEY = "your_api_key_here"
headers = {"Authorization": f"Bearer {API_KEY}"}

# 1. Submit the task
with open("invoice.pdf", "rb") as f:
    response = requests.post(
        f"{API_BASE}/api/invoice/recognize",
        headers=headers,
        files={"file": f},
        timeout=30,
    )
response.raise_for_status()
task_id = response.json()["task_id"]
print(f"Task submitted: {task_id}")

# 2. Poll for the result (the 10-minute cap is an arbitrary example value)
deadline = time.monotonic() + 600
while True:
    response = requests.get(
        f"{API_BASE}/api/invoice/result/{task_id}",
        headers=headers,
        timeout=60,
    )
    response.raise_for_status()
    result = response.json()
    if result["status"] == "done":
        print(f"Done in {result['duration']:.1f}s")
        # One entry per page; each page is one invoice
        for page_data in result["result"]:
            print(f"\nPage {page_data['page']}:")
            for field in page_data["fields"]:
                print(f"  {field['key']}: {field['value']}")
        break
    elif result["status"] == "failed":
        print(f"Failed: {result.get('error')}")
        break
    if time.monotonic() > deadline:
        raise TimeoutError(f"Task {task_id} did not finish within 10 minutes")
    time.sleep(2)
```
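Task status values (shared by all result endpoints):

| Status | Meaning | Recommended action |
|---|---|---|
| pending | Task submitted, waiting to be processed | Keep polling |
| processing | Task is being processed | Keep polling |
| done | Processing complete | Fetch the result or download the file |
| failed | Processing failed | Check the error field, verify the file format, or retry |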
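The status table translates directly into a reusable polling loop. The sketch below adds a growing interval between polls, which is our own choice rather than a documented requirement; `wait_for_task` is a hypothetical helper, and `fetch` is any callable returning the parsed JSON of a result endpoint:

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(fetch: Callable[[], Dict[str, Any]],
                  interval: float = 3.0,
                  max_interval: float = 30.0,
                  timeout: float = 600.0,
                  sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until status is done or failed, lengthening the wait each time.

    Returns the final result on "done", raises RuntimeError on "failed",
    and raises TimeoutError once the accumulated sleep time reaches `timeout`
    (request latency is not counted).
    """
    waited = 0.0
    while True:
        result = fetch()
        status = result.get("status")
        if status == "done":
            return result
        if status == "failed":
            raise RuntimeError(result.get("error") or "task failed")
        # pending / processing: keep polling
        if waited >= timeout:
            raise TimeoutError(f"task still {status!r} after {waited:.0f}s of waiting")
        sleep(interval)
        waited += interval
        interval = min(interval * 1.5, max_interval)
```

For example, `wait_for_task(lambda: requests.get(f"{API_BASE}/api/parse/result/{task_id}", headers=headers, timeout=60).json())` replaces the `while True` loop in the Python examples. The `sleep` parameter exists so the loop can be tested without real waiting.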
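File requirements:

| Feature | Supported formats | Max file size | Notes |
|---|---|---|---|
| Document parsing | PDF | 50 MB | ≤ 200 pages recommended |
| Financial statement parsing | PDF, JPG, PNG | 50 MB | ≤ 50 pages recommended |
| Invoice recognition | PDF, JPG, PNG | 10 MB | For multi-page PDFs, each page is recognized as one invoice |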
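Checking these limits on the client avoids a round trip that would end in a 400 or 413. The sketch below detects the type from the file's leading bytes (standard PDF, PNG, and JPEG signatures) rather than trusting the extension. It assumes "MB" means 1024 × 1024 bytes, which this guide does not specify, and it does not check the recommended page counts; `check_upload` and the feature keys are hypothetical names:

```python
import os
from typing import Dict, Set, Tuple

MB = 1024 * 1024  # assumption: the limits are binary megabytes

# Leading bytes that identify each supported format.
SIGNATURES: Dict[str, bytes] = {
    "pdf": b"%PDF-",
    "png": b"\x89PNG\r\n\x1a\n",
    "jpg": b"\xff\xd8\xff",
}

# feature -> (allowed types, max size in bytes), per the file requirements table
LIMITS: Dict[str, Tuple[Set[str], int]] = {
    "parse": ({"pdf"}, 50 * MB),
    "finance": ({"pdf", "jpg", "png"}, 50 * MB),
    "invoice": ({"pdf", "jpg", "png"}, 10 * MB),
}


def check_upload(path: str, feature: str) -> str:
    """Return the detected type, or raise ValueError if the file would be rejected."""
    allowed, max_bytes = LIMITS[feature]
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path}: {size} bytes exceeds the {max_bytes // MB} MB limit")
    with open(path, "rb") as f:
        head = f.read(8)
    kind = next((k for k, sig in SIGNATURES.items() if head.startswith(sig)), None)
    if kind not in allowed:
        raise ValueError(f"{path}: detected type {kind!r}, expected one of {sorted(allowed)}")
    return kind
```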
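HTTP status codes:

| HTTP status code | Meaning | Solution |
|---|---|---|
| 200 | Request succeeded | - |
| 400 | Invalid request parameters | Check that the file format and size meet the requirements |
| 401 | Authentication failed | Check that the API Key is correct |
| 403 | Insufficient permissions | Contact us to confirm your account permissions |
| 404 | Task not found | Check that the task_id is correct |
| 413 | File too large | Compress or split the file and retry |
| 429 | Rate limit exceeded | Reduce the request rate and retry later |
| 500 | Server error | Retry later or contact technical support |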
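Of these codes, only 429 and 500 are worth retrying automatically; the others require fixing the request. A minimal exponential-backoff sketch under that reading of the table; `with_retries` is a hypothetical helper, and it does not honor a `Retry-After` header because this guide does not say whether the service sends one:

```python
import time
from typing import Any, Callable

# Per the status-code table: 429 and 500 are transient; other errors need a fix first.
RETRYABLE_STATUSES = {429, 500}


def with_retries(send: Callable[[], Any], attempts: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send() and retry with exponential backoff on 429/500.

    send() must return an object with a .status_code attribute (for example a
    requests.Response). It should reopen any uploaded file itself: a file object
    that was already read would be re-sent from its end, i.e. as an empty upload.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        response = send()
        if response.status_code not in RETRYABLE_STATUSES:
            return response
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
    return send()  # last attempt: return whatever comes back
```

For uploads, wrap the `open(...)` and `requests.post(...)` calls together in a small function and pass that function as `send`, so each attempt reads the file from the start.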
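Unsupported file format

```json
{"detail": "Unsupported file type: application/xxx"}
```

Fix: make sure you upload a PDF file (document parsing) or a PDF or image file (financial statement parsing and invoice recognition).

Authentication failed

```json
{"detail": "Could not validate credentials"}
```

Fix: check that the Authorization header has the form `Bearer YOUR_API_KEY` and that the API Key is valid.

Task processing failed

```json
{"task_id": "xxx", "status": "failed", "error": "请求票据解析异常了"}
```

Fix: check that the file is complete and legible, then resubmit the task. If it keeps failing, contact technical support and include the task_id.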
Create a key on the API Key management page.
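If you run into a technical issue, please provide the following:

- The first 8 characters of your API Key
- The task_id
- The error message and the time of the request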
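If you log failures programmatically, a helper can collect exactly these details while exposing only the first 8 characters of the key. A small sketch; `support_report` and its output layout are hypothetical:

```python
from datetime import datetime, timezone
from typing import Optional


def support_report(api_key: str, task_id: str, error: str,
                   requested_at: Optional[datetime] = None) -> str:
    """Format the details support asks for, revealing only the key's first 8 chars."""
    when = (requested_at or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    return "\n".join([
        f"API Key (first 8): {api_key[:8]}",
        f"task_id: {task_id}",
        f"error: {error}",
        f"request time: {when}",
    ])
```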
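v1.1.0 (2026-03-06)
- Fixed the encoding of Chinese filenames in financial statement Excel downloads
- Fixed invoice recognition results not being returned correctly
- Reorganized the documentation; the query endpoints are now documented within each module

v1.0.0 (2026-03-04)
- Released the document parsing, financial statement parsing, and invoice recognition APIs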