MCP (Model Context Protocol) is an open standard that lets AI assistants like Claude, Cursor, and Windsurf connect to external tools. An MCP server for document processing lets your AI assistant read, extract, or convert documents without you writing integration code or switching to a separate app.
Before picking one, it helps to know that document processing servers fall into two categories, and they solve different problems.
Structured data extraction servers read a document and return specific fields (vendor name, invoice total, line items) in a format you can push to a spreadsheet or database. This is what intelligent document processing actually means in practice. The audience is finance teams, operations staff, and developers building data pipelines: people who need the number, not the document.
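To make "push to a database" concrete, here is a minimal sketch that loads field-level records into SQLite with Python's standard sqlite3 module. The records, field names, and invoices table are hypothetical stand-ins for whatever fields you ask an extraction server for; no particular server's output format is assumed.

```python
import sqlite3

# Hypothetical field-level records, shaped like the output of an extraction server.
# Totals are kept as strings so currency values are stored exactly, not as floats.
RECORDS = [
    {"vendor_name": "Acme Supply", "invoice_number": "INV-1042", "total": "4237.50"},
    {"vendor_name": "Globex", "invoice_number": "G-88", "total": "912.00"},
]


def load_invoices(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Create the (hypothetical) invoices table if needed and upsert one row per invoice."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "invoice_number TEXT PRIMARY KEY, vendor_name TEXT, total TEXT)"
    )
    # Named placeholders map directly onto the dict keys of each record.
    conn.executemany(
        "INSERT OR REPLACE INTO invoices (invoice_number, vendor_name, total) "
        "VALUES (:invoice_number, :vendor_name, :total)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_invoices(conn, RECORDS)
    for row in conn.execute("SELECT vendor_name, total FROM invoices ORDER BY vendor_name"):
        print(row)
    conn.close()
```

Because invoice_number is the primary key, reprocessing a batch replaces existing rows instead of duplicating them.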
Format conversion servers turn documents into markdown or plain text. Useful for getting content into an LLM's context window, but they won't hand you field-level data. If you need "the invoice total is $4,237.50," a conversion server gives you the full document text and you're on your own finding it.
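Here is what "you're on your own" looks like in practice: a minimal Python sketch that searches converted markdown for an invoice total with a regular expression. The sample markdown and the pattern are hypothetical; real conversion output varies by server and by document.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical converted output: a conversion server hands back the whole document as markdown.
SAMPLE_MARKDOWN = """\
# Invoice INV-1042

| Item | Qty | Amount |
|---|---|---|
| Widgets | 10 | $4,000.00 |
| Shipping | 1 | $237.50 |

**Total due:** $4,237.50
"""

# "total", then anything up to a dollar sign on the same line, then an amount like 4,237.50.
TOTAL_PATTERN = re.compile(r"total[^$\n]*\$\s*([\d,]+\.\d{2})", re.IGNORECASE)


def find_total(markdown: str) -> Optional[Decimal]:
    """Return the first amount that follows the word "total", or None if there isn't one."""
    match = TOTAL_PATTERN.search(markdown)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


if __name__ == "__main__":
    print(find_total(SAMPLE_MARKDOWN))
```

This approach is brittle: it returns None when a vendor writes "Amount payable" instead of "Total", and because "Subtotal" contains "total", it returns the subtotal whenever that line comes first. Handling those variations is the work an extraction server takes off your hands.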
With an extraction server, you tell your AI assistant which columns you want, it calls the extraction tool, and you get organized, field-level data back. These are the servers that actually replace manual data entry.
Lido's MCP server connects Claude and other MCP clients to Lido's template-free extraction API. You describe the fields you want in plain English, upload a document, and get structured data back. No templates to build, no model to train.
It exposes four tools: extract_file_data for extraction, extraction_tips for refining results, authenticate for one-time API key setup, and extractor_usage for checking your monthly page quota. Installation is a single command: claude mcp add lido -- npx -y @lido-app/mcp-server.
Where it stands out is variable-format documents. If your invoices come from 50 different vendors with 50 different layouts, you don't need 50 templates; the same extraction handles all of them, including scanned documents, handwritten text, and multi-page PDFs. For a detailed walkthrough, see our guide on how to extract document data with Claude using Lido MCP, or jump straight to building a full document processing agent.
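The point about not needing 50 templates is easiest to see as data: one set of plain-English field descriptions, reused for every vendor. The sketch below assumes nothing about Lido's actual request format; the INVOICE_FIELDS dictionary and the missing_fields helper are hypothetical, showing one way to keep a single field list and flag results that came back incomplete.

```python
# Hypothetical field list: plain-English descriptions, one set for every vendor layout.
# This is not Lido's request schema, just a way to keep the field list in one place.
INVOICE_FIELDS = {
    "vendor_name": "Name of the company that issued the invoice",
    "invoice_number": "Invoice or reference number",
    "invoice_date": "Date the invoice was issued, formatted YYYY-MM-DD",
    "total": "Total amount due, including tax",
}


def missing_fields(record: dict, fields: dict = INVOICE_FIELDS) -> list[str]:
    """Return the requested field names that are absent or empty in an extracted record."""
    return [name for name in fields if record.get(name) in (None, "")]


if __name__ == "__main__":
    # Hypothetical results from two vendors with different layouts, same field list.
    results = {
        "acme.pdf": {"vendor_name": "Acme Supply", "invoice_number": "INV-1042",
                     "invoice_date": "2024-03-01", "total": "4237.50"},
        "globex.pdf": {"vendor_name": "Globex", "invoice_number": "G-88", "total": ""},
    }
    for filename, record in results.items():
        gaps = missing_fields(record)
        print(filename, "OK" if not gaps else f"needs review: {gaps}")
```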
| Feature | Details |
|---|---|
| Best for | Invoice extraction, AP automation, any variable-format document workflow |
| Document types | Invoices, receipts, POs, bank statements, tax forms, insurance docs, contracts, and more |
| Approach | Template-free AI extraction with plain-English field definitions |
| Install | claude mcp add lido -- npx -y @lido-app/mcp-server (install guide) |
| Pricing | Free tier available, paid plans by page volume |
| License | MIT (server), SaaS (extraction API) |
Koncile's MCP server is built for accounting and AP automation workflows, with 24 tools for extraction, validation, and structured output. It pulls supplier names, invoice numbers, amounts, tax rates, and line items with quantities.
Koncile reports over 95% accuracy on invoice line items. The server can be self-hosted, which matters if your organization has data residency requirements. Setup takes about 15 minutes.
The tradeoff is scope. Koncile does invoices and accounting documents. If you also need to process purchase orders, insurance forms, or tax documents, you'll need a second tool alongside it.
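Validation is part of what Koncile's tools cover. We don't reproduce its tool interface here; as a hypothetical illustration of the kind of check an AP workflow runs on extracted line items, this sketch verifies that quantity times unit price, summed across lines, matches the extracted subtotal. The field names quantity, unit_price, and subtotal are assumptions.

```python
from decimal import Decimal

# Hypothetical extracted data; field names are assumptions, not Koncile's schema.
# Amounts stay as strings until converted to Decimal, which avoids float rounding.
LINE_ITEMS = [
    {"description": "Widgets", "quantity": "10", "unit_price": "400.00"},
    {"description": "Shipping", "quantity": "1", "unit_price": "237.50"},
]


def line_items_match(items: list[dict], subtotal: str, tolerance: str = "0.01") -> bool:
    """True if the sum of quantity * unit_price is within `tolerance` of the extracted subtotal."""
    computed = sum(
        (Decimal(item["quantity"]) * Decimal(item["unit_price"]) for item in items),
        Decimal("0"),
    )
    return abs(computed - Decimal(subtotal)) <= Decimal(tolerance)


if __name__ == "__main__":
    print(line_items_match(LINE_ITEMS, "4237.50"))
```

Comparing against the pre-tax subtotal rather than the grand total keeps tax rates out of the check. A mismatch often points to a misread line or a missed item, which makes it a reasonable trigger for human review.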
| Feature | Details |
|---|---|
| Best for | Invoice processing and AP automation for accounting teams |
| Document types | Invoices, receipts, accounting documents |
| Approach | Specialized invoice OCR with 24 extraction/validation tools |
| Self-hosted | Yes |
| Pricing | Commercial (contact for pricing) |
LandingAI publishes a guide for building a Python MCP server around its Agentic Document Extraction (ADE) API. ADE goes beyond text parsing: it understands page layout and grounds each extracted value back to its source coordinates on the page. It's strongest on multi-column PDFs, scientific papers, and financial reports with nested tables.
LandingAI scored highest (69/100) in recent agentic document extraction benchmarks. The catch is that there's no pre-built MCP server. You build it yourself using their SDK and FastMCP (see our document extraction API roundup for more developer-oriented options). That puts it firmly in developer territory. A finance team isn't setting this up without engineering help.
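Source-coordinate grounding means each extracted value travels with the page and region it came from, so a reviewer can jump straight to it. The dataclass below is a hypothetical shape for such a value, not LandingAI's response schema: it assumes boxes are given as (left, top, right, bottom) fractions of the page size and converts them to pixel coordinates for highlighting on a rendered page image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundedValue:
    """Hypothetical shape: an extracted value plus where on the page it came from."""
    field: str
    value: str
    page: int  # zero-based page index
    box: tuple[float, float, float, float]  # left, top, right, bottom as fractions (0..1)

    def __post_init__(self) -> None:
        left, top, right, bottom = self.box
        if not (0 <= left < right <= 1 and 0 <= top < bottom <= 1):
            raise ValueError(f"invalid normalized box: {self.box}")


def to_pixels(value: GroundedValue, width: int, height: int) -> tuple[int, int, int, int]:
    """Scale the normalized box to pixel coordinates on a page image of the given size."""
    left, top, right, bottom = value.box
    return (round(left * width), round(top * height), round(right * width), round(bottom * height))


if __name__ == "__main__":
    total = GroundedValue("total", "4,237.50", 0, (0.5, 0.8, 0.9, 0.85))
    print(to_pixels(total, 1000, 2000))
```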
| Feature | Details |
|---|---|
| Best for | Complex layouts, scientific papers, financial reports with nested tables |
| Document types | Any document with complex visual structure |
| Approach | Layout-aware AI with source coordinate grounding |
| Install | Build-your-own with Python SDK + FastMCP |
| Pricing | Free tier available, usage-based pricing |
The conversion servers in this section don't extract specific fields. They convert documents into markdown or plain text, which is useful when you want document content inside Claude's context window or need to feed a RAG pipeline.
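For the RAG case, a common next step is splitting converted markdown into chunks at headings. This is a minimal, hypothetical sketch of that step, done in your own pipeline rather than by any of the servers below; it ignores edge cases such as # lines inside fenced code blocks.

```python
import re

# A markdown ATX heading: 1-6 '#' characters at the start of a line, then whitespace.
HEADING = re.compile(r"^#{1,6}\s", re.MULTILINE)


def chunk_by_heading(markdown: str) -> list[str]:
    """Split markdown into chunks that each start at a heading (plus any preamble)."""
    starts = [match.start() for match in HEADING.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep text that appears before the first heading
    bounds = starts + [len(markdown)]
    chunks = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [chunk for chunk in chunks if chunk]  # drop empty chunks


if __name__ == "__main__":
    sample = "Cover note\n# Summary\nRevenue grew.\n## Risks\nCurrency exposure.\n"
    for chunk in chunk_by_heading(sample):
        print(repr(chunk))
```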
Microsoft's MarkItDown converts 29+ document formats to clean markdown: PDFs, Word docs, PowerPoint, Excel, HTML, images, and audio (via transcription). Its MCP server exposes one tool, convert_to_markdown. Simple and effective.
MarkItDown is the right starting point if you need general document-to-text conversion. It handles most common business formats well. Where it falls short: documents with complex layouts, multi-column PDFs, or tables that span page breaks. For those, Docling does better.
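As we understand the markitdown-mcp package, convert_to_markdown takes a single URI (http:, https:, file:, or data:) rather than raw bytes. If you're scripting around it, small helpers like the hypothetical ones below build those URIs with the Python standard library. Keep in mind that a file: URI refers to a path on the machine running the server, so a server in a container only sees files you've mounted into it.

```python
import base64
from pathlib import Path


def file_uri(path: str) -> str:
    """Absolute file: URI for a local document; spaces and similar characters are percent-encoded."""
    return Path(path).absolute().as_uri()


def data_uri(content: bytes, mime_type: str) -> str:
    """Inline document bytes as a base64 data: URI."""
    return f"data:{mime_type};base64,{base64.b64encode(content).decode('ascii')}"


if __name__ == "__main__":
    # Hypothetical inputs; pass the resulting strings as the tool's URI argument.
    print(file_uri("invoices/march invoice.pdf"))
    print(data_uri(b"Invoice INV-1042", "text/plain"))
```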
| Feature | Details |
|---|---|
| Best for | General document-to-markdown conversion across many formats |
| Formats | PDF, DOCX, PPTX, XLSX, HTML, images, audio, 29+ total |
| Approach | Format-aware conversion to structured markdown |
| License | Open source (MIT) |
IBM's Docling is purpose-built for documents where layout matters: scientific papers, financial reports, multi-column PDFs, and anything else where reading order and table structure affect meaning. Docling uses AI-powered layout analysis to understand how elements on the page relate to each other before converting to markdown.
The output quality is higher than MarkItDown on complex documents, but it's slower and heavier. If your documents are straightforward single-column PDFs or Word docs, MarkItDown is faster and simpler. Reach for Docling when MarkItDown's output comes back garbled.
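One rough way to decide that MarkItDown's output "came back garbled" is to check its tables. This hypothetical heuristic, not a feature of either tool, flags pipe-table rows whose cell count differs from the table's header row, a symptom that can appear when a table is split across a page break or merged with neighboring columns. It doesn't handle escaped pipes (\|) inside cells.

```python
def ragged_table_rows(markdown: str) -> list[int]:
    """Return 1-based line numbers of table rows whose cell count differs from the header row."""
    flagged = []
    expected = None  # cell count of the current table's header row, if inside a table
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith("|"):
            expected = None  # any non-table line ends the current table
            continue
        cells = stripped.strip("|").split("|")
        if expected is None:
            expected = len(cells)  # first row of a table is its header
        elif len(cells) != expected:
            flagged.append(lineno)
    return flagged


if __name__ == "__main__":
    sample = "| Item | Amount |\n|---|---|\n| Widgets | $4,000.00 |\n| Shipping $237.50 |\n"
    print(ragged_table_rows(sample))
```

An empty list doesn't prove the conversion is right, but a non-empty one is a cheap signal to rerun that file through Docling.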
| Feature | Details |
|---|---|
| Best for | Complex layouts, scientific papers, multi-column PDFs, financial reports |
| Approach | AI-powered layout analysis with structure preservation |
| License | Open source |
The Mistral OCR MCP server is a community-built wrapper around Mistral's OCR API. Mistral OCR is strong on raw text recognition from images and scanned documents, and the server is thin: send a document, get text back.
It makes sense if you're already using Mistral or just want a lightweight OCR layer. It won't give you structured fields or understand what the document means. You get text, and your AI assistant figures out the rest.
| Feature | Details |
|---|---|
| Best for | Text recognition from scanned documents and images |
| Approach | Mistral OCR API wrapper |
| License | Open source |
The remaining options are lighter-weight, community-built servers for basic PDF and OCR work, and all of them are free. If your needs are simple, one of these might be all you need.
The PDF Extraction server is a dedicated PDF tool that handles text extraction, table detection, and metadata reading. It's more focused than MarkItDown (PDF-only, but with deeper PDF handling) and lighter than Docling. If PDFs are all you work with, it hits a nice middle ground.
OCR-MCP is a FastMCP server that integrates multiple OCR models: DeepSeek-OCR, Florence-2, DOTS.OCR, PP-OCRv5, and Qwen. It also includes WIA scanner control for scanning physical documents. It's the most comprehensive open-source OCR option if you want everything running locally on your own hardware.
MCP PDF Reader is a Python-based PDF reader built with FastMCP. It provides text extraction, image extraction from PDFs, and Tesseract OCR for reading text within images. It's a good starter option if you just need to get text out of PDFs and don't want to overthink it.
| Server | Type | Structured output | Template-free | Complex layouts | Install difficulty |
|---|---|---|---|---|---|
| Lido | Extraction | Yes | Yes | Yes | One command |
| Koncile | Extraction | Yes | Invoices only | Invoices | 15 min |
| LandingAI ADE | Extraction | Yes | Yes | Best in class | Build your own |
| MarkItDown | Conversion | No (markdown) | N/A | Limited | One command |
| Docling | Conversion | No (markdown) | N/A | Strong | One command |
| Mistral OCR | OCR | No (raw text) | N/A | Limited | Moderate |
| PDF Extraction | PDF reader | No | N/A | Limited | Moderate |
| OCR-MCP | OCR | No | N/A | Varies by model | Complex |
| MCP PDF Reader | PDF reader | No | N/A | No | Easy |
For structured data from business documents (invoices, receipts, POs, tax forms) where formats vary across sources, Lido is the most practical pick. One install command, no templates, works on any layout. Here's the setup guide, and here's how teams are using it to automate document processing with AI agents.
If invoices are all you process and you need self-hosting, Koncile's 24-tool server makes the shortlist.
Developers building pipelines for complex documents (scientific papers, financial filings) should look at LandingAI ADE. Strongest layout analysis in this list, but you're building the MCP server yourself.
For getting document text into Claude's context window for summarization or analysis, start with MarkItDown. Switch to Docling if MarkItDown chokes on your document layouts.
And if you want local OCR with no external API calls, OCR-MCP gives you multiple model options running on your own hardware.
An MCP (Model Context Protocol) server for document processing is a tool that connects AI assistants like Claude, Cursor, or Windsurf to document extraction or conversion capabilities. Instead of switching to a separate app to process documents, you tell your AI assistant what you need and it calls the MCP server to read, extract, or convert the document.
Extraction servers return structured, field-level data. You say "extract the vendor name and total from this invoice" and get back organized columns. Conversion servers turn documents into markdown or plain text. They give you the full document content, but you have to find the specific data points yourself. Use extraction servers when you need specific fields for a spreadsheet or database. Use conversion servers when you need document content in an LLM's context window.
For extracting structured data from invoices and other business documents, Lido and Koncile are the two strongest options. Lido handles any document type with template-free extraction and works across variable vendor formats. Koncile is specialized for invoices and accounting documents with 24 dedicated tools. Lido is easier to set up (one command) and handles a wider range of document types. Koncile offers self-hosting and deeper accounting-specific features.
Most pre-built servers, such as Lido, MarkItDown, and Docling, don't require coding. Installation is typically one terminal command, and you interact with them through your AI assistant in plain English. LandingAI ADE requires Python development to build the MCP server, and OCR-MCP requires some technical setup to configure its OCR models.
You can run several document servers at once: MCP clients like Claude Code and Claude Desktop support multiple server connections simultaneously. You could run Lido for structured extraction and MarkItDown for general document conversion in the same session, and your AI assistant picks the right tool based on what you ask it to do.
Costs vary by layer. The MCP servers themselves (the connection layer) are typically free and open source, but the extraction or conversion services they connect to may have usage-based pricing. Lido, LandingAI, and Koncile offer free tiers. MarkItDown, Docling, and the open-source PDF/OCR servers are fully free. Check each provider's pricing for the underlying API.
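Clients that read a JSON config (Claude Desktop's claude_desktop_config.json, for example) list each server under an mcpServers key with a command and args. The sketch below converts the simple claude mcp add NAME -- COMMAND [ARGS...] form into such entries so both servers can sit in one file. It parses only that form, not Claude Code's scope or environment flags, and the markitdown-mcp command assumes you installed that package with pip; check both against each project's install instructions.

```python
import json
import shlex

EXAMPLE_COMMANDS = [
    "claude mcp add lido -- npx -y @lido-app/mcp-server",
    # Assumption: `pip install markitdown-mcp` puts a markitdown-mcp command on PATH.
    "claude mcp add markitdown -- markitdown-mcp",
]


def parse_mcp_add(command: str) -> tuple[str, dict]:
    """Turn `claude mcp add NAME -- COMMAND [ARGS...]` into (name, server entry).

    Only that simple form is handled; scope and environment flags are not parsed.
    """
    tokens = shlex.split(command)
    if tokens[:3] != ["claude", "mcp", "add"] or "--" not in tokens:
        raise ValueError(f"unsupported command: {command!r}")
    sep = tokens.index("--")
    if sep != 4 or sep + 1 >= len(tokens):
        # Expect exactly one token (the server name) before "--" and a program after it.
        raise ValueError(f"expected 'claude mcp add NAME -- COMMAND ...': {command!r}")
    program, *args = tokens[sep + 1:]
    return tokens[3], {"command": program, "args": args}


def build_config(commands: list[str]) -> dict:
    """Collect several servers into one mcpServers object, keyed by server name."""
    return {"mcpServers": dict(parse_mcp_add(command) for command in commands)}


if __name__ == "__main__":
    print(json.dumps(build_config(EXAMPLE_COMMANDS), indent=2))
```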