How to Match Purchase Orders to Invoices Automatically

February 22, 2026

Matching invoices to purchase orders is one of the most straightforward concepts in accounts payable, and one of the most painful to execute at scale. The idea is simple: when an invoice arrives, compare it to the original PO to confirm you're paying the right amount for the right goods. But when you're processing thousands of invoices a month across hundreds of vendors, each with their own formats, reference numbers, and line item descriptions, the matching process becomes a full-time manual job that never quite gets done.

The reason is rarely the matching logic itself. The problem is twofold: the data feeding the match is inconsistent or extracted incorrectly, and the matching step can't handle the real-world messiness of how vendors reference POs, describe products, and abbreviate names. You need accurate extraction and flexible matching in the same tool — not a pipeline of disconnected systems.

Lido is the best option for teams that need PO-to-invoice matching at scale. It extracts structured data from any invoice or PO format without templates, then runs that data through a built-in fuzzy lookup workflow that matches invoice line items against PO data from your ERP — even when vendor names, product descriptions, and reference numbers don't match exactly.

The workflow is configurable: you select the tables to compare, define which fields to match on, set a confidence threshold, and optionally apply AI post-processing to refine results. Mediaform built full invoice-to-PO validation in one week using this workflow, covering 200 POs and 1,000 invoices monthly against Business Central.

Why purchase order to invoice matching breaks down at scale

Automated PO matching fails most often because the data on both sides of the match is messy. A purchase order contains what you ordered. An invoice contains what the vendor says you owe. In theory, these should line up. In practice, they almost never do cleanly.

PO numbers don't always match exactly. Vendors assign their own reference numbers and may truncate, prefix, or reformat yours. The same problem affects other reference fields. Erewhon, a grocery chain that uses Lido to process 20,000 invoices a month across 10 stores, found that vendor abbreviations for the same store location varied wildly (Santa Monica, Santa Mon, SM), making exact-match lookups fail constantly. They use Lido's fuzzy lookup to match these variations against a reference table of store names, with a confidence threshold tuned low enough to catch abbreviations but high enough to avoid false positives.
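To make the threshold idea concrete, here is a minimal sketch in plain Python using the standard library's difflib. It is not Lido's implementation. The store list (apart from Santa Monica), the function names, and the threshold of 80 are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical canonical store list; only "Santa Monica" comes from the article.
STORES = ["Santa Monica", "Riverside", "Lakeview"]

def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return round(100 * SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio())

def best_store(raw: str, threshold: int = 80):
    """Return (store, score) for the closest store, or None if it scores below threshold."""
    store, score = max(((s, similarity(raw, s)) for s in STORES), key=lambda pair: pair[1])
    return (store, score) if score >= threshold else None

print(best_store("Santa Mon"))  # a truncated name clears the threshold
print(best_store("SM"))         # an initialism does not clear it on string similarity alone
```

A truncated name like "Santa Mon" shares most of its characters with the canonical name, so it scores high. An initialism like "SM" shares almost none, which is why very short abbreviations usually need a lower threshold, a list of known aliases, or an AI step on top of string comparison.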

Line items are described differently on the PO than on the invoice. Your PO says "Widget A, 500 units at $2.10." The invoice says "WDGT-A 500 @ 2.10" or uses a completely different product code, so an exact lookup against a product table returns nothing. Price changes between order and invoice are also common: negotiated discounts, fuel surcharges, or quantity adjustments that were never reflected back on the PO. A fuzzy lookup with AI post-processing can resolve "WDGT-A" to "Widget A" and flag only the genuine price discrepancy.

Partial shipments create one-to-many relationships. A single PO may generate three invoices over six weeks as items ship in batches. Or a single invoice may reference multiple POs from the same vendor. Together these become many-to-many relationships, and that is where spreadsheet VLOOKUPs fall apart: they return the first match, not the right match.
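One way to handle partial shipments is to stop comparing each invoice to the PO in isolation and instead track the running total billed against each PO line. The sketch below assumes hypothetical PO numbers, item codes, and field names. It shows only the aggregation idea, not any particular product's logic.

```python
from collections import defaultdict

# Hypothetical data: one PO line billed across three partial-shipment invoices.
po_lines = {("PO-45210", "WIDGET-A"): 500}  # (PO number, item) -> quantity ordered
invoice_lines = [
    {"invoice": "INV-1", "po": "PO-45210", "item": "WIDGET-A", "qty": 200},
    {"invoice": "INV-2", "po": "PO-45210", "item": "WIDGET-A", "qty": 200},
    {"invoice": "INV-3", "po": "PO-45210", "item": "WIDGET-A", "qty": 150},
]

def over_invoiced(po_lines, invoice_lines):
    """Return {(po, item): excess_qty} for PO lines billed beyond the ordered quantity."""
    billed = defaultdict(int)
    for line in invoice_lines:
        billed[(line["po"], line["item"])] += line["qty"]
    return {
        key: qty - po_lines.get(key, 0)
        for key, qty in billed.items()
        if qty > po_lines.get(key, 0)
    }

print(over_invoiced(po_lines, invoice_lines))
```

Each of the three invoices looks fine on its own, since none bills more than 500 units. Only the cumulative view shows that the PO line has been over-billed.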

What three-way PO-to-invoice matching requires and why most tools miss it

Three-way matching reconciles three documents: the purchase order (what you ordered), the invoice (what the vendor says you owe), and the receiving report or delivery confirmation (what you actually got). All three need to agree on quantities, prices, and items before payment is approved. Discrepancies between any two trigger exceptions that require manual review.
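As a rough illustration, the check for a single line item can be sketched as follows. The field names, the specific rules, and the zero price tolerance are assumptions. Real AP policies vary on which differences count as exceptions.

```python
from decimal import Decimal

def three_way_exceptions(po, invoice, receipt, price_tolerance=Decimal("0.00")):
    """Compare one PO line, invoice line, and receipt line; return a list of exceptions."""
    issues = []
    if invoice["item"] != po["item"] or receipt["item"] != po["item"]:
        issues.append("item mismatch")
    if invoice["qty"] != receipt["qty"]:
        issues.append("invoiced quantity differs from received quantity")
    if receipt["qty"] > po["qty"]:
        issues.append("received more than ordered")
    if abs(invoice["unit_price"] - po["unit_price"]) > price_tolerance:
        issues.append("unit price differs from PO")
    return issues

# Hypothetical documents for one line item.
po = {"item": "WIDGET-A", "qty": 500, "unit_price": Decimal("2.10")}
invoice = {"item": "WIDGET-A", "qty": 500, "unit_price": Decimal("2.25")}
receipt = {"item": "WIDGET-A", "qty": 480}

print(three_way_exceptions(po, invoice, receipt))
```

Prices use Decimal rather than float so that a comparison like 2.10 against 2.10 is never thrown off by binary rounding.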

This is the standard process for most AP teams. The problem is that each of these three documents comes from a different source, in a different format, and often with different terminology. Your PO comes from your procurement system. The invoice comes from the vendor. The receiving report comes from your warehouse or store-level staff, sometimes handwritten on the invoice itself.

One premium grocery chain told us that their receivers manually check prices and quantities when deliveries arrive, writing handwritten notes on the paper invoices — quantity changes, returns, discrepancies. Meanwhile, the invoices that arrive digitally "are just never looked at" because there's no process to compare them against POs. Twenty thousand invoices a month, and the digital ones get filed without review.

Three-way matching requires a tool that can extract data from all three document types — regardless of format — and then match them flexibly enough to handle the naming inconsistencies, abbreviations, and format differences between systems. Lido handles both steps: extraction from any format, then fuzzy lookup against your ERP data to reconcile the three sides of the match.

How extraction accuracy determines whether PO-to-invoice matching works

PO matching doesn't fail because the matching logic is hard. It fails because the extraction feeding the match is inaccurate. This is the part most teams underestimate.

If your extraction tool misreads a PO number (pulls "PO-4521" instead of "PO-45210" because the last digit was faint on a scan), the match fails. If it reads the unit price as $21.00 instead of $2.10 because the decimal point was misplaced in a low-resolution scan, you get a false exception. If it skips a line item entirely because the table structure confused the parser, that item shows up as unmatched even though it was on the invoice all along.

One operations lead at a gas distribution company processing 27,000 documents a month put it directly:

"The approval is all about the accurate extraction of the data. It has nothing to do with the content."

Their entire manual approval workflow existed not because the business logic required human judgment, but because their previous extraction tool wasn't accurate enough to trust. They had built two separate extraction models, fed 50 sample pages into one of them, and still couldn't auto-approve because the output wasn't reliable. After they switched to Lido, extraction accuracy was no longer the bottleneck, and the fuzzy lookup workflow downstream could do its job because the inputs were clean.

This pattern shows up everywhere. Companies invest in matching workflows, validation rules, and exception handling — but the root cause of most exceptions is bad extraction, not legitimate business discrepancies.

How Lido's fuzzy lookup workflow handles messy PO matching

Lido's fuzzy lookup node matches extracted invoice data against reference tables from your ERP — even when PO numbers are truncated, vendor names are abbreviated, and product descriptions use different terminology. It sits inside Lido's workflow builder as a step between extraction and validation, so the entire pipeline runs in one tool. Most matching tools assume clean data on both sides. Real-world AP data is never that clean, which is why configurable fuzzy matching matters more than rigid exact-match rules.

Here is how the workflow runs. You select two tables to compare. Table one is your extracted invoice data: vendor names, PO references, line item descriptions, quantities, prices. Table two is your reference data, typically pulled from your ERP via API: your master list of POs, account names, product numbers, or vendor records. The fuzzy lookup then iterates through each row of extracted invoice data and compares it against every row of the reference table.

You configure field mappings to define which columns to match on — invoice vendor name against your vendor master list, extracted product description against your product catalog, PO reference against your open PO numbers. You set a minimum confidence threshold (0 to 100) that controls how strict or lenient the matching is. A lower threshold catches more abbreviations and variations but risks false matches. A higher threshold is stricter but misses legitimate matches where the naming differs significantly.

You also choose which columns from the reference table to join back into your primary table — so when a match is found, the corresponding PO number, account code, or product ID from your ERP is attached directly to the invoice row. The output mode controls whether you get the single best match or multiple candidates. And no-match behavior determines what happens when an invoice row falls below the confidence threshold — output it with a null value so you can flag it for manual review, or filter it into a separate exception queue.
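The settings described above (field mappings, confidence threshold, joined columns, output mode, no-match behavior) can be sketched as a small function. This is an illustration using Python's difflib, not Lido's code. The parameter names, the averaging of per-field scores, and the sample vendor data are all assumptions.

```python
from difflib import SequenceMatcher

def score(a: str, b: str) -> float:
    """0-100 similarity between two strings, case-insensitive."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_lookup(primary, reference, field_map, join_cols, threshold=80, top_n=1):
    """Match each primary row against every reference row.

    field_map: {primary_column: reference_column} pairs to compare; scores are averaged.
    join_cols: reference columns copied onto the primary row when a match is found.
    top_n:     1 keeps the single best match; larger values keep ranked candidates.
    Rows with no candidate at or above the threshold get None in every join column.
    """
    out = []
    for row in primary:
        scored = []
        for ref in reference:
            s = sum(score(row[p], ref[r]) for p, r in field_map.items()) / len(field_map)
            if s >= threshold:
                scored.append((s, ref))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        if not scored:  # no-match behavior: keep the row, leave joined columns empty
            out.append({**row, **{c: None for c in join_cols}, "confidence": None})
        for s, ref in scored[:top_n]:
            out.append({**row, **{c: ref[c] for c in join_cols}, "confidence": round(s)})
    return out

# Hypothetical ERP vendor master and extracted invoice rows.
vendors = [
    {"vendor_name": "Acme Industrial Supply", "vendor_id": "V-001"},
    {"vendor_name": "Northwind Traders", "vendor_id": "V-002"},
]
invoices = [
    {"invoice": "INV-1", "vendor": "ACME Industrial Supply Inc"},
    {"invoice": "INV-2", "vendor": "Zenith Freight"},
]

result = fuzzy_lookup(invoices, vendors, {"vendor": "vendor_name"}, ["vendor_id"])
for row in result:
    print(row)
```

The first invoice picks up its vendor ID despite the extra "Inc" and the different capitalization. The second has no close counterpart, so it comes back with an empty vendor ID that a later step can route to review.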

For cases where string matching alone isn't enough — product descriptions that use completely different terminology, or vendor names that have been abbreviated beyond recognition — Lido's AI post-processing layer can resolve matches that a pure fuzzy string comparison would miss.

Erewhon uses this workflow to reconcile 20,000 invoices monthly across 10 store locations. Vendor invoices reference stores by abbreviation, nickname, or partial name. The fuzzy lookup matches these against Erewhon's canonical store list and joins back the correct store ID, so downstream validation can route each invoice to the right location's PO. ACS Industries runs 400+ POs per week through a similar workflow after replacing UiPath, which couldn't handle the format variation.

How to reduce time spent matching invoices against POs and receipts

Reducing matching time starts with fixing the data quality problem, not adding more matching rules. Most teams approach this backward: they accept whatever their extraction tool gives them and then build increasingly complex logic to handle the errors. The better approach is to get the extraction right and then use flexible matching to handle the real-world inconsistencies that remain.

Mediaform, an Australian IT services company processing 200 POs and 1,000 invoices monthly, built end-to-end invoice-to-PO validation using Lido in one week. Their workflow extracts invoice data, pulls live PO data from Business Central via API, then runs fuzzy lookups to match invoices against open POs — validating price, quantity, and address. When an exact lookup on PO number fails, the fuzzy lookup catches truncated or reformatted references. "We don't really feel that it's a good effective use of someone's time to be sitting there just punching purchase orders," their team lead said. Before building this, a person compared every invoice to every PO manually. Now the system flags only genuine discrepancies — everything else flows through automatically.

The difference wasn't a better matching algorithm. It was accurate extraction feeding a flexible matching workflow — both in the same tool.

What technologies support parsing both invoices and purchase orders together

Parsing invoices and POs together requires a tool that handles format variance without per-document configuration. POs come from your system and tend to be consistent. Invoices come from hundreds or thousands of vendors and are wildly inconsistent — different layouts, different field names, different table structures, different levels of scan quality.

Template-based tools need a template for each vendor format. When you have 50 vendors, that's manageable. When you have a thousand, it's not. Model-trained tools need sample documents and annotation for each type. Both approaches assume document formats are stable and predictable. They aren't.

But extraction is only half the problem. Once you have structured data from both documents, you need to match them — and neither spreadsheet VLOOKUPs nor rigid rule-based matching handles the naming inconsistencies, abbreviations, and format differences between vendor invoices and your internal systems. You need fuzzy matching that tolerates real-world messiness, integrated directly with the extraction step so you're not stitching together separate tools.

ACS Industries replaced UiPath with Lido for their PO processing after finding that UiPath couldn't keep up with format variation across 400+ POs per week. The time spent maintaining extraction rules and matching logic ate into the efficiency gains the tool was supposed to deliver. With Lido, the extraction handles any format and the fuzzy lookup handles the matching, with no per-vendor maintenance on either side.

How to build a single source of truth for invoice and PO data

Building a single source of truth means bringing extracted invoice data, PO data from your ERP, and receiving confirmations into one place where they can be compared programmatically. Most companies attempt this with spreadsheets: exporting POs from one system, pasting invoice data from another, and doing VLOOKUPs to find matches. This works at low volume but falls apart as volume grows, because an exact-match VLOOKUP fails whenever a vendor name or PO number differs by even one character.

The better approach has three parts. First, extract invoice data accurately and consistently into a structured format, regardless of how the vendor sent it. Second, connect to your ERP or procurement system to pull PO data in real time, so you're always matching against current records. Third, apply matching logic — exact lookup first, fuzzy lookup as a fallback with a configurable confidence threshold — and surface only the genuine discrepancies for human review.
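The third part, exact lookup first with a fuzzy fallback, might look like this in outline. The normalization rule, the threshold of 80, and the sample PO numbers are assumptions for illustration. The similarity measure is difflib's, which may differ from what any given matching tool uses.

```python
import re
from difflib import SequenceMatcher

def normalize(po: str) -> str:
    """Uppercase and drop everything except letters and digits: 'po-45210 ' -> 'PO45210'."""
    return re.sub(r"[^A-Z0-9]", "", po.upper())

def match_po(raw: str, open_pos: list[str], threshold: int = 80):
    """Return (po, method, score); po is None when nothing clears the threshold."""
    by_norm = {normalize(p): p for p in open_pos}
    key = normalize(raw)
    if key in by_norm:  # step 1: exact lookup on normalized values
        return by_norm[key], "exact", 100
    # step 2: fuzzy fallback, scoring the reference against every open PO
    scored = [(round(100 * SequenceMatcher(None, key, n).ratio()), p) for n, p in by_norm.items()]
    best_score, best_po = max(scored, default=(0, None))
    if best_score >= threshold:
        return best_po, "fuzzy", best_score
    return None, "none", best_score

open_pos = ["PO-45210", "PO-77812"]  # hypothetical open POs pulled from the ERP
print(match_po("po 45210", open_pos))  # reformatted reference, resolved by the exact step
print(match_po("45210", open_pos))     # vendor dropped the prefix, resolved by the fuzzy step
print(match_po("PO-99999", open_pos))  # no plausible counterpart
```

Fuzzy matches on identifiers deserve more caution than fuzzy matches on names, because two genuinely different PO numbers can differ by a single digit. Returning the method and score alongside the match lets a later step confirm fuzzy hits against vendor or amount before approving them.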

Lido's workflow builder handles all three steps. The extraction node pulls structured data from any invoice format. An API node connects to your ERP — Business Central, Oracle, NetSuite — and pulls live PO data into a reference table. The fuzzy lookup node matches extracted invoice data against that reference, joining back PO numbers, account codes, and product IDs. Everything lives in one workflow, one tool, one view of the data.

Erewhon is building exactly this for 20,000 invoices a month across 10 stores. Every store places daily POs, receives daily deliveries, and generates a stream of invoices that need to be reconciled against those orders. The fuzzy lookup handles the store name variations, product description mismatches, and vendor reference differences that would break an exact-match system. Only invoices with genuine price, quantity, or item discrepancies surface for human review. The rest flow straight through.

How to automatically detect missing purchase orders for incoming invoices

Detecting missing POs is a subset of the matching problem. When an invoice arrives and references a PO number that doesn't exist in your system — or references no PO at all — you need to flag it before it enters the payment queue. This catches unauthorized purchases, duplicate invoices from vendors, or simply POs that were never entered.

The detection logic itself is simple: look up the PO number from the invoice against your open POs, and if no match is found, flag it. In Lido, this is handled by the fuzzy lookup's no-match behavior: when the best match for an invoice row scores below the confidence threshold against your PO reference table, the row is output with a null value that downstream workflow steps can route to an exception queue for manual review.
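Stripped to its essentials, that routing step can be sketched as below. For brevity it uses an exact lookup rather than a fuzzy one. The field names and reason strings are made up for the example.

```python
def flag_missing_pos(invoices, open_pos):
    """Split invoices into matched rows and an exception queue with a reason attached."""
    known = {p.strip().upper() for p in open_pos}
    matched, exceptions = [], []
    for inv in invoices:
        po = (inv.get("po") or "").strip().upper()
        if not po:
            exceptions.append({**inv, "reason": "no PO reference on invoice"})
        elif po not in known:
            exceptions.append({**inv, "reason": "PO not found in open POs"})
        else:
            matched.append(inv)
    return matched, exceptions

# Hypothetical extracted invoices and open POs.
invoices = [
    {"invoice": "INV-1", "po": "PO-45210"},
    {"invoice": "INV-2", "po": "PO-00000"},
    {"invoice": "INV-3", "po": None},
]
matched, exceptions = flag_missing_pos(invoices, ["PO-45210", "PO-77812"])
for row in exceptions:
    print(row["invoice"], "->", row["reason"])
```

Keeping "no PO on the invoice" separate from "PO not found" is a small design choice that pays off in review. The first usually points to a purchase made outside the process, the second to a typo, a closed PO, or an extraction error.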

The hard part is extracting the right PO number from the invoice in the first place. PO numbers appear in different locations on different vendor invoices — sometimes in a header field, sometimes in a reference line, sometimes buried in a description column. Some vendors use their own order reference instead of your PO number. Some invoices have no PO reference at all because the purchase was made without one.

Lido's extraction handles this by finding the PO number wherever it appears on the document. The fuzzy lookup then matches it — even if the vendor truncated or reformatted the number. Invoices that genuinely have no matching PO surface automatically for review, rather than requiring someone to check every invoice manually.

How Lido handles the full purchase order to invoice matching workflow

Lido uses a custom blend of AI vision models, OCR, and LLMs to extract data from invoices, POs, and receipts without templates or model training. The extraction feeds directly into a workflow builder where you set up the matching logic — no separate tools, no data exports, no stitching systems together.

  1. Extracts from any invoice or PO format without per-vendor templates
  2. Handles scanned documents, handwriting, and low-quality inputs
  3. Matches extracted data against ERP reference tables through a fuzzy lookup node with configurable confidence thresholds
  4. Resolves matches that pure string comparison misses through AI post-processing
  5. Connects by API to Business Central, Oracle, NetSuite, and other ERPs for live PO data
  6. Flags invoices without valid POs for exception review through no-match behavior
  7. Validates three-way matches on price, quantity, and item with computed columns

Mediaform built their full workflow in one week. Erewhon reconciles 20,000 invoices monthly across 10 stores. ACS Industries runs 400+ POs per week after replacing UiPath.

If your PO matching process breaks because vendor names are abbreviated, product descriptions don't line up, or PO numbers are reformatted, the fix isn't more matching rules. It's a tool that extracts accurately and matches flexibly in the same place.

Frequently asked questions

What is the best tool for automatically matching purchase orders to invoices?

Lido is the best option for teams that need PO-to-invoice matching at scale. It extracts structured data from any invoice or PO format without templates, then uses a built-in fuzzy lookup workflow to match invoice data against PO records from your ERP — even when vendor names are abbreviated, PO numbers are truncated, or product descriptions use different terminology. Mediaform built full invoice-to-PO validation in one week, covering 200 POs and 1,000 invoices monthly against Business Central.

How does fuzzy matching work for invoice-to-PO reconciliation?

Lido's fuzzy lookup node compares extracted invoice data against a reference table from your ERP. You define which fields to match on, set a minimum confidence threshold (0-100), and choose whether to return the best match or multiple candidates. When an invoice row falls below the confidence threshold, it's flagged for manual review. For cases where string matching isn't enough, Lido's AI post-processing resolves matches that abbreviations and terminology differences would otherwise break. Erewhon uses this to reconcile 20,000 invoices monthly across 10 store locations.

How can I reduce the time spent matching invoices against POs?

Lido reduces PO matching time by combining accurate extraction with built-in fuzzy lookup — so you fix the data quality problem and the matching problem in one tool. Mediaform built end-to-end invoice-to-PO validation using Lido in one week, covering 200 POs and 1,000 invoices monthly. Their workflow extracts invoice data, pulls live PO data from Business Central via API, and runs fuzzy lookups to validate price, quantity, and address. Before Lido, a person compared every invoice to every PO manually.

How can I automatically detect invoices with missing or invalid purchase orders?

Lido's fuzzy lookup handles missing PO detection automatically. When the best match for an extracted PO number scores below the confidence threshold against your open PO reference table, the no-match behavior outputs the invoice with a null value that downstream workflow steps route to an exception queue. Erewhon uses this across 20,000 invoices monthly from 10 store locations to catch unauthorized purchases, duplicate invoices, and POs that were never entered, without requiring someone to manually check every invoice.

How do OCR tools distinguish between invoice numbers, PO numbers, and reference codes?

AI-based extraction understands contextual labels — "Invoice #," "PO Number," "Reference," "Order No." — and maps them to the correct output fields regardless of where they appear on the page. When labels are ambiguous or missing, the AI uses surrounding context (position, formatting, nearby text) to determine what each number represents. Lido's AI columns and reference table lookups help resolve edge cases by comparing extracted values against known PO or invoice number patterns in your system.

What role does OCR play in end-to-end procure-to-pay automation?

OCR and data extraction are the capture layer that feeds the rest of the procure-to-pay pipeline. They convert paper invoices, scanned POs, and emailed PDFs into structured data that downstream systems can match, reconcile, approve, and post to accounting automatically. If the extraction is inaccurate, every step after it breaks — which is why Esprigas's entire manual approval workflow existed not because of business logic, but because their previous extraction tool wasn't accurate enough to trust.

How can I reduce the time spent matching invoices against POs and receipts?

Automated extraction combined with fuzzy matching eliminates manual invoice-to-PO comparison. Lido's fuzzy lookup node matches extracted invoice data against PO reference tables from your ERP using configurable confidence thresholds, flagging only exceptions for human review. Mediaform built this end-to-end workflow in one week — extracting invoice data, pulling live POs from Business Central via API, and running fuzzy lookups to validate price, quantity, and address automatically.
