How to Convert Purchase Orders to Excel Automatically

To convert purchase orders to Excel automatically, use Lido to extract PO fields like vendor name, PO number, line items, quantities, and totals from any format without per-customer templates. The output downloads as Excel or CSV, formatted for direct ERP import.

If your operations team processes purchase orders from retailers or distributors, you already know the problem. Every customer sends POs in a different format. One sends a single-page PDF. Another sends a multi-page spreadsheet with line items split across tabs. A third sends scanned paper with handwritten quantities. A fourth emails a PO as plain text in the body of the message. And every one of them needs to end up as rows in your system before you can pick, pack, ship, or invoice.

The manual version is exactly what it sounds like: someone opens each PO, reads the fields, and types them into Excel or directly into the ERP. When you're processing 50 POs a week from a handful of customers, it's tedious but manageable. When you're processing 200 or 400 per week from dozens of customers, each with their own format, you're hiring headcount just to do data entry.

This post covers how to set up a document extraction workflow that converts purchase orders from any format into structured Excel rows without manual retyping, and how to format that output so it goes straight into your ERP.

Why copy-paste and template OCR both fail on purchase orders

The first thing most teams try is copy-paste. Open the PDF, select the table, paste it into Excel. Sometimes it works. More often, it doesn't. Multi-column layouts paste as a single column. Tables that span page breaks split into disconnected fragments. Scanned documents don't have selectable text at all. And even when copy-paste produces something that looks right, the data usually needs cleanup: merged cells, stray characters, amounts that pasted as text instead of numbers.

The second thing teams try is template-based OCR. You define zones on the document: "PO number is at this position, line items start here, totals are at the bottom." That works for one customer's PO format. It breaks when the next customer sends a different layout. If you have 30 customers sending POs, you need 30 templates. When customer #14 updates their ERP and their PO layout shifts, template #14 breaks and someone has to rebuild it. For a deeper look at why this approach fails at scale, see our post on why template-based OCR breaks.

Purchase orders have more format variation than almost any other business document. Unlike invoices, which your company generates in a single format, POs come from your customers' systems. You don't control the layout, the field names, or the structure. "Item Number" on one PO is "SKU" on another and "Product Code" on a third. The quantity column might be labeled "Qty," "Quantity Ordered," "Order Qty," or just "Units." You need a method that handles this variation without per-customer configuration.

Large retailers often send POs via EDI (Electronic Data Interchange), which is structured and machine-readable. But many of your customers won't be on EDI. Smaller retailers, distributors, one-off accounts, and international buyers still send POs as PDFs, Excel files, scans, or emails. These non-EDI purchase orders are the ones that create the data entry bottleneck, and they're the ones where purchase order OCR and AI-based extraction make the biggest difference. The goal is to extract data from purchase orders in any format and produce clean PO-to-Excel output that's ready for your system.

What fields you need from a purchase order

Before setting up any extraction workflow, define what your ERP or order management system actually needs. Here are the standard fields for PO-to-Excel conversion.

The PO number is the customer's purchase order reference, and it's the linking field between the PO and everything downstream: the sales order, the shipment, the invoice. It appears under various labels like "PO #," "Purchase Order Number," "Order Number," or just "Reference."

Customer name and address tells you who placed the order. For retailer POs, this is often the "Ship To" or "Bill To" entity. Some POs list both a buying entity and a shipping destination, and you may need both.

Line items are the products ordered. Each line typically includes a product identifier (SKU, item number, UPC), a description, quantity ordered, and unit price. Retailer POs often include additional fields like size breakdowns, color codes, or department numbers that your system needs for routing.

Unit prices and totals cover the agreed price per unit and the extended total per line. Some POs include tax and freight estimates; others don't.

Dates include order date, requested ship date, and cancel date. Retailer POs almost always have a "must ship by" or "cancel after" date that drives fulfillment priority.

Shipping instructions cover carrier preference, routing guide references, and special handling notes. These usually appear in a header block rather than the line item table.

Your Excel output columns should match what your ERP's sales order import expects, not what the PO calls these fields. If your ERP wants "CustomerPO" as the column header, use that, not "PO Number."

How to convert POs to Excel with AI extraction

First, create your extraction template. In Lido, set up a sheet with column headers that match your ERP's sales order or PO import format. If your system expects columns like CustomerPO, SKU, Description, QtyOrdered, UnitPrice, LineTotal, use those exact names. Each column tells the AI what to look for on the document.

Next, add formatting instructions. Specify how you need the data formatted. "Output dates as MM/DD/YYYY." "Output amounts as plain numbers with two decimal places, no dollar signs." "Preserve leading zeros on item codes." These instructions make sure every PO you process outputs in the same format, regardless of how the source document displays the data.

Then upload a PO and extract. Drop in a PDF, image, or scan. The AI reads the document, identifies the PO number, customer information, and line item table, and maps each field to your output columns. A PO that labels the quantity column "Qty Ordered" and another that labels it "Units" both populate the same QtyOrdered column in your output. No template to build. No zones to define.

Finally, review and export. The extracted data appears in a spreadsheet view where each row is one PO line item. A typical output looks like: CustomerPO = PO-29841, SKU = WDG-A-BLU, Description = Widget A Blue, QtyOrdered = 500, UnitPrice = 4.25, LineTotal = 2125.00 — one clean row per line, every field in the column your ERP expects. Spot-check a few fields against the source document. If everything looks right, export as Excel or CSV. The file is ready for ERP import without a reformatting step in between.

For teams processing POs daily, you can skip the manual upload entirely. Set up email forwarding so any PO sent to a dedicated inbox gets processed automatically against your template. The extracted data is waiting when you're ready to pull it.

Handling multi-page and multi-line POs

Retailer purchase orders are often long. A single PO from a large retailer might run 10 or 15 pages with hundreds of line items. The line item table starts on page 2 and continues through page 12, with headers repeated on each page and subtotals interspersed throughout.

AI extraction handles this by treating the entire document as one continuous data set. Page breaks in the middle of a table don't create separate extractions. Repeated headers on each page don't duplicate rows. Subtotal rows get excluded from the line-item output because they're summary data, not order lines. You get one row per line item in your Excel file, regardless of how many pages the PO spans.

This is where template-based tools struggle most. A template that expects line items to start at row 5 on page 1 doesn't know what to do when the same table continues at row 3 on page 2 with a different header layout. AI extraction doesn't rely on fixed positions, so multi-page tables work the same as single-page ones.

How much time does automated PO-to-Excel conversion save?

ACS Industries, a manufacturer processing 400 purchase orders per week, runs every PO through Lido regardless of format: PDFs, scans, images, plain-text emails. They previously evaluated UiPath for the workflow and found the template-based approach couldn't handle the format variation across their customer base. With AI extraction, they process every customer's POs through a single template. No per-customer configuration. They avoided hiring an additional FTE for data entry and eliminated roughly 30 hours per week of manual PO processing.

YMI Jeans, an apparel company, had a different version of the same problem. Their retail customers send purchase orders that include size breakdowns, color codes, and reference tables specific to each retailer's system. Some of these POs arrive as handwritten forms. After migrating to NetSuite, they needed a way to get PO data into their new ERP without rebuilding manual workflows. AI extraction handles the format variation across their retail partners and outputs structured data formatted for NetSuite import.

The math is straightforward. If each PO takes 5 to 10 minutes to manually retype into your system, and you're processing 100 POs per week, that's 8 to 16 hours of data entry per week. At 400 POs per week, it's a full-time role. Extraction converts that time cost from linear (more POs = more hours) to nearly flat (more POs = more rows in a spreadsheet that's already populated).

The accuracy improvement matters just as much as the time savings. Manual retyping introduces errors: transposed digits in a quantity field, a wrong unit price, a missed line item. These errors propagate downstream. Wrong quantities get shipped, wrong amounts get invoiced, customers dispute charges, and someone spends time investigating and correcting. When the data comes directly from the document through extraction rather than through manual retyping, the error rate drops to near zero on fields the AI reads with high confidence.

Matching extracted PO data against your product master

Getting the data out of the PO is half the workflow. The other half is validating that data against your internal systems before it becomes a sales order.

Say the customer's PO says "Widget A, Blue, 500 units at $4.25." Your product master has it as "WDG-A-BLU" at a current price of $4.20. Is this the same item? Is the price within tolerance? Does the customer have a negotiated price that differs from list?

Lido handles this through context documents and fuzzy matching. Upload your product master, price list, or customer-specific pricing as a context document alongside your extraction template. The AI cross-references the item descriptions and codes from the PO against your internal records, mapping "Widget A, Blue" to "WDG-A-BLU" even when the names don't match exactly. Price discrepancies get flagged rather than silently accepted.

This turns the Excel output from raw extracted data into validated, ERP-ready data. Your operations team reviews flagged exceptions (a new item code that doesn't match anything in the product master, a price that's outside the agreed tolerance) rather than checking every line manually. For a deeper look at this validation workflow, see our post on automating PO matching.

Handling handwritten and scanned POs

Not every purchase order arrives as a digital PDF. Some customers still fax POs. Others fill out printed PO forms by hand, especially in industries like construction, food service, and smaller retail operations where orders are placed from a warehouse floor or a kitchen office.

AI-based extraction reads handwritten documents at roughly the level a human reader can. Block handwriting, the kind typically found on order forms where someone prints quantities and item numbers in boxes, extracts reliably. Cursive or heavily stylized handwriting may need review on specific fields, but the system flags low-confidence extractions rather than guessing silently.

Scanned documents, including faxes, follow the same extraction path as digital PDFs. The AI reads the document visually rather than relying on embedded text layers, so a 200 DPI scan of a faxed PO gets the same treatment as a born-digital PDF from a customer's ERP. For more on this, see our post on extracting data from handwritten documents.

Formatting the output for your specific ERP

The Excel file your extraction produces needs to match what your ERP's import tool expects. This is where most workflows add a manual reformatting step that shouldn't be necessary.

If your ERP is NetSuite, your import template has specific column names, date formats, and number formats it requires. Dynamics 365 has different requirements. QuickBooks, SAP, Sage, each has its own expectations. They're all strict about formatting and will reject files that don't conform exactly.

The fix is to configure your document automation template to output in your ERP's format from the start, not to extract in a generic format and reformat afterward. Use your ERP's exact column headers. Specify date and number formatting in your extraction instructions. Include any required fields that don't appear on the PO, like internal customer IDs, through context document lookups.

We wrote a detailed breakdown of what each major ERP requires in our post on getting extracted data into your ERP.

We hope this guide helps you convert purchase orders to Excel without manual data entry.

Frequently asked questions

Can I convert purchase orders from multiple customers using one template?

Yes. AI extraction reads documents contextually, not by matching fixed positions. One template handles every customer's PO format, regardless of layout, field names, or table structure, without per-customer configuration. ACS Industries processes 400 POs per week from dozens of customers through a single extraction template.

Does this work with scanned or handwritten purchase orders?

Yes. AI extraction reads documents visually rather than relying on embedded text, so scanned paper, faxed POs, and handwritten order forms all process through the same workflow as digital PDFs. Block handwriting extracts reliably. Low-confidence fields get flagged for review rather than guessed at.

Can the Excel output go directly into my ERP?

Yes, if you configure your extraction template to match your ERP's import format. Use your ERP's exact column headers, specify date and number formatting in your extraction instructions, and use context documents for fields like internal customer IDs that don't appear on the PO. The output downloads as Excel or CSV, ready for import without manual reformatting.

How does the extraction handle POs with hundreds of line items across multiple pages?

The AI treats the entire document as one continuous data set. Page breaks mid-table, repeated headers, and interspersed subtotals are handled automatically. A 15-page PO with 300 line items produces 300 rows in your Excel output, one per line item, with no duplicates from repeated headers and no fragments from page breaks.

How long does it take to convert a purchase order to Excel?

Extraction typically takes under a minute per document. A single-page PO processes in seconds. A 15-page PO with hundreds of line items takes slightly longer but still completes in under a minute. The initial template setup takes 10 to 15 minutes, and then every subsequent PO from any customer runs through that same template automatically with no additional configuration.

Ready to grow your business with document automation, not headcount?

Join hundreds of teams growing faster by automating the busywork with Lido.

Schedule a demo