
OCR for Insurance: Automate COI and Claims Processing

March 20, 2026

AI-powered OCR for insurance automates data extraction from certificates of insurance (COIs), authorization documents, EOBs, and claims paperwork. Gallagher (AJG), a Fortune 500 insurance broker, uses Lido to process COIs where a single certificate produces three separate table imports with exact-match coverage wording requirements. Libertana, a home health services provider funded by Health Net, LA Care, CalOptima, and Anthem, automates authorization document processing across five standardized tiers, including unit calculations at a $33/unit rate and outlier flagging.

COI processing: why one document becomes three imports

A Certificate of Insurance looks simple. It is a single page, usually an ACORD 25 or ACORD 28 form, listing an insured party’s coverage details. Brokers and risk management teams process thousands of these. On the surface, it seems like a straightforward extraction job: pull the carrier name, policy number, coverage limits, and expiration date.

In practice, COI processing is one of the more complex document workflows in insurance operations.

Gallagher (AJG), a Fortune 500 insurance broker, processes certificates where a single document produces three different imports that load into three separate tables. The general liability section maps to one table. The auto liability section maps to another. The workers compensation section maps to a third. Each section has its own field structure, its own validation rules, and its own downstream consumers. Treating a COI as a single flat extraction misses the relational structure of the data entirely.
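As a rough illustration of the three-import structure, the sketch below splits one extracted certificate into per-table rows. The section keys, table names, and the `certificate_id` join key are all hypothetical; a real broker's schema will differ.

```python
from typing import Any

# Hypothetical mapping from COI section to destination table.
SECTION_TABLES = {
    "general_liability": "coi_general_liability",
    "auto_liability": "coi_auto_liability",
    "workers_compensation": "coi_workers_comp",
}

def split_coi(extracted: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Turn one extracted COI into row lists for three separate tables."""
    cert_id = extracted["certificate_id"]
    imports: dict[str, list[dict[str, Any]]] = {t: [] for t in SECTION_TABLES.values()}
    sections = extracted.get("sections", {})
    for section, table in SECTION_TABLES.items():
        fields = sections.get(section)
        if fields is None:
            continue  # section not present on this certificate
        # Carry the certificate ID so the three imports can be joined back together.
        imports[table].append({"certificate_id": cert_id, **fields})
    return imports
```

Carrying the certificate ID into every row is what preserves the relational link that a single flat extraction loses.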

Then there is the exact-match problem. Coverage fields on the certificate must use specific wording that matches the requirements table maintained by the risk management team. If a certificate says "Commercial General Liability" but the requirements table expects "CGL," the import fails validation. If a certificate lists "Each Occurrence" at $1,000,000 but the requirement specifies "Per Occurrence" at $1,000,000, the mismatch still needs to be flagged, even though the intent is identical. The extraction system needs to compare extracted text against the requirements table using exact matching for some fields and semantic matching for others.
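A minimal sketch of that comparison follows. It assumes a hand-maintained equivalence table stands in for semantic matching; a production system might use a language model or embeddings instead. `EQUIVALENT_WORDING` and `check_wording` are hypothetical names. Anything short of an exact match is surfaced rather than silently accepted.

```python
# Hypothetical equivalence table: pairs of wordings that mean the same thing.
EQUIVALENT_WORDING = {
    "commercial general liability": "cgl",
    "each occurrence": "per occurrence",
}

def check_wording(extracted: str, required: str) -> str:
    """Classify certificate wording against a requirements-table entry."""
    if extracted == required:
        return "exact"  # character-for-character match: passes validation
    a, b = extracted.strip().lower(), required.strip().lower()
    if EQUIVALENT_WORDING.get(a) == b or EQUIVALENT_WORDING.get(b) == a:
        return "equivalent"  # same intent, different wording: still flagged
    return "mismatch"
```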

NAIC codes add another lookup step. Every carrier on the certificate has a National Association of Insurance Commissioners code. The extraction pipeline needs to pull the carrier name from the certificate, look up the corresponding NAIC code from a reference table, and include both in the output. If the carrier name on the certificate does not match the reference table exactly (and it often does not, because certificates abbreviate, misspell, or use trade names instead of legal names), the system needs to perform a fuzzy match against the NAIC database.
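One way to sketch the carrier lookup is to normalize common abbreviations and then fuzzy-match with Python's standard `difflib.get_close_matches`. This is a stand-in technique, not necessarily what Lido uses. The carrier names and codes below are placeholders, not real NAIC data, and the 0.85 cutoff is an arbitrary assumption.

```python
import difflib
import re

# Placeholder reference table: legal carrier name -> NAIC code (not real codes).
NAIC_TABLE = {
    "Example Mutual Insurance Company": "00001",
    "Sample Casualty Insurance Company": "00002",
    "Demo Fire and Marine Insurance Company": "00003",
}

def normalize(name: str) -> str:
    """Lowercase, expand common abbreviations, and strip punctuation."""
    name = name.lower().replace("&", " and ")
    name = re.sub(r"\bins\b\.?", "insurance", name)
    name = re.sub(r"\bco\b\.?", "company", name)
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())

_NORMALIZED = {normalize(legal): legal for legal in NAIC_TABLE}

def lookup_naic(carrier: str, cutoff: float = 0.85):
    """Return (legal name, NAIC code, score), or None to route to a human."""
    key = normalize(carrier)
    matches = difflib.get_close_matches(key, list(_NORMALIZED), n=1, cutoff=cutoff)
    if not matches:
        return None
    legal = _NORMALIZED[matches[0]]
    score = difflib.SequenceMatcher(None, key, matches[0]).ratio()
    return legal, NAIC_TABLE[legal], round(score, 2)
```

With these placeholders, `lookup_naic("Example Mutual Ins. Co.")` normalizes to the same string as the legal name. A misspelling such as "Exampel" still clears the cutoff, but with a score below 1.0.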

Named insured matching is the final complication. The named insured field on a certificate may not match the client name in the broker’s system. "Smith Construction LLC" on the certificate might correspond to "Smith Construction Group, Inc." in the CRM. Manual processors spend significant time on this matching step. Automated extraction needs to handle it through configurable fuzzy matching with confidence thresholds.
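A sketch of threshold-based named insured matching is shown below. It strips legal-form suffixes and scores the remaining names with `difflib.SequenceMatcher`. The suffix list and the 0.95 and 0.80 thresholds are illustrative assumptions. "Group" is deliberately not stripped, so "Smith Construction Group, Inc." lands in the review band instead of auto-matching.

```python
from difflib import SequenceMatcher

# Hypothetical set of legal-form words ignored when comparing entity names.
LEGAL_SUFFIXES = {"llc", "inc", "corp", "corporation", "co", "company", "ltd", "lp"}

def core_name(name: str) -> str:
    """Lowercase a name and drop punctuation-trimmed legal suffixes."""
    tokens = [t.strip(".,").lower() for t in name.split()]
    return " ".join(t for t in tokens if t and t not in LEGAL_SUFFIXES)

def match_named_insured(cert_name: str, crm_name: str,
                        accept: float = 0.95, review: float = 0.80):
    """Return (decision, score) for a certificate name vs. a CRM client name."""
    score = SequenceMatcher(None, core_name(cert_name), core_name(crm_name)).ratio()
    if score >= accept:
        return "auto_match", score
    if score >= review:
        return "needs_review", score
    return "no_match", score
```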

Insurance authorization workflows: the Libertana model

Libertana provides home health services funded by managed care organizations including Health Net, LA Care, CalOptima, and Anthem. Their document processing challenge is different from COIs but equally structured: insurance authorization documents that determine how many service units a patient receives and at what rate.

Each authorization follows one of five standardized tiers. The extraction needs to identify which tier applies, pull the approved units, total days, and dollar rate, then perform a calculation: approved units divided by total days, multiplied by the dollar rate ($33/unit). This is not a simple field extraction. It is extraction plus arithmetic, with the calculated result becoming part of the output record.
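The formula in the paragraph above translates directly into code. This sketch reads the result as a daily dollar amount (units per day times dollars per unit). That reading and the function name are assumptions.

```python
RATE_PER_UNIT = 33.00  # dollar rate from the Libertana workflow

def daily_authorized_amount(approved_units: float, total_days: int,
                            rate: float = RATE_PER_UNIT) -> float:
    """Approved units divided by total days, multiplied by the dollar rate."""
    if total_days <= 0:
        raise ValueError("total_days must be positive")
    return round(approved_units / total_days * rate, 2)

# Example: 120 approved units over a 30-day authorization
print(daily_authorized_amount(120, 30))  # 132.0
```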

Outlier flagging adds a quality control layer. When the calculated value falls outside expected ranges for a given tier, the system flags it for human review. This catches data entry errors on the authorization form itself (a typo in the approved units field, for instance) before the error propagates into billing. It also catches legitimate outliers that require clinical review, like an unusually high unit count that may indicate a changed care plan.
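A minimal outlier check could compare the calculated amount against a per-tier range. The ranges below are placeholders only. Real thresholds would come from the tier definitions themselves.

```python
# Placeholder expected ranges for the daily amount in each of the five tiers.
TIER_RANGES = {
    1: (0.0, 66.0),
    2: (66.0, 132.0),
    3: (132.0, 198.0),
    4: (198.0, 264.0),
    5: (264.0, 330.0),
}

def needs_review(tier: int, daily_amount: float) -> bool:
    """True when the calculated amount falls outside the tier's expected range."""
    if tier not in TIER_RANGES:
        raise ValueError(f"unknown tier: {tier}")
    low, high = TIER_RANGES[tier]
    return not (low <= daily_amount <= high)
```

Under these placeholder ranges, a typo of 1,200 approved units instead of 120 over 30 days yields $1,320 per day. That falls outside every tier and is caught before billing.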

Duplicate handling is a daily operational concern. Libertana receives duplicate authorization submissions throughout the day as payers resubmit or correct previous authorizations. The extraction pipeline needs to detect duplicates (same patient, same authorization number, same date range) and either suppress them or flag them for review, depending on whether the duplicate is identical or contains updated information.
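The duplicate rule can be sketched as a lookup keyed on patient, authorization number, and date range. The record fields and the ISO-date strings are assumptions. Changed resubmissions are not written over the stored record, so the reviewer decides which version wins.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Authorization:
    patient_id: str
    auth_number: str
    start_date: str  # ISO date string, e.g. "2026-03-01"
    end_date: str
    approved_units: int

def classify_submission(new: Authorization, seen: dict) -> str:
    """Return 'new', 'suppress' (identical duplicate), or 'review' (changed duplicate)."""
    key = (new.patient_id, new.auth_number, new.start_date, new.end_date)
    prior = seen.get(key)
    if prior is None:
        seen[key] = new
        return "new"
    if prior == new:
        return "suppress"  # identical resubmission
    return "review"  # same key, different content, e.g. corrected units
```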

The final step is auto-filling acknowledgement forms. Once an authorization is processed and validated, Libertana needs to send an acknowledgement back to the payer. The extraction system populates the acknowledgement form with data pulled from the authorization, eliminating manual re-entry of patient information, authorization numbers, and service details.
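Form filling reduces to substituting validated fields into a template. The sketch below uses the standard library's `string.Template`. The template text and field names are hypothetical, and real payer acknowledgement forms will differ. `substitute` raises `KeyError` on a missing field, so gaps surface instead of producing a blank form.

```python
from string import Template

# Hypothetical acknowledgement layout.
ACK_TEMPLATE = Template(
    "Acknowledgement of authorization $auth_number\n"
    "Patient: $patient_name (ID $patient_id)\n"
    "Service period: $start_date to $end_date\n"
    "Approved units: $approved_units"
)

def fill_acknowledgement(record: dict) -> str:
    """Populate the acknowledgement from a validated authorization record."""
    return ACK_TEMPLATE.substitute(record)  # KeyError if any field is missing
```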

EOB extraction across multiple payers

Explanation of Benefits documents are the bane of healthcare billing departments everywhere. Every payer formats its EOBs differently. UnitedHealthcare EOBs look nothing like Aetna EOBs, which look nothing like Medicare remittance advice forms. The data is conceptually the same (what was billed, what was allowed, what was paid, what the patient owes), but the layout, terminology, and field placement vary widely.

US Neurology processes 1,400 arbitration payment determination documents, which are a specialized form of EOB used in out-of-network payment disputes. These documents contain the arbitrator’s determination of the appropriate payment amount, references to the original claim, and the payer’s final payment obligation. Extracting structured data from 1,400 of these documents manually would require a dedicated team member working full-time for weeks.

The extraction challenge with EOBs is not just format variation. It is the density of data on each document. A single EOB may contain payment details for dozens of claims, each with its own procedure code, billed amount, allowed amount, adjustment codes, and patient responsibility breakdown. Line-item extraction from EOBs requires understanding the tabular structure of each payer’s format and correctly associating each payment line with the right claim.
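One cheap check on line-to-claim association is to sum the extracted line payments per claim and compare them with the claim-level paid total printed on the EOB. A line attached to the wrong claim then breaks two totals at once. This is a simplified sketch. The field names are assumptions, and it ignores adjustment codes and other remittance details.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class EobLine:
    claim_id: str
    procedure_code: str
    billed: float
    allowed: float
    paid: float
    patient_resp: float

def group_by_claim(lines: list) -> dict:
    """Group extracted payment lines under their claim IDs."""
    claims = defaultdict(list)
    for line in lines:
        claims[line.claim_id].append(line)
    return dict(claims)

def claims_with_total_mismatch(lines: list, claim_paid_totals: dict,
                               tolerance: float = 0.01) -> list:
    """Claim IDs whose summed line payments disagree with the claim-level total."""
    sums = defaultdict(float)
    for line in lines:
        sums[line.claim_id] += line.paid
    ids = set(sums) | set(claim_paid_totals)
    return sorted(cid for cid in ids
                  if abs(sums.get(cid, 0.0) - claim_paid_totals.get(cid, 0.0)) > tolerance)
```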

Relay, a healthcare operations company, processes 16,000 Medicaid claims through automated extraction. At that volume, even a 2% error rate means 320 claims need manual correction, each of which can take 15 to 30 minutes to research and fix. The cost of errors at scale makes extraction accuracy the single most important metric in healthcare document processing. (For more on healthcare document automation, see our guide on OCR for medical billing.)
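The arithmetic behind that figure is easy to reproduce. The helper name is hypothetical. At 15 to 30 minutes per correction, 320 erroneous claims amount to 80 to 160 hours of rework.

```python
def rework_hours(claims: int, error_rate: float,
                 minutes_low: int, minutes_high: int) -> tuple:
    """Return (claims needing correction, low-end hours, high-end hours)."""
    errors = round(claims * error_rate)
    return errors, errors * minutes_low / 60, errors * minutes_high / 60

print(rework_hours(16_000, 0.02, 15, 30))  # (320, 80.0, 160.0)
```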

Compliance and exact-match requirements

Insurance document processing has a compliance dimension that most other industries do not. When Gallagher processes a COI, the output is not just data for a spreadsheet. It is evidence of coverage that may be referenced in a claim, an audit, or litigation. The extracted values need to be traceable back to the source document, and the extraction needs to be reproducible.

Exact-match requirements make this harder. In most document processing workflows, a close match is good enough. If the system extracts "ABC Corp" instead of "ABC Corporation," a human can verify and move on. In insurance compliance, certain fields require exact character-for-character matching against a requirements table. Coverage descriptions, policy endorsement wording, and additional insured language all have compliance significance. The difference between "Additional Insured - Owners, Lessees or Contractors" and "Additional Insured - Owners, Lessees, or Contractors" (note the Oxford comma) can determine whether a coverage requirement is met.

This creates a two-tier matching system within the same extraction workflow. Some fields (carrier name, policy number) can use fuzzy matching with confidence scores. Other fields (coverage descriptions, endorsement wording) require exact matching or explicit flagging for human review. The extraction pipeline needs to know which fields fall into which category and apply the appropriate matching logic to each.
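A per-field policy table is one simple way to encode which fields get which logic. In the sketch below, the field names, the 0.9 threshold, and the use of `difflib.SequenceMatcher` for fuzzy scoring are all assumptions. Exact fields fail on any character difference, including the missing Oxford comma described above.

```python
from difflib import SequenceMatcher

# Hypothetical field policy: fuzzy-tolerant fields vs. exact-wording fields.
FIELD_POLICY = {
    "carrier_name": "fuzzy",
    "policy_number": "fuzzy",
    "coverage_description": "exact",
    "endorsement_wording": "exact",
}

def match_field(field: str, extracted: str, expected: str, threshold: float = 0.9):
    """Return ('pass' | 'review', score) using the field's matching policy."""
    if FIELD_POLICY[field] == "exact":
        return ("pass", 1.0) if extracted == expected else ("review", 0.0)
    score = SequenceMatcher(None, extracted.lower(), expected.lower()).ratio()
    return ("pass" if score >= threshold else "review", round(score, 2))
```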

Audit trails matter more in insurance than in most industries. Every extraction, every match, every flag needs to be logged with the source document, timestamp, and confidence score. When a claim surfaces two years after the COI was processed, the broker needs to demonstrate that the certificate was validated, that coverage requirements were met, and that the data in the system matches the original document. (For background on how automated extraction pipelines work, see our guide on document automation.)
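A minimal audit entry might look like the sketch below. It records the field, value, confidence, decision, and a UTC timestamp. It also records a SHA-256 hash of the source file, which is an assumption added here so that the exact document can be proven later. Entries go to an append-only JSON Lines file. The function names and record layout are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(doc_bytes: bytes, field: str, value: str,
                 confidence: float, decision: str) -> dict:
    """Build one log entry tying an extracted value to its source document."""
    return {
        "source_sha256": hashlib.sha256(doc_bytes).hexdigest(),
        "field": field,
        "value": value,
        "confidence": confidence,
        "decision": decision,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }

def append_audit(path: str, record: dict) -> None:
    """Append the entry as one JSON line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```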

What implementation looks like for insurance operations

Gallagher is past evaluation and in the onboarding phase with Lido. The implementation involves three main configuration steps.

First, defining the three-table import structure so that each section of the COI maps to the correct destination table with the correct field mapping. This is a one-time setup that reflects the broker’s existing data architecture.

Second, loading the requirements table and NAIC lookup table into Lido so that extracted values can be validated against requirements and carrier names can be cross-referenced to NAIC codes automatically. These reference tables update periodically, and the integration keeps them in sync.

Third, configuring the matching rules: which fields use exact matching, which use fuzzy matching, what confidence threshold triggers a human review flag, and how named insured variations are handled.

For Libertana, implementation centered on encoding the five authorization tiers, the unit calculation formula, the outlier thresholds, and the acknowledgement form template. Because the authorization documents follow a standardized format within each payer, the extraction configuration is more predictable than it is for COIs, but the downstream calculation and form-filling steps add workflow complexity.

Both implementations share a common pattern: the hard part is not the extraction itself. AI-powered extraction handles the document reading. The hard part is the business logic that wraps around the extraction: the validation rules, lookup tables, calculations, and routing decisions that turn raw extracted data into actionable output. Lido handles both layers.

Frequently asked questions

Why does one Certificate of Insurance produce three separate imports?

A COI contains multiple coverage sections (general liability, auto liability, workers compensation) that map to different tables in the broker’s system. Each section has its own field structure and validation rules. Treating the entire certificate as a single flat record loses the relational structure of the data and makes downstream compliance checking much harder. Gallagher (AJG) processes COIs where each certificate produces three distinct table imports.

Can AI extraction handle exact-match compliance requirements on COIs?

Yes. Lido supports a two-tier matching system within the same workflow. Fields like carrier name and policy number use fuzzy matching with confidence scores. Fields like coverage descriptions and endorsement wording use exact matching against a requirements table. When exact matching fails, the system flags the certificate for human review rather than silently accepting a close-but-not-exact match.

How does the system handle NAIC code lookups when carrier names don’t match exactly?

Certificates often abbreviate carrier names, use trade names instead of legal names, or contain misspellings. Lido performs a fuzzy match against the NAIC reference database to find the correct carrier code. When multiple potential matches exist, the system returns the match with the highest confidence score and flags low-confidence matches for human verification.

Can OCR perform calculations on extracted data, like unit rate calculations for authorizations?

Yes. Libertana’s authorization workflow requires calculating approved units divided by total days multiplied by the dollar rate ($33/unit). Lido extracts the raw fields from the authorization document and performs the calculation as part of the extraction pipeline. The calculated result becomes part of the output record, and values that fall outside expected ranges are flagged as outliers for review.

How are duplicate authorization submissions handled?

Libertana receives duplicate submissions throughout the day as payers resubmit or correct authorizations. Lido detects duplicates by matching patient identifiers, authorization numbers, and date ranges. Identical duplicates are suppressed. Duplicates with updated information are flagged for review so the team can determine whether the new submission supersedes the original.

What types of insurance documents can AI extraction process?

Lido processes COIs (ACORD 25, ACORD 28), EOBs from any payer format, insurance authorization documents, arbitration payment determinations, Medicaid claims, and remittance advice forms. The system handles both standardized forms and payer-specific formats without requiring templates for each payer or document type.

How long does it take to implement COI processing automation?

Implementation involves three main steps: defining the multi-table import structure, loading requirements and NAIC lookup tables, and configuring matching rules (exact vs. fuzzy, confidence thresholds, named insured handling). Gallagher moved from evaluation to onboarding in a matter of weeks. The extraction itself works on the first document; the implementation time is spent configuring the business logic layer around it.
