So much business data still lives in emails, PDFs, and spreadsheets, and teams often spend hours every week moving data from one place to another.
Today, we’ll do a deep dive into how one California school district purchasing team uses Lido with an assist from GPT-4 to reduce a three-hour task to just five minutes.
Like many other school districts, the Campbell Union High School District uses Quintessential School Systems (QSS) to manage their general ledger and accounts payable. Whenever a school needs to make a purchase the order information is entered into QSS. Once entered, these purchase orders (POs) are exported from QSS and emailed to all parties involved, including the vendor, school, and accounts payable department.
This process is very manual and time consuming for several reasons:
Splitting up the long PDF into the requisite number of individual PDFs, removing the unnecessary “Terms and Conditions” pages, then manually configuring and sending the emails
The Lido team worked with Cecilia and her team to build a custom AI powered workflow in Lido to streamline this process.
Lido is a new spreadsheet built with custom formulas and data connectors to automate repetitive tasks. By combining custom Lido formulas, regular spreadsheet formulas, and targeted GPT formulas we were able to build a custom automation tool that reliably works.
First, we needed to bring our PO PDFs into Lido. Cecilia created a separate Google Drive folder, added her POs to it, then connected it directly to Lido using a built-in data connector.
This resulted in a table of PDF url links in the Lido spreadsheet:
This connected table in Lido stays in sync with the latest contents of the Google Drive folder, so as new files are added or deleted from the folder this list stays updated.
Next, we needed to extract the contents of each PDF into our spreadsheet. Lido’s IMPORTPDF formula takes in a Google Drive link containing a PDF and outputs all of the text contents of the PDF into a spreadsheet cell. By placing this formula into a Computed Column, we’re able to apply the same formula dynamically to every row in the connected table.
Running IMPORTPDF turns all of the contents of the PDF into a gigantic text string. We initially tried extracting all of the contents of the multi-page PDF at once. However, we quickly realized that GPT-4’s hallucination rate increases with input size. So instead we split things up to run page-by-page instead, where the contents of each PDF page are extracted into their own spreadsheet cell.
While it’s nice to have this data out of a PDF, this text blob isn’t all that useful as-is. We need to pick out from this big blob a few data points that we care about: PO number, vendor number, and school.
This is the step in our process where GPT-4 really shines. Since the contents of each PDF are slightly different, we can’t just write a series of rules-based formulas to extract the data we need, because there’s no guarantee it would work for every PDF. Luckily, GPT-4 is pretty good at pulling things out from unstructured data. After some trial and error we were able to find a prompt that consistently and correctly pulled out the fields that we needed.
After running the GPT formulas, we’re left with only the data we need from each PDF, which we then split into individual columns.
We now have a table with one row per PDF page. For each row / page, the specific fields we need have been pulled out into individual columns using GPT-4.
If we go back to Campbell Union’s workflow, they need to use this extracted data to split out their one big QSS exported PDF with many POs into individual PDFs by PO number, then determine who to email each individual PO PDF to.
There are often multiple pages in the large PDF with the same PO number. We need to find all of the specific pages for each PO number, then combine them together so that we have one consolidated PDF per PO.
In a new spreadsheet tab, we used the UNIQUE and FILTER formulas to create a new table with one PO number per row.
By filtering the page numbers from the previous step and matching them to each PO, we’re now able to output a list of page numbers from our large PDF that correspond to each unique PO number.
Next, we need to turn this list of page numbers into smaller PDFs that contain only those pages. The PICKPAGESFROMPDF formula lets us do exactly this.
After adding this formula to every row in our table with a computed column, we now have a neat way to go from one big PDF with lots of different POs into a clean table with one PDF per unique PO number.
Now that we have our individual PDFs processed, all that’s left is determining who they need to be emailed to. Going back to our requirements, there are a few different conditions we need to account for:
Luckily, spreadsheet formulas are perfect for this kind of task. We can account for all of these with a combination of XLOOKUP to match vendor numbers to email addresses, and IF statements to account for the different text conditions.
I won’t go through how we implemented all of the different rules here, but here’s one example for looking up the vendor email based on vendor number. First, we set up a mini-table with all of our known vendor numbers and their corresponding vendor email addresses:
Then, we added a computed column with an XLOOKUP formula to our PO table to lookup the vendor email based on the vendor number in that row in our table.
After adding columns for all of the other rules, we’re finally ready to configure and actually send out our emails.
Lido’s SENDGMAIL formula lets you send out emails directly from Lido and pass through different values from the spreadsheet dynamically (like the recipient and attachment link). So in my example, I can reference different columns in my table for my email recipients, cc recipients, and the attachment link.
And that’s it! The end product is a custom AI tool, built in Lido, that fits CUHSD’s workflow like a glove. After exporting long PDFs out of the QSS system, the purchasing tech now simply needs to upload it to a Google Drive folder. Lido automations then take over to separate the big PO into individual POs, and then tee up emails to different vendors based on the contents of each PO. This process used to take 15 minutes per individual PO and now takes under 30 seconds.
Automating a complex multi-step workflow like this one requires a combination of deterministic actions (e.g. formulas that do the same thing every time) and generative actions (e.g. using AI to interpret and adapt to varied data inputs dynamically). In its current form, AI often struggles to do the same thing every time, which leads to inconsistencies and hallucinations.
Instead of trying to craft a single AI prompt that can reliably complete a task from start to finish, we believe the best approach is to leverage AI for the things it can uniquely do and program the rest deterministically.
If you’re interested in building your own AI powered data extraction workflow, schedule a demo with the Lido team today.