In this article:

IMPORTXML Google Sheets: The Ultimate Guide for 2024

The IMPORTXML function in Google Sheets is a built-in function you can use to import various types of  data from webpages across the Internet. You will learn all of these in this article alongside important information and common errors you may encounter. 

IMPORTXML Function Google Sheets Syntax

The IMPORTXML function uses the following syntax:

=IMPORTXML(“URL”,”xpath”)

Where 

URL is the URL of the webpage you want to import

xpath is the specific XPath code to identify the data you want to import

Learn about XPath in the next section.

IMPORTXML function Google Sheets XPath Guide

What is XPath?

XPath, which stands for XML Path Language, is used to access data stored in XML files. XML files are used to store data, and its XML format is also used as the basis for HTML format today. As HTML format is used in web pages, XPath can also be used to access data from web pages.

The HTML syntax consists of a system of tags and attributes that dictate how the content should be displayed. The tags describe in general terms how the content should be displayed. The attributes further refine the formatting and also include ways to identify the specific data stored in the given tag. 

For example, we want to access the page title of a specific webpage. The page title is the one that appears on the top of your browsers. The page title is enclosed in title tag:

<title>Page Title</title>

To access the page title, the XPath that should be used is

//title

The double slash tells Google Sheets that all the instances of <title></title> should be imported. There is only one instance for each webpage and so will give us the page title. If you want to only find one instance of the tag, use single slash.

You can pinpoint the content of a specific tag if it has an identifying attribute such as class and id. This is important since most of the webpage content is wrapped in div tags. For example, we want to extract content in a div tag for the table of contents. We discover the HTML syntax as

<div class=”toc”>

.

.

.

</div>

The XPath would be:

//div[contains(@class,'toc')]

Google Sheets will scan for all div tags that contain the attribute class and then narrow down the list to those whose attribute value is “toc”

Some HTML tags have their own unique set of attributes. For example, the hyperlink tag has the href attribute that contains the URL. Its syntax is

<a href=”URL”>Link text</a>

You can directly list all the links in the webpage by using the following XPath:

//@href

Where the @ sign indicates that href is an attribute of a tag. The a tag is not included because in this case we want to extract the URL which is the value of the attribute href. If we format this to include the a tag, what we will extract is the link text instead of the URL. Google Sheets will list down all the URLs from the webpage. 

Finally, if the specified XPath gives a blank result, you can add a series of tags and attributes to ensure that Google Sheets are directed to the right content. This can be useful when handling div and p tags. For example, you can add the following in the start of the XPath

/html/body

As these are the top-level tags that every web page should have. 

You can also go further within the tag if it has branches inside. For example, the div tag that we have for table of contents have the following structure

<div class=”toc”>

<ul>

<li>Entry one</li>

<li>Entry two</li>

<li>Entry three</li>

.

.

.

</ul>

</div>

You can include the ul and li tags, that are used for specifying lists, into the XPath:

//div[contains(@class,'toc')]/ul/li/*

Google Sheets will identify the div tag that contains the attribute class with value “toc”, then go further to the ul and li tags to extract the specific contents for each li tag.

We will see these in action through the IMPORTXML function examples in Google Sheets. 

How to Identify the XPath for specific data?

Sometimes you need to dig into the webpage code itself and identify the XPath from it.

Here’s an example:

Webpage data to import

We want to import the table of contents of this page to Google Sheets. Here are the steps:

Step 1: Right-click on the specific data.

Right-click on the element

Step 2: Select Inspect element.

The word Inspect is used in Chrome; for Firefox it’s Inspect element. The webpage code will show up on the right side of the browser. 

Inspect element tab on the right side

Screencap of step 1 and 2

Inspect element to check the code for webpage

Step 3: Take note of the tags and properties.

Sometimes the exact line may not be useful for our purpose. We should look around the highlighted line for the best XPath code. 

Portion of the code to be used as xpath highlighted

We will use the line with class=”toc” included. To confirm it’s the right line, hover the cursor on the line and observe which part of the webpage.

The resulting XPath is

//div[contains(@class,'toc')]/ul/li/*

This will look into the div tag with class=”toc” then select the values stored in li tags within. 

Step 4: Use with formula 

Finally you can combine the XPath with the URL in Google Sheets. The formula is

=importxml("https://en.wikipedia.org/wiki/List_of_spreadsheet_software","//div[contains(@class,'toc')]/ul/li/*")

The list is now imported to Google Sheets

Data imported to Google Sheets using IMPORTXML function with custom XPath

IMPORTXML Function Google Sheets Examples

We have five examples to demonstrate how IMPORTXML function works in Google Sheets. 

Import the Page title of a Webpage to Google Sheets

Step 1: Identify URL and XPath

We want to import the page title of the home page of Lido. The URL is www.lido.app

The XPath for page title is 

//title

Step 2: Use formula =IMPORTXML("https://www.yourwebsite.com/","XPATH")

The formula is

=importxml("https://www.lido.app/","//title")

The page title will be imported to Google Sheets.

Google Sheets IMPORTXML function import page title

Import the Meta description of a Web Page to Google Sheets

The meta description of a web page also helps in its visibility in search engines. Here are the steps in importing the meta description of a webpage to Google Sheets using IMPORTXML function:

Step 1: Identify URL and XPath

We want to import the meta description of the home page of Lido. The URL is www.lido.app

The XPath for meta description is 

/html/head/meta[@name='description']/@content

Step 2: Use formula =IMPORTXML("https://www.yourwebsite.com/","XPATH")

The formula is

=importxml("https://www.lido.app/","/html/head/meta[@name='description']/@content")

The meta description is now imported to your Google Sheets:

Google Sheets IMPORTXML function import meta description

Import Headers from a Web page to Google Sheets

Web page structure consists of headers and paragraphs. A well-structured web page is easier to read and therefore attracts more traffic. Here are the steps to import headers present in the webpage and list them in your Google Sheets using the IMPORTXML function: 

Step 1: Identify URL and XPath

We want to import the headers on the Wikipedia page for “Spreadsheet” and analyze it in Google Sheets. The URL is

https://en.wikipedia.org/wiki/Spreadsheet

The XPaths for the headers are:

H1: //h1

H2: //h2

H3: //h3

Step 2: Use formula =IMPORTXML("https://www.yourwebsite.com/","XPATH")

The formula for importing h1 headings is:

=importxml("https://en.wikipedia.org/wiki/Spreadsheet","//h1")

For h2:

=importxml("https://en.wikipedia.org/wiki/Spreadsheet","//h2")

For h3:

=importxml("https://en.wikipedia.org/wiki/Spreadsheet","//h3”)

The list of headers will be loaded to Google Sheets:

Google Sheets IMPORTXML function import headers

Import Links from a Web page to Google Sheets

Step 1: Identify URL and XPath

We want to import the links in the Wikipedia page for Spreadsheet and analyze them in Google Sheets. The URL is

https://en.wikipedia.org/wiki/Spreadsheet

The XPath for the links is

//@href

Step 2: Use formula =IMPORTXML("https://www.yourwebsite.com/","XPATH")

The formula for is: 

=importxml("https://en.wikipedia.org/wiki/Spreadsheet","//@href")

The links are then listed in Google Sheets:

Google Sheets IMPORTXML function import all links

Import  Tables from a Web Page to Google Sheets

Step 1: Identify URL and XPath

We want to import the table in the Wikipedia page for “List of spreadsheet software” and analyze it in Google Sheets. The URL is

https://en.wikipedia.org/wiki/List_of_spreadsheet_software

The XPath for the links is

//table[n]/tbody/*

Where n points to what table to import. Use 1 if there is only one table in the page; else identify the order in which the table you want to import appears in the page.

Step 2: Use formula =IMPORTXML("https://www.yourwebsite.com/","XPATH")

The formula for is: 

=importxml("https://en.wikipedia.org/wiki/List_of_spreadsheet_software","//table[1]/tbody/*")

The table is then loaded to Google Sheets:

Google Sheets IMPORTXML function import tables

IMPORTXML Function Google Sheets FAQs

What are some Common IMPORTXML function Errors in Google Sheets?

Import Xml content cannot be parsed

It means that the XPath input does not follow the correct syntax; thus Google Sheets cannot import any data from the webpage. Check the formula to make sure there is no typo and that the proper syntax is followed.

Imported content is empty

It can mean two things:

  1. The XPath is not a valid HTML element; thus Google Sheets cannot identify any content corresponding to it.
  2. There is no content corresponding to the given XPath as interpreted by Google Sheets.

You may need to check the webpage itself to identify the proper XPath.

Can all webpage data be imported to Google Sheets using IMPORTXML function?

No. There are websites that actively block codes that import their content. This will cause Google Sheets to output errors in the IMPORTXML function when otherwise the XPath should work. 

If what you need is to import analytics data from your e-Commerce and marketing platforms to Google Sheets, the IMPORTXML function is not the solution. I recommend Lido. There’s no need to dig through the webpage code and use the XPath. Instead, you just need to do a few clicks and the data will seamlessly load in your spreadsheet.

Click here to get started. 

IMPORTXML Function Google Sheets Summary

The IMPORTXML function lets you use the XPath to pinpoint the specific data you want from a web page and import it to Google Sheets.  You can import a wide variety of data present in web pages such as page titles, metadata, lists, links, and paragraphs.

IMPORTXML function depends on using the correct XPath for each type of data that can be found in a webpage. Some examples you can use in Google Sheets are the following:

Page title: //title

Meta description: /html/head/meta[@name='description']/@content

H1: //h1

H2: //h2

H3: //h3

Links: //@href

Tables: //table[n]/tbody/*

You can review the previous section about XPath so you can further use the IMPORTXML function in Google Sheets with even better results!

We hope this article has helped you and given you a better understanding of how to use the IMPORTXML function in Google Sheets. You might also like our articles on how to use the Google Sheets IMPORTRANGE function and the TODAY function in Google Sheets.

On a side note, we also recommend reading our guide on how to get email notifications from Google Sheets and trying our expiration reminder software.

Related Articles

-IMPORTHTML Google Sheets

-IMPORTRANGE Google Sheets

-Import Multiple Sheets With IMPORTRANGE

-Google Sheets Import Functions

Schedule a free automation consult
Learn more

Automate repetitive tasks with Lido

Save hours on repetitive and tedious work. Lido is a new spreadsheet that connects your spreadsheets, forms, PDFs, and email inbox.

Trigger
Action
Select trigger
When a cell value in Google Sheets changes
Check Google Sheets for today's date
When a new row is added to Google Sheets
On new Google Form submission
Send me a daily reminder
On new Typeform submission
When a cell value in database changes
Check database for today's date
When a new row is added in database
When a new HubSpot customer is created
Select action
ADDCALENDAREVENT
Creates a one- or multiple-day calendar event with optional attendees
ADDHUBSPOT
Adds an object to Hubspot
ADDSLACKCHANNEL
Create a Slack channel, and optionally add a topic or members
CALLURL
Makes any HTTP request
CREATEGOOGLEDOC
Replaces [@column_name] values in a Google Doc with the corresponding table row's values
CREATEPDF
Replaces [@column_name] values in a Google Doc with the corresponding table row's values, then export it as a PDF
FETCH
Makes any HTTP request and returns the response
INSERTROWS
Inserts given array below defined values in given worksheet
SENDGMAIL
Sends an email using your Google account
SENDOUTLOOK
Sends an email using your Microsoft Outlook account
SENDSLACK
Sends a Slack message
SENDSMS
Sends an SMS message using your Twilio account
UPDATECELL
Updates cells with given values
UPDATEHUBSPOT
Updates a property of a Hubspot object
Trigger is required
Action is required
Get started