IMPORTXML Google Sheets: The Ultimate Guide for 2023
The IMPORTXML function in Google Sheets is a built-in function you can use to import various types of data from webpages across the Internet. You will learn all of these in this article alongside important information and common errors you may encounter.
IMPORTXML Function Google Sheets Syntax
The IMPORTXML function uses the following syntax:
URL is the URL of the webpage you want to import
xpath is the specific XPath code to identify the data you want to import
Learn about XPath in the next section.
IMPORTXML function Google Sheets XPath Guide
What is XPath?
XPath, which stands for XML Path Language, is used to access data stored in XML files. XML files are used to store data, and its XML format is also used as the basis for HTML format today. As HTML format is used in web pages, XPath can also be used to access data from web pages.
The HTML syntax consists of a system of tags and attributes that dictate how the content should be displayed. The tags describe in general terms how the content should be displayed. The attributes further refine the formatting and also include ways to identify the specific data stored in the given tag.
For example, we want to access the page title of a specific webpage. The page title is the one that appears on the top of your browsers. The page title is enclosed in title tag:
To access the page title, the XPath that should be used is
The double slash tells Google Sheets that all the instances of <title></title> should be imported. There is only one instance for each webpage and so will give us the page title. If you want to only find one instance of the tag, use single slash.
You can pinpoint the content of a specific tag if it has an identifying attribute such as class and id. This is important since most of the webpage content is wrapped in div tags. For example, we want to extract content in a div tag for the table of contents. We discover the HTML syntax as
The XPath would be:
Google Sheets will scan for all div tags that contain the attribute class and then narrow down the list to those whose attribute value is “toc”.
Some HTML tags have their own unique set of attributes. For example, the hyperlink tag has the href attribute that contains the URL. Its syntax is
<a href=”URL”>Link text</a>
You can directly list all the links in the webpage by using the following XPath:
Where the @ sign indicates that href is an attribute of a tag. The a tag is not included because in this case we want to extract the URL which is the value of the attribute href. If we format this to include the a tag, what we will extract is the link text instead of the URL. Google Sheets will list down all the URLs from the webpage.
Finally, if the specified XPath gives a blank result, you can add a series of tags and attributes to ensure that Google Sheets are directed to the right content. This can be useful when handling div and p tags. For example, you can add the following in the start of the XPath
As these are the top-level tags that every web page should have.
You can also go further within the tag if it has branches inside. For example, the div tag that we have for table of contents have the following structure
You can include the ul and li tags, that are used for specifying lists, into the XPath:
Google Sheets will identify the div tag that contains the attribute class with value “toc”, then go further to the ul and li tags to extract the specific contents for each li tag.
We will see these in action through the IMPORTXML function examples in Google Sheets.
How to Identify the XPath for specific data?
Sometimes you need to dig into the webpage code itself and identify the XPath from it.
Here’s an example:
We want to import the table of contents of this page to Google Sheets. Here are the steps:
Step 1: Right-click on the specific data.
Step 2: Select Inspect element.
The word Inspect is used in Chrome; for Firefox it’s Inspect element. The webpage code will show up on the right side of the browser.
Screencap of step 1 and 2
Step 3: Take note of the tags and properties.
Sometimes the exact line may not be useful for our purpose. We should look around the highlighted line for the best XPath code.
We will use the line with class=”toc” included. To confirm it’s the right line, hover the cursor on the line and observe which part of the webpage.
The resulting XPath is
This will look into the div tag with class=”toc” then select the values stored in li tags within.
Step 4: Use with formula
Finally you can combine the XPath with the URL in Google Sheets. The formula is
The meta description is now imported to your Google Sheets:
Import Headers from a Web page to Google Sheets
Web page structure consists of headers and paragraphs. A well-structured web page is easier to read and therefore attracts more traffic. Here are the steps to import headers present in the webpage and list them in your Google Sheets using the IMPORTXML function:
Step 1: Identify URL and XPath
We want to import the headers on the Wikipedia page for “Spreadsheet” and analyze it in Google Sheets. The URL is
The XPaths for the headers are:
Step 2: Use formula =IMPORTXML("https://www.yourwebsite.com/","XPATH")
What are some Common IMPORTXML function Errors in Google Sheets?
Import Xml content cannot be parsed
It means that the XPath input does not follow the correct syntax; thus Google Sheets cannot import any data from the webpage. Check the formula to make sure there is no typo and that the proper syntax is followed.
Imported content is empty
It can mean two things:
The XPath is not a valid HTML element; thus Google Sheets cannot identify any content corresponding to it.
There is no content corresponding to the given XPath as interpreted by Google Sheets.
You may need to check the webpage itself to identify the proper XPath.
Can all webpage data be imported to Google Sheets using IMPORTXML function?
No. There are websites that actively block codes that import their content. This will cause Google Sheets to output errors in the IMPORTXML function when otherwise the XPath should work.
If what you need is to import analytics data from your e-Commerce and marketing platforms to Google Sheets, the IMPORTXML function is not the solution. I recommend Lido. There’s no need to dig through the webpage code and use the XPath. Instead, you just need to do a few clicks and the data will seamlessly load in your spreadsheet.
The IMPORTXML function lets you use the XPath to pinpoint the specific data you want from a web page and import it to Google Sheets. You can import a wide variety of data present in web pages such as page titles, metadata, lists, links, and paragraphs.
IMPORTXML function depends on using the correct XPath for each type of data that can be found in a webpage. Some examples you can use in Google Sheets are the following:
Page title: //title
Meta description: /html/head/meta[@name='description']/@content
You can review the previous section about XPath so you can further use the IMPORTXML function in Google Sheets with even better results!