In this article:

Import Data from a Website to Google Sheets [2024 Update]

March 31, 2024

What is XPath?

XPath, which stands for XML Path Language, is used to access data stored in XML files. XML files are used to store data, and its XML format is also used as the basis for HTML format today. As HTML format is used in web pages, XPath can also be used to access data from web pages.

The HTML syntax consists of a system of tags and attributes that dictate how the content should be displayed. The tags describe in general terms how the content should be displayed. The attributes further refine the formatting and also include ways to identify the specific data stored in the given tag. 

Here are some examples of the HTML tags with their corresponding XPaths for the different types of data stored in a webpage:

Page title: //title

Meta description: /html/head/meta[@name='description']/@content

H1: //h1

H2: //h2

H3: //h3

Links: //@href

Tables: //table[n]/tbody/*

Bulleted Lists: //ul/li

Numbered Lists: //ol/li

Often HTML pages will use a certain tag multiple times and when you use it with the IMPORTXML function, you can get more data than you need. In constructing the XPath, you can include the tags enclosing the data with their unique attributes so that you can filter the output to the ones you need. 

If you want to include the class attribute of the tag, you add 

[contains(@class, ‘class_value’)]

Where the class_value is the class value attribute. For example, if the data we want to include is inside the following code:

<div class=”toc”>

<ul>

<li>Entry one</li>

<li>Entry two</li>

<li>Entry three</li>

.

.

.

</ul>

</div>

Then the XPath is

//div[contains(@class,'toc')]/ul/li/*

Google Sheets will identify the div tag that contains the attribute class with value “toc”, then go further to the ul and li tags to extract the specific contents for each li tag.

How to Import Data from a Website to Google Sheets

1. Identify the Data to Import from a Webpage

The data in websites come in different formats. They can be in the form of 

  • Table
  • Lists
  • Paragraphs
  • Etc. 

For example, we want to extract the list of spreadsheet software from this Wikipedia page:

Import data from website to google sheets

2. Right-Click the Data then Select Inspect

Once we identified the data, we then right-click the data we want to import then select Inspect. The Inspect option allows you to view the HTML code used to build the webpage. The Inspect option also centers the code to the portion containing the data.

Import website data google sheets‍

The Inspect tab shows that the items are stored in <li> tags. 

3. Assemble the XPath

When you move your cursor through the HTML code of the webpage, you will notice that the highlighted part of the webpage changes. You can use it to identify the topmost tag containing the data you need. For our example, it is the div tag with the class mw-content-ltr mw-parser-output

Import website data to google sheets

Additionally, we just want the name of the software. Fortunately, all of them are hyperlinks.

We can include the <a> tag in the XPath. 

We can then construct the XPath by taking note of the div class and the list tags present. The XPath becomes:

//div[contains(@class,'mw-content-ltr mw-parser-output')]/ul/li/a

4. Use the Formula =IMPORTXML(URL, xpath)

The formula to use is 

=IMPORTXML(URL, xpath)

Where URL  is the URL of the webpage while xpath is the XPath we use to extract the selected data. We can enclose The URL is 

https://en.wikipedia.org/wiki/List_of_spreadsheet_software

And the XPath is

//div[contains(@class,'mw-content-ltr mw-parser-output')]/ul/li

The formula is

=IMPORTXML("https://en.wikipedia.org/wiki/List_of_spreadsheet_software","//div[contains(@class,'mw-content-ltr mw-parser-output')]/ul/li/a")

The result is:

Google sheets import website data

FAQs

What are some Common IMPORTXML function Errors in Google Sheets?

Here are two common errors:

Import Xml content cannot be parsed. It means that the XPath input does not follow the correct syntax; thus Google Sheets cannot import any data from the webpage. Check the formula to make sure there is no typo and that the proper syntax is followed.

Imported content is empty. It can mean two things:

  • The XPath is not a valid HTML element; thus Google Sheets cannot identify any content corresponding to it.
  • There is no content corresponding to the given XPath as interpreted by Google Sheets.

You may need to check the webpage itself to identify the proper XPath.

Can all website data be imported to Google Sheets using IMPORTXML function?

No. There are websites that actively block codes that import their content. This will cause Google Sheets to output errors in the IMPORTXML function when otherwise the XPath should work.

Additionally, there is data that you can only access by logging in to your account. These include important data about your customers, inventory, and sales stored in the online eCommerce and marketing platforms you use. 

These can be accessed by using a built-in API that the platform accesses. It is tricky to set up the access via API in Google Sheets! There is a workaround, however. Try Lido. Unlike Google Sheets, importing data is a built-in feature, and Lido is optimized to import data from different eCommerce and marketing platforms you use such as Hubspot, Salesforce, MySQL databases, etc. 

Try it now for free!

Get Google Sheets productivity and automation tips delivered straight to your inbox
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
We'll email you 1-3 times a week — and never share your information.
Spreadsheet busywork slowing you down?

Schedule a free automation consult and learn how to get back 5+ hours each week!

    Work less, automate more!

    Use Lido to connect your spreadsheets to email, Slack, calendars, and more to automate data transfers and eliminate manual copying and pasting. View all use cases ->