Data Analysis 101: Steps in Conducting Data Analysis

How do we actually conduct data analysis? 

We learned a lot in our previous articles:

Data Analysis 101: Why should you do data analytics and data analysis? 

We learned what data analytics and analysis are, what systematic data analysis can bring to businesses, and how data analysis can be applied to improve the growth of your business.

Data Analysis 101: Types of Data You Encounter in Data Analysis 

We learned the different types of data that you can encounter in data analysis, and the corresponding methods that you can apply to extract new insights from it.

Data Analysis 101: Types of Analysis You Can Conduct 

We learned the five main types of analysis that you can conduct that can help you in your decision-making for your business.

Big Sky Associates lists three things you need to consider while conducting data analysis:

  • You need to know it is the right data for answering your question
  • You need to draw accurate conclusions from that data
  • You need data that informs your decision-making process

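You need to make sure that these three considerations are properly addressed throughout the process. In this article, we will finally learn how to do so. Here are the five main steps in conducting data analysis:

  1. Defining the problem
  2. Collecting the relevant data
  3. Cleaning the data
  4. Applying the relevant analysis method
  5. Visualizing and interpreting the results

We will discuss them one by one here.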

Defining the problem

Defining the problem is like building a pyramid. You should start from the bottom, building the foundation, then work upwards.


You are now dealing with data that arrive continuously and in huge amounts, and sorting them requires elaborate processes. Expect, therefore, to spend a significant amount of time sorting the data before applying algorithms to them. In fact, preparing the data is often the most time-consuming part of the analysis.

It is therefore important that you define your question well, as a well-defined question will point you to the right data that you need to analyze in order to answer it. Harvard Business Review elaborates on the process of defining the question that will guide your analysis. There are four steps in defining the problem, and we include a summary of the questions you need to answer as you define the problem:

  1. Establish the need for a solution - the first thing to check is whether there is a problem that actually needs a solution.
  • What is the basic need? Focus on the need that’s at the heart of the problem. Define the scope of the problem.
  • What is the desired outcome? Take the perspectives of the stakeholders, especially customers and beneficiaries. Express the outcome both qualitatively and, where possible, quantitatively; avoid favoring an approach or solution.
  • Who stands to benefit and why? Identify those who will benefit from the solution to this problem. This will help shape how you frame the problem and how you can deliver the solution to its beneficiaries.
  2. Justify the need - next, check whether this problem is worth solving.
  • Is the effort aligned with our strategy? Your business should have its own strategic goals so you can foster growth and do so in the most efficient manner. If solving a problem does not further your strategic goals, you should reconsider whether the current scope is appropriate or whether you should undertake the effort at all.
  • What are the desired benefits for the company, and how will we measure them? Depending on the problem, the benefits can take the form of increased revenue, increased efficiency, or reduced losses. These are also considered in framing the problem and the solution.
  • How will we ensure that a solution is implemented? An existing group or division can implement the solution, or an ad-hoc group can be assembled to take charge of implementation. Whatever the case, someone who understands both the problem and the solution should lead the implementation.
  3. Contextualize the problem - consider whether the same problem existed in the past and whether it was already solved. If it was not solved, look at the attempts to solve it. If the problem is industry-wide, look for the reasons why it was not fully addressed before.
  • What approaches have we tried? If the same problem persisted after several attempts by your business to fix it, the failures will serve as guides so that you do not repeat the same mistakes and instead look for novel approaches.
  • What have others tried? If the same problem is prevalent across your industry or niche, prioritize looking for cases where the problem was already solved by others. If no such case exists, look for the attempts to solve the problem, and take note of them.
  • What are the internal and external constraints on implementing a solution? Whether or not a solution already exists, you can gain insights on how feasible it is to implement a solution in your business. Constraints can involve resources, company culture, or even the law.

There is one thing you should remember when contextualizing the problem: you should also contextualize the solution and the failed attempts at coming up with it. It is possible that the existing solutions won’t work in your context, but the failed attempts would.

  4. Write the problem statement - after grasping the need, the justification, and the context, it is now time to frame your problem. Consider the following questions:
  • Is the problem actually a combination of problems? It is rare for major problems to be simple and straightforward, and the solutions may also be complex and multi-faceted. Thoroughly investigate the problem to see if it can be divided into several sub-problems that can be tackled by different divisions or groups of your business.
  • What requirements must a solution meet? The context of the problem will guide you in determining the requirements that a solution must meet. Cost is often one of them, but other factors matter as well.
  • Which problem solvers should we engage? Can the in-house experts tackle the problem? Or might it need the help of external consultants?
  • What information and language should the problem statement include? If you decide to seek the help of external consultants, frame the problem statement so that it attracts the widest variety of solutions possible. In fact, it can help to rephrase the problem as a purely technical or scientific one.
  • What do solvers need to submit? Provide enough detail about the requirements that a proposal must satisfy in order to qualify. These requirements range from technical aspects to documentation.
  • What incentives do solvers need? Look at the market to see the compensation that solvers receive. Try to match it, or exceed it. Do the same if your in-house experts will solve the problem; they will feel appreciated and will be more likely to stay in your company.
  • How will solutions be evaluated and success measured? You can use your initial analysis of the problem to define how you will evaluate the solution and how success will be measured.

This is a lengthy guide to defining the problem, but defining the problem to solve is the most crucial part of analysis. A properly defined problem is solvable, while an improperly defined one will make you chase dead ends, tiring your mind and straining your eyes.

Collecting the relevant data

There are several sources of data that you can use in data analysis. Collect as much as you can without sacrificing relevance to your problem.


Now that you have framed the problem, you can determine which data are relevant to it and then collect them. There are several sources of relevant data. The first place to look is the internal data already in your databases, which can include transaction records and records of metrics and key performance indicators (KPIs). These data are already available, so you can start your analysis immediately. They are also usually well-structured, so they can be easily processed by conventional methods and algorithms.

If the internal data is not sufficient, look at open sources of data on the internet. Open data is often hosted by international institutions, governments, and universities. It is also usually well-structured, and can provide an overview of your industry and your market. Some firms that regularly conduct market research also offer free access to their reports through a free subscription model; these reports offer up-to-date information and data for analysis.

Next are the data repositories that are not fully open and require a subscription. Subscribe if you need frequent access to market data from external sources. One example is Statista, which offers free access to some market data but requires a paid subscription for access to its entire repository.

If existing data is not enough, consider collecting new data yourself through market research. There are several ways to do so, and you do not have to use all of them; choose based on the type of data you need. In fact, certain problems require you to collect new data in the first place. Some of these methods include the following:

Surveys and questionnaires - the best way to gather customer data is by asking customers directly. Surveys and questionnaires help you gather crucial information by allowing you to be exploratory: you can ask about what is still unknown in order to gather new data. They can be delivered face-to-face, by mail, by telephone, or over the internet.

Customers, however, generally do not enjoy surveys and questionnaires, which can take a lot of time and be perceived as a hassle to complete. Surveys therefore tend to suffer from low response rates, delayed responses, and ambiguous or missing answers. To make your surveys more effective, here are some tips, as suggested by Fulcrum:

  • Keep it short and simple
  • Include an introduction with basic directions
  • List questions in a logical sequence
  • Avoid jargon and complex language
  • Provide adequate space for answers

Interviews - one of the main problems with surveys and questionnaires is that you cannot verify ambiguous answers and cannot follow up on missing ones. Interviews with a selected group of customers can address this problem because they give you the freedom to clarify ambiguous answers and add impromptu questions that elaborate on interesting answers. Interviews allow you to drill deep into customers’ perspectives and gain valuable insights.

The main downside of interviews is that they take a long time to conduct: from finding willing interviewees to holding the actual interviews, the process can take weeks or months. Additionally, not all interviewees are willing to answer every question you ask, which can be frustrating. The most important thing to remember during interviews is to respect your interviewee. This will help them open up during the interview.

Online analytics - users generate a lot of data even if they stay in your online store for only a few seconds. Make sure that you have set up tools for tracking their interactions with your online store, ads, and marketing campaigns. These interactions generate a huge amount of data that you may find useful in your analysis.

The new Lido app allows you to collect these analytics through integrations with several marketing and e-Commerce services such as Shopify and Facebook Ads. It not only collects data but also condenses them to your chosen KPIs so you can skip the messier steps of data analysis. Get started for free here. 

Social media monitoring - nowadays it is easier to monitor your brand perception due to the ubiquity of social media networks. Billions of people now use them, and one main use of social media is to talk about brands (positively or negatively). Tools now exist for you to track the posts about your brand and business. These tools crawl social media sites for a certain keyword, group of keywords, or hashtags, and put the matching posts in a database where you can do sophisticated analysis.
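
To make the idea of tracking keywords and hashtags concrete, here is a minimal Python sketch. It does not crawl any social network; it assumes the posts have already been collected into a list of dictionaries. The brand name, the sample posts, and the `find_mentions` and `count_by_platform` helpers are all made up for illustration.

```python
import re
from collections import Counter


def find_mentions(posts, keywords):
    """Return the posts whose text contains any tracked keyword or hashtag."""
    # Escape each keyword so characters like "#" are matched literally.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [post for post in posts if pattern.search(post["text"])]


def count_by_platform(mentions):
    """Count how many matching posts came from each platform."""
    return Counter(post["platform"] for post in mentions)


# Hypothetical posts, standing in for what a monitoring tool would store.
posts = [
    {"platform": "twitter", "text": "Loving my new #AcmeShoes!"},
    {"platform": "facebook", "text": "Anyone tried acme shoes? Mine fell apart."},
    {"platform": "twitter", "text": "Nice weather today."},
]

mentions = find_mentions(posts, ["#acmeshoes", "acme shoes"])
per_platform = count_by_platform(mentions)
```

A real monitoring tool would add sentiment scoring and storage on top of this, but the core step is the same: filter the stream of posts down to the ones that mention you.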

Even beyond the scope of the problem at hand, you should continuously do social media monitoring in order to watch in real time how your brand fares against the competition. BigCommerce lists more benefits of real-time social media monitoring:

  • React in real time with consumers on social media platforms
  • Determine how select demographics feel about your brand
  • Use positive feedback in marketing, etc.
  • Use negative feedback to correct errors in your business
  • Build brand credibility and authenticity
  • Refine marketing spending by eliminating channels with the lowest or worst engagement levels
  • See which social media marketing campaigns are performing the best and the worst
  • Calculate return on investment through advanced reporting capabilities

There are two more things you should remember while collecting data: ethics and privacy. Chances are, the data you collect contain personal information that could harm your customers if leaked. You should therefore do as much as you can to ensure that the data is collected ethically and stored securely. If you are unsure, the best way to start is to take the customers’ perspective:

How would you feel if your personal data were harvested without your consent? How would you feel if your personal data could easily be accessed by hackers?

Regulations on data ethics and privacy are being implemented across the world. Pay attention to them and, if possible, go a step further. This will help improve the perception of your brand and business.

Cleaning the data


Cleaning the data has several steps and takes a significant amount of time. Data cleaning, however, ensures that the data can be easily processed by data analysis algorithms and will serve to minimize errors due to ambiguity.


You will first encounter a combination of structured and unstructured data. Structured data is highly organized and stored systematically in databases, so it is easy to process with automated algorithms. Unstructured data is harder to process because conventional data-processing algorithms cannot be applied to it. While it requires more sophisticated algorithms, unstructured data can offer deeper insights that are useful for your business.

To be able to use both structured and unstructured data, you need to clean them first. This process is called data munging or data wrangling. 

As this process takes the longest time, one way to shorten it is to have the data preprocessed at its source before you collect it. This is now possible with several services, which store data using a set of standards specified in their documentation.

Since the data come from different sources, they may not match in terms of the standards and conventions used. For a simple example, the prices listed in records from source A may use a comma as the thousands separator while the records from source B may use a period for the same purpose. When this happens, you need to decide which standard or convention to use.

These standards and conventions can range from decimal point and comma differences to file formats and data encoding. Keep in mind which analysis software or algorithms you will use, as they may accept only certain standards and may not work with others.
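
The comma-versus-period example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: you already know which convention each source uses, and the `parse_price` helper and the sample values are hypothetical.

```python
from decimal import Decimal


def parse_price(text: str, thousands_sep: str, decimal_sep: str) -> Decimal:
    """Convert a price string written in a known convention to a Decimal."""
    # Drop the thousands separator first, then normalize the decimal mark.
    cleaned = text.strip().replace(thousands_sep, "")
    if decimal_sep != ".":
        cleaned = cleaned.replace(decimal_sep, ".")
    return Decimal(cleaned)


# Hypothetical records: source A writes 1,234.50 and source B writes 1.234,50.
source_a = ["1,234.50", "980.00"]
source_b = ["1.234,50", "12.000,00"]

prices = [parse_price(p, thousands_sep=",", decimal_sep=".") for p in source_a]
prices += [parse_price(p, thousands_sep=".", decimal_sep=",") for p in source_b]
```

Using `Decimal` rather than `float` keeps prices exact, which matters once you start summing many of them.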

There are six steps in data cleaning, as listed by Trifacta:

  1. Discovering: this process involves understanding the data that you are about to process. To help you understand the data, you look at their source and the context in which they were created.
  2. Structuring: this process organizes the data to prepare them for easier analysis.
  3. Cleaning: this process irons out possible errors and outliers. The format of the data is standardized.
  4. Enriching: this process considers whether new data or information can be derived from the existing data set and identifies them.
  5. Validating: this process cross-checks the dataset for data consistency, quality, and security. This is important for catching inconsistencies missed in the earlier steps.
  6. Publishing: this process prepares the data for use in analysis. The requirements of the analysis software that you will use should guide this process.
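
Here is a rough Python sketch of what the structuring, cleaning, and validating steps might look like for a small set of order records. It is not Trifacta's method; the field names (`order_id`, `amount`), the rules, and the `clean_records` helper are assumptions made for the sake of the example.

```python
def clean_records(rows):
    """Structure, clean, and validate raw rows; return (clean, rejected)."""
    clean, rejected, seen = [], [], set()
    for row in rows:
        # Structuring: normalize the keys and strip stray whitespace.
        record = {k.strip().lower(): str(v).strip() for k, v in row.items()}
        order_id = record.get("order_id", "")
        # Cleaning: reject duplicates and rows missing a required field.
        if not order_id or order_id in seen or not record.get("amount"):
            rejected.append(row)
            continue
        # Validating: the amount must parse as a non-negative number.
        try:
            amount = float(record["amount"])
        except ValueError:
            rejected.append(row)
            continue
        if amount < 0:
            rejected.append(row)
            continue
        seen.add(order_id)
        clean.append({"order_id": order_id, "amount": amount})
    return clean, rejected


# Hypothetical raw rows with the usual problems.
raw = [
    {"Order_ID": " A1 ", "Amount": "19.99"},
    {"order_id": "A1", "amount": "19.99"},  # duplicate of the first row
    {"order_id": "A2", "amount": ""},       # missing amount
    {"order_id": "A3", "amount": "abc"},    # not a number
    {"order_id": "A4", "amount": "5"},
]
clean, rejected = clean_records(raw)
```

Keeping the rejected rows instead of silently dropping them makes it easier to go back to the source and find out why they were bad.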

If you regularly get the data from the same set of sources, then you can implement algorithms to automate this step. You can learn more about data munging here. 

Applying the relevant analysis method

You may end up using all five of these methods of analysis to solve the problem at hand.


What analysis method should you use? It depends on the problem at hand. 

We have already discussed some of these methods in our previous article:

  1. Descriptive analysis aims to describe the basic features of your data, without making any inferences or predictions.
  • Measures of frequency: show how often something occurs. The quantities that fall under this category include the frequency, relative frequency, and cumulative relative frequency. You can visualize frequency using a frequency distribution.
  • Measures of central tendency: show the typical or central value of your dataset. These include the mean, median, and mode.
  • Measures of dispersion or variation: show how spread out the values of the dataset are. These include the range, variance, and standard deviation. Skewness and kurtosis, which describe the shape of the distribution, are often reported alongside them.
  • Measures of position: show how the values fall in relation to one another. These include percentile and quartile ranks.
  2. Inferential analysis makes inferences about a larger population by analyzing a sample of it, whether by estimating population parameters, finding relationships between two or more variables, or testing hypotheses.
  • Parameter estimation involves calculating sample statistics to estimate the parameters of the population. A parameter estimate can be either a point estimate or a confidence interval.
  • Hypothesis testing involves testing the validity of statements about the population by analyzing samples.
  3. Diagnostic analysis seeks to uncover previously unknown patterns and relationships to see what led to a certain event.
  • Data mining combines methods from machine learning, statistics, and database systems to uncover previously unknown patterns that help explain what led a certain event to occur.
  4. Predictive analysis processes existing data from the past to see what will most likely happen in the future.
  • Regression analysis is a mathematical method for estimating the relationship between two or more variables. The result of regression analysis is a model in the form of an equation.
  5. Prescriptive analysis aims to prescribe the best course of action using all the data and insights available, taking into account the uncertainty inherent in all data.
  • Prescriptive analysis uses AI, machine learning, pattern recognition, and other advanced tools to analyze the data, find the possible actions, and weigh the consequences of each action, thus giving the user an analysis of the best course of action.
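
A few of these methods can be tried with nothing more than Python's standard library. The sketch below covers descriptive measures, a confidence interval for the mean (parameter estimation), and a simple linear regression. The numbers are invented, the helper names are ours, and the interval uses a normal approximation, which is an assumption; for a sample this small a t-distribution would be the more careful choice.

```python
import statistics
from collections import Counter
from statistics import NormalDist


def describe(values):
    """Descriptive measures: frequency, central tendency, dispersion, position."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "frequency": Counter(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),
        "quartiles": (q1, q2, q3),
    }


def mean_confidence_interval(values, confidence=0.95):
    """Parameter estimation: normal-approximation interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * statistics.stdev(values) / len(values) ** 0.5
    center = statistics.mean(values)
    return center - margin, center + margin


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical daily order counts and the ad spend on the same days.
orders = [12, 15, 15, 18, 20, 22, 24]
ad_spend = [1, 2, 3, 4, 5, 6, 7]

summary = describe(orders)
low, high = mean_confidence_interval(orders)
slope, intercept = fit_line(ad_spend, orders)
```

The regression result is the "model in the form of an equation" mentioned above: predicted orders = `slope` × ad spend + `intercept`.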

There are more methods of analysis. Our next article focuses on them.

Visualizing and interpreting the results

A careful interpretation of results should be done in order to derive the best recommendations from them.

Whatever analysis method you use, you will get a set of results, often together with a visualization of them in the form of a chart.

In interpreting your results, go back to your problem. What is the problem? How does your analysis fit into solving that problem? Did your results answer the questions surrounding the problem? Using the results, you should be able to draw conclusions and make recommendations regarding the problem defined.

Here are the steps in interpreting data:

  1. Assemble the information you need - the relevant analysis methods you applied to the data will generate a set of new results and information. Some of them will generate visualizations of the data that can help in analysis.
  2. Develop findings - from the new results and information, you can make observations that summarize the important points. 
  3. Develop conclusions - the important observations from the results can be used to answer the questions defined by the problem statement.
  4. Develop recommendations - from the conclusions, a course of action can be recommended or prescribed.

Data visualization in the form of charts and graphs helps in interpreting and understanding the results. What chart type should you use? HubSpot outlines the basic uses of charts in the form of questions that will also help you determine the right chart type for your data:

  1. Do you want to compare values?
  2. Do you want to show the composition of something?
  3. Do you want to understand the distribution of your data?
  4. Are you interested in analyzing trends in your data set?
  5. Do you want to better understand the relationship between value sets?

Here are some of the chart types available in Google Sheets:

  • Line charts are used to visualize changes in the value of a metric over time. 
  • Stacked area charts are used to visualize the changes in the contribution of various sources to a certain quantity or metric over time.
  • Column charts and bar charts are used to compare the values of a certain metric across different items. They differ only in the orientation of the bars: column charts have vertical bars while bar charts have horizontal bars.
  • Pie charts are best used to represent the composition of a single item.
  • Scatter charts are used to show the relationship between two variables by plotting each data point against both.
  • Waterfall charts are used to visualize the cumulative effect of various sources or factors on a specific metric or variable.
  • Histogram charts are used to visualize the frequency distribution of values in a data sample.
  • Radar charts are used to visualize the values of several variables for a single item on one chart.
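
One way to tie the five questions to the chart types above is a simple lookup table. The mapping below is our own reading of the chart descriptions in this article, not something HubSpot or Google Sheets prescribes, so treat it as a starting point. The `suggest_charts` helper and the goal names are hypothetical.

```python
# Assumed mapping from the five questions to chart types listed above.
CHART_BY_GOAL = {
    "compare": ["column chart", "bar chart", "radar chart"],
    "composition": ["pie chart", "stacked area chart", "waterfall chart"],
    "distribution": ["histogram chart"],
    "trend": ["line chart"],
    "relationship": ["scatter chart"],
}


def suggest_charts(goal: str) -> list[str]:
    """Return candidate chart types for a goal, or raise on an unknown goal."""
    try:
        return CHART_BY_GOAL[goal.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown goal: {goal!r}") from None


trend_charts = suggest_charts("trend")
```

The final choice still depends on your data: for example, a pie chart stops being readable once the item has too many components.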

The charts listed here are ones you can use in Google Sheets; more chart types exist beyond them. You can start learning how to visualize results in Google Sheets through our tutorials:

The Ultimate Starter Guide to Charts In Google Sheets

Selecting the Correct Chart Type in Google Sheets [Ultimate Guide]

Last piece of advice

To summarize, we have five steps in conducting analysis:

  1. Defining the problem - we define the problem by establishing and justifying the need for a solution, contextualizing the problem, and writing the problem statement.
  2. Collecting the relevant data - using the problem statement that we have, we collect the relevant data by accessing internal and external sources and by collecting new data relevant to the problem.
  3. Cleaning the data - we munge or wrangle the data so that it follows a uniform format, making it easier to process using algorithms.
  4. Applying the relevant analysis method - we finally apply the chosen methods to get the results needed to solve the problem. 
  5. Visualizing and interpreting the results - we visualize not just to make it easier to present the data but to also further analyze the results. We interpret the results using the framework established by the problem statement to solve it. 

I have one last piece of advice: make sure that your data analysis process is well-documented. As you noticed in the first step, defining the problem involves looking at past attempts to solve it. Well-documented failures are useful in narrowing down possible solutions. You should also take note of the methods you use, as you may use them again in the future.

I hope you learned a lot from this article. We aren’t done yet! We still have a lot to learn about data analysis. Check our blog for the next article!

References

The Data Analysis Process: 5 Steps To Better Decision Making

7 Fundamental Steps to Complete a Data Analytics Project

The 7-step Business Analytics Process

5 Steps of the Data Analysis Process

Structured vs Unstructured Data – What's the Difference?

The Data Analysis Process Step by Step

What is Data Analysis? Process, Methods, and Types Explained

Are You Solving the Right Problem?

Definition of Big Data

Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says 

8 Simple Data Collection Techniques for Businesses

Data Collection Methods and Why They Are Critical for Business

Data Collection Is The Heart Of Every Business, Even For Startups

What is a questionnaire - Definition, samples and examples

What is social media monitoring?

Data Collection Techniques | Methods of Collecting Data

Data Collection | A Step-by-Step Guide with Methods and Examples

What Are the Methods of Data Collection? | How to Collect Data

Data Collection Methods: Definition, Examples and Sources

Data Collection Methods

What is Data Wrangling?

Data Visualization 101: How to Choose the Right Chart or Graph for Your Data

Data Interpretation. Geoff Dates, River Watch Network; Jerry Schoen, Massachusetts Water Watch Partn
