Data Analysis 101: Steps in Conducting Data Analysis

Learn how to conduct data analysis efficiently. Get acquainted with the stages of the data analysis process to become a more effective analyst.

Table of Contents
  1. Defining the problem
  2. Collecting the relevant data
  3. Cleaning the data
  4. Applying the relevant analysis method
  5. Visualizing and interpreting the results
  6. Last piece of advice
  7. References

How do we actually conduct data analysis? 

We learned a lot in our previous articles:

Data Analysis 101: Why should you do data analytics and data analysis? 

We learned what data analytics and analysis are, what systematic data analysis can bring to businesses, and how data analysis can be applied to improve the growth of your business.

Data Analysis 101: Types of Data You Encounter in Data Analysis 

We learned the different types of data that you can encounter in data analysis, and the corresponding methods that you can apply to extract new insights from them.

Data Analysis 101: Types of Analysis You Can Conduct 

We learned the five main types of analysis that can support decision-making for your business.

Big Sky Associates lists three things you need to consider while conducting data analysis:

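Make sure that these three considerations are properly addressed throughout the process. In this article, we will finally learn how to do so. Here are the five main steps in conducting data analysis:

  1. Defining the problem
  2. Collecting the relevant data
  3. Cleaning the data
  4. Applying the relevant analysis method
  5. Visualizing and interpreting the results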

We will discuss them one by one.

Defining the problem

Defining the problem is like building a pyramid. You start from the bottom, laying the foundation, then work upwards.

You are likely dealing with data that arrives continuously and in large volumes, and that needs elaborate processing before it can be sorted. Expect to spend a significant amount of time sorting the data before applying algorithms to it. In fact, preparing the data is often the most time-consuming part of the analysis.

It is therefore important that you define your question well, as a well-defined question will point you to the right data that you need to analyze in order to answer it. Harvard Business Review elaborates on the process of defining the question that will guide your analysis. There are four steps in defining the problem, and we include a summary of the questions you need to answer as you define the problem:

  1. Establish the need for a solution - the first thing to check is whether there is a problem that needs a solution.
  2. Justify the need - next, check whether you should solve this existing problem.
  3. Contextualize the problem - consider whether the same problem existed in the past and whether it was already solved. If not, look at the attempts to solve it. If the problem is industry-wide, look for the reasons why it was not fully addressed before.

There is one thing you should remember when contextualizing the problem: you should also contextualize the solution and the failed attempts at coming up with it. It is possible that the existing solutions won’t work in your context, but the failed attempts would.

  4. Write the problem statement - after grasping the need, the justification, and the context, it is time to frame your problem. Consider the following questions:

This is a lengthy guide to defining the problem, but defining the problem is the most crucial part of the analysis. A well-defined problem is solvable, while a poorly defined one will have you chasing dead ends, tiring your mind and straining your eyes.

Collecting the relevant data

There are several sources of data that you can use in data analysis. Collect as much as you can without sacrificing relevance to your problem.

Now that you have framed the problem, you can determine which data is relevant to it and then collect that data. There are several sources of relevant data. The first place to look is the existing internal data in your databases. This can include transaction records and records of metrics and key performance indicators (KPIs). Internal data is already available, so you can start your analysis immediately. It is also well-structured, so it can be easily processed by conventional methods and algorithms.

If the internal data is not sufficient, you can turn to open data sources on the internet. Open data is often hosted by international institutions, governments, and universities. It is also well-structured, and can provide an overview of your industry and your market. Some firms that regularly conduct market research also allow free access to their reports through a free subscription model; they offer up-to-date information and data for analysis.

Next are the data repositories that are not fully open and require a subscription. Subscribe, if necessary, when you need frequent access to market data from external sources. One example is Statista, which offers free access to diverse market data but requires a paid subscription for access to its entire repository.

If existing data is not enough, you should then consider collecting new data yourself through market research. There are several ways to do so. You do not have to cover all of them; they are conducted depending on the type of data you need. In fact, certain problems require you to collect new data in the first place. Some of these methods include the following:

Surveys and questionnaires - the best way to gather customer data is by asking customers directly. Surveys and questionnaires help you gather crucial information by letting you be exploratory: you can probe what is still unknown to gather new data. They can be delivered face-to-face, by mail, by telephone, or over the internet.

Customers, however, generally dislike surveys and questionnaires, which can take a lot of time and feel like a hassle to complete. Surveys therefore tend to suffer from low response rates, delayed responses, and ambiguous or missing answers. To make your surveys more effective, here are some tips, as suggested by Fulcrum:

Interviews - one of the main problems with surveys and questionnaires is that you cannot verify ambiguous answers or follow up on missing ones. Interviews with a selected group of customers address this problem because they give you the freedom to clarify ambiguous answers and add impromptu questions that elaborate on interesting answers. Interviews allow you to drill deep into customers’ perspectives and gain valuable insights.

The main downside of interviews is that they take a long time to conduct: from finding willing interviewees to holding the actual interviews, the process can take weeks or months. Additionally, not all interviewees are willing to answer every question you ask, which can be frustrating. The most important thing to remember during interviews is to respect your interviewee. This will help loosen some of the restraint they may have during the interview.

Online analytics - users generate a lot of data even if they stay in your online store for only a few seconds. Make sure that you have set up tools for tracking their interactions with your online store, ads, and even marketing campaigns. These interactions generate a huge amount of data that you can use in your analysis.
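As a minimal sketch of how tracked interactions can be condensed into a KPI, the snippet below computes a conversion rate from a small event log. The event names, the session IDs, and the `conversion_rate` helper are all hypothetical; a real tracking tool will export data in its own format.

```python
from collections import defaultdict

# Hypothetical event log: (session_id, event_name) pairs.
events = [
    ("s1", "page_view"),
    ("s1", "add_to_cart"),
    ("s1", "purchase"),
    ("s2", "page_view"),
    ("s3", "page_view"),
    ("s3", "add_to_cart"),
    ("s4", "page_view"),
]

def conversion_rate(events, goal="purchase"):
    """Share of sessions that contain at least one `goal` event."""
    sessions = defaultdict(set)
    for session_id, name in events:
        sessions[session_id].add(name)
    if not sessions:
        return 0.0
    converted = sum(1 for names in sessions.values() if goal in names)
    return converted / len(sessions)

rate = conversion_rate(events)                           # sessions that purchased
cart_rate = conversion_rate(events, goal="add_to_cart")  # sessions that added to cart
```

Counting sessions rather than raw events keeps one very active visitor from inflating the metric.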

The new Lido app allows you to collect these analytics through integrations with several marketing and e-Commerce services such as Shopify and Facebook Ads. It not only collects data but also condenses them to your chosen KPIs so you can skip the messier steps of data analysis. Get started for free here. 

Social media monitoring - nowadays it is easier to monitor your brand perception due to the ubiquity of social media networks. Billions now use them, and one main use of social media is to talk about brands (positively or negatively). Tools now exist for you to track the posts about your brand and business. These tools crawl social media sites for a certain keyword, group of keywords, or hashtags, and put them in a database where you can do sophisticated analysis.
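To make the idea concrete, here is a small sketch of the filtering-and-storing step, assuming the posts have already been fetched. The brand keywords, the sample posts, and the `mentions_brand` helper are hypothetical, and an in-memory SQLite database stands in for whatever database a real monitoring tool uses.

```python
import sqlite3

KEYWORDS = {"acme", "#acmeshoes"}  # hypothetical brand terms to track

# Hypothetical posts; a real tool would collect these from social networks.
posts = [
    {"id": 1, "text": "Loving my new Acme sneakers!"},
    {"id": 2, "text": "Traffic is terrible today."},
    {"id": 3, "text": "Returned my order, #AcmeShoes sizing runs small."},
]

def mentions_brand(text, keywords=KEYWORDS):
    """Case-insensitive substring check for any tracked keyword or hashtag."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)

# Store only the matching posts so they can be analyzed later.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO mentions (id, text) VALUES (?, ?)",
    [(p["id"], p["text"]) for p in posts if mentions_brand(p["text"])],
)
conn.commit()

stored_ids = [row[0] for row in conn.execute("SELECT id FROM mentions ORDER BY id")]
```

Substring matching is deliberately naive here: it will also match longer words that happen to contain a keyword, so a production setup would likely need stricter matching.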

Beyond the needs of the problem at hand, you should monitor social media continuously in order to watch in real time how your brand fares against the competition. BigCommerce lists more benefits of real-time social media monitoring:

There are two more things you should remember while collecting data: ethics and privacy. Chances are, the data you collect contain personal information that can negatively affect the lives of your customers if leaked. You should therefore do as much as you can to ensure that the data is collected ethically, and that the data collected is securely stored. If you are unsure, the best way to start is to put yourself in the customers’ perspective: 

How would you feel if your personal data were harvested without your consent? How would you feel if your personal data could easily be accessed by hackers?

Regulations regarding data ethics and privacy have started to be implemented across the world. Pay attention to them, and if possible, go a step further. This will help improve the perception of your brand and business.

Cleaning the data

Cleaning the data has several steps and takes a significant amount of time. Data cleaning, however, ensures that the data can be easily processed by data analysis algorithms and will serve to minimize errors due to ambiguity.

You will first encounter a combination of structured and unstructured data. Structured data is highly organized and stored systematically in databases, so automated algorithms can process it easily. Unstructured data is harder to process because conventional data processing algorithms cannot be applied to it. Although it requires more sophisticated algorithms, unstructured data often holds the richest information and can give you deep insights useful for your business.

To be able to use both structured and unstructured data, you need to clean them first. This process is called data munging or data wrangling. 

As this process takes the longest time, one way to shorten it is to have the data preprocessed at its source before you collect it. This is now possible with several services, which store data using a set of standards specified in their documentation.

Since the data come from different sources, they may not match in terms of the standards and conventions used. As a simple example, the prices in records from source A may use a comma as the thousands separator while the records from source B use a period for the same purpose. When this happens, you need to decide which standard or convention to use.

These standards and conventions can range from the decimal point/comma differences to file formats and data encoding. You should keep in mind which analysis software or algorithms you will use, as they can only accept certain standards and may not work when you use a different one. 
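The separator example above can be sketched as follows. The source names, the `CONVENTIONS` table, and the `parse_price` helper are hypothetical; the point is only that each source's convention is declared explicitly and every price is converted to one common format.

```python
from decimal import Decimal

# Assumed conventions per source. In practice, take these from each
# source's documentation rather than guessing from the values.
CONVENTIONS = {
    "source_a": {"thousands": ",", "decimal": "."},
    "source_b": {"thousands": ".", "decimal": ","},
}

def parse_price(raw, source):
    """Convert a price string to Decimal using the source's separators."""
    convention = CONVENTIONS[source]
    # Drop the thousands separator first, then normalize the decimal mark.
    cleaned = raw.strip().replace(convention["thousands"], "")
    cleaned = cleaned.replace(convention["decimal"], ".")
    return Decimal(cleaned)

price_a = parse_price("1,234.50", "source_a")
price_b = parse_price("1.234,50", "source_b")
```

Both calls yield the same `Decimal` value, so records from the two sources can be compared or summed directly. `Decimal` is used instead of `float` to avoid rounding surprises with currency.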

There are six steps in data cleaning, as listed by Trifacta:

  1. Discovering: this process involves understanding the data that you are about to process. To help you understand the data, you look at their source and the context in which they were created.
  2. Structuring: this process organizes the data to prepare them for easier analysis.
  3. Cleaning: this process irons out possible errors and outliers. The format of the data is standardized.
  4. Enriching: this process considers whether new data or information can already be derived from the existing data set and identifies them.
  5. Validating: this process cross-checks the dataset for data consistency, quality, and security. This is important in order to recheck the data for missed inconsistencies.
  6. Publishing: this process prepares the data for use in analysis. The requirements of the analysis software that you will use should guide this process.

If you regularly get the data from the same set of sources, then you can implement algorithms to automate this step. You can learn more about data munging here. 
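As a rough illustration of what such automation might look like, the sketch below chains three of the steps above (structuring, cleaning, and validating) over a small CSV export. The column names, the sample rows, and the rules (drop rows with no amount, drop repeated order IDs, flag negative amounts) are assumptions made for the example, not rules prescribed by Trifacta.

```python
import csv
import io

# Hypothetical raw export with stray spaces, a missing amount,
# a negative amount, and a duplicated order.
RAW = """order_id,customer,amount
1001, Alice ,25.00
1002,bob,
1003,Carol,-4.00
1001, Alice ,25.00
"""

def structure(text):
    """Structuring: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Cleaning: trim whitespace, normalize names, drop incomplete and duplicate rows."""
    seen, cleaned = set(), []
    for row in rows:
        order_id = row["order_id"].strip()
        customer = row["customer"].strip().title()
        amount = row["amount"].strip()
        if not amount or order_id in seen:
            continue
        seen.add(order_id)
        cleaned.append({"order_id": order_id, "customer": customer, "amount": float(amount)})
    return cleaned

def validate(rows):
    """Validating: separate usable rows from rows that need manual review."""
    valid = [r for r in rows if r["amount"] >= 0]
    rejected = [r for r in rows if r["amount"] < 0]
    return valid, rejected

valid_rows, rejected_rows = validate(clean(structure(RAW)))
```

Keeping each step in its own function makes it easier to rerun the same pipeline whenever a new batch arrives from the same source, and to adjust one rule without touching the others.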

Applying the relevant analysis method

You may end up using all five of these methods of analysis to solve the problem at hand.

What analysis method should you use? It depends on the problem at hand. 

We have already discussed some of these methods in our previous article:

  1. Descriptive analysis aims to describe the basic features of your data, without making any inferences or predictions.
  2. Inferential analysis makes inferences about the larger population by analyzing a sample of the dataset, finding relationships between two or more variables in one or more related datasets, and/or testing hypotheses about the data.
  3. Diagnostic analysis seeks to uncover previously unknown patterns and relationships to see what led to a certain event happening.
  4. Predictive analysis processes existing data from the past to see what will most likely happen in the future.
  5. Prescriptive analysis aims to prescribe the best course of action using all the data and insights available, with consideration for the uncertainty inherent in all data.
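To give a feel for the difference between two of these methods, here is a minimal sketch that runs a descriptive and a predictive analysis on the same data. The monthly sales figures are made up, and the straight-line fit (ordinary least squares) is just one simple choice of predictive model, assumed here for illustration.

```python
import statistics

# Hypothetical monthly sales (units sold) for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

# Descriptive analysis: summarize the data without inferring anything.
mean_sales = statistics.mean(sales)
median_sales = statistics.median(sales)
spread = statistics.stdev(sales)

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Predictive analysis: use the past trend to estimate the next month.
slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept
```

The descriptive part only reports what the data already contains, while the predictive part extrapolates beyond it, so its output is an estimate that carries uncertainty.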

There are more methods of analysis. Our next article focuses on them.

Visualizing and interpreting the results

Interpret the results carefully in order to derive the best recommendations from them.

Whatever analysis method you use, you will get a set of results as data, often accompanied by a chart that visualizes them.

In interpreting your results, you should go back to your problem. What is the problem? How does your analysis fit into solving that problem? Did your results answer the questions surrounding the problem? Using the results, you should be able to draw conclusions and make recommendations regarding the problem defined.

Here are the steps in interpreting data:

  1. Assemble the information you need - the relevant analysis methods you applied to the data will generate a set of new results and information. Some of them will generate visualizations of the data that can help in analysis.
  2. Develop findings - from the new results and information, you can make observations that summarize the important points. 
  3. Develop conclusions - the important observations from the results can be used to answer the questions defined by the problem statement.
  4. Develop recommendations - from the conclusions, a course of action can be recommended or prescribed.

Data visualization in the form of charts and graphs helps in interpreting and understanding the results. What chart type should you use? HubSpot outlines the basic uses of charts in the form of questions that will also help you determine the right chart type for your data:

  1. Do you want to compare values?
  2. Do you want to show the composition of something?
  3. Do you want to understand the distribution of your data?
  4. Are you interested in analyzing trends in your data set?
  5. Do you want to better understand the relationship between value sets?
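One way to put these questions to work is a simple lookup from your goal to a few candidate chart types. The mapping below is a rule-of-thumb assumption for illustration, not HubSpot's own list, and the goal names and the `suggest_charts` helper are hypothetical.

```python
# Rule-of-thumb mapping from analysis goal to commonly used chart types.
# This is an illustrative assumption, not an authoritative list.
CHART_SUGGESTIONS = {
    "compare": ["bar", "column", "line"],
    "composition": ["pie", "stacked bar", "area"],
    "distribution": ["histogram", "scatter"],
    "trend": ["line", "column"],
    "relationship": ["scatter", "bubble"],
}

def suggest_charts(goal):
    """Return candidate chart types for a goal, or an empty list if unknown."""
    return CHART_SUGGESTIONS.get(goal.strip().lower(), [])

trend_charts = suggest_charts("Trend")
```

Treat the suggestions as a starting point: the right chart still depends on how many variables and data points you need to show.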

Here are some of the chart types available in Google Sheets:

There are more chart types than those listed here; the ones listed are those you can use in Google Sheets. You can start learning how to visualize results in Google Sheets through our tutorials:

The Ultimate Starter Guide to Charts In Google Sheets

Selecting the Correct Chart Type in Google Sheets [Ultimate Guide]

Last piece of advice

To summarize, we have five steps in conducting analysis:

  1. Defining the problem - we define the problem by establishing and justifying the need for a solution, contextualizing the problem, and writing the problem statement.
  2. Collecting the relevant data - using the problem statement, we collect the relevant data by accessing internal and external sources and by collecting new data relevant to the problem.
  3. Cleaning the data - we munge or wrangle the data so that it follows a uniform format, making it easier to process using algorithms.
  4. Applying the relevant analysis method - we apply the chosen methods to get the results needed to solve the problem.
  5. Visualizing and interpreting the results - we visualize not just to make the data easier to present but also to analyze the results further. We interpret the results using the framework established by the problem statement.

I have one last piece of advice. Make sure that your data analysis process is well-documented. As you saw in the first step, defining the problem involves looking at past attempts to solve it if it has appeared before. Well-documented failures in solving the problem are useful in narrowing down possible solutions. You should also take note of the methods you use, as you may use them again in the future.

I hope you learned a lot from this article. We aren’t done yet! We still have a lot to learn about data analysis. Check our blog for the next article!

References


The Data Analysis Process: 5 Steps To Better Decision Making

7 Fundamental Steps to Complete a Data Analytics Project

The 7-step Business Analytics Process

5 Steps of the Data Analysis Process

Structured vs Unstructured Data – What's the Difference?

The Data Analysis Process Step by Step

What is Data Analysis? Process, Methods, and Types Explained

Are You Solving the Right Problem?

Definition of Big Data

Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says 

8 Simple Data Collection Techniques for Businesses

Data Collection Methods and Why They Are Critical for Business

Data Collection Is The Heart Of Every Business, Even For Startups

What is a questionnaire - Definition, samples and examples

What is social media monitoring?

Data Collection Techniques | Methods of Collecting Data

Data Collection | A Step-by-Step Guide with Methods and Examples

What Are the Methods of Data Collection? | How to Collect Data

Data Collection Methods: Definition, Examples and Sources

Data Collection Methods

What is Data Wrangling?

Data Visualization 101: How to Choose the Right Chart or Graph for Your Data

Data Interpretation. Geoff Dates, River Watch Network; Jerry Schoen, Massachusetts Water Watch Partn