Data Analysis 101: The types of analysis you can conduct
This article clears things up by walking through the different types of analysis you can conduct, and the circumstances in which you should use each one.
There is a good reason data analysis can be overwhelming: there is a wide range of ways to do it! So which should you choose? Before diving into the types of analysis, let's define a few terms that appear throughout this article.
Population - A collection of units being studied. Units can be people, places, objects, epochs, drugs, procedures, or many other things.
Sample - A collection of units drawn from a population.
Variable - A measurable characteristic of a unit, recorded as a value (often numerical). Parameters and statistics are numerical summaries of variables.
Parameter - A numerical property of a population, such as its mean.
Statistic - A number that can be computed from a data sample. Statistics are used to estimate population parameters.
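To make these definitions concrete, here is a minimal Python sketch (the customer ages are made up for illustration) showing how a statistic computed from a sample estimates a population parameter:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 10,000 customers
population = [random.randint(18, 65) for _ in range(10_000)]

# Parameter: a numerical property of the whole population
population_mean = sum(population) / len(population)

# Sample: a subset of units drawn from the population
sample = random.sample(population, 100)

# Statistic: a number computed from the sample, used to estimate the parameter
sample_mean = sum(sample) / len(sample)

print(f"population mean (parameter): {population_mean:.1f}")
print(f"sample mean (statistic):     {sample_mean:.1f}")
```

The two printed numbers will be close but usually not identical, which is exactly the gap that inferential analysis (covered below) quantifies.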
Descriptive analysis is the simplest of these types of analysis, not only in its purpose but also in its mathematical and statistical methods. It simply aims to describe the basic features of your data, without making any inferences or predictions. Descriptive analysis is a requirement before doing any other analysis, as it helps one to select the proper mathematical or statistical method to apply to the dataset.
The first thing to identify in your dataset is the number of variables involved. With a single variable, you can perform univariate analysis. With two or more variables, you can perform multivariate analysis, which relates the variables to one another and enables more sophisticated analyses later on; you can also run univariate analysis on each variable individually.
Once you have sorted out the variables, you can do each of the following (or all four of them), as listed by CampusLabs:
Measures of frequency: show how often something occurs. Quantities in this category include the frequency, relative frequency, and cumulative relative frequency. You can visualize frequency using a frequency distribution.
Measures of central tendency: show the averages of your dataset. These include the mean, median, and mode.
Measures of dispersion or variation: show how dispersed or diverse the values of the dataset are. These include the range, variance, standard deviation, skewness, and kurtosis.
Measures of position: show how the values fall in relation to one another. These include percentile and quartile ranks.
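All four groups of measures can be computed with Python's standard library. Here is a quick sketch on a small made-up dataset:

```python
import statistics
from collections import Counter

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical dataset

# Measures of frequency
freq = Counter(data)  # how often each value occurs
relative_freq = {value: count / len(data) for value, count in freq.items()}

# Measures of central tendency
mean = statistics.mean(data)      # 5.0
median = statistics.median(data)  # 4.5
mode = statistics.mode(data)      # 4

# Measures of dispersion
data_range = max(data) - min(data)     # 7
variance = statistics.pvariance(data)  # 4.0 (population variance)
std_dev = statistics.pstdev(data)      # 2.0

# Measures of position
quartiles = statistics.quantiles(data, n=4)  # [4.0, 4.5, 6.5]
```

Note that `pvariance` treats the data as the whole population; use `statistics.variance` instead when the data is a sample.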
Inferential analysis, on the other hand, analyzes a dataset to find relationships between two or more variables, or to test hypotheses about it. The dataset in this case is a sample of a larger population, and conclusions drawn from the sample are inferred to apply to that population, with inherent uncertainty. Inferential analysis is essential in market research, especially when measuring the effects of a campaign on samples.
For inferential analysis, the most important step is the process of collecting the sample, called sampling: the sample must represent the population it was taken from as closely as possible. This matters because, most of the time, gathering data from the entire population is impossible.
Due to the diversity of possible sample compositions, a large arsenal of mathematical and statistical methods falls under inferential analysis. They are used in two main ways:
Parameter estimation involves calculating the sample statistics to estimate the parameters of the population.
A parameter estimate can be either a point estimate or a confidence interval.
A point estimate is a single value that is the best available guess (given the sample) for the parameter. It depends on the size and quality of the sample. As the sample grows larger, the point estimate tends to get closer to the parameter it represents, though it rarely equals it exactly.
A confidence interval is a range of values likely to contain the parameter at a confidence level chosen by the analyst or researcher (95% is a common choice); its width depends on the size and variability of the sample. The point estimate falls within the confidence interval.
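As an illustration, here is a Python sketch (the daily sales figures are hypothetical) that computes a point estimate of the population mean and a 95% confidence interval around it, using the normal approximation for simplicity:

```python
import statistics

# Hypothetical sample: daily sales figures (units sold)
sample = [23, 19, 25, 30, 22, 27, 24, 21, 26, 28]

n = len(sample)
point_estimate = statistics.mean(sample)        # point estimate of the population mean
std_err = statistics.stdev(sample) / n ** 0.5   # standard error of the mean

# 95% confidence interval using the normal approximation (z = 1.96);
# for a small sample like this, a t-critical value would be more accurate
z = 1.96
lower = point_estimate - z * std_err
upper = point_estimate + z * std_err

print(f"point estimate: {point_estimate:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The interval says: if we repeated this sampling many times, about 95% of the intervals built this way would contain the true population mean.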
Hypothesis testing involves testing the validity of statements concerning the population by analyzing the samples. Hypothesis testing has four main steps:
The first step involves setting two hypotheses: the null hypothesis, the claim that there is no relationship between the variables, and the alternative hypothesis, the claim that one variable affects another. The hypotheses are framed by the scope of the study. You essentially test which of the two is more consistent with the data that you have.
The second step involves setting the level of significance, the main criterion for deciding between the null and the alternative hypothesis. The level of significance is the probability of rejecting the null hypothesis when it is actually true; it determines how much the relevant statistic may deviate from its hypothesized value before the null hypothesis is rejected. A common choice is 0.05.
In the third step, you calculate the test statistic: the quantity used to weigh the two hypotheses against the data. In the fourth step, you compare the test statistic against the threshold implied by the level of significance to decide whether to reject the null hypothesis in favor of the alternative, or to retain it.
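The four steps can be sketched in Python. This is a simplified one-sample t-test with hypothetical numbers, using a critical value read from a t-table rather than computing an exact p-value:

```python
import statistics

# Hypothetical sample: a conversion-related metric measured after a campaign
sample = [5.1, 4.9, 5.6, 5.3, 5.8, 5.2, 5.5, 5.0, 5.4, 5.7]

# Step 1: State the hypotheses.
# Null hypothesis H0: the population mean equals 5.0 (the campaign had no effect)
# Alternative H1:    the population mean differs from 5.0
mu_0 = 5.0

# Step 2: Set the level of significance.
alpha = 0.05
t_critical = 2.262  # two-tailed critical value for alpha=0.05, df=9 (from a t-table)

# Step 3: Calculate the test statistic.
n = len(sample)
mean = statistics.mean(sample)
std_err = statistics.stdev(sample) / n ** 0.5
t_stat = (mean - mu_0) / std_err

# Step 4: Decide: reject H0 if the statistic exceeds the critical value.
reject_null = abs(t_stat) > t_critical
print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
```

With this made-up data the statistic comfortably exceeds the critical value, so the null hypothesis of "no effect" would be rejected at the 5% level.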
These are the two primary ways inferential statistics is conducted; how they are applied depends on the question at hand.
As the name suggests, diagnostic analysis seeks to determine what led to a certain event. The word "diagnosis" often implies checking what went wrong, but the technique is not confined to negative events: you can just as well apply diagnostic analysis to an unexpected improvement in the performance of your business.
While inferential analysis tests possible relationships between two or more variables, diagnostic analysis takes a broader view, actively seeking patterns and correlations in the data that point to the potential causes of an event. According to QuickStart, establishing a set of key performance indicators (KPIs) is crucial for watching trends and spotting unusual patterns and anomalies, which can indicate an impending or ongoing event.
Step 1: Identify the anomalies. Certain events and trends may not make sense at first glance. You need to diagnose these events to uncover the circumstances that caused them to occur.
Step 2: Drill into the data. Not all data will be useful for diagnostic analysis, and it may not be obvious which data will help illuminate the causes of an anomaly. The analyst must look not only at the existing datasets but also at external datasets describing similar anomalies, to see how those were diagnosed. From there, the relevant data can be identified.
Step 3: Determine causal relationships. Data analysis techniques can then be applied to the relevant data in order to uncover hidden relationships that led to the anomaly. Different techniques such as probability theory, regression analysis, filtering, and time-series data analytics can be used.
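Step 1 in particular is easy to automate. Here is a minimal Python sketch (the KPI series is made up) that flags anomalies as values more than two standard deviations from the mean:

```python
import statistics

# Hypothetical KPI: daily orders over two weeks; day 12 has a suspicious dip
daily_orders = [120, 118, 125, 130, 122, 119, 127, 124, 121, 126, 123, 60, 128, 125]

mean = statistics.mean(daily_orders)
std = statistics.stdev(daily_orders)

# Flag days whose KPI deviates more than 2 standard deviations from the mean
anomalies = [
    (day, value)
    for day, value in enumerate(daily_orders, start=1)
    if abs(value - mean) > 2 * std
]
print(anomalies)  # the flagged (day, value) pairs
```

A flagged day is only the starting point: steps 2 and 3 are about explaining why that day was different.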
The exact technique or techniques to use depend on the event at hand. One example is data mining. According to Wikipedia, data mining combines methods from machine learning, statistics, and database management to derive patterns between variables in a dataset. Since diagnostic analysis is often applied when an unexpected event or behavior occurs, data mining is crucial for uncovering the previously unknown patterns that help explain what led the event to occur.
Mathematical and statistical tools can be used not only to analyze the past and present, as in the previous types of analysis, but also to anticipate what is most likely to happen in the future. This is where predictive analysis enters the picture: with a sufficient amount of data, it becomes possible to forecast what may happen next.
The following are some of the common uses of predictive analysis:
Predicting customer behavior
Setting desirable prices
Customer targeting and segmentation
Enhancing marketing campaigns
One of the techniques used in predictive analysis is regression analysis, a statistical method for estimating the relationship between two or more variables. One variable is called the dependent variable and the other the independent variable; the assumption is that a change in the independent variable leads to a change in the dependent variable. The result of regression analysis is a model in the form of an equation.
The most common and simplest type of regression analysis is linear regression. Linear regression is the process of determining and fitting a linear model to describe the relationship between two variables in the data. Let's consider a simple example: a dataset relating the number of coins you have to the total dollar value they are worth.
When you apply linear regression to it, you will get the following conclusion:
For every coin that you have, you get $0.10.
This is an example of a linear model. The linear model relates the increase or decrease in the dependent variable to the increase or decrease in the independent variable. It is called linear because the rate of increase or decrease is fixed.
You can use the linear model to predict the most probable value for a given input. In the example above, we have no data point for 7 coins, but the linear model tells us the corresponding amount: 7 * $0.10 = $0.70. The same works for larger values: with 20 coins, you have 20 * $0.10 = $2.00.
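Here is a minimal Python sketch of the coin example, fitting a line by ordinary least squares (the data points are hypothetical, chosen to match the $0.10-per-coin relationship) and using it to predict the values for 7 and 20 coins:

```python
# Hypothetical data: number of coins vs. total value in dollars
coins = [1, 2, 3, 4, 5]
money = [0.10, 0.20, 0.30, 0.40, 0.50]

n = len(coins)
mean_x = sum(coins) / n
mean_y = sum(money) / n

# Ordinary least squares for a line y = slope * x + intercept
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(coins, money)) / \
        sum((x - mean_x) ** 2 for x in coins)
intercept = mean_y - slope * mean_x

def predict(num_coins):
    """Use the fitted linear model to predict the dollar value."""
    return slope * num_coins + intercept

print(predict(7))   # approximately 0.70
print(predict(20))  # approximately 2.00
```

On real data the points will not fall exactly on a line; least squares then finds the line that minimizes the total squared error.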
Prescriptive analysis is the most advanced of the types discussed here: it seeks to prescribe the best course of action using all the data and insights available, while accounting for the uncertainty inherent in all data. It is often an extension of predictive analysis, since predictions are key to choosing the best course of action. Prescriptive analysis uses AI, machine learning, pattern recognition, and other advanced tools to analyze the data, enumerate the possible actions, and weigh the consequences of each, giving the user an analysis of the best course of action.
One important point in using prescriptive analysis is that it can be used unethically, even by accident. Depending on the situation at hand, the methods can potentially breach fairness and/or privacy. Valamis cites the example of applying prescriptive analysis to student data. Current methods of prescriptive analysis can already interpret student data and make predictions on their future successes. There are ethical issues surrounding this. Do the students consent to this type of use of their data? Who has access to it?
The solution, Valamis proposes, is a robust data governance strategy plus a validation process for the prescriptive models. Predictive Analysis Today defines a data governance strategy as a business's way of defining how data is named, stored, processed, and shared. Crucial to data governance is the set of rules governing how data is handled by applications and by staff members; these rules must not only be written down on paper but also enforced in code. The resulting system should make the data easily accessible to all authorized users while preventing it from leaking to unauthorized ones.
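As a toy illustration of enforcing a governance rule in code, here is a hypothetical sketch (the dataset name, roles, and policy are all made up) that lets only authorized roles read a sensitive dataset:

```python
# Hypothetical access policy: which roles may read each sensitive dataset
ACCESS_POLICY = {
    "student_records": {"registrar", "advisor"},
}

def read_dataset(dataset_name, user_role):
    """Return the dataset contents only if the role is authorized."""
    allowed_roles = ACCESS_POLICY.get(dataset_name, set())
    if user_role not in allowed_roles:
        raise PermissionError(f"role '{user_role}' may not access '{dataset_name}'")
    return f"contents of {dataset_name}"  # placeholder for the real data

print(read_dataset("student_records", "registrar"))  # allowed
# read_dataset("student_records", "marketing")       # raises PermissionError
```

Real systems layer audit logs, encryption, and consent tracking on top of checks like this, but the principle is the same: the written policy and the enforced policy are one and the same artifact.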
Validation is another important process, especially for prescriptive models. Validation ensures that a model produces output as close as possible to what would happen in reality. It is done by feeding the model historical data and checking its output against the actual results; if they are close, the model can be trusted to work as expected.
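Here is a minimal Python sketch of that validation loop, using a hypothetical already-fitted model and made-up historical outcomes, and accepting the model only if its mean absolute error stays within a tolerance:

```python
def model(x):
    # A previously fitted model (hypothetical): y = 2x + 1
    return 2 * x + 1

# Hypothetical historical data: known inputs and the outcomes actually observed
historical_inputs = [1, 2, 3, 4, 5]
actual_outcomes = [3.1, 4.9, 7.2, 9.0, 10.8]

predictions = [model(x) for x in historical_inputs]

# Mean absolute error: average gap between prediction and reality
mae = sum(abs(p - a) for p, a in zip(predictions, actual_outcomes)) / len(predictions)

# Accept the model only if it tracks history closely enough
tolerance = 0.5
model_is_valid = mae <= tolerance
print(f"MAE = {mae:.2f}, valid: {model_is_valid}")
```

The tolerance here is arbitrary; in practice it would be set by the business cost of a wrong prescription.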
To summarize, here are the five types of analysis that we can do:
Descriptive analysis describes the basic features of your data, without making any inferences or predictions.
Inferential analysis draws conclusions about a larger population by analyzing a sample: estimating its parameters, finding relationships between two or more variables, and/or testing hypotheses about it.
Diagnostic analysis seeks to uncover previously unknown patterns and relationships to determine what led a certain event to happen.
Predictive analysis processes historical data to forecast what will most likely happen in the future.
Prescriptive analysis aims to prescribe the best course of action using all the data and insights available, while accounting for the uncertainty inherent in all data.
There are more types of analysis, but these five are what a business must master in order to enhance its growth.
You can skip all the complicated math and statistics stuff involved in all of these analyses by considering our app Lido. Not only does it have integrations with several e-Commerce and marketing platforms, it also has its own analysis tools so that you don’t go through the hassle of doing the long math and stat processes and instead skip to the finish line of actual decision-making. Interested? Sign up for free.