In my previous article, I walked you through everything statistics-related. This article, on the other hand, will provide you with a better understanding of descriptive and inferential statistics in particular.
Descriptive statistics describe the essential aspects of a study’s information. They provide concise summaries of the sample and metrics. Descriptive statistics, as the name implies, are mostly descriptive. They don’t include extrapolating beyond the current set of facts.
One example would be looking at a list of the presidents who governed a specific country at different periods. In this case, the descriptive statistic would simply indicate which president governed in which years.
Descriptive statistics convey quantitative data in a logical and understandable manner. We might have a set of measurements in a research project, or we might be evaluating a large population. Descriptive statistics help us make sense of massive amounts of data: each descriptive statistic condenses a large number of observations into a concise summary. When attempting to describe a large number of observations with a single indicator, you run the risk of distorting the data or losing vital information. Despite these drawbacks, descriptive statistics provide useful summaries that help in making comparisons between individuals or other units.
Measures of Descriptive Statistics
Descriptive statistics are divided into four categories of measures: distribution, central tendency, dispersion, and position.
A distribution summarizes the frequency of specific values of a variable. The most basic distribution would list each value of the variable along with the number of people who had that value, demonstrating how frequently each value occurs. Frequency distributions are generally represented as a table or a graph. Percentages also help in depicting distributions.
Percentages help in discussing the following:
- Proportion of individuals in various economic brackets
- Percentage of individuals in certain age groups
- Proportion of persons in various standardized test score ranges
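As a minimal sketch, a frequency distribution with percentages can be computed with Python's standard library (the age groups below are made-up illustration data):

```python
from collections import Counter

# Hypothetical sample: the age group of each surveyed person
ages = ["18-25", "26-35", "18-25", "36-45", "26-35", "18-25"]

freq = Counter(ages)  # absolute frequency of each age group
total = sum(freq.values())
percentages = {group: 100 * count / total for group, count in freq.items()}

print(freq)         # counts per age group
print(percentages)  # share of each age group in percent
```

The same `Counter`-based approach works for any categorical variable, such as income brackets or test score ranges.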
A distribution’s central tendency is an estimate of the “center” of a set of values.
Estimates of central tendency can be divided into three categories: the mean, the median, and the mode.
The most widely used measure of central tendency is the mean, or average. To find the mean, add all of the values together and divide the sum by the number of values. The mean quiz score, for example, is calculated by adding all of the results and dividing by the number of students who took the test.
Let’s take a look at a different case. Assume you want to keep a monthly record of your school/college attendance. You would add up how many days you were present during the month and divide that number by the total number of days you were required to be at school/college during the month. This gives you the percentage of time you spent in class over the course of the month. To track attendance for the month of August, for example, divide the number of days present (20) by the number of days required (26) and multiply by 100, giving an attendance rate of about 76.92%.
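Both calculations are one-liners in Python; the quiz scores below are made-up illustration data:

```python
# Mean: add all the values, then divide by how many there are
scores = [70, 85, 90, 65, 80]
mean = sum(scores) / len(scores)
print(mean)  # 78.0

# Attendance example from the text: 20 days present out of 26 required
attendance_rate = 20 / 26 * 100
print(round(attendance_rate, 2))  # 76.92
```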
In a sorted list of numbers, the median is the number in the center. To find the median of a data set, sort the values from lowest to highest (or highest to lowest) and take the middle one. Although the median is a kind of average, it should not be confused with the mean.
In a data set of 1,5,7,8,2,3,4, for example, the sorted order becomes 1,2,3,4,5,7,8. The median is the middle number (1,2,3,4,5,7,8), which is 4 in this case because there are three numbers on each side.
We can’t always spot the median at a glance, though, so there is a formula for it. First count the total number of values, n. If n is odd, the median is the ((n + 1) / 2)th value in the sorted list. If n is even, the median is the average of the (n / 2)th and (n / 2 + 1)th values.
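That rule translates directly into a short helper (a sketch; Python's built-in `statistics.median` does the same job):

```python
def median(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the ((n + 1) / 2)th value, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 5, 7, 8, 2, 3, 4]))  # 4, as in the example above
```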
The most common number, or mode, is the number that appears the most frequently. The mode of the numbers 1,2,5,5,5,5,7,8 is 5 since it appears four times, which is more than any other number.
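The mode can likewise be found by counting occurrences, for example with `collections.Counter`:

```python
from collections import Counter

def mode(values):
    """Return the value that appears most frequently in the list."""
    counts = Counter(values)
    return counts.most_common(1)[0][0]

print(mode([1, 2, 5, 5, 5, 5, 7, 8]))  # 5
```

Note that if several values tie for the highest frequency, this sketch returns only one of them.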
Dispersion is the spread of data around the central tendency. The range and the standard deviation are two typical measures of dispersion. The range is calculated by subtracting the lowest value from the highest. The standard deviation measures how far a data set deviates from its mean: it is the square root of the variance, which is computed from each data point’s deviation from the mean. The farther the data points lie from the mean, the more variation there is within the data set; as a result, the larger the standard deviation, the more dispersed the data.
The formula for the sample standard deviation is:

s = √( Σ (xᵢ − x̅)² / (n − 1) )

where:
- xᵢ = the value of the ith point in the data set
- x̅ = the mean of the data set
- n = the total number of data points in the set
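The formula maps directly onto a few lines of Python (a sketch using the sample form with n − 1, which is what `statistics.stdev` also computes; the data set is made up):

```python
from math import sqrt

def sample_std(values):
    """Sample standard deviation: sqrt of the sum of squared deviations over n - 1."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_std(data), 3))
```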
Measures of position use percentile and quartile ranks. They specify how scores relate to one another, and they help compare scores against a standardized score.
A percentile is a metric that represents the percentage of total values that are equal to or less than that measure. Quartiles are numbers that split a data table into four groups, each with roughly the same amount of observations.
One common formula for the f-value (percentile fraction) of each value in a sorted data table is:

f = (i − 0.5) / n

where:
- i = the value’s index (its position in the sorted list)
- n = the number of values
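As a sketch, assuming the f = (i − 0.5)/n convention above, the f-value of every entry in a small made-up data set can be computed like this:

```python
def f_values(values):
    """Pair each sorted value with its f-value, f = (i - 0.5) / n (i counted from 1)."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i - 0.5) / n) for i, x in enumerate(ordered, start=1)]

for value, f in f_values([7, 1, 3, 5]):
    print(value, f)
```

With four values the f-values come out evenly spaced at 0.125, 0.375, 0.625, and 0.875, so quartile boundaries fall between them.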
Descriptive and inferential statistics are different, but both are important. Descriptive statistics merely summarize the features of a sample, whereas inferential statistics use your sample to make plausible estimates about the wider population.
It’s critical to adopt random and unbiased sampling procedures when working with inferential statistics. You can’t draw meaningful statistical judgments if your sample isn’t representative of your population.
Statistics and parameters are numbers that describe the characteristics of samples and populations. A statistic is a measure that describes a sample (e.g., the sample mean). A parameter is a measure that characterizes the entire population (e.g., the population mean).
Since the sample size is always less than the population size, part of the population goes unobserved in sample data. The difference between the true population values (parameters) and the measured sample values (statistics) is called sampling error.
There are two sorts of population estimates that you can create:
- Point estimates: A point estimate is a single value estimate of a parameter. A sample mean, for example, is a point estimate of a population mean.
- Interval estimates: An interval estimate provides you a range of possible values for the parameter. The most frequent form of interval estimator is a confidence interval.
A confidence interval is an interval estimate for a parameter based on the variability around a statistic. Confidence intervals are useful for estimating parameters because they account for sampling error: they indicate the degree of uncertainty in a point estimate, so the two are most informative when reported together. Each confidence interval has a confidence level, which tells you the likelihood (as a percentage) that the interval would contain the parameter if you repeated the study.
With a 95% confidence interval, for example, if you performed your study 100 times in precisely the same way with new samples, you would expect about 95 of the resulting intervals to contain the true parameter. Remember that the real value of a population parameter cannot be determined without data from the entire population; however, with random sampling and a large sample size, you can expect your confidence interval to contain the parameter the stated proportion of the time.
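A large-sample 95% confidence interval for a mean can be sketched as the point estimate plus or minus 1.96 standard errors (a z-interval; the sample data here is made up, and for small samples a t-based interval would be more appropriate):

```python
from math import sqrt

def confidence_interval_95(values):
    """Approximate 95% CI for the mean: mean +/- 1.96 * (sample std / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    margin = 1.96 * sqrt(variance / n)
    return mean - margin, mean + margin

low, high = confidence_interval_95([12, 15, 11, 14, 13, 16, 12, 14, 15, 13])
print(low, high)
```

The point estimate (the sample mean, 13.5 here) always sits at the center of the interval; the margin shrinks as the sample size grows.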
Hypothesis testing is a formal analytical procedure that employs inferential statistics. The objective of hypothesis testing is to use samples to compare populations or evaluate relationships between variables. Statistical tests make it possible to test hypotheses or predictions, and they also quantify sampling error, allowing for accurate conclusions.
Statistical tests can be parametric or non-parametric. Parametric tests are generally more statistically powerful.
Parametric tests make the following assumptions:
- The population from which the sample was taken has a normal distribution of scores.
- The sample size is sufficient to accurately reflect the population.
- Each group being compared has comparable variances, which is a measure of dispersion.
Non-parametric tests are more appropriate when your data violates any of these assumptions. Since they make no assumptions about the distribution of the population data, non-parametric tests are also called “distribution-free tests.”
There are three types of statistical tests:
- Comparison: Comparison tests assess whether there are differences in the means, medians, or rankings of two or more groups’ scores.
- Correlation: Correlation tests assist in assessing how closely two variables are related.
- Regression: Regression tests show whether or not the changes in predictor variables induce changes in an outcome variable. Depending on the quantity and types of variables you have as predictors and outcomes, you can choose which regression test to employ. The majority of widely used regression tests are parametric in nature.
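As one concrete example from the correlation family, the Pearson correlation coefficient (a parametric measure of how closely two variables are linearly related) can be sketched with the standard library; the two lists are made-up illustration data:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0: perfectly correlated
```

Values near +1 or −1 indicate a strong linear relationship; values near 0 indicate little or none.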
When deciding which test is best for you, consider whether your data fulfils the requirements for parametric testing, the number of samples, and the levels of measurement of your variables. In addition, if your data is not normally distributed, you can use data transformations to make it approximately normal. This is possible by applying mathematical operations such as taking the square root of each value.
To summarise, descriptive and inferential statistics are not the same: descriptive analysis focuses on describing the data at hand, whereas inferential analysis focuses on making predictions about a population from collected data. Even so, the two are closely interrelated.