If you have read my previous article on statistics, you will have come across the term “Hypothesis Testing” under Inferential Statistics.
Hypothesis testing is a statistical technique in which an analyst tests a hypothesis about a population parameter. In simpler terms, it’s a statistical test that uses a sample of data to judge whether a claim is plausible for the full population. It is often used to compare two or more groups. The test provides evidence for or against the hypothesis; it does not prove it.
In hypothesis testing, a data analyst measures and examines a random sample drawn from the population being studied. The analyst uses that sample to test two hypotheses, which I’ll go over in more detail later in this article.
A population is the group of items that we want to examine or evaluate. It is the complete set of possible observations. For example, if we are researching the availability of hospitals in a particular location, the population will consist of all of the hospitals in that area.
A parameter is any summary figure, such as an average or proportion, that describes the whole population. Two common population parameters are the population mean (μ) and the population proportion (p). For example, suppose we wish to know the average height of the Chinese population. That average is a parameter because it describes the entire population of China.
A statistic is any summary figure computed from a sample, such as a sample mean or sample proportion. It is used to estimate the corresponding population parameter.
A sampling distribution shows every value a statistic can take across all possible samples of a given size from a population. It also shows how frequently each value occurs.
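To make the idea concrete, here is a minimal sketch that lists the sampling distribution of the sample mean for a tiny, made-up population. The population values and the sample size of 2 are assumptions chosen only so that every possible sample can be enumerated.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical tiny population, chosen only so that every sample can be listed.
population = [2, 4, 6, 8]
sample_size = 2

# Every possible sample of size 2 drawn without replacement.
samples = list(combinations(population, sample_size))

# The statistic of interest here is the sample mean.
sample_means = [mean(s) for s in samples]

# Sampling distribution: each possible value of the statistic and how often it occurs.
sampling_distribution = Counter(sample_means)

for value in sorted(sampling_distribution):
    print(value, sampling_distribution[value])
```

With a real population there are far too many samples to list, so the sampling distribution is usually approximated by theory or by simulation.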
The standard error (SE) of a statistic is the standard deviation of its sampling distribution. For the sample mean, it measures how much a sample’s mean can be expected to differ from the population mean. It is commonly reported alongside sample estimates and is used to explain the results of a statistical analysis.
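For the sample mean, the standard error is usually estimated as the sample standard deviation divided by the square root of the sample size. The sketch below applies that formula to a handful of made-up height measurements; the values are illustrative assumptions, not real data.

```python
import math
import statistics

# Hypothetical sample of heights in centimetres (illustrative values only).
sample = [170.0, 165.0, 180.0, 175.0, 160.0]

n = len(sample)
sample_sd = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
standard_error = sample_sd / math.sqrt(n)   # estimated standard error of the mean

print(round(standard_error, 3))
```

Because the sample size sits under a square root in the denominator, quadrupling the sample size only halves the standard error.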
The null hypothesis is a general claim that there is no relationship between two variables, or no difference between groups. It is denoted H₀, which we can pronounce H-null, H-zero, or H-naught. Because it states that there is no effect, it is usually written with an ‘equals to’ sign. A test can either reject the null hypothesis or fail to reject it; failing to reject it does not prove that it is true. If we fail to reject the null hypothesis, we simply keep the current assumption.
The alternative hypothesis is also called the research hypothesis. It states that there is a relationship between two variables, or a difference between groups. It is denoted H₁ and is, as the name suggests, the alternative to the null hypothesis. An example of an alternative hypothesis is the claim that one of two groups being compared scores higher or lower than the other.
A one-tailed test is a statistical hypothesis test in which the alternative hypothesis specifies a single direction. It asks whether the population parameter is greater than the hypothesized value, or whether it is less than it, but not both. When performing a one-tailed test, the analyst is looking for evidence of an effect in only one direction. Before conducting the test, the analyst must state a null hypothesis, an alternative hypothesis, and a significance level, which is the threshold the p-value will be compared against.
We use a two-tailed test when the alternative hypothesis does not specify a direction. In other words, the rejection region of a two-tailed test is split across both tails of the distribution. Here, the analyst is looking for evidence of an effect in either direction and not just one.
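The difference between the two kinds of test shows up in how the p-value is computed from the same test statistic. The sketch below assumes the statistic follows a standard normal distribution under the null hypothesis (a z-test); the function name `p_value`, the tail labels, and the value 1.8 are all made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def p_value(z, tail):
    """Return the p-value of a z statistic for the chosen alternative."""
    if tail == "greater":      # H1: parameter is larger than the null value
        return 1 - standard_normal.cdf(z)
    if tail == "less":         # H1: parameter is smaller than the null value
        return standard_normal.cdf(z)
    if tail == "two-sided":    # H1: parameter differs in either direction
        return 2 * (1 - standard_normal.cdf(abs(z)))
    raise ValueError(f"unknown tail: {tail}")

z = 1.8  # hypothetical test statistic
for tail in ("greater", "less", "two-sided"):
    print(tail, round(p_value(z, tail), 4))
```

For this value, the upper-tailed p-value falls below 0.05 while the two-sided one does not, which is why the direction of the test should be chosen before looking at the data.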
A test statistic is a numerical value computed from the sample data during a hypothesis test. It summarizes how far the sample is from what the null hypothesis predicts, and it is the basis for deciding whether or not to reject the null hypothesis. The larger the test statistic is in absolute value, the smaller the p-value, and the stronger the evidence against the null hypothesis. Conversely, a test statistic close to zero gives a large p-value, and we fail to reject the null hypothesis.
The p-value is the probability of getting test results at least as extreme as those observed, on the assumption that the null hypothesis is true. It measures how easily a discrepancy of that size could have occurred by chance alone. P-value tables or statistical software are used to calculate p-values. The lower the p-value, the stronger the evidence against the null hypothesis, and the more likely it is to be rejected.
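The relationship between the test statistic and the p-value can be sketched with a one-sample z-test. The setup here is an assumption for illustration: a null mean of 100, a known population standard deviation of 15, and a sample of 36; the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, null_mean, population_sd, n):
    """Distance between the sample mean and the null value, in standard errors."""
    return (sample_mean - null_mean) / (population_sd / math.sqrt(n))

def two_sided_p_value(z):
    """Probability of a statistic at least as extreme as z when H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical setup: H0 says the mean is 100, the population SD is 15, n = 36.
for sample_mean in (101, 103, 105, 107):
    z = z_statistic(sample_mean, null_mean=100, population_sd=15, n=36)
    print(sample_mean, round(z, 2), round(two_sided_p_value(z), 4))
```

As the sample mean moves further from the null value, the z statistic grows and the p-value shrinks.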
Assume you decide to get tested for Coronavirus because you’ve noticed a few symptoms of the condition.
Two kinds of error are possible:
A Type I error occurs when a false positive result is obtained. In the test above, that would be a positive result even though you are not infected. As another example, you may assume that the medicine you took improved your health when, in reality, it did not; factors other than the medicine improved your health.
Such an error occurs when the null hypothesis is rejected even though it is actually true. It means believing that results are statistically significant when they arose purely by chance. The significance level, or alpha (α), is the probability of making a Type I error. The significance level is often set at 0.05, or 5%. This means that if the null hypothesis is true, there is a 5% chance of rejecting it by mistake.
A Type II error occurs when a false negative result is obtained. It occurs when the null hypothesis is not rejected even though it is actually false. Beta (β) denotes the probability of committing a Type II error. For example, you may assume that the medication you took had no effect on your health when, in fact, it did. In other words, a Type II error means failing to detect an effect when one exists. The probability of this error is tied to a study’s statistical power, which equals 1 − β. The smaller the probability of committing this error, the greater the statistical power.
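The link between alpha, beta, and power can be illustrated with a one-sided z-test, where power has a closed form. This is only a sketch under assumed values: a null mean of 100, a true mean of 105, and a known standard deviation of 15. The function name `power_one_sided_z` is made up for this example.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

def power_one_sided_z(true_mean, null_mean, population_sd, n, alpha=0.05):
    """Power of a one-sided (upper-tail) z-test of H0: mean = null_mean."""
    critical_z = std_normal.inv_cdf(1 - alpha)      # rejection cutoff under H0
    shift = (true_mean - null_mean) / (population_sd / math.sqrt(n))
    return 1 - std_normal.cdf(critical_z - shift)   # P(reject H0 | true_mean)

# Hypothetical scenario: H0 mean 100, true mean 105, SD 15.
for n in (10, 36, 100):
    power = power_one_sided_z(true_mean=105, null_mean=100, population_sd=15, n=n)
    beta = 1 - power  # probability of a Type II error
    print(n, round(power, 3), round(beta, 3))
```

Larger samples raise power and lower beta. When the true mean equals the null mean, the function returns alpha itself, which is the Type I error rate.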
Hypotheses come in different forms, depending on what they claim about the data being examined. We have already discussed both null and alternative hypotheses.
Other types of hypotheses include the following:
A non-directional hypothesis states that there is a relationship between two variables without specifying its direction. It simply indicates that the two variables are linked, without saying whether the effect is positive or negative.
In contrast, a directional hypothesis specifies the direction of the effect between two variables. Here, we state in advance whether the effect is expected to be positive or negative.
A statistical hypothesis is a hypothesis that can be tested using sample data and statistical methods. Such a test can support or reject the hypothesis, but it cannot prove it to be true.
The following are the steps:
Establishing both null and alternative hypotheses is the first step. It lays the groundwork for hypothesis testing. These hypotheses are important because they kick off the testing procedure, in which the analyst works with data samples. The analysis then helps in deciding whether to reject or fail to reject the null hypothesis.
After you’ve established your hypotheses, the next step is to devise a testing strategy. This involves deciding how to gather the data samples, which statistical test to use, and what significance level to apply.
The third stage is to examine the data samples in order to extract a pattern from them.
An analyst decides the following when examining the samples:
The final step is to interpret the outcome of the analysis: the results indicate whether the null hypothesis should be rejected in favor of the alternative hypothesis, or whether we fail to reject it.
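The steps above can be sketched end to end with a two-sided one-sample z-test. Everything specific here is an assumption for illustration: the null mean of 50, the known population standard deviation of 4, the sample values, and the helper name `one_sample_z_test`. In practice the choice of test depends on the data and the question.

```python
import math
from statistics import NormalDist, mean

def one_sample_z_test(sample, null_mean, population_sd, alpha=0.05):
    """Two-sided one-sample z-test; returns (z, p_value, reject_null)."""
    n = len(sample)
    z = (mean(sample) - null_mean) / (population_sd / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Step 1: state the hypotheses. H0: mean = 50, H1: mean != 50.
null_mean = 50

# Step 2: plan the test. Significance level 0.05, z-test with an assumed known SD of 4.
alpha = 0.05
population_sd = 4

# Step 3: analyse the sample (hypothetical measurements).
sample = [52, 55, 54, 53, 54, 51, 56, 50, 53, 52]
z, p_value, reject_null = one_sample_z_test(sample, null_mean, population_sd, alpha)

# Step 4: interpret the result.
decision = "reject H0" if reject_null else "fail to reject H0"
print(round(z, 2), round(p_value, 4), decision)
```

Here the p-value is below the chosen significance level, so the null hypothesis is rejected. With a p-value above it, the conclusion would be "fail to reject H0" rather than "accept H0".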