## Find out why Welch's t-test is the go-to method for accurate statistical comparison, even when variances differ.

## Part 1: Background

During the first semester of my third cycle, I had the opportunity to take the course STAT7055: Introduction to Statistics for Business and Finance. I felt a little exhausted at times, but the knowledge I gained about applying statistical methods in different situations was invaluable. In the 8th week, one topic really caught my attention: hypothesis testing for comparing two populations. I found it fascinating to learn how the approach differs depending on whether the samples are independent or matched, what to do when the variances of the two populations are known or unknown, and how to test hypotheses for two proportions. However, one case was not covered in the course, and it made me wonder how to approach it: testing hypotheses about two population means when the variances are unequal. The test for this case is known as **Welch's t-test**.

To understand how Welch's t-test is applied, we can work through an example case. Every step below uses a dataset built from real-world data.

## Part 2: The dataset

The dataset I use contains actual World Agricultural Supply and Demand Estimates (WASDE) data and is updated regularly. WASDE is compiled by the World Agricultural Outlook Board (WAOB). It is a monthly report that provides annual forecasts for the United States and various regions of the world for wheat, rice, coarse grains, oilseeds, and cotton. It also covers forecasts for sugar, meat, poultry, eggs, and milk in the United States. The data comes from the Nasdaq website and can be accessed for free here: WASDE dataset. There are 3 datasets, but I only use the first one, which holds the supply and demand data. The column definitions can be viewed here:

I will use two different samples from specific regions, products and items to simplify the testing process. Additionally, we will use the R programming language for the end-to-end procedure.

Now let's do some good data preparation:

```r
library(dplyr)

# Read and preprocess the dataframe
wasde_data <- read.csv("wasde_data.csv") %>%
  select(-min_value, -max_value, -year, -period) %>%
  filter(item == "Production", commodity == "Wheat")

# Filter data for Argentina and Australia
wasde_argentina <- wasde_data %>%
  filter(region == "Argentina") %>%
  arrange(desc(report_month))

wasde_oz <- wasde_data %>%
  filter(region == "Australia") %>%
  arrange(desc(report_month))
```

I split the data into two samples, one per region: Argentina and Australia. In both, the focus is on wheat production.

Now we are ready. But wait..

Before delving deeper into the application of Welch's t-test, I can't help but wonder why it is necessary to test whether the two population variances are equal or not.

## Part 3: Testing for Equality of Variances

When testing hypotheses to compare two population means without knowing the population variances, it is crucial to check whether the variances are equal in order to select the appropriate statistical test. If the variances turn out to be equal, we opt for the pooled-variance t-test; otherwise, we use Welch's t-test. This step matters because using the wrong test raises the risk of Type I and Type II errors and can therefore lead to erroneous conclusions. By checking for equality of variances, we make sure the hypothesis test rests on valid assumptions, which leads to more reliable conclusions.
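To make the risk concrete, here is a minimal simulation sketch. It does not use the WASDE data: the sample sizes, standard deviations, and number of repetitions are assumptions chosen purely for illustration. Both simulated populations have the same mean, so every rejection is a false positive, and we can compare how often the pooled test and Welch's test reject at the 5% level.

```r
# Simulated illustration (assumed values, not WASDE data).
# Both populations share the same mean, so each rejection is a Type I error.
set.seed(42)
n_sims <- 2000
alpha <- 0.05

simulate_p_values <- function() {
  x <- rnorm(10, mean = 0, sd = 5)  # small sample, large variance
  y <- rnorm(40, mean = 0, sd = 1)  # large sample, small variance
  c(
    pooled = t.test(x, y, var.equal = TRUE)$p.value,
    welch  = t.test(x, y, var.equal = FALSE)$p.value
  )
}

# 2 x n_sims matrix: one row per test, one column per simulated experiment
p_values <- replicate(n_sims, simulate_p_values())

# Share of experiments in which each test (wrongly) rejects H0
type1_rates <- rowMeans(p_values < alpha)
print(round(type1_rates, 3))
```

In this setup the pooled test should reject far more often than the nominal 5%, while Welch's test should stay close to it. The gap is largest when the smaller sample has the larger variance, which is the situation simulated here.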

So how do we test the two population variances?

We need to set up two hypotheses:

$$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_1: \sigma_1^2 \neq \sigma_2^2$$

The basic rule is very simple:

- If the test statistic falls in the rejection region, we reject H0, the null hypothesis.
- Otherwise, we fail to reject H0.

We can pose the hypotheses like this:

```r
# Hypotheses: Variance Comparison
h0_variance <- "Population variance of Wheat production in Argentina equals that in Australia"
h1_variance <- "Population variance of Wheat production in Argentina differs from that in Australia"
```

Now we need a test statistic. How do we get it? We use the **F-test**.

An F-test is any statistical test whose test statistic follows an F-distribution under the null hypothesis. Here we use it to compare the variances of two samples through their ratio: the test statistic, the random variable F, has an F-distribution when the null hypothesis is true and the usual assumptions hold (in particular, that both samples come from normally distributed populations).

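We obtain the test statistic by dividing one **sample variance** by the other:

$$F = \frac{s_1^2}{s_2^2}$$

and the rejection region is:

$$F < F_{1-\alpha/2,\; n_1-1,\; n_2-1} \quad \text{or} \quad F > F_{\alpha/2,\; n_1-1,\; n_2-1}$$

where $n_1$ and $n_2$ are the sample sizes, $\alpha$ is the significance level, and $F_{p,\; df_1,\; df_2}$ is the value that an F-distributed variable with $df_1$ and $df_2$ degrees of freedom exceeds with probability $p$. Thus, when the F value falls into either of these rejection regions, we reject the null hypothesis.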

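But..

The trick is this: the labeling of sample 1 and sample 2 is arbitrary, so **make sure you always put the larger sample variance in the numerator**. That way the F statistic is always at least 1, and **it is enough to compare it with the upper critical value to reject H0 at significance level α**.

We can compute the sample variances and the F statistic like this. The code keeps Argentina's variance in the numerator, so the decision rule further down checks both critical values, which leads to the same decision: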

```r
# Calculate sample variances
sample_var_argentina <- var(wasde_argentina$value)
sample_var_oz <- var(wasde_oz$value)

# Calculate F calculated value
f_calculated <- sample_var_argentina / sample_var_oz
```
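As a sketch of the "larger variance on top" trick, the snippet below uses two simulated samples (the sizes, means, and standard deviations are assumptions for illustration, not WASDE values). It swaps the samples when needed so that the F statistic is at least 1, and then compares it only with the upper critical value. Note that the degrees of freedom have to be swapped together with the variances.

```r
# Hypothetical samples standing in for the two regions
set.seed(1)
x <- rnorm(30, mean = 50, sd = 10)
y <- rnorm(25, mean = 50, sd = 4)

var_x <- var(x)
var_y <- var(y)

# Put the larger sample variance in the numerator and keep the
# degrees of freedom in the same order as the variances
if (var_x >= var_y) {
  f_stat <- var_x / var_y
  df_num <- length(x) - 1
  df_den <- length(y) - 1
} else {
  f_stat <- var_y / var_x
  df_num <- length(y) - 1
  df_den <- length(x) - 1
}

alpha <- 0.05
f_upper <- qf(1 - alpha / 2, df_num, df_den)  # still alpha / 2: the test is two-sided

reject_h0 <- f_stat > f_upper
cat("F =", round(f_stat, 3), "| upper critical value =", round(f_upper, 3),
    "| reject H0:", reject_h0, "\n")
```

The upper critical value still uses α/2 because the hypothesis itself is two-sided; ordering the variances only removes the need to look up the lower critical value.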

We will use a significance level of 5% (0.05), so the decision rule is:

```r
# Define significance level and degrees of freedom
alpha <- 0.05
alpha_half <- alpha / 2
n1 <- nrow(wasde_argentina)
n2 <- nrow(wasde_oz)
df1 <- n1 - 1
df2 <- n2 - 1

# Calculate critical F values
f_value_lower <- qf(alpha_half, df1, df2)
f_value_upper <- qf(1 - alpha_half, df1, df2)

# Variance comparison result
if (f_calculated > f_value_lower && f_calculated < f_value_upper) {
  cat("Fail to Reject H0: ", h0_variance, "\n")
  equal_variances <- TRUE
} else {
  cat("Reject H0: ", h1_variance, "\n")
  equal_variances <- FALSE
}
```

The result is that **we reject the null hypothesis at the 5% significance level**. In other words, this test gives us evidence that the variances of the two populations are not equal. We now know why we should use Welch's t-test instead of the pooled-variance t-test.
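As a cross-check on the manual calculation, base R ships `var.test()` in the `stats` package, which runs the same two-sample F-test and also reports a p-value. The sketch below uses simulated samples (assumed values, not the WASDE data) to show that the built-in statistic and degrees of freedom match the hand-computed ones.

```r
# Hypothetical samples; with the real data, x and y would be the two value columns
set.seed(7)
x <- rnorm(20, mean = 0, sd = 3)
y <- rnorm(20, mean = 0, sd = 1)

# Manual F statistic: ratio of the two sample variances
f_manual <- var(x) / var(y)

# Built-in two-sided F-test for equality of variances
result <- var.test(x, y, alternative = "two.sided")

cat("Manual F:   ", f_manual, "\n")
cat("var.test F: ", unname(result$statistic), "\n")
cat("df:         ", unname(result$parameter), "\n")
cat("p-value:    ", result$p.value, "\n")
```

Rejecting H0 when the p-value is below α is equivalent to the statistic falling outside the two critical values, so both routes lead to the same decision.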

## Part 4: The main course, Welch t-Test

Welch's t-test, also called the unequal variances t-test, is a statistical method for comparing the means of two independent samples. Unlike the standard pooled-variance t-test, it does not assume that the two populations have equal variances. Instead, it estimates each variance separately and adjusts the degrees of freedom accordingly, which gives a more accurate evaluation of the difference between the two sample means when the variances differ. This makes Welch's t-test more robust on real-world data, where the equal-variance assumption often does not hold, so conclusions drawn from the statistical analyses remain valid even if that assumption is not met.

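The test statistic is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

where $\bar{x}_i$, $s_i^2$, and $n_i$ are the sample mean, sample variance, and sample size of sample $i$,

and the degrees of freedom are defined as:

$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{s_1^4}{n_1^2 (n_1 - 1)} + \dfrac{s_2^4}{n_2^2 (n_2 - 1)}}$$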
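The degrees of freedom are the least intuitive part, so here is a small sketch that wraps the calculation in a helper function. The function name `welch_df` and the summary statistics passed to it are hypothetical, made up for illustration only.

```r
# Welch-Satterthwaite degrees of freedom from sample variances and sizes.
# welch_df is a hypothetical helper name, not part of base R.
welch_df <- function(var1, n1, var2, n2) {
  v1 <- var1 / n1  # squared standard error contributed by sample 1
  v2 <- var2 / n2  # squared standard error contributed by sample 2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

# Made-up summary statistics: unequal variances and unequal sample sizes
nu_unequal <- welch_df(var1 = 16, n1 = 12, var2 = 4, n2 = 20)

# Equal variances and equal sample sizes: the result matches the
# pooled-test degrees of freedom, n1 + n2 - 2
nu_equal <- welch_df(var1 = 5, n1 = 10, var2 = 5, n2 = 10)

cat("Unequal case:", nu_unequal, "\n")
cat("Equal case:  ", nu_equal, "\n")
```

The result is generally not an integer, and it always lies between the smaller of n1 − 1 and n2 − 1 and the pooled value n1 + n2 − 2. The more the variances and sample sizes differ, the further it drops below the pooled value.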

The rejection region of the Welch t-test depends on the significance level chosen and whether the test is one-sided or two-sided.

**Two-sided test**: The null hypothesis is rejected if the absolute value of the test statistic, $|t|$, is greater than the critical value of the t-distribution with $\nu$ degrees of freedom at $\alpha/2$.

**One-sided test**: The null hypothesis is rejected if the test statistic $t$ is greater than the critical value of the t-distribution with $\nu$ degrees of freedom at $\alpha$ for an upper-tail test, or if $t$ is less than the negative critical value for a lower-tail test.

- **Upper-tail test**: $t > t_{\alpha,\nu}$
- **Lower-tail test**: $t < -t_{\alpha,\nu}$
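The three rejection rules can be sketched as one small helper. The function name `welch_reject` and the example numbers are hypothetical; only `qt()` and `match.arg()` come from base R.

```r
# Decide whether to reject H0 given a Welch t statistic and its degrees of freedom.
# welch_reject is a hypothetical helper written for this illustration.
welch_reject <- function(t_stat, nu, alpha = 0.05,
                         alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    two.sided = abs(t_stat) > qt(1 - alpha / 2, df = nu),  # |t| > t_{alpha/2, nu}
    greater   = t_stat > qt(1 - alpha, df = nu),           # t > t_{alpha, nu}
    less      = t_stat < -qt(1 - alpha, df = nu)           # t < -t_{alpha, nu}
  )
}

nu <- 14.4  # made-up degrees of freedom; qt() accepts non-integer values

# The two-sided critical value is larger than the one-sided one
t_crit_two_sided <- qt(1 - 0.05 / 2, df = nu)
t_crit_one_sided <- qt(1 - 0.05, df = nu)
cat("Two-sided critical value:", t_crit_two_sided, "\n")
cat("One-sided critical value:", t_crit_one_sided, "\n")

# The same statistic can lead to different decisions depending on the alternative
cat("t = -3, two-sided: ", welch_reject(-3, nu, alternative = "two.sided"), "\n")
cat("t = -3, upper-tail:", welch_reject(-3, nu, alternative = "greater"), "\n")
cat("t = -3, lower-tail:", welch_reject(-3, nu, alternative = "less"), "\n")
```

A strongly negative statistic is rejected by the two-sided and lower-tail rules but not by the upper-tail rule, which is why the direction of the alternative hypothesis has to be fixed before looking at the data.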

So let's work through an example with a **one-tailed Welch's t-test**.

First, state the hypotheses:

```r
h0_mean <- "Population mean of Wheat production in Argentina equals that in Australia"
h1_mean <- "Population mean of Wheat production in Argentina is greater than that in Australia"
```

It's an **upper-tail test**, so the rejection region is $t > t_{\alpha,\nu}$.

Using the formulas given above and the same significance level (0.05):

```r
# Calculate sample means
sample_mean_argentina <- mean(wasde_argentina$value)
sample_mean_oz <- mean(wasde_oz$value)

# Welch's t-test (unequal variances)
# Note: s1 and s2 hold the sample variances, not the standard deviations
s1 <- sample_var_argentina
s2 <- sample_var_oz
t_calculated <- (sample_mean_argentina - sample_mean_oz) / sqrt(s1/n1 + s2/n2)
df <- (s1/n1 + s2/n2)^2 / ((s1^2/(n1^2 * (n1-1))) + (s2^2/(n2^2 * (n2-1))))
t_value <- qt(1 - alpha, df)

# Mean comparison result
if (t_calculated > t_value) {
  cat("Reject H0: ", h1_mean, "\n")
} else {
  cat("Fail to Reject H0: ", h0_mean, "\n")
}
```

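The result is that **we fail to reject H0 at the 5% significance level, so there is not enough evidence to conclude that the population mean of wheat production in Argentina is greater than that of Australia.** Note that failing to reject H0 does not prove the two means are equal; it only means the data do not contradict it strongly enough.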
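The manual steps can be cross-checked against base R's `t.test()`, which runs Welch's test when `var.equal = FALSE` (its default). The sketch below uses simulated samples with assumed sizes, means, and standard deviations, not the WASDE data, and confirms that the hand-computed statistic, degrees of freedom, and one-sided p-value agree with the built-in function.

```r
# Hypothetical samples; with the real data these would be the two value columns
set.seed(123)
x <- rnorm(15, mean = 12, sd = 4)
y <- rnorm(22, mean = 11, sd = 1.5)

# Manual Welch calculation
v1 <- var(x) / length(x)
v2 <- var(y) / length(y)
t_manual  <- (mean(x) - mean(y)) / sqrt(v1 + v2)
df_manual <- (v1 + v2)^2 / (v1^2 / (length(x) - 1) + v2^2 / (length(y) - 1))
p_manual  <- pt(t_manual, df = df_manual, lower.tail = FALSE)  # upper-tail p-value

# Built-in Welch test with the same upper-tail alternative
welch <- t.test(x, y, alternative = "greater", var.equal = FALSE)

cat("t:       manual =", t_manual,  "| t.test =", unname(welch$statistic), "\n")
cat("df:      manual =", df_manual, "| t.test =", unname(welch$parameter), "\n")
cat("p-value: manual =", p_manual,  "| t.test =", welch$p.value, "\n")
```

Comparing the p-value with α gives the same decision as comparing the statistic with the critical value: the p-value is below 0.05 exactly when t exceeds `qt(0.95, df)`.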

This is how to perform the Welch t-test. Now your turn. Happy experimenting!

## Part 5: Conclusion

When comparing two population means in a hypothesis test, it is very important to first check whether the variances are equal. This first step is crucial because it determines which statistical test to use. If the variances turn out to be equal, you can go ahead and apply the standard t-test with pooled variances. If they are not equal, it is recommended to use Welch's t-test.

Welch's t-test provides an effective solution for comparing means when the assumption of equality of variances is not true. By adjusting the degrees of freedom to account for unequal variances, Welch's t-test provides a more accurate and reliable assessment of the statistical significance of the difference between the means of two samples. This adaptability makes it a popular choice in various practical situations where sample sizes and variances can vary significantly.

In conclusion, checking the equality of variances and using Welch's t-test if necessary ensures the accuracy of hypothesis testing. This approach reduces the risk of Type I and Type II errors, leading to more reliable conclusions. By selecting the appropriate test based on equality of variances, we can analyze the results with confidence and make informed decisions based on empirical evidence.