Introduction to Statistical Tests

A Practical Guide with R and Python

1 Introduction

This document provides a practical guide to several fundamental statistical tests, demonstrating their implementation in both R and Python. We will cover the independent two-sample t-test, paired t-test, one-sample t-test, ANOVA, the chi-square test, and correlation tests (Pearson and Spearman). For each test, we explain the underlying theory, its assumptions, and how to interpret the results.

2 Independent Two-Sample t-Test

The independent two-sample t-test is used to determine whether there is a statistically significant difference between the means of two independent groups.

Null Hypothesis (H0): The means of the two groups are equal.
Alternative Hypothesis (H1): The means of the two groups are not equal.
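
Before running the built-in test, it helps to see what it actually computes. Below is a minimal Python sketch of Welch's t statistic and its degrees of freedom, written as a companion to the R example that follows; it uses the same class scores and reproduces the t = 4.44 and df = 9.98 reported in the output below.

Code
import numpy as np

# Same scores as the R example below
class_a = np.array([85, 88, 90, 85, 87, 91, 89, 100])
class_b = np.array([80, 82, 84, 79, 81, 83, 78])

m1, m2 = class_a.mean(), class_b.mean()
v1, v2 = class_a.var(ddof=1), class_b.var(ddof=1)   # sample variances
n1, n2 = len(class_a), len(class_b)

# Welch's t statistic: difference in means divided by its standard error
se = np.sqrt(v1 / n1 + v2 / n2)
t_stat = (m1 - m2) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df = (v1 / n1 + v2 / n2) ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

print(f"t = {t_stat:.4f}, df = {df:.4f}")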

Code
# Sample data for two independent groups
class_A <- c(85, 88, 90, 85, 87, 91, 89, 100)
class_B <- c(80, 82, 84, 79, 81, 83, 78)

# Calculate means and variances
mean_A <- mean(class_A)
mean_B <- mean(class_B)
var_A <- var(class_A)
var_B <- var(class_B)

cat(paste("Mean of Class A:", round(mean_A, 2), "\n"))
Mean of Class A: 89.38 
Code
cat(paste("Mean of Class B:", round(mean_B, 2), "\n"))
Mean of Class B: 81 
Code
cat(paste("Variance of Class A:", round(var_A, 2), "\n"))
Variance of Class A: 23.12 
Code
cat(paste("Variance of Class B:", round(var_B, 2), "\n"))
Variance of Class B: 4.67 
Code
# Perform the t-test
t_test_result <- t.test(class_A, class_B)
print(t_test_result)

    Welch Two Sample t-test

data:  class_A and class_B
t = 4.4404, df = 9.9817, p-value = 0.001259
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  4.171513 12.578487
sample estimates:
mean of x mean of y 
   89.375    81.000 

Interpretation:

  • p-value: The p-value is small (0.0013), which is less than the common alpha level of 0.05.
  • Conclusion: We reject the null hypothesis. There is a statistically significant difference between the mean scores of Class A and Class B.

2.1 Assumptions of the t-Test

  1. Normality: The data in each group should be approximately normally distributed.
  2. Independence: The two groups must be independent of each other.
  3. Equal Variances (Homogeneity of Variances): The variances of the two groups should be equal. Welch's t-test (the default in R) does not assume equal variances; see the sketch after this list.
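
A point worth flagging here: R's t.test() runs Welch's test unless var.equal = TRUE is supplied, whereas scipy's ttest_ind defaults to the pooled (equal-variance) Student's test unless equal_var=False is passed. A minimal sketch comparing the two, reusing the class scores from above:

Code
import numpy as np
from scipy.stats import ttest_ind

class_a = np.array([85, 88, 90, 85, 87, 91, 89, 100])
class_b = np.array([80, 82, 84, 79, 81, 83, 78])

# Welch's t-test: does not assume equal variances (R's default)
t_welch, p_welch = ttest_ind(class_a, class_b, equal_var=False)

# Pooled (Student's) t-test: assumes equal variances (scipy's default)
t_pooled, p_pooled = ttest_ind(class_a, class_b, equal_var=True)

print(f"Welch:  t = {t_welch:.4f}, p = {p_welch:.4f}")
print(f"Pooled: t = {t_pooled:.4f}, p = {p_pooled:.4f}")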

2.1.1 Normality Test (Shapiro-Wilk)

Code
shapiro.test(class_A)

    Shapiro-Wilk normality test

data:  class_A
W = 0.8249, p-value = 0.05252
Code
shapiro.test(class_B)

    Shapiro-Wilk normality test

data:  class_B
W = 0.978, p-value = 0.9493

Result: Both p-values are greater than 0.05, so we fail to reject the null hypothesis; the data appear approximately normally distributed. Note, however, that the p-value for Class A (0.053) is only marginally above the 0.05 threshold.

2.1.2 Equal Variance Test (Levene’s Test)

Code
# Combine data for Levene's test
score <- c(class_A, class_B)
group <- c(rep("A", length(class_A)), rep("B", length(class_B)))
data <- data.frame(score, group)

car::leveneTest(score ~ group, data = data)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.9926 0.3373
      13               

Result: The p-value is greater than 0.05, so we fail to reject the null hypothesis. The variances are assumed to be equal.

Code
import numpy as np
from scipy.stats import ttest_ind

# Sample data
group_a_scores = np.array([88, 92, 85, 91, 87])
group_b_scores = np.array([78, 75, 80, 73, 77])
Code
# Perform Independent Two-Sample t-Test
t_stat, p_value = ttest_ind(group_a_scores, group_b_scores)

print(f"T-statistic: {t_stat:.4f}")
T-statistic: 6.7937
Code
print(f"P-value: {p_value:.4f}")
P-value: 0.0001

Interpretation:

  • p-value: The p-value is very small (0.0001), which is less than 0.05.
  • Conclusion: We reject the null hypothesis. There is a statistically significant difference between the means of the two groups.
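
The R section above also checked the normality and equal-variance assumptions; the same checks can be sketched in Python with scipy. This is a minimal sketch using the same group_a_scores and group_b_scores arrays:

Code
import numpy as np
from scipy.stats import shapiro, levene

# Same arrays as above
group_a_scores = np.array([88, 92, 85, 91, 87])
group_b_scores = np.array([78, 75, 80, 73, 77])

# Shapiro-Wilk normality test for each group
w_a, p_a = shapiro(group_a_scores)
w_b, p_b = shapiro(group_b_scores)
print(f"Shapiro-Wilk, group A: W = {w_a:.4f}, p = {p_a:.4f}")
print(f"Shapiro-Wilk, group B: W = {w_b:.4f}, p = {p_b:.4f}")

# Levene's test for equal variances (median-centered, like car::leveneTest)
stat, p_lev = levene(group_a_scores, group_b_scores, center='median')
print(f"Levene: statistic = {stat:.4f}, p = {p_lev:.4f}")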

3 Paired t-Test

The paired t-test is used to compare the means of two related groups to determine if there is a statistically significant difference between them.

Null Hypothesis (H0): The true mean difference between the paired samples is zero.
Alternative Hypothesis (H1): The true mean difference is not zero.
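
Equivalently, the paired t-test is a one-sample t-test applied to the per-pair differences, with a hypothesized mean of zero. Below is a minimal Python sketch of that equivalence, using the same measurements as the R example that follows; it reproduces the t = -9 reported there.

Code
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

# Same measurements as the R example below
before = np.array([100, 102, 98, 95, 101])
after = np.array([102, 104, 99, 97, 103])

# Paired t-test on the two related samples
t_paired, p_paired = ttest_rel(before, after)

# Equivalent: one-sample t-test on the differences against a mean of zero
t_diff, p_diff = ttest_1samp(before - after, 0)

print(f"Paired t-test:             t = {t_paired:.4f}, p = {p_paired:.4f}")
print(f"One-sample on differences: t = {t_diff:.4f}, p = {p_diff:.4f}")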

Code
# Sample data
before <- c(100, 102, 98, 95, 101)
after <- c(102, 104, 99, 97, 103)

# Calculate means
mean_before <- mean(before)
mean_after <- mean(after)

cat(paste("Mean before:", round(mean_before, 2), "\n"))
Mean before: 99.2 
Code
cat(paste("Mean after:", round(mean_after, 2), "\n"))
Mean after: 101 
Code
# Perform the paired t-test
t_test_paired <- t.test(before, after, paired = TRUE)
print(t_test_paired)

    Paired t-test

data:  before and after
t = -9, df = 4, p-value = 0.0008438
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -2.355289 -1.244711
sample estimates:
mean difference 
           -1.8 

Interpretation:

  • p-value: The p-value is very small (0.0008), which is less than 0.05.
  • Conclusion: We reject the null hypothesis. There is a statistically significant increase in scores after the intervention.

3.1 Assumption: Normality of Differences

The paired t-test assumes that the differences between the pairs are normally distributed.

Code
diff <- after - before
shapiro.test(diff)

    Shapiro-Wilk normality test

data:  diff
W = 0.55218, p-value = 0.000131

Result: The p-value (0.0001) is less than 0.05, so we reject the null hypothesis of normality; the differences do not appear to be normally distributed. With only five pairs and nearly identical differences, the paired t-test above should therefore be interpreted with caution; a non-parametric alternative is sketched below.
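
When the normality-of-differences assumption is in doubt, a common non-parametric alternative is the Wilcoxon signed-rank test (wilcox.test(before, after, paired = TRUE) in R). A minimal sketch with scipy, reusing the measurements from above; note that with only five pairs and tied differences the reported p-value is approximate.

Code
import numpy as np
from scipy.stats import wilcoxon

# Same measurements as the R example above
before = np.array([100, 102, 98, 95, 101])
after = np.array([102, 104, 99, 97, 103])

# Wilcoxon signed-rank test on the paired differences
stat, p_value = wilcoxon(before, after)
print(f"Wilcoxon statistic: {stat:.4f}, p-value: {p_value:.4f}")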

Code
import numpy as np
from scipy.stats import ttest_rel

# Sample data
before = np.array([72, 75, 78, 70, 74])
after = np.array([78, 80, 82, 76, 79])
Code
# Perform the paired t-test
t_stat, p_val = ttest_rel(before, after)

print("Paired t-Test Results:")
Paired t-Test Results:
Code
print(f"T-statistic: {t_stat:.4f}")
T-statistic: -13.8976
Code
print(f"P-value: {p_val:.4f}")
P-value: 0.0002

Interpretation:

  • p-value: The p-value (0.0002) is less than 0.05.
  • Conclusion: We reject the null hypothesis. There is a statistically significant increase in scores.

4 One-Sample t-Test

The one-sample t-test is used to determine whether the mean of a single sample differs significantly from a known or hypothesized population mean.

Null Hypothesis (H0): The population mean is equal to the hypothesized value.
Alternative Hypothesis (H1): The population mean is not equal to the hypothesized value.
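
The statistic behind this test is t = (sample mean - hypothesized mean) / (s / sqrt(n)), with n - 1 degrees of freedom. Below is a minimal Python sketch of that formula, using the same scores as the example that follows; it reproduces the t = 3.868 reported in the output.

Code
import numpy as np

# Same scores and hypothesized mean as the example below
scores = np.array([85, 88, 90, 85, 87, 91, 89, 93, 86, 88])
mu0 = 85

# t = (sample mean - hypothesized mean) / (s / sqrt(n))
n = len(scores)
t_stat = (scores.mean() - mu0) / (scores.std(ddof=1) / np.sqrt(n))
print(f"t = {t_stat:.4f} with {n - 1} degrees of freedom")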

Code
# Sample data
scores <- c(85, 88, 90, 85, 87, 91, 89, 93, 86, 88)

# Known population mean (hypothesized value)
population_mean <- 85

# Calculate sample mean
sample_mean <- mean(scores)

cat(paste("Sample mean:", round(sample_mean, 2), "\n"))
Sample mean: 88.2 
Code
cat(paste("Hypothesized population mean:", population_mean, "\n"))
Hypothesized population mean: 85 
Code
# Perform the one-sample t-test
t_test_one_sample <- t.test(scores, mu = population_mean)
print(t_test_one_sample)

    One Sample t-test

data:  scores
t = 3.868, df = 9, p-value = 0.003801
alternative hypothesis: true mean is not equal to 85
95 percent confidence interval:
 86.32849 90.07151
sample estimates:
mean of x 
     88.2 

Interpretation:

  • p-value: The p-value (0.0038) is less than the common alpha level of 0.05.
  • Conclusion: We reject the null hypothesis. There is a statistically significant difference between the sample mean (88.2) and the hypothesized population mean (85).

4.1 Assumption: Normality

The one-sample t-test assumes that the data are normally distributed.

Code
shapiro.test(scores)

    Shapiro-Wilk normality test

data:  scores
W = 0.95415, p-value = 0.7176

Result: The p-value is greater than 0.05, so we fail to reject the null hypothesis. The data appears to be normally distributed.

Code
import numpy as np
from scipy.stats import ttest_1samp

# Sample data
scores = np.array([85, 88, 90, 85, 87, 91, 89, 93, 86, 88])

# Hypothesized population mean
population_mean = 85

print(f"Sample mean: {scores.mean():.2f}")
Sample mean: 88.20
Code
print(f"Hypothesized population mean: {population_mean}")
Hypothesized population mean: 85
Code
# Perform the one-sample t-test
t_stat, p_value = ttest_1samp(scores, population_mean)

print("One-Sample t-Test Results:")
One-Sample t-Test Results:
Code
print(f"T-statistic: {t_stat:.4f}")
T-statistic: 3.8680
Code
print(f"P-value: {p_value:.4f}")
P-value: 0.0038

Interpretation:

  • p-value: The p-value (0.0038) is less than 0.05.
  • Conclusion: We reject the null hypothesis. There is a statistically significant difference between the sample mean and the hypothesized population mean.

5 ANOVA (Analysis of Variance)

ANOVA is used to compare the means of three or more groups to see if at least one group is different from the others.

Null Hypothesis (H0): The means of all groups are equal.
Alternative Hypothesis (H1): At least one group mean is different.

Code
# Create 3 groups
group1 <- c(85, 88, 90, 85, 87, 91, 89, 100)
group2 <- c(80, 88, 84, 89, 81, 83, 88, 100)
group3 <- c(120, 200, 200, 200, 100, 200, 100, 100)

# Combine into a data frame
value <- c(group1, group2, group3)
group <- factor(rep(c("Group1", "Group2", "Group3"), each = 8))
data <- data.frame(group, value)
Code
# Perform ANOVA
anova_result <- aov(value ~ group, data = data)
summary(anova_result)
            Df Sum Sq Mean Sq F value   Pr(>F)    
group        2  22218   11109   12.41 0.000277 ***
Residuals   21  18796     895                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation: The p-value (0.000277) is very small, so we reject the null hypothesis. At least one group mean is different.
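
To make the F statistic concrete, here is a minimal Python sketch that recomputes it by hand from the same three groups; it reproduces the sums of squares (22218 and 18796) and F = 12.41 shown in the ANOVA table above.

Code
import numpy as np

# Same three groups as the R example above
groups = [
    np.array([85, 88, 90, 85, 87, 91, 89, 100]),
    np.array([80, 88, 84, 89, 81, 83, 88, 100]),
    np.array([120, 200, 200, 200, 100, 200, 100, 100]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)          # number of groups
n = len(all_values)      # total number of observations

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares and the F statistic
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f"SS between = {ss_between:.0f}, SS within = {ss_within:.0f}, F = {f_stat:.2f}")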

5.1 Post-Hoc Test (Tukey HSD)

If the ANOVA is significant, we use a post-hoc test like Tukey’s Honestly Significant Difference (HSD) to find out which specific groups are different from each other.

Code
TukeyHSD(anova_result)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = value ~ group, data = data)

$group
                diff       lwr       upr     p adj
Group2-Group1 -2.750 -40.45414  34.95414 0.9815566
Group3-Group1 63.125  25.42086 100.82914 0.0010731
Group3-Group2 65.875  28.17086 103.57914 0.0006950

Interpretation: The results show that Group 3 is significantly different from both Group 1 and Group 2.

Code
import scipy.stats as stats
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Sample data
group1 = [85, 90, 88, 75, 95, 90]
group2 = [70, 65, 80, 72, 68, 90]
group3 = [120, 200, 200, 200, 100, 120]

# Combine data
scores = group1 + group2 + group3
methods = ['Method1'] * len(group1) + ['Method2'] * len(group2) + ['Method3'] * len(group3)
df = pd.DataFrame({'score': scores, 'method': methods})
Code
# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(group1, group2, group3)

print(f"F-statistic: {f_statistic:.4f}")
F-statistic: 14.5233
Code
print(f"P-value: {p_value:.4f}")
P-value: 0.0003

Interpretation: The p-value is very small, so we reject the null hypothesis.

5.2 Post-Hoc Test (Tukey HSD)

Code
tukey = pairwise_tukeyhsd(endog=df['score'], groups=df['method'], alpha=0.05)
print(tukey)
  Multiple Comparison of Means - Tukey HSD, FWER=0.05   
========================================================
 group1  group2 meandiff p-adj   lower    upper   reject
--------------------------------------------------------
Method1 Method2    -13.0 0.7148 -55.7563  29.7563  False
Method1 Method3     69.5  0.002  26.7437 112.2563   True
Method2 Method3     82.5 0.0004  39.7437 125.2563   True
--------------------------------------------------------

Interpretation: Method 3 is significantly different from Method 1 and Method 2.

6 Chi-Square Test

The chi-square test is used to determine whether there is a significant association between two categorical variables (test of independence) or whether observed frequencies differ from expected frequencies (goodness of fit test).

Null Hypothesis (H0): The two categorical variables are independent (for the test of independence).
Alternative Hypothesis (H1): The two categorical variables are not independent.
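
The examples below demonstrate the test of independence. For reference, here is a minimal sketch of the goodness-of-fit variant mentioned above, using hypothetical die-roll counts and scipy.stats.chisquare (which compares observed counts against expected counts, uniform by default); the R equivalent is chisq.test(observed, p = rep(1/6, 6)).

Code
from scipy.stats import chisquare

# Hypothetical observed counts for 60 rolls of a die
observed = [8, 12, 9, 11, 10, 10]

# H0: all six faces are equally likely (expected count of 10 per face)
chi2_stat, p_value = chisquare(observed)
print(f"Chi-square: {chi2_stat:.4f}, p-value: {p_value:.4f}")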

Code
# Sample data: Survey responses by gender
# Create a contingency table
survey_data <- matrix(c(50, 30, 20, 40, 60, 30), nrow = 2, byrow = TRUE,
                      dimnames = list(Gender = c("Male", "Female"),
                                      Response = c("Yes", "No", "Maybe")))

print("Contingency Table:")
[1] "Contingency Table:"
Code
print(survey_data)
        Response
Gender   Yes No Maybe
  Male    50 30    20
  Female  40 60    30
Code
# Perform the chi-square test of independence
chi_test <- chisq.test(survey_data)
print(chi_test)

    Pearson's Chi-squared test

data:  survey_data
X-squared = 9.3573, df = 2, p-value = 0.009292

Interpretation:

  • p-value: The p-value is less than 0.05.
  • Conclusion: We reject the null hypothesis. There is a statistically significant association between gender and response.

6.1 Assumptions of the Chi-Square Test

  1. Independence: The observations must be independent.
  2. Expected Frequencies: Expected frequencies in each cell should be at least 5 (or at least 80% of cells should have expected frequencies ≥ 5, and no cell should have expected frequency < 1).
Code
# Check expected frequencies
expected <- chi_test$expected
print("Expected Frequencies:")
[1] "Expected Frequencies:"
Code
print(expected)
        Response
Gender        Yes       No    Maybe
  Male   39.13043 39.13043 21.73913
  Female 50.86957 50.86957 28.26087
Code
# Check if all expected frequencies are >= 5
all(expected >= 5)
[1] TRUE

Result: All expected frequencies are greater than or equal to 5, so the assumption is satisfied.

Code
import numpy as np
from scipy.stats import chi2_contingency

# Sample data: Contingency table
# Rows: Gender (Male, Female)
# Columns: Response (Yes, No, Maybe)
survey_data = np.array([[50, 30, 20],
                        [40, 60, 30]])

print("Contingency Table:")
Contingency Table:
Code
print("Rows: Male, Female")
Rows: Male, Female
Code
print("Columns: Yes, No, Maybe")
Columns: Yes, No, Maybe
Code
print(survey_data)
[[50 30 20]
 [40 60 30]]
Code
# Perform the chi-square test of independence
chi2_stat, p_value, dof, expected = chi2_contingency(survey_data)

print("Chi-Square Test Results:")
Chi-Square Test Results:
Code
print(f"Chi-square statistic: {chi2_stat:.4f}")
Chi-square statistic: 9.3573
Code
print(f"P-value: {p_value:.4f}")
P-value: 0.0093
Code
print(f"Degrees of freedom: {dof}")
Degrees of freedom: 2
Code
print("Expected frequencies:")
Expected frequencies:
Code
print(expected)
[[39.13043478 39.13043478 21.73913043]
 [50.86956522 50.86956522 28.26086957]]

Interpretation:

  • p-value: The p-value is less than 0.05.
  • Conclusion: We reject the null hypothesis. There is a statistically significant association between the variables.

7 Correlation

Correlation tests measure the strength and direction of the relationship between two continuous variables.

7.1 Pearson Correlation

Measures the strength and direction of the linear relationship between two variables. The accompanying significance test assumes that the variables are approximately normally distributed.
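
Pearson's r is the covariance of the two variables divided by the product of their standard deviations. Below is a minimal Python sketch of that definition, using the same data as the R example that follows; it reproduces the coefficient of about 0.98 reported there.

Code
import numpy as np

# Same data as the R example below
x = np.array([10, 20, 30, 40, 50, 10], dtype=float)
y = np.array([15, 25, 35, 45, 55, 5], dtype=float)

# r = covariance(x, y) / (sd(x) * sd(y))
r = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(f"Pearson r computed from the definition: {r:.4f}")

# The same value from numpy's correlation matrix
print(f"np.corrcoef:                            {np.corrcoef(x, y)[0, 1]:.4f}")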

Code
# Sample data
data <- data.frame(
  x = c(10, 20, 30, 40, 50, 10),
  y = c(15, 25, 35, 45, 55, 5)
)
Code
# Compute Pearson correlation
correlation <- cor.test(data$x, data$y, method = "pearson")
print(correlation)

    Pearson's product-moment correlation

data:  data$x and data$y
t = 10.392, df = 4, p-value = 0.0004841
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8392446 0.9981104
sample estimates:
      cor 
0.9819805 

Interpretation: The correlation coefficient is 0.98, indicating a very strong positive linear relationship.

Code
import scipy.stats as stats

# Example data
x = [10, 20, 30, 40, 50, 77, 89]
y = [15, 25, 35, 45, 55, 70, 80]
Code
# Calculate Pearson correlation
corr_coef, p_value = stats.pearsonr(x, y)

print(f"Pearson correlation coefficient: {corr_coef:.4f}")
Pearson correlation coefficient: 0.9927
Code
print(f"P-value: {p_value:.4f}")
P-value: 0.0000

Interpretation: The correlation coefficient is 0.99, indicating a very strong positive linear relationship.

7.2 Spearman Correlation

Measures the monotonic relationship between two variables. It does not assume normality and is based on the ranks of the data.
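
Because Spearman's rho is simply Pearson's r computed on the ranks of the data, the two agree exactly when each variable is replaced by its ranks. Below is a minimal sketch of that equivalence with scipy, using the same data as the R example that follows; both values match the rho of about 0.99 reported there.

Code
import numpy as np
from scipy.stats import pearsonr, spearmanr, rankdata

# Same data as the R example below
x = np.array([10, 20, 30, 40, 50, 10], dtype=float)
y = np.array([15, 25, 35, 45, 55, 5], dtype=float)

# Spearman's rho computed directly
rho, _ = spearmanr(x, y)

# The same value obtained as Pearson's r on the ranks
r_on_ranks, _ = pearsonr(rankdata(x), rankdata(y))

print(f"Spearman rho:     {rho:.4f}")
print(f"Pearson on ranks: {r_on_ranks:.4f}")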

Code
# Sample data
data <- data.frame(
  x = c(10, 20, 30, 40, 50, 10),
  y = c(15, 25, 35, 45, 55, 5)
)
Code
# Compute Spearman correlation
correlation <- cor.test(data$x, data$y, method = "spearman")
print(correlation)

    Spearman's rank correlation rho

data:  data$x and data$y
S = 0.50362, p-value = 0.0003091
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.9856108 

Interpretation: The Spearman correlation coefficient is 0.99, indicating a very strong positive monotonic relationship.

Code
import scipy.stats as stats

# Example data
x = [10, 20, 30, 40, 50]
y = [1, 2, 3, 4, 5]
Code
# Calculate Spearman correlation
corr_coef, p_value = stats.spearmanr(x, y)

print(f"Spearman correlation coefficient: {corr_coef:.4f}")
Spearman correlation coefficient: 1.0000
Code
print(f"P-value: {p_value:.4f}")
P-value: 0.0000

Interpretation: The Spearman correlation coefficient is 1.0, indicating a perfect positive monotonic relationship.

8 Conclusion

This document has provided a practical overview of several key statistical tests. By understanding the principles behind these tests and how to implement them in R and Python, you can gain valuable insights from your data and make informed, data-driven decisions.