---
title: "Introduction to Statistical Tests"
subtitle: "A Practical Guide with R and Python"
execute:
  warning: false
  error: false
format:
  html:
    toc: true
    toc-location: right
    code-fold: show
    code-tools: true
    number-sections: true
    code-block-bg: true
    code-block-border-left: "#31BAE9"
---
## Introduction
This document provides a practical guide to several fundamental statistical tests, demonstrating their implementation in both R and Python. We will cover the independent two-sample t-test, paired t-test, one-sample t-test, ANOVA, chi-square test, and correlation tests (Pearson and Spearman). For each test, we explain the underlying theory, the assumptions, and how to interpret the results.
```{r}
#| echo: false
# Load required Python libraries for the document
library(reticulate)
py_require(c('scipy', 'pandas', 'numpy', 'matplotlib', 'seaborn', 'plotly', 'statsmodels', 'pingouin'))
```
## Independent Two-Sample t-Test
The independent two-sample t-test is used to determine whether there is a statistically significant difference between the means of two independent groups.
**Null Hypothesis (H0):** The means of the two groups are equal.
**Alternative Hypothesis (H1):** The means of the two groups are not equal.
::: panel-tabset
## R
```{r}
#| label: r-t-test-data
# Sample data for two independent groups
class_A <- c(85, 88, 90, 85, 87, 91, 89, 100)
class_B <- c(80, 82, 84, 79, 81, 83, 78)
# Calculate means and variances
mean_A <- mean(class_A)
mean_B <- mean(class_B)
var_A <- var(class_A)
var_B <- var(class_B)
cat(paste("Mean of Class A:", round(mean_A, 2), "\n"))
cat(paste("Mean of Class B:", round(mean_B, 2), "\n"))
cat(paste("Variance of Class A:", round(var_A, 2), "\n"))
cat(paste("Variance of Class B:", round(var_B, 2), "\n"))
```
```{r}
#| label: r-t-test-execution
# Perform the t-test
t_test_result <- t.test(class_A, class_B)
print(t_test_result)
```
**Interpretation:**
- **p-value:** The p-value is well below the common alpha level of 0.05.
- **Conclusion:** We reject the null hypothesis. There is a statistically significant difference between the mean scores of Class A and Class B.
### Assumptions of the t-Test
1. **Normality:** The data in each group should be approximately normally distributed.
2. **Independence:** The two groups must be independent of each other.
3. **Equal Variances (Homogeneity of Variances):** The variances of the two groups should be equal. This assumption applies to the pooled (Student) version of the test; Welch's t-test (the default in R's `t.test()`) does not assume equal variances.
#### Normality Test (Shapiro-Wilk)
```{r}
#| label: r-normality-test
shapiro.test(class_A)
shapiro.test(class_B)
```
**Result:** Both p-values are greater than 0.05, so we fail to reject the null hypothesis of normality for either group. Keep in mind that failing to reject is not proof of normality, particularly with samples this small.
#### Equal Variance Test (Levene's Test)
```{r}
#| label: r-levene-test
# Combine data for Levene's test
score <- c(class_A, class_B)
group <- c(rep("A", length(class_A)), rep("B", length(class_B)))
data <- data.frame(score, group)
car::leveneTest(score ~ group, data = data)
```
**Result:** The p-value is greater than 0.05, so we fail to reject the null hypothesis. The variances are assumed to be equal.
## Python
```{python}
#| label: python-t-test-data
import numpy as np
from scipy.stats import ttest_ind
# Sample data
group_a_scores = np.array([88, 92, 85, 91, 87])
group_b_scores = np.array([78, 75, 80, 73, 77])
```
```{python}
#| label: python-t-test-execution
# Perform Independent Two-Sample t-Test
t_stat, p_value = ttest_ind(group_a_scores, group_b_scores)
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")
```
**Interpretation:**
- **p-value:** The p-value is very small (0.0001), which is less than 0.05.
- **Conclusion:** We reject the null hypothesis. There is a statistically significant difference between the means of the two groups.
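Note one cross-language difference: `scipy.stats.ttest_ind` defaults to the pooled-variance Student t-test, whereas R's `t.test()` defaults to Welch's test. A minimal sketch of the Welch version in Python, using the same sample data:

```{python}
#| label: python-welch-t-test
import numpy as np
from scipy.stats import ttest_ind

group_a_scores = np.array([88, 92, 85, 91, 87])
group_b_scores = np.array([78, 75, 80, 73, 77])

# equal_var=False requests Welch's t-test, which does not
# assume equal variances (matching R's t.test() default)
t_stat, p_value = ttest_ind(group_a_scores, group_b_scores, equal_var=False)
print(f"Welch T-statistic: {t_stat:.4f}")
print(f"Welch P-value: {p_value:.4f}")
```

With equal group sizes and similar variances, as here, the Welch and Student results are nearly identical.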
:::
## Paired t-Test
The paired t-test is used to compare the means of two related groups to determine if there is a statistically significant difference between them.
**Null Hypothesis (H0):** The true mean difference between the paired samples is zero.
**Alternative Hypothesis (H1):** The true mean difference is not zero.
::: panel-tabset
## R
```{r}
#| label: r-paired-t-test-data
# Sample data
before <- c(100, 102, 98, 95, 101)
after <- c(102, 104, 99, 97, 103)
# Calculate means
mean_before <- mean(before)
mean_after <- mean(after)
cat(paste("Mean before:", round(mean_before, 2), "\n"))
cat(paste("Mean after:", round(mean_after, 2), "\n"))
```
```{r}
#| label: r-paired-t-test-execution
# Perform the paired t-test
t_test_paired <- t.test(before, after, paired = TRUE)
print(t_test_paired)
```
**Interpretation:**
- **p-value:** The p-value is very small (0.0008), which is less than 0.05.
- **Conclusion:** We reject the null hypothesis. There is a statistically significant increase in scores after the intervention.
### Assumption: Normality of Differences
The paired t-test assumes that the differences between the pairs are normally distributed.
```{r}
#| label: r-paired-normality
diff <- after - before
shapiro.test(diff)
```
**Result:** If the p-value is greater than 0.05, we fail to reject the null hypothesis of normality. Note that with only five (heavily tied) differences the Shapiro-Wilk test has very little power, so a non-rejection here is weak evidence at best.
## Python
```{python}
#| label: python-paired-t-test-data
import numpy as np
from scipy.stats import ttest_rel
# Sample data
before = np.array([72, 75, 78, 70, 74])
after = np.array([78, 80, 82, 76, 79])
```
```{python}
#| label: python-paired-t-test-execution
# Perform the paired t-test
t_stat, p_val = ttest_rel(before, after)
print("Paired t-Test Results:")
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_val:.4f}")
```
**Interpretation:**
- **p-value:** The p-value is well below 0.05.
- **Conclusion:** We reject the null hypothesis. There is a statistically significant increase in scores.
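Under the hood, the paired test is simply a one-sample t-test on the pairwise differences, t = d̄ / (s_d / √n). A quick sketch verifying the formula against `ttest_rel`:

```{python}
#| label: python-paired-t-manual
import numpy as np
from scipy.stats import ttest_rel

before = np.array([72, 75, 78, 70, 74])
after = np.array([78, 80, 82, 76, 79])

# t = mean(d) / (sd(d) / sqrt(n)), where d are the paired differences
d = after - before
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
t_scipy, _ = ttest_rel(after, before)
print(f"Manual t: {t_manual:.4f}, scipy t: {t_scipy:.4f}")
```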
:::
## One-Sample t-Test
The one-sample t-test is used to determine whether the mean of a single sample differs significantly from a known or hypothesized population mean.
**Null Hypothesis (H0):** The sample mean is equal to the known population mean.
**Alternative Hypothesis (H1):** The sample mean is not equal to the known population mean.
::: panel-tabset
## R
```{r}
#| label: r-one-sample-t-test-data
# Sample data
scores <- c(85, 88, 90, 85, 87, 91, 89, 93, 86, 88)
# Known population mean (hypothesized value)
population_mean <- 85
# Calculate sample mean
sample_mean <- mean(scores)
cat(paste("Sample mean:", round(sample_mean, 2), "\n"))
cat(paste("Hypothesized population mean:", population_mean, "\n"))
```
```{r}
#| label: r-one-sample-t-test-execution
# Perform the one-sample t-test
t_test_one_sample <- t.test(scores, mu = population_mean)
print(t_test_one_sample)
```
**Interpretation:**
- **p-value:** The p-value is below the common alpha level of 0.05.
- **Conclusion:** We reject the null hypothesis. The sample mean (88.2) differs significantly from the hypothesized population mean of 85.
### Assumption: Normality
The one-sample t-test assumes that the data are normally distributed.
```{r}
#| label: r-one-sample-normality
shapiro.test(scores)
```
**Result:** The p-value is greater than 0.05, so we fail to reject the null hypothesis. The data appears to be normally distributed.
## Python
```{python}
#| label: python-one-sample-t-test-data
import numpy as np
from scipy.stats import ttest_1samp
# Sample data
scores = np.array([85, 88, 90, 85, 87, 91, 89, 93, 86, 88])
# Hypothesized population mean
population_mean = 85
print(f"Sample mean: {scores.mean():.2f}")
print(f"Hypothesized population mean: {population_mean}")
```
```{python}
#| label: python-one-sample-t-test-execution
# Perform the one-sample t-test
t_stat, p_value = ttest_1samp(scores, population_mean)
print("One-Sample t-Test Results:")
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")
```
**Interpretation:**
- **p-value:** The p-value is less than 0.05.
- **Conclusion:** We reject the null hypothesis. The sample mean differs significantly from the hypothesized population mean.
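The statistic itself is easy to reproduce by hand: t = (x̄ − μ) / (s / √n). A sketch checking the formula against `ttest_1samp`:

```{python}
#| label: python-one-sample-t-manual
import numpy as np
from scipy.stats import ttest_1samp

scores = np.array([85, 88, 90, 85, 87, 91, 89, 93, 86, 88])
mu = 85

# t = (sample mean - mu) / (sample sd / sqrt(n))
t_manual = (scores.mean() - mu) / (scores.std(ddof=1) / np.sqrt(len(scores)))
t_scipy, _ = ttest_1samp(scores, mu)
print(f"Manual t: {t_manual:.4f}, scipy t: {t_scipy:.4f}")
```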
:::
## ANOVA (Analysis of Variance)
ANOVA is used to compare the means of three or more groups to see if at least one group is different from the others.
**Null Hypothesis (H0):** The means of all groups are equal.
**Alternative Hypothesis (H1):** At least one group mean is different.
::: panel-tabset
## R
```{r}
#| label: r-anova-data
# Create 3 groups
group1 <- c(85, 88, 90, 85, 87, 91, 89, 100)
group2 <- c(80, 88, 84, 89, 81, 83, 88, 100)
group3 <- c(120, 200, 200, 200, 100, 200, 100, 100)
# Combine into a data frame
value <- c(group1, group2, group3)
group <- factor(rep(c("Group1", "Group2", "Group3"), each = 8))
data <- data.frame(group, value)
```
```{r}
#| label: r-anova-execution
# Perform ANOVA
anova_result <- aov(value ~ group, data = data)
summary(anova_result)
```
**Interpretation:** The p-value is very small, so we reject the null hypothesis. At least one group mean is different.
### Post-Hoc Test (Tukey HSD)
If the ANOVA is significant, we use a post-hoc test like Tukey's Honestly Significant Difference (HSD) to find out which specific groups are different from each other.
```{r}
#| label: r-tukey-hsd
TukeyHSD(anova_result)
```
**Interpretation:** The results show that Group 3 is significantly different from both Group 1 and Group 2.
## Python
```{python}
#| label: python-anova-data
import scipy.stats as stats
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd
# Sample data
group1 = [85, 90, 88, 75, 95, 90]
group2 = [70, 65, 80, 72, 68, 90]
group3 = [120, 200, 200, 200, 100, 120]
# Combine data
scores = group1 + group2 + group3
methods = ['Method1'] * len(group1) + ['Method2'] * len(group2) + ['Method3'] * len(group3)
df = pd.DataFrame({'score': scores, 'method': methods})
```
```{python}
#| label: python-anova-execution
# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(group1, group2, group3)
print(f"F-statistic: {f_statistic:.4f}")
print(f"P-value: {p_value:.4f}")
```
**Interpretation:** The p-value is very small, so we reject the null hypothesis.
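Like the t-test, classical one-way ANOVA assumes roughly equal group variances. A quick check with `scipy.stats.levene`, mirroring the R Levene test shown earlier:

```{python}
#| label: python-anova-levene
from scipy.stats import levene

group1 = [85, 90, 88, 75, 95, 90]
group2 = [70, 65, 80, 72, 68, 90]
group3 = [120, 200, 200, 200, 100, 120]

# Levene's test: H0 is that all group variances are equal
stat, p = levene(group1, group2, group3)
print(f"Levene statistic: {stat:.4f}, p-value: {p:.4f}")
```

Here the third group is far more variable than the others, so the equal-variance assumption is questionable; Welch's ANOVA or a nonparametric alternative such as Kruskal-Wallis would be a more defensible choice for these data.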
### Post-Hoc Test (Tukey HSD)
```{python}
#| label: python-tukey-hsd
tukey = pairwise_tukeyhsd(endog=df['score'], groups=df['method'], alpha=0.05)
print(tukey)
```
**Interpretation:** Method 3 is significantly different from Method 1 and Method 2.
:::
## Chi-Square Test
The chi-square test is used to determine whether there is a significant association between two categorical variables (test of independence) or whether observed frequencies differ from expected frequencies (goodness of fit test).
**Null Hypothesis (H0):** The two categorical variables are independent (for test of independence).
**Alternative Hypothesis (H1):** The two categorical variables are not independent.
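The goodness-of-fit variant mentioned above compares a single set of observed counts against expected frequencies. A minimal sketch with `scipy.stats.chisquare`, using hypothetical counts from 120 rolls of a die:

```{python}
#| label: python-chi-square-gof
from scipy.stats import chisquare

# Hypothetical observed counts from 120 rolls of a die;
# under H0 (a fair die) each face is expected 20 times
observed = [18, 22, 16, 25, 21, 18]
stat, p = chisquare(observed)  # expected defaults to a uniform distribution
print(f"Chi-square: {stat:.4f}, p-value: {p:.4f}")
```

A large p-value here means the observed counts are consistent with a fair die.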
::: panel-tabset
## R
```{r}
#| label: r-chi-square-data
# Sample data: Survey responses by gender
# Create a contingency table
survey_data <- matrix(c(50, 30, 20, 40, 60, 30), nrow = 2, byrow = TRUE,
dimnames = list(Gender = c("Male", "Female"),
Response = c("Yes", "No", "Maybe")))
print("Contingency Table:")
print(survey_data)
```
```{r}
#| label: r-chi-square-execution
# Perform the chi-square test of independence
chi_test <- chisq.test(survey_data)
print(chi_test)
```
**Interpretation:**
- **p-value:** The p-value is less than 0.05.
- **Conclusion:** We reject the null hypothesis. There is a statistically significant association between gender and response.
### Assumptions of the Chi-Square Test
1. **Independence:** The observations must be independent.
2. **Expected Frequencies:** Expected frequencies in each cell should be at least 5 (or at least 80% of cells should have expected frequencies ≥ 5, and no cell should have expected frequency < 1).
```{r}
#| label: r-chi-square-assumptions
# Check expected frequencies
expected <- chi_test$expected
print("Expected Frequencies:")
print(expected)
# Check if all expected frequencies are >= 5
all(expected >= 5)
```
**Result:** All expected frequencies are greater than or equal to 5, so the assumption is satisfied.
## Python
```{python}
#| label: python-chi-square-data
import numpy as np
from scipy.stats import chi2_contingency
# Sample data: Contingency table
# Rows: Gender (Male, Female)
# Columns: Response (Yes, No, Maybe)
survey_data = np.array([[50, 30, 20],
[40, 60, 30]])
print("Contingency Table:")
print("Rows: Male, Female")
print("Columns: Yes, No, Maybe")
print(survey_data)
```
```{python}
#| label: python-chi-square-execution
# Perform the chi-square test of independence
chi2_stat, p_value, dof, expected = chi2_contingency(survey_data)
print("Chi-Square Test Results:")
print(f"Chi-square statistic: {chi2_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Degrees of freedom: {dof}")
print("Expected frequencies:")
print(expected)
```
**Interpretation:**
- **p-value:** The p-value is less than 0.05.
- **Conclusion:** We reject the null hypothesis. There is a statistically significant association between the variables.
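A significant chi-square says nothing about the strength of the association. One common effect size is Cramér's V, V = √(χ² / (n · (min(r, c) − 1))), which ranges from 0 (no association) to 1; a sketch computed from the same table:

```{python}
#| label: python-cramers-v
import numpy as np
from scipy.stats import chi2_contingency

survey_data = np.array([[50, 30, 20],
                        [40, 60, 30]])

chi2_stat, p_value, dof, expected = chi2_contingency(survey_data)
n = survey_data.sum()                 # total number of observations
k = min(survey_data.shape) - 1        # min(rows, cols) - 1
cramers_v = np.sqrt(chi2_stat / (n * k))
print(f"Cramér's V: {cramers_v:.4f}")
```

A value around 0.2 suggests the association, while statistically significant, is modest in size.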
:::
## Correlation
Correlation tests measure the strength and direction of the relationship between two continuous variables.
### Pearson Correlation
Measures the strength of the linear relationship between two variables. The accompanying significance test assumes the variables are approximately normally distributed.
::: panel-tabset
## R
```{r}
#| label: r-pearson-data
# Sample data
data <- data.frame(
x = c(10, 20, 30, 40, 50, 10),
y = c(15, 25, 35, 45, 55, 5)
)
```
```{r}
#| label: r-pearson-execution
# Compute Pearson correlation
correlation <- cor.test(data$x, data$y, method = "pearson")
print(correlation)
```
**Interpretation:** The correlation coefficient is approximately 0.98, indicating a very strong positive linear relationship.
## Python
```{python}
#| label: python-pearson-data
import scipy.stats as stats
# Example data
x = [10, 20, 30, 40, 50, 77, 89]
y = [15, 25, 35, 45, 55, 70, 80]
```
```{python}
#| label: python-pearson-execution
# Calculate Pearson correlation
corr_coef, p_value = stats.pearsonr(x, y)
print(f"Pearson correlation coefficient: {corr_coef:.4f}")
print(f"P-value: {p_value:.4f}")
```
**Interpretation:** The correlation coefficient is 0.99, indicating a very strong positive linear relationship.
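Pearson's r is just the covariance of the two variables scaled by their standard deviations; a sketch computing it directly and checking against `pearsonr`:

```{python}
#| label: python-pearson-manual
import numpy as np
from scipy.stats import pearsonr

x = np.array([10, 20, 30, 40, 50, 77, 89])
y = np.array([15, 25, 35, 45, 55, 70, 80])

# r = cov(x, y) / (sd(x) * sd(y))
r_manual = np.cov(x, y, ddof=1)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))
r_scipy, _ = pearsonr(x, y)
print(f"Manual r: {r_manual:.4f}, scipy r: {r_scipy:.4f}")
```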
:::
### Spearman Correlation
Measures the monotonic relationship between two variables. It does not assume normality and is based on the ranks of the data.
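Equivalently, Spearman's rho is the Pearson correlation computed on the rank-transformed data. A small sketch demonstrating this with `scipy.stats.rankdata`:

```{python}
#| label: python-spearman-ranks
from scipy.stats import rankdata, pearsonr, spearmanr

x = [10, 20, 30, 40, 50, 10]
y = [15, 25, 35, 45, 55, 5]

# Spearman's rho equals Pearson's r on the rank-transformed data
# (tied values receive their average rank)
rho_via_ranks, _ = pearsonr(rankdata(x), rankdata(y))
rho_direct, _ = spearmanr(x, y)
print(f"Pearson on ranks: {rho_via_ranks:.4f}, spearmanr: {rho_direct:.4f}")
```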
::: panel-tabset
## R
```{r}
#| label: r-spearman-data
# Sample data
data <- data.frame(
x = c(10, 20, 30, 40, 50, 10),
y = c(15, 25, 35, 45, 55, 5)
)
```
```{r}
#| label: r-spearman-execution
# Compute Spearman correlation
correlation <- cor.test(data$x, data$y, method = "spearman")
print(correlation)
```
**Interpretation:** The Spearman correlation coefficient is approximately 0.99, indicating a very strong positive monotonic relationship.
## Python
```{python}
#| label: python-spearman-data
import scipy.stats as stats
# Example data
x = [10, 20, 30, 40, 50]
y = [1, 2, 3, 4, 5]
```
```{python}
#| label: python-spearman-execution
# Calculate Spearman correlation
corr_coef, p_value = stats.spearmanr(x, y)
print(f"Spearman correlation coefficient: {corr_coef:.4f}")
print(f"P-value: {p_value:.4f}")
```
**Interpretation:** The Spearman correlation coefficient is 1.0, indicating a perfect positive monotonic relationship.
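Because it uses only ranks, Spearman correlation captures any monotonic relationship, linear or not, while Pearson is attenuated by nonlinearity. A sketch with hypothetical data following y = x⁵:

```{python}
#| label: python-spearman-vs-pearson
from scipy.stats import pearsonr, spearmanr

# A monotonic but highly nonlinear relationship
x = [1, 2, 3, 4, 5, 6]
y = [v**5 for v in x]

r, _ = pearsonr(x, y)      # attenuated by the nonlinearity
rho, _ = spearmanr(x, y)   # 1 for any strictly increasing relationship
print(f"Pearson r: {r:.4f}")
print(f"Spearman rho: {rho:.4f}")
```

This is why Spearman is often preferred when the relationship is curved or when the data contain outliers.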
:::
## Conclusion
This document has provided a practical overview of several key statistical tests. By understanding the principles behind these tests and how to implement them in R and Python, you can gain valuable insights from your data and make informed, data-driven decisions.