English: The figure shows the change in p-values computed from a t-test as the sample size increases, and how early stopping can allow for p-hacking.
Data are drawn from two identical normal distributions, <math>\mathcal{N}(0, 10^2)</math>. For each sample size <math>n</math>, ranging from 5 to <math>10^4</math>, a t-test is performed on the first <math>n</math> samples from each distribution, and the resulting p-value is plotted. The red dashed line indicates the commonly used significance level of 0.05.
If the data collection or analysis were to stop at a point where the p-value happened to fall below the significance level, a spurious statistically significant difference could be reported.
Illustration based on:
Wagenmakers, Eric-Jan. "A practical solution to the pervasive problems of p values." Psychonomic Bulletin & Review 14.5 (2007): 779-804.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Set random seed for reproducibility
np.random.seed(42)


# Function to perform t-test and return p-value
def perform_t_test(sample1, sample2):
    _, p_value = stats.ttest_ind(sample1, sample2)
    return p_value


# Initialize parameters
max_samples = 10**4
start_samples = 5
p_values = []
sample_sizes = range(start_samples, max_samples + 1)

# Generate data and perform t-tests
population1 = stats.norm(loc=0, scale=10)
population2 = stats.norm(loc=0, scale=10)

samples1 = population1.rvs(max_samples)
samples2 = population2.rvs(max_samples)

for n in sample_sizes:
    # Test only the first n samples from each distribution
    p_value = perform_t_test(samples1[:n], samples2[:n])
    p_values.append(p_value)

# Create the plot
plt.figure(figsize=(12, 6))
plt.semilogx(sample_sizes, p_values, 'b-')
plt.axhline(y=0.05, color='r', linestyle='--', label='p = 0.05')
plt.xlabel('Sample Size (log scale)')
plt.ylabel('p-value')
plt.title('Variability of p-value as Sample Size Increases')
plt.grid(True, which="both", ls="-", alpha=0.2)
plt.legend()
plt.ylim(0, 1)
plt.tight_layout()
plt.savefig('p-hacking.svg')
plt.show()
```
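The early-stopping effect described above can also be quantified rather than just plotted. The following is a minimal sketch, not part of the original figure code. It repeats the same null experiment many times and compares how often the p-value dips below 0.05 at any point during data collection with how often it is below 0.05 at the final sample size. The function name `optional_stopping_rate`, its parameters, the number of repetitions, and the choice to check after every added observation are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def optional_stopping_rate(n_experiments=500, max_n=500, start_n=5,
                           alpha=0.05, seed=0):
    """Hypothetical helper: compare false-positive rates with and without peeking.

    Both groups are drawn from the same normal distribution (mean 0, sd 10),
    so every "significant" result is a false positive.
    """
    rng = np.random.default_rng(seed)
    # One row per simulated experiment, one column per observation
    a = rng.normal(loc=0, scale=10, size=(n_experiments, max_n))
    b = rng.normal(loc=0, scale=10, size=(n_experiments, max_n))

    # Track whether each experiment ever crossed the threshold while peeking
    ever_significant = np.zeros(n_experiments, dtype=bool)
    for n in range(start_n, max_n + 1):
        # t-test on the first n observations of every experiment at once
        p = stats.ttest_ind(a[:, :n], b[:, :n], axis=1).pvalue
        ever_significant |= p < alpha

    # Fixed-sample-size analysis: test once, using all max_n observations
    final_p = stats.ttest_ind(a, b, axis=1).pvalue
    return ever_significant.mean(), (final_p < alpha).mean()


peeking_rate, fixed_n_rate = optional_stopping_rate()
print(f"False-positive rate when stopping at the first p < 0.05: {peeking_rate:.3f}")
print(f"False-positive rate with a fixed sample size:            {fixed_n_rate:.3f}")
```

Because the final sample size is one of the points checked while peeking, the first rate can never be lower than the second. Under these assumptions the fixed-sample rate should sit near the nominal 0.05, while the peeking rate is expected to be considerably higher.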