Kolmogorov-Smirnov (K-S) Test Calculator
Interactive two-sample K-S test with live ECDF visualization — compare Normal, Uniform, Exponential, and Bimodal distributions with instant results.
📋 Table of Contents
- Interactive K-S Test Calculator
- What is the Kolmogorov-Smirnov Test?
- Historical Background
- Understanding CDFs and ECDFs
- The K-S Statistic: Mathematical Derivation
- One-Sample vs. Two-Sample K-S Test
- Step-by-Step: Performing the K-S Test
- Worked Example with Full Calculation
- Critical Values and P-Values
- Assumptions of the K-S Test
- K-S Test vs. Other Statistical Tests
- Real-World Applications
- Advantages and Limitations
- Frequently Asked Questions
📊 Interactive K-S Test Calculator
[Interactive controls: choose Distribution 1 and Distribution 2 (Normal, Uniform, Exponential, or Bimodal) and adjust the test settings to run a live two-sample K-S test.]
How to Interpret These Results
The K-S statistic D (green dashed line) is the maximum absolute vertical gap between the two blue/red ECDF curves. If D exceeds the critical value — or equivalently, if p < α — we reject H₀ and conclude the distributions differ. Increase sample size to see how the test becomes more sensitive to smaller differences.
Critical Value Calculator
The critical value is the threshold that D must exceed to reject H₀. For two equal samples of size n at significance level α:

\[ D_{crit} = c(\alpha)\sqrt{\frac{2}{n}} \]
Key Insight
As n increases, Dcrit decreases (∝ 1/√n) — meaning larger samples detect smaller distributional differences. This is why the K-S test applied to very large datasets may reject H₀ for trivially small practical differences. Always consider effect size alongside statistical significance.
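The 1/√n scaling is easy to verify numerically. A minimal sketch in pure Python (the function name is ours, used here for illustration), assuming the asymptotic two-sample formula D_crit = c(α)·√(2/n) for equal group sizes:

```python
import math

def ks_crit_equal_n(n: int, alpha: float = 0.05) -> float:
    """Asymptotic two-sample K-S critical value for two equal samples of size n."""
    c = math.sqrt(-math.log(alpha / 2) / 2)  # c(0.05) ≈ 1.358
    return c * math.sqrt(2 / n)

# Larger n => smaller critical value, shrinking like 1/sqrt(n)
for n in (50, 500, 5000):
    print(n, round(ks_crit_equal_n(n), 4))
```

A hundredfold increase in n shrinks the detectable difference tenfold, which is exactly why huge samples flag trivially small discrepancies.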
📘 What is the Kolmogorov-Smirnov (K-S) Test?
The Kolmogorov-Smirnov test (K-S test) is a powerful, nonparametric statistical procedure used to determine whether two samples are drawn from the same underlying probability distribution. Unlike parametric tests such as the t-test or ANOVA — which make specific assumptions about the shape of the population distribution — the K-S test is distribution-free: it makes no assumption that your data is normally distributed, Poisson-distributed, or of any other specific family.
The defining feature of the K-S test is the test statistic D, which measures the maximum absolute vertical distance between the two empirical cumulative distribution functions (ECDFs) of the two samples. A large D indicates that the two samples behave very differently across their entire range; a small D suggests they may come from the same distribution.
Key Insight: The K-S test is sensitive to differences in location (where the distribution is centred), scale (how spread out it is), and shape (symmetry, tail weight, modality). This makes it more comprehensive than tests that only assess mean differences.
Core Principles at a Glance
- Distribution-free: No parametric assumption about the data's family of distributions.
- Supremum statistic: D = sup|F₁(x) − F₂(x)| captures the worst-case discrepancy.
- Two flavours: One-sample (data vs. theory) and two-sample (data vs. data).
- Exact for small samples: Exact p-values can be computed for small n from the finite-sample distribution of D; the asymptotic approximation improves as n grows.
- Glivenko-Cantelli guarantee: The ECDF converges uniformly to the true CDF as n → ∞, providing the theoretical foundation for the test.
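The supremum statistic is concrete enough to compute by hand, or in a few lines of pure Python. This sketch (helper names are our own, not from any particular library) evaluates the ECDF gap at every observed point of two small illustrative samples:

```python
from bisect import bisect_right

def ecdf_value(sorted_sample, x):
    """F_n(x): fraction of sample values that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_statistic(a, b):
    """Maximum absolute gap between the two ECDFs, checked at every data point."""
    a, b = sorted(a), sorted(b)
    return max(abs(ecdf_value(a, v) - ecdf_value(b, v)) for v in a + b)

sample1 = [1.2, 1.9, 2.4, 3.1, 3.8]
sample2 = [2.0, 2.6, 3.3, 4.0, 4.7]
print(ks_statistic(sample1, sample2))  # maximum gap ≈ 0.4
```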
🏛️ Historical Background
The K-S test is named after two giants of 20th-century Russian mathematics:
Andrey Nikolaevich Kolmogorov (1903–1987)
Kolmogorov was one of the most prolific mathematicians in history, making foundational contributions to probability theory (his 1933 axiomatisation of probability is still the standard today), turbulence, algorithmic information theory, and topology. In 1933, he published the limiting distribution of the supremum of the difference between an ECDF and a theoretical CDF for a continuous distribution, establishing the theoretical basis for the one-sample test.
Nikolai Vasilyevich Smirnov (1900–1966)
Smirnov extended Kolmogorov's work to the two-sample case in 1939 and later derived the tables of critical values that statisticians used for decades before computers made exact computation feasible. His 1948 paper in The Annals of Mathematical Statistics is one of the most-cited works in nonparametric statistics.
Historical Note: The K-S test predates modern computers. Statisticians originally computed ECDFs by hand and compared them to tables of critical values printed in statistical handbooks — a tedious but important practice that underscores the test's mathematical elegance and simplicity.
📈 Understanding CDFs, ECDFs, and Their Properties
The Cumulative Distribution Function (CDF)
For a random variable X, the CDF is defined as:

\[ F(x) = P(X \le x) \]

The CDF is a non-decreasing, right-continuous function taking values in [0, 1]. It approaches 0 as x → −∞ and 1 as x → +∞. For a normal distribution with mean μ and standard deviation σ:

\[ F(x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(t-\mu)^2 / 2\sigma^2}\, dt \]
The Empirical CDF (ECDF)
Given n observations x₁, x₂, …, xₙ from an unknown distribution, the empirical CDF is the step function:

\[ F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\} \]

The ECDF assigns probability mass of 1/n to each observation and places a "step" of height 1/n at every observation. The Glivenko-Cantelli Theorem (Glivenko and Cantelli, 1933) guarantees that:

\[ \sup_x \left| F_n(x) - F(x) \right| \xrightarrow{\text{a.s.}} 0 \quad \text{as } n \to \infty \]
This almost-sure uniform convergence is the cornerstone of K-S test theory: as sample size grows, the ECDF becomes an increasingly accurate estimate of the true CDF in an exact, worst-case sense.
Properties of the ECDF
- Non-decreasing step function with jumps of size 1/n at each data point
- F_n(x) = 0 for x < min(x₁,…,xₙ) and F_n(x) = 1 for x ≥ max(x₁,…,xₙ)
- Unbiased estimator of the true CDF: E[F_n(x)] = F(x) for all x
- Pointwise variance: Var[F_n(x)] = F(x)(1−F(x)) / n
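The step-function properties above can be sketched directly in pure Python (a sketch of the standard definition; the helper name is ours):

```python
from bisect import bisect_right

def make_ecdf(data):
    """Return F_n as a right-continuous step function with jumps of height 1/n."""
    xs = sorted(data)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n

F = make_ecdf([3.1, 1.4, 2.7, 1.4, 5.0])
print(F(0.0))   # below the minimum -> 0.0
print(F(1.4))   # two tied observations stack into a single jump of 2/5
print(F(10.0))  # at or above the maximum -> 1.0
```

Note how ties merge into one taller step, which is exactly why discrete data complicates the null distribution of D.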
📐 The K-S Statistic: Mathematical Derivation
Two-Sample K-S Statistic
Given two independent samples of sizes n₁ and n₂ with ECDFs F_{n₁}(x) and G_{n₂}(x), the two-sample K-S statistic is:

\[ D = \sup_x \left| F_{n_1}(x) - G_{n_2}(x) \right| \]
Because ECDFs are step functions that only change at observed data points, the supremum is attained at one of the combined ordered observations. The algorithm evaluates |F_{n₁}(x) − G_{n₂}(x)| at every unique point in the merged sorted sample and takes the maximum.
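A direct implementation of this merged-grid search, returning both D and the location where the maximum gap occurs (function and variable names are our own):

```python
from bisect import bisect_right

def ks_2samp_stat(a, b):
    """Evaluate |F(v) - G(v)| at each unique merged value; return (D, argmax location)."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    best_d, best_x = 0.0, None
    for v in sorted(set(a + b)):
        gap = abs(bisect_right(a, v) / n1 - bisect_right(b, v) / n2)
        if gap > best_d:
            best_d, best_x = gap, v
    return best_d, best_x

d, where = ks_2samp_stat([5, 8, 9, 12], [7, 10, 13, 14])
print(d, where)  # D = 0.5, attained at v = 9
```

Reporting the argmax location is a cheap way to see roughly *where* the two distributions disagree most, something the bare statistic D does not convey.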
One-Sample K-S Statistic
When comparing a sample to a fully specified theoretical CDF F₀(x):

\[ D_n = \sup_x \left| F_n(x) - F_0(x) \right| \]
The Null Distribution
Under H₀ (the sample is drawn from the hypothesised continuous distribution F₀), for the one-sample test:

\[ \lim_{n \to \infty} P\!\left( \sqrt{n}\, D_n \le z \right) = K(z) = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2 z^2} \]

The function K(z) is the Kolmogorov distribution, a distribution specific to this test; notably, it does not depend on F₀. For the two-sample test with unequal sample sizes n₁ and n₂, replace √n with the effective sample size \( \sqrt{n_1 n_2 / (n_1 + n_2)} \).
🔀 One-Sample vs. Two-Sample K-S Test
| Feature | One-Sample K-S Test | Two-Sample K-S Test |
|---|---|---|
| Purpose | Test if sample follows a specified distribution | Compare two empirical distributions |
| H₀ | F_n(x) = F₀(x) for all x | F_{n₁}(x) = G_{n₂}(x) for all x |
| Comparison | ECDF vs. theoretical CDF | ECDF vs. ECDF |
| Statistic | D_n = sup|F_n(x) − F₀(x)| | D = sup|F_{n₁}(x) − G_{n₂}(x)| |
| Typical use | Normality testing, goodness-of-fit | A/B testing, treatment vs. control |
| Limitation | Parameters must be pre-specified (not estimated from data) | Requires both samples to be continuous |
| Common alternatives | Shapiro-Wilk, Anderson-Darling | Mann-Whitney U, Wilcoxon rank-sum |
Important: If you estimate parameters (μ, σ) from the same sample you are testing against a normal distribution, you must use the Lilliefors test (a modification of the one-sample K-S test) rather than the standard K-S test. Using estimated parameters with the standard K-S test will produce an overly conservative test (inflated p-values).
🔢 Step-by-Step: Performing the Two-Sample K-S Test
Step 1 — State the Hypotheses
Clearly define what you are testing:
- H₀: The two samples are drawn from the same continuous distribution.
- H₁: The two samples are drawn from different distributions.
- Choose the significance level α (typically 0.05).
Step 2 — Compute the ECDFs
Sort each sample independently. For sample 1 with n₁ observations sorted as x₍₁₎ ≤ x₍₂₎ ≤ … ≤ x₍n₁₎:

\[ F_{n_1}(x) = \frac{\#\{i : x_{(i)} \le x\}}{n_1} \]
Repeat for sample 2 to obtain G_{n₂}(x).
Step 3 — Compute the K-S Statistic D
Merge and sort the two sample arrays. At each unique value v in the combined sorted list, evaluate the gap between the two ECDFs and take the maximum:

\[ D = \max_v \left| F_{n_1}(v) - G_{n_2}(v) \right| \]
Step 4 — Determine the Critical Value
For large samples, the critical value at significance level α is:

\[ D_{crit} = c(\alpha) \sqrt{\frac{n_1 + n_2}{n_1 n_2}}, \qquad c(\alpha) = \sqrt{-\tfrac{1}{2} \ln\!\left(\tfrac{\alpha}{2}\right)} \]
Where the constants c(α) are:
| α | c(α) | Confidence Level |
|---|---|---|
| 0.10 | 1.2239 | 90% |
| 0.05 | 1.3581 | 95% |
| 0.01 | 1.6276 | 99% |
| 0.001 | 1.9495 | 99.9% |
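These constants follow directly from c(α) = √(−½ ln(α/2)). A small sketch that reproduces them and computes a critical value for unequal sample sizes (function names are ours):

```python
import math

def c_alpha(alpha: float) -> float:
    """c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    return math.sqrt(-math.log(alpha / 2) / 2)

def ks_crit(n1: int, n2: int, alpha: float = 0.05) -> float:
    """Asymptotic two-sample critical value: c(alpha) * sqrt((n1 + n2) / (n1 * n2))."""
    return c_alpha(alpha) * math.sqrt((n1 + n2) / (n1 * n2))

for a in (0.10, 0.05, 0.01, 0.001):
    print(a, round(c_alpha(a), 4))

# Unequal groups: 100 vs 80 observations at alpha = 0.05
print(round(ks_crit(100, 80), 4))
```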
Step 5 — Compute the p-value
Using the asymptotic Kolmogorov distribution, an approximate p-value is:

\[ p \approx 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2 \lambda^2}, \qquad \lambda = D \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \]
Step 6 — Make a Decision
Reject H₀ if D > D_crit or, equivalently, if p < α — the two conditions are mathematically equivalent for the asymptotic test. If you reject H₀, conclude the two samples come from different distributions. Failing to reject does not prove they are the same — only that there is insufficient evidence to distinguish them at the chosen α.
🧮 Worked Example: Full K-S Test Calculation
Scenario: A pharmaceutical researcher wants to determine whether the body temperature (°C) distribution of patients in Group A (treated) differs from Group B (control). She collects n₁ = n₂ = 8 measurements from each group.
Data
| Group A (Treated) | Group B (Control) |
|---|---|
| 36.4, 36.8, 37.0, 37.1, 37.3, 37.5, 37.7, 38.1 | 36.6, 36.9, 37.0, 37.2, 37.4, 37.6, 37.8, 37.9 |
Step 1: Sort Each Sample (already sorted)
Group A sorted: 36.4, 36.8, 37.0, 37.1, 37.3, 37.5, 37.7, 38.1
Group B sorted: 36.6, 36.9, 37.0, 37.2, 37.4, 37.6, 37.8, 37.9
Step 2: Compute ECDFs and Find Maximum Difference
| x | F_A(x) | F_B(x) | |F_A − F_B| |
|---|---|---|---|
| 36.4 | 1/8 = 0.125 | 0/8 = 0.000 | 0.125 |
| 36.6 | 1/8 = 0.125 | 1/8 = 0.125 | 0.000 |
| 36.8 | 2/8 = 0.250 | 1/8 = 0.125 | 0.125 |
| 36.9 | 2/8 = 0.250 | 2/8 = 0.250 | 0.000 |
| 37.0 | 3/8 = 0.375 | 3/8 = 0.375 | 0.000 |
| 37.1 | 4/8 = 0.500 | 3/8 = 0.375 | 0.125 |
| 37.2 | 4/8 = 0.500 | 4/8 = 0.500 | 0.000 |
| 37.3 | 5/8 = 0.625 | 4/8 = 0.500 | 0.125 |
| 37.4 | 5/8 = 0.625 | 5/8 = 0.625 | 0.000 |
| 37.5 | 6/8 = 0.750 | 5/8 = 0.625 | 0.125 |
| 37.6 | 6/8 = 0.750 | 6/8 = 0.750 | 0.000 |
| 37.7 | 7/8 = 0.875 | 6/8 = 0.750 | 0.125 |
| 37.8 | 7/8 = 0.875 | 7/8 = 0.875 | 0.000 |
| 37.9 | 7/8 = 0.875 | 8/8 = 1.000 | 0.125 |
| 38.1 | 8/8 = 1.000 | 8/8 = 1.000 | 0.000 |
Step 3: K-S Statistic

\[ D = \max_x \left| F_A(x) - F_B(x) \right| = 0.125 \]

Step 4: Critical Value at α = 0.05, n₁ = n₂ = 8

\[ D_{crit} = 1.358 \sqrt{\tfrac{2}{8}} = 1.358 \times 0.5 \approx 0.679 \]

Since D = 0.125 < 0.679 = D_crit, we fail to reject H₀.
Step 5: Conclusion
There is insufficient evidence at α = 0.05 to conclude that the temperature distributions of the two groups are different. The two ECDFs are very close to each other throughout the range, which is confirmed by the small D = 0.125. Note: with only n = 8 per group, the test has limited power to detect moderate differences.
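The worked example can be checked mechanically. This pure-Python sketch reproduces the ECDF comparison table and the α = 0.05 decision:

```python
import math
from bisect import bisect_right

group_a = [36.4, 36.8, 37.0, 37.1, 37.3, 37.5, 37.7, 38.1]  # treated
group_b = [36.6, 36.9, 37.0, 37.2, 37.4, 37.6, 37.8, 37.9]  # control

def ecdf(sorted_sample, x):
    """Fraction of sorted sample values that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

points = sorted(set(group_a + group_b))
d = max(abs(ecdf(group_a, v) - ecdf(group_b, v)) for v in points)
crit = math.sqrt(-math.log(0.05 / 2) / 2) * math.sqrt(2 / 8)

print(d)               # 0.125, matching the hand calculation
print(round(crit, 3))  # 0.679
print(d > crit)        # False -> fail to reject H0
```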
📊 Critical Values and P-Value Interpretation
Interpreting the p-value
The p-value in the K-S test is the probability of observing a test statistic at least as extreme as D under H₀. It is approximately:

\[ p \approx 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2 \lambda^2}, \qquad \lambda = D \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \]

The single-term approximation \(p \approx 2e^{-2\lambda^2}\) provides an upper bound on the true p-value and is accurate for λ ≳ 1. For small samples, exact tables (Smirnov 1948; Massey 1951) should be used.
Standard Critical Value Reference Table (One-Sample Test)
| n | α = 0.10 | α = 0.05 | α = 0.01 |
|---|---|---|---|
| 10 | 0.369 | 0.409 | 0.486 |
| 20 | 0.265 | 0.294 | 0.352 |
| 30 | 0.218 | 0.242 | 0.290 |
| 50 | 0.170 | 0.188 | 0.225 |
| 100 | 0.121 | 0.134 | 0.161 |
| 200 | 0.086 | 0.095 | 0.114 |
| 500 | 0.054 | 0.060 | 0.072 |
| Large n | 1.22/√n | 1.36/√n | 1.63/√n |
The tabulated values apply to the one-sample test. For the two-sample test with equal group sizes n, the asymptotic critical value is c(α)·√(2/n) — e.g., 1.36·√(2/n) at α = 0.05.
✅ Assumptions of the K-S Test
While the K-S test is nonparametric and distribution-free, it is not assumption-free. Violating these conditions can invalidate the test:
- Independence within samples: Each observation must be independently and identically distributed (i.i.d.). Correlated observations (time series, clustered data) violate this assumption.
- Independence between samples: The two samples must be independent of each other. Paired data (before/after on the same subjects) requires paired tests.
- Continuous distribution: The underlying distribution must be continuous. The K-S test can be conservative (inflated p-values) when applied to discrete data due to ties at the same value.
- Pre-specified parameters (one-sample only): For the one-sample test, the theoretical CDF parameters must be specified in advance, not estimated from the data. If parameters are estimated, use the Lilliefors test.
Ties: When ties occur in the data, the K-S statistic can still be computed, but its null distribution changes. For heavily discretised data, consider a permutation-based K-S test, which remains valid in the presence of ties.
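The permutation approach works because, under H₀, the pooled observations are exchangeable: reshuffling group labels gives a valid null distribution regardless of ties. A sketch (our own helper names; 2000 permutations and a fixed seed for reproducibility):

```python
import random
from bisect import bisect_right

def ks_stat(a, b):
    """Two-sample K-S statistic (ties are handled naturally by bisect_right)."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
               for v in a + b)

def perm_ks_pvalue(a, b, n_perm=2000, seed=0):
    """Permutation p-value: fraction of reshuffles whose statistic >= observed."""
    rng = random.Random(seed)
    observed = ks_stat(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_stat(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

For heavily tied data (e.g., integer ratings), this trades computation for validity, avoiding the conservativeness of the asymptotic p-value.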
⚖️ K-S Test vs. Other Statistical Tests
| Test | Parametric? | What it Detects | Best Use Case | Power vs. K-S |
|---|---|---|---|---|
| K-S Test | No | All distributional differences | General comparison of two distributions | Baseline |
| t-test (two-sample) | Yes | Difference in means only | Normal data, comparing means | Higher if normality holds |
| Mann-Whitney U | No | Location shift (stochastic dominance) | Ordinal or skewed data | Higher for location shifts |
| Anderson-Darling | No (one-sample) | Tail differences (weighted) | Normality testing, tail-sensitive applications | Higher for tail differences |
| Shapiro-Wilk | No | Departure from normality | Normality testing (n < 2000) | Much higher for normality test |
| Chi-square GoF | No | Binned frequency differences | Categorical or binned continuous data | Lower (depends on binning) |
| Cramér–von Mises | No | Integrated squared difference | Sensitive to global distributional differences | Generally comparable |
When to Choose the K-S Test
- You want to compare full distributions, not just means or medians.
- You cannot or do not want to assume a parametric form for the data.
- You need a test that detects location, scale, AND shape differences simultaneously.
- You want a test with an intuitive, visual representation (the ECDF plot).
- Your data is continuous or nearly continuous.
🌐 Real-World Applications of the K-S Test
Finance and Econometrics
The K-S test is frequently used to test whether financial returns follow a specific distribution (e.g., normality, which underpins Black-Scholes option pricing). Value-at-Risk models often assume normal or log-normal returns, and the K-S test can formally challenge this assumption. Quantitative analysts also use K-S tests to detect regime shifts — for example, comparing the distribution of daily S&P 500 returns before and after the 2008 financial crisis.
Clinical Trials and Pharmacology
In clinical research, the K-S test compares the outcome distribution between treatment and control groups. Unlike the t-test, it detects distributional differences even when means are similar — for example, a treatment might shift the tail of the distribution while leaving the median unchanged, which would be missed by a t-test but caught by the K-S test. FDA guidance documents recognise nonparametric tests including K-S for situations where normality cannot be assumed.
Machine Learning and Data Science
A critical challenge in machine learning is covariate shift — when the distribution of input features changes between the training environment and the production environment. The K-S test is widely used to detect such shifts automatically in production monitoring systems. Libraries such as Evidently AI, Deepchecks, and AWS SageMaker Model Monitor implement K-S tests as a core drift detection metric.
Environmental Science and Public Health
Environmental scientists compare pollutant concentration distributions across different sites, seasons, or years. Public health researchers test whether the distribution of disease biomarkers differs between exposed and unexposed populations. The K-S test is particularly valuable here because environmental data is often skewed and non-normal.
Quality Control and Manufacturing
Statistical process control (SPC) uses K-S tests to detect shifts in the distribution of product dimensions, weights, or chemical properties over time. If a production batch shows D > D_crit compared to historical data, it signals a process change requiring investigation.
Physics and Engineering
Particle physicists and signal processing engineers use K-S tests to compare observed event distributions with theoretical predictions. The Large Hadron Collider (LHC) data analysis pipelines employ automated K-S testing to compare energy deposition distributions between simulated and measured data.
⚡ Advantages and Limitations
Advantages
- Nonparametric: No assumption about the underlying distribution family — works on any continuous data.
- Comprehensive: Detects differences in location, scale, and shape simultaneously.
- Interpretable: The statistic D has a direct geometric interpretation (maximum vertical gap between ECDFs).
- Exact for finite samples: Exact p-values are available from computed tables (no large-sample approximation needed for small n).
- Universal: The K-S distribution is universal — D_crit depends only on n, not on the family of the underlying distributions.
- Visual: The ECDF plot provides an immediate visual summary of where the two distributions differ.
Limitations
- Pairwise only: Compares exactly two distributions; extending to k > 2 groups requires multiple testing corrections.
- Less powerful at tails: The K-S test is less sensitive to differences in the extreme tails, where the ECDF gap is bounded by small probabilities. The Anderson-Darling test (which weights the tails more heavily) is preferred for tail-focused inference.
- Conservative with ties: Discrete or heavily tied data inflates the p-value, leading to under-rejection.
- Large n over-sensitivity: With very large samples, even trivially small and practically irrelevant distributional differences become statistically significant at any α level.
- Does not localise: D tells you that distributions differ but not where or how — supplementary analysis (quantile-quantile plots, density estimates) is needed to characterise the difference.
- One-sample limitation: Parameters must be pre-specified; estimating them from the same data invalidates the standard critical values.
❓ Frequently Asked Questions
HeLovesMath.com — Free Statistics Calculators, Data Science Tools & Educational Resources
© HeLovesMath. All calculators are free for educational and research use. | Visit HeLovesMath.com
Disclaimer: p-values computed using the asymptotic Kolmogorov approximation. For very small samples (n < 25), consult exact K-S tables. This tool is intended for educational purposes and should not replace formal statistical analysis in peer-reviewed research.
