Kolmogorov-Smirnov (K-S) Test Calculator
Interactive two-sample K-S test with live ECDF visualization — compare Normal, Uniform, Exponential, and Bimodal distributions with instant results.
📋 Table of Contents
- Interactive K-S Test Calculator
- What is the Kolmogorov-Smirnov Test?
- Historical Background
- Understanding CDFs and ECDFs
- The K-S Statistic: Mathematical Derivation
- One-Sample vs. Two-Sample K-S Test
- Step-by-Step: Performing the K-S Test
- Worked Example with Full Calculation
- Critical Values and P-Values
- Assumptions of the K-S Test
- K-S Test vs. Other Statistical Tests
- Real-World Applications
- Advantages and Limitations
- Frequently Asked Questions
📊 Interactive K-S Test Calculator
[Interactive controls: choose Distribution 1 and Distribution 2 (Normal, Uniform, Exponential, or Bimodal) and adjust the test settings to run a live two-sample K-S test.]
How to Interpret These Results
The K-S statistic D (green dashed line) is the maximum absolute vertical gap between the two blue/red ECDF curves. If D exceeds the critical value — or equivalently, if p < α — we reject H₀ and conclude the distributions differ. Increase sample size to see how the test becomes more sensitive to smaller differences.
Critical Value Calculator
The critical value is the threshold that D must exceed to reject H₀. For two equal samples of size n at significance level α:

\[ D_{crit} = c(\alpha)\sqrt{\frac{2}{n}} \]
Key Insight
As n increases, Dcrit decreases (∝ 1/√n) — meaning larger samples detect smaller distributional differences. This is why the K-S test applied to very large datasets may reject H₀ for trivially small practical differences. Always consider effect size alongside statistical significance.
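The 1/√n scaling is easy to verify numerically. A minimal sketch in pure Python (the function name is ours, used here for illustration), assuming the asymptotic two-sample formula D_crit = c(α)·√(2/n) for equal group sizes:

```python
import math

def ks_crit_equal_n(n: int, alpha: float = 0.05) -> float:
    """Asymptotic two-sample K-S critical value for two equal samples of size n."""
    c = math.sqrt(-math.log(alpha / 2) / 2)  # c(0.05) ≈ 1.358
    return c * math.sqrt(2 / n)

# Larger n => smaller critical value, shrinking like 1/sqrt(n)
for n in (50, 500, 5000):
    print(n, round(ks_crit_equal_n(n), 4))
```

A hundredfold increase in n shrinks the detectable difference tenfold, which is exactly why huge samples flag trivially small discrepancies.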
📘 What is the Kolmogorov-Smirnov (K-S) Test?
The Kolmogorov-Smirnov test (K-S test) is a powerful, nonparametric statistical procedure used to determine whether two samples are drawn from the same underlying probability distribution. Unlike parametric tests such as the t-test or ANOVA — which make specific assumptions about the shape of the population distribution — the K-S test is distribution-free: it makes no assumption that your data is normally distributed, Poisson-distributed, or of any other specific family.
The defining feature of the K-S test is the test statistic D, which measures the maximum absolute vertical distance between the two empirical cumulative distribution functions (ECDFs) of the two samples. A large D indicates that the two samples behave very differently across their entire range; a small D suggests they may come from the same distribution.
Key Insight: The K-S test is sensitive to differences in location (where the distribution is centred), scale (how spread out it is), and shape (symmetry, tail weight, modality). This makes it more comprehensive than tests that only assess mean differences.
Core Principles at a Glance
- Distribution-free: No parametric assumption about the data's family of distributions.
- Supremum statistic: D = sup|F₁(x) − F₂(x)| captures the worst-case discrepancy.
- Two flavours: One-sample (data vs. theory) and two-sample (data vs. data).
- Exact for small samples: Exact p-values can be computed for small n from the finite-sample distribution of D; the asymptotic approximation improves as n grows.
- Glivenko-Cantelli guarantee: The ECDF converges uniformly to the true CDF as n → ∞, providing the theoretical foundation for the test.
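The supremum statistic is concrete enough to compute by hand, or in a few lines of pure Python. This sketch (helper names are our own, not from any particular library) evaluates the ECDF gap at every observed point of two small illustrative samples:

```python
from bisect import bisect_right

def ecdf_value(sorted_sample, x):
    """F_n(x): fraction of sample values that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_statistic(a, b):
    """Maximum absolute gap between the two ECDFs, checked at every data point."""
    a, b = sorted(a), sorted(b)
    return max(abs(ecdf_value(a, v) - ecdf_value(b, v)) for v in a + b)

sample1 = [1.2, 1.9, 2.4, 3.1, 3.8]
sample2 = [2.0, 2.6, 3.3, 4.0, 4.7]
print(ks_statistic(sample1, sample2))  # maximum gap ≈ 0.4
```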
🏛️ Historical Background
The K-S test is named after two giants of 20th-century Russian mathematics:
Andrey Nikolaevich Kolmogorov (1903–1987)
Kolmogorov was one of the most prolific mathematicians in history, making foundational contributions to probability theory (his 1933 axiomatisation of probability is still the standard today), turbulence, algorithmic information theory, and topology. In 1933, he published the limiting distribution of the supremum of the difference between an ECDF and a theoretical CDF for a continuous distribution, establishing the theoretical basis for the one-sample test.
Nikolai Vasilyevich Smirnov (1900–1966)
Smirnov extended Kolmogorov's work to the two-sample case in 1939 and later derived the tables of critical values that statisticians used for decades before computers made exact computation feasible. His 1948 paper in The Annals of Mathematical Statistics is one of the most-cited works in nonparametric statistics.
Historical Note: The K-S test predates modern computers. Statisticians originally computed ECDFs by hand and compared them to tables of critical values printed in statistical handbooks — a tedious but important practice that underscores the test's mathematical elegance and simplicity.
📈 Understanding CDFs, ECDFs, and Their Properties
The Cumulative Distribution Function (CDF)
For a random variable X, the CDF is defined as:

\[ F(x) = P(X \le x) \]

The CDF is a non-decreasing, right-continuous function taking values in [0, 1]. It approaches 0 as x → −∞ and 1 as x → +∞. For a normal distribution with mean μ and standard deviation σ:

\[ F(x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(t-\mu)^2 / 2\sigma^2}\, dt \]
The Empirical CDF (ECDF)
Given n observations x₁, x₂, …, xₙ from an unknown distribution, the empirical CDF is the step function:

\[ F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\} \]

The ECDF assigns probability mass of 1/n to each observation and places a "step" of height 1/n at every observation. The Glivenko-Cantelli Theorem (Glivenko and Cantelli, 1933) guarantees that:

\[ \sup_x \left| F_n(x) - F(x) \right| \xrightarrow{\text{a.s.}} 0 \quad \text{as } n \to \infty \]
This almost-sure uniform convergence is the cornerstone of K-S test theory: as sample size grows, the ECDF becomes an increasingly accurate estimate of the true CDF in an exact, worst-case sense.
Properties of the ECDF
- Non-decreasing step function with jumps of size 1/n at each data point
- F_n(x) = 0 for x < min(x₁,…,xₙ) and F_n(x) = 1 for x ≥ max(x₁,…,xₙ)
- Unbiased estimator of the true CDF: E[F_n(x)] = F(x) for all x
- Pointwise variance: Var[F_n(x)] = F(x)(1−F(x)) / n
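The step-function properties above can be sketched directly in pure Python (a sketch of the standard definition; the helper name is ours):

```python
from bisect import bisect_right

def make_ecdf(data):
    """Return F_n as a right-continuous step function with jumps of height 1/n."""
    xs = sorted(data)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n

F = make_ecdf([3.1, 1.4, 2.7, 1.4, 5.0])
print(F(0.0))   # below the minimum -> 0.0
print(F(1.4))   # two tied observations stack into a single jump of 2/5
print(F(10.0))  # at or above the maximum -> 1.0
```

Note how ties merge into one taller step, which is exactly why discrete data complicates the null distribution of D.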
📐 The K-S Statistic: Mathematical Derivation
Two-Sample K-S Statistic
Given two independent samples of sizes n₁ and n₂ with ECDFs F_{n₁}(x) and G_{n₂}(x), the two-sample K-S statistic is:

\[ D = \sup_x \left| F_{n_1}(x) - G_{n_2}(x) \right| \]
Because ECDFs are step functions that only change at observed data points, the supremum is attained at one of the combined ordered observations. The algorithm evaluates |F_{n₁}(x) − G_{n₂}(x)| at every unique point in the merged sorted sample and takes the maximum.
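A direct implementation of this merged-grid search, returning both D and the location where the maximum gap occurs (function and variable names are our own):

```python
from bisect import bisect_right

def ks_2samp_stat(a, b):
    """Evaluate |F(v) - G(v)| at each unique merged value; return (D, argmax location)."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    best_d, best_x = 0.0, None
    for v in sorted(set(a + b)):
        gap = abs(bisect_right(a, v) / n1 - bisect_right(b, v) / n2)
        if gap > best_d:
            best_d, best_x = gap, v
    return best_d, best_x

d, where = ks_2samp_stat([5, 8, 9, 12], [7, 10, 13, 14])
print(d, where)  # D = 0.5, attained at v = 9
```

Reporting the argmax location is a cheap way to see roughly *where* the two distributions disagree most, something the bare statistic D does not convey.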
One-Sample K-S Statistic
When comparing a sample to a fully specified theoretical CDF F₀(x):

\[ D_n = \sup_x \left| F_n(x) - F_0(x) \right| \]
The Null Distribution
Under H₀ (the sample is drawn from the hypothesised continuous distribution F₀), for the one-sample test:

\[ \lim_{n \to \infty} P\!\left( \sqrt{n}\, D_n \le z \right) = K(z) = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2 z^2} \]

The function K(z) is the Kolmogorov distribution, a distribution specific to this test; notably, it does not depend on F₀. For the two-sample test with unequal sample sizes n₁ and n₂, replace √n with the effective sample size \( \sqrt{n_1 n_2 / (n_1 + n_2)} \).
🔀 One-Sample vs. Two-Sample K-S Test
| Feature | One-Sample K-S Test | Two-Sample K-S Test |
|---|---|---|
| Purpose | Test if sample follows a specified distribution | Compare two empirical distributions |
| H₀ | F_n(x) = F₀(x) for all x | F_{n₁}(x) = G_{n₂}(x) for all x |
| Comparison | ECDF vs. theoretical CDF | ECDF vs. ECDF |
| Statistic | D_n = sup|F_n(x) − F₀(x)| | D = sup|F_{n₁}(x) − G_{n₂}(x)| |
| Typical use | Normality testing, goodness-of-fit | A/B testing, treatment vs. control |
| Limitation | Parameters must be pre-specified (not estimated from data) | Requires both samples to be continuous |
| Common alternatives | Shapiro-Wilk, Anderson-Darling | Mann-Whitney U, Wilcoxon rank-sum |
Important: If you estimate parameters (μ, σ) from the same sample you are testing against a normal distribution, you must use the Lilliefors test (a modification of the one-sample K-S test) rather than the standard K-S test. Using estimated parameters with the standard K-S test will produce an overly conservative test (inflated p-values).
🔢 Step-by-Step: Performing the Two-Sample K-S Test
Step 1 — State the Hypotheses
Clearly define what you are testing:
- H₀: The two samples are drawn from the same continuous distribution.
- H₁: The two samples are drawn from different distributions.
- Choose the significance level α (typically 0.05).
Step 2 — Compute the ECDFs
Sort each sample independently. For sample 1 with n₁ observations sorted as x₍₁₎ ≤ x₍₂₎ ≤ … ≤ x₍n₁₎:

\[ F_{n_1}(x) = \frac{\#\{i : x_{(i)} \le x\}}{n_1} \]
Repeat for sample 2 to obtain G_{n₂}(x).
Step 3 — Compute the K-S Statistic D
Merge and sort the two sample arrays. At each unique value v in the combined sorted list, evaluate the gap between the two ECDFs and take the maximum:

\[ D = \max_v \left| F_{n_1}(v) - G_{n_2}(v) \right| \]
Step 4 — Determine the Critical Value
For large samples, the critical value at significance level α is:

\[ D_{crit} = c(\alpha) \sqrt{\frac{n_1 + n_2}{n_1 n_2}}, \qquad c(\alpha) = \sqrt{-\tfrac{1}{2} \ln\!\left(\tfrac{\alpha}{2}\right)} \]
Where the constants c(α) are:
| α | c(α) | Confidence Level |
|---|---|---|
| 0.10 | 1.2239 | 90% |
| 0.05 | 1.3581 | 95% |
| 0.01 | 1.6276 | 99% |
| 0.001 | 1.9495 | 99.9% |
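These constants follow directly from c(α) = √(−½ ln(α/2)). A small sketch that reproduces them and computes a critical value for unequal sample sizes (function names are ours):

```python
import math

def c_alpha(alpha: float) -> float:
    """c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    return math.sqrt(-math.log(alpha / 2) / 2)

def ks_crit(n1: int, n2: int, alpha: float = 0.05) -> float:
    """Asymptotic two-sample critical value: c(alpha) * sqrt((n1 + n2) / (n1 * n2))."""
    return c_alpha(alpha) * math.sqrt((n1 + n2) / (n1 * n2))

for a in (0.10, 0.05, 0.01, 0.001):
    print(a, round(c_alpha(a), 4))

# Unequal groups: 100 vs 80 observations at alpha = 0.05
print(round(ks_crit(100, 80), 4))
```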
Step 5 — Compute the p-value
Using the asymptotic Kolmogorov distribution, an approximate p-value is:

\[ p \approx 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2 \lambda^2}, \qquad \lambda = D \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \]
Step 6 — Make a Decision
Reject H₀ if D > D_crit or, equivalently, if p < α — the two conditions are mathematically equivalent for the asymptotic test. If you reject H₀, conclude the two samples come from different distributions. Failing to reject does not prove they are the same — only that there is insufficient evidence to distinguish them at the chosen α.
🧮 Worked Example: Full K-S Test Calculation
Scenario: A pharmaceutical researcher wants to determine whether the body temperature (°C) distribution of patients in Group A (treated) differs from Group B (control). She collects n₁ = n₂ = 8 measurements from each group.
Data
| Group A (Treated) | Group B (Control) |
|---|---|
| 36.4, 36.8, 37.0, 37.1, 37.3, 37.5, 37.7, 38.1 | 36.6, 36.9, 37.0, 37.2, 37.4, 37.6, 37.8, 37.9 |
Step 1: Sort Each Sample (already sorted)
Group A sorted: 36.4, 36.8, 37.0, 37.1, 37.3, 37.5, 37.7, 38.1
Group B sorted: 36.6, 36.9, 37.0, 37.2, 37.4, 37.6, 37.8, 37.9
Step 2: Compute ECDFs and Find Maximum Difference
| x | F_A(x) | F_B(x) | |F_A − F_B| |
|---|---|---|---|
| 36.4 | 1/8 = 0.125 | 0/8 = 0.000 | 0.125 |
| 36.6 | 1/8 = 0.125 | 1/8 = 0.125 | 0.000 |
| 36.8 | 2/8 = 0.250 | 1/8 = 0.125 | 0.125 |
| 36.9 | 2/8 = 0.250 | 2/8 = 0.250 | 0.000 |
| 37.0 | 3/8 = 0.375 | 3/8 = 0.375 | 0.000 |
| 37.1 | 4/8 = 0.500 | 3/8 = 0.375 | 0.125 |
| 37.2 | 4/8 = 0.500 | 4/8 = 0.500 | 0.000 |
| 37.3 | 5/8 = 0.625 | 4/8 = 0.500 | 0.125 |
| 37.4 | 5/8 = 0.625 | 5/8 = 0.625 | 0.000 |
| 37.5 | 6/8 = 0.750 | 5/8 = 0.625 | 0.125 |
| 37.6 | 6/8 = 0.750 | 6/8 = 0.750 | 0.000 |
| 37.7 | 7/8 = 0.875 | 6/8 = 0.750 | 0.125 |
| 37.8 | 7/8 = 0.875 | 7/8 = 0.875 | 0.000 |
| 37.9 | 7/8 = 0.875 | 8/8 = 1.000 | 0.125 |
| 38.1 | 8/8 = 1.000 | 8/8 = 1.000 | 0.000 |
Step 3: K-S Statistic

\[ D = \max_x \left| F_A(x) - F_B(x) \right| = 0.125 \]

Step 4: Critical Value at α = 0.05, n₁ = n₂ = 8

\[ D_{crit} = 1.358 \sqrt{\tfrac{2}{8}} = 1.358 \times 0.5 \approx 0.679 \]

Since D = 0.125 < 0.679 = D_crit, we fail to reject H₀.
Step 5: Conclusion
There is insufficient evidence at α = 0.05 to conclude that the temperature distributions of the two groups are different. The two ECDFs are very close to each other throughout the range, which is confirmed by the small D = 0.125. Note: with only n = 8 per group, the test has limited power to detect moderate differences.
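The worked example can be checked mechanically. This pure-Python sketch reproduces the ECDF comparison table and the α = 0.05 decision:

```python
import math
from bisect import bisect_right

group_a = [36.4, 36.8, 37.0, 37.1, 37.3, 37.5, 37.7, 38.1]  # treated
group_b = [36.6, 36.9, 37.0, 37.2, 37.4, 37.6, 37.8, 37.9]  # control

def ecdf(sorted_sample, x):
    """Fraction of sorted sample values that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

points = sorted(set(group_a + group_b))
d = max(abs(ecdf(group_a, v) - ecdf(group_b, v)) for v in points)
crit = math.sqrt(-math.log(0.05 / 2) / 2) * math.sqrt(2 / 8)

print(d)               # 0.125, matching the hand calculation
print(round(crit, 3))  # 0.679
print(d > crit)        # False -> fail to reject H0
```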
📊 Critical Values and P-Value Interpretation
Interpreting the p-value
The p-value in the K-S test is the probability of observing a test statistic at least as extreme as D under H₀. It is approximately:

\[ p \approx 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2 \lambda^2}, \qquad \lambda = D \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \]

The single-term approximation \(p \approx 2e^{-2\lambda^2}\) provides an upper bound on the true p-value and is accurate for λ ≳ 1. For small samples, exact tables (Smirnov 1948; Massey 1951) should be used.
Standard Critical Value Reference Table (One-Sample Test)
| n | α = 0.10 | α = 0.05 | α = 0.01 |
|---|---|---|---|
| 10 | 0.369 | 0.409 | 0.486 |
| 20 | 0.265 | 0.294 | 0.352 |
| 30 | 0.218 | 0.242 | 0.290 |
| 50 | 0.170 | 0.188 | 0.225 |
| 100 | 0.121 | 0.134 | 0.161 |
| 200 | 0.086 | 0.095 | 0.114 |
| 500 | 0.054 | 0.060 | 0.072 |
| Large n | 1.22/√n | 1.36/√n | 1.63/√n |
The tabulated values apply to the one-sample test. For the two-sample test with equal group sizes n, the asymptotic critical value is c(α)·√(2/n) — e.g., 1.36·√(2/n) at α = 0.05.
✅ Assumptions of the K-S Test
While the K-S test is nonparametric and distribution-free, it is not assumption-free. Violating these conditions can invalidate the test:
- Independence within samples: Each observation must be independently and identically distributed (i.i.d.). Correlated observations (time series, clustered data) violate this assumption.
- Independence between samples: The two samples must be independent of each other. Paired data (before/after on the same subjects) requires paired tests.
- Continuous distribution: The underlying distribution must be continuous. The K-S test can be conservative (inflated p-values) when applied to discrete data due to ties at the same value.
- Pre-specified parameters (one-sample only): For the one-sample test, the theoretical CDF parameters must be specified in advance, not estimated from the data. If parameters are estimated, use the Lilliefors test.
Ties: When ties occur in the data, the K-S statistic can still be computed, but its null distribution changes. For heavily discretised data, consider a permutation-based K-S test, which remains valid in the presence of ties.
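The permutation approach works because, under H₀, the pooled observations are exchangeable: reshuffling group labels gives a valid null distribution regardless of ties. A sketch (our own helper names; 2000 permutations and a fixed seed for reproducibility):

```python
import random
from bisect import bisect_right

def ks_stat(a, b):
    """Two-sample K-S statistic (ties are handled naturally by bisect_right)."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
               for v in a + b)

def perm_ks_pvalue(a, b, n_perm=2000, seed=0):
    """Permutation p-value: fraction of reshuffles whose statistic >= observed."""
    rng = random.Random(seed)
    observed = ks_stat(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_stat(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

For heavily tied data (e.g., integer ratings), this trades computation for validity, avoiding the conservativeness of the asymptotic p-value.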
⚖️ K-S Test vs. Other Statistical Tests
| Test | Parametric? | What it Detects | Best Use Case | Power vs. K-S |
|---|---|---|---|---|
| K-S Test | No | All distributional differences | General comparison of two distributions | Baseline |
| t-test (two-sample) | Yes | Difference in means only | Normal data, comparing means | Higher if normality holds |
| Mann-Whitney U | No | Location shift (stochastic dominance) | Ordinal or skewed data | Higher for location shifts |
| Anderson-Darling | No (one-sample) | Tail differences (weighted) | Normality testing, tail-sensitive applications | Higher for tail differences |
| Shapiro-Wilk | No | Departure from normality | Normality testing (n < 2000) | Much higher for normality test |
| Chi-square GoF | No | Binned frequency differences | Categorical or binned continuous data | Lower (depends on binning) |
| Cramér–von Mises | No | Integrated squared difference | Sensitive to global distributional differences | Generally comparable |
When to Choose the K-S Test
- You want to compare full distributions, not just means or medians.
- You cannot or do not want to assume a parametric form for the data.
- You need a test that detects location, scale, AND shape differences simultaneously.
- You want a test with an intuitive, visual representation (the ECDF plot).
- Your data is continuous or nearly continuous.
🌐 Real-World Applications of the K-S Test
Finance and Econometrics
The K-S test is frequently used to test whether financial returns follow a specific distribution (e.g., normality, which underpins Black-Scholes option pricing). Value-at-Risk models often assume normal or log-normal returns, and the K-S test can formally challenge this assumption. Quantitative analysts also use K-S tests to detect regime shifts — for example, comparing the distribution of daily S&P 500 returns before and after the 2008 financial crisis.
Clinical Trials and Pharmacology
In clinical research, the K-S test compares the outcome distribution between treatment and control groups. Unlike the t-test, it detects distributional differences even when means are similar — for example, a treatment might shift the tail of the distribution while leaving the median unchanged, which would be missed by a t-test but caught by the K-S test. FDA guidance documents recognise nonparametric tests including K-S for situations where normality cannot be assumed.
Machine Learning and Data Science
A critical challenge in machine learning is covariate shift — when the distribution of input features changes between the training environment and the production environment. The K-S test is widely used to detect such shifts automatically in production monitoring systems. Libraries such as Evidently AI, Deepchecks, and AWS SageMaker Model Monitor implement K-S tests as a core drift detection metric.
Environmental Science and Public Health
Environmental scientists compare pollutant concentration distributions across different sites, seasons, or years. Public health researchers test whether the distribution of disease biomarkers differs between exposed and unexposed populations. The K-S test is particularly valuable here because environmental data is often skewed and non-normal.
Quality Control and Manufacturing
Statistical process control (SPC) uses K-S tests to detect shifts in the distribution of product dimensions, weights, or chemical properties over time. If a production batch shows D > D_crit compared to historical data, it signals a process change requiring investigation.
Physics and Engineering
Particle physicists and signal processing engineers use K-S tests to compare observed event distributions with theoretical predictions. The Large Hadron Collider (LHC) data analysis pipelines employ automated K-S testing to compare energy deposition distributions between simulated and measured data.
⚡ Advantages and Limitations
Advantages
- Nonparametric: No assumption about the underlying distribution family — works on any continuous data.
- Comprehensive: Detects differences in location, scale, and shape simultaneously.
- Interpretable: The statistic D has a direct geometric interpretation (maximum vertical gap between ECDFs).
- Exact for finite samples: Exact p-values are available from computed tables (no large-sample approximation needed for small n).
- Universal: The K-S distribution is universal — D_crit depends only on n, not on the family of the underlying distributions.
- Visual: The ECDF plot provides an immediate visual summary of where the two distributions differ.
Limitations
- Pairwise only: Compares exactly two distributions; extending to k > 2 groups requires multiple testing corrections.
- Less powerful at tails: The K-S test is less sensitive to differences in the extreme tails, where the ECDF gap is bounded by small probabilities. The Anderson-Darling test (which weights the tails more heavily) is preferred for tail-focused inference.
- Conservative with ties: Discrete or heavily tied data inflates the p-value, leading to under-rejection.
- Large n over-sensitivity: With very large samples, even trivially small and practically irrelevant distributional differences become statistically significant at any α level.
- Does not localise: D tells you that distributions differ but not where or how — supplementary analysis (quantile-quantile plots, density estimates) is needed to characterise the difference.
- One-sample limitation: Parameters must be pre-specified; estimating them from the same data invalidates the standard critical values.
❓ Frequently Asked Questions
HeLovesMath.com — Free Statistics Calculators, Data Science Tools & Educational Resources
© HeLovesMath. All calculators are free for educational and research use. | Visit HeLovesMath.com
Disclaimer: p-values computed using the asymptotic Kolmogorov approximation. For very small samples (n < 25), consult exact K-S tables. This tool is intended for educational purposes and should not replace formal statistical analysis in peer-reviewed research.
