
Kolmogorov-Smirnov (K-S) Test Calculator

Interactive two-sample K-S test with live ECDF visualization — compare Normal, Uniform, Exponential, and Bimodal distributions with instant results.


📊 Interactive K-S Test Calculator

(Interactive widget: choose two distributions — Normal, Uniform, Exponential, or Bimodal — and the test settings to see both ECDF curves, the maximum distance D, the critical value, the p-value, and the conclusion update live.)

How to Interpret These Results

The K-S statistic D (green dashed line) is the maximum absolute vertical gap between the two blue/red ECDF curves. If D exceeds the critical value — or equivalently, if p < α — we reject H₀ and conclude the distributions differ. Increase sample size to see how the test becomes more sensitive to smaller differences.

Critical Value Calculator

The critical value is the threshold that D must exceed to reject H₀. For a one-sample test with n observations at significance level α (for the two-sample test, replace n by the effective sample size n₁n₂/(n₁ + n₂)):

\[D_{\text{crit}} = \frac{c(\alpha)}{\sqrt{n}}, \quad \text{where } c(0.10)=1.22,\ c(0.05)=1.36,\ c(0.01)=1.63,\ c(0.001)=1.95\]
Critical Values at n = 100:
α = 0.10 → Dcrit = 0.122
α = 0.05 → Dcrit = 0.136
α = 0.01 → Dcrit = 0.163
α = 0.001 → Dcrit = 0.195
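The values above can be reproduced in a few lines. This is an illustrative sketch of the large-n one-sample formula — `critical_value` and `C_ALPHA` are our own names, not a library API:

```python
import math

# Asymptotic constants c(alpha) for the Kolmogorov distribution
C_ALPHA = {0.10: 1.22, 0.05: 1.36, 0.01: 1.63, 0.001: 1.95}

def critical_value(n, alpha=0.05):
    """One-sample asymptotic K-S critical value: D_crit = c(alpha) / sqrt(n)."""
    return C_ALPHA[alpha] / math.sqrt(n)

for a in (0.10, 0.05, 0.01, 0.001):
    print(f"alpha = {a}: D_crit = {critical_value(100, a):.3f}")
```

Running this at n = 100 reproduces the four thresholds listed above (0.122, 0.136, 0.163, 0.195).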

Key Insight

As n increases, Dcrit decreases (∝ 1/√n) — meaning larger samples detect smaller distributional differences. This is why the K-S test applied to very large datasets may reject H₀ for trivially small practical differences. Always consider effect size alongside statistical significance.

Real-World Application Examples

Example — finance: compare the distribution of stock returns before and after a major economic event. The K-S test detects whether return volatility or location shifted significantly.

📘 What is the Kolmogorov-Smirnov (K-S) Test?

The Kolmogorov-Smirnov test (K-S test) is a powerful, nonparametric statistical procedure used to determine whether two samples are drawn from the same underlying probability distribution. Unlike parametric tests such as the t-test or ANOVA — which make specific assumptions about the shape of the population distribution — the K-S test is distribution-free: it makes no assumption that your data is normally distributed, Poisson-distributed, or of any other specific family.

The defining feature of the K-S test is the test statistic D, which measures the maximum absolute vertical distance between the two empirical cumulative distribution functions (ECDFs) of the two samples. A large D indicates that the two samples behave very differently across their entire range; a small D suggests they may come from the same distribution.

Key Insight: The K-S test is sensitive to differences in location (where the distribution is centred), scale (how spread out it is), and shape (symmetry, tail weight, modality). This makes it more comprehensive than tests that only assess mean differences.

Core Principles at a Glance

  • Distribution-free: No parametric assumption about the data's family of distributions.
  • Supremum statistic: D = sup|F₁(x) − F₂(x)| captures the worst-case discrepancy.
  • Two flavours: One-sample (data vs. theory) and two-sample (data vs. data).
  • Exact for small samples: Exact p-values can be computed via the K-S distribution; asymptotic approximations improve with n.
  • Glivenko-Cantelli guarantee: The ECDF converges uniformly to the true CDF as n → ∞, providing the theoretical foundation for the test.

🏛️ Historical Background

The K-S test is named after two giants of 20th-century Russian mathematics:

Andrey Nikolaevich Kolmogorov (1903–1987)

Kolmogorov was one of the most prolific mathematicians in history, making foundational contributions to probability theory (his 1933 axiomatisation of probability is still the standard today), turbulence, algorithmic information theory, and topology. In 1933, he published the limiting distribution of the supremum of the difference between an ECDF and a theoretical CDF for a continuous distribution, establishing the theoretical basis for the one-sample test.

Nikolai Vasilyevich Smirnov (1900–1966)

Smirnov extended Kolmogorov's work to the two-sample case in 1939 and derived the tables of critical values that statisticians used for decades before computers made exact computation feasible. His 1948 tables paper in The Annals of Mathematical Statistics is one of the most-cited works in nonparametric statistics.

Historical Note: The K-S test predates modern computers. Statisticians originally computed ECDFs by hand and compared them to tables of critical values printed in statistical handbooks — a tedious but important practice that underscores the test's mathematical elegance and simplicity.

📈 Understanding CDFs, ECDFs, and Their Properties

The Cumulative Distribution Function (CDF)

For a random variable X, the CDF is defined as:

CDF Definition \[F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t)\,dt \quad \text{(for continuous } X \text{)}\]

The CDF is a non-decreasing, right-continuous function taking values in [0, 1]. It starts at 0 as x → −∞ and approaches 1 as x → +∞. For a normal distribution with mean μ and standard deviation σ:

Normal CDF \[F(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]\]
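The identity above maps directly to code via the standard error function; a minimal sketch using only the standard library:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF via the error function: Phi((x - mu) / sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

normal_cdf(0.0)    # 0.5 by symmetry
normal_cdf(1.96)   # ~0.975, the familiar 97.5th percentile
```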

The Empirical CDF (ECDF)

Given n observations x₁, x₂, …, xₙ from an unknown distribution, the empirical CDF is the step function:

Empirical CDF \[F_n(t) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(x_i \leq t) = \frac{\#\{i : x_i \leq t\}}{n}\]

The ECDF assigns probability mass of 1/n to each observation, placing a "step" of height 1/n at every data point. The Glivenko-Cantelli Theorem (Glivenko and Cantelli, 1933) guarantees that:

Glivenko-Cantelli Theorem \[\sup_{x \in \mathbb{R}} \left|F_n(x) - F(x)\right| \xrightarrow{\text{a.s.}} 0 \quad \text{as } n \to \infty\]

This almost-sure uniform convergence is the cornerstone of K-S test theory: as sample size grows, the ECDF becomes an increasingly accurate estimate of the true CDF in an exact, worst-case sense.

Properties of the ECDF

  • Non-decreasing step function with jumps of size 1/n at each data point
  • F_n(x) = 0 for x < min(x₁,…,xₙ) and F_n(x) = 1 for x ≥ max(x₁,…,xₙ)
  • Unbiased estimator of the true CDF: E[F_n(x)] = F(x) for all x
  • Pointwise variance: Var[F_n(x)] = F(x)(1−F(x)) / n
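The definition and properties above translate into a small step-function implementation; a sketch (the helper name `ecdf` is ours, not a standard API):

```python
import numpy as np

def ecdf(data):
    """Return the empirical CDF F_n as a right-continuous step function."""
    xs = np.sort(np.asarray(data, dtype=float))
    n = len(xs)

    def F(t):
        # Number of observations <= t, divided by n
        return np.searchsorted(xs, t, side="right") / n

    return F

F = ecdf([3.1, 1.2, 2.5, 4.0])
F(0.0)   # 0.0  (below the minimum)
F(2.5)   # 0.5  (two of four observations are <= 2.5)
F(4.0)   # 1.0  (at the maximum)
```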

📐 The K-S Statistic: Mathematical Derivation

Two-Sample K-S Statistic

Given two independent samples of sizes n₁ and n₂ with ECDFs F_{n₁}(x) and G_{n₂}(x), the two-sample K-S statistic is:

Two-Sample K-S Statistic \[D_{n_1, n_2} = \sup_{x \in \mathbb{R}} \left|F_{n_1}(x) - G_{n_2}(x)\right|\]

Because ECDFs are step functions that only change at observed data points, the supremum is attained at one of the combined ordered observations. The algorithm evaluates |F_n(x) − G_m(x)| at every unique point in the merged sorted sample and takes the maximum.
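That search over the pooled observation points can be written compactly with sorted arrays. A sketch with no special tie handling (`ks_statistic` is our own helper; for production use, `scipy.stats.ks_2samp` implements the same statistic with exact p-values):

```python
import numpy as np

def ks_statistic(x, y):
    """Two-sample K-S statistic: the largest ECDF gap, which is always
    attained at one of the pooled observation points."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    pooled = np.concatenate([x, y])
    F = np.searchsorted(x, pooled, side="right") / len(x)
    G = np.searchsorted(y, pooled, side="right") / len(y)
    return float(np.max(np.abs(F - G)))
```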

One-Sample K-S Statistic

When comparing a sample to a fully specified theoretical CDF F₀(x):

One-Sample K-S Statistic \[D_n = \sup_{x \in \mathbb{R}} \left|F_n(x) - F_0(x)\right|\]

The Null Distribution

Under H₀ (both samples from the same continuous distribution), for the one-sample test:

Kolmogorov Distribution (asymptotic) \[P\!\left(\sqrt{n}\,D_n \leq z\right) \to K(z) = 1 - 2\sum_{k=1}^{\infty}(-1)^{k+1} e^{-2k^2z^2} \quad \text{as } n \to \infty\]

The function K(z) is the Kolmogorov distribution, which is specific to this test. For the two-sample test with sample sizes n₁ and n₂, replace √n by the square root of the effective sample size:

Effective Sample Size \[\sqrt{n_{\text{eff}}} = \sqrt{\frac{n_1 \, n_2}{n_1 + n_2}}\]

🔀 One-Sample vs. Two-Sample K-S Test

| Feature | One-Sample K-S Test | Two-Sample K-S Test |
|---|---|---|
| Purpose | Test whether a sample follows a specified distribution | Compare two empirical distributions |
| H₀ | Fₙ(x) = F₀(x) for all x | F_{n₁}(x) = G_{n₂}(x) for all x |
| Comparison | ECDF vs. theoretical CDF | ECDF vs. ECDF |
| Statistic | Dₙ = sup\|Fₙ(x) − F₀(x)\| | D = sup\|F_{n₁}(x) − G_{n₂}(x)\| |
| Typical use | Normality testing, goodness-of-fit | A/B testing, treatment vs. control |
| Limitation | Parameters must be pre-specified (not estimated from data) | Requires both samples to be continuous |
| Common alternatives | Shapiro-Wilk, Anderson-Darling | Mann-Whitney U, Wilcoxon rank-sum |

Important: If you estimate parameters (μ, σ) from the same sample you are testing against a normal distribution, you must use the Lilliefors test (a modification of the one-sample K-S test) rather than the standard K-S test. Using estimated parameters with the standard K-S test will produce an overly conservative test (inflated p-values).

🔢 Step-by-Step: Performing the Two-Sample K-S Test

Step 1 — State the Hypotheses

Clearly define what you are testing:

  • H₀: The two samples are drawn from the same continuous distribution.
  • H₁: The two samples are drawn from different distributions.
  • Choose the significance level α (typically 0.05).

Step 2 — Compute the ECDFs

Sort each sample independently. For sample 1 with n₁ observations sorted as x₍₁₎ ≤ x₍₂₎ ≤ … ≤ x₍n₁₎:

\[F_{n_1}(x) = \frac{i}{n_1} \quad \text{for } x_{(i)} \leq x < x_{(i+1)}\]

Repeat for sample 2 to obtain G_{n₂}(x).

Step 3 — Compute the K-S Statistic D

Merge and sort the two sample arrays. At each unique value v in the combined sorted list:

\[D = \max_{v} \left|F_{n_1}(v) - G_{n_2}(v)\right|\]

Step 4 — Determine the Critical Value

For large samples, the critical value at significance level α is:

\[D_{\text{crit}} = c(\alpha) \cdot \sqrt{\frac{n_1 + n_2}{n_1 \cdot n_2}}\]

Where the constants c(α) are:

| α | c(α) | Confidence Level |
|---|---|---|
| 0.10 | 1.2238 | 90% |
| 0.05 | 1.3581 | 95% |
| 0.01 | 1.6276 | 99% |
| 0.001 | 1.9495 | 99.9% |
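With the constants c(α) above, Step 4 is one line of arithmetic; a sketch (`two_sample_critical_value` is our own helper name):

```python
import math

# Asymptotic constants c(alpha) from the Kolmogorov distribution
C_ALPHA = {0.10: 1.2238, 0.05: 1.3581, 0.01: 1.6276, 0.001: 1.9495}

def two_sample_critical_value(n1, n2, alpha=0.05):
    """D_crit = c(alpha) * sqrt((n1 + n2) / (n1 * n2))."""
    return C_ALPHA[alpha] * math.sqrt((n1 + n2) / (n1 * n2))

two_sample_critical_value(8, 8)   # 0.679, matching the worked example below
```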

Step 5 — Compute the p-value

Using the asymptotic Kolmogorov distribution, an approximate p-value is:

\[p \approx 2\exp\!\left(-2\lambda^2\right), \quad \lambda = D \cdot \sqrt{\frac{n_1 \cdot n_2}{n_1 + n_2}}\]
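Step 5 in code, using the one-term approximation capped at 1 (`ks_p_value` is an illustrative helper, not a library function):

```python
import math

def ks_p_value(d, n1, n2):
    """Asymptotic two-sample K-S p-value, one-term approximation."""
    lam = d * math.sqrt(n1 * n2 / (n1 + n2))
    return min(1.0, 2.0 * math.exp(-2.0 * lam ** 2))

ks_p_value(0.192, 100, 100)   # ~0.05, right at the alpha = 0.05 boundary
```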

Step 6 — Make a Decision

\[\text{Reject } H_0 \iff D > D_{\text{crit}} \iff p < \alpha\]

Both conditions are mathematically equivalent for the asymptotic test. If you reject H₀, conclude the two samples come from different distributions. Failing to reject does not prove they are the same — only that there is insufficient evidence to distinguish them at the chosen α.

🧮 Worked Example: Full K-S Test Calculation

Scenario: A pharmaceutical researcher wants to determine whether the body temperature (°C) distribution of patients in Group A (treated) differs from Group B (control). She collects n₁ = n₂ = 8 measurements from each group.

Data

| Group A (Treated) | Group B (Control) |
|---|---|
| 36.4, 36.8, 37.0, 37.1, 37.3, 37.5, 37.7, 38.1 | 36.6, 36.9, 37.0, 37.2, 37.4, 37.6, 37.8, 37.9 |

Step 1: Sort Each Sample (already sorted)

Group A sorted: 36.4, 36.8, 37.0, 37.1, 37.3, 37.5, 37.7, 38.1
Group B sorted: 36.6, 36.9, 37.0, 37.2, 37.4, 37.6, 37.8, 37.9

Step 2: Compute ECDFs and Find Maximum Difference

| x | F_A(x) | F_B(x) | \|F_A − F_B\| |
|---|---|---|---|
| 36.4 | 1/8 = 0.125 | 0/8 = 0.000 | 0.125 |
| 36.6 | 1/8 = 0.125 | 1/8 = 0.125 | 0.000 |
| 36.8 | 2/8 = 0.250 | 1/8 = 0.125 | 0.125 |
| 36.9 | 2/8 = 0.250 | 2/8 = 0.250 | 0.000 |
| 37.0 | 3/8 = 0.375 | 3/8 = 0.375 | 0.000 |
| 37.1 | 4/8 = 0.500 | 3/8 = 0.375 | 0.125 |
| 37.2 | 4/8 = 0.500 | 4/8 = 0.500 | 0.000 |
| 37.3 | 5/8 = 0.625 | 4/8 = 0.500 | 0.125 |
| 37.4 | 5/8 = 0.625 | 5/8 = 0.625 | 0.000 |
| 37.5 | 6/8 = 0.750 | 5/8 = 0.625 | 0.125 |
| 37.6 | 6/8 = 0.750 | 6/8 = 0.750 | 0.000 |
| 37.7 | 7/8 = 0.875 | 6/8 = 0.750 | 0.125 |
| 37.8 | 7/8 = 0.875 | 7/8 = 0.875 | 0.000 |
| 37.9 | 7/8 = 0.875 | 8/8 = 1.000 | 0.125 |
| 38.1 | 8/8 = 1.000 | 8/8 = 1.000 | 0.000 |

Step 3: K-S Statistic

\[D = \max_x\left|F_A(x) - F_B(x)\right| = 0.125\]

Step 4: Critical Value at α = 0.05, n₁ = n₂ = 8

\[D_{\text{crit}} = 1.3581 \times \sqrt{\frac{8+8}{8 \times 8}} = 1.3581 \times \sqrt{\frac{16}{64}} = 1.3581 \times 0.5 = 0.6791\]

Step 5: Conclusion

\[D = 0.125 < D_{\text{crit}} = 0.679 \implies \textbf{Fail to reject } H_0\]

There is insufficient evidence at α = 0.05 to conclude that the temperature distributions of the two groups are different. The two ECDFs are very close to each other throughout the range, which is confirmed by the small D = 0.125. Note: with only n = 8 per group, the test has limited power to detect moderate differences.
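The whole calculation above can be verified in a few lines. A self-contained sketch using the pooled-point method (variable names are ours):

```python
import math
import numpy as np

a = np.array([36.4, 36.8, 37.0, 37.1, 37.3, 37.5, 37.7, 38.1])  # Group A
b = np.array([36.6, 36.9, 37.0, 37.2, 37.4, 37.6, 37.8, 37.9])  # Group B

# Evaluate both ECDFs at every pooled observation point
pooled = np.sort(np.concatenate([a, b]))
F = np.searchsorted(np.sort(a), pooled, side="right") / len(a)
G = np.searchsorted(np.sort(b), pooled, side="right") / len(b)

d = float(np.max(np.abs(F - G)))                  # 0.125
d_crit = 1.3581 * math.sqrt((8 + 8) / (8 * 8))    # 0.679
print(f"D = {d:.3f}, D_crit = {d_crit:.3f}, reject H0 = {d > d_crit}")
```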

📊 Critical Values and P-Value Interpretation

Interpreting the p-value

The p-value in the K-S test is the probability of observing a test statistic as extreme as D under H₀. It is approximately:

Approximate p-value (asymptotic) \[p \approx 2\sum_{k=1}^{\infty}(-1)^{k+1} e^{-2k^2\lambda^2} \approx 2e^{-2\lambda^2}, \quad \lambda = D\sqrt{\frac{n_1 n_2}{n_1+n_2}}\]

The single-term approximation \(p \approx 2e^{-2\lambda^2}\) is accurate for λ > 0.5 and provides an upper bound on the true p-value. For small samples, exact tables (Massey 1951, Smirnov 1948) should be used.
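When λ is small, the alternating series should be summed with more terms rather than truncated at k = 1; a sketch (`kolmogorov_p` is our own helper, and the series is unreliable very close to λ = 0):

```python
import math

def kolmogorov_p(lam, terms=100):
    """Asymptotic K-S p-value from the Kolmogorov series;
    the k = 1 term alone gives the familiar 2*exp(-2*lam**2)."""
    s = sum((-1) ** (k + 1) * math.exp(-2.0 * (k * lam) ** 2)
            for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * s))

kolmogorov_p(1.3581)   # ~0.05, recovering the alpha = 0.05 constant c(0.05)
```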

Standard Critical Value Reference Table

| n | α = 0.10 | α = 0.05 | α = 0.01 |
|---|---|---|---|
| 10 | 0.369 | 0.409 | 0.486 |
| 20 | 0.265 | 0.294 | 0.352 |
| 30 | 0.218 | 0.242 | 0.290 |
| 50 | 0.170 | 0.188 | 0.225 |
| 100 | 0.121 | 0.134 | 0.161 |
| 200 | 0.086 | 0.095 | 0.114 |
| 500 | 0.054 | 0.060 | 0.072 |
| Large n | 1.22/√n | 1.36/√n | 1.63/√n |

These are one-sample critical values. For a two-sample test with n observations per group, multiply by √2 (equivalently, use the effective sample size n/2).

✅ Assumptions of the K-S Test

While the K-S test is nonparametric and distribution-free, it is not assumption-free. Violating these conditions can invalidate the test:

  1. Independence within samples: Each observation must be independently and identically distributed (i.i.d.). Correlated observations (time series, clustered data) violate this assumption.
  2. Independence between samples: The two samples must be independent of each other. Paired data (before/after on the same subjects) requires paired tests.
  3. Continuous distribution: The underlying distribution must be continuous. The K-S test can be conservative (inflated p-values) when applied to discrete data due to ties at the same value.
  4. Pre-specified parameters (one-sample only): For the one-sample test, the theoretical CDF parameters must be specified in advance, not estimated from the data. If parameters are estimated, use the Lilliefors test.

Ties: When ties occur in the data, the K-S statistic can still be computed, but its null distribution changes. For heavily discretised data, consider a permutation-based K-S test, which remains valid in the presence of ties.

⚖️ K-S Test vs. Other Statistical Tests

| Test | Parametric? | What it Detects | Best Use Case | Power vs. K-S |
|---|---|---|---|---|
| K-S Test | No | All distributional differences | General comparison of two distributions | Baseline |
| t-test (two-sample) | Yes | Difference in means only | Normal data, comparing means | Higher if normality holds |
| Mann-Whitney U | No | Location shift (stochastic dominance) | Ordinal or skewed data | Higher for location shifts |
| Anderson-Darling (one-sample) | No | Tail differences (weighted) | Normality testing, tail-sensitive applications | Higher for tail differences |
| Shapiro-Wilk | No | Departure from normality | Normality testing (n < 2000) | Much higher for normality testing |
| Chi-square GoF | No | Binned frequency differences | Categorical or binned continuous data | Lower (depends on binning) |
| Cramér–von Mises | No | Integrated squared difference | Global distributional differences | Generally comparable |

When to Choose the K-S Test

  • You want to compare full distributions, not just means or medians.
  • You cannot or do not want to assume a parametric form for the data.
  • You need a test that detects location, scale, AND shape differences simultaneously.
  • You want a test with an intuitive, visual representation (the ECDF plot).
  • Your data is continuous or nearly continuous.

🌐 Real-World Applications of the K-S Test

Finance and Econometrics

The K-S test is frequently used to test whether financial returns follow a specific distribution (e.g., normality, which underpins Black-Scholes option pricing). Value-at-Risk models often assume normal or log-normal returns, and the K-S test can formally challenge this assumption. Quantitative analysts also use K-S tests to detect regime shifts — for example, comparing the distribution of daily S&P 500 returns before and after the 2008 financial crisis.

Clinical Trials and Pharmacology

In clinical research, the K-S test compares the outcome distribution between treatment and control groups. Unlike the t-test, it detects distributional differences even when means are similar — for example, a treatment might shift the tail of the distribution while leaving the median unchanged, which would be missed by a t-test but caught by the K-S test. FDA guidance documents recognise nonparametric tests including K-S for situations where normality cannot be assumed.

Machine Learning and Data Science

A critical challenge in machine learning is covariate shift — when the distribution of input features changes between the training environment and the production environment. The K-S test is widely used to detect such shifts automatically in production monitoring systems. Libraries such as Evidently AI, Deepchecks, and AWS SageMaker Model Monitor implement K-S tests as a core drift detection metric.

Drift Detection Decision Rule \[\text{Trigger retraining if } D\!\left(F_{\text{train}}, G_{\text{prod}}\right) > D_{\text{crit}}(\alpha)\]
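A minimal per-feature drift check along these lines might look as follows. This is a sketch — `ks_drift_alert` is our own name, and production systems typically delegate to libraries such as Evidently or `scipy.stats.ks_2samp`:

```python
import math
import numpy as np

def ks_drift_alert(train_col, prod_col, alpha=0.05):
    """Flag drift when the asymptotic K-S p-value falls below alpha."""
    x = np.sort(np.asarray(train_col, float))
    y = np.sort(np.asarray(prod_col, float))
    pooled = np.concatenate([x, y])
    F = np.searchsorted(x, pooled, side="right") / len(x)
    G = np.searchsorted(y, pooled, side="right") / len(y)
    d = float(np.max(np.abs(F - G)))
    lam = d * math.sqrt(len(x) * len(y) / (len(x) + len(y)))
    p = min(1.0, 2.0 * math.exp(-2.0 * lam ** 2))
    return p < alpha, d, p

train = np.linspace(0.0, 1.0, 200)
ks_drift_alert(train, train + 0.5)   # shifted feature: alert is True
ks_drift_alert(train, train)         # identical feature: alert is False
```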

Environmental Science and Public Health

Environmental scientists compare pollutant concentration distributions across different sites, seasons, or years. Public health researchers test whether the distribution of disease biomarkers differs between exposed and unexposed populations. The K-S test is particularly valuable here because environmental data is often skewed and non-normal.

Quality Control and Manufacturing

Statistical process control (SPC) uses K-S tests to detect shifts in the distribution of product dimensions, weights, or chemical properties over time. If a production batch shows D > D_crit compared to historical data, it signals a process change requiring investigation.

Physics and Engineering

Particle physicists and signal processing engineers use K-S tests to compare observed event distributions with theoretical predictions. The Large Hadron Collider (LHC) data analysis pipelines employ automated K-S testing to compare energy deposition distributions between simulated and measured data.

⚡ Advantages and Limitations

Advantages

  • Nonparametric: No assumption about the underlying distribution family — works on any continuous data.
  • Comprehensive: Detects differences in location, scale, and shape simultaneously.
  • Interpretable: The statistic D has a direct geometric interpretation (maximum vertical gap between ECDFs).
  • Exact for finite samples: Exact p-values are available from computed tables (no large-sample approximation needed for small n).
  • Universal: The K-S distribution is universal — D_crit depends only on n, not on the family of the underlying distributions.
  • Visual: The ECDF plot provides an immediate visual summary of where the two distributions differ.

Limitations

  • Pairwise only: Compares exactly two distributions; extending to k > 2 groups requires multiple testing corrections.
  • Less powerful at tails: The K-S test is less sensitive to differences in the extreme tails. The Anderson-Darling test (which down-weights the centre) is preferred for tail-focused inference.
  • Conservative with ties: Discrete or heavily tied data inflates the p-value, leading to under-rejection.
  • Large n over-sensitivity: With very large samples, even trivially small and practically irrelevant distributional differences become statistically significant at any α level.
  • Does not localise: D tells you that distributions differ but not where or how — supplementary analysis (quantile-quantile plots, density estimates) is needed to characterise the difference.
  • One-sample limitation: Parameters must be pre-specified; estimating them from the same data invalidates the standard critical values.

❓ Frequently Asked Questions

What is the Kolmogorov-Smirnov test used for?
The K-S test determines whether two samples come from the same probability distribution by comparing their empirical CDFs. It is used in machine learning (drift detection), finance (return distribution testing), medicine (clinical trial comparison), environmental science (site comparison), and quality control (process monitoring).
How do I interpret the K-S statistic D?
D is the maximum absolute vertical difference between the two ECDFs over all x values. D = 0 means the ECDFs are identical (distributions are the same). D = 1 means the distributions are completely non-overlapping. In practice, D values around 0.05–0.30 are typical for moderately different distributions with n = 100–500.
What is the null hypothesis of the K-S test?
H₀: Both samples are drawn from the same continuous distribution (i.e., F₁(x) = F₂(x) for all x). H₁: The distributions differ for at least one value of x. Rejecting H₀ (p < α) means there is statistically significant evidence that the distributions differ.
What sample size does the K-S test require?
The K-S test works for any n ≥ 1, but the asymptotic critical values (based on the Kolmogorov distribution) are most accurate for n ≥ 25–30 per group. For very small samples (n < 20), use exact tables (Massey 1951) or simulation-based p-values for reliable inference.
Can the K-S test be used for discrete data?
Technically yes, but the K-S test is most valid for continuous data. With discrete data, ties at the same value cause the test to be conservative — it under-rejects H₀. For discrete data, consider the chi-squared goodness-of-fit test or exact permutation tests.
What is the difference between the K-S test and the t-test?
The t-test only compares the means of two normally distributed populations. The K-S test is nonparametric and compares entire distributions, detecting any kind of difference (mean, variance, skewness, kurtosis, shape). The t-test is more powerful when normality holds and only mean differences are of interest; the K-S test is more general and robust.
What does "fail to reject H₀" mean?
It means there is insufficient statistical evidence to conclude the two distributions differ at the chosen significance level α. Crucially, it does NOT prove the two distributions are identical — it only means the data cannot distinguish them given the available sample size. The test may simply lack power (especially with small n).
How is the p-value computed in the K-S test?
For large samples, p ≈ 2·exp(−2λ²) where λ = D·√(n₁·n₂/(n₁+n₂)). This is derived from the asymptotic Kolmogorov distribution. For small samples, the exact p-value is computed using the Smirnov-Massey tables or simulation. Most statistical software (R, Python's scipy, SPSS) computes exact p-values numerically.
What is covariate shift and how does the K-S test detect it?
Covariate shift occurs in machine learning when the distribution of input features P(X) changes between training and deployment, even if the conditional output distribution P(Y|X) remains stable. The K-S test detects this by comparing distributions of each feature between training data and new incoming data. A significant D (p < α) for any feature triggers a data drift alert.
Can the K-S test be applied to more than two groups?
The standard K-S test compares exactly two distributions. For k > 2 groups, you have two options: (1) pairwise K-S tests with Bonferroni or Holm correction for multiple comparisons, or (2) the k-sample Anderson-Darling test, which generalises the goodness-of-fit approach to multiple groups simultaneously.
What is the Lilliefors correction and when do I need it?
The Lilliefors test is a correction to the one-sample K-S test when population parameters (μ, σ) are estimated from the same sample being tested. Standard K-S critical values are too large when parameters are estimated, leading to under-rejection. Lilliefors (1967) derived new critical values for this common situation. Always use Lilliefors when testing normality with estimated parameters.
How do I perform the K-S test in Python or R?
In Python: scipy.stats.ks_2samp(sample1, sample2) returns (D_statistic, p_value). In R: ks.test(sample1, sample2) performs the two-sample test. Both automatically compute appropriate p-values. The HeLovesMath calculator above lets you explore K-S test results interactively — without any coding.