
Probability and Statistics

Mean, median, mode, hypothesis testing, the normal distribution, and linear regression, applied to surveying measurements.

The hook

Surveying measurements live and die by the normal distribution. Random errors cluster around the mean, fall off symmetrically, and 68% land within ±1 standard deviation. Once you internalize this curve, error propagation, blunder detection, and confidence intervals all click.

[Figure: normal curve centered on μ (mean), with the ±1σ band (68%) and the ±2σ band (95%) marked.]
Standard normal curve. ±1σ contains 68%, ±2σ contains 95%, ±3σ contains 99.7%. Anything beyond ±3σ is suspicious — likely a blunder.
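The 68/95/99.7 percentages in the caption can be checked with Python's standard-library `statistics.NormalDist` (Python 3.8+). This is a minimal sketch; the `coverage` helper is an illustrative name, not part of any library.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0.0, sigma=1.0)

def coverage(k: float) -> float:
    """Probability that a normal observation falls within ±k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

# 1.96 and 2.58 are the 95% and 99% confidence multipliers.
for k in (1, 1.96, 2, 2.58, 3):
    print(f"±{k}σ: {coverage(k):.4f}")
```

It should print about 0.6827, 0.9500, 0.9545, 0.9901 and 0.9973, so roughly 0.27% of purely random errors land beyond ±3σ.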
Memorize these

Concepts that show up on the exam

Mean (μ, x̄)
Arithmetic average of the observations. The center of the distribution, and the least-squares estimate of the true value when all observations carry equal weight.
Median
Middle value of the sorted observations (the average of the two middle values when n is even). Equals the mean for a symmetric distribution, and is more robust to outliers than the mean.
Mode
The most frequent value. For a continuous normal distribution it equals the mean; for a histogram, the tallest bar.
Standard deviation (σ, s)
Square root of the variance. The ±1σ interval contains about 68% of normally distributed observations. Use n−1 in the denominator for a sample (s) and n for a population (σ).
Variance (σ²)
Average squared deviation from the mean. Adds linearly when independent random variables are summed (variances add, NOT standard deviations).
Confidence interval
Range expected to contain the true value with a stated probability. ±1σ ≈ 68%, ±1.96σ ≈ 95%, ±2.58σ ≈ 99%.
Hypothesis test
H₀ (null) vs. H₁ (alternative). Reject H₀ if the test statistic is too extreme to have plausibly occurred by chance at the chosen significance level α (typically 0.05).
Linear regression
Fits a line ŷ = ax + b by minimizing the sum of squared residuals Σ(yᵢ − ŷᵢ)². Slope: a = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)². Intercept: b = ȳ − a · x̄.
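A minimal Python sketch of the closed-form slope and intercept from the linear-regression entry above. The `fit_line` name and the sample data are hypothetical, chosen for illustration. On Python 3.10+, the standard library's `statistics.linear_regression` should return the same slope and intercept.

```python
from statistics import mean

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares fit of y_hat = a*x + b using the closed-form slope and intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # Σ(xᵢ − x̄)(yᵢ − ȳ)
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # Σ(xᵢ − x̄)²
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = sxy / sxx          # slope
    b = y_bar - a * x_bar  # intercept: the fitted line passes through (x̄, ȳ)
    return a, b

# Hypothetical data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]
a, b = fit_line(xs, ys)
print(f"y_hat = {a:.2f} x + {b:.2f}")  # y_hat = 1.97 x + 1.06
```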
Keep these in muscle memory

Formulas to know cold

Sample mean
x̄ = (Σ xᵢ) / n
Sample standard deviation
s = √( Σ(xᵢ − x̄)² / (n − 1) )
Use n−1 (Bessel correction) for SAMPLES of a larger population. Use n for the entire population.
Standard error of the mean
SE = s / √n
The standard deviation of the mean is 1/√n that of a single observation, so the mean of n observations is √n times more precise.
95% confidence interval (large n)
CI = x̄ ± 1.96 · (s / √n)
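The four formulas above translate directly into Python. This is a sketch that assumes a plain list of equally weighted observations; the function names and the `readings` values are hypothetical. It cross-checks the hand-coded s against the standard library's `statistics.stdev` (n−1 denominator) and shows that `statistics.pstdev` (n denominator) comes out smaller.

```python
import math
from statistics import pstdev, stdev

def sample_mean(obs: list[float]) -> float:
    """x̄ = (Σ xᵢ) / n"""
    return sum(obs) / len(obs)

def sample_std(obs: list[float]) -> float:
    """s = √( Σ(xᵢ − x̄)² / (n − 1) ), the Bessel-corrected sample standard deviation."""
    n = len(obs)
    if n < 2:
        raise ValueError("need at least two observations for a sample standard deviation")
    x_bar = sample_mean(obs)
    return math.sqrt(sum((x - x_bar) ** 2 for x in obs) / (n - 1))

def standard_error(obs: list[float]) -> float:
    """SE = s / √n, the standard deviation of the mean of n observations."""
    return sample_std(obs) / math.sqrt(len(obs))

def ci95_large_n(obs: list[float]) -> tuple[float, float]:
    """x̄ ± 1.96 · SE; the 1.96 multiplier assumes a large n (normal approximation)."""
    x_bar, se = sample_mean(obs), standard_error(obs)
    return x_bar - 1.96 * se, x_bar + 1.96 * se

# Hypothetical repeated readings, for illustration only.
readings = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 10.0, 12.0]

# The hand-coded s agrees with the standard library's n−1 version...
assert math.isclose(sample_std(readings), stdev(readings))
# ...and is larger than the population (n-denominator) value.
print(sample_std(readings), pstdev(readings))
print(standard_error(readings), ci95_large_n(readings))
```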
Try it before you peek

Worked example

The problem
A line is measured 5 times: 312.41, 312.45, 312.39, 312.43, 312.42 ft. Compute the mean, standard deviation, and the 95% confidence interval for the true line length.
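One way to work the problem by hand, applying the sample-mean, sample-standard-deviation, standard-error and 95% confidence-interval formulas above. The residual symbol vᵢ is introduced here for convenience.

```latex
\begin{aligned}
\bar{x} &= \frac{312.41 + 312.45 + 312.39 + 312.43 + 312.42}{5} = \frac{1562.10}{5} = 312.42\ \text{ft} \\
v_i &= x_i - \bar{x} = -0.01,\ +0.03,\ -0.03,\ +0.01,\ 0.00\ \text{ft} \\
\textstyle\sum v_i^2 &= 0.0001 + 0.0009 + 0.0009 + 0.0001 + 0 = 0.0020\ \text{ft}^2 \\
s &= \sqrt{0.0020 / (5 - 1)} = \sqrt{0.0005} \approx 0.0224\ \text{ft} \\
SE &= s / \sqrt{5} = \sqrt{0.0005 / 5} = 0.0100\ \text{ft} \\
CI_{95\%} &= 312.42 \pm 1.96 \times 0.0100 = 312.42 \pm 0.0196\ \text{ft}
  \;\Rightarrow\; 312.400 \text{ to } 312.440\ \text{ft}
\end{aligned}
```

With only five observations the large-n multiplier is somewhat optimistic. Using the Student t value for 4 degrees of freedom (about 2.776) instead of 1.96 gives 312.42 ± 0.028 ft, or roughly 312.392 to 312.448 ft.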
Don't fall for these

What trips people up

Using σ when you should use s
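Population σ uses n in the denominator; sample s uses n−1. With small n the difference matters: for n = 5, s is about 12% larger than the n-denominator value computed from the same data, since √(5/4) ≈ 1.118. Use s unless you literally observed every member of the population.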
Confusing σ and σ_mean
σ is the std dev of a single observation; σ/√n is the std dev of the MEAN of n observations. The mean is √n times more precise. Quoting σ when you need σ_mean overstates uncertainty.
Standard deviations don't add
VARIANCES add (when the errors are independent): σ²(a + b) = σ²_a + σ²_b, so σ(a + b) = √(σ²_a + σ²_b). Adding standard deviations directly gives a total that is too large (too pessimistic).
Outliers vs. blunders
Under purely random (normal) error, an observation lands more than 3σ from the mean only about 0.3% of the time. Such an observation is probably a blunder: investigate it, don't just downweight it.
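Two of these pitfalls in code: combining independent errors by adding variances, and screening for observations beyond 3σ. This is a minimal sketch; the helper names, the segment σ values and the readings are hypothetical. The screen takes σ as an input (for example, an a priori value from instrument specifications) rather than estimating it from the same readings, because a blunder inflates s computed from the data it contaminates, and with ten or fewer observations no reading can lie more than 3s from their mean at all.

```python
import math

def combine_independent(*sigmas: float) -> float:
    """Std dev of a sum of independent errors: add the variances, then take the square root."""
    return math.sqrt(sum(s ** 2 for s in sigmas))

def flag_beyond_k_sigma(obs: list[float], sigma: float, k: float = 3.0) -> list[float]:
    """Return the observations lying more than k * sigma from the mean of obs.

    sigma is an a priori standard deviation supplied by the caller (hypothetical here),
    not one estimated from obs itself.
    """
    x_bar = sum(obs) / len(obs)
    return [x for x in obs if abs(x - x_bar) > k * sigma]

# Two independent taped segments with hypothetical std devs of 0.03 ft and 0.04 ft.
total_sigma = combine_independent(0.03, 0.04)  # √(0.03² + 0.04²) = 0.05 ft
naive_sigma = 0.03 + 0.04                      # 0.07 ft: adding std devs overstates the total
print(f"combined σ = {total_sigma:.3f} ft, naive sum = {naive_sigma:.3f} ft")

# Hypothetical repeated readings (ft); the last one contains a deliberate blunder.
readings = [312.41, 312.45, 312.39, 312.43, 312.42, 312.62]
print(flag_beyond_k_sigma(readings, sigma=0.03))  # [312.62]
```

With σ = 0.03 ft assumed, only 312.62 is flagged. The blunder also pulls the mean toward itself, so recompute the statistics after removing it.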