18.3 - Sample Percentiles

It can be shown, as the authors of our textbook illustrate, that the order statistics \(Y_1<Y_2<\cdots<Y_n\) partition the support of \(X\) into \(n+1\) parts and thereby create \(n+1\) areas under \(f(x)\) and above the \(x\)-axis, each of which equals, on average, \( \dfrac{1}{n+1} \).

Now, recall that the (100p)th percentile \(\pi_p\) is the value for which the area under \(f(x)\) to the left of \(\pi_p\) is \(p\). Because the area under \(f(x)\) to the left of \(Y_r\) is made up of \(r\) of these parts, it equals, on average, \( \dfrac{r}{n+1}\). This suggests that we should let \(Y_r\) serve as an estimator of \(\pi_p\), where \( p = \dfrac{r}{n+1}\).

It's for this reason that we use the following formal definition of the sample percentile.

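(100p)th percentile of the sample: The (100p)th percentile of the sample is defined to be \(Y_r\), the \(r^{th}\) order statistic, where \(r=(n+1)p\). For cases in which \(r=(n+1)p\) is not an integer, we use a weighted average of the two adjacent order statistics \(Y_k\) and \(Y_{k+1}\), where \(k\) is the integer part of \(r\) and the weight placed on \(Y_{k+1}\) is the fractional part \(r-k\). That is, \(\tilde{\pi}_p = Y_k+(r-k)(Y_{k+1}-Y_k)\).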
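The definition can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the textbook: the function name `sample_percentile` and the small data set in the usage line are hypothetical, and the function assumes \(1 \le (n+1)p \le n\) so that the required order statistics exist.

```python
import math

def sample_percentile(data, p):
    """Return the (100p)th sample percentile using the position r = (n + 1) p.

    Hypothetical helper for illustration; assumes 1 <= (n + 1) p <= n.
    """
    y = sorted(data)          # order statistics y_1 <= ... <= y_n
    n = len(y)
    r = (n + 1) * p           # possibly non-integer position
    k = math.floor(r)         # integer part of r
    frac = r - k              # fractional part of r (floating point, so approximate)

    # The definition needs y_k, and also y_{k+1} when r is not an integer.
    if k < 1 or k > n or (k == n and frac > 0):
        raise ValueError("(n + 1) p must lie between 1 and n")

    if frac == 0:
        return y[k - 1]       # r is an integer: use y_r directly (0-based index)

    # Weighted average of the two adjacent order statistics.
    return y[k - 1] + frac * (y[k] - y[k - 1])

# Hypothetical sample of size 4: r = (4 + 1)(0.5) = 2.5, so the result is
# halfway between y_2 = 4 and y_3 = 7.
print(sample_percentile([2, 4, 7, 9], 0.5))  # 5.5
```

Because \(r\) is computed in floating point, the fractional part can be off by a tiny rounding error, so results should be compared with a small tolerance rather than exact equality.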

Let's try this definition out on an example.

Example 18-3

A report from the Texas Transportation Institute on Congestion Reduction Strategies highlighted the extra travel time (due to traffic congestion) for commute travel per traveler per year in hours for 13 different large urban areas in the United States:

Urban Area	Extra Hours per Traveler Per Year
Philadelphia	40
Miami	48
Phoenix	49
New York	50
Boston	53
Detroit	54
Chicago	55
Dallas-Fort Worth	61
Atlanta	64
Houston	65
Washington, DC	66
San Francisco	75
Los Angeles	98

Find the first quartile, the 40th percentile, and the median of the sample of \(n=13\) extra travel times.

Answer

Because \(r=(13+1)(0.25)=3.5\), the first quartile, alternatively known as the 25th percentile, is:

\(\tilde{q}_1 =y_3+0.5(y_4-y_3) = 0.5y_3+0.5y_4=0.5(49)+0.5(50)=49.5\)

Because \(r=(13+1)(0.4)=5.6\), the 40th percentile is:

\(\tilde{\pi}_{0.40} = y_5 + 0.6(y_6-y_5) =0.4y_5 + 0.6y_6 =0.4(53)+0.6(54)=53.6\)

Because \(r=(13+1)(0.5)=7\), the median is:

\(\tilde{m} =y_7 = 55\)
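As a cross-check on the arithmetic, the same three values can be reproduced with Python's standard library. The sketch below assumes Python 3.8 or later, where `statistics.quantiles` with `method="exclusive"` places its cut points using \(n+1\), which matches the definition used here. The variable names are illustrative only.

```python
from statistics import quantiles

# Extra travel hours per traveler per year for the 13 urban areas.
hours = [40, 48, 49, 50, 53, 54, 55, 61, 64, 65, 66, 75, 98]

# method="exclusive" positions the i-th of k cut points at i * (n + 1) / k.
quartiles = quantiles(hours, n=4, method="exclusive")   # 25th, 50th, 75th percentiles
deciles = quantiles(hours, n=10, method="exclusive")    # 10th, 20th, ..., 90th percentiles

print("first quartile: ", quartiles[0])
print("40th percentile:", deciles[3])
print("median:         ", quartiles[1])
```

Other software may use a different convention for positioning percentiles, so its answers need not agree with the \((n+1)p\) rule unless that convention is selected explicitly.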