It can be shown, as the authors of our textbook illustrate, that the order statistics \(Y_1<Y_2<\cdots<Y_n\) partition the support of \(X\) into \(n+1\) parts and thereby create \(n+1\) areas under \(f(x)\) and above the \(x\)-axis, with each of the \(n+1\) areas equaling, on average, \( \dfrac{1}{n+1} \).
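A quick simulation can make the "on average \(\dfrac{1}{n+1}\)" claim concrete. The sketch below assumes \(X\) is Uniform(0, 1), so the area under \(f(x)\) between two points is simply the distance between them. By the probability integral transform, \(F(Y_1)<\cdots<F(Y_n)\) behave like uniform order statistics for any continuous \(X\), so the same averages apply in general. The function name `mean_uniform_spacings`, the choice \(n=4\), the 20,000 repetitions and the seed are illustrative assumptions.

```python
import random


def mean_uniform_spacings(n, reps=20_000, seed=1):
    """Estimate the average area of each of the n+1 parts cut by Y_1..Y_n.

    Assumption for this sketch: X ~ Uniform(0, 1), so F(x) = x and the
    area under f(x) between two cut points is just their difference.
    """
    rng = random.Random(seed)
    totals = [0.0] * (n + 1)
    for _ in range(reps):
        ys = sorted(rng.random() for _ in range(n))  # order statistics
        cuts = [0.0] + ys + [1.0]                    # 0, Y_1, ..., Y_n, 1
        for k in range(n + 1):
            totals[k] += cuts[k + 1] - cuts[k]       # area of part k
    return [t / reps for t in totals]


print(mean_uniform_spacings(4))  # each of the 5 entries should be near 1/5
```

With \(n=4\), each of the five averages should land close to \(0.2\), even though any single sample can split the area very unevenly.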
Now, recall that the (100p)th percentile \(\pi_p\) is such that the area under \(f(x)\) to the left of \(\pi_p\) is \(p\). Because the area under \(f(x)\) to the left of \(Y_r\) consists of \(r\) of the \(n+1\) parts, it equals \( \dfrac{r}{n+1}\) on average. This suggests that we should let \(Y_r\) serve as an estimator of \(\pi_p\), where \( p = \dfrac{r}{n+1}\).
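For continuous \(X\), a standard result (not derived in this lesson) makes the "on average" statement precise: the area to the left of the \(r^{th}\) order statistic has a beta distribution, and its mean is exactly \(r/(n+1)\):

\[
F(Y_r) \sim \mathrm{Beta}(r,\; n-r+1)
\quad\Longrightarrow\quad
E\big[F(Y_r)\big] = \frac{r}{r + (n-r+1)} = \frac{r}{n+1}.
\]

Setting this expected area equal to \(p\) gives \(p = \dfrac{r}{n+1}\), or equivalently \(r = (n+1)p\).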
It's for this reason that we use the following formal definition of the sample percentile.
- (100p)th percentile of the sample
- The (100p)th percentile of the sample is defined to be \(Y_r\), the \(r^{th}\) order statistic, where \(r=(n+1)p\). When \((n+1)p\) is not an integer, write \((n+1)p = r + a\), where \(r\) is the integer part and \(0<a<1\) is the fractional part, and use the weighted average of the two adjacent order statistics \(Y_r\) and \(Y_{r+1}\): \((1-a)Y_r + aY_{r+1}\).
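The definition translates directly into a few lines of code. The sketch below is one possible implementation; the name `sample_percentile` is hypothetical, and rejecting values of \(p\) that would require an order statistic outside \(Y_1,\ldots,Y_n\) is an assumption, since the definition above does not cover that case.

```python
import math


def sample_percentile(data, p):
    """Return the (100p)th sample percentile using r = (n+1)p.

    If (n+1)p = r + a with integer part r and fractional part a, the result
    is (1 - a) * y_r + a * y_{r+1}, where y_1 <= ... <= y_n are the sorted
    observations (indices are 1-based in the formula, 0-based in Python).
    """
    ys = sorted(data)
    n = len(ys)
    # Round away tiny floating-point noise, e.g. 14 * 0.4 = 5.6000000000000005.
    pos = round((n + 1) * p, 12)
    r = math.floor(pos)
    a = pos - r
    # Assumption: p values that need an order statistic outside y_1..y_n
    # are not covered by the definition, so this sketch rejects them.
    if r < 1 or r > n or (r == n and a > 0):
        raise ValueError("need 1 <= (n+1)p <= n")
    if a == 0:
        return ys[r - 1]  # (n+1)p is an integer: the percentile is y_r itself
    return (1 - a) * ys[r - 1] + a * ys[r]


print(sample_percentile([1, 2, 3, 4], 0.5))  # (4+1)(0.5) = 2.5, so 0.5*2 + 0.5*3 = 2.5
```

Sorting inside the function means the caller can pass the raw observations in any order; the order statistics are formed on the fly.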
Let's try this definition out on an example.
Example 18-3
A report from the Texas Transportation Institute on Congestion Reduction Strategies highlighted the extra commuting time caused by traffic congestion, in hours per traveler per year, for 13 large urban areas in the United States:
Urban Area | Extra Hours per Traveler per Year |
---|---|
Philadelphia | 40 |
Miami | 48 |
Phoenix | 49 |
New York | 50 |
Boston | 53 |
Detroit | 54 |
Chicago | 55 |
Dallas-Fort Worth | 61 |
Atlanta | 64 |
Houston | 65 |
Washington, DC | 66 |
San Francisco | 75 |
Los Angeles | 98 |
Find the first quartile, the 40th percentile, and the median of the sample of \(n=13\) extra travel times.
Answer
Because \(r=(13+1)(0.25)=3.5\), the first quartile, alternatively known as the 25th percentile, is:
\(\tilde{q}_1 =y_3+0.5(y_4-y_3) = 0.5y_3+0.5y_4=0.5(49)+0.5(50)=49.5\)
Because \(r=(13+1)(0.4)=5.6\), the 40th percentile is:
\(\tilde{\pi}_{0.40} = y_5 + 0.6(y_6-y_5) =0.4y_5 + 0.6y_6 =0.4(53)+0.6(54)=53.6\)
Because \(r=(13+1)(0.5)=7\), the median is:
\(\tilde{m} =y_7 = 55\)
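To double-check the three answers, the short Python sketch below re-enters the 13 values from the table and applies the interpolation rule \((1-a)y_r + a\,y_{r+1}\) with \((n+1)p = r + a\). The names `extra_hours` and `percentile` are illustrative choices, not part of the original example, and the sketch assumes \(1 \le (n+1)p \le n\) without checking it.

```python
import math

# Extra hours per traveler per year, copied from the table above
extra_hours = {
    "Philadelphia": 40, "Miami": 48, "Phoenix": 49, "New York": 50,
    "Boston": 53, "Detroit": 54, "Chicago": 55, "Dallas-Fort Worth": 61,
    "Atlanta": 64, "Houston": 65, "Washington, DC": 66,
    "San Francisco": 75, "Los Angeles": 98,
}


def percentile(values, p):
    """(100p)th sample percentile: interpolate between y_r and y_{r+1}."""
    ys = sorted(values)
    pos = round((len(ys) + 1) * p, 12)  # r + a, with float noise removed
    r = math.floor(pos)
    a = pos - r
    if a == 0:
        return ys[r - 1]  # integer case: exactly y_r
    return (1 - a) * ys[r - 1] + a * ys[r]


for label, p in [("first quartile", 0.25), ("40th percentile", 0.40), ("median", 0.50)]:
    print(label, round(percentile(extra_hours.values(), p), 4))
# first quartile 49.5
# 40th percentile 53.6
# median 55
```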