13.3 - Order Statistics and Sample Percentiles

The primary advantage of creating an ordered stem-and-leaf plot is that you can readily read what are called the order statistics right off of the plot. If we have a sample of \(n\) observations represented as:

\(x_1,x_2,x_3,\cdots,x_n\)

then when the observations are ordered from smallest to largest, the resulting ordered data are called the order statistics of the sample, and are represented as:

\(y_1 \leq y_2 \leq y_3 \leq \cdots \leq y_n\)

That is, \(y_1\), the smallest data point, is the first order statistic. The second smallest data point, \(y_2\), is the second order statistic. And so on, until we reach the largest data point, \(y_n\), which is the \(n^{th}\) order statistic. From the order statistics, it is rather easy to find the sample percentiles.
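As a minimal sketch of the idea, the order statistics of a sample can be obtained simply by sorting it. The function name `order_statistic` and the five-value sample `x` below are hypothetical, chosen only for illustration; they do not come from the lesson's data.

```python
def order_statistic(sample, k):
    """Return the k-th order statistic y_k (1-indexed) of the sample."""
    n = len(sample)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    ordered = sorted(sample)       # y_1 <= y_2 <= ... <= y_n
    return ordered[k - 1]          # shift to Python's 0-based indexing


# Hypothetical unsorted sample, for illustration only
x = [104, 68, 91, 75, 99]
print(sorted(x))                   # the order statistics y_1, ..., y_5
print(order_statistic(x, 1))       # smallest observation, y_1
print(order_statistic(x, len(x)))  # largest observation, y_n
```

Note that the order statistics are indexed from 1, while Python lists are indexed from 0, hence the `k - 1`.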

Definition. If \(0<p<1\), then the \((100p)^{th}\) sample percentile has approximately \(np\) sample observations less than it, and \(n(1-p)\) sample observations greater than it.

Some sample percentiles have special names:

  • The 25th percentile is also called the first quartile and is denoted as \(q_1\).
  • The 50th percentile is also called the second quartile or median, and is denoted as \(q_2\) or \(m\).
  • The 75th percentile is also called the third quartile and is denoted as \(q_3\).
The interquartile range (IQR) is the difference between the third and first quartiles, that is, \(\text{IQR}=q_3-q_1\).

Here's the typical method used for finding a particular sample percentile:

  1. Arrange the sample data in increasing order. That is, determine the order statistics:

    \(y_1 \leq y_2 \leq y_3 \leq \cdots \leq y_n\)

  2. If \((n+1)p\) is an integer, then the \((100p)^{th}\) sample percentile is the \([(n+1)p]^{th}\) order statistic.

  3. If \((n+1)p\) is not an integer, but rather equals \(r\) plus some proper fraction, \(a/b\) say, then use a weighted average of the \(r^{th}\) and \((r+1)^{st}\) order statistics. That is, define the \((100p)^{th}\) sample percentile as:

    \(\tilde{\pi}_p=y_r+\left(\dfrac{a}{b}\right)(y_{r+1}-y_r)\)
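The three steps above can be sketched in Python roughly as follows. The function name `sample_percentile` is hypothetical, and two choices here are assumptions rather than part of the method as stated: \((n+1)p\) is computed with exact fractions so that the "is it an integer?" test is not affected by floating-point rounding, and an error is raised when \((n+1)p\) falls outside the range of available order statistics.

```python
import math
from fractions import Fraction


def sample_percentile(data, p):
    """(100p)th sample percentile via the (n+1)p rule."""
    y = sorted(data)                     # step 1: order statistics
    n = len(y)
    pos = (n + 1) * Fraction(str(p))     # exact value of (n+1)p
    r = math.floor(pos)                  # integer part r
    frac = pos - r                       # proper fraction a/b (0 if integer)
    # Assumption: refuse positions outside y_1..y_n rather than extrapolate.
    if r < 1 or r > n or (r == n and frac != 0):
        raise ValueError("(n+1)p falls outside the order statistics")
    if frac == 0:                        # step 2: (n+1)p is an integer
        return float(y[r - 1])
    # step 3: weighted average of y_r and y_{r+1}
    return float(y[r - 1] + frac * (y[r] - y[r - 1]))


# Hypothetical small samples, for illustration only
print(sample_percentile([2, 4, 6, 8, 10, 12, 14], 0.25))  # (7+1)(0.25) = 2, so y_2
print(sample_percentile([1, 2, 3, 4], 0.5))               # (4+1)(0.5) = 2.5, between y_2 and y_3
```

In the first call \((n+1)p\) is an integer, so the result is an order statistic itself; in the second call the result is interpolated halfway between two adjacent order statistics.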

Let's try this method out on an example or two.

Example 13-3 Revisited

Let's return to our random sample of 64 people selected to take the Stanford-Binet Intelligence Test. The resulting 64 IQs were sorted as follows:

68 75 78 83 85 85 85 86 86 87
87 88 90 91 91 91 91 93 93 93
94 94 94 96 96 97 98 98 99 99
99 99 100 101 101 102 102 104 104 105
105 105 106 106 106 107 107 107 107 107
108 109 110 110 111 114 116 116 117 122
123 128 136 141            

That is, the first order statistic is \(y_1=68\), the second order statistic is \(y_2=75\), and the \(64^{th}\) order statistic is \(y_{64}=141\). Find the 25th sample percentile, the 50th sample percentile, the 75th sample percentile, and the interquartile range.

Solution

Here, we have \(n=64\) IQs. To find the 25th sample percentile, we need to consider \(p=0.25\). In that case:

\((n+1)p=(64+1)(0.25)=(65)(0.25)=16.25\)

Because 16.25 is not an integer, we are going to need to interpolate linearly between the 16th order statistic (91) and 17th order statistic (91). That is, the 25th sample percentile (or first quartile) is 91, as determined by:

\(\tilde{\pi}_{0.25}=y_{16}+(0.25)(y_{17}-y_{16})=91+0.25(91-91)=91\)

To find the 50th sample percentile, we need to consider \(p=0.50\). In that case:

\((n+1)p=(64+1)(0.5)=(65)(0.5)=32.5\)

Because 32.5 is not an integer, we are going to need to interpolate linearly between the 32nd order statistic (99) and 33rd order statistic (100). That is, the 50th sample percentile (or second quartile or median) is 99.5, as determined by:

\(\tilde{\pi}_{0.5}=y_{32}+(0.5)(y_{33}-y_{32})=99+0.5(100-99)=99.5\)

To find the 75th sample percentile, we need to consider \(p=0.75\). In that case:

\((n+1)p=(64+1)(0.75)=(65)(0.75)=48.75\)

Because 48.75 is not an integer, we are going to need to interpolate linearly between the 48th order statistic (107) and 49th order statistic (107). That is, the 75th sample percentile (or third quartile) is 107 as determined by:

\(\tilde{\pi}_{0.75}=y_{48}+(0.75)(y_{49}-y_{48})=107+0.75(107-107)=107\)

The interquartile range IQR is then 107−91 = 16.
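The hand calculations for the \(n=64\) IQs can be checked with a short script. This is only a sketch: the helper `sample_percentile` is a hypothetical name for a direct transcription of the \((n+1)p\) rule, and it uses exact fractions (an implementation choice, not part of the method) so that the integer test is reliable.

```python
import math
from fractions import Fraction

# The 64 sorted IQs from the example
iq = [
    68, 75, 78, 83, 85, 85, 85, 86, 86, 87,
    87, 88, 90, 91, 91, 91, 91, 93, 93, 93,
    94, 94, 94, 96, 96, 97, 98, 98, 99, 99,
    99, 99, 100, 101, 101, 102, 102, 104, 104, 105,
    105, 105, 106, 106, 106, 107, 107, 107, 107, 107,
    108, 109, 110, 110, 111, 114, 116, 116, 117, 122,
    123, 128, 136, 141,
]


def sample_percentile(data, p):
    """(100p)th sample percentile via the (n+1)p rule (hypothetical helper)."""
    y = sorted(data)
    pos = (len(y) + 1) * Fraction(str(p))   # exact (n+1)p
    r = math.floor(pos)                     # integer part
    frac = pos - r                          # fractional part a/b
    if frac == 0:
        return float(y[r - 1])              # an order statistic itself
    return float(y[r - 1] + frac * (y[r] - y[r - 1]))


q1 = sample_percentile(iq, 0.25)   # position 16.25
m = sample_percentile(iq, 0.50)    # position 32.5
q3 = sample_percentile(iq, 0.75)   # position 48.75
iqr = q3 - q1
print(q1, m, q3, iqr)
```

The printed values should agree with the quartiles and interquartile range found by hand above.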

Example 13-3 Revisited again

Let's return again to our IQ data, but this time suppose that the person deemed to have the largest IQ (141) couldn't take the pressure of the test and fainted before completing the test. In that case, the sorted data of the now \(n=63\) IQs look like this:

68 75 78 83 85 85 85 86 86 87
87 88 90 91 91 91 91 93 93 93
94 94 94 96 96 97 98 98 99 99
99 99 100 101 101 102 102 104 104 105
105 105 106 106 106 107 107 107 107 107
108 109 110 110 111 114 116 116 117 122
123 128 136              

You should notice that the once largest observation (141) no longer exists in the data set. Find the 25th sample percentile, the 50th sample percentile, the 75th sample percentile, and the interquartile range.

Solution

Here, we have \(n=63\) IQs. To find the 25th sample percentile, we need to consider \(p=0.25\). In that case:

\((n+1)p=(63+1)(0.25)=(64)(0.25)=16\)

Because 16 is an integer, the 25th sample percentile (or first quartile) is readily determined to be the 16th order statistic, that is, 91.

To find the 50th sample percentile, we need to consider \(p=0.50\). In that case:

\((n+1)p=(63+1)(0.5)=(64)(0.5)=32\)

Because 32 is an integer, the 50th sample percentile (or second quartile or median) is readily determined to be the 32nd order statistic, that is, 99.

To find the 75th sample percentile, we need to consider \(p=0.75\). In that case:

\((n+1)p=(63+1)(0.75)=(64)(0.75)=48\)

Because 48 is an integer, the 75th sample percentile (or third quartile) is readily determined to be the 48th order statistic, that is, 107.

The interquartile range IQR is then again 107−91 = 16.
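As a cross-check on the \(n=63\) results, Python's standard-library function `statistics.quantiles` (available in Python 3.8 and later) can be used. The assumption here is that its `"exclusive"` method places the \(i\)-th of \(m\) sorted points at \(i/(m+1)\), which corresponds to the \((n+1)p\) rule used in this section; other software may use different percentile conventions and give slightly different answers.

```python
import statistics

# The 63 sorted IQs that remain after removing the largest value (141)
iq_63 = [
    68, 75, 78, 83, 85, 85, 85, 86, 86, 87,
    87, 88, 90, 91, 91, 91, 91, 93, 93, 93,
    94, 94, 94, 96, 96, 97, 98, 98, 99, 99,
    99, 99, 100, 101, 101, 102, 102, 104, 104, 105,
    105, 105, 106, 106, 106, 107, 107, 107, 107, 107,
    108, 109, 110, 110, 111, 114, 116, 116, 117, 122,
    123, 128, 136,
]

# n=4 asks for the three cut points that split the data into quarters
q1, m, q3 = statistics.quantiles(iq_63, n=4, method="exclusive")
iqr = q3 - q1
print(q1, m, q3, iqr)
```

Because \((n+1)p\) is an integer for each quartile when \(n=63\), no interpolation takes place and each quartile is simply one of the order statistics.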

