32.4 - Sorting Data

32.4 - Sorting Data

The SELECT and FROM clauses are indispensable in the SELECT statement. Other clauses may be optional but very useful when querying a table. Last section introduced the WHERE clause and how to use it to select rows conditionally. From previous examples, you may have noticed that the order of observations might remain the same as they were in original data set. If, however, you want to specify the order of data, you will need the ORDER BY clause to sort the data as you want.

Example 32.13

The following SAS program uses the ORDER BY in SELECT statement inside PROC SQL to sort the data in the file survey.sas7bdat by the values of gender and GPA:

PROC SQL;
	select ID,
		Gender,
        GPA,
		SATM,
		SATV
	from stat482.survey
	where SATV is not null and GPA>3
	order by Gender, GPA ;
QUIT;

Download survey data to your own computer and change the directory if needed to reflect the local directory where you saved the file. Then launch and run the SAS program, and review the output to convince yourself that the query result is in order first by gender and then by GPA.

Several things need to be pointed out regarding the above program:

  1. You can use one or more column in ORDER BY to sort the data. Comma is used to separate multiple column names. In this example, two columns have been used, Gender and GPA. So the data will be sorted by Gender first, then by GPA in order.
  2. By default, the values of column(s) will be sorted ascendingly. For example, there are two values in Gender, Female and Male. In the query result, Female records are listed first, then male ones because SAS sorted them by the first letter in alphabetical order. As to GPA order, since it’s numeric, SAS sorted observations by number values of GPA inside each gender group.
  3. TheWHERE clause is used to select observations that her/his SAT verbal score is not missing and GPA greater than 3. “is not null” and “is not missing” are interchangeable to indicate no missing values included.

As in PROC SORT, if you want to change the default ascending order into descending order, you just need to specify DESC following the column name.

Example 32.14

The following SAS program sorts the data survey.sas7bdat by the values of gender in descending order then by GPA ascendingly:

PROC SQL;
	select ID,
			Gender,
			GPA,
			SATM,
			SATV
	from stat482.survey
	where SATV is not null and GPA>3
	order by Gender desc, 3 ;
QUIT;

There are only two places that are different from the program in the previous example. DESC has been added after Gender to tell SAS to sort the data descending. Another way to refer to the column rather than its name is its location in the SELECT clause. GPA is listed as the third one so that we can use 3 to specify GPA.

Launch and run the SAS program, and then review the output to convince yourself that the output from this query is in descending order of Gender and in ascending order of GPA.

Up until now, you might think that ORDER BY can perform the same as PROC SORT. Actually, it can do more than that. Let’s find out with the next example.

Example 32.15

The following program sorts the survey data first by gender in descending order as before, then by mean values of SAT math and verbal scores in ascending order:

PROC SQL;
	select *
	from stat482.survey
	where SATV is not null and GPA>3
	order by Gender desc, MEAN(SATM,SATV) ;
QUIT;

Since all columns will be used in the query, * is used to specify all columns after SELECT. The WHERE clause remains the same. In the ORDER BY clause, besides Gender, one function is used to calculate the average scores of SATM and SATV, then uses the calculation results to sort the data inside each gender group. To get the same result, could you try other SAS steps and count how many of them will be needed?

Launch and run the SAS program, and review the output to convince yourself that the data has been sorted in desired order.

One more thing, you may notice a note in log window when running this program.

NOTE: The query as specified involves ordering by an item that doesn't appear in its SELECT clause.

That’s because MEAN(SATM,SATV) is not listed in the SELECT clause, only in the ORDER BY clause.


Legend
[1]Link
Has Tooltip/Popover
 Toggleable Visibility