20.4 - Modifying List Input

List input can be made even more versatile by using what is called modified list input. Modified list input entails using either the ampersand (&) modifier or the colon (:) modifier:

  • The ampersand (&) modifier allows you to read character values that contain embedded blanks.
  • The colon (:) modifier allows you to read nonstandard data values and character values that are longer than eight characters, but which have no embedded blanks.

Let's take a look at an example in which modified list input would be useful.

Example 20.13

The following program attempts to use list input to read the populations of the ten most populous cities in the United States into a temporary SAS data set called citypops, but the program fails:

DATA citypops;
   infile DATALINES FIRSTOBS = 2;
   input city pop2000;
   DATALINES;
City  Yr2000Popn
New York  8,008,278
Los Angeles  3,694,820
Chicago  2,896,016
Houston  1,953,631
Philadelphia  1,517,550
Phoenix  1,321,045
San Antonio  1,144,646
San Diego  1,223,400
Dallas  1,188,580
San Jose  894,943
;
RUN;
 
PROC PRINT data = citypops;
   title 'The citypops data set';
RUN;

The citypops data set
Obs    city    pop2000
  1       .          .
  2       .          .
  3       .          .
  4       .          .
  5       .          .
  6       .          .
  7       .          .
  8       .          .
  9       .          .
 10       .          .

In reviewing the data, the first thing you might notice is that this particular input data file contains a header row:

City  Yr2000Popn

that reports the content of each record. You may often find yourself in a situation in which someone has handed you such a data file, that is, one containing headings in addition to the columns of data. In general, that's a good thing, since then you know for sure what each record contains. It creates a problem though for reading in the data unless you tell SAS to disregard the heading information. That's just what the FIRSTOBS = 2 option in the INFILE statement tells SAS to do. It tells SAS to begin reading data at line 2 instead of the default line 1.

Moving past the header row, you should note the important features of the data. The longest city name is 12 characters. Some of the city names (New York, for example) contain embedded blanks. There are two blank spaces between the city names and their populations. Finally, because the population values contain commas, they are nonstandard values that require an informat during input. Given this list of features, it shouldn't be surprising that the standard list input style used in the INPUT statement fails. In particular, because city is not declared as a character variable, SAS treats it as numeric, and neither the city names nor the populations with commas are valid standard numeric values, so every value is set to missing.

Launch and run the SAS program, and review the output to convince yourself that SAS encounters a serious problem when attempting to read the data into the citypops data set.

The Ampersand (&) Modifier

Because the ampersand (&) modifier allows us to use list input to read character values containing single embedded blanks, it is the tool that we will want to use to read in the city names.

Example 20.14 

The following program uses list input modified with an ampersand (&) to read in the city and the population values of the ten most populous cities in the United States in the year 2000:

DATA citypops;
	infile DATALINES FIRSTOBS = 2;
	length city $ 12;
	input city & pop2000;
	DATALINES;
City  Yr2000Popn
New York  8008278
Los Angeles  3694820
Chicago  2896016
Houston  1953631
Philadelphia  1517550
Phoenix  1321045
San Antonio  1144646
San Diego  1223400
Dallas  1188580
San Jose  894943
;
RUN;
 
PROC PRINT data = citypops;
	title 'The citypops data set';
	format pop2000 comma10.;
RUN;

The citypops data set
Obs    city             pop2000
  1    New York       8,008,278
  2    Los Angeles    3,694,820
  3    Chicago        2,896,016
  4    Houston        1,953,631
  5    Philadelphia   1,517,550
  6    Phoenix        1,321,045
  7    San Antonio    1,144,646
  8    San Diego      1,223,400
  9    Dallas         1,188,580
 10    San Jose         894,943

Comparing this program to the previous program you should note four differences:

  1. The LENGTH statement tells SAS, in the compile phase, to define the city variable as a character variable, and to expect the city names to be as long as 12 characters.
  2. The ampersand (&) that follows the city variable in the INPUT statement tells SAS that the city values may contain single embedded blanks. Because the ampersand modifier is used, SAS reads the city value until it encounters two or more consecutive blanks. This is a very important point: when you use ampersand-modified list input, the value being read must be separated from the next value by two or more consecutive blanks. You cannot use any other delimiter to indicate the end of the field.
  3. The commas have been removed from the population values so that SAS can read in the population values using unmodified (standard) list input for the pop2000 variable.
  4. A FORMAT statement has been added to the PRINT procedure just so that the pop2000 values are displayed with commas.

Launch and run the SAS program, and review the output to convince yourself that the values for both the city and pop2000 variables are read in properly.
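
The two-blank rule applies to every variable that carries the ampersand modifier, not just the first one. As a minimal sketch, suppose each record also held the name of the state. The state variable and the citystates data set name are assumptions added purely for illustration; they are not part of the lesson's data file:

```sas
/* Sketch only: state and citystates are hypothetical additions. */
DATA citystates;
   length city $ 12 state $ 12;
   /* Each & tells SAS to keep reading the value until it meets
      two or more consecutive blanks. */
   input city & state & pop2000;
   DATALINES;
New York  New York  8008278
Los Angeles  California  3694820
San Antonio  Texas  1144646
;
RUN;

PROC PRINT data = citystates;
   title 'The citystates data set';
   format pop2000 comma10.;
RUN;
```

Because state is also read with the ampersand modifier, a value such as New York should stay intact as long as at least two blanks follow it.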

Example 20.15

Rather than using a LENGTH statement to define the type and length of the city variable, we can place a $w. character informat right in the INPUT statement. The only difference between the following program and the previous one is that the LENGTH statement has been removed, and the $12. character informat has been inserted into the INPUT statement immediately following the city variable's ampersand (&) modifier:

DATA citypops;
   infile DATALINES FIRSTOBS = 2;
   input city & $12. pop2000;
   DATALINES;
City  Yr2000Popn
New York  8008278
Los Angeles  3694820
Chicago  2896016
Houston  1953631
Philadelphia  1517550
Phoenix  1321045
San Antonio  1144646
San Diego  1223400
Dallas  1188580
San Jose  894943
;
RUN;
 
PROC PRINT data = citypops;
	title 'The citypops data set';
	format pop2000 comma10.;
RUN;

The citypops data set
Obs    city             pop2000
  1    New York       8,008,278
  2    Los Angeles    3,694,820
  3    Chicago        2,896,016
  4    Houston        1,953,631
  5    Philadelphia   1,517,550
  6    Phoenix        1,321,045
  7    San Antonio    1,144,646
  8    San Diego      1,223,400
  9    Dallas         1,188,580
 10    San Jose         894,943

Launch and run the SAS program, and review the output to convince yourself that the values for both the city and pop2000 variables are again read in properly.

The Colon (:) Modifier

The colon (:) modifier allows us to use list input to read nonstandard data values and character values that are longer than eight characters but contain no embedded blanks. The colon (:) indicates that a value is read until a blank (or other delimiter) is encountered, and then the informat is applied. If a character informat is specified, its w value sets the variable's length, overriding the default length of 8.
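
To see the colon modifier applied to a character variable, here is a minimal sketch that keeps only the cities whose names have no embedded blanks, so a single blank can separate the two fields. The data set name nospaces is an assumption chosen for illustration:

```sas
/* Sketch only: nospaces is a hypothetical data set name. */
DATA nospaces;
   /* city : $12.      read up to the next blank, store in 12 bytes */
   /* pop2000 : comma. read up to the next blank, strip the commas  */
   input city : $12. pop2000 : comma.;
   DATALINES;
Chicago 2,896,016
Houston 1,953,631
Philadelphia 1,517,550
Phoenix 1,321,045
Dallas 1,188,580
;
RUN;

PROC PRINT data = nospaces;
   title 'Cities without embedded blanks';
   format pop2000 comma10.;
RUN;
```

Without the : $12. on city, the variable would get the default length of 8 and Philadelphia would be truncated.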

Example 20.16

The following program uses the colon (:) modifier to tell SAS to expect commas when reading in the values for the pop2000 variable:

DATA citypops;
	infile DATALINES FIRSTOBS = 2;
	input city & $12. pop2000 : comma.;
	DATALINES;
City  Yr2000Popn
New York  8,008,278
Los Angeles  3,694,820
Chicago  2,896,016
Houston  1,953,631
Philadelphia  1,517,550
Phoenix  1,321,045
San Antonio  1,144,646
San Diego  1,223,400
Dallas  1,188,580
San Jose  894,943
;
RUN;

PROC PRINT data = citypops;
	title 'The citypops data set';
	format pop2000 comma10.;
RUN;

The citypops data set
Obs    city             pop2000
  1    New York       8,008,278
  2    Los Angeles    3,694,820
  3    Chicago        2,896,016
  4    Houston        1,953,631
  5    Philadelphia   1,517,550
  6    Phoenix        1,321,045
  7    San Antonio    1,144,646
  8    San Diego      1,223,400
  9    Dallas         1,188,580
 10    San Jose         894,943

Comparing this program to the previous program you should note just two differences:

  1. The commas have been added back into the population values so that we can see how to use the colon (:) modifier to read in nonstandard data values while still using list input.
  2. The colon (:) and the comma. informat that follow the pop2000 variable in the INPUT statement tell SAS to expect population values to contain nonstandard characters (commas, in this particular instance). As illustrated, we need not specify a w value when using the COMMAw.d informat with modified list input. That's because list input just reads each value until a blank is detected. (This differs from using a numeric informat with formatted input, in which we must specify a w value in order to tell SAS how many columns to read.)

Launch and run the SAS program, and review the output to convince yourself that the values for both the city and pop2000 variables are again read in properly.

Comparing Formatted Input and Modified List Input

It is important to keep in mind that informats work differently in modified list input than they do in formatted input. So, let's emphasize the point! With formatted input, the informat determines both the length of character variables and the number of columns that are read. The same number of columns is read from each record. For example, the following INPUT statement using formatted input:

input @1 City $12. @15 Pop2000 comma10.;

uses the $12. character informat to tell SAS to set the length of the city variable to 12 as well as to read columns 1 to 12 when reading in these data values:

City          Pop2000
New York      8,008,278
Los Angeles   3,694,820
Chicago       2,896,016
Houston       1,953,631
Philadelphia  1,517,550

The informat in the modified list input, on the other hand, determines only the length of the modified variable, not the number of columns that are read. Here:

input city & $12. pop2000 : comma.;

the city value is read until two consecutive blanks are encountered, and the pop2000 value is read until the next blank, when reading in these data values suitable for list input:

City  Pop2000
New York  8,008,278
Los Angeles  3,694,820
Chicago  2,896,016
Houston  1,953,631
Philadelphia  1,517,550
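
The contrast can be sketched as one runnable program that reads the same five cities both ways. The data set names formatted and modlist, and the PROC COMPARE check at the end, are assumptions added for illustration rather than part of the lesson:

```sas
/* Sketch only: formatted and modlist are hypothetical data set names. */

/* Formatted input: values must sit in fixed columns (city in 1-12,
   population starting in column 15). */
DATA formatted;
   input @1 city $12. @15 pop2000 comma10.;
   DATALINES;
New York      8,008,278
Los Angeles   3,694,820
Chicago       2,896,016
Houston       1,953,631
Philadelphia  1,517,550
;
RUN;

/* Modified list input: values need not line up, but at least two
   blanks must follow each city name. */
DATA modlist;
   input city & $12. pop2000 : comma.;
   DATALINES;
New York  8,008,278
Los Angeles  3,694,820
Chicago  2,896,016
Houston  1,953,631
Philadelphia  1,517,550
;
RUN;

/* Check whether the two styles produced the same values. */
PROC COMPARE base = formatted compare = modlist;
RUN;
```

If both INPUT statements behave as described above, one would expect PROC COMPARE to report no unequal values. The only difference lies in how the raw records have to be laid out.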