32.7 - Querying Multiple Tables

So far, every example in this lesson has queried a single table. You can, however, specify multiple tables in the FROM clause. Querying more than one table at a time makes PROC SQL even more powerful for data manipulation.

The following examples use two tables:

Survey Data (survey.sas7bdat) contains:

ID, Gender, GPA, SmokeCigarettes, SATM, SATV

id Gender GPA SmokeCig SATM SATV
1001 Male 2.67 No 700 700
1002 Female 2.98 No 700 500
1003 Female 2.67 No 470 470
1004 Female 3.6 No 710 560
1005 Female 3.76 No 600 520
1006 Male 3.86 No 610 720
1007 Male 3.94 No 710 670
1008 Male 2.8 Yes 610 580
1009 Male 3.48 No 690 620

Survey2 Data (survey2.sas7bdat) contains:

ID, Seating, DriverInfluence, Height, Weight

id Seating DriverInfluen Height Weight
1001 Middle No 68 190
1002 Middle No 54 110
1003 Middle No 65 225
1004 Middle No 52 135
1005 Back No 72 128
1006 Middle No 70 188
1007 Back No 70 155
1008 Middle Yes 68 160
1009 Front No 72 160

Download these two tables if you have not already done so. Revise the libname statement to reflect the directory where you saved the files.
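
As a minimal sketch, a LIBNAME statement of the following form assigns the stat482 libref used in the programs in this section. The folder path is a placeholder, not a real location; replace it with the directory where you saved the two files.

```sas
/* The path is a placeholder (an assumption): point it at your own folder */
LIBNAME stat482 "C:\your\folder\here";
```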

Example 32.26

The following program attempts to get demographic information about students from two separate tables, survey and survey2:

PROC SQL;
	create table demo_info as
	select ID,
		Gender,
		Height,
		Weight
	from stat482.survey, stat482.survey2;
QUIT;


Let’s review the code. In this SQL procedure, the CREATE TABLE statement saves the query result as a new table named demo_info. The SELECT clause that follows chooses the ID, Gender, Height, and Weight columns, and the FROM clause lists the two tables they come from.

Launch and run the SAS program. You should expect no result in the output window because the CREATE TABLE statement suppresses output. The log window, however, shows an error message: “Ambiguous reference, column ID is in more than one table.”

ERROR: Ambiguous reference, column ID is in more than one table.

As you saw in the two tables above, ID appears in both and holds the same information. If a column in the SELECT clause appears in more than one table, you must specify which table to take it from by prefixing the column name with the table’s name, like this:

Table.Column

To fix the problem, we revise the previous program slightly. First, ID becomes a qualified name such as survey.ID, which means that we use ID from the survey table. Second, we give the tables aliases. You can assign an alias to any table by writing it after the table’s name, with or without the keyword AS. In the following program, we use S1 for the survey table and S2 for the survey2 table. As you can see, a one-level alias is fine even for a permanent table, which makes life easier! With the alias in place, ID can be specified as S1.ID.

PROC SQL;
	create table demo_info as
	select s1.ID,
		Gender,
		Height,
		Weight
	from stat482.survey as s1, stat482.survey2 as s2;
QUIT;

Everything looks good. Now launch and run the SAS program. As before, there is no output because of the CREATE TABLE statement. Check the log, which contains two notes that need your attention.

NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.

NOTE: Table WORK.DEMO_INFO created, with 51076 rows and 4 columns.

The first note is “The execution of this query involves performing one or more Cartesian product joins that can not be optimized.” What is a Cartesian product? It is a query result in which each row in the first table is combined with every row in the second table. If you specify multiple tables in the FROM clause but do not use a WHERE clause to choose the rows you need, a Cartesian product is generated. For example, suppose we submit the following program:

PROC SQL;
	select *
	from table1, table2;
QUIT;

Table1 has 3 rows and Table2 has 3 rows as well, so their Cartesian product contains 3 × 3 = 9 rows.

Table1

name value1
x 1
y 2
z 3
\(\times \)

Table2

name value2
A 4
B 5
C 6
\(=\)

Result:

name value1 name value2
x 1 A 4
x 1 B 5
x 1 C 6
y 2 A 4
y 2 B 5
y 2 C 6
z 3 A 4
z 3 B 5
z 3 C 6
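
To reproduce this illustration yourself, the two small tables can be built with DATA steps and then queried. This is only a sketch: table1 and table2 are toy tables created in the WORK library to mirror the picture above, not files supplied with the lesson.

```sas
/* Toy tables (hypothetical) that mirror the illustration above */
DATA table1;
	input name $ value1;
	datalines;
x 1
y 2
z 3
;
RUN;

DATA table2;
	input name $ value2;
	datalines;
A 4
B 5
C 6
;
RUN;

PROC SQL;
	/* No WHERE clause: each row of table1 is paired with every row of table2 */
	select *
	from table1, table2;
QUIT;
```

Because the query only displays the result and does not create a table, having a column called name in both tables is not a problem here.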

The program for this example has no WHERE clause, so SAS generated a Cartesian product and wrote the first note. Survey and Survey2 each have 226 rows, so the query returns 226 × 226 = 51,076 rows. That explains the other note, “Table WORK.DEMO_INFO created, with 51076 rows and 4 columns.” Clearly, this is not the result we want. How do we get the desired result? Let’s make a final push.
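
If you would like to check the arithmetic yourself, one possible approach is to count the rows directly. The sketch below assumes the stat482 libref points at the two downloaded tables; the column aliases n_survey, n_survey2, and n_pairs are made up for this illustration.

```sas
PROC SQL;
	/* Rows in each table on its own */
	select count(*) as n_survey from stat482.survey;
	select count(*) as n_survey2 from stat482.survey2;
	/* Rows in the Cartesian product (no WHERE clause) */
	select count(*) as n_pairs
	from stat482.survey, stat482.survey2;
QUIT;
```

If the tables are as described, the third count should equal the product of the first two.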

Example 32.27

The following program selects the demographic information of students (ID, gender, height, and weight) from two tables, survey and survey2:

PROC SQL;
	create table demo_info as
	select s1.ID,
		Gender,
		Height,
		Weight
	from stat482.survey as s1, stat482.survey2 as s2
	where s1.ID = s2.ID;
	select *
	from demo_info;
QUIT;

id Gender Height Weight
1001 Male 67 190
1002 Female 54 110
1003 Female 65 225
1004 Female 52 135
1005 Female 72 128
1006 Male 70 188
1007 Male 70 155
1008 Male 68 160
1009 Male 72 160
1010 Female 52 117
1011 Female 64 120
1012 Female 65 130
1013 Female 65 120
1014 Female 67 125
1015 Female 62 129
1016 Male 70 165
1017 Male 68 165
1018 Female 68 125
1019 Male 65 180
1020 Female 68 160
1021 Male 65 135
1022 Male 73 168
1023 Female 65 130
1024 Male 72 170
1025 Female 63 110
1026 Female 63 155
1027 Male 68 155
1028 Male 73 160
1029 Male 69 155
1030 Female 54 120
1031 Female 70 132
1032 Male 62 200
1033 Female 64 155
1034 Male 70 170
1035 Female 65 155
1036 Male 72 175
1037 Female 63 130
1038 Female 67 123
1039 Female 64 125
1040 Female 68 140
1041 Male 75 215
1042 Male 68 185
1043 Female 63 130
1044 Female 61 210
1045 Male 68 145
1046 Female 65 120
1047 Male 74 165
1048 Male 74 182
1049 Male 70 175
1050 Male 68 170
1051 Female 65 135
1052 Female 69 150
1053 Male 75 184
1054 Male 73 230
1055 Female 68 120
1056 Male 69 165
1057 Female 53 150
1058 Female 67 143
1059 Male 72 175
1060 Female 56 130
1061 Male 69 195
1062 Male 72 165
1063 Female 66 135
1064 Male 72 200
1065 Female 63 113
1066 Female 69 125
1067 Female 67 150
1068 Female 68 132
1069 Female 68 140
1070 Male 68 155
1071 Male 70 180
1072 Female 64 133
1073 Female 64 125
1074 Female 64 150
1075 Female 63 112
1076 Female 62 130
1077 Female 66 125
1078 Female 64 180
1079 Male 70 150
1080 Male 69 145
1081 Female 68 150
1082 Female 71 174
1083 Female 63 114
1084 Male 74 140
1085 Male 72 200
1086 Female 63 145
1087 Male 71 168
1088 Male 57 240
1089 Female 60 140
1090 Male 64 150
1091 Female 63 105
1092 Male 68 147
1093 Female 62 115
1094 Female 64 115
1095 Male 76 190
1096 Female 67 180
1097 Male 69 132
1098 Male 67 155
1099 Female 65 135
1100 Female 64 120
1101 Male 58 210
1102 Female 66 175
1103 Female 75 125
1104 Male 71 184
1105 Female 78 135
1106 Male 68 165
1107 Female 68 135
1108 Female 64 105
1109 Female 67 150
1110 Female 65 124
1111 Male 70 200
1112 Female 65 130
1113 Female 68 160
1114 Female 59 190
1115 Female 63 120
1116 Female 68 142
1117 Female . .
1118 Female 62 130
1119 Male 73 180
1120 Female 68 155
1121 Female 63 190
1122 Male 69 138
1123 Female 66 120
1124 Male 73 180
1125 Female 59 100
1126 Male 72 160
1127 Female 69 145
1128 Female 56 129
1129 Female 59 110
1130 Male 72 180
1131 Male 66 145
1132 Male 75 267
1133 Female 61 120
1134 Female 66 135
1135 Male 72 195
1136 Female 84 115
1137 Male 69 200
1138 Female 72 137
1139 Female 62 125
1140 Male 70 165
1141 Male 73 175
1142 Female 65 110
1143 Male 72 180
1144 Female 65 140
1145 Female 67 155
1146 Female 64 160
1147 Male 71 165
1148 Female 62 117
1149 Female 67 128
1150 Male 73 195
1151 Male 75 190
1152 Male 67 122
1153 Male 69 160
1154 Male 69 133
1155 Female 98 160
1156 Male 75 190
1157 Male 81 290
1158 Male 70 150
1159 Female 67 150
1160 Female 68 170
1161 Male 74 180
1162 Male 68 136
1163 Female 69 135
1164 Female 67 165
1165 Female 64 130
1166 Male 74 173
1167 Male 66 140
1168 Female 67 157
1169 Male 71 165
1170 Male 72 160
1171 Female 62 145
1172 Male 70 175
1173 Male 70 135
1174 Female 68 145
1175 Male 71 155
1176 Male 68 175
1177 Female 71 125
1178 Male 78 210
1179 Female 62 114
1180 Male 73 155
1181 Female 67 105
1182 Female 68 140
1183 Female 66 150
1184 Male 73 180
1185 Male 72 165
1186 Female 66 189
1187 Female 61 115
1188 Female 66 115
1189 Female 68 120
1190 Female 106 170
1191 Female 58 170
1192 Female 73 118
1193 Female 64 126
1194 Male 71 175
1195 Male 68 170
1196 Female 64 128
1197 Female 63 130
1198 Female 67 125
1199 Female 68 140
1200 Male 68 155
1201 Female 67 175
1202 Female 62 105
1203 Female 67 160
1204 Female 65 105
1205 Female 65 130
1206 Male 69 135
1207 Female 63 112
1208 Female 72 115
1209 Male 72 190
1210 Male 70 165
1211 Male 72 170
1212 Male 75 230
1213 Male 72 157
1214 Female 64 98
1215 Female 65 150
1216 Female 65 200
1217 Male 71 154
1218 Female 62 135
1219 Female 60 115
1220 Male 69 215
1221 Male 69 160
1222 Male 67 170
1223 Male 71 155
1224 Male 60 170
1225 Female 75 148
1226 Male 69 151

Let’s walk through the code. Only one clause, WHERE, has been added to the query. The WHERE clause subsets the Cartesian product by keeping only the rows whose ID numbers match. Note that the columns compared in the WHERE clause do not have to share the same name. Finally, so that we can inspect the new table ourselves, a second query displays its contents in the output window.
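
To illustrate the point about column names, here is a sketch in which the matching column is called something different in the second table. The table survey2_renamed and the column StudentID are hypothetical: they are created here only for the demonstration, by copying survey2 into the WORK library with ID renamed.

```sas
/* Hypothetical copy of survey2 in which ID is renamed to StudentID */
DATA survey2_renamed;
	set stat482.survey2 (rename=(ID=StudentID));
RUN;

PROC SQL;
	select s1.ID,
		Gender,
		Height,
		Weight
	from stat482.survey as s1, survey2_renamed as s2
	/* The two columns have different names but hold the same values */
	where s1.ID = s2.StudentID;
QUIT;
```

Since ID now exists only in the survey table, the s1. prefix in the SELECT clause is no longer strictly required, although keeping it does no harm.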

Launch and run the SAS program, and review the log file and the output.

NOTE: Table WORK.DEMO_INFO created, with 226 rows and 4 columns.

Finally, we get what we want. As the query result shows, the operation combines columns from the two tables horizontally. SAS calls this a join. In this particular case, because we kept only the matched rows, it is also called an inner join. This type of join is very similar to a MERGE with a BY statement in the DATA step, but it often needs less code, for example because PROC SQL does not require the tables to be sorted first. PROC SQL also supports other types of joins and data unions (vertical combinations of rows), which are beyond this lesson’s scope. If you are interested, you can explore them yourself with the foundation of this lesson!
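
For comparison, the sketch below shows two other ways the same matched rows could be produced: the explicit INNER JOIN ... ON form of the query, and a DATA step MERGE with a BY statement. The output names demo_info_join, demo_info_merge, survey_sorted, and survey2_sorted are invented for this illustration, and the code assumes each ID appears only once in each table.

```sas
PROC SQL;
	/* Same matching condition, written with explicit join syntax */
	create table demo_info_join as
	select s1.ID,
		Gender,
		Height,
		Weight
	from stat482.survey as s1
		inner join stat482.survey2 as s2
		on s1.ID = s2.ID;
QUIT;

/* DATA step counterpart: MERGE with BY needs both inputs sorted by ID */
PROC SORT data=stat482.survey out=survey_sorted;
	by ID;
RUN;

PROC SORT data=stat482.survey2 out=survey2_sorted;
	by ID;
RUN;

DATA demo_info_merge;
	merge survey_sorted (in=in1 keep=ID Gender)
		survey2_sorted (in=in2 keep=ID Height Weight);
	by ID;
	/* Keep only the IDs found in both tables */
	if in1 and in2;
RUN;
```

The DATA step version needs two extra sorting steps and the IN= flags to keep only matched rows, which is the extra coding that the join avoids.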