So far, all the examples in this lesson are querying a single table. However, as a matter of fact, you can specify multiple tables in the FROM clause. Querying more than one table at a time makes PROC SQL even more powerful in data manipulation.
The following examples use two tables:
Survey Data (survey.sas7bdat) contains:
ID, Gender, GPA, SmokeCigarrets, SATM, SATV
id | Gender | GPA | SmokeCig | SATM | SATV |
---|---|---|---|---|---|
1001 | Male | 2.67 | No | 700 | 700 |
1002 | Female | 2.98 | No | 700 | 500 |
1003 | Female | 2.67 | No | 470 | 470 |
1004 | Female | 3.6 | No | 710 | 560 |
1005 | Female | 3.76 | No | 600 | 520 |
1006 | Male | 3.86 | No | 610 | 720 |
1007 | Male | 3.94 | No | 710 | 670 |
1008 | Male | 2.8 | Yes | 610 | 580 |
1009 | Male | 3.48 | No | 690 | 620 |
Survey2 Data (survey2.sas7bdat) contains:
ID, Seating, DiveInfluence, Height, Weight
id | Seating | DriverInfluen | Height | Weight |
---|---|---|---|---|
1001 | Middle | No | 68 | 190 |
1002 | Middle | No | 54 | 110 |
1003 | Middle | No | 65 | 225 |
1004 | Middle | No | 52 | 135 |
1005 | Back | No | 72 | 128 |
1006 | Middle | No | 70 | 188 |
1007 | Back | No | 70 | 155 |
1008 | Middle | Yes | 68 | 160 |
1009 | Front | No | 72 | 160 |
Download these two tables if you have not done so. Revise the libname
to reflect the directory where you save the files.
Example 32.26 Section
The following program attempts to get demographic information about students from two separate tables, survey, and survey2:
PROC SQL;
create table demo_info as
select ID,
Gender,
Height,
Weight
from stat482.survey, stat482.survey2;
QUIT;
Let’s review the code. In this SQL procedure, we used the CREATE TABLE clause to save and name the new table as demo_info
. The subsequent SELECT clause chooses ID, gender, height, and weight columns from two tables. In FROM clause, two tables’ names are listed.
Launch and run the SAS program. You should expect no result in the output window because the CREATE TABLE clause suppresses output. On the other hand, check the log window and you will find the error message: “Ambiguous reference, column ID is in more than one table”.
ERROR: Ambiguous reference, column ID is in more than one table.
As you observed two tables, the ID is in both tables and contains the same information. If a column in the SELECT statement appears in multiple tables, the table it is chosen from has to be specified by adding the table’s name in front as this:
Table.Column
So to make it right, we revise the previous program a little bit: change ID to survey.ID, which means that we use ID from survey data. The other change is the tables’ names. You can give any table an alias with or without the keyword AS after its original name. In the following program, we use S1 for survey data and S2 to survey2 data. And as you can see, it’s okay to use one level alias even for a permanent file. This makes life easier! In this way, ID can be specified as S1.ID.
PROC SQL;
create table demo_info as
select s1.ID,
Gender,
Height,
Weight
from stat482.survey as s1, stat482.survey2 as s2;
QUIT;
Everything seems good. Now launch and run the SAS program. As before, there is no output because of the CREATE TABLE statement. Check the log file in which there are two notes that need your attention.
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
NOTE: Table WORK.DEMO_INFO created, with 51076 rows and 4 columns.
The first is “ The execution of this query involves performing one or more Cartesian product joins that can not be optimized”. What is a Cartesian product? It refers to a query result in which each row in the first table is combined with every row in the second table. If you specify multiple tables in FROM clause but do not use a WHERE clause to choose needed rows, a Cartesian product is generated. For example,
If we submit the following program:
PROC SQL;
Select *
from table1, table2;
Table 1 has 3 rows; Table 2 has 3 rows as well. Their Cartesian product contains (3*3)9 rows.
Table1
name | value1 |
---|---|
x | 1 |
y | 2 |
z | 3 |
Table2
name | value2 |
---|---|
A | 4 |
B | 5 |
C | 6 |
Result:
name | value1 | name | value2 |
---|---|---|---|
x | 1 | A | 4 |
x | 1 | B | 5 |
x | 1 | C | 6 |
y | 2 | A | 4 |
y | 2 | B | 5 |
y | 2 | C | 6 |
z | 3 | A | 4 |
z | 3 | B | 5 |
z | 3 | C | 6 |
In the program for this example, there is no WHERE clause. So SAS generated a Cartesian product and gave you the note. Both Survey and Survey2 have 226 rows in the table. The query should have (226*226) = 51076 rows as the result. That’s why you got the other note, “Table Work.demo_info created, with 51076 rows and 4 columns.” Clearly, this can’t be correct. How do we get the desired result? Let’s make a final push.
Example 32.27 Section
The following program selects the demographic information of students (ID, gender, height, and weight) from two tables, survey and survey2:
PROC SQL;
create table demo_info as
select s1.ID,
Gender,
Height,
Weight
from stat482.survey as s1, stat482.survey2 as s2
where s1.ID = s2.ID;
select *
from demo_info;
QUIT;
id | Gender | Height | Weight |
---|---|---|---|
1001 | Male | 67 | 190 |
1002 | Female | 54 | 110 |
1003 | Female | 65 | 225 |
1004 | Female | 52 | 135 |
1005 | Female | 72 | 128 |
1006 | Male | 70 | 188 |
1007 | Male | 70 | 155 |
1008 | Male | 68 | 160 |
1009 | Male | 72 | 160 |
1010 | Female | 52 | 117 |
1011 | Female | 64 | 120 |
1012 | Female | 65 | 130 |
1013 | Female | 65 | 120 |
1014 | Female | 67 | 125 |
1015 | Female | 62 | 129 |
1016 | Male | 70 | 165 |
1017 | Male | 68 | 165 |
1018 | Female | 68 | 125 |
1019 | Male | 65 | 180 |
1020 | Female | 68 | 160 |
1021 | Male | 65 | 135 |
1022 | Male | 73 | 168 |
1023 | Female | 65 | 130 |
1024 | Male | 72 | 170 |
1025 | Female | 63 | 110 |
1026 | Female | 63 | 155 |
1027 | Male | 68 | 155 |
1028 | Male | 73 | 160 |
1029 | Male | 69 | 155 |
1030 | Female | 54 | 120 |
1031 | Female | 70 | 132 |
1032 | Male | 62 | 200 |
1033 | Female | 64 | 155 |
1034 | Male | 70 | 170 |
1035 | Female | 65 | 155 |
1036 | Male | 72 | 175 |
1037 | Female | 63 | 130 |
1038 | Female | 67 | 123 |
1039 | Female | 64 | 125 |
1040 | Female | 68 | 140 |
1041 | Male | 75 | 215 |
1042 | Male | 68 | 185 |
1043 | Female | 63 | 130 |
1044 | Female | 61 | 210 |
1045 | Male | 68 | 145 |
1046 | Female | 65 | 120 |
1047 | Male | 74 | 165 |
1048 | Male | 74 | 182 |
1049 | Male | 70 | 175 |
1050 | Male | 68 | 170 |
1051 | Female | 65 | 135 |
1052 | Female | 69 | 150 |
1053 | Male | 75 | 184 |
1054 | Male | 73 | 230 |
1055 | Female | 68 | 120 |
1056 | Male | 69 | 165 |
1057 | Female | 53 | 150 |
1058 | Female | 67 | 143 |
1059 | Male | 72 | 175 |
1060 | Female | 56 | 130 |
1061 | Male | 69 | 195 |
1062 | Male | 72 | 165 |
1063 | Female | 66 | 135 |
1064 | Male | 72 | 200 |
1065 | Female | 63 | 113 |
1066 | Female | 69 | 125 |
1067 | Female | 67 | 150 |
1068 | Female | 68 | 132 |
1069 | Female | 68 | 140 |
1070 | Male | 68 | 155 |
1071 | Male | 70 | 180 |
1072 | Female | 64 | 133 |
1073 | Female | 64 | 125 |
1074 | Female | 64 | 150 |
1075 | Female | 63 | 112 |
1076 | Female | 62 | 130 |
1077 | Female | 66 | 125 |
1078 | Female | 64 | 180 |
1079 | Male | 70 | 150 |
1080 | Male | 69 | 145 |
1081 | Female | 68 | 150 |
1082 | Female | 71 | 174 |
1083 | Female | 63 | 114 |
1084 | Male | 74 | 140 |
1085 | Male | 72 | 200 |
1086 | Female | 63 | 145 |
1087 | Male | 71 | 168 |
1088 | Male | 57 | 240 |
1089 | Female | 60 | 140 |
1090 | Male | 64 | 150 |
1091 | Female | 63 | 105 |
1092 | Male | 68 | 147 |
1093 | Female | 62 | 115 |
1094 | Female | 64 | 115 |
1095 | Male | 76 | 190 |
1096 | Female | 67 | 180 |
1097 | Male | 69 | 132 |
1098 | Male | 67 | 155 |
1099 | Female | 65 | 135 |
1100 | Female | 64 | 120 |
1101 | Male | 58 | 210 |
1102 | Female | 66 | 175 |
1103 | Female | 75 | 125 |
1104 | Male | 71 | 184 |
1105 | Female | 78 | 135 |
1106 | Male | 68 | 165 |
1107 | Female | 68 | 135 |
1108 | Female | 64 | 105 |
1109 | Female | 67 | 150 |
1110 | Female | 65 | 124 |
1111 | Male | 70 | 200 |
1112 | Female | 65 | 130 |
1113 | Female | 68 | 160 |
1114 | Female | 59 | 190 |
1115 | Female | 63 | 120 |
1116 | Female | 68 | 142 |
1117 | Female | . | . |
1118 | Female | 62 | 130 |
1119 | Male | 73 | 180 |
1120 | Female | 68 | 155 |
1121 | Female | 63 | 190 |
1122 | Male | 69 | 138 |
1123 | Female | 66 | 120 |
1124 | Male | 73 | 180 |
1125 | Female | 59 | 100 |
1126 | Male | 72 | 160 |
1127 | Female | 69 | 145 |
1128 | Female | 56 | 129 |
1129 | Female | 59 | 110 |
1130 | Male | 72 | 180 |
1131 | Male | 66 | 145 |
1132 | Male | 75 | 267 |
1133 | Female | 61 | 120 |
1134 | Female | 66 | 135 |
1135 | Male | 72 | 195 |
1136 | Female | 84 | 115 |
1137 | Male | 69 | 200 |
1138 | Female | 72 | 137 |
1139 | Female | 62 | 125 |
1140 | Male | 70 | 165 |
1141 | Male | 73 | 175 |
1142 | Female | 65 | 110 |
1143 | Male | 72 | 180 |
1144 | Female | 65 | 140 |
1145 | Female | 67 | 155 |
1146 | Female | 64 | 160 |
1147 | Male | 71 | 165 |
1148 | Female | 62 | 117 |
1149 | Female | 67 | 128 |
1150 | Male | 73 | 195 |
1151 | Male | 75 | 190 |
1152 | Male | 67 | 122 |
1153 | Male | 69 | 160 |
1154 | Male | 69 | 133 |
1155 | Female | 98 | 160 |
1156 | Male | 75 | 190 |
1157 | Male | 81 | 290 |
1158 | Male | 70 | 150 |
1159 | Female | 67 | 150 |
1160 | Female | 68 | 170 |
1161 | Male | 74 | 180 |
1162 | Male | 68 | 136 |
1163 | Female | 69 | 135 |
1164 | Female | 67 | 165 |
1165 | Female | 64 | 130 |
1166 | Male | 74 | 173 |
1167 | Male | 66 | 140 |
1168 | Female | 67 | 157 |
1169 | Male | 71 | 165 |
1170 | Male | 72 | 160 |
1171 | Female | 62 | 145 |
1172 | Male | 70 | 175 |
1173 | Male | 70 | 135 |
1174 | Female | 68 | 145 |
1175 | Male | 71 | 155 |
1176 | Male | 68 | 175 |
1177 | Female | 71 | 125 |
1178 | Male | 78 | 210 |
1179 | Female | 62 | 114 |
1180 | Male | 73 | 155 |
1181 | Female | 67 | 105 |
1182 | Female | 68 | 140 |
1183 | Female | 66 | 150 |
1184 | Male | 73 | 180 |
1185 | Male | 72 | 165 |
1186 | Female | 66 | 189 |
1187 | Female | 61 | 115 |
1188 | Female | 66 | 115 |
1189 | Female | 68 | 120 |
1190 | Female | 106 | 170 |
1191 | Female | 58 | 170 |
1192 | Female | 73 | 118 |
1193 | Female | 64 | 126 |
1194 | Male | 71 | 175 |
1195 | Male | 68 | 170 |
1196 | Female | 64 | 128 |
1197 | Female | 63 | 130 |
1198 | Female | 67 | 125 |
1199 | Female | 68 | 140 |
1200 | Male | 68 | 155 |
1201 | Female | 67 | 175 |
1202 | Female | 62 | 105 |
1203 | Female | 67 | 160 |
1204 | Female | 65 | 105 |
1205 | Female | 65 | 130 |
1206 | Male | 69 | 135 |
1207 | Female | 63 | 112 |
1208 | Female | 72 | 115 |
1209 | Male | 72 | 190 |
1210 | Male | 70 | 165 |
1211 | Male | 72 | 170 |
1212 | Male | 75 | 230 |
1213 | Male | 72 | 157 |
1214 | Female | 64 | 98 |
1215 | Female | 65 | 150 |
1216 | Female | 65 | 200 |
1217 | Male | 71 | 154 |
1218 | Female | 62 | 135 |
1219 | Female | 60 | 115 |
1220 | Male | 69 | 215 |
1221 | Male | 69 | 160 |
1222 | Male | 67 | 170 |
1223 | Male | 71 | 155 |
1224 | Male | 60 | 170 |
1225 | Female | 75 | 148 |
1226 | Male | 69 | 151 |
Let’s check through the code. Only one more clause has been added to the query, WHERE. We use the WHERE clause to subset the whole Cartesian product by only selecting the rows with matched ID numbers. Note that the column names in the WHERE clause do not have to be the same. At last, to be able to check the table in person, another query is added to display the data in the output window.
Launch and run the SAS program, and review the log file and the output.
NOTE: Table WORK.DEMO_INFO creates, with 226 rows and 4 columns.
Finally, we got what we want. As you can see from the query result, it’s like combining two columns from each table horizontally. SAS also calls it join. In this particular case, since we only chose the matched rows, it’s also called the inner join. Such a type of join is very similar to Merge By in the DATA step but requires less computing resources and less coding. There are other types of join and data union (a vertical combination of rows) in PROC SQL which are beyond this lesson’s scope. If you are interested, you can explore them yourself with the foundation of this lesson!