32.7 - Querying Multiple Tables

So far, all the examples in this lesson are querying a single table. However, as a matter of fact, you can specify multiple tables in the FROM clause. Querying more than one table at a time makes PROC SQL even more powerful in data manipulation.

The following examples use two tables:

Survey Data (survey.sas7bdat) contains:

ID, Gender, GPA, SmokeCigarrets, SATM, SATV

id	Gender	GPA	SmokeCig	SATM	SATV
1001	Male	2.67	No	700	700
1002	Female	2.98	No	700	500
1003	Female	2.67	No	470	470
1004	Female	3.6	No	710	560
1005	Female	3.76	No	600	520
1006	Male	3.86	No	610	720
1007	Male	3.94	No	710	670
1008	Male	2.8	Yes	610	580
1009	Male	3.48	No	690	620

Survey2 Data (survey2.sas7bdat) contains:

ID, Seating, DiveInfluence, Height, Weight

id	Seating	DriverInfluen	Height	Weight
1001	Middle	No	68	190
1002	Middle	No	54	110
1003	Middle	No	65	225
1004	Middle	No	52	135
1005	Back	No	72	128
1006	Middle	No	70	188
1007	Back	No	70	155
1008	Middle	Yes	68	160
1009	Front	No	72	160

Download these two tables if you have not done so. Revise the libnameto reflect the directory where you save the files.

Example 32.26 Section

The following program attempts to get demographic information about students from two separate tables, survey, and survey2:

PROC SQL;
	create table demo_info as
	select ID,
		Gender,
		Height,
		Weight
	from stat482.survey, stat482.survey2;
QUIT;

example of the data table output that would have displayed without the create table clause

Let’s review the code. In this SQL procedure, we used the CREATE TABLE clause to save and name the new table as demo_info. The subsequent SELECT clause chooses ID, gender, height, and weight columns from two tables. In FROM clause, two tables’ names are listed.

Launch and run the SAS program. You should expect no result in the output window because the CREATE TABLE clause suppresses output. On the other hand, check the log window and you will find the error message: “Ambiguous reference, column ID is in more than one table”.

ERROR: Ambiguous reference, column ID is in more than one table.

As you observed two tables, the ID is in both tables and contains the same information. If a column in the SELECT statement appears in multiple tables, the table it is chosen from has to be specified by adding the table’s name in front as this:

Table.Column

So to make it right, we revise the previous program a little bit: change ID to survey.ID, which means that we use ID from survey data. The other change is the tables’ names. You can give any table an alias with or without the keyword AS after its original name. In the following program, we use S1 for survey data and S2 to survey2 data. And as you can see, it’s okay to use one level alias even for a permanent file. This makes life easier! In this way, ID can be specified as S1.ID.

PROC SQL;
		create table demo_info as
		select s1.ID,
			Gender,
			Height,
			Weight
	from stat482.survey as s1, stat482.survey2 as s2;
QUIT;

Everything seems good. Now launch and run the SAS program. As before, there is no output because of the CREATE TABLE statement. Check the log file in which there are two notes that need your attention.

NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.

NOTE: Table WORK.DEMO_INFO created, with 51076 rows and 4 columns.

The first is “ The execution of this query involves performing one or more Cartesian product joins that can not be optimized”. What is a Cartesian product? It refers to a query result in which each row in the first table is combined with every row in the second table. If you specify multiple tables in FROM clause but do not use a WHERE clause to choose needed rows, a Cartesian product is generated. For example,

If we submit the following program:

PROC SQL; Select * from table1, table2;

Table 1 has 3 rows; Table 2 has 3 rows as well. Their Cartesian product contains (3*3)9 rows.

Table1

name	value1
x	1
y	2
z	3

\(\times \)

Table2

name	value2
A	4
B	5
C	6

\(=\)

Result:

name	value1	name	value2
x	1	A	4
x	1	B	5
x	1	C	6
y	2	A	4
y	2	B	5
y	2	C	6
z	3	A	4
z	3	B	5
z	3	C	6

In the program for this example, there is no WHERE clause. So SAS generated a Cartesian product and gave you the note. Both Survey and Survey2 have 226 rows in the table. The query should have (226*226) = 51076 rows as the result. That’s why you got the other note, “Table Work.demo_info created, with 51076 rows and 4 columns.” Clearly, this can’t be correct. How do we get the desired result? Let’s make a final push.

Example 32.27 Section

The following program selects the demographic information of students (ID, gender, height, and weight) from two tables, survey and survey2:

PROC SQL;
	create table demo_info as
	select s1.ID,
			Gender,
			Height,
			Weight
	from stat482.survey as s1, stat482.survey2 as s2
	where s1.ID = s2.ID;
	select *
	from demo_info;
QUIT;

id	Gender	Height	Weight
1001	Male	67	190
1002	Female	54	110
1003	Female	65	225
1004	Female	52	135
1005	Female	72	128
1006	Male	70	188
1007	Male	70	155
1008	Male	68	160
1009	Male	72	160
1010	Female	52	117
1011	Female	64	120
1012	Female	65	130
1013	Female	65	120
1014	Female	67	125
1015	Female	62	129
1016	Male	70	165
1017	Male	68	165
1018	Female	68	125
1019	Male	65	180
1020	Female	68	160
1021	Male	65	135
1022	Male	73	168
1023	Female	65	130
1024	Male	72	170
1025	Female	63	110
1026	Female	63	155
1027	Male	68	155
1028	Male	73	160
1029	Male	69	155
1030	Female	54	120
1031	Female	70	132
1032	Male	62	200
1033	Female	64	155
1034	Male	70	170
1035	Female	65	155
1036	Male	72	175
1037	Female	63	130
1038	Female	67	123
1039	Female	64	125
1040	Female	68	140
1041	Male	75	215
1042	Male	68	185
1043	Female	63	130
1044	Female	61	210
1045	Male	68	145
1046	Female	65	120
1047	Male	74	165
1048	Male	74	182
1049	Male	70	175
1050	Male	68	170
1051	Female	65	135
1052	Female	69	150
1053	Male	75	184
1054	Male	73	230
1055	Female	68	120
1056	Male	69	165
1057	Female	53	150
1058	Female	67	143
1059	Male	72	175
1060	Female	56	130
1061	Male	69	195
1062	Male	72	165
1063	Female	66	135
1064	Male	72	200
1065	Female	63	113
1066	Female	69	125
1067	Female	67	150
1068	Female	68	132
1069	Female	68	140
1070	Male	68	155
1071	Male	70	180
1072	Female	64	133
1073	Female	64	125
1074	Female	64	150
1075	Female	63	112
1076	Female	62	130
1077	Female	66	125
1078	Female	64	180
1079	Male	70	150
1080	Male	69	145
1081	Female	68	150
1082	Female	71	174
1083	Female	63	114
1084	Male	74	140
1085	Male	72	200
1086	Female	63	145
1087	Male	71	168
1088	Male	57	240
1089	Female	60	140
1090	Male	64	150
1091	Female	63	105
1092	Male	68	147
1093	Female	62	115
1094	Female	64	115
1095	Male	76	190
1096	Female	67	180
1097	Male	69	132
1098	Male	67	155
1099	Female	65	135
1100	Female	64	120
1101	Male	58	210
1102	Female	66	175
1103	Female	75	125
1104	Male	71	184
1105	Female	78	135
1106	Male	68	165
1107	Female	68	135
1108	Female	64	105
1109	Female	67	150
1110	Female	65	124
1111	Male	70	200
1112	Female	65	130
1113	Female	68	160
1114	Female	59	190
1115	Female	63	120
1116	Female	68	142
1117	Female	.	.
1118	Female	62	130
1119	Male	73	180
1120	Female	68	155
1121	Female	63	190
1122	Male	69	138
1123	Female	66	120
1124	Male	73	180
1125	Female	59	100
1126	Male	72	160
1127	Female	69	145
1128	Female	56	129
1129	Female	59	110
1130	Male	72	180
1131	Male	66	145
1132	Male	75	267
1133	Female	61	120
1134	Female	66	135
1135	Male	72	195
1136	Female	84	115
1137	Male	69	200
1138	Female	72	137
1139	Female	62	125
1140	Male	70	165
1141	Male	73	175
1142	Female	65	110
1143	Male	72	180
1144	Female	65	140
1145	Female	67	155
1146	Female	64	160
1147	Male	71	165
1148	Female	62	117
1149	Female	67	128
1150	Male	73	195
1151	Male	75	190
1152	Male	67	122
1153	Male	69	160
1154	Male	69	133
1155	Female	98	160
1156	Male	75	190
1157	Male	81	290
1158	Male	70	150
1159	Female	67	150
1160	Female	68	170
1161	Male	74	180
1162	Male	68	136
1163	Female	69	135
1164	Female	67	165
1165	Female	64	130
1166	Male	74	173
1167	Male	66	140
1168	Female	67	157
1169	Male	71	165
1170	Male	72	160
1171	Female	62	145
1172	Male	70	175
1173	Male	70	135
1174	Female	68	145
1175	Male	71	155
1176	Male	68	175
1177	Female	71	125
1178	Male	78	210
1179	Female	62	114
1180	Male	73	155
1181	Female	67	105
1182	Female	68	140
1183	Female	66	150
1184	Male	73	180
1185	Male	72	165
1186	Female	66	189
1187	Female	61	115
1188	Female	66	115
1189	Female	68	120
1190	Female	106	170
1191	Female	58	170
1192	Female	73	118
1193	Female	64	126
1194	Male	71	175
1195	Male	68	170
1196	Female	64	128
1197	Female	63	130
1198	Female	67	125
1199	Female	68	140
1200	Male	68	155
1201	Female	67	175
1202	Female	62	105
1203	Female	67	160
1204	Female	65	105
1205	Female	65	130
1206	Male	69	135
1207	Female	63	112
1208	Female	72	115
1209	Male	72	190
1210	Male	70	165
1211	Male	72	170
1212	Male	75	230
1213	Male	72	157
1214	Female	64	98
1215	Female	65	150
1216	Female	65	200
1217	Male	71	154
1218	Female	62	135
1219	Female	60	115
1220	Male	69	215
1221	Male	69	160
1222	Male	67	170
1223	Male	71	155
1224	Male	60	170
1225	Female	75	148
1226	Male	69	151

Let’s check through the code. Only one more clause has been added to the query, WHERE. We use the WHERE clause to subset the whole Cartesian product by only selecting the rows with matched ID numbers. Note that the column names in the WHERE clause do not have to be the same. At last, to be able to check the table in person, another query is added to display the data in the output window.

Launch and run the SAS program, and review the log file and the output.

NOTE: Table WORK.DEMO_INFO creates, with 226 rows and 4 columns.

Finally, we got what we want. As you can see from the query result, it’s like combining two columns from each table horizontally. SAS also calls it join. In this particular case, since we only chose the matched rows, it’s also called the inner join. Such a type of join is very similar to Merge By in the DATA step but requires less computing resources and less coding. There are other types of join and data union (a vertical combination of rows) in PROC SQL which are beyond this lesson’s scope. If you are interested, you can explore them yourself with the foundation of this lesson!