CASE STUDY: The Water Level Study

Introduction

An Interview with Dr. Harkness

Dr. William Harkness provides his own unique introduction in the first part of this series of video interviews about this study.

Case Overview

In 1956, Piaget and Inhelder argued that a child needs to construct conceptual systems, for example the Euclidean coordinate system, in order to understand spatial relationships. As part of their research they asked children to draw pictures of vertical and horizontal surfaces. In one task (the water-level task) the child is shown a picture of an upright glass half-filled with water. The child is then shown pictures of tilted glasses and asked to draw a line representing how the surface of the water would look in these glasses. According to their results, by the age of nine or ten most children have mastered this task. However, later studies have shown that many adults, particularly females, have difficulty with this task.

The robust gender differences observed have had a dramatic impact on the status of Piaget and Inhelder's theory of Euclidean space. If Euclidean space is a construct needed for understanding the relationships between objects in our environment, it is a serious claim that large numbers of females lack this construct. Also, if the majority of females do lack this system of reference, it is difficult to explain how they can accomplish tasks such as estimating the trajectories of moving objects while driving an automobile. It seems that the lack of a Euclidean coordinate system would be such a great hindrance that it would be noticeable in everyday life. If the Euclidean system is not used in tasks such as estimating the locations of moving objects, then it is important to discover which skills the Euclidean spatial system does facilitate.

Some researchers have suggested that many people who fail the water-level task may have Euclidean spatial competence but are affected by specific performance variables and knowledge deficits, including:

The ability to draw a horizontal line and the criteria used for passing the task.
Attempts to draw the water line while the water is moving.
Understanding and knowledge of relevant physical principles.
Spatial skills.

The relevant SAS programs and their outputs can be found in the sections below.

The Penn State Study

Debbie Dalke, a Ph.D. candidate at Penn State University, conducted a study to investigate several factors that might provide insight into the gender differences so consistently reported in water-level studies. She recruited n = 166 subjects (all college students) from introductory psychology classes. Each subject was given two test booklets. The first was a paper-and-pencil water-level test consisting of six drawings of a rectangular glass tipped at one of three angles on a table top (20, 40, and 60 degrees; three tipped to the left and three tipped to the right). A line representing the table top was located beneath the glass (see pictures below). The subjects were told: "Imagine that the glass has water in it and draw a line which represents how the surface of the water will look." A drawing was considered correct if the line was within five degrees of true horizontal.

Then each subject was asked, "Did you draw the water line as it would look after the glass had come to a complete halt or while it was in motion?" Answers were recorded as a variable MOVING with value 1 if the answer was "complete halt" and 2 if the answer was "moving". Finally, in the second booklet, each subject answered questions or performed tasks on the following:

(a) Gravity (5 items).
(b) Complex Physics (4 items).
(c) Mental Rotations (Vandenberg's test: 6 problems, 2 answers per question).
(d) Drawing a line inside a triangle; the variable measured was the deviation in degrees from a horizontal line.
(e) Estimating the intersection of two lines (Bryant's test: 3 tasks; on each task, subjects were given 2 points if the "dot" was within 3 mm of the intersection, 1 point if within 5 mm, and 0 otherwise).
(f) Drawing a "light-cord" hanging from the ceiling of a trailer going up a hill, slanting either left or right, at angles of either 20 or 40 degrees.
(g) Drawing a "tree" on the side of a hill, inclined 20, 40, or 60 degrees in both left and right directions.

Subjects' answers were scored as correct if the drawing in (f) or (g) was within 5 degrees of true vertical.

The Dataset and Variables

The datafile, water_level.txt, records 166 observations of the following variables:


The response variable was the outcome on the water-level task. Subjects passed (Y = 1) if they were right on at least 5 out of the 6 water-level drawings and failed (Y = 0) if they missed two or more. There were 10 predictor variables:

SEX: Female (1), Male (2).
GRAVITY: Number of gravity tasks answered correctly.
BRYANT: Total score on Bryant's test (0, 1, 2, 3, 4).
VANDER: Total number of correct answers (0, 1, …, 12).
TRIANGLE: Score on the triangle task (0, 1, …; degrees from horizontal).
TRAILER: Total score on the trailer test (0, 1, 2, 3, 4).
TREE: Total score on the tree drawings (0, 1, …, 8).
COMPHYS: Number of Complex Physics questions answered correctly.
MOVING: Values as given above.

In addition, two other variables, derived from these, will be used:

TOTPHYS: Sum of COMPHYS and GRAVITY (values 0, 1, 2, …, 9).
TOTAL: Sum of TOTPHYS, VANDER, TRIANGLE, BRYANT, TRAILER, and TREE.

The variables SEX and MOVING are class variables and the rest are quantitative.
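As a minimal sketch of how the response and the two derived variables are built, the Python below scores the water-level drawings, applies the pass rule (at least 5 of the 6 drawings correct), and computes TOTPHYS and TOTAL for one subject. The function names and the example subject are hypothetical, the subject's values are made up, and counting a deviation of exactly 5 degrees as correct is an assumption. The layout of water_level.txt is not assumed.

```python
def drawing_correct(deviation_degrees: float, tolerance: float = 5.0) -> bool:
    """A drawing counts as correct if the line is within `tolerance` degrees
    of true horizontal (treating exactly 5 degrees as correct is an assumption)."""
    return abs(deviation_degrees) <= tolerance


def water_level_response(deviations: list) -> int:
    """Y = 1 (pass) if at least 5 of the 6 drawings are correct, else Y = 0."""
    if len(deviations) != 6:
        raise ValueError("expected six water-level drawings")
    n_correct = sum(drawing_correct(d) for d in deviations)
    return 1 if n_correct >= 5 else 0


def add_derived(record: dict) -> dict:
    """Return a copy of `record` with TOTPHYS and TOTAL added."""
    out = dict(record)
    out["TOTPHYS"] = record["COMPHYS"] + record["GRAVITY"]
    out["TOTAL"] = (out["TOTPHYS"] + record["VANDER"] + record["TRIANGLE"]
                    + record["BRYANT"] + record["TRAILER"] + record["TREE"])
    return out


# Hypothetical subject (made-up values, not from the study).
subject = {"SEX": 1, "GRAVITY": 5, "COMPHYS": 3, "VANDER": 8,
           "TRIANGLE": 2, "BRYANT": 4, "TRAILER": 3, "TREE": 6, "MOVING": 1}
deviations = [1.0, -3.5, 12.0, 0.0, 4.9, -2.0]   # degrees from horizontal

y = water_level_response(deviations)   # 5 of 6 within 5 degrees -> Y = 1
derived = add_derived(subject)         # TOTPHYS = 8, TOTAL = 31
print(y, derived["TOTPHYS"], derived["TOTAL"])
```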

Going About Explaining Gender Differences

Can we use logistic regression to address questions such as "If a subject is female and answers all five of the gravity questions correctly, what is the chance (probability) that she passes the water-level task?" We can also ask questions such as:

Which set of predictor variables does the best job of predicting the outcome on the water-level task?
If we "control" or "adjust" for overall knowledge about physics (TOTPHYS), spatial ability as measured by the test on Mental Rotations (VANDER) and performance on a task akin to the water-level one, for example, TREE, does the observed difference between the sexes vanish?

Exploratory Analysis - 1

Test the Equality of Two Proportions

The SAS program water.sas provides the following frequency table (and others) of the water level study data:

sas output

Why was the passing rate so low? What factors affect passing?

In the past, statisticians have used ordinary regression when experiments involved categorical data. Wouldn't it be interesting to see how poorly an ordinary regression analysis performs compared to logistic regression?

First, we could run a Pearson chi-square test of the equality of two proportions. Our null hypothesis at this stage is that the proportion of males who pass is the same as the proportion of females who pass. As the frequency table above reports, the observed proportion of females who passed is 29.91% (32 of 107) and the observed proportion of males who passed is 64.41% (38 of 59).

The Pearson chi-square test of the equality of two proportions gives a chi-square value of 18.562 with p-value = 0.000 (that is, p < 0.001):

SAS output

This is highly significant (p < 0.05), so we reject the hypothesis that the proportions passing are the same for females and males.
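The chi-square statistic can be reproduced by hand from the 2 × 2 table. The counts used here (32 of 107 females and 38 of 59 males passed) are implied by the reported percentages and by the odds ratio (38)(75)/(21)(32) given later in this case study. This sketch uses only the Python standard library; the upper-tail probability of a chi-square variable with 1 df is computed as erfc(√(χ²/2)).

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square for the table [[a, b], [c, d]]
    (rows = groups, columns = passed, failed) and its 1-df p-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # P(chi-square_1 > chi2)
    return chi2, p_value


# Females: 32 passed, 75 failed; males: 38 passed, 21 failed.
rate_f, rate_m = 32 / 107, 38 / 59
chi2, p = pearson_chi2_2x2(32, 75, 38, 21)
print(f"female pass rate = {rate_f:.4f}, male pass rate = {rate_m:.4f}")
print(f"chi-square = {chi2:.3f}, p = {p:.1e}")
```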

Exploratory Analysis - 2

Logistic Regression with a Qualitative (Categorical) Variable

Logistic Regression

Logistic Regression of Pass/Fail

Let's use logistic regression to model passing versus failing. We can fit the model:

\(\text{Model: } \ln\{\pi(\text{sex})/[1-\pi(\text{sex})]\} = \beta_0 + \beta_1 \ast (\text{sex}) = \begin{cases} \beta_0 + \beta_1 & \text{for females} \\ \beta_0 & \text{for males} \end{cases}\)

and use the SAS program water_level1.sas below. This program uses the frequency counts for both sex and whether they passed the test:

sas program

What do the results indicate? Here we are testing

H₀: no sex effect, or H₀: β₁ = 0, vs. the alternative H_a: β₁ ≠ 0.

The likelihood ratio statistic is G² = 18.6578 on 1 df (p < 0.0001):

SAS output

Therefore, we reject the null hypothesis of no sex effect and conclude that there is a statistically significant difference between females and males in the proportion passing the test.
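For a single binary predictor, the likelihood ratio test of β₁ = 0 is the same as the G² test of independence in the 2 × 2 table of sex by pass/fail, so G² can be checked directly from the counts implied by the frequency table (32 of 107 females and 38 of 59 males passed). A standard-library Python sketch:

```python
import math


def g2_statistic(table):
    """Likelihood ratio statistic G^2 = 2 * sum(O * ln(O / E)) for a
    two-way table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # a zero cell contributes nothing
                g2 += 2 * observed * math.log(observed / expected)
    return g2


# Rows: females, males; columns: passed, failed.
table = [[32, 75], [38, 21]]
g2 = g2_statistic(table)
p_value = math.erfc(math.sqrt(g2 / 2))  # chi-square upper tail, 1 df
print(f"G^2 = {g2:.4f}, p = {p_value:.1e}")
```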

We can fit the model using these values from the output:

SAS output

so that the fitted logits are:

fitted logit(females) = 0.5931 - 1.4446 = -0.8515
fitted logit(males) = 0.5931

SAS output

The odds ratio (females vs. males) = \(e^{-1.4446}\) = 0.236

SAS output

The sample odds ratio (males vs. females) = (38)(75)/[(21)(32)] = 4.24 = \(e^{1.4446}\), the reciprocal of 0.236.
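Because the model with sex alone is saturated for the 2 × 2 table, its maximum likelihood estimates have closed forms: each fitted logit is the sample log odds of passing in that group, and β₁ is the log of the sample odds ratio. A short Python check with the counts 32/75 (females passed/failed) and 38/21 (males); the closed-form values agree with the SAS estimates above to within about 0.0003.

```python
import math

pass_f, fail_f = 32, 75   # females
pass_m, fail_m = 38, 21   # males

logit_f = math.log(pass_f / fail_f)   # fitted logit for females
logit_m = math.log(pass_m / fail_m)   # fitted logit for males (= beta_0)
beta_1 = logit_f - logit_m            # female effect on the logit scale

or_f_vs_m = math.exp(beta_1)                        # odds ratio, females vs. males
or_m_vs_f = (pass_m * fail_f) / (fail_m * pass_f)   # cross-product ratio, males vs. females

print(f"beta_0 = {logit_m:.4f}, beta_1 = {beta_1:.4f}")
print(f"OR(F vs M) = {or_f_vs_m:.3f}, OR(M vs F) = {or_m_vs_f:.2f}")
```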

Exploratory Analysis - 3

Logistic Regression with a Quantitative Variable
(Pass/Fail on x = Gravity)

Now, let's see whether the quantitative variable gravity has any effect on passing or failing by fitting the model \(\ln\{\pi(x)/[1-\pi(x)]\} = \beta_0 + \beta_1 x\), where x is the gravity score. We can use the SAS program water_level2.sas below to do this.

sas program

Our hypothesis is:

H₀: No gravity effect or H₀: β₁ = 0 vs. the alternative H_a: β₁ ≠ 0

The output from the program gives G² = 42.1765 on 1 df (p < 0.0001):

SAS output

Therefore, we reject H₀ (no gravity effect) and conclude that the gravity score has a statistically significant effect on the probability of passing the task.

We can fit the model using the values from the output:

SAS output

which gives the fitted model:

Estimated logit[π(x)] = -2.8156 + 0.7998x

Here is the Odds Ratio Estimates output:

SAS output

which tells us that the estimated odds of passing the water-level task are multiplied by about 2.225 (= \(e^{0.7998}\)) for each additional correct answer on the gravity test.
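To see what the fitted model implies, the sketch below plugs the estimates -2.8156 and 0.7998 into the inverse logit to get the estimated probability of passing at each possible gravity score (0 to 5), and confirms that the ratio of fitted odds for consecutive scores is the constant \(e^{0.7998}\) ≈ 2.225.

```python
import math

B0, B1 = -2.8156, 0.7998   # estimates from the fitted model above


def p_pass(gravity: int) -> float:
    """Estimated probability of passing, pi(x) = 1 / (1 + exp(-(B0 + B1*x)))."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * gravity)))


def odds(p: float) -> float:
    return p / (1.0 - p)


for x in range(6):  # gravity scores 0..5
    print(x, round(p_pass(x), 3))

# The odds ratio for a one-point increase is the same at every score.
ratios = [odds(p_pass(x + 1)) / odds(p_pass(x)) for x in range(5)]
print(round(math.exp(B1), 3), [round(r, 3) for r in ratios])
```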

The output also gives the observed and fitted proportions:

SAS output

We added a couple of lines of code to the program so that SAS displays a graph of the observed and fitted (phat) proportions:

SAS plot

How does the 'fit' look?

Exploratory Analysis - 4

Logistic Regression with 1 Qualitative and 1 Quantitative Variable
(Pass/Fail on x = Sex and Gravity)

First, let's perform a logistic regression of passing or failing the test on the variables sex and gravity, using the following model:

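\(\text{Model: } \text{logit}[\pi(\text{sex}, \text{gravity})] = \beta_0 + \beta_1 \ast (\text{sex}) + \beta_2 \ast \text{gravity} = \begin{cases} (\beta_0 + \beta_1) + \beta_2 \ast \text{gravity} & \text{for females} \\ (\beta_0 + 2\beta_1) + \beta_2 \ast \text{gravity} & \text{for males} \end{cases}\)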

We can use the first PROC LOGISTIC procedure in the following SAS program water_level3a.sas to run this.

SAS program

First we are testing:

H₀: sex and gravity together do not affect passing the water level task, or

H₀: β₁ = β₂ = 0 vs. H_a: at least one of the parameters is not 0.

The resulting output:

SAS output

shows that the likelihood ratio test statistic is G² = 50.9766 on 2 df (p < 0.0001).

We conclude that the logistic regression of pass/fail on sex and gravity is statistically significant.

SAS output

The estimated logit(sex, gravity) = -4.1676 + 1.1220sex + 0.7404gravity.

Note that sex is coded as 1 for females and 2 for males.
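This fitted model can answer the kind of question posed earlier: what is the estimated chance that a female (sex = 1) who answers all five gravity questions correctly passes? A minimal Python sketch using the estimates above, with sex coded 1 for females and 2 for males:

```python
import math

B0, B_SEX, B_GRAVITY = -4.1676, 1.1220, 0.7404   # fitted coefficients


def p_pass(sex: int, gravity: int) -> float:
    """sex: 1 = female, 2 = male; gravity: number of gravity items correct (0-5)."""
    eta = B0 + B_SEX * sex + B_GRAVITY * gravity
    return 1.0 / (1.0 + math.exp(-eta))


p_female_5 = p_pass(sex=1, gravity=5)
p_male_5 = p_pass(sex=2, gravity=5)
or_male_vs_female = math.exp(B_SEX)   # the same at every gravity score
print(f"female, gravity=5: {p_female_5:.3f}")
print(f"male,   gravity=5: {p_male_5:.3f}")
print(f"OR (male vs. female, adjusted for gravity): {or_male_vs_female:.2f}")
```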

No Gravity Effect, Adjusted for Sex?

If we were to test the hypothesis that there is no gravity effect, adjusted for sex, we would calculate the change in G² between the model with both variables included and the model with only sex included (see the water_level1.sas output). For instance,

G²(sex, gravity) - G²(sex) = 50.9766 - 18.6578 = 32.319.

Or, we could calculate the change in -2 log likelihood:

-2 ln L(sex) - [-2 ln L(sex, gravity)] = 207.378 - 175.059 = 32.319

Compare this with the Wald chi-square for gravity of 25.4979.

No Sex Effect, Adjusted for Gravity?

Now let's test the hypothesis that there is no sex effect, adjusted for the gravity score. We would calculate the change in G² between the model with both variables included and the model with only gravity (see the water_level2.sas output).

G²(sex, gravity) - G²(gravity) = 50.9766 - 42.1765 = 8.800.

Or, we could calculate the change in -2 log likelihood:

-2 ln L(gravity) - [-2 ln L(sex, gravity)] = 183.859 - 175.059 = 8.800

How does this compare with the Wald chi-square for sex of 8.6117?
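Both partial tests can be checked with a few lines of arithmetic. The sketch below uses only the G² values reported in this case study and recovers -2 ln L for each model from the relation -2 ln L(model) = -2 ln L(intercept only) - G²(model). The intercept-only value is computed from the 70 passes and 96 failures among the 166 subjects (counts implied by the frequency table).

```python
import math

# G^2 (likelihood ratio statistic vs. the intercept-only model) for each fitted model.
G2 = {"sex": 18.6578, "gravity": 42.1765, "sex+gravity": 50.9766}

# -2 ln L for the intercept-only model: 70 passed and 96 failed out of 166.
n_pass, n_fail = 70, 96
n = n_pass + n_fail
neg2ll_null = -2 * (n_pass * math.log(n_pass / n) + n_fail * math.log(n_fail / n))

# -2 ln L(model) = -2 ln L(null) - G^2(model)
neg2ll = {name: neg2ll_null - g2 for name, g2 in G2.items()}


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


# Partial (nested-model) likelihood ratio tests, 1 df each.
lr_gravity_given_sex = G2["sex+gravity"] - G2["sex"]
lr_sex_given_gravity = G2["sex+gravity"] - G2["gravity"]

for label, stat in [("gravity | sex", lr_gravity_given_sex),
                    ("sex | gravity", lr_sex_given_gravity)]:
    print(f"{label}: LR = {stat:.3f}, p = {chi2_sf_1df(stat):.4f}")
print({k: round(v, 3) for k, v in neg2ll.items()})
```

The reconstructed values are about 207.378 for the sex-only model and 183.859 for the gravity-only model.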

SAS output

Predicted values and confidence limits for population proportions:

SAS output

Edited fitted values are given below.

edited values here...

A plot of phat vs. gravity for females and males is given in the graph.

graph here...

Logistic Regression of Pass/Fail on Sex, Gravity and Sex*Gravity (Interaction Model)

Here our model is:

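\(\text{Model: } \text{logit}[\pi(\text{sex}, \text{gravity})] = \beta_0 + \beta_1 \ast (\text{sex}) + \beta_2 \ast \text{gravity} + \beta_3 \ast (\text{sex} \ast \text{gravity}) = \begin{cases} (\beta_0 + \beta_1) + (\beta_2 + \beta_3)\,\text{gravity} & \text{for females} \\ (\beta_0 + 2\beta_1) + (\beta_2 + 2\beta_3)\,\text{gravity} & \text{for males} \end{cases}\)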
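With sex coded 1 and 2, the interaction model gives each sex its own intercept and slope on the logit scale. The helper below (a hypothetical name) turns (β₀, β₁, β₂, β₃) into the two lines. The interaction estimates are not reproduced in this text, so the usage example plugs in the main-effects estimates from the previous model with β₃ = 0, the special case in which the two lines are parallel.

```python
def sex_specific_lines(b0: float, b1: float, b2: float, b3: float) -> dict:
    """Return {sex: (intercept, slope)} on the logit scale for the model
    logit = b0 + b1*sex + b2*gravity + b3*sex*gravity,
    with sex = 1 (female) or 2 (male)."""
    lines = {}
    for label, sex in (("female", 1), ("male", 2)):
        lines[label] = (b0 + b1 * sex, b2 + b3 * sex)
    return lines


# Main-effects estimates with b3 = 0: parallel lines with equal slopes.
parallel = sex_specific_lines(-4.1676, 1.1220, 0.7404, 0.0)
print(parallel)
```

A nonzero β₃ makes the male slope differ from the female slope by exactly β₃.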

SAS output:

SAS output

Exploratory Analysis - 5

Binary Logistic Regression on a Categorical Variable with 3 Values
(Pass/Fail on x = 'Sex Move')

Binary Logistic Regression

What we have looked at thus far in this exploratory analysis were 2 × 2 tables. Now we are going to move to 2 × 3 tables.

First we will tally the discrete variable Moving. Moving was coded as 1 if the person said that the glass was not moving when they drew the line, and 2 if it was. Of the 166 subjects, 29 said that the glass was moving.

Moving	Count
1	137
2	29
	N = 166

Now we will create a new variable called 'sexMove' as follows: Gender is coded 0 = female and 1 = male. Moving was coded as 1 if the person said that the glass was not moving when they drew the line, and 2 if it was. We will let the combined 'Gender by Move' = 10*Gender + Move.

According to the dataset, 79 females said the glass was not moving and 28 females said it was moving; 58 of the males said the glass was not moving and only 1 male said it was moving.

Female, Not moving	79
Female, Moving	28
Male, Not moving	58
Male, Moving	1
	N = 166

For the purposes of this analysis, we will combine the last two rows into a single Male category, so that the new variable, SexMove, has 3 values: 1, 2, and 3.

Value	Description	Count
1	if the person is female and said the glass was not moving	79
2	if the person is female and said the glass was moving	28
3	if the person is male	59
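The recoding can be sketched in Python as follows. The mapping from the intermediate code 10*Gender + Move (1, 2, 11, 12) to SexMove (1, 2, 3) follows the description above; the (gender, move) pairs are rebuilt from the counts in the table rather than read from the data file, and the names are hypothetical.

```python
from collections import Counter

# Intermediate code 10*Gender + Move -> SexMove (males collapsed into one level).
SEXMOVE_FROM_CODE = {1: 1, 2: 2, 11: 3, 12: 3}


def sex_move(gender: int, move: int) -> int:
    """gender: 0 = female, 1 = male; move: 1 = not moving, 2 = moving."""
    return SEXMOVE_FROM_CODE[10 * gender + move]


# (gender, move) pairs rebuilt from the counts reported above.
subjects = ([(0, 1)] * 79 + [(0, 2)] * 28 +
            [(1, 1)] * 58 + [(1, 2)] * 1)
counts = Counter(sex_move(g, m) for g, m in subjects)
print(sorted(counts.items()))   # [(1, 79), (2, 28), (3, 59)]
```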

We can run the binary logistic regression using the SAS program ???

SAS program image here...

SAS output and discussion here ...

Conclusion

There is a very highly significant difference in the proportions of persons passing across the three values of SexMove. Only 7.14% of the females who said the glass was moving passed the water-level task, compared with 37.97% of the females who said the glass was not moving and 64.41% of the males. Only one male out of 59 said the glass was moving, compared to 28 out of 107 females.
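As a rough cross-check of this conclusion, the sketch below computes the passing rate in each SexMove group and a Pearson chi-square test of homogeneity on 2 df. This is a swapped-in technique, not the logistic regression the SAS program fits. The numbers passing (30, 2, and 38) are derived from the reported percentages and group sizes; for 2 df the chi-square upper tail is exactly exp(-χ²/2).

```python
import math

# SexMove group: (number passed, group size).
groups = {1: (30, 79), 2: (2, 28), 3: (38, 59)}

total_pass = sum(p for p, _ in groups.values())
total_n = sum(n for _, n in groups.values())
overall_rate = total_pass / total_n

chi2 = 0.0
for passed, n in groups.values():
    for observed, expected in ((passed, n * overall_rate),
                               (n - passed, n * (1 - overall_rate))):
        chi2 += (observed - expected) ** 2 / expected

df = len(groups) - 1            # (3 - 1) * (2 - 1) = 2
p_value = math.exp(-chi2 / 2)   # chi-square upper tail, valid for df = 2 only

for g, (passed, n) in groups.items():
    print(f"SexMove {g}: {passed}/{n} = {100 * passed / n:.2f}% passed")
print(f"chi-square = {chi2:.2f} on {df} df, p = {p_value:.1e}")
```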

Exploratory Analysis - 6

Backward Elimination & Stepwise Selection Procedures

We will begin here by using two subset selection procedures in SAS Proc Logistic for choosing variables related to the response:

Backward elimination
Stepwise selection

Which Model Should I Fit?

Take a look at this SAS program (water_level3.sas):

SAS program

The data are input, the variables identified, and then PROC LOGISTIC is called, specifying a model in which Y (subjects passed, 1, or failed, 0) is the response. Notice, highlighted in purple, the words 'backward' and 'stepwise', which specify the two different subset selection procedures.

Backward Elimination

In the output, the procedure begins by entering all of the variables:

SAS output

and then one by one the variables are removed...

sas output

The model is re-fit after each removal until, at the end of the procedure, the note below is reported along with the four variables that were removed from the model:

sas output

Directly after this, the procedure lists the variables that are retained in the model; their p-values are all < 0.05:

sas output

along with the coefficients that make up the fitted model.
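The logic of backward elimination can be sketched independently of SAS. In the Python below, `fit_pvalues` is a hypothetical callable standing in for a model fit: it takes a list of variable names and returns each variable's p-value in that model. A stay threshold of 0.05 is assumed, and the toy p-values in the example are invented, not results from this study.

```python
from typing import Callable, Dict, List, Tuple

PValueFn = Callable[[List[str]], Dict[str, float]]


def backward_eliminate(candidates: List[str], fit_pvalues: PValueFn,
                       alpha_stay: float = 0.05) -> Tuple[List[str], List[str]]:
    """Start with all candidates; re-fit, drop the least significant variable
    if its p-value exceeds alpha_stay, and repeat until all remaining pass."""
    model = list(candidates)
    removed = []
    while model:
        pvals = fit_pvalues(model)                 # re-fit the current model
        worst = max(model, key=lambda v: pvals[v])
        if pvals[worst] <= alpha_stay:
            break                                  # every remaining variable stays
        model.remove(worst)
        removed.append(worst)
    return model, removed


# Toy stand-in for a model fit: fixed, invented p-values (hypothetical).
TOY_P = {"x1": 0.001, "x2": 0.30, "x3": 0.02, "x4": 0.75, "x5": 0.04}


def toy_fit(model: List[str]) -> Dict[str, float]:
    return {v: TOY_P[v] for v in model}


kept, dropped = backward_eliminate(list(TOY_P), toy_fit)
print("kept:", kept, "removed in order:", dropped)
```

In a real analysis `fit_pvalues` would re-fit the logistic regression each time, so a variable's p-value can change as others are removed.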

Stepwise Selection

This procedure takes the opposite approach: it begins with a single variable and then adds further variables, one at a time, re-fitting the model each time; at each step, a variable already in the model can also be removed if it is no longer significant.

sas output

until at the end of the procedure the following note is given:

sas output

and a summary list of the variables that remain in the model is displayed:

sas output
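A matching sketch of stepwise selection, under the same assumptions (a hypothetical `fit_pvalues` callable and 0.05 thresholds for entry and staying): start from the intercept-only model, enter the most significant outside variable, then drop any variable that is no longer significant. The guard that stops when the best candidate is the variable just removed is one simple way to prevent cycling; it is an assumption, not necessarily the exact rule SAS uses.

```python
from typing import Callable, Dict, List

PValueFn = Callable[[List[str]], Dict[str, float]]


def stepwise_select(candidates: List[str], fit_pvalues: PValueFn,
                    alpha_entry: float = 0.05, alpha_stay: float = 0.05,
                    max_steps: int = 100) -> List[str]:
    model: List[str] = []
    last_removed = None
    for _ in range(max_steps):
        outside = [v for v in candidates if v not in model]
        if not outside:
            break
        # Entry: p-value of each outside variable when added to the current model.
        entry_p = {v: fit_pvalues(model + [v])[v] for v in outside}
        best = min(entry_p, key=entry_p.get)
        if entry_p[best] >= alpha_entry or best == last_removed:
            break
        model.append(best)
        # Removal: re-fit and drop the least significant variable if needed.
        pvals = fit_pvalues(model)
        worst = max(model, key=pvals.get)
        if pvals[worst] > alpha_stay:
            model.remove(worst)
            last_removed = worst
    return model


# Toy fit (invented p-values): x2 looks significant alone but not once x3 is in.
def toy_fit(model: List[str]) -> Dict[str, float]:
    p = {"x1": 0.60, "x2": 0.001, "x3": 0.004}
    if "x3" in model:
        p["x2"] = 0.30
    return {v: p[v] for v in model}


print(stepwise_select(["x1", "x2", "x3"], toy_fit))   # ['x3']
```

Here x2 enters first, is removed once x3 enters, and the procedure stops because no outside variable meets the entry threshold.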

Odds Ratio Estimates

If we look at the Odds Ratio Estimates for both procedures:

Backward Elimination

sas output

Stepwise Selection

sas output

The two procedures each selected 6 variables with 5 in common; backward elimination chose ‘gravity’ while stepwise chose ‘totphysics’. The odds ratio and confidence interval estimates are quite close for all variables.

Furthermore, neither model includes the variable ‘sex’. We conclude that, after adjusting for these 6 predictor variables, ‘sex’ does not significantly affect passing/failing.

This handout covers this information as well: WaterStudyModelSelection.pdf
