The Dataset and Variables

The data file, water_level.txt, records 166 observations on the variables described below.

The response variable is the outcome on the water-level task. Subjects passed (Y = 1) if they got at least 5 of the 6 water-level drawings right and failed (Y = 0) if they missed two or more. There were 10 predictor variables:

SEX: Female (1), Male (2).
GRAVITY: Number of gravity tasks answered correctly.
BRYANT: Total score on Bryant's test (0, 1, 2, 3, 4).
VANDER: Total number of correct answers (0, 1, …, 12).
TRIANGLE: Score on the triangle task (0, 1, …; degrees from horizontal).
TRAILER: Total score on the trailer test (0, 1, 2, 3, 4).
TREE: Total score on the tree drawings (0, 1, …, 8).
COMPHYS: Number of Complex Physics questions answered correctly.
MOVING: Values as given above.

In addition, two other variables, derived from these, will be used:

TOTPHYS: Sum of COMPHYS and GRAVITY (0, 1, 2, …, 9).
TOTAL: Sum of TOTPHYS, VANDER, TRIANGLE, BRYANT, TRAILER, and TREE.

The variables SEX and MOVING are class variables and the rest are quantitative.
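
As a minimal sketch of the definitions above, the derived variables and the pass/fail coding of the response can be written out in Python. The function names and the example scores are hypothetical, chosen only for illustration; they are not taken from water_level.txt.

```python
# Sketch only: keys follow the variable names listed above.
# The example scores are made up and do NOT come from water_level.txt.

def derive_scores(subject):
    """Return a copy of one subject's record with TOTPHYS and TOTAL added."""
    out = dict(subject)
    # TOTPHYS: sum of COMPHYS and GRAVITY
    out["TOTPHYS"] = subject["COMPHYS"] + subject["GRAVITY"]
    # TOTAL: sum of TOTPHYS, VANDER, TRIANGLE, BRYANT, TRAILER, and TREE
    out["TOTAL"] = (out["TOTPHYS"] + subject["VANDER"] + subject["TRIANGLE"]
                    + subject["BRYANT"] + subject["TRAILER"] + subject["TREE"])
    return out


def water_level_outcome(n_correct):
    """Code the response: Y = 1 if at least 5 of 6 drawings are right, else 0."""
    if not 0 <= n_correct <= 6:
        raise ValueError("n_correct must be between 0 and 6")
    return 1 if n_correct >= 5 else 0


# Hypothetical subject, for illustration only
example = {"SEX": 1, "GRAVITY": 5, "COMPHYS": 3, "VANDER": 8,
           "TRIANGLE": 2, "BRYANT": 3, "TRAILER": 4, "TREE": 6}
derived = derive_scores(example)
print(derived["TOTPHYS"], derived["TOTAL"])
print(water_level_outcome(5), water_level_outcome(4))
```

Because TOTPHYS and TOTAL are exact sums of other predictors, a model that includes a sum together with all of its components would be overparameterized, which is one reason to treat them as alternatives to, rather than additions to, the individual scores.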

Going About Explaining Gender Differences

We can use logistic regression to address questions like "If a subject is female and answers all five of the gravity questions correctly, what is the probability that she passes the water-level task?" We can also ask questions like:

  1. Which set of predictor variables does the best job of predicting the outcome on the water-level task?
  2. If we "control" or "adjust" for overall knowledge about physics (TOTPHYS), spatial ability as measured by the test on Mental Rotations (VANDER), and performance on a task akin to the water-level one, such as TREE, does the observed difference between the sexes vanish?
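
To make the first kind of question concrete, here is a minimal sketch of how a fitted logistic regression turns predictor values into a pass probability, using P(Y = 1) = 1 / (1 + exp(-(b0 + b1·x1 + … + bk·xk))). The coefficient values are hypothetical placeholders, not estimates from water_level.txt, and the FEMALE indicator (1 when SEX = 1, 0 otherwise) is an assumed recoding of the class variable SEX.

```python
import math


def pass_probability(intercept, coefs, x):
    """P(Y = 1 | x) under a logistic model with the given coefficients."""
    # Linear predictor: b0 + sum of b_j * x_j over the predictors in the model
    eta = intercept + sum(coefs[name] * x[name] for name in coefs)
    # Inverse logit maps the linear predictor to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-eta))


# HYPOTHETICAL coefficients, chosen only so the example runs;
# real values would come from fitting the model to the data.
intercept = -2.0
coefs = {"FEMALE": -0.5, "GRAVITY": 0.4}

# A female subject who answers all five gravity questions correctly
subject = {"FEMALE": 1, "GRAVITY": 5}
p = pass_probability(intercept, coefs, subject)
print(round(p, 3))
```

The second question corresponds to comparing the estimated coefficient for sex in a model with sex alone against its estimate in a model that also includes TOTPHYS, VANDER, and TREE; if the adjusted coefficient is close to zero, the observed difference is largely accounted for by those covariates.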
