Published on *STAT 504* (https://onlinecourses.science.psu.edu/stat504)

The table below refers to a sample of subjects randomly selected for an Italian study on the relation between income and whether one possesses a travel credit card (such as American Express or Diner’s Club). At each level of annual income in millions of lira (the currency of Italy before the euro), the table indicates the number of subjects sampled and the number of these subjects possessing at least one travel credit card.

For each income group, the data give the income, the number of individuals (cases) in that group, and the number of those individuals who possess at least one travel credit card (link to data file: creditcard.txt [2]).

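| Income | Number of Cases | Credit Cards |
|---|---|---|
| 24 | 1 | 0 |
| 27 | 1 | 0 |
| 28 | 5 | 2 |
| 29 | 3 | 0 |
| 30 | 9 | 1 |
| 31 | 5 | 1 |
| 32 | 8 | 0 |
| 33 | 1 | 0 |
| 34 | 7 | 1 |
| 35 | 1 | 1 |
| 38 | 3 | 1 |
| 39 | 2 | 0 |
| 40 | 5 | 0 |
| 41 | 2 | 0 |
| 42 | 2 | 0 |
| 45 | 1 | 1 |
| 48 | 1 | 0 |
| 49 | 1 | 0 |
| 50 | 10 | 2 |
| 52 | 1 | 0 |
| 59 | 1 | 0 |
| 60 | 5 | 2 |
| 65 | 6 | 6 |
| 68 | 3 | 3 |
| 70 | 5 | 3 |
| 79 | 1 | 0 |
| 80 | 1 | 0 |
| 84 | 1 | 0 |
| 94 | 1 | 0 |
| 120 | 6 | 6 |
| 130 | 1 | 1 |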

By using the OFFSET= option in the MODEL statement of PROC GENMOD in SAS, we specify an **offset variable**. The offset variable normalizes the fitted cell means per unit of space, group size, or time interval, so that we model rates rather than raw counts.
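Written out, with $t$ the number of cases in an income group, $x$ the income and $\mu$ the expected number of those cases with a travel credit card, the rate model and its offset form are the same model rearranged:

$$\log\left(\frac{\mu}{t}\right) = \beta_0 + \beta_1 x \quad\Longleftrightarrow\quad \log(\mu) = \log(t) + \beta_0 + \beta_1 x \quad\Longleftrightarrow\quad \mu = t\,e^{\beta_0 + \beta_1 x}$$

The term $\log(t)$ is the offset: it enters the linear predictor with its coefficient fixed at 1, so it is not estimated.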

We will see how to do this with the credit card data above.

Here is a link to the SAS program, credit_card.sas [3].

And, here is the output from this program:

The model is:

$\log(\mu/t) = -2.3866 + 0.0208 \times \text{Income}$

where $\log(t)$ = *lcases*, the log of the number of cases in each income group.
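As a minimal sketch of what PROC GENMOD (and R's glm()) do with an offset, the Python code below fits the Poisson log-linear model by Newton-Raphson (Fisher scoring, which coincides with Newton-Raphson for the canonical log link), adding log(cases) to the linear predictor with its coefficient fixed at 1. The function name `fit_poisson_offset`, the starting values and the convergence settings are our own illustrative choices, not part of either package; the data are transcribed from the table above. On this data it should reproduce, up to rounding, the estimates -2.3866 and 0.0208.

```python
import math

# (income in millions of lira, number of cases, number with a travel credit card)
DATA = [
    (24, 1, 0), (27, 1, 0), (28, 5, 2), (29, 3, 0), (30, 9, 1), (31, 5, 1),
    (32, 8, 0), (33, 1, 0), (34, 7, 1), (35, 1, 1), (38, 3, 1), (39, 2, 0),
    (40, 5, 0), (41, 2, 0), (42, 2, 0), (45, 1, 1), (48, 1, 0), (49, 1, 0),
    (50, 10, 2), (52, 1, 0), (59, 1, 0), (60, 5, 2), (65, 6, 6), (68, 3, 3),
    (70, 5, 3), (79, 1, 0), (80, 1, 0), (84, 1, 0), (94, 1, 0), (120, 6, 6),
    (130, 1, 1),
]


def fit_poisson_offset(data, max_iter=50, tol=1e-10):
    """Fit log(mu) = log(t) + b0 + b1*x by Newton-Raphson (hypothetical helper).

    The offset log(t) is added to the linear predictor with coefficient 1;
    only b0 and b1 are estimated.
    """
    total_y = sum(y for _, _, y in data)
    total_t = sum(t for _, t, _ in data)
    # Start from the overall rate with zero slope
    b0, b1 = math.log(total_y / total_t), 0.0
    for _ in range(max_iter):
        # Score U = X'(y - mu) and information I = X'WX with W = diag(mu)
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, t, y in data:
            mu = math.exp(math.log(t) + b0 + b1 * x)
            u0 += y - mu
            u1 += x * (y - mu)
            i00 += mu
            i01 += x * mu
            i11 += x * x * mu
        # Solve the 2x2 system I * step = U
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


b0, b1 = fit_poisson_offset(DATA)
print(f"intercept = {b0:.4f}, income = {b1:.4f}")
```

Only the two regression coefficients are updated; the offset changes each fitted mean but never gets a parameter of its own.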

In R, we specify an **offset variable** by adding an `offset()` term to the model formula in `glm()`. As in SAS, the offset normalizes the fitted cell means per unit of space, group size, or time interval, so that we model rates.

Below is the R program (see credit_card.R [4]).

In the crab example, we supplied the offset as an option in the model statement. In this example, we instead include the offset as an additional term in the model formula. Are the two approaches the same?

Here is the output we get:

The model is:

$\log(\mu/t) = -2.3866 + 0.0208 \times \text{Income}$

where $\log(t) = \log(\text{cases})$.

What is the estimated average rate of incidence, i.e., the rate of travel credit card possession at a given income?

Is income a significant predictor? Does the overall model fit?

Use your knowledge of GLMs together with the SAS PROC GENMOD output or the `glm()` output in R.

The table below lists the observed income, the number of cases at each income level, the number of those cases with at least one travel credit card, and the offset (i.e., *lcases*), along with the predicted (fitted, expected) number of credit cards based on the fitted model.

We can also compute the predicted (fitted, expected) number of credit cards from the fitted model; the last line of the code shown below does this.
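To make the fitted-value calculation concrete, here is a small Python sketch that applies $\hat{\mu} = t\,\exp(\hat{\beta}_0 + \hat{\beta}_1 \times \text{Income})$ with the rounded coefficients -2.3866 and 0.0208. The helper names `expected_rate` and `expected_count` are hypothetical, chosen for illustration. Because the coefficients are rounded, results can differ in the third decimal from fitted values computed with the full-precision estimates in the SAS or R output.

```python
import math

# Rounded estimates from the fitted model
B0, B1 = -2.3866, 0.0208


def expected_rate(income):
    """Estimated rate mu/t = exp(b0 + b1*income); it does not depend on t."""
    return math.exp(B0 + B1 * income)


def expected_count(income, cases):
    """Expected number with a travel card: mu = t * exp(b0 + b1*income)."""
    return cases * expected_rate(income)


# A few (income, cases, observed) rows from the data table
rows = [(24, 1, 0), (50, 10, 2), (65, 6, 6), (120, 6, 6)]
for income, cases, observed in rows:
    print(f"income={income:4d} cases={cases:3d} observed={observed} "
          f"fitted={expected_count(income, cases):.3f}")
```

Since the rate depends only on income, doubling the number of cases in a group doubles its expected count.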

So, in the group of six people who earn about 65 million lira, the expected number with at least one travel credit card is 2.126, while the observed number is 6.

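$\log(\hat{\mu}/t) = -2.3866 + 0.0208 \times \text{Income} = -2.3866 + 0.0208 \times 65$

$\log(\hat{\mu}) = -2.3866 + 0.0208 \times 65 + \log(t)$

$\log(\hat{\mu}) = -2.3866 + 0.0208 \times 65 + 1.79176$

$\hat{\mu} = 2.12641$ (using the unrounded estimates from the output; the rounded coefficients shown here give about 2.13)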

Notice that *lcases* = $\log(t) = \log(6) \approx 1.79176$ for this group. The expected rate is $\hat{\mu}/t = 2.12641/6 \approx 0.354$.

**Question**: How many people would we expect to have at least one travel credit card in a group of 10 people who earn about 120 million lira?

**Links:**

[2] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson07/creditcard.txt

[3] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson07/credit_card.sas

[4] https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson07/credit_card.R