9.3 - Poisson Regression Model for Rate Data


Example - Credit Cards

The table below refers to a sample of subjects randomly selected for an Italian study on the relation between income and whether one possesses a travel credit card (such as American Express or Diner’s Club). At each level of annual income in millions of lira (the currency of Italy before the euro), the table indicates the number of subjects sampled and the number of these subjects possessing at least one travel credit card.

This example has information on individuals grouped by their income: for each income level, the number of individuals (cases) in that group and the number of them who hold at least one travel credit card. (Data file: creditcard.txt)

Income		Number of Cases		Credit Cards
24		1		0
27		1		0
28		5		2
29		3		0
30		9		1
31		5		1
32		8		0
33		1		0
34		7		1
35		1		1
38		3		1
39		2		0
40		5		0
41		2		0
42		2		0
45		1		1
48		1		0
49		1		0
50		10		2
52		1		0
59		1		0
60		5		2
65		6		6
68		3		3
70		5		3
79		1		0
80		1		0
84		1		0
94		1		0
120		6		6
130		1		1

In SAS, we specify an offset variable with the OFFSET= option in the MODEL statement of PROC GENMOD. The offset variable normalizes the fitted cell means by some measure of exposure (a space, a group size, or a time interval), so that the model describes rates rather than raw counts.
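As a short sketch of why the offset turns a model for counts into a model for rates, write μ for the expected count and t for the exposure (here, the number of cases in an income group). The symbols α, β, and x below are notation assumed for this illustration and stand for the intercept, the slope, and the predictor:

$$
\log(\mu) = \alpha + \beta x + \log(t)
\quad\Longleftrightarrow\quad
\log(\mu/t) = \alpha + \beta x
\quad\Longleftrightarrow\quad
\mu = t \, e^{\alpha + \beta x}
$$

The term log(t) enters the linear predictor with its coefficient fixed at 1, which is what distinguishes an offset from an ordinary covariate.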

We are going to see how to do this with the credit card data above.

Here is the SAS program, credit_card.sas:

SAS program

And, here is the output from this program:

SAS output

The model is:

log(μ/t) = −2.3866 + 0.0208 × Income

where μ is the expected number of card holders, t is the number of cases, and log(t) = lcases is the offset.

In R, we can specify an offset variable in glm(), either through its offset argument or with an offset() term in the model formula. The offset variable normalizes the fitted cell means by some measure of exposure (a space, a group size, or a time interval), so that the model describes rates rather than raw counts.

Below is the R program (see creditcard.R).

R program code

In the crab example, we supplied the offset as an option in the glm() call. In this example, we include the offset as an additional term in the model formula. Are the two approaches the same?

Here is the output we get:

R output

The model is:

log(μ/t) = −2.3866 + 0.0208 × Income

where t is the number of cases and log(t) = lcases is the offset.
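For readers who want to see what the software is doing, here is a minimal Python sketch (not the course's SAS or R program) that fits the same rate model by Newton-Raphson using only the standard library. The function name `fit_poisson_rate` and the `DATA` list are hypothetical helpers introduced for illustration; the data are transcribed from the table above. If the transcription is correct, the estimates should land close to the reported −2.3866 and 0.0208.

```python
import math

# Credit card data from the table above: (income, cases, cards).
DATA = [
    (24, 1, 0), (27, 1, 0), (28, 5, 2), (29, 3, 0), (30, 9, 1), (31, 5, 1),
    (32, 8, 0), (33, 1, 0), (34, 7, 1), (35, 1, 1), (38, 3, 1), (39, 2, 0),
    (40, 5, 0), (41, 2, 0), (42, 2, 0), (45, 1, 1), (48, 1, 0), (49, 1, 0),
    (50, 10, 2), (52, 1, 0), (59, 1, 0), (60, 5, 2), (65, 6, 6), (68, 3, 3),
    (70, 5, 3), (79, 1, 0), (80, 1, 0), (84, 1, 0), (94, 1, 0), (120, 6, 6),
    (130, 1, 1),
]


def fit_poisson_rate(data, iterations=25):
    """Fit log(mu/t) = b0 + b1*income by Newton-Raphson (illustrative helper).

    Returns (b0, b1, se0, se1), where the standard errors come from the
    inverse of the information matrix evaluated in the final iteration.
    """
    total_cases = sum(t for _, t, _ in data)
    total_cards = sum(y for _, _, y in data)
    # Start at the overall rate with no income effect.
    b0, b1 = math.log(total_cards / total_cases), 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t, y in data:
            # The offset log(t) enters with coefficient 1, so mu = t * exp(...).
            mu = t * math.exp(b0 + b1 * x)
            g0 += y - mu            # score for the intercept
            g1 += x * (y - mu)      # score for the slope
            h00 += mu               # information matrix entries
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se0 = math.sqrt(h11 / det)
    se1 = math.sqrt(h00 / det)
    return b0, b1, se0, se1


b0, b1, se0, se1 = fit_poisson_rate(DATA)
print(f"intercept = {b0:.4f} (SE {se0:.4f})")
print(f"slope     = {b1:.4f} (SE {se1:.4f})")
```

The ratio `b1 / se1` is the Wald z statistic for income, which can be compared with the corresponding line of the PROC GENMOD or glm() output.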

What is the estimated average rate of incidence, i.e., the rate of credit card possession at a given income?

Is income a significant predictor? Does the overall model fit?

Use your knowledge of GLMs and, in this case, the SAS PROC GENMOD output or the glm() output in R.

In SAS, the table below lists the observed income, the number of cases at each income, the number of credit card holders at each income level, and the offset (lcases), along with the predicted (fitted, expected) number of credit card holders based on the fitted model.

SAS output

In R, we can also get the predicted (fitted, expected) number of credit card holders from the fitted model. Take a look at how this is done in the last line of the code shown below.

R output

So, in the group of six people who earn about 65 million lira, the expected number with at least one travel credit card is 2.126, while the observed number is 6.

$\log(\hat{\mu}/t) = -2.3866 + 0.0208 \times \text{Income} = -2.3866 + 0.0208 \times 65$

$\log(\hat{\mu}) = -2.3866 + 0.0208 \times 65 + \log(t)$

$\log(\hat{\mu}) = -2.3866 + 0.0208 \times 65 + 1.79176$

$\hat{\mu} = 2.12641$

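Notice that lcases = log(t) = log(6) ≈ 1.79176 for this group. The fitted value 2.12641 is taken from the software output; evaluating the expression by hand with the rounded coefficients shown gives about 2.13. The expected rate is $\hat{\mu}/t = 2.12641/6 \approx 0.354$.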
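The arithmetic above can be sketched in a few lines of Python. The function names `expected_rate` and `expected_cards` are hypothetical, and the coefficients are the rounded values reported for the fitted model, so the result may differ slightly in the later decimals from the software's fitted value of 2.126.

```python
import math

# Rounded coefficients from the fitted model log(mu/t) = -2.3866 + 0.0208 * Income.
INTERCEPT = -2.3866
SLOPE = 0.0208


def expected_rate(income):
    """Expected proportion of card holders at a given income (millions of lira)."""
    return math.exp(INTERCEPT + SLOPE * income)


def expected_cards(income, cases):
    """Expected number of card holders in a group: mu = t * exp(b0 + b1 * income)."""
    return cases * expected_rate(income)


# The group of six people earning about 65 million lira.
print(f"expected rate  = {expected_rate(65):.3f}")
print(f"expected count = {expected_cards(65, 6):.3f}")
```

The same two functions can be called with any other income and group size to obtain a fitted rate and a fitted count.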

Question: How many people would we expect to have at least one travel credit card in a group of 10 people who earn about 120 million lira?