Welcome to STAT 504!

Welcome to STAT 504 – Analysis of Discrete Data!

The focus of this class is the multivariate analysis of discrete data. Here we deal with discretely measured responses such as counts, proportions, nominal variables, ordinal variables, discrete interval variables with few values, continuous variables grouped into a small number of categories, etc. We will learn basic statistical methods and discuss issues relevant to the analysis of discrete distributions, cross-classified tables of counts (i.e., contingency tables), success/failure records, questionnaire items, judges' ratings, etc.

STAT 504 is an applied course intended for graduate students who have successfully completed at least two other graduate statistics courses. In practice, this prerequisite means that to succeed in STAT 504 you must have a good understanding of basic concepts such as populations and parameters, samples and statistics, confidence intervals, and hypothesis tests, and you must know how to fit and interpret regression-type models. Familiarity with matrix algebra is a plus. We will not spend time reviewing these topics, but I strongly encourage you to review them using the A Review of the Principles of Statistics section, which you will find in the Resource Menu block on the left. At a minimum, make sure to review the sections on Probability and Distributions and on Statistical Inference and Estimation. You can always come back to these pages as needed.

A Statistical View of the World

Figure: a sample of randomly chosen subjects is drawn from the population, and the sample is then used to make inferences about the population.

Population

  • Contains N subjects
  • Unobserved variables
  • Some of this population's parameters include: \(\mu,\  \beta_0,\  \beta_{location},\  \beta,\  \sigma^2,\  \sigma^2_{location}\), etc.
Example: \(\mu\) = true, unknown average price of a 1-bedroom apartment in State College

Sample

  • Contains n randomly chosen subjects
  • Observed explanatory and outcome variables
  • Some statistics for this sample include: \(\bar{x},\  \hat{\beta}_0,\  \hat{\beta}_{location},\  \hat{\beta},\  s^2,\  s^2_{location}\), etc.
Example: \(\bar{x}\) = estimated average price of a 1-bedroom apartment in State College
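To make the parameter/statistic distinction concrete, here is a minimal Python sketch under stated assumptions: the population is a hypothetical, simulated set of N = 1000 monthly rents (not real State College data), and a simple random sample of n = 30 is drawn without replacement. The population mean plays the role of \(\mu\), which in a real study is unknown; the sample mean \(\bar{x}\) is the statistic we would actually compute.

```python
import random
import statistics

# Hypothetical, simulated population of N monthly rents (NOT real State College data).
pop_rng = random.Random(504)
N = 1000
population = [round(pop_rng.gauss(900, 150), 2) for _ in range(N)]

# Parameter: the population mean mu (unknown in a real study).
mu = statistics.mean(population)

# Statistic: the mean x_bar of a simple random sample of n subjects,
# drawn without replacement using a separate random generator.
sample_rng = random.Random(1)
n = 30
sample = sample_rng.sample(population, n)
x_bar = statistics.mean(sample)

print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}")
```

Because the sample is drawn with its own generator, changing the seed of `sample_rng` changes \(\bar{x}\) but leaves \(\mu\) fixed: the statistic varies from sample to sample, while the parameter does not.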

The prerequisite also means that you are comfortable with either R or SAS, are a quick learner of software packages and basic programming, or will be able to figure out how to do the required analyses in another statistical software package of your choice. Links to a very basic R tutorial and a SAS tutorial are in the Review Materials and Quick Tutorials section of the Resource Menu block on your left. For other statistical software, see our Statistical Software page.

The Challenges in this Course

Many students find this course significantly harder than other statistics courses they have taken, for several reasons. Many introductory and intermediate courses focus on the analysis of continuous data and the normal distribution; less emphasis is put on discrete distributions such as the binomial, multinomial, and Poisson, and on models relevant for categorical data. As a result, this course presents many new concepts and statistical methods within a short period of time, and you will need to keep up with the schedule. Furthermore, software packages vary greatly in how well they cover tools for categorical data analysis. Unless you are willing to do some programming, you may sometimes need to use several different packages to do a thorough analysis, and, as you will see in the notes, there is often more than one way to do the same analysis.

Get Involved as You Learn!

Use of the online discussion boards and collaboration are encouraged, especially for clarifying newly introduced concepts and resolving computing or programming issues. Finding an appropriate solution on the internet is also acceptable as long as you cite the website, textbook, and/or peer-reviewed journal article you used! Like any textbook, our notes and code may contain errors, so please point these out. Do not wait until you reach the point of frustration. If you are having an issue, it is likely that someone else is as well! We will do our best to address such issues as soon as possible.

The goal of this class is to help you build a foundation for the analysis of categorical data, not to provide cookbook recipes for how to do the analyses. It is our hope that the basic knowledge you gain here will allow you to communicate more easily with others about categorical data and to learn many new, and possibly more advanced, methods for analyzing categorical data. Some of the notes are very detailed, whereas others are less so. It is a good idea to first skim through the entire lesson to get a sense of the bigger picture, and then go back and work out the details. It is also a good idea to take a quick look at the homework before you start reading through the lesson. The lessons build on each other, and as we move through the course you will be expected to be familiar with previously covered material. Furthermore, certain new concepts will at times be left open to give you a chance to explore them on your own, but in a "safe" environment. This is intended to help you become comfortable exploring new topics in categorical data analysis that you are likely to encounter in practice. Within the maze of knowledge that surrounds us today, you need to learn to take control of your own learning, and your instructor is there to help you navigate this maze. As Nobel laureate Herbert A. Simon said:

“Learning results from what the student does and thinks and only from what the student does and thinks. The teacher can advance learning only by influencing what the student does to learn.”