STAT 555
Published on STAT 555 (https://onlinecourses.science.psu.edu/stat555)

Welcome to STAT 555!

Welcome to STAT 555, The Statistical Analysis of Genomics Data. The emphasis in this course will be on understanding statistical testing and estimation in the context of "omics" data so that you can appropriately design and analyze a high-throughput study. Since the measurement technologies are evolving rapidly, important objectives of the course are for you to gain a basic understanding of statistical principles and familiarity with flexible software tools, so that you can continue to assess and use new statistical methodology as it is developed for new types of data.

By the end of the course, you should be able to tailor the analysis of your data to your needs while maintaining statistical validity. You should also come away with enough insight to assess the validity of new statistical methodologies as they are introduced and to identify appropriate statistical analyses for data types not discussed in the class.

The emphasis throughout will be on the discovery of reproducible effects. Typically we have been taught to think of reproducibility in terms of repeating an experiment and obtaining a similar result.  With the complexity of "omics" data, we also need to think of reproducibility of the data capture and statistical analysis.  Reproducibility will be a theme throughout the course.

Students and visitors to this course come from many backgrounds. Accordingly, we will start with introductory material in genomics and statistics. For the first few weeks, there will also be a parallel set of lectures and exercises to introduce you to our main software package: RStudio. This means that at various points, especially in the first 4 weeks, each of you will find a mix of material that you already know and material that is new. I hope that everyone will feel free both to ask “stupid” questions and to help correct or enhance material that I introduce.

Asking Questions

I started my journey into bioinformatics by asking “What is genomics?” and then “What is a gene?” My biology collaborators were still heatedly responding to one another an hour later and I learned a great deal through this process. Just about everything I thought was “obvious” turned out to be incorrect or controversial.  I cannot stress enough the importance of asking your questions and respectfully responding to the questions of others.   Because of the diversity of your backgrounds, we can all learn from one another more effectively than you can learn everything from the teacher.

The primary focus of the statistical analyses will be differential gene expression using microarrays and sequencing. These analyses use most of the statistical tools that are also useful for other array- and sequencing-based studies, such as chromatin immunoprecipitation (ChIP) studies, methylation studies, and marker studies such as genome-wide association studies (GWAS) using single nucleotide polymorphisms (SNPs). We will also discuss these other types of studies, though more briefly. The same statistical tools are useful for proteomics and other “omics” data, although we will not discuss these. A number of important topics will only be touched upon briefly. Some of these are covered in Applied Bioinformatics (BMMB 597D), Bioinformatics I: Foundations in Data Driven Life Sciences (BMMB 597B), and Elements of Network Science and its Applications (PHYS 580).


Source URL: https://onlinecourses.science.psu.edu/stat555/node/1