Statistical Inference is the 6th course in the Coursera data science specialization

Quick Overview
Duration of the course: 4 weeks.
Workload: 5-10 hours per week, depending on your background and whether you do the swirl and homework exercises.
Videos with slides: about 5 hours in total.
Quizzes: 4, one per week.
Other material: 14 swirl exercises (not mandatory); 4 homework exercises made in slidify (not mandatory); 4 homework videos, about 30 minutes each.
Course project: in week 3, see details below.
Formal prerequisites: R programming, mathematical aptitude.
Course dependencies: This course has hard dependencies on R Programming and The Data Scientist’s Toolbox. In addition, students will need basic (non-calculus) mathematics skills.
Additional, helpful prerequisites: a little experience with basic statistics and probability calculus will make this course a lot easier.
Level of difficulty given only the formal prerequisites: you will need to work hard.
Level of difficulty given the formal and additional prerequisites: medium.

 

Course Project


The course project consists of two separate projects, each delivered as a 3-page PDF report created with knitr or other software.
The first is a simulation exercise where you use R to simulate the distribution of means of samples drawn from an exponential distribution. You compare the simulated mean and variance with the theoretical ones and show that the distribution of averages is approximately normal. You should demonstrate knowledge of the Law of Large Numbers and the Central Limit Theorem. You can do this part after understanding the week 2 material.
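
To make the idea behind the first project concrete, here is a minimal sketch of such a simulation. It is written in Python using only the standard library, whereas the course itself expects R. The rate (0.2), the sample size (40) and the number of repetitions (1000) are assumptions picked for illustration, not values taken from the assignment.

```python
import random
import statistics

RATE = 0.2      # assumed rate of the exponential distribution
N = 40          # assumed number of draws per average
N_SIMS = 1000   # assumed number of simulated averages


def simulate_means(rate=RATE, n=N, n_sims=N_SIMS, seed=1):
    """Return n_sims averages, each taken over n exponential draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_sims):
        sample = [rng.expovariate(rate) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


means = simulate_means()

# Theory: an exponential with this rate has mean 1/rate and variance 1/rate^2,
# so the average of n draws has mean 1/rate and variance (1/rate^2) / n.
theoretical_mean = 1 / RATE
theoretical_var = (1 / RATE) ** 2 / N

print("mean of averages:    ", statistics.fmean(means), "vs", theoretical_mean)
print("variance of averages:", statistics.variance(means), "vs", theoretical_var)
```

A histogram of `means` should look roughly bell-shaped even though the individual exponential draws are strongly skewed, which is the point the report is meant to make.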
The second project is an inferential analysis of the ToothGrowth dataset. First you should understand the dataset and demonstrate this in the report by performing a basic exploratory data analysis and providing a summary of the data. Loading the dataset in RStudio and typing *?ToothGrowth* gives you a brief explanation of the dataset, which provides some context. Then you are asked to compute relevant confidence intervals and/or perform hypothesis tests. You will use R to perform Student's t-tests and look at confidence intervals or p-values. You can do this part after understanding the week 3 material.

 

About the course

Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference including statistical modeling, data oriented strategies and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentists, Bayesian, likelihood, design based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference. A practitioner can often be left in a debilitating maze of techniques, philosophies and nuance. This course presents the fundamentals of inference in a practical approach for getting things done. After taking this course, students will understand the broad directions of statistical inference and use this information for making informed choices in analyzing data. [Coursera].

 

What you will learn

Week 1

  • Basic rules of probability and conditional probability.
  • Diagnostic tests and Bayes' rule.
  • Distinguishing between sample and population quantities.
  • Expected value defined.
  • Probability mass function, probability density function.
  • Quantile, percentile, median.
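
As an illustration of the diagnostic-test topic, the sketch below applies Bayes' rule to compute the probability of disease given a positive test. It is in Python rather than the course's R, and the sensitivity, specificity and prevalence are made-up numbers, not figures from the course.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' rule."""
    true_positive = sensitivity * prevalence                # P(+ and disease)
    false_positive = (1 - specificity) * (1 - prevalence)   # P(+ and no disease)
    return true_positive / (true_positive + false_positive)


# Hypothetical test: 90% sensitive, 95% specific, 1% of people have the disease.
ppv = positive_predictive_value(sensitivity=0.90, specificity=0.95, prevalence=0.01)
print(round(ppv, 4))  # about 0.1538
```

Even with a fairly accurate test, a positive result here only gives about a 15% chance of disease, because the disease is rare to begin with.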

Week 2

  • Variance, standard deviation and standard error.
  • Common distributions: Bernoulli, Binomial, Normal, Poisson.
  • Central Limit Theorem, Law of Large Numbers.
  • Wald Confidence interval.
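
For the Wald confidence interval, a small sketch for a binomial proportion could look like the following. It is in Python rather than R, and the counts (56 successes out of 100) are hypothetical. The interval is the estimate plus or minus a normal quantile times the estimated standard error.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Wald interval for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error


# Hypothetical data: 56 successes in 100 trials.
low, high = wald_interval(56, 100)
print(round(low, 3), round(high, 3))
```

The Wald interval relies on the Central Limit Theorem, so it can behave poorly for small samples or proportions close to 0 or 1.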

Week 3

  • t confidence intervals.
  • t distribution, dependent (paired) or independent sample groups.
  • Hypothesis testing, null hypothesis, type I and type II errors.
  • P-values.
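
To give a feel for what sits behind a two-sample t-test on independent groups, here is a sketch that computes the Welch t statistic and its approximate degrees of freedom. It is in Python rather than R, the two groups are made-up numbers, and using the unequal-variance (Welch) form is an assumption made for illustration.

```python
import math
import statistics


def welch_t(x, y):
    """Return the Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    vx = statistics.variance(x) / nx  # squared standard error of mean(x)
    vy = statistics.variance(y) / ny  # squared standard error of mean(y)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical measurements for two independent groups.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 3), round(dof, 2))
```

Turning the statistic into a p-value or confidence interval needs the t distribution, which Python's standard library does not provide. In R, `t.test` performs the whole calculation in one call.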


Week 4

  • Power, effect size.
  • R package manipulate.
  • Multiple comparisons, error measures.
  • Correction algorithms: Bonferroni correction, controlling false discovery rate, adjusted p-values.
  • Resampling and permutation testing.
  • Bootstrap.
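
The two correction methods named above can be sketched in a few lines. This is a Python illustration rather than the course's R, and the p-values are invented. Bonferroni multiplies each p-value by the number of tests, while the Benjamini-Hochberg procedure, one common way of controlling the false discovery rate, scales each sorted p-value by m / rank and then enforces monotonicity.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four tests.
p_values = [0.01, 0.04, 0.03, 0.20]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print(bonf)
print(bh)
```

In R, `p.adjust` with `method = "bonferroni"` or `method = "BH"` covers the same two adjustments.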

Review

The course is an introduction to applied statistics. It touches on quite a few topics in only 4 weeks; some are treated in depth and some are only introduced. The week 4 material is mostly there to expand your horizons. The material in weeks 1-3 covers topics you would meet in a first-year course for statisticians, though with a bit less theory. I think Johns Hopkins does a good job of introducing basic statistics in only 4 weeks, and with videos, slides, homework and swirl there is material to suit most needs. The only thing I miss is a good course book, as it can be difficult to extract the theory from the slides in a well-structured manner.

Where to go next

Move on:

After finishing this course, consider:
Coursera Regression Models, the 7th course in the Coursera data science specialization.

Go deeper:

If you want to explore this topic further, consider:
Book: Introductory Statistics with R by Peter Dalgaard.