The course project consists of two separate parts, each delivered as a three-page PDF report created with knitr or similar software.
The first is a simulation exercise where you use R to simulate the distribution of means of an exponential distribution. You compare the simulated mean and variance with the theoretical ones and show that the distribution of averages is approximately normal. You should demonstrate knowledge of the Law of Large Numbers and the Central Limit Theorem. You can do this part after understanding the week 2 material.
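A minimal sketch of such a simulation might look as follows. The rate (0.2), sample size (40) and number of simulations (1000) are illustrative choices, not values prescribed here:

```r
# Simulate the distribution of means of exponential draws.
set.seed(42)
lambda <- 0.2    # rate of the exponential distribution (illustrative)
n      <- 40     # draws per sample (illustrative)
sims   <- 1000   # number of simulated means (illustrative)

means <- replicate(sims, mean(rexp(n, rate = lambda)))

mean(means)   # simulated mean; theoretical mean is 1/lambda = 5
var(means)    # simulated variance; theoretical variance is (1/lambda)^2 / n = 0.625
hist(means)   # roughly bell-shaped, as the Central Limit Theorem predicts
```

Comparing `mean(means)` and `var(means)` with the theoretical values, and plotting the histogram, covers the three points the report asks for.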
The second project is an inferential analysis of the ToothGrowth dataset. First you should understand the dataset and demonstrate this in the report by performing a basic exploratory data analysis and providing a summary of the data. Loading the dataset in RStudio and typing *?ToothGrowth* will give you a brief explanation of the dataset and some context. Then you are asked to construct relevant confidence intervals and/or perform hypothesis tests. You will use R to run Student's t-tests and look at confidence intervals or p-values. You can do this part after understanding the week 3 material.
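As a sketch, a first look at the data followed by a two-sample t-test could look like this. Comparing tooth length by supplement type is one natural choice; the project leaves the exact comparisons up to you:

```r
# Basic exploratory look at ToothGrowth, then a t-test by supplement type.
data(ToothGrowth)

str(ToothGrowth)               # 60 observations: len, supp (OJ/VC), dose
summary(ToothGrowth$len)       # five-number summary of tooth length

# Welch two-sample t-test: does tooth length differ between OJ and VC?
t.test(len ~ supp, data = ToothGrowth)
```

The output includes both a confidence interval and a p-value, so the same call supports either style of inference the report asks for.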
About the course
Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference including statistical modeling, data oriented strategies and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentists, Bayesian, likelihood, design based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference. A practitioner can often be left in a debilitating maze of techniques, philosophies and nuance. This course presents the fundamentals of inference in a practical approach for getting things done. After taking this course, students will understand the broad directions of statistical inference and use this information for making informed choices in analyzing data. [Coursera].
What you will learn
- Basic rules of probability and conditional probability.
- Diagnostic tests and Bayes rule.
- Distinguishing between sample and population quantities.
- Expected value defined.
- Probability mass function, probability density function.
- Quantile, percentile, median.
- Variance, standard deviation and standard error.
- Common distributions: Bernoulli, Binomial, Normal, Poisson.
- Central Limit Theorem, Law of Large Numbers.
- Wald Confidence interval.
- T confidence intervals.
- T distribution, dependent or independent sample groups.
- Hypothesis testing, null hypothesis, type I and type II errors.
- Power, effect.
- R package manipulate.
- Multiple comparison, error measure.
- Correction algorithms: Bonferroni correction, controlling false discovery rate, adjusted p-values.
- Resampling and permutation testing.
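As an example of the correction algorithms listed above, R's built-in `p.adjust` implements both the Bonferroni correction and Benjamini-Hochberg false discovery rate control. The p-values below are made up for demonstration:

```r
# Adjusting a vector of p-values for multiple comparisons.
p <- c(0.001, 0.012, 0.049, 0.2, 0.8)   # hypothetical raw p-values

p.adjust(p, method = "bonferroni")  # multiply by the number of tests, capped at 1
p.adjust(p, method = "BH")          # Benjamini-Hochberg, controls the false discovery rate
```

Note that the Bonferroni-adjusted 0.049 exceeds 0.05, so a result that looks significant in isolation may not survive correction for five tests.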
The course gives an introduction to applied statistics. It touches on quite a few topics in only four weeks; some are treated in depth and some only introduced. The week 4 material mostly serves to expand your horizons. The material in weeks 1-3 covers topics you would meet in a first-year course for statisticians, though with a bit less theory. I think Johns Hopkins does a good job of introducing basic statistics in only four weeks, and with videos, slides, homework and swirl there is material to suit most needs. The only thing I miss is a good course book, as it can be difficult to extract the theory from the slides in a well-structured manner.
Where to go next
If you want to explore this topic further, consider:
Book: Introductory Statistics with R by Peter Dalgaard.