About this class
This class will focus on classical “statistical learning”. We can discuss whether to attempt “modern” work (e.g., neural networks) in the second semester, after learning some of the foundations.
Goals
- You will see numerical evidence and do experiments to discover facts about probability and statistics.
- You will be able to analyze a dataset using R. You will know how to ask interesting questions and answer them with statistical reasoning.
- You will know how to reason statistically using probability. For example, you will know how to compute the probability of an occurrence assuming some hypothesis is true (p-values and more).
- You will have a medium-deep understanding of bias and variance.
  - You will know the mathematical definitions.
  - You will know how to reason informally about whether a process is “high variance” or “low variance” and the consequences.
  - You will know about the bias-variance tradeoff and you will have experienced it in action while working with data.
- You will be a competent R user, comfortable with the tools from the R for Data Science book. In particular, you will be able to make any infographic from the raw underlying data (using ggplot).
- You will know the basics of using linear regression to model a process. You will do a small amount of analysis of the errors in the predictions of a linear model.
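
For reference, the mathematical definitions mentioned in the bias and variance goal are commonly written as below. This is a sketch in standard textbook notation, which is an assumption and may differ from the notation used in class: $\hat{\theta}$ is an estimator of a fixed quantity $\theta$, and expectations are taken over repeated samples of the data.

```latex
\begin{align*}
\operatorname{Bias}(\hat{\theta}) &= \mathbb{E}[\hat{\theta}] - \theta \\
\operatorname{Var}(\hat{\theta}) &= \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big] \\
\operatorname{MSE}(\hat{\theta}) &= \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  = \operatorname{Bias}(\hat{\theta})^2 + \operatorname{Var}(\hat{\theta})
\end{align*}
```

The last line is one common way to state the tradeoff: mean squared error splits into squared bias plus variance, so lowering one often comes at the cost of raising the other.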
Understanding First
Machine learning is a massive field. The breadth of knowledge required to actually understand the magic is beyond the reach of almost all high school students. Understanding a single corner of it well could be a Ph.D. thesis.
A course like this has to cut corners and emphasize certain topics even more than a college course would. You can influence what we study by asking questions and showing your thinking, right or wrong.
I will pick a sequence of topics that leads to an understanding of something. I will emphasize reasoning over memorizing. If it’s hard to understand exactly why something happens, we will at least look at experimental evidence and talk about it.
After this class
You might not master every topic we study. Master some of them. When you have to explain what you learned in the class, mention only the topics you believe you understand well.
Your credibility (and mine) will be questioned. Nobody is going to believe you really did anything except copy the instructor’s code and change a few things. Whatever you really learned, you will need to demonstrate mastery of it in the face of skepticism.
When I summarized this view in class, I wrote the equation “HSML=BS”. Do everything you can to refute that view.
Misc
- You will be able to decompose a complicated event into simpler events and think about it using conditional probabilities. This includes calculating probability, expected value, and variance.
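
As an illustration of the decomposition idea, here is a sketch of the standard identities behind it. The symbols are placeholders, not notation from the class: $A$ is the complicated event, $B_1, \dots, B_n$ are simpler events assumed to be mutually exclusive, to cover all possibilities, and to each have positive probability, $X$ is a random variable with finite variance, and $Y$ is a random variable recording which case occurred.

```latex
\begin{align*}
P(A) &= \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i) \\
\mathbb{E}[X] &= \sum_{i=1}^{n} \mathbb{E}[X \mid B_i]\, P(B_i) \\
\operatorname{Var}(X) &= \mathbb{E}\big[\operatorname{Var}(X \mid Y)\big]
  + \operatorname{Var}\big(\mathbb{E}[X \mid Y]\big)
\end{align*}
```

In each line the quantity on the left is rebuilt from conditional quantities that are usually easier to compute one case at a time.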