p-values

Problem set P1: $p$-values and the null hypothesis.

Key questions for class discussion:

  • I get the purpose of the null hypothesis $H_0$, but what is the purpose of the other distribution?

  • How often do you expect your experiment to produce $p$-values less than 0.05?

Answer 1: The “other distribution” is the truth, meaning the distribution your data actually come from. Knowing the truth lets you work out the distribution of $p$-values you should expect. This tells you, for example, how often your experiment will give $p<0.05$. (Since the $p$-value is a random variable, you won’t see small $p$-values all of the time… unless the truth is very far from your null hypothesis.)
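The idea in Answer 1 can be tried out with a small simulation. This is only a sketch under stated assumptions: the null $\mathcal{N}(0, 1)$ and the truth $\mathcal{N}(2, 1)$ are made-up numbers (not the ones from the problem set), the $p$-value is taken to be two-sided, and the function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(x, null):
    """Probability, under the null, of a value at least as far from the null mean as x."""
    z = abs(x - null.mean) / null.stdev
    return 2.0 * (1.0 - NormalDist().cdf(z))


def simulate_p_values(null, truth, n_experiments, seed=0):
    """Draw one observation per experiment from the truth and score it against the null."""
    observations = truth.samples(n_experiments, seed=seed)
    return [two_sided_p_value(x, null) for x in observations]


# Assumed, illustrative distributions (not taken from the problem set).
null = NormalDist(mu=0.0, sigma=1.0)   # hypothetical H_0
truth = NormalDist(mu=2.0, sigma=1.0)  # hypothetical "other distribution"

p_values = simulate_p_values(null, truth, n_experiments=10_000)
fraction_small = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"fraction of experiments with p < 0.05: {fraction_small:.3f}")
```

With these assumed numbers, roughly half of the experiments should come out with $p<0.05$. If you set `truth` equal to `null`, the $p$-values come out roughly uniform on $[0, 1]$, so about 5% of them fall below 0.05.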

More questions:

  • Under what circumstances should you “accept the null hypothesis”? Discuss what goes wrong with this idea when you start to think about the details.

  • Does scaling $x \mapsto x/3$ change the area under the curve for the Gaussian distribution? Why not?

Problem Set P1

Given information: your null hypothesis $H_0$ is that $X$ comes from a normal distribution (the fancy name is Gaussian) with mean 5 and variance 9.

  1. You observe $x=8$.
    1. What $p$-value do you report? Explain how you get your number.
    2. Interpret this value in a sentence. Use a percentage in your answer; don’t just say something about “reject the null hypothesis”.
  2. (Standardization or $z$-score.)
    1. What linear transformation changes the distribution for $X$ into the unit normal distribution?
    2. The outputs of the transformation above are called z-scores. What is the z-score of $x=10$?
    3. What is the $p$-value associated with observing $x=0$?
  3. Suppose you have set your significance level to $\alpha=0.03$ and the true distribution of the random variable $Y$ is a normal $\mathcal{N}\left(\mu=1, \sigma^2 = 1/4\right)$.
    1. Show the distribution of $p$-values for 10,000 sample y’s.
    2. What is the probability that you will get a significant result from a single experiment? (An estimate is ok.)

Appendix

It seems like shrinking $x$ by a factor of $\sigma$ would change the area under the normal distribution curve. At some point we should talk about why that does not happen. If you look at the density function for a given $\sigma$, you will see that $\sigma$ appears in two places; one of them acts as the “du” area correction factor for your transformed integral. (This is too terse to be a real explanation.)

The formula for the Gaussian (normal) distribution with a mean of 0 and a standard deviation of $\sigma$ is: $$ f(x) = \frac{1}{\sigma \sqrt{2 \pi}} \exp\left(-\frac{1}{2}(x/\sigma)^2\right) $$
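As a sketch of the substitution argument, assuming $\sigma > 0$: let $u = x/\sigma$, so that $dx = \sigma \, du$. (The symbol $u$ is introduced here only for illustration.) Then $$ \int_{-\infty}^{\infty} \frac{1}{\sigma \sqrt{2 \pi}} \exp\left(-\frac{1}{2}(x/\sigma)^2\right) dx = \int_{-\infty}^{\infty} \frac{1}{\sigma \sqrt{2 \pi}} \exp\left(-\frac{1}{2}u^2\right) \sigma \, du = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi}} \exp\left(-\frac{1}{2}u^2\right) du = 1 $$

The $\sigma$ that comes from $dx = \sigma \, du$ cancels the $1/\sigma$ in front of the density, which leaves the unit normal density. Its total area is 1, so the area is the same for every $\sigma$.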

Last modified August 18, 2023: 2022-2023 End State (7352e87)