2023-01-23 Weekly

Homework set ISL-05W, confidence intervals and the bootstrap method.

Weekly Homework

Big Picture

  1. What is the definition of a p-value? What (abstract) information do you need to have in order to compute a p-value?

  2. Given an estimate $a = 13$ and an unknown true answer $r$, what is the meaning of the statement that “the 85% confidence interval for the answer $r$ is $a \pm 3$”?

Monday

  1. X is a fair 4-sided die. Estimate the variance of X by doing two rolls (called A and B) then using the formula $$ A * B - (A+B)/2 $$ Find the 50% confidence interval for your estimate. Show your evidence that you are right.

Tuesday: Discrete Confidence Intervals

Abbreviation note: d4 = four-sided die, numbered 1 through 4; d10 = ten-sided die, numbered 1 through 10. All dice are fair.

I use the term size of a confidence interval to mean its “radius”: the number $v$ so that the confidence interval is $x \pm v$.

  1. You’re estimating the expected difference between the roll of a d4 and a d10. Estimation procedure: roll d4 and d10, take the absolute value of the difference.

    1. Find the size of a 47.5% confidence interval.
    2. Find the size of a 97.5% confidence interval.
  2. Set up a data frame that you can use to compute the confidence interval above.

    1. The starting point is a data frame with all possible combinations of results from a d4 and a d10.

        library(dplyr)  # provides tibble() and full_join()
        df_dice <- full_join(tibble(x = seq(4)), tibble(y = seq(10)), by=character())
      
    2. Make a column with the positive difference between the x and y.

    3. Compute the true mean of the positive differences (all of these are equally likely).

    4. Make a column containing the positive “deviation”: the difference between the true mean and the positive difference for a particular x and y.

    5. Make a column containing the probability of each event. This will be constant.

    6. Make a summary showing the probability of each deviation (so each deviation occurs only once in the results table). Arrange this in ascending order of deviation value.

    7. Add in a cumulative probability column (cp) that shows the probability of a deviation of the given value or smaller.

    Your results table should begin like this:

    Deviation   p       cp
    0.5         0.225   0.225
    1.5         0.250   0.475
  3. In this problem you will investigate the effect of using an unfair die on your confidence interval.

    The unfair d6 is called $U$ and has $p(U=1) = 0.4$, $p(U=6) = 0.2$, and the other probabilities all $0.1$.

    The fair d6 is called $F$.

    This problem is intended to be done in R.

    1. What is the expected value $E[U + F]$?
    2. What is the variance of $U+F$?
    3. Using the estimation method of making a single roll of each to estimate $U+F$, produce a table (data frame) of possible outcomes and the probability of each. (Begin by listing outcomes and probabilities for every possible result, then combine those with the same outcome to make your table.)
    4. What is the size of a 70% confidence interval for your estimate of the sum?
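
The data frame steps listed in problem 2 above (the d4 and d10 table) can be sketched roughly as follows. This is one possible arrangement, not the only one: it assumes the dplyr package, builds the combinations with base R's `expand.grid` instead of a join, and the names `true_mean`, `diff`, `dev`, and `dev_table` are made up for illustration.

```r
library(dplyr)

# Every (d4, d10) combination: 4 * 10 = 40 equally likely rows.
df_dice <- expand.grid(x = 1:4, y = 1:10)

# True mean of the positive differences.
true_mean <- mean(abs(df_dice$x - df_dice$y))

dev_table <- df_dice %>%
  mutate(
    diff = abs(x - y),             # positive difference
    dev  = abs(diff - true_mean),  # positive deviation from the true mean
    p    = 1 / n()                 # each row is equally likely
  ) %>%
  group_by(dev) %>%
  summarise(p = sum(p)) %>%        # one row per distinct deviation
  arrange(dev) %>%
  mutate(cp = cumsum(p))           # probability of this deviation or smaller

print(dev_table)
```

The size of a confidence interval at a given level can then be read off the table as the smallest deviation whose cp reaches that level.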

Thursday: Continuous Confidence Intervals

Continuous distributions:

  • Normal. (norm) Give mean, sd.
  • Uniform. (unif) Give min and max.
  • Exponential. (exp) Give rate.
  • Student t. (t) Give degrees of freedom.

We will be working with probability “densities” at an exact value $x$. (They are called densities because the area under them over an interval gives a probability.)
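
As a small illustration of the naming pattern in the list above: base R puts a `d` in front of each distribution's short name to get its density function (`dnorm`, `dunif`, `dexp`, `dt`). The parameter values below are arbitrary choices for illustration, not the ones used in the problems that follow.

```r
# Density ("d") functions from base R's stats package.
# All parameter values here are illustrative assumptions.
d_norm <- dnorm(0, mean = 0, sd = 1)    # normal density at x = 0
d_unif <- dunif(0.5, min = 0, max = 2)  # uniform density at x = 0.5
d_exp  <- dexp(0, rate = 2)             # exponential density at x = 0
d_t    <- dt(0, df = 1)                 # Student t density at x = 0

print(c(norm = d_norm, unif = d_unif, exp = d_exp, t = d_t))
```

A density can be larger than 1 (here `d_exp` is 2), which is one way to see that it is not itself a probability.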

  1. Suppose that X has a normal distribution with mean 20 and standard deviation of 8. Get R to tell you the probability density at x=28.
  2. Suppose that Y has a uniform distribution with min=5 and max=20. Get R to tell you the probability density of Y at y=10.
  3. Suppose that Z has an exponential distribution with rate 1/3. Get R to tell you the probability density of Z at z=1.5.
  4. (Mean of X)
    1. Use the method of averaging two samples from X to estimate the mean of X.
    2. Create a data frame with x1 and x2 both containing N=30 samples from the X distribution. Create a mean column that is computed from the two samples. We will say the probability of each row is 1/30 because all of the choices are equally likely. (There is a note about this.) You can make a p column for this value.
    3. Find the average of these means, call that the “estimated mean”. Add a deviation column (dev) that is the absolute difference between the mean in a single row and the “estimated mean”.
  5. Continuing the last problem: What is the size (“radius”) of the 30% confidence interval around your mean? (Note: this is the absolute deviation that gives you a 30% cumulative probability when the columns are arranged properly.)
  6. Turn your solution to the last problem into a function auto_confidence that takes in the number of samples N and outputs the radius of the resulting confidence interval.
  7. Make a histogram showing the results from 30 confidence intervals with N=10.
  8. (A different distribution.) You are going to estimate the mean of the exponential distribution Z with rate 1/3 by taking the mean of two random samples. Use the same process as above. Produce a histogram of your confidence interval results.
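
One possible shape for the sampling procedure in problems 4 through 6 is sketched below, assuming the dplyr package and the X distribution from problem 1 (normal, mean 20, sd 8). The `level` argument and its default of 0.30 are assumptions added for illustration (the problem statement fixes only N), and the seed is arbitrary.

```r
library(dplyr)

# Sketch of auto_confidence. The `level` argument is an assumption
# added for illustration; the problem statement takes only N.
auto_confidence <- function(N, level = 0.30) {
  samples <- tibble(
    x1 = rnorm(N, mean = 20, sd = 8),
    x2 = rnorm(N, mean = 20, sd = 8)
  ) %>%
    mutate(mean = (x1 + x2) / 2,  # estimate from two samples
           p = 1 / N)             # each row treated as equally likely

  estimated_mean <- mean(samples$mean)

  samples %>%
    mutate(dev = abs(mean - estimated_mean)) %>%
    arrange(dev) %>%
    mutate(cp = cumsum(p)) %>%
    # small tolerance guards against floating-point error in cumsum
    filter(cp >= level - 1e-9) %>%
    slice(1) %>%
    pull(dev)
}

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
radius <- auto_confidence(30)
print(radius)
```

Calling `replicate(30, auto_confidence(10))` would give a vector of 30 radii that could be passed to a histogram.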

Saturday: Continuous Expectation

The method: break up the possible x values into small intervals of width 0.01. When computing the probability of x, use 0.01*p(x), where p(x) is the density at x. The 0.01 is there because you are actually estimating the probability of getting any value in the interval $[x, x+0.01]$, treating the density as constant across that interval.

Use the density at any point in that interval (usually people use the value at the left or right end).
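
Here is a minimal sketch of the method using left ends, compared against a built-in cumulative function. The standard normal distribution and the interval from 0 to 1 are arbitrary choices for illustration, not one of the problems below.

```r
# Left-endpoint approximation of P(0 < X < 1) for a standard normal X.
# The distribution and the interval are illustrative assumptions.
h <- 0.01
x <- h * (0:99)                 # left end of each interval [x, x + h]
approx_p <- sum(dnorm(x) * h)   # density times width, summed

exact_p <- pnorm(1) - pnorm(0)  # built-in cumulative probability
print(c(approx = approx_p, exact = exact_p))
```

The two numbers should agree to about three decimal places; a smaller step would bring them closer.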

  1. Suppose that X is exponentially distributed with rate=0.8. You can make the probability graph below. (Note: X is always at least 0.)

    library(ggplot2)

    # Density of an exponential distribution with rate 0.8
    probexp <- function (x) { dexp(x, rate=0.8) }
    ggplot() +
        geom_function(fun=probexp) +
        scale_x_continuous(limits=c(0,5))   # plot x from 0 to 5
    
    1. Make a graph of the probability density of X.

    2. What is the probability (“density”) when X=1?

    3. Use a built-in function to find the probability P(X<1).

    4. Find P(X<1) by:

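      1. Making a data frame with the left end of every 1/100th-wide interval between 0 and 1. x=seq(0,0.99,0.01) (Using seq(0,1,0.01) would add a 101st interval, $[1, 1.01]$, that lies outside the range.)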
      2. Making a probability column.
      3. Adding up each probability value multiplied by the range of x’s it covers (always 0.01 in this case).
    5. Find P(1.5 < X < 2) using the same method.

    6. Find the expected value of X given that X is in the interval $0 \le x \le 1.0$. This is written $E[X | 0\le X \le 1]$. You need to divide by the probability that X is in the interval. (Discussion?)

    7. Find the expected value of X on the interval $1.5 \le x \le 2.0$ using both (i) 0.01 and (ii) 0.001 size steps. (iii) Compare the results. This is $E[X | 1.5 \le X \le 2]$.

    8. Find the expected value of $X^2$ on the interval $1.5 \le x \le 2.0$. This is $E[X^2 | 1.5 \le X\le 2]$

    9. Find the variance of X when restricted to the interval $1.5 \le x \le 2.0$. This is written $\operatorname{Var}(X | 1.5 \le X \le 2)$.
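
The "divide by the probability of the interval" step in the problems above can be sketched on a different distribution. The Uniform(0, 10) distribution and the interval from 2 to 4 are assumptions chosen so that the exact answers (mean 3, variance 1/3) are easy to check by hand; the variable names are made up for illustration.

```r
# Conditional mean and variance on an interval [a, b] using small steps.
# Uniform(0, 10) and the interval [2, 4] are illustrative assumptions.
a <- 2; b <- 4; h <- 0.01
n <- round((b - a) / h)
x <- a + h * (0:(n - 1))              # left end of each small interval
p <- dunif(x, min = 0, max = 10) * h  # approximate probability of each

p_interval <- sum(p)                  # P(a <= X <= b)
e_x  <- sum(x * p) / p_interval       # E[X | a <= X <= b]
e_x2 <- sum(x^2 * p) / p_interval     # E[X^2 | a <= X <= b]
v_x  <- e_x2 - e_x^2                  # Var(X | a <= X <= b)

print(c(p = p_interval, mean = e_x, var = v_x))
```

Without the division by `p_interval` the sums would be scaled down by the chance of landing in the interval at all, so they would not be averages over the interval.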

Last modified August 18, 2023: 2022-2023 End State (7352e87)