Ch13

Relational data.

Setup

The Lahman data is a huge dataset of baseball statistics. See the overview of the Sean “Lahman” baseball database.

install.packages(c("Lahman"))
library(Lahman)

Exercises

  1. The People database has biographical information about the players.

    • Make a histogram showing when players passed away in the 20th century. Don’t use the default bin width.
  2. The AllstarFull dataset

    • Make a bar chart showing the frequency of players being in “x” all-star games. Only include players with x>6. Fill the bars in with colors to denote Amerian League and National League player counts.
  3. (Deleted)

  4. Make two bar charts of allstars counting how many players in the all-star database bat with each hand.

    • count every all-star appearance once (so the person who is an all-star 20 times counts 20 times)
    • count every all-star player only once (so the person who is an all-star 20 times counts once)
  5. Tag players with all-star appearances in the people data set. Make a box plot of salaries of players with all-star appearances vs without all-star appearances.

  6. (Prep/easy.) The Teams data frame has information about every team for every year. A team is in the playoffs if it is a league winner, division winner, or wildcard winner.

    Add a boolean column playoffs that reflects this information.

     LgWin == "Y" | DivWin == "Y" | WCWin == "Y"
    
  7. The Appearances data frame has information about what team each player played for, each year. Combine this information with the above playoffs information to create a data frame of appearance information for everyone who was part of a team that went to the playoffs.

  8. Building on your previous answer, make a table containing the playerID and the first year they were on a team that went to the playoffs.

  9. Use the previous questions to make a scatterplot showing year of death vs first year in the playoffs. There are a lot of points, so experiment with transparency (alpha).

Open ended

  1. Look at the data available in the Lahman datasets and brainstorm ten questions you could answer.

  2. Answer one of your questions.

Last modified August 18, 2023: 2022-2023 End State (7352e87)