Ch13
Setup
The Lahman data is a huge dataset of baseball statistics. See the overview of the Sean “Lahman” baseball database.
install.packages(c("Lahman"))
library(Lahman)
Exercises
-
The
People
database has biographical information about the players.- Make a histogram showing when players passed away in the 20th century. Don’t use the default bin width.
-
- Make a bar chart showing the frequency of players being in “x” all-star games. Only include players with x>6. Fill the bars in with colors to denote Amerian League and National League player counts.
-
(Deleted)
-
Make two bar charts of allstars counting how many players in the all-star database bat with each hand.
- count every all-star appearance once (so the person who is an all-star 20 times counts 20 times)
- count every all-star player only once (so the person who is an all-star 20 times counts once)
-
Tag players with all-star appearances in the people data set. Make a box plot of salaries of players with all-star appearances vs without all-star appearances.
-
(Prep/easy.) The
Teams
data frame has information about every team for every year. A team is in the playoffs if it is a league winner, division winner, or wildcard winner.Add a boolean column
playoffs
that reflects this information.LgWin == "Y" | DivWin == "Y" | WCWin == "Y"
-
The
Appearances
data frame has information about what team each player played for, each year. Combine this information with the aboveplayoffs
information to create a data frame of appearance information for everyone who was part of a team that went to the playoffs. -
Building on your previous answer, make a table containing the
playerID
and the first year they were on a team that went to the playoffs. -
Use the previous questions to make a scatterplot showing year of death vs first year in the playoffs. There are a lot of points, so experiment with transparency (
alpha
).
Open ended
-
Look at the data available in the Lahman datasets and brainstorm ten questions you could answer.
-
Answer one of your questions.