Sources
A source of many datasets: Rdatasets.
Search the HTML index until you find the dataset you want. Example: 1377 is “education expenditure data”.
To load the dataset, use read_csv. Get the link by right-clicking on the “CSV” column in the table. If you download the data to a file, use the filename here instead of the URL.
library(tidyverse)  # provides read_csv, glimpse, rename and %>%
edu_raw <- read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/robustbase/education.csv')
glimpse(edu_raw)
Run glimpse to make sure the result of reading the data was acceptable.
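If you would rather work from a downloaded file, as mentioned above, one possible approach is to download the CSV once and read the local copy afterwards. The sketch below is only an illustration: read_cached_csv is a hypothetical helper name, not part of readr, and the local path is whatever you choose.

```r
library(readr)

# Hypothetical helper: download `url` to `path` only if the file is not
# already there, then read the local copy with read_csv.
read_cached_csv <- function(url, path) {
  if (!file.exists(path)) {
    download.file(url, destfile = path, mode = "wb")
  }
  read_csv(path)
}
```

Calling read_cached_csv with the education URL and a filename such as 'education.csv' should download the file on the first run and reuse it on later runs, which also keeps the document working when you are offline.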
Read the documentation for the dataset by clicking on the link in the “Docs” column. For example, the education expenditure data documentation.
Many datasets come in a package, but downloading them from GitHub can be more convenient than installing the package.
Recommendation: store the download in a _raw variable and then do any transforming or renaming into the “real” variable you use for the rest of the document.
In this example I rename the mysterious columns X1, X2 and X3.
edu <- edu_raw %>%
  rename(residents = X1, percapita = X2, young = X3)
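When there are many columns to rename, it can help to keep the old-to-new mapping in one named vector. The following is a minimal sketch of the same _raw pattern on a stand-in table: demo_raw and its values are placeholders invented only to mimic the column names, not the real education data, and new_names is an illustrative variable name.

```r
library(dplyr)

# Placeholder rows that only mimic the column names; NOT real data.
demo_raw <- tibble(X1 = c(1, 2), X2 = c(3, 4), X3 = c(5, 6))

# Named vector: names are the new column names, values are the old ones.
new_names <- c(residents = "X1", percapita = "X2", young = "X3")

# The raw table is left untouched; the renamed copy is the "real" variable.
demo <- demo_raw %>%
  rename(all_of(new_names))

names(demo_raw)
names(demo)
```

Because demo_raw is never modified, you can rerun the renaming step, or change the mapping, without reading the data again.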