Sources

Quality sources of data and notes about them.

A source of many datasets: Rdatasets.

Search the HTML index until you find the dataset you want. Example: 1377 is “education expenditure data”.

To load the dataset, use read_csv. Get the link by right-clicking the entry in the “CSV” column of the table and copying the link address. If you download the data to a file instead, pass the filename in place of the URL.

library(tidyverse)  # provides read_csv, glimpse, rename and %>%

edu_raw <- read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/robustbase/education.csv')
glimpse(edu_raw)

Run glimpse to check that the data was read as expected, for example that the column names and types look right.
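
If you would rather work from a downloaded copy, something like the following sketch could work. The local filename education.csv is only a placeholder chosen for illustration, and the code assumes the readr package is installed and that the machine has network access on the first run.

```r
library(readr)

# URL copied from the "CSV" column of the index (same dataset as above).
edu_url <- "https://vincentarelbundock.github.io/Rdatasets/csv/robustbase/education.csv"

# Placeholder path for illustration; use whatever location suits your project.
edu_file <- "education.csv"

# Download only if the file is not already there, so later runs reuse the copy.
if (!file.exists(edu_file)) {
  download.file(edu_url, destfile = edu_file)
}

# read_csv accepts a local filename in exactly the same way as a URL.
edu_raw <- read_csv(edu_file)
```

Keeping a local copy means the document still runs when you are offline or if the link changes.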

Read the documentation for the dataset by clicking on the link in the “Docs” column. For example, the education expenditure data documentation.

Many datasets come in a package, but downloading them from GitHub can be more convenient than installing the package.
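
For comparison, loading from the package might look like the sketch below. It assumes the education data ships with the robustbase package (as the URL in the example suggests) and that the package is already installed; the guard simply skips the load when it is not.

```r
# Only try the package route if robustbase is available on this machine.
if (requireNamespace("robustbase", quietly = TRUE)) {
  # data() loads the dataset into the workspace under the name `education`.
  data("education", package = "robustbase")
  str(education)
}
```

The package copy may differ slightly from the CSV, for instance in how row names are stored, so check it with str or glimpse before relying on it.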

Recommendation: store the download in a _raw variable and then do any transforming or renaming into the “real” variable you use for the rest of the document.

In this example I rename the mysterious columns X1, X2, and X3.

edu <- edu_raw %>%
   rename(residents = X1, percapita = X2, young = X3)
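
A quick check after the rename can catch mistakes early. The sketch below uses a small stand-in data frame with made-up values (and a State column assumed for illustration) so that it runs without a network connection; with the real data you would skip the stand-in and use your own edu_raw.

```r
library(dplyr)

# Stand-in for edu_raw: made-up values, for illustration only.
edu_raw <- data.frame(
  State = c("AA", "BB"),
  X1 = c(500, 600),
  X2 = c(3000, 4000),
  X3 = c(320, 350)
)

# Same rename as above: new_name = old_name.
edu <- edu_raw %>%
  rename(residents = X1, percapita = X2, young = X3)

# Stop with an error if any expected column is missing after the rename.
expected <- c("residents", "percapita", "young")
stopifnot(all(expected %in% names(edu)))

# The raw variable is left untouched, so the transformation can be rerun at any time.
stopifnot(all(c("X1", "X2", "X3") %in% names(edu_raw)))
```

Because edu_raw is never modified, you can change the renaming step and rerun it without reading the data again.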
Last modified August 18, 2023: 2022-2023 End State (7352e87)