2025-06-10
“Graphical excellence is the well-designed presentation of interesting data—a matter of substance, of statistics, and of design…”
Edward Tufte, The Visual Display of Quantitative Information, 1983
“It is that which gives to the view the great number of ideas in the shortest time with the least ink in the smallest space…”
Edward Tufte, The Visual Display of Quantitative Information, 1983
“It is nearly always multivariate…”
Edward Tufte, The Visual Display of Quantitative Information, 1983
“Graphical excellence requires telling the truth about the data…”
Edward Tufte, The Visual Display of Quantitative Information, 1983
Charles Minard’s Napoleon’s March
“[Minard’s classic image] can be described and admired, but there are no compositional principles on how to create that one wonder graphic in a million.”
Edward Tufte, The Visual Display of Quantitative Information, 1983
Instead, Tufte suggests:
We will revisit more of Tufte’s principles throughout the course.
- `ggplot2` is a powerful package for creating data visualizations
- `git` is a powerful tool for version control
- `git` is not GitHub
- `pandas` and `numpy`
- `matplotlib` and `seaborn`
- … `ggplot2` syntax, since LLMs can mostly solve technical visualization problems

You may need to configure your git username and email, for example with `git config --global user.name "Your Name"` and `git config --global user.email "you@example.com"`. On Windows, you can run these commands in “Git Bash”. On Mac, you can run them in the terminal.
I recommend these conventions:
```
work/
├── example_*         # Learning examples
├── ps1_*             # Problem set 1
├── ps2_*             # Problem set 2
├── ps3_*             # Problem set 3
├── final_project_*   # Final project
├── shared_*          # Shared resources
└── data/             # Data directory
```
Flat Structure:
```
work/
├── data_prep.qmd
├── analysis.qmd
└── data/
```
Nested Structure:
```
work/
├── data_prep/
│   └── data_prep.qmd
├── analysis/
│   └── analysis.qmd
└── data/
```
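Relative Paths are Tricky!

A relative path such as `../../data/file.csv` is resolved from the current working directory, so it breaks as soon as the file that uses it sits at a different folder depth. Ways to manage paths are:

- An `.Rproj` file
- The `here` package
- `../` counting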
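To see why `../` counting is fragile, here is a minimal sketch in base R. It builds a throwaway `work/` folder under `tempdir()` (the folder and file names are hypothetical, chosen only for illustration) and checks whether the same relative path resolves when R starts from `work/` versus a nested `work/analysis/` folder.

```r
# Build a throwaway project under tempdir(); the folder names are hypothetical.
root <- file.path(tempdir(), "work")
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "analysis"), showWarnings = FALSE)
writeLines(c("x,y", "1,2"), file.path(root, "data", "file.csv"))

# Temporarily switch the working directory to `start` and report whether
# the relative path `rel` points at an existing file from there.
resolves_from <- function(start, rel) {
  old <- setwd(start)
  on.exit(setwd(old))
  file.exists(rel)
}

resolves_from(root, "data/file.csv")                            # TRUE: run from work/
resolves_from(file.path(root, "analysis"), "data/file.csv")     # FALSE: run from work/analysis/
resolves_from(file.path(root, "analysis"), "../data/file.csv")  # TRUE: needs one "../"
```

The `here` package sidesteps this by building paths from the project root, e.g. `here::here("data", "file.csv")`, so the path does not change when a file moves deeper into the folder tree.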
- `examples/project-example/` contains:
  - `_quarto.yml` for configuration
  - a `data/` directory
  - a `.gitignore`
- Create a project in `work/` using RStudio (File -> New Project -> Existing Directory). I have tested this, and RStudio handles the remote repository in the directory one level up.
Important!
The `work/` directory is your personal workspace for everything in this course:
You are responsible for:
This is your space: keep it clean and organized!
Let’s get a file set up to work with Quarto and have data to read from.
- Copy `_quarto.yml` from `examples` to `_quarto.yml` in your new project
- Create a `data/` directory

We are now going to create a file called `.gitignore` to tell git to ignore certain files.
- Create `.gitignore` (if it doesn't already exist)
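If you prefer to create or update `.gitignore` from R, here is a minimal base-R sketch. `ensure_gitignore()` is a hypothetical helper, not part of any package: it creates the file when it is missing and appends only the patterns not already listed. The example patterns in the comment assume the `.gitignore` sits inside `work/`, so they are written relative to that folder; adjust them to your own layout.

```r
# Hypothetical helper (not from any package): create `path` if it is
# missing, then append only the patterns it does not already contain.
ensure_gitignore <- function(path, patterns) {
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character()
  missing <- setdiff(patterns, existing)
  if (length(missing) > 0) {
    writeLines(c(existing, missing), path)
  }
  invisible(missing)  # the patterns that were added
}

# Example call, assuming the file lives inside work/ (patterns relative to it):
# ensure_gitignore(".gitignore", c("data/*", ".Rproj.user/", ".Rhistory", ".RData"))
```

Running it twice is safe: the second call finds every pattern already present and writes nothing.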
Never Commit Sensitive Data!
`.gitignore` tells Git which files to ignore.

`.gitignore` Matters

`.gitignore` Setup

With the course-wide `.gitignore` repository file, you will see these lines:
```
work/data/*
work/.Rproj.user/
work/.Rhistory
work/.RData
```
- `work/data/*`: keeps all data files in the work directory local
- `work/.Rproj.user`: RStudio temporary files
- `work/.Rhistory`: command history
- `work/.RData`: R workspace files

In RStudio:
- Create `example_cars_1_data_prep.qmd` in your `work/` directory

In RStudio:
You can remove the `editor: visual` line; we're going to work with the plain-text source.
Let’s create the data preparation setup file.
Include this as a setup block:
```{r setup-prep}
#| echo: false
#| message: false
library(dplyr)
library(readr)
library(stringr)
library(ggplot2)
```
And then this to load and prepare the data:
```{r load-data}
# Load and prepare data
mtcars_clean <- mtcars |>
  mutate(
    car_name = rownames(mtcars),
    make = word(car_name, 1),                         # First word is make
    model = str_remove(car_name, paste0(make, " ")),  # Rest is model
    efficiency = mpg / wt
  )

# Save processed data
write_csv(mtcars_clean, "data/mtcars_clean.csv")
```
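To check what the `make`/`model` split produces, here is a quick sketch on three row names taken from `mtcars`. It uses the same two `stringr` calls as the chunk above, applied to a plain character vector.

```r
library(stringr)

# Three row names taken from mtcars
car_name <- c("Mazda RX4", "Hornet 4 Drive", "Valiant")

make  <- word(car_name, 1)                        # first space-separated word
model <- str_remove(car_name, paste0(make, " "))  # remove "make " (treated as a regex)

data.frame(car_name, make, model)
```

`make` comes out as `"Mazda"`, `"Hornet"`, `"Valiant"` and `model` as `"RX4"`, `"4 Drive"`, `"Valiant"`. For the single-word name `Valiant`, the pattern `"Valiant "` never matches, so `model` keeps the full name.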
Render the file to see the results (click the “Render” button above the editor).
We will get two outputs:

- the rendered document
- `mtcars_clean.csv` in the `data/` directory

In RStudio:

- Create `example_cars_2_analysis.qmd` in your `work/` directory

```{r setup-analysis}
#| echo: false
#| message: false
library(dplyr)
library(readr)
library(ggplot2)
library(forcats)
```
```{r load-processed}
# Load processed data
df <- read_csv("data/mtcars_clean.csv")
df |> head()
```
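Why does the prep step copy `rownames(mtcars)` into a `car_name` column? `readr::write_csv()` does not write row names, so saving `mtcars` as-is would lose the car names. A minimal round-trip sketch, using a temporary file so the real `data/` folder is untouched:

```r
library(readr)

# Save mtcars without first copying its row names into a column
tmp <- tempfile(fileext = ".csv")
write_csv(mtcars, tmp)
without_names <- read_csv(tmp, show_col_types = FALSE)

names(without_names)  # the 11 original columns only; the car names are gone
nrow(without_names)   # 32
```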
You have:
Now, let’s set up version control in your project.
In the terminal:
```bash
git add .
git commit -m "Description of changes"
git push
```
Choose whichever method you’re most comfortable with, as both accomplish the same thing!
From here on out, it’s up to you to create the code blocks, such as the one below:
```{r}
# Code goes here
```
`ggplot2`

The `gg` in `ggplot2` stands for grammar of graphics.

What’s wrong with this?

- The `efficiency` variable

```{r}
efficiency_by_make <- df |>
  group_by(make) |>
  summarise(avg_efficiency = mean(efficiency)) |>
  mutate(make = fct_reorder(make, avg_efficiency)) |>
  ggplot(aes(x = make, y = avg_efficiency)) +
  geom_col() +  # same as geom_bar(stat = "identity")
  coord_flip() +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank()) +
  labs(
    title = "Average Fuel Efficiency by Make",
    x = NULL,
    y = "Average Efficiency (MPG/1000 lbs)"
  )

# Print the stored plot so it appears in the rendered document
efficiency_by_make
```
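Before deciding what is wrong with the chart, it can help to look at the numbers behind it. This sketch rebuilds `make` and `efficiency` directly from `mtcars` with the same steps as the prep file (so it runs on its own, without `data/mtcars_clean.csv`) and counts how many cars each average is based on.

```r
library(dplyr)
library(stringr)

# Same derived columns as the prep file, rebuilt so this chunk runs on its own
cars <- mtcars |>
  mutate(
    make = word(rownames(mtcars), 1),
    efficiency = mpg / wt
  )

# How many cars is each make's average based on?
make_counts <- cars |>
  group_by(make) |>
  summarise(n_cars = n(), avg_efficiency = mean(efficiency)) |>
  arrange(n_cars)

make_counts
```

In `mtcars`, 17 of the 22 makes have a single car, so most bars show one car's value rather than an average.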
Graduate Summer Institute of Epidemiology and Biostatistics