Day 2 Postmortem

Author

Erik Westlund

Published

June 11, 2025

Video Recording

The video recording can be viewed by clicking here

The password has been shared via email and CoursePlus.

Course Site

There is now a course site at https://erikwestlund.github.io/data-viz-summer-25/.

This archives the lectures, examples (in final rendered form), and postmortems.

The source files are still in the GitHub repository in lectures/, examples/, and postmortems/.

The compiled files are in the docs/ directory (these are the files GitHub Pages makes publicly available at the above URL).

Location of Slides

The slides have been moved from the slides directory in the GitHub repository to docs/slides.

Problem Set 2

Due Date

I have granted an extension on the submission of Problem Set 2.

Assignments

Problem Set 2 asks you to create a three-variable visualization, building it up layer by layer. In ggplot2, this typically means you will have:

  • Two variables accounted for in the aesthetic mapping, e.g., ggplot(data, aes(x = x, y = y)).
  • One other variable handled using fill, facet_wrap, or another technique.

An example of this approach is provided in the prams_3_iteration_aggregation.qmd file.

In that example, we account for three variables in this way:

  • Two variables accounted for in the aesthetic mapping: x = subgroup and y = depression_within_3_months_birth
  • A third variable (location_abbr) accounted for with facet_wrap(~ location_abbr)

The final plot uses three variables as well, even if it appears to use two. The aggregation done before summarizing variables turns subgroup and depression_within_3_months_birth into a single variable.

AI/LLM Illustration

I have updated the AI example with most of the prompts we used and the results of our inquiry.

You may want to review it in order to see how you can interact with an LLM to work through a data file. Remember:

  • You must be cautious with private data, and so this type of workflow is not always appropriate.
  • You should spend more time than we did with the data documentation and codebook before jumping in.

I would also note that the more detailed the instructions you give the LLM, the better it performs. A tactic I’ve used with success is to send my prompt to one LLM, ask it to refine the prompt for another LLM, and then ask the second LLM to respond. The refined prompt tends to provide more context, which helps the second LLM perform better.

Other Examples

Please review the examples we worked through in class. Due to time limitations, we cannot discuss every line of code and every detail.

GitHub Issues

Several students had trouble pushing their work to GitHub. Please try the following steps:

Using The Command Line

If you are having issues with RStudio’s Git functionality, you can try using the command line. Click “Terminal” in the bottom left pane or select “Tools” > “Shell” > “Terminal” from the menu.

There you can use the following commands:

git add .
git commit -m "Your message"
git push
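A common stumbling block is running git commit when nothing has actually changed. As a minimal sketch, the first two commands above can be wrapped so the commit is skipped when there is nothing staged. The function name commit_if_changed and the default message are assumptions made for illustration; they are not part of git or of the course materials.

```shell
#!/bin/sh
# commit_if_changed is a hypothetical helper name, not a git command.
# Usage: commit_if_changed "Your message"
commit_if_changed() {
  message="${1:-Update work}"
  # Stage every change in the current directory and below
  git add .
  # "git diff --cached --quiet" exits with status 0 when nothing is staged
  if git diff --cached --quiet; then
    echo "Nothing to commit."
  else
    git commit -q -m "$message"
  fi
}
```

After defining the function in your terminal session, you could run commit_if_changed "Your message" followed by git push.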

Authentication Issues

If you are having trouble getting GitHub to accept your push after providing it credentials, try the following:

  1. Go to your GitHub account settings.
  2. Click “Developer settings”
  3. Click “Personal access tokens”
  4. Click “Generate new token (Classic)”
  5. Give it a name and description
  6. Select all boxes on “repo”
  7. Click “Generate token”
  8. Copy the token
  9. Use the token as your GitHub password when you push

Remote Issues

Ensure your remote is set correctly.

First, check which remotes, if any, are set:

git remote -v

If you see nothing, then run:

git remote add origin https://github.com/YOUR-USERNAME/data-viz-summer-25.git

If you see an unfamiliar remote, you can remove it with:

git remote remove origin

If you see a remote but it points to the wrong URL, you can set it to the correct one with:

git remote set-url origin https://github.com/YOUR-USERNAME/data-viz-summer-25.git

Note that the commands above name the GitHub remote origin.
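The remote checks above can be combined into one step. The sketch below adds origin when it is missing and otherwise repoints it at the URL you pass in. The function name ensure_origin is an assumption made for illustration, and the sketch assumes a reasonably recent git, since it relies on git remote get-url.

```shell
#!/bin/sh
# ensure_origin is a hypothetical helper name, not a git command.
# Usage: ensure_origin https://github.com/YOUR-USERNAME/data-viz-summer-25.git
ensure_origin() {
  url="$1"
  if git remote get-url origin >/dev/null 2>&1; then
    # origin already exists: point it at the expected URL
    git remote set-url origin "$url"
  else
    # no origin yet: create it
    git remote add origin "$url"
  fi
  # Print the remotes so you can confirm the result
  git remote -v
}
```

Run it from inside your repository folder, replacing YOUR-USERNAME with your own GitHub username.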

Master vs. Main Branch

Community standards are moving away from the master branch to the main branch. If you are using master, you can change it to main with:

git branch -m master main

You can then push or pull like so, assuming an origin remote is set:

git push origin main
git pull origin main
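If you are not sure which branch you are on, the rename can be guarded so it only runs when the current branch is actually master. This is a sketch under that assumption; the function name rename_master_to_main is hypothetical and is not a git command.

```shell
#!/bin/sh
# rename_master_to_main is a hypothetical helper name, not a git command.
rename_master_to_main() {
  # Name of the branch that is currently checked out
  current="$(git symbolic-ref --short HEAD)"
  if [ "$current" = "master" ]; then
    # Rename the local branch; this does not touch the remote
    git branch -m master main
  fi
  # Print the branch name after the (possible) rename
  git symbolic-ref --short HEAD
}
```

After renaming, git push -u origin main pushes the branch and records origin/main as its upstream, so later calls to git push and git pull need no extra arguments.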

Errors About The Materials In Your Repo Being More Up To Date

If you want to pull the latest materials from the repository, you can do so with the following command:

git pull

If you are absolutely certain you have never pushed any work, you can force your local copy onto the remote from the command line:

git push --force

This will overwrite the materials in the repository with your current work.
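Because a forced push discards whatever is on the remote, it can help to check first whether the remote holds commits you do not have locally. The sketch below counts them. The function name commits_only_on_remote is an assumption made for illustration, and the sketch assumes your remote is named origin and your branch is main unless you pass another name.

```shell
#!/bin/sh
# commits_only_on_remote is a hypothetical helper name, not a git command.
# Usage: commits_only_on_remote main
commits_only_on_remote() {
  branch="${1:-main}"
  # Update the local record of what the remote contains
  git fetch -q origin
  # Count commits reachable from origin/<branch> but not from your HEAD
  git rev-list --count "HEAD..origin/$branch"
}
```

If the count is greater than zero, the remote has work that a forced push would erase, and git pull is the safer choice. A more cautious variant of the forced push is git push --force-with-lease, which refuses to overwrite the remote if it has changed since your last fetch.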