posit::conf(2023)
Teaching Data Science Masterclass
Please complete the survey collecting your GitHub user names!
We need an environment where
data, analysis, and results are tightly connected, or better yet, inseparable
reproducibility is built in
documentation is human readable and syntax is minimal
In addition to teaching good (and popular) data science workflows:
Centralize the distribution (and collection) of all student assignments
Enable students to work collaboratively
Make Git and GitHub part of student workflow
Put students on a smooth path to open publishing of their project outputs
Select the option for a free course organization.
You will need to provide the following to request teacher benefits:
A brief description of how you plan to use GitHub
Proof of your connection to an academic institution: a school-issued email address plus a school ID or some other proof of academic affiliation
Information about the school - link to website, address, etc.
Request verification early
Verification is manual and can take up to a few days, so do it well before your semester begins!
Once you’ve set up your organization, automate redundant tasks by working directly with GitHub’s immensely rich API!
And do so from the comfort of your own home, i.e., using R.
Tools for managing GitHub class organization accounts
Need students’ GitHub usernames (github_name) at a minimum
Recommend also collecting their emails, as students tend to make typos in their GitHub usernames
You can also use the user_exists() function to check the validity of the usernames your students provide
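A minimal sketch of that check in R, assuming a hypothetical roster data frame with a github_name column. The regular expression only screens for well-formed names (GitHub allows up to 39 alphanumeric characters or single hyphens, with no leading or trailing hyphen); user_exists() then asks GitHub whether each account actually exists. The run_live switch is an illustrative guard, not part of ghclass.

```r
# Hypothetical roster; the last two entries are deliberately malformed.
roster <- data.frame(
  github_name = c("florence-nightingale", "web-dubois", "web dubois", "-typo"),
  stringsAsFactors = FALSE
)

# Offline screen: alphanumerics or single hyphens, no leading or trailing
# hyphen, at most 39 characters.
username_pattern <- "^[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,38}$"
roster$well_formed <- grepl(username_pattern, roster$github_name, perl = TRUE)

# Online check: set run_live to TRUE once ghclass is installed and your
# GitHub token is configured. Only well-formed names are sent to the API.
run_live <- FALSE
if (run_live) {
  ok <- roster$well_formed
  roster$exists <- FALSE
  roster$exists[ok] <- ghclass::user_exists(roster$github_name[ok])
}

print(roster)
```

Running the offline screen first catches stray spaces and similar typos before any API calls are made.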
You need to instruct students to create GitHub accounts
Consider data privacy rules of institution / country (e.g., you may need to enter a data protection agreement for GDPR compliance)
Give some guidance for choosing a GitHub username
You can have students choose and submit their GitHub username (github_name) as an in-class activity during the first week of classes
ghclass uses the GitHub API to interact with your course organization and repos - the API verifies your identity using a personal access token which must be created and saved in such a way that ghclass can find and use it, e.g., in the GITHUB_PAT environment variable.
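A short sketch of how you might confirm that a token is visible to your R session, assuming it is stored in the GITHUB_PAT environment variable (for example via a line in ~/.Renviron). The run_live switch is an illustrative guard, not part of ghclass; github_test_token() contacts GitHub, so it only runs when the switch is on.

```r
# TRUE if a token is visible to this R session via GITHUB_PAT.
token_is_set <- nzchar(Sys.getenv("GITHUB_PAT"))

if (!token_is_set) {
  message("No GITHUB_PAT found; add GITHUB_PAT=<your token> to ~/.Renviron and restart R.")
}

# Set run_live to TRUE to have ghclass verify the token against the GitHub API.
run_live <- FALSE
if (run_live && token_is_set) {
  ghclass::github_test_token()
}
```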
## ✔ Invited user 'florence-nightingale' to org 'data-sci-101'.
## ✔ Invited user 'web-dubois' to org 'data-sci-101'.
[1] "aaronbaggett" "abbicormier" "alexCardazzi" "ali-day" "allissadillman" "anaidech" "andersolarsson" "caalo"
[9] "catalamarti" "cesBis" "davonperson" "deeprich" "dkon1" "drtbibel" "edward-burn" "erinbugbee"
[17] "erwinlares" "FulyaGokalp2" "gcicc" "georgestagg" "howarder" "ibertchen" "jakesauter" "jeremymcwilliams"
[25] "jnese" "jns6eey" "jonlinca" "KenSaville" "kriistiana" "kyeager4" "laylaguyot" "lbozzone"
[33] "LibbyHeeren" "math-mcshane" "mccrea-cobb" "mine-cetinkaya-rundel" "mkln" "murraylax" "nickduran" "norcalbiostat"
[41] "ritika-giri" "sigurdurthorjonsson" "tracykteal" "web-dubois" "wffadel"
All students should accept invites
It’s recommended that all students accept their invite before you start creating repos for them so that they’re all “members” not “outside collaborators” – this will make bookkeeping easier for you as an instructor.
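One way to keep track of this is sketched below, assuming a hypothetical roster vector of usernames. org_invite(), org_members(), and org_pending() are ghclass functions; the not_yet_members() helper, the run_live switch, and the made-up offline responses are illustrative only.

```r
org <- "data-sci-101"
roster <- c("florence-nightingale", "web-dubois", "mine-cetinkaya-rundel")

# GitHub usernames are case-insensitive, so compare in lower case.
not_yet_members <- function(roster, members) {
  roster[!tolower(roster) %in% tolower(members)]
}

# Set run_live to TRUE once ghclass is installed and your token is configured.
run_live <- FALSE
if (run_live) {
  members <- ghclass::org_members(org)
  pending <- ghclass::org_pending(org)
} else {
  # Made-up responses so the sketch runs offline.
  members <- c("Web-DuBois", "mine-cetinkaya-rundel")
  pending <- "florence-nightingale"
}

# Invite only students who are neither members nor already invited.
to_invite <- not_yet_members(roster, c(members, pending))
if (run_live && length(to_invite) > 0) {
  ghclass::org_invite(org = org, user = to_invite)
}

# Students to remind before you create their repos.
waiting <- not_yet_members(roster, members)
print(waiting)
```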
Use the org_create_assignment() function to create copies of the starter repo with correct permissions for each of your students (or teams)
Demo: hw-1
Demo: Create a copy of the starter repo for each student, appending their github_name to the repo name
Make your starter repo a template
You will regularly forget to do this, but try not to. It makes the next step A LOT faster, which is important especially for larger courses.
If the starter repo is not a template, org_create_assignment() first clones the repo locally, then pushes to GitHub, which is slow.
If the starter repo is a template, org_create_assignment() copies it directly on GitHub, which is a lot faster (and you can also use the GitHub UI to make copies on the fly)
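The individual-assignment workflow might look like the sketch below. It assumes the starter repo is data-sci-101/hw-1 and uses a hypothetical two-student roster; the run_live switch is an illustrative guard, not part of ghclass. The repo names are built locally so you can inspect them before anything is created on GitHub.

```r
org <- "data-sci-101"
source_repo <- paste0(org, "/hw-1")  # assumption: starter repo lives in the org
students <- c("florence-nightingale", "web-dubois")

# One repo per student: starter repo name plus the student's github_name.
student_repos <- paste0("hw-1-", students)
print(student_repos)

# Set run_live to TRUE once ghclass is installed and your token is configured.
run_live <- FALSE
if (run_live) {
  # Mark the starter repo as a template first so copying happens on GitHub.
  if (!ghclass::repo_is_template(source_repo)) {
    ghclass::repo_set_template(source_repo)
  }
  ghclass::org_create_assignment(
    org = org,
    repo = student_repos,
    user = students,
    source_repo = source_repo
  )
}
```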
Your role: Student
Set the scene: You’re a student in my class who is about to start working on their first assignment that requires the use of Git and GitHub. In this class, you access RStudio via Posit Cloud, which means your Posit Cloud account should be able to interact with your GitHub account.
Connect your Posit Cloud and GitHub accounts:
15:00
Demo:
Create a copy of the starter repo for each team, appending their team_name to the repo name
Give write access to each student in a team for their own team repo
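For team assignments, a sketch along the same lines, assuming a hypothetical data frame that maps each github_name to a team_name and a starter repo called data-sci-101/lab-1. Passing the team argument to org_create_assignment() is intended to create the teams and attach each one to its repo; check the ghclass documentation for the exact permissions granted. The run_live switch is an illustrative guard.

```r
org <- "data-sci-101"

# Hypothetical team roster: one row per student.
teams <- data.frame(
  github_name = c("florence-nightingale", "web-dubois", "student-c", "student-d"),
  team_name = c("team-a", "team-a", "team-b", "team-b"),
  stringsAsFactors = FALSE
)

# One repo per team: starter repo name plus the team_name.
teams$repo <- paste0("lab-1-", teams$team_name)
print(unique(teams$repo))

# Set run_live to TRUE once ghclass is installed and your token is configured.
run_live <- FALSE
if (run_live) {
  ghclass::org_create_assignment(
    org = org,
    repo = teams$repo,
    user = teams$github_name,
    team = teams$team_name,
    source_repo = paste0(org, "/lab-1")  # assumption: starter repo name
  )
}
```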
Your role: Student
Go to the course organization on GitHub: github.com/data-sci-101.
Locate your Lab 1 repo, read through the Getting Started section, and follow the instructions with your team members.
Clone the repo using HTTPS – each person in a team should do this.
Discuss Question 1 as a team, identify one team member as the scribe, and have them write up the answer and commit and push.
Then, have all other team members pull that change.
10:00
Use the GitHub UI to add issues to each student’s repo
Instructors (and TAs) can view all repositories within the course organization through their owner role in the GitHub organization
Make sure to @mention the student so that they are notified when an issue is opened
Consider keeping points out of issues
Your role: Instructor
... owners. (Please don’t delete any repos!)
... @mention their github_name. Submit your issue.
Your role: Student
Check your email to confirm that you got notified of an issue being filed by your neighbor in your repo, then review the issue on GitHub.
10:00
Trigger a GitHub action every time a student pushes to their repo to render their document and automatically provide high-level “rendered / didn’t render” feedback
Fetch artifacts from actions to obtain (and grade) an independently, automatically rendered version of students’ work for a high-fidelity check on reproducibility
Beckman, M. D. et al. (2021). Implementing version control with Git and GitHub as a learning objective in statistics and data science courses. Journal of Statistics and Data Science Education, 29(sup1), S132-S144. https://doi.org/10.1080/10691898.2020.1848485
The various approaches described in the article span different implementation strategies to suit student background, course type, software choices, and assessment practices. By presenting a wide range of approaches to teaching Git, the article aims to serve as a resource for statistics and data science instructors teaching courses at any level within an undergraduate or graduate curriculum.
🔗 pos.it/teach-ds-conf23 / Module 2