1 - Introduction

Deploy and maintain models with vetiver

Welcome!

Wi-Fi network name

Posit Conf 2023

Wi-Fi password

conf2023

Welcome!

There are gender-neutral bathrooms located among the Grand Suite Bathrooms
There are two meditation/prayer rooms: Grand Suite 2A and Grand Suite 2B
- Open Sunday - Tuesday 7:30 AM - 7:00 PM, Wednesday 8:00 AM - 6:00 PM
The lactation room is located in Grand Suite 1
- Open Sunday - Tuesday 7:30 AM - 7:00 PM, Wednesday 8:00 AM - 6:00 PM
Participants who do not wish to be photographed have red lanyards; please note everyone’s lanyard colors before taking a photo and respect their choices
The Code of Conduct and COVID policies can be found at https://posit.co/code-of-conduct/
- Please review them carefully! ❤️
- You can report Code of Conduct violations in person, by email, or by phone; see the policy linked above for contact information

Who are you?

You have intermediate R or Python knowledge
You can read data from CSV and other flat files, transform and reshape data, and make a wide variety of graphs
You can fit a model to data with your modeling framework of choice
You have exposure to basic modeling and machine learning practice
You do not need expert familiarity with advanced ML or MLOps topics

Who are we?

@isabelizimm

@isabelizimm@fosstodon.org

isabelizimm.github.io

@juliasilge

@juliasilge@fosstodon.org

youtube.com/juliasilge

juliasilge.com

Asking for help

🟪 “I’m stuck and need help!”

🟩 “I finished the exercise”

If you prefer, post on GitHub Discussions for help:

https://github.com/posit-conf-2023/vetiver/discussions

Plan for this workshop

Versioning
- Managing change in models ✅
Deploying
- Putting models in REST APIs 🎯
Monitoring
- Tracking model performance 👀

Introduce yourself to your neighbors 👋

Optional

Post an introduction on GitHub Discussions: https://github.com/posit-conf-2023/vetiver/discussions

What is machine learning?

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

MLOps with vetiver

Vetiver, the oil of tranquility, is used as a stabilizing ingredient in perfumery to preserve more volatile fragrances.

If you develop a model…

you can operationalize that model!

If you develop a model…

you likely should be the one to operationalize that model!

Your turn 🏺

Activity

What language does your team use for machine learning?

What kinds of models do you commonly use?

Have you ever deployed a model?

03:00

Workshop infrastructure

Log in at pos.it/class with the identifier vetiver2023
Even if you plan to work locally, set this up with us so you can use Posit Connect as a deployment target
For Posit Workbench, use RStudio for R or VS Code for Python
Open the folder class-work in the vetiver directory

Your turn 🏺

Activity

Start a new session, either RStudio or VS Code.

In your new session, open the folder class-work in the vetiver directory, and choose the first Quarto file!

05:00

Chicago food inspections data

The city of Chicago offers programmatic access to health code inspections of restaurants
Can certain measurements be used to predict inspection outcome?
Data from Chicago Department of Public Health, available at https://data.cityofchicago.org/

Inspection results

N = 6967
A class outcome, results
Other variables to use for prediction:
- facility_type is a nominal predictor
- risk is a nominal (or maybe ordinal) predictor
- total_violations is a numeric predictor
- inspection_date is a date predictor

R

library(arrow)
path <- here::here("data", "inspections.parquet")
inspections <- read_parquet(path)

Python

import pandas as pd
inspections = pd.read_parquet('../data/inspections.parquet')

Inspection results

results	inspection_date	aka_name	facility_type	risk	total_violations
PASS	2019-10-16	ORIGINAL MAXWELL STREET GRILL	RESTAURANT	RISK 2 (MEDIUM)	30
PASS	2022-10-20	THE DAILY GRIND	RESTAURANT	RISK 2 (MEDIUM)	0
FAIL	2020-03-05	KRISPY'S SEA FOOD AND CHICKEN	RESTAURANT	RISK 2 (MEDIUM)	25
FAIL	2020-05-18	SLIM'S	RESTAURANT	RISK 1 (HIGH)	41
FAIL	2019-11-21	FOOD STOP	GROCERY STORE	RISK 3 (LOW)	5
FAIL	2019-05-30	METROPOLITAN WATER RECLAMATION	RESTAURANT	RISK 1 (HIGH)	18
FAIL	2019-11-07	STOCKTON	RESTAURANT	RISK 3 (LOW)	0
PASS	2020-08-25	U.B. DOGS	RESTAURANT	RISK 1 (HIGH)	28
PASS	2022-08-10	BOXCAR BETTYS	RESTAURANT	RISK 1 (HIGH)	5
PASS	2021-06-25	SIZZLIN SKILLETS	RESTAURANT	RISK 1 (HIGH)	18
PASS	2020-10-27	FINEST FOOD & SUBS	GROCERY STORE	RISK 2 (MEDIUM)	30
PASS	2020-10-07	CARNITAS & TACOS MARAVATIO INC.	RESTAURANT	RISK 1 (HIGH)	40
PASS	2020-08-27	DAN'S HOT DOG STAND	RESTAURANT	RISK 2 (MEDIUM)	56
PASS	2022-12-05	MURA MURA RAMEN	RESTAURANT	RISK 1 (HIGH)	8
PASS	2022-11-28	TAQUERIA LOS GALLOS	RESTAURANT	RISK 1 (HIGH)	107

Your turn 🏺

Activity

Explore the inspections data on your own!

What’s the distribution of the outcome results?
What’s the distribution of the numeric variable total_violations?
How do results differ across facility type?

Share something you noticed with your neighbor.

08:00
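As a starting point for the questions above, here is a minimal sketch in plain Python. It uses only the 15 rows shown on the "Inspection results" slide rather than the full data set, so the numbers describe that small sample and nothing more. The variable names are illustrative.

```python
from collections import Counter, defaultdict
from statistics import median

# (results, facility_type, total_violations) for the 15 rows shown earlier;
# this is a small sample, not the full 6967-row data set
rows = [
    ("PASS", "RESTAURANT", 30),
    ("PASS", "RESTAURANT", 0),
    ("FAIL", "RESTAURANT", 25),
    ("FAIL", "RESTAURANT", 41),
    ("FAIL", "GROCERY STORE", 5),
    ("FAIL", "RESTAURANT", 18),
    ("FAIL", "RESTAURANT", 0),
    ("PASS", "RESTAURANT", 28),
    ("PASS", "RESTAURANT", 5),
    ("PASS", "RESTAURANT", 18),
    ("PASS", "GROCERY STORE", 30),
    ("PASS", "RESTAURANT", 40),
    ("PASS", "RESTAURANT", 56),
    ("PASS", "RESTAURANT", 8),
    ("PASS", "RESTAURANT", 107),
]

# Distribution of the outcome
outcome_counts = Counter(result for result, _, _ in rows)

# A one-number summary of the numeric predictor
median_violations = median(violations for _, _, violations in rows)

# Share of PASS results within each facility type
passes_by_type = defaultdict(list)
for result, facility_type, _ in rows:
    passes_by_type[facility_type].append(result == "PASS")
pass_rate_by_type = {
    facility_type: sum(passed) / len(passed)
    for facility_type, passed in passes_by_type.items()
}

print(outcome_counts)
print(median_violations)
print(pass_rate_by_type)
```

On the full data you would likely reach for pandas or dplyr instead; the grouping logic is the same.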

library(tidyverse)
inspections |> 
  group_by(inspection_date = floor_date(inspection_date, unit = "month")) |> 
  summarise(results = mean(results == "PASS")) |> 
  ggplot(aes(inspection_date, results)) +
  geom_line(alpha = 0.8, linewidth = 1.5) +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "% of inspections that have a PASS result", x = NULL)

from plotnine import ggplot, aes, geom_boxplot, coord_flip
(ggplot(inspections, aes('facility_type', 'total_violations', fill = 'facility_type')) 
  + geom_boxplot(alpha = 0.5, show_legend = False)
  + coord_flip()
)
#> <Figure Size: (640 x 480)>

inspections |>
  ggplot(aes(inspection_date, total_violations, z = as.integer(results) - 1)) +
  stat_summary_hex(alpha = 0.7) +
  scale_fill_viridis_c(labels = scales::percent) +
  labs(fill = "% passed", x = NULL)

Time for building a model!

Spend your data budget

R

library(tidymodels)
set.seed(123)
inspect_split <- initial_split(inspections, prop = 0.8)
inspect_train <- training(inspect_split)
inspect_test <- testing(inspect_split)

Python

from sklearn import model_selection
import numpy as np
np.random.seed(123)
inspections['inspection_date'] = pd.to_datetime(inspections['inspection_date'])
inspections['month'] = inspections['inspection_date'].dt.month
inspections['year'] = inspections['inspection_date'].dt.year
X, y = inspections.drop(columns=['aka_name', 'results', 'inspection_date']), inspections['results']
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y,
    test_size = 0.2
)
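To make the idea of a data budget concrete, the sketch below shows roughly what an 80/20 split does, using only the standard library. It is a conceptual illustration and not how `initial_split()` or `train_test_split()` are implemented. The helper name `split_indices` is hypothetical, and exact rounding of the test set size may differ from those libraries.

```python
import random


def split_indices(n_rows, test_size=0.2, seed=123):
    """Shuffle row indices, then hold out a `test_size` fraction for testing."""
    indices = list(range(n_rows))
    # A seeded generator makes the shuffle reproducible
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]


# N = 6967 inspections, as on the slide above
train_idx, test_idx = split_indices(6967)
print(len(train_idx), len(test_idx))
```

Each row lands in exactly one of the two sets, and the test rows stay untouched until the final evaluation.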

Fit a logistic regression model 🚀

Or your model of choice!

R

inspection_rec <- 
  recipe(results ~ facility_type + risk + total_violations + inspection_date, 
         data = inspect_train) |> 
  step_date(inspection_date, features = c("month", "year"), keep_original_cols = FALSE)

inspection_fit <-
  workflow(inspection_rec, logistic_reg()) |> 
  fit(data = inspect_train)

Python

from sklearn import preprocessing, linear_model, pipeline, compose

categorical_features = ['facility_type', 'risk', 'month']

# Encode the categorical columns as integers; categories unseen at fit time become -1
oe = compose.make_column_transformer(
    (preprocessing.OrdinalEncoder(
            handle_unknown="use_encoded_value", unknown_value=-1),
        categorical_features,),
    remainder="passthrough",
).fit(X_train)
lr = linear_model.LogisticRegression().fit(oe.transform(X_train), y_train)
# Bundle the fitted preprocessor and the fitted model into one pipeline
inspection_fit = pipeline.Pipeline([("ordinal_encoder", oe), ("logistic_regression", lr)])

Your turn 🏺

Activity

Split your data into training and testing sets.

Fit a model to your training data.

05:00

Create a deployable bundle

Create a deployable model object

R

library(vetiver)
v <- vetiver_model(inspection_fit, "inspection-result-rstats")
v
#> 
#> ── inspection-result-rstats ─ <bundled_workflow> model for deployment 
#> A glm classification modeling workflow using 4 features

Python

from vetiver import VetiverModel
v = VetiverModel(inspection_fit, "inspection-result-python", prototype_data = X_train)
v.description
#> 'A scikit-learn Pipeline model'

Deploy preprocessors and models together

What is wrong with this?
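One way to see why preprocessors and models belong in the same deployable object is a toy example where they get separated. This sketch is plain Python, and `CategoryEncoder` is a hypothetical stand-in for a real preprocessor such as an ordinal encoder. It compares reusing the encoder fitted on training data with refitting it on new data alone.

```python
class CategoryEncoder:
    """Toy ordinal encoder: maps each category seen during fit to an integer."""

    def fit(self, values):
        self.mapping_ = {v: i for i, v in enumerate(sorted(set(values)))}
        return self

    def transform(self, values):
        # Categories never seen during fit are encoded as -1
        return [self.mapping_.get(v, -1) for v in values]


train_risk = ["RISK 1 (HIGH)", "RISK 2 (MEDIUM)", "RISK 3 (LOW)", "RISK 1 (HIGH)"]
new_risk = ["RISK 3 (LOW)"]

# Bundled: the encoder fitted on the training data travels with the model
trained_encoder = CategoryEncoder().fit(train_risk)
bundled_code = trained_encoder.transform(new_risk)

# Not bundled: preprocessing is redone on the new data alone
refit_code = CategoryEncoder().fit(new_risk).transform(new_risk)

print(bundled_code, refit_code)
```

The same category is encoded as 2 in the first case and 0 in the second, so a model trained on the first encoding would silently receive the wrong input in the second.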

Your turn 🏺

Activity

Create your vetiver model object.

Check out the default description that is created, and try out using a custom description.

Show your custom description to your neighbor.

05:00

Version your model

pins 📌

The pins package publishes data, models, and other R and Python objects, making it easy to share them across projects and with your colleagues.

You can pin objects to a variety of pin boards, including:

a local folder (like a network drive or even a temporary directory)
Posit Connect
Amazon S3
Azure Storage
Google Cloud

Version your model

Learn about the pins package for Python and for R

Python

from pins import board_temp
from vetiver import vetiver_pin_write

board = board_temp(allow_pickle_read = True)
vetiver_pin_write(board, v)
#> Model Cards provide a framework for transparent, responsible reporting. 
#>  Use the vetiver `.qmd` Quarto template as a place to start, 
#>  with vetiver.model_card()
#> Writing pin:
#> Name: 'inspection-result-python'
#> Version: 20230917T184302Z-0e8f6

R

library(pins)

board <- board_temp()
board |> vetiver_pin_write(v)
#> Creating new version '20230917T184302Z-977d3'
#> Writing to pin 'inspection-result-rstats'
#> 
#> Create a Model Card for your published model
#> • Model Cards provide a framework for transparent, responsible reporting
#> • Use the vetiver `.Rmd` template as a place to start
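The version names in the output above appear to combine a UTC timestamp with a short hash. Assuming that layout holds, the following sketch splits a version name into its two parts with the standard library. The helper `parse_version` is hypothetical and not part of pins or vetiver.

```python
from datetime import datetime, timezone


def parse_version(version):
    """Split '<timestamp>-<hash>' into a UTC datetime and the hash prefix."""
    stamp, _, hash_prefix = version.partition("-")
    created = datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return created, hash_prefix


# Version names copied from the Python and R output above
py_created, py_hash = parse_version("20230917T184302Z-0e8f6")
r_created, r_hash = parse_version("20230917T184302Z-977d3")

print(py_created.isoformat(), py_hash)
print(r_created.isoformat(), r_hash)
```

Because the timestamp comes first, sorting version names as plain strings also orders them by creation time under this assumption.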

Your turn 🏺

Activity

Pin your vetiver model object to a temporary board.

Retrieve the model metadata with pin_meta().

05:00

Posit Connect

Posit Connect is a publishing platform for data science
For Python, generate an API key: https://docs.posit.co/connect/user/api-keys/
For R, set up publishing from RStudio: https://docs.posit.co/connect/user/publishing/

Version your model

R

library(pins)
library(vetiver)

board <- board_temp()
v <- vetiver_model(inspection_fit, "inspection-result-rstats")
board |> vetiver_pin_write(v)

Python

from pins import board_temp
from vetiver import VetiverModel, vetiver_pin_write

board = board_temp(allow_pickle_read = True)
v = VetiverModel(inspection_fit, "inspection-result-python", prototype_data = X_train)
vetiver_pin_write(board, v)

Version your model

R

library(pins)
library(vetiver)

board <- board_connect()
v <- vetiver_model(inspection_fit, "julia.silge/inspection-result-rstats")
board |> vetiver_pin_write(v)

Python

from pins import board_connect
from vetiver import VetiverModel, vetiver_pin_write
from dotenv import load_dotenv
load_dotenv()

board = board_connect(allow_pickle_read = True)
v = VetiverModel(inspection_fit, "isabel.zimmerman/inspection-result-python", prototype_data = X_train)
vetiver_pin_write(board, v)
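In the Python code above, `load_dotenv()` reads a `.env` file into environment variables before `board_connect()` is called. Assuming `board_connect()` looks for `CONNECT_SERVER` and `CONNECT_API_KEY`, as the pins documentation describes, a small check like this one can catch a missing setting early. The helper `missing_connect_settings` is hypothetical.

```python
import os


def missing_connect_settings(env=None):
    """Return the names of Connect settings that are absent or empty."""
    env = os.environ if env is None else env
    # Assumed variable names; check the pins documentation for your version
    required = ("CONNECT_SERVER", "CONNECT_API_KEY")
    return [name for name in required if not env.get(name)]


missing = missing_connect_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("Connect settings found")
```

Keep the `.env` file out of version control, since it holds your API key.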

Your turn 🏺

Activity

Either:

- Set up Connect publishing from RStudio, or
- Create an API key for your Posit Connect server, and save it on Workbench in your working directory (in .Renviron for R or .env for Python).

Create a new vetiver model object that includes your username, and pin this vetiver model to your Connect instance.

Visit your pin’s homepage on Connect.

Train your model again, using a different ML algorithm (decision tree or random forest are good options).

Write this new version of your model to the same pin, and see what versions you have with pin_versions().

10:00