1 - Introduction

Deploy and maintain models with vetiver

Welcome!

Wi-Fi network name

Posit Conf 2023

Wi-Fi password

conf2023

Welcome!

  • There are gender-neutral bathrooms located among the Grand Suite Bathrooms
  • There are two meditation/prayer rooms: Grand Suite 2A and Grand Suite 2B
    • Open Sunday - Tuesday 7:30 AM - 7:00 PM, Wednesday 8:00 AM - 6:00 PM
  • The lactation room is located in Grand Suite 1
    • Open Sunday - Tuesday 7:30 AM - 7:00 PM, Wednesday 8:00 AM - 6:00 PM
  • Participants who do not wish to be photographed have red lanyards; please note everyone’s lanyard colors before taking a photo and respect their choices
  • The Code of Conduct and COVID policies can be found at https://posit.co/code-of-conduct/
    • Please review them carefully! ❤️
    • You can report Code of Conduct violations in person, by email, or by phone; see the policy linked above for contact information

Who are you?

  • You have intermediate R or Python knowledge

  • You can read data from CSV and other flat files, transform and reshape data, and make a wide variety of graphs

  • You can fit a model to data with your modeling framework of choice

  • You have exposure to basic modeling and machine learning practice

  • You do not need expert familiarity with advanced ML or MLOps topics

Who are we?

Asking for help

🟪 “I’m stuck and need help!”

🟩 “I finished the exercise”

If you prefer, post on GitHub Discussions for help:

https://github.com/posit-conf-2023/vetiver/discussions

Plan for this workshop

  • Versioning
    • Managing change in models ✅
  • Deploying
    • Putting models in REST APIs 🎯
  • Monitoring
    • Tracking model performance 👀

Introduce yourself to your neighbors 👋

Optional

Post an introduction on GitHub Discussions: https://github.com/posit-conf-2023/vetiver/discussions

What is machine learning?

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

MLOps with vetiver

Vetiver, the oil of tranquility, is used as a stabilizing ingredient in perfumery to preserve more volatile fragrances.

If you develop a model…

you can operationalize that model!

If you develop a model…

you likely should be the one to operationalize that model!

Your turn 🏺

Activity

What language does your team use for machine learning?

What kinds of models do you commonly use?

Have you ever deployed a model?

03:00

Workshop infrastructure

  • Log in at pos.it/class with the identifier vetiver2023
  • Even if you plan to work locally, set this up with us so you can use Posit Connect as a deployment target
  • For Posit Workbench, use RStudio for R or VS Code for Python
  • Open the folder class-work in the vetiver directory

Your turn 🏺

Activity

Log in at pos.it/class with the identifier vetiver2023

Start a new session, either RStudio or VS Code.

In your new session, open the folder class-work in the vetiver directory, and choose the first Quarto file!

05:00

Chicago food inspections data

  • The city of Chicago offers programmatic access to health code inspections of restaurants
  • Can certain measurements be used to predict the inspection outcome?
  • Data from Chicago Department of Public Health, available at https://data.cityofchicago.org/

Inspection results

  • N = 6967
  • A class outcome, results, with the levels PASS and FAIL
  • Other variables to use for prediction:
    • facility_type is a nominal predictor
    • risk is a nominal (or maybe ordinal) predictor
    • total_violations is a numeric predictor
    • inspection_date is a date predictor

R

library(arrow)
path <- here::here("data", "inspections.parquet")
inspections <- read_parquet(path)

Python

import pandas as pd
inspections = pd.read_parquet('../data/inspections.parquet')

Inspection results

results | inspection_date | aka_name | facility_type | risk | total_violations
PASS | 2019-10-16 | ORIGINAL MAXWELL STREET GRILL | RESTAURANT | RISK 2 (MEDIUM) | 30
PASS | 2022-10-20 | THE DAILY GRIND | RESTAURANT | RISK 2 (MEDIUM) | 0
FAIL | 2020-03-05 | KRISPY'S SEA FOOD AND CHICKEN | RESTAURANT | RISK 2 (MEDIUM) | 25
FAIL | 2020-05-18 | SLIM'S | RESTAURANT | RISK 1 (HIGH) | 41
FAIL | 2019-11-21 | FOOD STOP | GROCERY STORE | RISK 3 (LOW) | 5
FAIL | 2019-05-30 | METROPOLITAN WATER RECLAMATION | RESTAURANT | RISK 1 (HIGH) | 18
FAIL | 2019-11-07 | STOCKTON | RESTAURANT | RISK 3 (LOW) | 0
PASS | 2020-08-25 | U.B. DOGS | RESTAURANT | RISK 1 (HIGH) | 28
PASS | 2022-08-10 | BOXCAR BETTYS | RESTAURANT | RISK 1 (HIGH) | 5
PASS | 2021-06-25 | SIZZLIN SKILLETS | RESTAURANT | RISK 1 (HIGH) | 18
PASS | 2020-10-27 | FINEST FOOD & SUBS | GROCERY STORE | RISK 2 (MEDIUM) | 30
PASS | 2020-10-07 | CARNITAS & TACOS MARAVATIO INC. | RESTAURANT | RISK 1 (HIGH) | 40
PASS | 2020-08-27 | DAN'S HOT DOG STAND | RESTAURANT | RISK 2 (MEDIUM) | 56
PASS | 2022-12-05 | MURA MURA RAMEN | RESTAURANT | RISK 1 (HIGH) | 8
PASS | 2022-11-28 | TAQUERIA LOS GALLOS | RESTAURANT | RISK 1 (HIGH) | 107

Your turn 🏺

Activity

Explore the inspections data on your own!

  • What’s the distribution of the outcome results?
  • What’s the distribution of the numeric variable total_violations?
  • How do results differ across facility type?

Share something you noticed with your neighbor.

08:00
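To get a feel for what those three questions involve, here is a stdlib-only Python sketch run on the 15 sample rows shown above, not the full N = 6967 data. The `rows` list simply copies `results`, `facility_type`, and `total_violations` from that table; in the workshop you would answer the same questions on `inspections` with your usual data tools.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# (results, facility_type, total_violations) copied from the 15 sample rows above
rows = [
    ("PASS", "RESTAURANT", 30), ("PASS", "RESTAURANT", 0),
    ("FAIL", "RESTAURANT", 25), ("FAIL", "RESTAURANT", 41),
    ("FAIL", "GROCERY STORE", 5), ("FAIL", "RESTAURANT", 18),
    ("FAIL", "RESTAURANT", 0), ("PASS", "RESTAURANT", 28),
    ("PASS", "RESTAURANT", 5), ("PASS", "RESTAURANT", 18),
    ("PASS", "GROCERY STORE", 30), ("PASS", "RESTAURANT", 40),
    ("PASS", "RESTAURANT", 56), ("PASS", "RESTAURANT", 8),
    ("PASS", "RESTAURANT", 107),
]

# 1. Distribution of the outcome
outcome_counts = Counter(result for result, _, _ in rows)

# 2. Distribution of total_violations, summarized by a few statistics
violations = [v for _, _, v in rows]
violation_summary = {
    "min": min(violations),
    "median": median(violations),
    "mean": mean(violations),
    "max": max(violations),
}

# 3. Share of PASS results within each facility type
by_facility = defaultdict(list)
for result, facility, _ in rows:
    by_facility[facility].append(result == "PASS")
pass_rate = {facility: sum(flags) / len(flags) for facility, flags in by_facility.items()}

print(outcome_counts)
print(violation_summary)
print(pass_rate)
```

With only two grocery stores in this sample, the per-facility pass rate is far too noisy to interpret; the full data is what matters.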

library(tidyverse)
inspections |> 
  group_by(inspection_date = floor_date(inspection_date, unit = "month")) |> 
  summarise(results = mean(results == "PASS")) |> 
  ggplot(aes(inspection_date, results)) +
  geom_line(alpha = 0.8, linewidth = 1.5) +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "% of inspections that have a PASS result", x = NULL)
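The R code above floors each inspection date to the first day of its month and then averages a PASS indicator within each month. A plain-Python sketch of that same aggregation, using a handful of the sample rows shown earlier (the `floor_to_month` helper is hypothetical, standing in for lubridate's `floor_date(unit = "month")`):

```python
from collections import defaultdict
from datetime import date

# (inspection_date, results) for a few of the sample rows shown earlier
rows = [
    (date(2020, 8, 25), "PASS"), (date(2020, 8, 27), "PASS"),
    (date(2020, 10, 7), "PASS"), (date(2020, 10, 27), "PASS"),
    (date(2019, 11, 7), "FAIL"), (date(2019, 11, 21), "FAIL"),
    (date(2020, 3, 5), "FAIL"),
]

def floor_to_month(d: date) -> date:
    """Round a date down to the first day of its month."""
    return d.replace(day=1)

# Group a PASS indicator by month, then average it within each month
passed_by_month = defaultdict(list)
for inspection_date, result in rows:
    passed_by_month[floor_to_month(inspection_date)].append(result == "PASS")

monthly_pass_rate = {
    month: sum(flags) / len(flags) for month, flags in sorted(passed_by_month.items())
}
for month, rate in monthly_pass_rate.items():
    print(month.isoformat(), f"{rate:.0%}")
```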

from plotnine import ggplot, aes, geom_boxplot, coord_flip
(ggplot(inspections, aes('facility_type', 'total_violations', fill = 'facility_type')) 
  + geom_boxplot(alpha = 0.5, show_legend = False)
  + coord_flip()
)
#> <Figure Size: (640 x 480)>

inspections |>
  ggplot(aes(inspection_date, total_violations, z = as.integer(results) - 1)) +
  stat_summary_hex(alpha = 0.7) +
  scale_fill_viridis_c(labels = scales::percent) +
  labs(fill = "% passed", x = NULL)

Time for building a model!

Spend your data budget

R

library(tidymodels)
set.seed(123)
inspect_split <- initial_split(inspections, prop = 0.8)
inspect_train <- training(inspect_split)
inspect_test <- testing(inspect_split)

Python

from sklearn import model_selection
import pandas as pd

# Derive month and year features from the inspection date
inspections['inspection_date'] = pd.to_datetime(inspections['inspection_date'])
inspections['month'] = inspections['inspection_date'].dt.month
inspections['year'] = inspections['inspection_date'].dt.year
# Predictors exclude the restaurant name, the outcome, and the raw date
X, y = inspections.drop(columns=['aka_name', 'results', 'inspection_date']), inspections['results']
# random_state makes the 80/20 split reproducible
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y,
    test_size = 0.2,
    random_state = 123
)
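Both `initial_split(prop = 0.8)` and `train_test_split(test_size = 0.2)` shuffle row indices and cut them into two disjoint sets. Here is a stdlib sketch of that idea; `random_split` is a hypothetical helper, not either library's API, and rounding the test share up with `math.ceil` is our assumption about how scikit-learn treats a float `test_size`. The seeded shuffle is reproducible but will not pick the same rows as R or scikit-learn.

```python
import math
import random

def random_split(n_rows: int, test_size: float = 0.2, seed: int = 123):
    """Shuffle row indices and cut them into disjoint (train, test) index lists."""
    n_test = math.ceil(test_size * n_rows)  # assumed rounding: test share rounded up
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)    # seeded, so the split is reproducible
    return indices[n_test:], indices[:n_test]

# N = 6967 inspections, as in the data description
train_idx, test_idx = random_split(6967)
print(len(train_idx), len(test_idx))
```

For the inspections data this leaves 5573 rows for training and 1394 for testing.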

Fit a logistic regression model 🚀

Or your model of choice!

R

inspection_rec <- 
  recipe(results ~ facility_type + risk + total_violations + inspection_date, 
         data = inspect_train) |> 
  # Replace the raw date with month and year features
  step_date(inspection_date, features = c("month", "year"), keep_original_cols = FALSE)

inspection_fit <-
  workflow(inspection_rec, logistic_reg()) |> 
  fit(data = inspect_train)

Python

from sklearn import preprocessing, linear_model, pipeline, compose

categorical_features = ['facility_type', 'risk', 'month']

# Encode categorical predictors as integers; levels unseen in training map to -1
oe = compose.make_column_transformer(
    (preprocessing.OrdinalEncoder(
            handle_unknown="use_encoded_value", unknown_value=-1),
        categorical_features,),
    remainder="passthrough",
).fit(X_train)
lr = linear_model.LogisticRegression().fit(oe.transform(X_train), y_train)
# Bundle the fitted preprocessor and model so they are deployed together
inspection_fit = pipeline.Pipeline([("ordinal_encoder", oe), ("logistic_regression", lr)])
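The encoder is part of what gets deployed because the category-to-integer mapping it learned from `X_train` has to be applied unchanged to new data. A minimal stand-in for `OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)`, assuming (as scikit-learn does with its default `categories="auto"`) that categories are numbered in sorted order; `SimpleOrdinalEncoder` is hypothetical and only illustrates the behavior.

```python
class SimpleOrdinalEncoder:
    """Hypothetical, single-column stand-in for an ordinal encoder with an unknown-value sentinel."""

    def __init__(self, unknown_value: int = -1):
        self.unknown_value = unknown_value
        self.mapping = {}

    def fit(self, values):
        # Learn the categories seen in training, sorted, and number them 0, 1, 2, ...
        self.mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
        return self

    def transform(self, values):
        # Categories never seen during fit get the sentinel code instead of raising an error
        return [self.mapping.get(v, self.unknown_value) for v in values]

enc = SimpleOrdinalEncoder().fit(["RESTAURANT", "GROCERY STORE", "RESTAURANT"])
print(enc.transform(["GROCERY STORE", "RESTAURANT", "BAKERY"]))  # [0, 1, -1]
```

If the model were deployed without this fitted mapping, a new request would have no consistent way to turn "RESTAURANT" into the number the model was trained on.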

Your turn 🏺

Activity

Split your data into training and testing sets.

Fit a model to your training data.

05:00

Create a deployable bundle

Create a deployable model object

R

library(vetiver)
v <- vetiver_model(inspection_fit, "inspection-result-rstats")
v
#> 
#> ── inspection-result-rstats ─ <bundled_workflow> model for deployment 
#> A glm classification modeling workflow using 4 features

Python

from vetiver import VetiverModel
v = VetiverModel(inspection_fit, "inspection-result-python", prototype_data = X_train)
v.description
#> 'A scikit-learn Pipeline model'
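In Python, `prototype_data = X_train` gives vetiver an example of the input the deployed model should expect. The sketch below shows the general idea of checking incoming records against a prototype in plain Python; it is not vetiver's implementation, and `make_prototype` and `check_against_prototype` are hypothetical helpers. The column names follow `X_train` from the split code, and the example values come from the BOXCAR BETTYS sample row.

```python
def make_prototype(example_row: dict) -> dict:
    """Record each column's name and Python type from one example row."""
    return {name: type(value) for name, value in example_row.items()}

def check_against_prototype(record: dict, prototype: dict) -> list:
    """Return a list of problems; an empty list means the record matches."""
    problems = [f"missing column: {name}" for name in prototype if name not in record]
    problems += [f"unexpected column: {name}" for name in record if name not in prototype]
    for name, expected in prototype.items():
        if name in record and not isinstance(record[name], expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(record[name]).__name__}"
            )
    return problems

example = {"facility_type": "RESTAURANT", "risk": "RISK 1 (HIGH)",
           "total_violations": 5, "month": 8, "year": 2022}
prototype = make_prototype(example)

print(check_against_prototype(example, prototype))  # []
bad = check_against_prototype({"facility_type": "RESTAURANT", "total_violations": "five"}, prototype)
print(bad)
```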

Deploy preprocessors and models together

What is wrong with this?

Your turn 🏺

Activity

Create your vetiver model object.

Check out the default description that is created, and try using a custom description.

Show your custom description to your neighbor.

05:00

Version your model

pins 📌

The pins package publishes data, models, and other R and Python objects, making it easy to share them across projects and with your colleagues.

You can pin objects to a variety of pin boards, including:

  • a local folder (like a network drive or even a temporary directory)
  • Posit Connect
  • Amazon S3
  • Azure Storage
  • Google Cloud Storage

Version your model

Learn about the pins package for Python and for R

Python

from pins import board_temp
from vetiver import vetiver_pin_write

board = board_temp(allow_pickle_read = True)
vetiver_pin_write(board, v)
#> Model Cards provide a framework for transparent, responsible reporting. 
#>  Use the vetiver `.qmd` Quarto template as a place to start, 
#>  with vetiver.model_card()
#> Writing pin:
#> Name: 'inspection-result-python'
#> Version: 20230917T184302Z-0e8f6

R

library(pins)

board <- board_temp()
board |> vetiver_pin_write(v)
#> Creating new version '20230917T184302Z-977d3'
#> Writing to pin 'inspection-result-rstats'
#> 
#> Create a Model Card for your published model
#> • Model Cards provide a framework for transparent, responsible reporting
#> • Use the vetiver `.Rmd` template as a place to start

Your turn 🏺

Activity

Pin your vetiver model object to a temporary board.

Retrieve the model metadata with pin_meta().

05:00

Posit Connect

Version your model

R

library(pins)
library(vetiver)

board <- board_temp()
v <- vetiver_model(inspection_fit, "inspection-result-rstats")
board |> vetiver_pin_write(v)

Python

from pins import board_temp
from vetiver import VetiverModel, vetiver_pin_write

board = board_temp(allow_pickle_read = True)
v = VetiverModel(inspection_fit, "inspection-result-python", prototype_data = X_train)
vetiver_pin_write(board, v)

Version your model

R

library(pins)
library(vetiver)

board <- board_connect()
v <- vetiver_model(inspection_fit, "julia.silge/inspection-result-rstats")
board |> vetiver_pin_write(v)

Python

from pins import board_connect
from vetiver import VetiverModel, vetiver_pin_write
from dotenv import load_dotenv
load_dotenv()

board = board_connect(allow_pickle_read = True)
v = VetiverModel(inspection_fit, "isabel.zimmerman/inspection-result-python", prototype_data = X_train)
vetiver_pin_write(board, v)
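`load_dotenv()` reads `KEY=VALUE` lines from a `.env` file into the process environment, which is how the Connect credentials reach `board_connect()` without being typed into code. A minimal sketch of that idea, assuming the variable names `CONNECT_SERVER` and `CONNECT_API_KEY` that pins looks for by default; `load_env_file` is hypothetical, and python-dotenv handles many more cases (export prefixes, escapes, multi-line values, variable expansion).

```python
import os
import tempfile

def load_env_file(path: str = ".env", override: bool = False) -> dict:
    """Minimal .env reader: copy KEY=VALUE lines into os.environ."""
    loaded = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, value = line.split("=", 1)
            key, value = key.strip(), value.strip().strip('"').strip("'")
            if override or key not in os.environ:
                os.environ[key] = value  # like load_dotenv, keep existing values by default
            loaded[key] = value
    return loaded

# Usage with placeholder values in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w") as f:
        f.write("# Posit Connect credentials (placeholders)\n")
        f.write('CONNECT_SERVER="https://connect.example.com"\n')
        f.write("CONNECT_API_KEY=not-a-real-key\n")
    loaded = load_env_file(env_path)
print(sorted(loaded))
```

Keep the real `.env` (or `.Renviron`) out of version control, since it holds a secret.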

Your turn 🏺

Activity

Either:

  • Set up Connect publishing from RStudio.
  • Create an API key for your Posit Connect server, and save it on Workbench in your working directory (in .Renviron for R or .env for Python).

Create a new vetiver model object that includes your username, and pin this vetiver model to your Connect instance.

Visit your pin’s homepage on Connect.

Train your model again, using a different ML algorithm (decision tree or random forest are good options).

Write this new version of your model to the same pin, and list the versions you have with pin_versions().

10:00
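Writing a retrained model to the same pin name adds a version instead of overwriting the old one. A toy, in-memory sketch of that behavior; `InMemoryBoard` and its methods are hypothetical and only mimic the idea behind pins' versioned boards, with timestamp-only version names and plain dicts standing in for the two fitted models.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

class InMemoryBoard:
    """Toy versioned board: each write to a pin name appends a new version."""

    def __init__(self):
        self._pins = {}  # pin name -> list of (version, metadata, object), oldest first

    def write(self, obj: Any, name: str, created: datetime, metadata: Optional[dict] = None) -> str:
        version = created.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        self._pins.setdefault(name, []).append((version, dict(metadata or {}), obj))
        return version

    def versions(self, name: str) -> list:
        return [version for version, _, _ in self._pins[name]]

    def _entry(self, name: str, version: Optional[str]):
        # With no version given, the most recent write wins
        for entry in reversed(self._pins[name]):
            if version is None or entry[0] == version:
                return entry
        raise KeyError(f"no version {version!r} of pin {name!r}")

    def read(self, name: str, version: Optional[str] = None) -> Any:
        return self._entry(name, version)[2]

    def meta(self, name: str, version: Optional[str] = None) -> dict:
        return self._entry(name, version)[1]

board = InMemoryBoard()
t0 = datetime(2023, 9, 17, 18, 43, 2, tzinfo=timezone.utc)
first = board.write({"algorithm": "logistic regression"}, "inspection-result-python", t0,
                    {"description": "A scikit-learn Pipeline model"})
board.write({"algorithm": "random forest"}, "inspection-result-python", t0 + timedelta(minutes=30),
            {"description": "A scikit-learn Pipeline model"})

print(board.versions("inspection-result-python"))
print(board.read("inspection-result-python"))         # latest version: the random forest
print(board.read("inspection-result-python", first))  # the original fit is still available
```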