Deploy and maintain models with vetiver
Welcome!
Wi-Fi network name
Posit Conf 2023
Wi-Fi password
conf2023
You have intermediate R or Python knowledge
You can read data from CSV and other flat files, transform and reshape data, and make a wide variety of graphs
You can fit a model to data with your modeling framework of choice
You have exposure to basic modeling and machine learning practice
You do not need expert familiarity with advanced ML or MLOps topics
🟪 “I’m stuck and need help!”
🟩 “I finished the exercise”
If you prefer, post on GitHub Discussions for help:
Optional
Post an introduction on GitHub Discussions: https://github.com/posit-conf-2023/vetiver/discussions
Illustration credit: https://vas3k.com/blog/machine_learning/
Illustration credit: Chapter 1 of Tidy Modeling with R
MLOps is a set of practices to deploy and maintain machine learning models in production reliably and efficiently
Vetiver, the oil of tranquility, is used as a stabilizing ingredient in perfumery to preserve more volatile fragrances.
you can operationalize that model!
you likely should be the one to operationalize that model!
Activity
What language does your team use for machine learning?
What kinds of models do you commonly use?
Have you ever deployed a model?
03:00
Activity
Log in at pos.it/class with the identifier vetiver2023
Start a new session, either RStudio or VS Code.
In your new session, open the folder class-work in the vetiver directory, and choose the first Quarto file!
05:00
Image by Christopher Alvarenga
N = 6967
results is the outcome (PASS or FAIL)
facility_type is a nominal predictor
risk is a nominal (or maybe ordinal) predictor
total_violations is a numeric predictor
inspection_date is a date predictor

results | inspection_date | aka_name | facility_type | risk | total_violations |
---|---|---|---|---|---|
PASS | 2019-10-16 | ORIGINAL MAXWELL STREET GRILL | RESTAURANT | RISK 2 (MEDIUM) | 30 |
PASS | 2022-10-20 | THE DAILY GRIND | RESTAURANT | RISK 2 (MEDIUM) | 0 |
FAIL | 2020-03-05 | KRISPY'S SEA FOOD AND CHICKEN | RESTAURANT | RISK 2 (MEDIUM) | 25 |
FAIL | 2020-05-18 | SLIM'S | RESTAURANT | RISK 1 (HIGH) | 41 |
FAIL | 2019-11-21 | FOOD STOP | GROCERY STORE | RISK 3 (LOW) | 5 |
FAIL | 2019-05-30 | METROPOLITAN WATER RECLAMATION | RESTAURANT | RISK 1 (HIGH) | 18 |
FAIL | 2019-11-07 | STOCKTON | RESTAURANT | RISK 3 (LOW) | 0 |
PASS | 2020-08-25 | U.B. DOGS | RESTAURANT | RISK 1 (HIGH) | 28 |
PASS | 2022-08-10 | BOXCAR BETTYS | RESTAURANT | RISK 1 (HIGH) | 5 |
PASS | 2021-06-25 | SIZZLIN SKILLETS | RESTAURANT | RISK 1 (HIGH) | 18 |
PASS | 2020-10-27 | FINEST FOOD & SUBS | GROCERY STORE | RISK 2 (MEDIUM) | 30 |
PASS | 2020-10-07 | CARNITAS & TACOS MARAVATIO INC. | RESTAURANT | RISK 1 (HIGH) | 40 |
PASS | 2020-08-27 | DAN'S HOT DOG STAND | RESTAURANT | RISK 2 (MEDIUM) | 56 |
PASS | 2022-12-05 | MURA MURA RAMEN | RESTAURANT | RISK 1 (HIGH) | 8 |
PASS | 2022-11-28 | TAQUERIA LOS GALLOS | RESTAURANT | RISK 1 (HIGH) | 107 |
Activity
Explore the inspections data on your own!
What do you notice about results?
What do you notice about total_violations?
Share something you noticed with your neighbor.
08:00
library(tidyverse)

inspections |>
  group_by(inspection_date = floor_date(inspection_date, unit = "month")) |>
  summarise(results = mean(results == "PASS")) |>
  ggplot(aes(inspection_date, results)) +
  geom_line(alpha = 0.8, linewidth = 1.5) +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "% of inspections that have a PASS result", x = NULL)
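For Python users, the same monthly pass rate can be sketched with pandas. This is a minimal illustration, not the workshop's own code: the `inspections` frame below holds only six rows copied from the sample table, and the name `monthly_pass_rate` is made up for this example.

```python
import pandas as pd

# Six rows copied from the sample table; the full data set has N = 6967.
inspections = pd.DataFrame(
    {
        "results": ["PASS", "FAIL", "FAIL", "FAIL", "PASS", "PASS"],
        "inspection_date": [
            "2019-10-16", "2019-11-07", "2019-11-21",
            "2020-03-05", "2020-08-25", "2020-08-27",
        ],
    }
)
inspections["inspection_date"] = pd.to_datetime(inspections["inspection_date"])

# Floor each date to the start of its month, then take the share of PASS results.
monthly_pass_rate = (
    inspections
    .assign(month=inspections["inspection_date"].dt.to_period("M").dt.to_timestamp())
    .groupby("month")["results"]
    .apply(lambda s: (s == "PASS").mean())
)
print(monthly_pass_rate)
```

With matplotlib installed, `monthly_pass_rate.plot()` would draw a line chart comparable to the ggplot2 version.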
import numpy as np
import pandas as pd
from sklearn import model_selection

np.random.seed(123)

# Derive month and year features from the inspection date
inspections['inspection_date'] = pd.to_datetime(inspections['inspection_date'])
inspections['month'] = inspections['inspection_date'].dt.month
inspections['year'] = inspections['inspection_date'].dt.year

X, y = inspections.drop(columns=['aka_name', 'results', 'inspection_date']), inspections['results']
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y,
    test_size = 0.2
)
Or your model of choice!
from sklearn import preprocessing, linear_model, pipeline, compose

categorical_features = ['facility_type', 'risk', 'month']

# Ordinal-encode the categorical columns; unseen categories map to -1
oe = compose.make_column_transformer(
    (
        preprocessing.OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
        categorical_features,
    ),
    remainder="passthrough",
).fit(X_train)
lr = linear_model.LogisticRegression().fit(oe.transform(X_train), y_train)
inspection_fit = pipeline.Pipeline([("ordinal_encoder", oe), ("logistic_regression", lr)])
Activity
Split your data in training and testing.
Fit a model to your training data.
05:00
Activity
Create your vetiver model object.
Check out the default description that is created, and try out using a custom description.
Show your custom description to your neighbor.
05:00
The pins package publishes data, models, and other R and Python objects, making it easy to share them across projects and with your colleagues.
You can pin objects to a variety of pin boards, including local folders, Posit Connect, Amazon S3, Azure Storage, and Google Cloud Storage.
Learn about the pins package for Python and for R
from pins import board_temp
from vetiver import vetiver_pin_write
board = board_temp(allow_pickle_read = True)
vetiver_pin_write(board, v)
#> Model Cards provide a framework for transparent, responsible reporting.
#> Use the vetiver `.qmd` Quarto template as a place to start,
#> with vetiver.model_card()
#> Writing pin:
#> Name: 'inspection-result-python'
#> Version: 20230917T184302Z-0e8f6
library(pins)
library(vetiver)
board <- board_temp()
board |> vetiver_pin_write(v)
#> Creating new version '20230917T184302Z-977d3'
#> Writing to pin 'inspection-result-rstats'
#>
#> Create a Model Card for your published model
#> • Model Cards provide a framework for transparent, responsible reporting
#> • Use the vetiver `.Rmd` template as a place to start
Activity
Pin your vetiver model object to a temporary board.
Retrieve the model metadata with pin_meta().
05:00
from pins import board_connect
from vetiver import VetiverModel, vetiver_pin_write
from dotenv import load_dotenv
load_dotenv()
board = board_connect(allow_pickle_read = True)
v = VetiverModel(inspection_fit, "isabel.zimmerman/inspection-result-python", prototype_data = X_train)
vetiver_pin_write(board, v)
Activity
Either:
Store your Connect credentials as environment variables (in .Renviron for R or .env for Python).
Create a new vetiver model object that includes your username, and pin this vetiver model to your Connect instance.
Visit your pin’s homepage on Connect.
Train your model again, using a different ML algorithm (decision tree or random forest are good options).
Write this new version of your model to the same pin, and see what versions you have with pin_versions().
10:00