Posit Conf 2024
Shannon Pileggi’s talk on Context is King (labelling)
- Label fields in dataframes to provide context. Useful for reproducibility and business continuity.
- The labelled package can help with this.
- Labels are just attributes, so they can also be set with attr().
- Labels are supported in other packages such as ggplot2 and gt.
- Use ggeasy (easy_labs()) to apply column labels to plot axes.
- Use a metadata.csv file to add labels programmatically.
- Use labelled::generate_dictionary() to create a data dictionary, which can then be saved as a table in your database.
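
A minimal sketch of the labelling workflow above, using only base R so it runs without extra packages. The `metadata` table, its column names (`variable`, `label`) and the label text are illustrative assumptions standing in for a metadata.csv file. The last step calls labelled::generate_dictionary() only if labelled happens to be installed.

```r
# Hypothetical metadata table; in practice this might come from
# read.csv("metadata.csv") with one row per column to be labelled.
metadata <- data.frame(
  variable = c("mpg", "cyl", "wt"),
  label = c("Miles per US gallon", "Number of cylinders", "Weight (1000 lbs)"),
  stringsAsFactors = FALSE
)

cars <- mtcars[, c("mpg", "cyl", "wt")]

# Labels are plain attributes, so attr() is enough to attach them.
for (i in seq_len(nrow(metadata))) {
  attr(cars[[metadata$variable[i]]], "label") <- metadata$label[i]
}

# Build a simple data dictionary from the "label" attributes.
dictionary <- data.frame(
  variable = names(cars),
  label = unname(vapply(cars, function(x) attr(x, "label"), character(1))),
  stringsAsFactors = FALSE
)
print(dictionary)

# If labelled is installed, it can produce a richer dictionary from the
# same attributes.
if (requireNamespace("labelled", quietly = TRUE)) {
  print(labelled::generate_dictionary(cars))
}
```

The `dictionary` data frame is an ordinary table, so it could be written to a database with whatever connection you already use.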
Automating package purpose documentation
The annotater package can automatically add comments describing each package a script loads.
It also works with pacman.
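
To illustrate the idea (this is not annotater's own API or implementation), here is a rough base R sketch that appends each package's DESCRIPTION title as a comment to its library() call. The helper name `annotate_library_calls` is hypothetical.

```r
# Hypothetical helper: build annotated library() calls from package titles.
# This only mimics the kind of output an annotation tool produces.
annotate_library_calls <- function(pkgs) {
  vapply(pkgs, function(pkg) {
    # Read the Title field from the installed package's DESCRIPTION file
    title <- utils::packageDescription(pkg, fields = "Title")
    # Collapse any line breaks or repeated spaces in the title
    title <- gsub("\\s+", " ", title)
    sprintf("library(%s) # %s", pkg, title)
  }, character(1), USE.NAMES = FALSE)
}

annotated <- annotate_library_calls(c("stats", "utils"))
writeLines(annotated)
```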
Tidypredict and orbital
Orbital takes a finished model and decomposes it into a series of instructions (an orbital object) that is somewhat portable. It minimises the number of variables and processing steps in the model, and works with tree-based or linear models. It also includes the pre-processing steps from a tidymodels workflow, including recipes etc.
The orbital object can be implemented inside a database (e.g. MySQL, DuckDB) as a SQL statement using the orbital_sql() function. Hence predictions can be made inside the database without using R or other data science languages.
The orbital object can also be converted to an R function and used to, for example, drive a Shiny app without the overhead of running a full model in the background. Similarly, you can convert that R function to JavaScript, Python, etc. for use in other applications.
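
The two routes described above (SQL and a standalone R function) can be sketched as follows. This assumes tidymodels, orbital, tidypredict, dbplyr, DBI and duckdb are installed. The model, the choice of predictors and the temporary file are illustrative, and the function signatures are as I understand them from the orbital documentation, so check them against your installed version.

```r
# Assumes tidymodels, orbital, tidypredict, dbplyr, DBI and duckdb are installed
library(tidymodels)
library(orbital)

# Fit a small workflow: recipe pre-processing plus a linear model
rec_spec <- recipe(mpg ~ disp + wt, data = mtcars) |>
  step_normalize(all_numeric_predictors())
wf_fit <- workflow(rec_spec, linear_reg()) |>
  fit(data = mtcars)

# Decompose the fitted workflow into an orbital object
orbital_obj <- orbital(wf_fit)

# Route 1: translate the prediction steps to SQL for a given connection
con <- DBI::dbConnect(duckdb::duckdb())
sql <- orbital_sql(orbital_obj, con)
print(sql)
DBI::dbDisconnect(con, shutdown = TRUE)

# Route 2: write the prediction steps out as a standalone R function
fn_file <- tempfile(fileext = ".R")
orbital_r_fn(orbital_obj, file = fn_file)
writeLines(readLines(fn_file))
```

The SQL from route 1 can be used in a query against a table with the same predictor columns. The file from route 2 can be sourced by an app that does not load the modelling packages.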
This is an interesting approach to model portability and deployment, allowing for efficient predictions without needing the full model context. It could, for example, be implemented in a DuckDB database to drive an Evidence app.