TLDR; The source database for the LNRS Data Model is a DuckDB database created from R scripts that process and normalize biodiversity data. The database serves as the source of truth for the LNRS toolkit, and should enable future changes to the data model and data in a structured way.

The data for the LNRS toolkit originates from a fairly messy spreadsheet structure which poured out of the brain of the PM. It captures relationships between biodiversity priorities, areas, measures (actions) and grants.

To create a suitably structured set of datasets to build a toolkit (app) to support farmers and landowners in identifying measures to improve their land, lots of data wrangling was required to ingest the data into tables, clean, reshape and export as CSV files and GIS data for the areas.

My weapon of choice for this was R as I can iterate confidently and respond to change requests from the PM. There was significant pressure to deliver the app by November 2024, so there was some tolerance of a somewhat sub optimal process.

As the app approaches maturity, there is a need to formalise the data model, for future - me and ultimately to enable changes to the data to be implemented in a sane and structured way.

The tables generated by the R scripts were used to create a DuckDB database which is the source for the LNRS Data Model. The idea is that the DuckDB database will be the source of truth for the data model, and any changes to the data will be made in the DuckDB database.

I used Perplexity to help decompose the single large table into constituent tables in 3rd normal form. This normalization process involved identifying functional dependencies, removing transitive dependencies, and ensuring each table represents a single entity or relationship.

Once the database was built and populated I created an Entity Relationship Diagram (ERD) using Mermaid syntax. This ERD is a visual representation of the database structure, showing the relationships between the tables. Google Gemini helped with this - I needed to unquote some of the column names in the schema I fed it, otherwise it thought those were comments and didn’t render properly.

Mermaid ERD


<
Previous Post
Corey Schafer’s UV Video
>
Next Post
Workflow for diagrams with AI