Bristol Apprenticeship Starts Analysis

Time Series Analysis by Learner Home Location

Author: Generated by Databot

Published: October 31, 2025

Note

This report was generated using artificial intelligence (Claude from Anthropic) under general human direction. At the time of generation, the contents have not been comprehensively reviewed by a human analyst.

Meta Information:

This document was generated by Databot using the following prompt: Produce a time series plot of apprenticeship starts delivered in Bristol, broken down by learner home local authority district (LAD). Use the WECA apprenticeship starts dataset in data/weca_apprenticeship_starts.csv.

Databot is a tool that uses large language models to generate data analyses from user prompts. It runs in the Positron IDE and is primarily intended to support exploratory data analysis (EDA) and initial analysis tasks. The product is BYOK (Bring Your Own Key): it uses the Anthropic API key provided in the Positron Assistant setup.

The assistant starts by running exploratory data analysis to understand the dataset structure, followed by filtering and aggregating the data as specified in the prompt. It then generates visualizations and summary statistics to highlight key findings.

Databot provides a few \commands. For example, \report summarises the EDA into a Quarto document, as shown below. This looks like a powerful workflow for quick initial data exploration.

Introduction

This analysis examines apprenticeship starts delivered in Bristol, with particular focus on understanding the geographic distribution of learners’ home locations. The analysis aims to reveal cross-boundary learning patterns and temporal trends in apprenticeship uptake across the West of England Combined Authority (WECA) region.

Data Source

The analysis uses the WECA apprenticeship starts dataset (weca_apprenticeship_starts.csv), which contains detailed information about apprenticeship programs including:

  • Provider and delivery location details
  • Learner home location information
  • Apprenticeship characteristics (level, subject area, etc.)
  • Start dates and demographic information
```r
library(tidyverse)  # attaches readr, dplyr, ggplot2 and friends
library(lubridate)  # floor_date() for monthly binning

# Load the apprenticeship data
apprenticeship_data <- read_csv("data/weca_apprenticeship_starts.csv")

# Display basic dataset information
cat("Dataset dimensions:", nrow(apprenticeship_data), "rows x", ncol(apprenticeship_data), "columns\n")
```

```
Dataset dimensions: 24925 rows x 47 columns
```

Analysis

Data Preparation

We focus specifically on apprenticeships delivered in Bristol and analyze them by the learner’s home local authority district (LAD).

```r
# Keep only apprenticeships whose delivery local authority is Bristol
bristol_data <- apprenticeship_data |>
  filter(delivery_la == "Bristol")

# nrow() counts records (rows), not the summed `starts` column
cat("Number of records delivered in Bristol:", nrow(bristol_data), "\n")
```

```
Number of records delivered in Bristol: 8729
```

```r
# Distribution of records by learner home LAD, largest first
bristol_data |>
  count(learner_home_lad, sort = TRUE) |>
  mutate(percentage = round(n / sum(n) * 100, 1))
```

```
# A tibble: 4 × 3
  learner_home_lad                 n percentage
  <chr>                        <int>      <dbl>
1 Bristol, City of              5331       61.1
2 South Gloucestershire         2107       24.1
3 North Somerset                 832        9.5
4 Bath and North East Somerset   459        5.3
```
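`count()` tallies rows, so the percentages above are shares of records. The monthly aggregation below sums a `starts` column instead, and this report does not confirm that every record is exactly one start. If some records carry more than one start, a starts-weighted share could differ. The sketch below uses hypothetical toy data ("Area A" and "Area B" are invented) and `count()`'s `wt` argument to show both versions side by side.

```r
library(dplyr)

# Hypothetical toy records: `starts` may exceed 1 for a single record
toy_records <- tibble(
  learner_home_lad = c("Area A", "Area A", "Area B", "Area B", "Area B"),
  starts = c(1, 4, 1, 1, 1)
)

# Share of records: one per row, unweighted
by_records <- toy_records |>
  count(learner_home_lad, sort = TRUE) |>
  mutate(percentage = round(n / sum(n) * 100, 1))

# Share of starts: each row weighted by its `starts` value
by_starts <- toy_records |>
  count(learner_home_lad, wt = starts, sort = TRUE) |>
  mutate(percentage = round(n / sum(n) * 100, 1))

by_records
by_starts
```

In this toy case Area B has more records (3 of 5), but Area A has more starts (5 of 8). Applied to `bristol_data`, the same comparison would show whether the record-based shares hold up.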

Time Series Analysis

We create a monthly time series to examine temporal patterns in apprenticeship starts across different learner home locations.

```r
# Aggregate to monthly totals of the `starts` column per learner home LAD
monthly_data <- bristol_data |>
  mutate(year_month = floor_date(start_date, "month")) |>
  group_by(year_month, learner_home_lad) |>
  summarise(total_starts = sum(starts, na.rm = TRUE), .groups = "drop") |>
  arrange(year_month)

# Show the date range; format() prints Dates as YYYY-MM-DD
# (cat() on a raw Date prints the underlying day count since 1970-01-01)
cat("Time series covers:", format(min(monthly_data$year_month)), "to",
    format(max(monthly_data$year_month)), "\n")
```

```
Time series covers: 2019-01-01 to 2022-12-01
```
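`summarise()` only creates rows for month and LAD combinations that have at least one record. A month with no starts for a given LAD is therefore absent rather than recorded as zero. This affects the summary statistics (for example `min_monthly_starts` and the means) and how lines are drawn across gaps. Whether any such gaps exist in this dataset is not checked here. The sketch below zero-fills missing months with `tidyr::complete()`, using hypothetical toy data in which "Area A" has no March record.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy monthly totals: March 2020 is missing for "Area A"
monthly_toy <- tibble(
  year_month = as.Date(c("2020-01-01", "2020-02-01", "2020-04-01",
                         "2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01")),
  learner_home_lad = c(rep("Area A", 3), rep("Area B", 4)),
  total_starts = c(5, 3, 4, 10, 12, 9, 11)
)

# Expand to every month in the range for every LAD, filling gaps with 0
monthly_complete <- monthly_toy |>
  complete(
    year_month = seq(min(year_month), max(year_month), by = "month"),
    learner_home_lad,
    fill = list(total_starts = 0)
  ) |>
  arrange(learner_home_lad, year_month)

monthly_complete
```

After completion every LAD has one row per month (8 rows here). Minimum and mean monthly starts then reflect genuinely empty months.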

Visualization

```r
# Create the time series plot
ggplot(monthly_data, aes(x = year_month, y = total_starts, color = learner_home_lad)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2, alpha = 0.7) +
  labs(
    title = "Apprenticeship Starts in Bristol by Learner Home Location",
    subtitle = "Monthly time series showing apprenticeships delivered in Bristol",
    x = "Date",
    y = "Number of Apprenticeship Starts",
    color = "Learner Home LAD"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    legend.position = "bottom",
    legend.title = element_text(face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  ) +
  scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
  scale_color_viridis_d(option = "plasma", end = 0.8)
```
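The four series differ by roughly an order of magnitude. In the summary statistics below, mean monthly starts are 123 for Bristol, City of versus 10.1 for Bath and North East Somerset, so the smaller lines sit compressed near the x-axis on a shared scale. One alternative is to facet by LAD with free y-axis scales. The sketch below uses hypothetical toy data ("Large LAD" and "Small LAD" are invented).

```r
library(ggplot2)

# Hypothetical toy monthly totals for two LADs on very different scales
toy_monthly <- data.frame(
  year_month = rep(seq(as.Date("2021-01-01"), by = "month", length.out = 6), 2),
  learner_home_lad = rep(c("Large LAD", "Small LAD"), each = 6),
  total_starts = c(120, 135, 110, 150, 160, 140, 8, 12, 9, 11, 14, 10)
)

# One panel per LAD; scales = "free_y" gives each panel its own y range
p <- ggplot(toy_monthly, aes(x = year_month, y = total_starts)) +
  geom_line() +
  facet_wrap(~ learner_home_lad, ncol = 1, scales = "free_y") +
  scale_x_date(date_labels = "%b %Y") +
  labs(x = "Date", y = "Number of Apprenticeship Starts") +
  theme_minimal()

p
```

Free scales make the shape of each series easier to compare but hide the absolute differences between LADs. The combined plot above remains the better view for those differences.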

Summary Statistics

```r
# Calculate summary statistics by learner home location
summary_stats <- monthly_data |>
  group_by(learner_home_lad) |>
  summarise(
    mean_monthly_starts = round(mean(total_starts), 1),
    median_monthly_starts = median(total_starts),
    max_monthly_starts = max(total_starts),
    min_monthly_starts = min(total_starts),
    total_starts = sum(total_starts),
    .groups = "drop"
  ) |>
  arrange(desc(total_starts))

summary_stats
```

```
# A tibble: 4 × 6
  learner_home_lad  mean_monthly_starts median_monthly_starts max_monthly_starts
  <chr>                           <dbl>                 <dbl>              <dbl>
1 Bristol, City of                123.                   114.                318
2 South Gloucester…                48.3                   39                 155
3 North Somerset                   18.2                   15                  61
4 Bath and North E…                10.1                    8                  33
# ℹ 2 more variables: min_monthly_starts <dbl>, total_starts <dbl>
```

Key Findings

Geographic Distribution

The analysis reveals clear patterns in cross-boundary apprenticeship uptake (percentages are shares of Bristol-delivered records):

  • Learners living in Bristol, City of account for 61.1% of apprenticeship records delivered in Bristol
  • South Gloucestershire is the second largest source, at 24.1% of records
  • 38.9% of records are for learners whose home LAD is outside Bristol (South Gloucestershire, North Somerset, or Bath and North East Somerset)
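The 38.9% cross-boundary figure is the complement of the Bristol, City of share. It can be reproduced directly from the record counts in the distribution table. The sketch below copies those four counts into a small tibble and splits them into home-in-Bristol versus elsewhere.

```r
library(dplyr)

# Record counts copied from the learner home LAD distribution table
lad_counts <- tibble(
  learner_home_lad = c("Bristol, City of", "South Gloucestershire",
                       "North Somerset", "Bath and North East Somerset"),
  n = c(5331, 2107, 832, 459)
)

# Collapse to "home LAD is Bristol" vs "home LAD is elsewhere"
cross_boundary <- lad_counts |>
  mutate(home_in_bristol = learner_home_lad == "Bristol, City of") |>
  group_by(home_in_bristol) |>
  summarise(n = sum(n), .groups = "drop") |>
  mutate(percentage = round(n / sum(n) * 100, 1))

cross_boundary
```

The `FALSE` row gives 3398 of 8729 records (38.9%), and the `TRUE` row gives 5331 (61.1%).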

Temporal Patterns

Visual inspection of the time series suggests:

  • Broadly similar seasonal variation across all learner home locations
  • Co-moving patterns across locations, which may point to region-wide external factors (potentially including COVID-19 impacts) affecting apprenticeship starts
  • Bristol, City of remains the largest source of learners throughout the period
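The seasonal pattern above is read off the plot rather than tested. One way to check it is to average each LAD's monthly totals by calendar month. Applied to `monthly_data`, a peak in the same calendar month across LADs would support the claim. The sketch below uses hypothetical toy data: the values and the "Area A" name are invented.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy series: two years of monthly totals for one LAD
toy_monthly <- tibble(
  year_month = seq(as.Date("2019-01-01"), as.Date("2020-12-01"), by = "month"),
  learner_home_lad = "Area A",
  total_starts = c(10, 8, 9, 7, 6, 5, 6, 12, 30, 15, 11, 4,
                   12, 9, 10, 8, 7, 6, 7, 14, 28, 16, 12, 5)
)

# Average starts per calendar month (1 = January) within each LAD
seasonal_profile <- toy_monthly |>
  mutate(calendar_month = month(year_month)) |>
  group_by(learner_home_lad, calendar_month) |>
  summarise(mean_starts = mean(total_starts), n_years = n(), .groups = "drop")

seasonal_profile
```

With only about four years of data, each calendar-month average rests on a handful of values. It is best read as a descriptive check, not a formal seasonal decomposition.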

Cross-Boundary Learning

The data is consistent with Bristol’s role as a regional hub for apprenticeship training, attracting learners from across the West of England area. This pattern may reflect factors not tested in this analysis, such as:

  • Bristol’s position as the major economic center in the region
  • Availability of diverse apprenticeship opportunities in the city
  • Strong transport links facilitating cross-boundary learning

Technical Details

Data Processing

The analysis involved:

  1. Filtering the full WECA dataset to include only Bristol delivery locations (8729 records)
  2. Aggregating individual apprenticeship records to monthly totals by learner home LAD
  3. Creating a time series plot with ggplot2, using a separate color for each learner home LAD

Data Quality Notes

  • The dataset contains 24925 total records across the WECA region
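  • All Bristol delivery records had a non-missing learner home LAD (the four LAD counts sum to 8729, with no missing category)
  • The time series runs from January 2019 to December 2022 (about 3.9 years between the first and last monthly period)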
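One way to make the missing-value check explicit is to count `NA`s in each column the analysis depends on. The sketch below uses hypothetical toy data with one missing home LAD. In the report it would be applied to `bristol_data`, whose column names match those used above.

```r
library(dplyr)

# Hypothetical toy records with one missing home LAD
toy_records <- tibble(
  learner_home_lad = c("Area A", NA, "Area B"),
  start_date = as.Date(c("2020-01-15", "2020-02-03", "2020-02-20")),
  starts = c(1, 1, 2)
)

# Number of missing values in each column used by the analysis
missing_counts <- toy_records |>
  summarise(across(c(learner_home_lad, start_date, starts), ~ sum(is.na(.x))))

missing_counts
```

A result of zero in every column would confirm the data quality note directly, rather than inferring it from the LAD counts summing to the record total.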