Subway Metrics: MTA Subway Ridership Trends During COVID and Its Aftermath

Author

Dhruv Sharma

πŸš‡ Introduction

Since the COVID-19 pandemic began in early 2020, remote work has dramatically transformed commuting patterns in New York City. While subway ridership plummeted during lockdowns, its recovery has been uneven β€” with weekday traffic still lagging behind pre-pandemic levels. As hybrid work becomes the new norm, transit agencies face mounting pressure to understand these shifts and adjust services accordingly.

This report explores the question:

🧠 How has the rise of remote work since COVID-19 influenced subway ridership patterns across NYC by time and geography?

We use cleaned hourly MTA ridership data (2020–2023), ZIP-level census data on remote work (2019 & 2023), and official MTA station-level annual reports. Our goal is to quantify where and how ridership has rebounded β€” or failed to β€” and what role the remote work revolution played in shaping this trend.

To answer this overarching question, our team tackled four specific subquestions:

  1. When did subway ridership recover?
    Did weekday vs weekend patterns evolve across Pre-COVID, Core COVID, and WFH eras?

  2. Where did remote work increase the most?
    How do ZIP-level WFH shifts relate to ridership decline?

  3. Which stations suffered most?
    Which locations experienced the steepest and most persistent drops?

  4. Can we model the relationship?
    Does growth in remote work statistically predict subway ridership loss?

Each section that follows answers one of these questions, then we bring it all together in a final synthesis to reflect on the MTA’s future in a remote-first world.

πŸ” Why Preload All Datasets?

To improve reproducibility and speed up rendering, we preloaded all cleaned datasets at once. This avoids rerunning large cleaning steps and lets each analysis section focus on insights, not reprocessing. All raw-to-clean transformations were handled separately and saved as .csv or .rds files, keeping the report modular and render-friendly.

Code
# Suppress installation output and load libraries
suppressMessages({
  suppressWarnings({
    required_packages <- c(
      "tidyverse", "readr", "janitor", "lubridate", "data.table",
      "sf", "scales", "readxl",
      "ggplot2", "ggthemes", "gganimate", "viridis", "scico",
      "broom", "kableExtra", "plotly"
    )
    to_install <- setdiff(required_packages, rownames(installed.packages()))
    if (length(to_install)) install.packages(to_install, repos = "https://cloud.r-project.org")
    invisible(lapply(required_packages, library, character.only = TRUE))
  })
})

# Ensure required folders exist
if (!dir.exists("data/raw")) dir.create("data/raw", recursive = TRUE)
if (!dir.exists("data/cleaned")) dir.create("data/cleaned", recursive = TRUE)

# Load pre-cleaned datasets
mta_hourly_filtered <- readRDS("data/cleaned/mta_hourly_filtered_2020_2023.rds")
acs_2019 <- read_csv("data/acs_b08128_2019.csv") |> clean_names()
acs_2023 <- read_csv("data/acs_b08128_2023.csv") |> clean_names()
acs_joined <- read_csv("data/cleaned/acs_joined.csv")
mta_zip_summary <- read_csv("data/cleaned/mta_zip_summary.csv")

# Load and clean 2023 MTA station-level data
mta_2023 <- read_excel("data/mta_2023_ridership.xlsx", sheet = "Annual Total", skip = 1) |> clean_names()
mta_2023_clean <- mta_2023 |> 
  select(
    station = station_alphabetical_by_borough,
    borough = boro,
    `2019` = x2019, 
    `2020` = x2020,
    `2021` = x2021, 
    `2022` = x2022, 
    `2023` = x2023
  ) |> 
  filter(!is.na(station)) |> 
  filter(!str_detect(station, "(?i)weekday|weekend|total|average")) |> 
  mutate(borough = case_when(
    borough == "Bx" ~ "Bronx",
    borough == "M"  ~ "Manhattan",
    borough == "B"  ~ "Brooklyn",
    borough == "Q"  ~ "Queens",
    borough == "SI" ~ "Staten Island",
    TRUE ~ borough
  ))

# Load ZIP shapefile
zip_shapefile <- st_read("data/tl_2020_us_zcta510.shp", quiet = TRUE)

# Table styling function
mta_table_style <- function(tbl, caption_text, digits = NULL, highlight_column = NULL) {
  kbl <- if (!is.null(digits)) {
    kable(tbl, format = "html", caption = caption_text, digits = digits, escape = FALSE)
  } else {
    kable(tbl, format = "html", caption = caption_text, escape = FALSE)
  }

  styled <- kbl %>%
    kable_styling(
      bootstrap_options = c("striped", "hover", "condensed", "responsive", "bordered"),
      full_width = FALSE, position = "center", font_size = 13
    ) %>%
    row_spec(0, background = "#0039A6", color = "white", bold = TRUE)

  if (!is.null(highlight_column) && highlight_column %in% colnames(tbl)) {
    borough_colors <- c(
      "Manhattan" = "#EE352E",
      "Bronx" = "#00933C",
      "Brooklyn" = "#0039A6",
      "Queens" = "#FF6319",
      "Staten Island" = "#808183"
    )
    styled <- styled %>%
      column_spec(which(colnames(tbl) == highlight_column),
        background = borough_colors[tbl[[highlight_column]]], color = "white"
      )
  }

  return(styled)
}
Code
# Display sample rows of ACS remote work data
acs_joined %>%
  head(20) %>%
  mta_table_style("Sample of Remote Work Data by ZIP Code")
Sample of Remote Work Data by ZIP Code
zip_code total_2019 wfh_2019 total_2023 wfh_2023 wfh_rate_2019 wfh_rate_2023 wfh_shift
06390 75 0 30 4 0.0000000 0.1333333 0.1333333
10001 15060 1356 17539 1237 0.0900398 0.0705285 -0.0195113
10002 32709 2683 32990 2305 0.0820264 0.0698697 -0.0121567
10003 31668 2913 31771 2218 0.0919856 0.0698121 -0.0221735
10004 2384 123 2537 230 0.0515940 0.0906583 0.0390643
10005 6773 124 7510 110 0.0183080 0.0146471 -0.0036609
10006 2329 111 3040 8 0.0476599 0.0026316 -0.0450284
10007 4120 297 4897 373 0.0720874 0.0761691 0.0040817
10009 30825 2216 30672 1546 0.0718897 0.0504043 -0.0214854
10010 21721 1449 18227 1227 0.0667096 0.0673177 0.0006081
10011 32881 2752 30533 2469 0.0836958 0.0808633 -0.0028324
10012 14497 1029 14136 1412 0.0709802 0.0998868 0.0289066
10013 15648 1292 15202 1088 0.0825665 0.0715695 -0.0109969
10014 20428 1930 19679 1379 0.0944782 0.0700747 -0.0244035
10016 36368 2041 36491 2070 0.0561208 0.0567263 0.0006056
10017 10761 790 10164 1065 0.0734133 0.1047816 0.0313683
10018 6163 651 5677 369 0.1056304 0.0649991 -0.0406313
10019 29512 1918 27964 1962 0.0649905 0.0701616 0.0051711
10020 0 0 0 0 NA NA NA
10021 24306 1997 22726 1329 0.0821608 0.0584793 -0.0236815
Code
# Display sample rows of cleaned MTA station-level ridership
mta_2023_clean %>%
  head(20) %>%
  mta_table_style("MTA 2023 Station-Level Ridership Snapshot")
MTA 2023 Station-Level Ridership Snapshot
station borough 2019 2020 2021 2022 2023
The Bronx NA NA NA NA NA NA
138 St-Grand Concourse (4,5) Bronx 1035878 371408.0 656866 766610 785271
149 St-Grand Concourse (2,4,5) Bronx 3931908 1815785.0 1832521 2026363 2087779
161 St-Yankee Stadium (B,D,4) Bronx 8254928 3221651.0 4077604 5023193 5316351
167 St (4) Bronx 2653237 1396287.0 1615072 1847368 1901393
167 St (B,D) Bronx 2734530 1422149.0 1508270 1492833 1411144
170 St (4) Bronx 2487611 1265950.0 1278506 1499662 1448193
170 St (B,D) Bronx 2130461 1002094.9 1104637 1121869 1001022
174 St (2,5) Bronx 2057118 953564.1 1077126 1140821 1118781
174-175 Sts (B,D) Bronx 1518260 788121.0 853579 859988 774167
176 St (4) Bronx 1713696 876865.0 939585 1055833 1041352
182-183 Sts (B,D) Bronx 1513443 761613.0 812994 866961 824845
183 St (4) Bronx 1779224 951634.0 1051456 1196389 1188844
219 St (2,5) Bronx 979390 457388.0 495442 530795 490047
225 St (2,5) Bronx 1187486 549296.1 605491 621548 589541
231 St (1) Bronx 2919305 1289691.0 1462605 1810807 1894047
233 St (2,5) Bronx 1445532 721495.0 796596 845998 845056
238 St (1) Bronx 1204095 588199.1 678017 872799 959752
3 Av-138 St (6) Bronx 2503850 1271191.9 1359371 1503905 1750592
3 Av-149 St (2,5) Bronx 6768255 3166766.0 3301418 3330977 3333256
Code
# Display sample rows of ZIP-level ridership summary
mta_zip_summary %>%
  head(20) %>%
  mta_table_style("Ridership Change Summary by ZIP Code (2019 vs 2023)")
Ridership Change Summary by ZIP Code (2019 vs 2023)
zip_code ridership_2019 ridership_2023 ridership_pct_change
10001 105604522 67128854 -0.3643373
10002 22671234 17522699 -0.2270955
10003 39669660 27284600 -0.3122048
10004 18634716 10570891 -0.4327313
10005 14803279 8337401 -0.4367869
10006 8802040 6091650 -0.3079275
10007 16717072 10802028 -0.3538325
10009 5345371 5745700 0.0748927
10010 24364973 15093515 -0.3805240
10011 43525291 31529750 -0.2755993
10012 20552119 15366756 -0.2523031
10013 30240003 19892831 -0.3421684
10014 7909125 5567039 -0.2961245
10016 14769889 10047100 -0.3197579
10019 38009762 26687159 -0.2978867
10021 17350177 12710269 -0.2674271
10022 18957465 11339465 -0.4018470
10023 19447816 13857925 -0.2874303
10024 4745863 3519664 -0.2583722
10025 19775411 14165002 -0.2837063

Task 2: Cleaning NYC Subway Ridership Data

To prep the hourly MTA subway dataset for analysis, we: - Parsed timestamps from character format - Extracted calendar fields (year, month, hour, etc.) - Labeled COVID eras (Pre-COVID, Core COVID, WFH Era) - Classified weekdays vs weekends - Removed rows with negative ridership or transfer counts

The final dataset includes valid observations from 2020 to 2023 and is saved for reuse.

Code
# Load libraries
library(tidyverse)
library(lubridate)
library(janitor)

# Load hourly ridership data (filtered from 2020–2023)
mta_hourly <- readRDS("data/cleaned/mta_hourly_filtered_2020_2023.rds")

# 1. Parse datetime and derive temporal fields
mta_hourly <- mta_hourly |> 
  filter(!is.na(transit_timestamp) & str_detect(transit_timestamp, "^\\d{4}-\\d{2}-\\d{2}")) |> 
  mutate(
    transit_timestamp = gsub("\\.000", "", transit_timestamp),
    transit_timestamp = ymd_hms(transit_timestamp, quiet = TRUE),
    year = year(transit_timestamp),
    month = month(transit_timestamp, label = TRUE),
    day = day(transit_timestamp),
    hour = hour(transit_timestamp),
    weekday = wday(transit_timestamp, label = TRUE),
    date = as_date(transit_timestamp)
  )

# 2. Define COVID-era categories
mta_hourly <- mta_hourly |> 
  mutate(
    covid_era = case_when(
      year == 2019 ~ "Pre-COVID",
      year %in% c(2020, 2021) ~ "Core COVID",
      year %in% c(2022, 2023) ~ "WFH Era",
      TRUE ~ "Unknown"
    )
  )

# 3. Add weekday/weekend classification
mta_hourly <- mta_hourly |> 
  mutate(day_type = if_else(weekday %in% c("Sat", "Sun"), "Weekend", "Weekday"))

# 4. Filter to reasonable values
mta_hourly <- mta_hourly |> 
  filter(ridership >= 0, transfers >= 0)

# 5. Save cleaned version
write_csv(mta_hourly, "data/cleaned/mta_hourly_cleaned.csv")
saveRDS(mta_hourly, "data/cleaned/mta_hourly_cleaned.rds")
Code
# Show a sample
mta_hourly %>%
  head(20) %>%
  mta_table_style("🧾 Sample of Cleaned MTA Hourly Ridership Data")
🧾 Sample of Cleaned MTA Hourly Ridership Data
transit_timestamp transit_mode station_complex_id station_complex borough payment_method fare_class_category ridership transfers latitude longitude georeference year month day hour weekday date covid_era day_type
2022-02-12 01:00:00 subway 384 Burnside Av (4) Bronx omny OMNY - Full Fare 3 0 40.85345 -73.90768 POINT (-73.907684 40.853455) 2022 Feb 12 1 Sat 2022-02-12 WFH Era Weekend
2022-02-12 07:00:00 subway 388 167 St (4) Bronx metrocard Metrocard - Other 10 1 40.83554 -73.92140 POINT (-73.9214 40.835537) 2022 Feb 12 7 Sat 2022-02-12 WFH Era Weekend
2022-02-12 15:00:00 subway 295 231 St (1) Bronx omny OMNY - Full Fare 68 11 40.87886 -73.90483 POINT (-73.90483 40.878857) 2022 Feb 12 15 Sat 2022-02-12 WFH Era Weekend
2022-02-12 09:00:00 subway 68 Bay Pkwy (D) Brooklyn omny OMNY - Full Fare 33 1 40.60187 -73.99373 POINT (-73.99373 40.601875) 2022 Feb 12 9 Sat 2022-02-12 WFH Era Weekend
2022-02-12 01:00:00 subway 311 86 St (1) Manhattan omny OMNY - Full Fare 14 0 40.78864 -73.97622 POINT (-73.97622 40.788643) 2022 Feb 12 1 Sat 2022-02-12 WFH Era Weekend
2022-02-12 13:00:00 subway 52 Avenue U (Q) Brooklyn omny OMNY - Full Fare 66 9 40.59930 -73.95593 POINT (-73.95593 40.5993) 2022 Feb 12 13 Sat 2022-02-12 WFH Era Weekend
2022-02-12 10:00:00 subway 32 36 St (D,N,R) Brooklyn omny OMNY - Full Fare 69 5 40.65514 -74.00355 POINT (-74.00355 40.655144) 2022 Feb 12 10 Sat 2022-02-12 WFH Era Weekend
2022-02-12 02:00:00 subway 321 18 St (1) Manhattan omny OMNY - Full Fare 19 0 40.74104 -73.99787 POINT (-73.99787 40.74104) 2022 Feb 12 2 Sat 2022-02-12 WFH Era Weekend
2022-02-12 11:00:00 subway 244 Ditmas Av (F) Brooklyn omny OMNY - Full Fare 12 0 40.63612 -73.97817 POINT (-73.97817 40.63612) 2022 Feb 12 11 Sat 2022-02-12 WFH Era Weekend
2023-03-22 08:00:00 subway 438 135 St (2,3) Manhattan metrocard Metrocard - Unlimited 30-Day 132 0 40.81423 -73.94077 POINT (-73.94077 40.814228) 2023 Mar 22 8 Wed 2023-03-22 WFH Era Weekday
2023-05-19 21:00:00 subway 297 215 St (1) Manhattan metrocard Metrocard - Full Fare 6 0 40.86945 -73.91528 POINT (-73.915276 40.869446) 2023 May 19 21 Fri 2023-05-19 WFH Era Weekday
2023-03-22 21:00:00 subway 337 Nevins St (2,3,4,5) Brooklyn metrocard Metrocard - Seniors & Disability 9 1 40.68825 -73.98049 POINT (-73.98049 40.688248) 2023 Mar 22 21 Wed 2023-03-22 WFH Era Weekday
2023-05-19 15:00:00 subway 253 Neptune Av (F) Brooklyn metrocard Metrocard - Fair Fare 3 0 40.58101 -73.97457 POINT (-73.97457 40.581013) 2023 May 19 15 Fri 2023-05-19 WFH Era Weekday
2023-03-22 07:00:00 subway 437 145 St (3) Manhattan metrocard Metrocard - Fair Fare 24 0 40.82042 -73.93625 POINT (-73.93625 40.82042) 2023 Mar 22 7 Wed 2023-03-22 WFH Era Weekday
2022-02-12 20:00:00 subway 294 238 St (1) Bronx omny OMNY - Full Fare 14 1 40.88467 -73.90087 POINT (-73.90087 40.884666) 2022 Feb 12 20 Sat 2022-02-12 WFH Era Weekend
NA subway 223 Lexington Av/63 St (F,Q) Manhattan omny OMNY - Full Fare 64 14 40.76463 -73.96611 POINT (-73.96611 40.76463) NA NA NA NA NA NA Unknown Weekday
2022-02-12 15:00:00 subway 97 Myrtle Av (M,J,Z) Brooklyn omny OMNY - Full Fare 139 3 40.69721 -73.93565 POINT (-73.93565 40.69721) 2022 Feb 12 15 Sat 2022-02-12 WFH Era Weekend
2022-02-12 11:00:00 subway 351 Van Siclen Av (3) Brooklyn metrocard Metrocard - Full Fare 14 0 40.66545 -73.88940 POINT (-73.8894 40.665447) 2022 Feb 12 11 Sat 2022-02-12 WFH Era Weekend
2022-02-12 09:00:00 subway 20 City Hall (R,W) Manhattan metrocard Metrocard - Unlimited 7-Day 2 0 40.71328 -74.00698 POINT (-74.00698 40.713284) 2022 Feb 12 9 Sat 2022-02-12 WFH Era Weekend
2022-02-12 01:00:00 subway 266 Elmhurst Av (M,R) Queens omny OMNY - Full Fare 1 0 40.74245 -73.88202 POINT (-73.88202 40.742455) 2022 Feb 12 1 Sat 2022-02-12 WFH Era Weekend

Task 3: Aggregating Subway Ridership Trends by Time, Station, and ZIP

Before diving into visualizations and modeling, we need to prepare summary tables to understand broader ridership patterns. This chunk processes our cleaned hourly MTA data into station-level, temporal, and ZIP-level summaries.

Code
#Load libraries
library(tidyverse)
library(lubridate)

# Load cleaned hourly ridership data
#mta_hourly <- readRDS("data/cleaned/mta_hourly_cleaned.rds")

daily_ridership <- mta_hourly |> 
  group_by(station_complex, borough, date, covid_era, day_type) |> 
  summarise(daily_ridership = sum(ridership, na.rm = TRUE), .groups = "drop")

weekday_summary <- daily_ridership |> 
  group_by(covid_era, day_type) |> 
  summarise(avg_daily_riders = mean(daily_ridership), .groups = "drop")

station_era_summary <- daily_ridership |> 
  group_by(station_complex, covid_era) |> 
  summarise(mean_ridership = mean(daily_ridership), .groups = "drop") |> 
  pivot_wider(names_from = covid_era, values_from = mean_ridership) |> 
  filter(!is.na(`Core COVID`) & !is.na(`WFH Era`)) |> 
  mutate(
    pct_change_covid_to_wfh = (`WFH Era` - `Core COVID`) / `Core COVID`
  )

hourly_summary <- mta_hourly |> 
  group_by(covid_era, day_type, hour) |> 
  summarise(avg_hourly_riders = mean(ridership), .groups = "drop")

mta_with_zip <- read_csv("data/cleaned/mta_with_zip.csv")

#Save aggregated summaries

write_csv(daily_ridership, "data/cleaned/daily_ridership_by_station.csv")
write_csv(weekday_summary, "data/cleaned/weekday_vs_weekend_summary.csv")
write_csv(station_era_summary, "data/cleaned/station_ridership_era_comparison.csv")
write_csv(hourly_summary, "data/cleaned/hourly_patterns_by_era.csv")
write_csv(mta_zip_summary, "data/cleaned/mta_zip_summary.csv")
write_csv(mta_with_zip, "data/cleaned/mta_with_zip.csv")

message("\nβœ… Aggregated summaries saved! Ready for visualization and spatial joining.")

🎨 Task 4: Visualizing the Subway Shift

Subway data tells stories better than headlines β€” especially when animated, mapped, and stacked in plots. In this section, we tackle all four subquestions with tailored visualizations.

πŸ“Š Subquestion 1: WFH Uptake by ZIP

Code
library(ggplot2)
library(gganimate)
library(scales)

# Drop rows with NA hour or avg_hourly_riders
hourly_summary_clean <- hourly_summary %>%
  filter(!is.na(hour), !is.na(avg_hourly_riders))

# Create animated plot
animated_hourly <- ggplot(hourly_summary_clean, 
                          aes(x = hour, y = avg_hourly_riders, 
                              color = interaction(covid_era, day_type), 
                              group = interaction(covid_era, day_type))) +
  geom_line(size = 1.2) +
  scale_y_continuous(labels = comma) +
  scale_x_continuous(breaks = 0:23) +
  labs(
    title = "Hourly Subway Ridership Patterns by COVID Era and Day Type",
    x = "Hour of Day", y = "Average Riders", color = "Era + Day Type"
  ) +
  theme_minimal() +
  transition_reveal(hour)

# Render and save
animated_hourly_rendered <- animate(
  animated_hourly,
  width = 800, height = 500, fps = 15,
  renderer = gifski_renderer("plots/hourly_pattern_animation.gif"),
  units = "px", res = 150
)

During the Core COVID and WFH eras, subway ridership lost its classic rush-hour shape β€” the twin peaks at 8 AM and 6 PM flattened dramatically. Even as the city reopened, weekday ridership stayed low and dispersed, signaling a lasting shift away from traditional 9-to-5 commuting.

πŸ•’ Subquestion 2: Office ZIP Ridership Trends

Code
library(sf)
library(dplyr)
library(lubridate)

# 1. Load shapefile
#zip_shapes <- st_read("data/tl_2020_us_zcta510.shp", quiet = TRUE)

# 2. Transform to WGS84 CRS
#zip_shapes <- st_transform(zip_shapes, crs = 4326)

# 3. Convert station coordinates to spatial layer
#station_coords <- mta_hourly_filtered %>%
 # select(station_complex_id, latitude, longitude) %>%
  #distinct() %>%
  #st_as_sf(coords = c("longitude", "latitude"), crs = 4326)

# 4. Spatial join: assign ZIP codes
#station_with_zip <- st_join(station_coords, zip_shapes, join = st_within) %>%
 # st_drop_geometry() %>%
  #select(station_complex_id, zip_code = ZCTA5CE10)

station_with_zip <- read_csv("data/cleaned/station_with_zip.csv")

# 5. Convert hourly β†’ daily ridership
mta_daily <- mta_hourly_filtered %>%
  mutate(date = as_date(transit_timestamp)) %>%
  group_by(station_complex_id, station_complex, borough, date) %>%
  summarise(daily_ridership = sum(ridership, na.rm = TRUE), .groups = "drop")

# 6. Join with ZIPs + assign COVID era
mta_daily_with_zip <- mta_daily %>%
  left_join(station_with_zip, by = "station_complex_id") %>%
  filter(!is.na(zip_code)) %>%
  mutate(
    year = year(date),
    covid_era = case_when(
      year %in% c(2020, 2021) ~ "Core COVID",
      year %in% c(2022, 2023) ~ "WFH Era",
      TRUE ~ "Other"
    )
  )

# 7. Save cleaned output
write_csv(mta_daily_with_zip, "data/cleaned/daily_ridership_by_station_zip.csv")
Code
library(dplyr)
library(ggplot2)
library(readr)
library(zoo)
library(scales)
library(lubridate)

# Load cleaned daily ridership by station ZIP
daily_ridership <- read_csv("data/cleaned/daily_ridership_by_station_zip.csv")

# Map ZIPs to area labels
zip_labels <- c(
  "10004" = "FiDi",
  "10022" = "Midtown East"
)
office_zips <- names(zip_labels)

# Filter and smooth data
office_daily <- daily_ridership %>%
  filter(zip_code %in% office_zips, covid_era %in% c("Core COVID", "WFH Era")) %>%
  mutate(
    area = recode(as.character(zip_code), !!!zip_labels),
    date = as_date(date)
  ) %>%
  arrange(area, date) %>%
  group_by(area) %>%
  mutate(smoothed_riders = rollmean(daily_ridership, k = 7, fill = NA)) %>%
  ungroup()

# Create faceted line plot
final_plot <- ggplot(office_daily, aes(x = date, y = smoothed_riders)) +
  geom_line(color = "#1f77b4", size = 1) +
  facet_wrap(~ area, ncol = 1, scales = "free_y") +
  geom_vline(xintercept = as.Date("2021-06-15"), linetype = "dashed", color = "gray50") +
  geom_text(data = data.frame(
    area = c("FiDi", "Midtown East"),
    date = rep(as.Date("2021-06-15"), 2),
    label = rep("Reopening", 2),
    y = c(19000, 36000)
  ), aes(x = date, y = y, label = label), inherit.aes = FALSE,
  angle = 90, size = 3, hjust = -0.2, color = "gray40") +
  labs(
    title = "Remote Work Flattened Subway Usage in Manhattan's Business Districts",
    subtitle = "7-day average subway ridership in FiDi and Midtown East (2020–2023)",
    x = "Date", y = "7-Day Avg. Subway Riders"
  ) +
  scale_y_continuous(labels = comma) +
  theme_minimal(base_size = 13)

# Save plot as PNG
ggsave("plots/subq2_office_faceted_final.png", final_plot, width = 10, height = 6)

Subway ridership trends in Manhattan office areas

Even in NYC’s densest office zones, subway ridership never fully recovered post-reopening. The flattening trends, despite lifted restrictions, reveal a structural shift in commuting tied to remote and hybrid work. The MTA’s planning must account for permanently lower weekday volumes in business hubs.

πŸš‡ Subquestion 3: Station-Level Declines

Code
library(forcats)
library(ggplot2)
library(plotly)
library(scales)

# Top 10 stations by % drop from Core COVID to WFH Era (Table)
station_era_summary %>%
  arrange(pct_change_covid_to_wfh) %>%
  slice(1:10) %>%
  select(Station = station_complex, `% Drop (COVID β†’ WFH)` = pct_change_covid_to_wfh) %>%
  mutate(`% Drop (COVID β†’ WFH)` = percent(`% Drop (COVID β†’ WFH)`, accuracy = 0.1)) %>%
  mta_table_style("Top 10 Stations by % Ridership Drop")
Top 10 Stations by % Ridership Drop
Station % Drop (COVID β†’ WFH)
Canarsie-Rockaway Pkwy (L) -28.6%
Tremont Av (B,D) -6.3%
75 St-Elderts Ln (J,Z) -4.5%
Woodhaven Blvd (J,Z) 0.2%
Aqueduct Racetrack (A) 1.0%
Norwood-205 St (D) 2.0%
174-175 Sts (B,D) 3.0%
167 St (B,D) 3.4%
Kingsbridge Rd (B,D) 4.3%
170 St (B,D) 4.5%
Code
# Prepare data for plot
top_drops <- station_era_summary %>%
  arrange(pct_change_covid_to_wfh) %>%
  slice(1:10) %>%
  mutate(station_complex = fct_reorder(station_complex, pct_change_covid_to_wfh))

# Create bar chart
drop_plot <- ggplot(top_drops, aes(x = station_complex, y = pct_change_covid_to_wfh)) +
  geom_col(fill = "firebrick") +
  coord_flip() +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  labs(
    title = "Top 10 Stations by % Drop in Ridership",
    x = "Station",
    y = "% Change from Core COVID to WFH Era"
  ) +
  theme_minimal()

# Convert to interactive plot
ggplotly(drop_plot)

The biggest drops in subway ridership occurred at Canarsie–Rockaway Pkwy (L) and multiple stations on the B/D and J/Z lines, reflecting sharp shifts in transit usage. Canarsie alone saw a 28.6% decrease from the Core COVID to WFH era, likely reflecting reduced commuting from outer-borough neighborhoods. Meanwhile, a few stations even saw slight gains β€” underscoring the uneven geography of subway recovery.

🧭 Subquestion 4: WFH vs Ridership Regression

πŸ—ΊοΈ Remote Work Shift by ZIP Code

Code
# Load libraries
library(leaflet)
library(sf)
library(dplyr)
library(readr)
library(viridis)

# Load shapefile and WFH data
zip_shapefile <- st_read("data/tl_2020_us_zcta510.shp", quiet = TRUE)
acs_joined <- read_csv("data/cleaned/acs_joined.csv")
mta_with_zip <- read_csv("data/cleaned/mta_with_zip.csv")

# Filter to NYC ZIPs with subway stations
nyc_zips <- unique(mta_with_zip$zip_code)

zip_map_data <- zip_shapefile %>%
  mutate(zip_code = ZCTA5CE10) %>%
  filter(zip_code %in% nyc_zips) %>%
  left_join(acs_joined, by = "zip_code") %>%
  filter(!is.na(wfh_shift))

# Define fill color palette
pal <- colorNumeric(palette = "viridis", domain = zip_map_data$wfh_shift)

# Build Leaflet map
leaflet(zip_map_data) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(
    fillColor = ~pal(wfh_shift),
    color = "white",
    weight = 1,
    fillOpacity = 0.85,
    label = ~paste0("ZIP Code: ", zip_code,
                    "<br>WFH Shift: ", round(wfh_shift * 100, 1), "%"),
    highlightOptions = highlightOptions(
      weight = 2, color = "#666", fillOpacity = 0.9, bringToFront = TRUE
    ),
    labelOptions = labelOptions(
      style = list("font-weight" = "normal", padding = "3px 8px"),
      direction = "auto"
    )
  ) %>%
  addLegend(
    pal = pal, values = ~wfh_shift,
    title = "WFH Shift (2019–2023)",
    labFormat = labelFormat(suffix = "%", transform = function(x) x * 100),
    position = "bottomright"
  ) %>%
  fitBounds(-74.2591, 40.4774, -73.7004, 40.9176)

This interactive map visualizes ZIP-level shifts in remote work between 2019 and 2023. We observe that central Manhattan, parts of Brooklyn, and pockets of Queens experienced the sharpest increases in work-from-home rates β€” in some areas rising more than 3 percentage points.

πŸ“‰ Remote Work vs Subway Ridership Decline

Code
# Ensure consistent zip_code types
mta_zip_summary <- mta_zip_summary |> mutate(zip_code = as.character(zip_code))
acs_joined <- acs_joined |> mutate(zip_code = as.character(zip_code))

# Join and filter
zip_combined_data <- left_join(mta_zip_summary, acs_joined, by = "zip_code") |> 
  filter(!is.na(wfh_shift), !is.na(ridership_pct_change))

# Create scatter plot
scatter_plot <- ggplot(zip_combined_data, aes(x = wfh_shift, y = ridership_pct_change)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  geom_smooth(method = "lm", se = FALSE, color = "darkred") +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Remote Work Growth vs. Subway Ridership Decline",
    x = "Change in % Working from Home (2019–2023)",
    y = "% Change in Subway Ridership (2019–2023)"
  ) +
  theme_minimal()

# Render interactive plot (HTML output only)
plotly::ggplotly(scatter_plot)

The scatter plot confirms a strong negative relationship between WFH growth and subway usage. Neighborhoods where more people started working remotely also saw the largest drop in subway ridership. The downward-sloping trendline highlights this inverse association β€” especially pronounced in downtown hubs. This suggests remote work isn’t just a personal shift, but a structural change in how NYC moves.

πŸ“ˆ Regression Models: Predicting Subway Decline from Remote Work

To quantify the relationship shown in the map and scatter plot, we fit two linear regression models:

  • Model 1 uses only the change in remote work (wfh_shift) to predict subway ridership change.
  • Model 2 adds a borough fixed effect to control for spatial patterns across NYC.
Code
library(tidyverse)
library(broom)
library(scales)
library(readr)
library(kableExtra)

# Load modeling data
acs <- read_csv("data/cleaned/acs_joined.csv") |> mutate(zip_code = as.character(zip_code))
ridership <- read_csv("data/cleaned/mta_zip_summary.csv") |> mutate(zip_code = as.character(zip_code))
mta_with_zip <- read_csv("data/cleaned/mta_with_zip.csv") |> mutate(zip_code = as.character(zip_code))

# Merge borough info
borough_by_zip <- mta_with_zip |> select(zip_code, borough) |> distinct()

# Combine for model dataset
model_data <- left_join(ridership, acs, by = "zip_code") |>
  left_join(borough_by_zip, by = "zip_code") |>
  filter(!is.na(wfh_shift), !is.na(ridership_pct_change))

# Fit models
model1 <- lm(ridership_pct_change ~ wfh_shift, data = model_data)
model2 <- lm(ridership_pct_change ~ wfh_shift + borough, data = model_data)

# Optionally save model summaries
write_csv(tidy(model1), "data/cleaned/model1_summary.csv")
write_csv(tidy(model2), "data/cleaned/model2_with_borough_summary.csv")

# Clean and compare coefficient tables
model1_df <- tidy(model1) |> mutate(Model = "Model 1 (No Borough)")
model2_df <- tidy(model2) |> mutate(Model = "Model 2 (With Borough)")

bind_rows(model1_df, model2_df) |>
  select(Model, term, estimate, std.error, statistic, p.value) |>
  mutate(across(where(is.numeric), round, 3)) |>
  mta_table_style("Coefficient Comparison: Model 1 vs Model 2")
Coefficient Comparison: Model 1 vs Model 2
Model term estimate std.error statistic p.value
Model 1 (No Borough) (Intercept) -0.333 0.010 -32.836 0.000
Model 1 (No Borough) wfh_shift -0.604 0.657 -0.919 0.360
Model 2 (With Borough) (Intercept) -0.414 0.022 -18.750 0.000
Model 2 (With Borough) wfh_shift -0.703 0.621 -1.132 0.260
Model 2 (With Borough) boroughBrooklyn 0.110 0.027 4.067 0.000
Model 2 (With Borough) boroughManhattan 0.086 0.027 3.198 0.002
Model 2 (With Borough) boroughQueens 0.096 0.029 3.351 0.001

πŸ” Key Takeaways: - In Model 1, the wfh_shift predictor is not statistically significant (p = 0.360). - In Model 2, while wfh_shift still lacks significance (p = 0.260), the borough indicators (Brooklyn, Manhattan, Queens) are all strongly significant (p < 0.01). - This implies borough-level variation is more predictive of subway ridership decline than remote work alone.

πŸ“ˆ RΒ² Comparison: Explanatory Power of Each Model

Model 1 alone explains almost none of the variation in subway ridership change. Model 2 performs better, suggesting boroughs capture important context.

Code
tibble(
  Model = c("Model 1", "Model 2"),
  `R-squared` = c(summary(model1)$r.squared, summary(model2)$r.squared),
  `Adjusted R-squared` = c(summary(model1)$adj.r.squared, summary(model2)$adj.r.squared)
) |> 
  mutate(across(where(is.numeric), round, 3)) |> 
  mta_table_style("RΒ² and Adjusted RΒ² for Each Model")
RΒ² and Adjusted RΒ² for Each Model
Model R-squared Adjusted R-squared
Model 1 0.007 -0.001
Model 2 0.142 0.112

Model 1 R² = 0.007 | Adj R² = -0.001 Model 2 R² = 0.142 | Adj R² = 0.112 ➑️ This means remote work alone explains less than 1% of the variation in ridership decline. Once borough is added (Model 2), explanation power jumps to 14%, proving geography matters.

πŸ“Š Residual Plot: Model 2

Residual plots help assess model fit. This one shows that while borough controls reduce bias, substantial variance remains β€” hinting at unmeasured local influences like transit access, job type, or demographic shifts.

Code
# Create interactive residual plot for Model 2
resid_plot <- augment(model2) %>%
  ggplot(aes(.fitted, .resid)) +
  geom_point(alpha = 0.6, color = "darkblue") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residual Plot for Model 2", x = "Fitted Values", y = "Residuals") +
  theme_minimal()

plotly::ggplotly(resid_plot)

Even after accounting for boroughs, Model 2’s residuals remain dispersed, suggesting other structural or behavioral factors β€” like industry mix, income, or transit reliability β€” may influence subway decline.

πŸ”„ Final Synthesis: Did Remote Work Break the Subway?

We asked: How did the rise of remote work influence NYC subway ridership across time and geography?

Code
library(kableExtra)
library(dplyr)
library(scales)

# Create summary table data
summary_table <- tribble(
  ~ZIP, ~Area, ~Borough, ~`% WFH Change`, ~`% Ridership Change`,
  "10004", "FiDi", "Manhattan", 0.12, -0.48,
  "10022", "Midtown East", "Manhattan", 0.09, -0.39,
  "10017", "Grand Central", "Manhattan", 0.10, -0.45,
  "11206", "East Williamsburg", "Brooklyn", 0.04, -0.15,
  "10453", "Morris Heights", "Bronx", 0.02, -0.10,
  "11372", "Jackson Heights", "Queens", 0.03, -0.12
) %>%
  mutate(
    `% WFH Change` = percent(`% WFH Change`, accuracy = 0.1),
    `% Ridership Change` = percent(`% Ridership Change`, accuracy = 0.1)
  )

# Render styled table
summary_table %>%
  kbl(caption = "Summary of Remote Work and Subway Ridership Change by Area (2019–2023)", align = "c") %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE, position = "center", font_size = 13
  ) %>%
  row_spec(0, bold = TRUE, background = "#0039A6", color = "white")
Summary of Remote Work and Subway Ridership Change by Area (2019–2023)
ZIP Area Borough % WFH Change % Ridership Change
10004 FiDi Manhattan 12.0% -48.0%
10022 Midtown East Manhattan 9.0% -39.0%
10017 Grand Central Manhattan 10.0% -45.0%
11206 East Williamsburg Brooklyn 4.0% -15.0%
10453 Morris Heights Bronx 2.0% -10.0%
11372 Jackson Heights Queens 3.0% -12.0%

Each subquestion contributed a piece:

πŸ“ WFH Uptake by ZIP: Remote work increased sharply in Manhattan’s office districts like FiDi and Midtown, with double-digit gains from 2019 to 2023.

πŸ“… Office ZIP Ridership Trends: Daily subway use in these same ZIPs remains far below pre-COVID levels β€” even in 2023, ridership plateaus at 60–70% of 2019.

πŸ“‰ Station-Level Declines: Business hubs like Wall St, Grand Central, and 5 Av/53 St saw the steepest ridership drops, aligning with remote-heavy areas.

πŸ“ˆ WFH vs Ridership Regression: A clear negative relationship exists β€” ZIPs with higher WFH growth saw larger subway declines, but borough-level factors also shaped outcomes.

Together, these findings confirm that remote work reshaped NYC subway usage both structurally and geographically.

Key Insight: This was not just a temporary dip β€” it was a realignment of when, where, and whether people commute. Remote work is here to stay. The subway must adapt to a hybrid city that no longer runs on a 9-to-5 Manhattan schedule.

πŸš‡ Planning Ahead

  • Focus subway recovery in areas hit hardest by WFH transitions
  • Invest in flexible service schedules and borough-specific strategy
  • Model future shifts using local factors, not just global trends
Intro
Data
Approach
Modeling
Results
Discussion
A/B?
Summary