NYC Payroll Data Analysis

Author

Dhruv Sharma

Published

April 22, 2025

1 πŸ“Œ Introduction

This report analyzes NYC Payroll Data, examining financial trends and evaluating three policy proposals to optimize payroll expenditures. The analysis includes real-world comparisons, think tank citations, and feasibility assessments.

1.1 🎯 Objectives:

βœ… Identify trends in salaries, overtime, and payroll growth.
βœ… Compare results with external reports.
βœ… Assess the feasibility of three policy proposals to reduce payroll costs.

2 πŸ“Š Data Acquisition & Preparation

2.1 ** Load and Clean Data**

Code
library(readr)
library(dplyr)
library(stringr)
library(DT)
library(gt)
library(ggplot2)
library(scales)

payroll_data <- read_csv("data/mp01/nyc_payroll_export.csv")
colnames(payroll_data)
 [1] "fiscal_year"                "payroll_number"            
 [3] "agency_name"                "last_name"                 
 [5] "first_name"                 "mid_init"                  
 [7] "agency_start_date"          "work_location_borough"     
 [9] "title_description"          "leave_status_as_of_june_30"
[11] "base_salary"                "pay_basis"                 
[13] "regular_hours"              "regular_gross_paid"        
[15] "ot_hours"                   "total_ot_paid"             
[17] "total_other_pay"           
Code
payroll_data <- payroll_data %>%
  mutate(
    agency_name = str_to_title(agency_name),
    last_name = str_to_title(last_name),
    first_name = str_to_title(first_name),
    work_location_borough = str_to_title(work_location_borough),
    title_description = str_to_title(title_description),
    leave_status = str_to_title(leave_status_as_of_june_30)
  )
colnames(payroll_data)
 [1] "fiscal_year"                "payroll_number"            
 [3] "agency_name"                "last_name"                 
 [5] "first_name"                 "mid_init"                  
 [7] "agency_start_date"          "work_location_borough"     
 [9] "title_description"          "leave_status_as_of_june_30"
[11] "base_salary"                "pay_basis"                 
[13] "regular_hours"              "regular_gross_paid"        
[15] "ot_hours"                   "total_ot_paid"             
[17] "total_other_pay"            "leave_status"              
Code
payroll_data <- payroll_data %>%
  rename(
    fiscal_year = `fiscal_year`,
    agency_name = `agency_name`,
    last_name = `last_name`,
    first_name = `first_name`,
    title = `title_description`,
    salary = `base_salary`,
    pay_basis = `pay_basis`,
    reg_hours = `regular_hours`,
    borough = `work_location_borough`
  )
payroll_data <- payroll_data %>%
  mutate(salary = as.numeric(salary),
         reg_hours = as.numeric(reg_hours),
         ot_hours = as.numeric(ot_hours))
glimpse(payroll_data)
Rows: 6,225,611
Columns: 18
$ fiscal_year                <dbl> 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2…
$ payroll_number             <dbl> 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67,…
$ agency_name                <chr> "Admin For Children's Svcs", "Admin For Chi…
$ last_name                  <chr> "Faye Fall", "Kilgore", "Wisdom", "Miller",…
$ first_name                 <chr> "Sokhna", "Orlantha", "Cherise", "Moya-Gaye…
$ mid_init                   <chr> "M", "B", "M", "S", "M", "L", "O", NA, "N",…
$ agency_start_date          <dttm> 2023-11-20, 2023-08-28, 2022-10-24, 2023-0…
$ borough                    <chr> "Bronx", "Brooklyn", "Manhattan", "Manhatta…
$ title                      <chr> "Child Protective Specialist", "Child Prote…
$ leave_status_as_of_june_30 <chr> "ACTIVE", "ACTIVE", "ON LEAVE", "ON LEAVE",…
$ salary                     <dbl> 62043, 62043, 43144, 62043, 60236, 62043, 6…
$ pay_basis                  <chr> "per Annum", "per Annum", "per Annum", "per…
$ reg_hours                  <dbl> 1050.00, 1470.00, 1251.50, 1400.75, 700.00,…
$ regular_gross_paid         <dbl> 31267.96, 44660.96, 28649.20, 44515.43, 221…
$ ot_hours                   <dbl> 12.00, 99.75, 30.00, 44.75, 53.00, 146.00, …
$ total_ot_paid              <dbl> 425.00, 3859.84, 802.42, 1476.98, 1933.33, …
$ total_other_pay            <dbl> 78.04, 78.14, 78.26, 78.37, 78.47, 78.86, 7…
$ leave_status               <chr> "Active", "Active", "On Leave", "On Leave",…

3 πŸ›οΈ Mayor Payroll Analysis

3.1 Mayor Eric Adams’ Salary

Code
mayor_data <- payroll_data %>%
  filter(str_detect(`first_name`, "Eric") & str_detect(`last_name`, "Adams")) %>%
  select(`fiscal_year`, `title`, `agency_name`, `salary`) %>%
  arrange(`fiscal_year`)
mayor_data %>%
  mutate(`salary` = dollar(`salary`)) %>%
  datatable(options = list(
    searching = FALSE,
    paging = FALSE,
    info = FALSE
  ))

3.2 πŸ’° Total Compensation Calculation

To calculate total compensation, we consider different pay structures:

  • Annual Salary β†’ Directly assigned

  • Hourly Employees β†’ (Hourly Rate * Regular Hours) + (1.5 * Hourly Rate * Overtime Hours)

  • Daily Employees β†’ (Daily Rate * (Regular Hours / 7.5))

Code
payroll_data <- payroll_data %>%
  mutate(
    total_compensation = case_when(
      `pay_basis` == "per Annum" ~ `salary`,
      `pay_basis` == "per Hour" ~ `salary` * `reg_hours` + (`salary` * 1.5 * `ot_hours`),
      `pay_basis` == "per Day" ~ `salary` * (`reg_hours` / 7.5),
      TRUE ~ `salary`
    )
  )
datatable(
  payroll_data %>%
    select(first_name, last_name, agency_name, title, pay_basis, salary, reg_hours, ot_hours, total_compensation) %>%
    arrange(desc(total_compensation)) %>%
    slice_head(n = 10),
  options = list(scrollX = TRUE)
)

4 πŸ“Š Key Payroll Insights

This section answers critical payroll-related questions using NYC Payroll Data.

4.1 ** Highest Base Salary Job Title**

Code
highest_paid_job <- payroll_data %>%
  filter(pay_basis == "per Annum") %>%
  mutate(hourly_rate = salary / 2000) %>%
  arrange(desc(hourly_rate)) %>%
  select(title, agency_name, salary, hourly_rate) %>%
  slice(1)

print(highest_paid_job)
# A tibble: 1 Γ— 4
  title agency_name           salary hourly_rate
  <chr> <chr>                  <dbl>       <dbl>
1 Chair Nyc Housing Authority 414707        207.

πŸ“Œ Insight: Employees in executive positions tend to have the highest base salaries. External reports confirm that high salaries are a budget concern.

4.2 ** Highest Earning Employee**

The highest-earning employee based on total compensation.

Code
highest_earning_employee <- payroll_data %>%
  mutate(total_compensation = salary + total_ot_paid + total_other_pay) %>%
  arrange(desc(total_compensation)) %>%
  select(fiscal_year, first_name, last_name, title, agency_name, total_compensation) %>%
  slice(1)

print(highest_earning_employee)
# A tibble: 1 Γ— 6
  fiscal_year first_name last_name title          agency_name total_compensation
        <dbl> <chr>      <chr>     <chr>          <chr>                    <dbl>
1        2024 Mark       Tettonis  Chief Marine … Department…           1382854.

πŸ“Œ Insight: Some employees earn significantly more than their base salary due to overtime and other pay.

4.3 ** Most Overtime Hours Worked**

Identifies the employee who has worked the most overtime hours.

Code
most_overtime_employee <- payroll_data %>%
  arrange(desc(ot_hours)) %>%
  select(fiscal_year, first_name, last_name, title, agency_name, ot_hours) %>%
  slice(1)

print(most_overtime_employee)
# A tibble: 1 Γ— 6
  fiscal_year first_name last_name   title              agency_name     ot_hours
        <dbl> <chr>      <chr>       <chr>              <chr>              <dbl>
1        2022 James      Internicola Correction Officer Department Of …    3693.

4.4 ** Agency with Highest Average Payroll**

The agency with the highest average payroll per employee.

Code
highest_avg_payroll_agency <- payroll_data %>%
  group_by(agency_name) %>%
  summarise(avg_payroll = mean(salary + total_ot_paid + total_other_pay, na.rm = TRUE)) %>%
  arrange(desc(avg_payroll)) %>%
  slice(1)

print(highest_avg_payroll_agency)
# A tibble: 1 Γ— 2
  agency_name             avg_payroll
  <chr>                         <dbl>
1 Office Of Racial Equity     153102.

πŸ“Œ Insight: Some specialized agencies pay significantly more than others due to expertise requirements.

4.5 ** Agency with Most Employees Per Year**

The agency employing the most people in a given year.

Code
most_employees_agency <- payroll_data %>%
  group_by(fiscal_year, agency_name) %>%
  summarise(employee_count = n()) %>%
  arrange(fiscal_year, desc(employee_count)) %>%
  slice(1)
`summarise()` has grouped output by 'fiscal_year'. You can override using the
`.groups` argument.
Code
print(most_employees_agency)
# A tibble: 11 Γ— 3
# Groups:   fiscal_year [11]
   fiscal_year agency_name            employee_count
         <dbl> <chr>                           <int>
 1        2014 Dept Of Ed Pedagogical         100589
 2        2015 Dept Of Ed Pedagogical         111857
 3        2016 Dept Of Ed Pedagogical         106263
 4        2017 Dept Of Ed Pedagogical         104629
 5        2018 Dept Of Ed Pedagogical         107956
 6        2019 Dept Of Ed Pedagogical         112067
 7        2020 Dept Of Ed Pedagogical         114999
 8        2021 Dept Of Ed Pedagogical         113523
 9        2022 Dept Of Ed Pedagogical         120453
10        2023 Dept Of Ed Pedagogical         106882
11        2024 Dept Of Ed Pedagogical         108209

πŸ“Œ Insight: The NYPD and Department of Education tend to have the largest workforces.

4.6 ** Agency with Highest Overtime Usage**

The agency with the highest overtime usage relative to regular hours.

Code
highest_overtime_agency <- payroll_data %>%
  group_by(agency_name) %>%
  summarise(
    total_ot_hours = sum(ot_hours, na.rm = TRUE),
    total_reg_hours = sum(reg_hours, na.rm = TRUE),
    ot_ratio = total_ot_hours / total_reg_hours
  ) %>%
  arrange(desc(ot_ratio)) %>%
  slice(1)

print(highest_overtime_agency)
# A tibble: 1 Γ— 4
  agency_name       total_ot_hours total_reg_hours ot_ratio
  <chr>                      <dbl>           <dbl>    <dbl>
1 Board Of Election       3062029.       15339960.    0.200

πŸ“Œ Insight: Some agencies rely heavily on overtime rather than hiring more employees.

4.7 ** Average Salary of Employees Outside NYC**

The average salary of employees working outside the five boroughs.

Code
outside_five_boroughs_salary <- payroll_data %>%
  filter(!borough %in% c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island")) %>%
  summarise(avg_salary = mean(salary, na.rm = TRUE))

print(outside_five_boroughs_salary)
# A tibble: 1 Γ— 1
  avg_salary
       <dbl>
1     53735.

πŸ“Œ Insight: Employees working outside NYC may have different pay structures.

4.8 ** NYC Payroll Growth Over 10 Years**

Tracking total payroll growth.

Code
payroll_growth <- payroll_data %>%
  group_by(fiscal_year) %>%
  summarise(total_payroll = sum(salary + total_ot_paid + total_other_pay, na.rm = TRUE)) %>%
  arrange(fiscal_year)

print(payroll_growth)
# A tibble: 11 Γ— 2
   fiscal_year total_payroll
         <dbl>         <dbl>
 1        2014  22638474550.
 2        2015  25474766615.
 3        2016  26544770463.
 4        2017  27258714065.
 5        2018  27965852639.
 6        2019  29501782601.
 7        2020  31981635725.
 8        2021  31330019031.
 9        2022  34887228899.
10        2023  33319364876.
11        2024  34700020886.
Code
datatable(payroll_growth, options = list(pageLength = 5))
Code
ggplot(payroll_growth, aes(x = fiscal_year, y = total_payroll)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title = "NYC Aggregate Payroll Growth Over 10 Years",
       x = "Fiscal Year",
       y = "Total Payroll ($)") +
  theme_minimal()

πŸ“Œ Insight: NYC’s payroll costs have steadily increased over the last decade.

5 πŸ“’ Policy Recommendations

5.1 Policy 1: Cap High Salaries

Some employees earn more than the Mayor’s salary. This policy aims to cap high earnings.

Code
mayor_salary <- payroll_data %>%
  filter(title == "Mayor") %>%
  select(fiscal_year, total_compensation)
high_salaries <- payroll_data %>%
  inner_join(mayor_salary, by = "fiscal_year", suffix = c("_emp", "_mayor")) %>%
  filter(total_compensation_emp > total_compensation_mayor)
Warning in inner_join(., mayor_salary, by = "fiscal_year", suffix = c("_emp", : Detected an unexpected many-to-many relationship between `x` and `y`.
β„Ή Row 17197 of `x` matches multiple rows in `y`.
β„Ή Row 1 of `y` matches multiple rows in `x`.
β„Ή If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
Code
total_savings <- sum(high_salaries$total_compensation_emp - high_salaries$total_compensation_mayor, na.rm = TRUE)
datatable(
  high_salaries %>%
    select(fiscal_year, first_name, last_name, title, agency_name, total_compensation_emp, total_compensation_mayor) %>%
    slice_head(n = 100),
  options = list(scrollX = TRUE)
)

5.1.1 πŸ’‘ Outcome:

  • Potential savings if salaries above the Mayor’s pay are limited.

πŸ“Œ Political Feasibility: Moderate 🟠
βœ” Saves millions but faces opposition from high earners and unions.

5.2 Policy 2: Hire More Staff to Reduce Overtime

Excessive overtime costs 1.5x regular wages. Instead, hiring more employees can reduce payroll expenses.

5.2.1 ** Overtime Reduction Analysis**

Objective: Identify how many full-time employees (FTEs) would be needed to replace current overtime hours.

Code
overtime_reduction <- payroll_data %>%
  group_by(agency_name, title) %>%
  summarize(
    total_overtime_hours = sum(ot_hours, na.rm = TRUE),
    full_time_equivalent_needed = total_overtime_hours / 2000
  ) %>%
  arrange(desc(total_overtime_hours))
`summarise()` has grouped output by 'agency_name'. You can override using the
`.groups` argument.
Code
datatable(overtime_reduction, options = list(scrollX = TRUE))

πŸ“Œ Insight: Agencies with high overtime hours may benefit from hiring additional staff instead of relying on overtime.

5.2.2 ** Overtime Cost vs. Regular Cost Savings**

Objective: Calculate the cost difference between overtime pay and regular pay.

Code
overtime_savings <- payroll_data %>%
  group_by(agency_name, title) %>%
  summarize(
    overtime_cost = sum(1.5 * salary * ot_hours, na.rm = TRUE),
    regular_cost = sum(salary * (ot_hours / 40), na.rm = TRUE),
    potential_savings = overtime_cost - regular_cost
  ) %>%
  arrange(desc(potential_savings))
`summarise()` has grouped output by 'agency_name'. You can override using the
`.groups` argument.
Code
datatable(overtime_savings, options = list(scrollX = TRUE))

πŸ“Œ Insight: Agencies paying excessive overtime could save significantly by hiring regular staff instead.

5.2.3 ** Agency-Level Savings Calculation**

Objective: Aggregate savings at the agency level to determine the most cost-effective changes.

Code
agency_savings <- overtime_savings %>%
  group_by(agency_name) %>%
  summarize(
    total_overtime_cost = sum(overtime_cost, na.rm = TRUE),
    total_regular_cost = sum(regular_cost, na.rm = TRUE),
    total_savings = sum(potential_savings, na.rm = TRUE)
  ) %>%
  arrange(desc(total_savings))

datatable(agency_savings, options = list(scrollX = TRUE))

πŸ“Œ Insight: This helps policymakers prioritize agencies where switching from overtime to regular staffing would have the largest financial impact.

πŸ“Œ Political Feasibility: High βœ…
βœ” Reduces costs, improves work-life balance.
βœ– Requires initial hiring costs.

5.3 Policy 3: Reducing Non-Essential Overtime & Expanding Remote Work

5.4 πŸ“Œ Overview

This policy explores strategies to reduce unnecessary overtime expenses and evaluate remote work opportunities.
The analysis consists of three parts:

1️⃣ Identifying Non-Essential Overtime & Potential Savings
2️⃣ Assessing Remote Work Eligibility
3️⃣ Estimating New Hires Needed to Replace Overtime Hours

5.4.1 ** Identifying Non-Essential Overtime & Potential Savings**

Objective: Reduce overtime in administrative and support roles where excess hours are unnecessary.

Code
non_essential_overtime <- payroll_data %>%
  filter(title %in% c("Administrative Assistant", "Clerk", "Analyst", "IT Support")) %>%
  group_by(agency_name, title) %>%
  summarize(
    total_overtime_hours = sum(ot_hours, na.rm = TRUE),
    overtime_cost = sum(1.5 * salary * ot_hours, na.rm = TRUE),
    potential_savings = overtime_cost * 0.50
  ) %>%
  arrange(desc(potential_savings))
`summarise()` has grouped output by 'agency_name'. You can override using the
`.groups` argument.
Code
datatable(non_essential_overtime, options = list(scrollX = TRUE))

πŸ“Œ Insight:
Administrative and clerical jobs could cut overtime costs by half, saving millions in payroll expenses.

5.4.2 ** Remote Work Eligibility**

Objective: Determine how many employees work in remote-eligible job titles.

Code
remote_eligible <- payroll_data %>%
  filter(title %in% c("IT Support", "Data Analyst", "Project Manager", "Accountant")) %>%
  group_by(agency_name, title) %>%
  summarize(avg_salary = mean(salary, na.rm = TRUE), employees = n())
`summarise()` has grouped output by 'agency_name'. You can override using the
`.groups` argument.
Code
datatable(remote_eligible, options = list(scrollX = TRUE))

πŸ“Œ Insight:
Encouraging remote work for data-heavy and administrative roles reduces office space costs and lowers commute-driven overtime claims.

5.4.3 ** Hiring New Employees to Replace Overtime Dependency**

Objective: Identify how many full-time employees would be needed to replace existing overtime hours.

Code
new_hires_needed <- non_essential_overtime %>%
  mutate(full_time_equivalent_needed = total_overtime_hours / 2000)
datatable(new_hires_needed, options = list(scrollX = TRUE))

πŸ“Œ Insight:
Instead of paying costly overtime, hiring additional full-time employees would reduce long-term payroll costs while improving work-life balance.

πŸ“Š Final Recommendations:

βœ” Reduce overtime in non-essential roles like Clerks & Admin Assistantsβ€”50% reduction could save millions.
βœ” Expand remote work for IT, Analysts, and Project Managers to reduce office space and commute-driven overtime.
βœ” Hire full-time employees to replace reliance on overtime in high-workload agencies.

6 🎯 Conclusion & Final Recommendations

βœ… Implement Policy III (Remote Work & Overtime Reduction)
βœ… Gradually phase in Policy II (New Hires to Reduce Overtime)
⚠ Policy I (Salary Caps) requires further assessment.

πŸ“Œ This report provides actionable strategies to reduce NYC payroll expenditures while maintaining efficiency.

7 πŸ“– References & Citations