The Ultimate Playlist - Hustle & Heart 🎶

Author

Dhruv

Published

April 25, 2025

🎧 Introduction

From millions of Spotify tracks and playlists, Hustle & Heart emerges as a curated sound journey built on energy, emotion, and authenticity. This project explores what makes songs stick — analyzing popularity, danceability, and musical DNA — before distilling it all into a final 12-track playlist that hits with both data and vibe.

🎶 Just here for the playlist? Tap here

⚙️ Setup: Load & Install Required Packages

This chunk ensures all necessary R packages are installed and loaded before running the rest of the analysis. ✅📦

Code
ensure_package <- function(pkg){
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  library(pkg, character.only = TRUE)
}

required_packages <- c(
  "dplyr", "stringr", "tidyr", "purrr", "readr", "jsonlite",
  "ggplot2", "scales", "DT", "rvest", "httr2", "tibble",
  "knitr", "kableExtra"  # needed by the table styling helper
)

invisible(lapply(required_packages, ensure_package))

options(dplyr.summarise.inform = FALSE)

🎧 Spotify Style Setup

This chunk sets a custom Spotify-themed style for all plots and tables to give the report a bold, immersive aesthetic. 🎨🟢🖤

Code
library(ggplot2)
library(kableExtra)

theme_spotify <- function() {
  theme_minimal(base_family = "Arial") +
    theme(
      plot.background = element_rect(fill = "#191414", color = NA),
      panel.background = element_rect(fill = "#191414", color = NA),
      panel.grid = element_line(color = "#1DB954", linewidth = 0.1),
      text = element_text(color = "white"),
      axis.title = element_text(face = "bold", color = "white"),
      axis.text = element_text(color = "#b3b3b3"),
      plot.title = element_text(size = 16, face = "bold", color = "#1DB954"),
      plot.subtitle = element_text(size = 12, color = "#b3b3b3")
    )
}

spotify_table <- function(df, caption_text = "") {
  knitr::kable(df, format = "html", caption = caption_text) |>
    kableExtra::kable_styling(
      full_width = TRUE,
      bootstrap_options = c("striped", "hover", "condensed", "responsive"),
      position = "left",
      font_size = 14
    ) |>
    # Row 0 is the header row: Spotify green with white text
    kableExtra::row_spec(0, background = "#1DB954", color = "white")
}

🎧 Task 1: Load Spotify Song Characteristics

In this first task, we download and clean a Spotify song characteristics dataset made available via GitHub. The dataset includes song-level features such as danceability, energy, valence, and more. Our goal is to create a clean, rectangular dataset where each row corresponds to a single artist-song pair.

| id | name | duration_ms | release_date | year | acousticness | danceability | energy | instrumentalness | liveness | loudness | speechiness | tempo | valence | mode | key | popularity | explicit | artist |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 6KbQ3uYMLKb5jDxLF7wYDD | Singende Bataillone 1. Teil | 158648 | 1928 | 1928 | 0.995 | 0.708 | 0.1950 | 0.563 | 0.1510 | -12.428 | 0.0506 | 118.469 | 0.7790 | 1 | 10 | 0 | 0 | Carl Woitschach |
| 6KuQTIu1KoTTkLXKrwlLPV | Fantasiestücke, Op. 111: Più tosto lento | 282133 | 1928 | 1928 | 0.994 | 0.379 | 0.0135 | 0.901 | 0.0763 | -28.454 | 0.0462 | 83.972 | 0.0767 | 1 | 8 | 0 | 0 | Robert Schumann |
| 6KuQTIu1KoTTkLXKrwlLPV | Fantasiestücke, Op. 111: Più tosto lento | 282133 | 1928 | 1928 | 0.994 | 0.379 | 0.0135 | 0.901 | 0.0763 | -28.454 | 0.0462 | 83.972 | 0.0767 | 1 | 8 | 0 | 0 | Vladimir Horowitz |
| 6L63VW0PibdM1HDSBoqnoM | Chapter 1.18 - Zamek kaniowski | 104300 | 1928 | 1928 | 0.604 | 0.749 | 0.2200 | 0.000 | 0.1190 | -19.924 | 0.9290 | 107.177 | 0.8800 | 0 | 5 | 0 | 0 | Seweryn Goszczyński |
| 6M94FkXd15sOAOQYRnWPN8 | Bebamos Juntos - Instrumental (Remasterizado) | 180760 | 9/25/28 | 1928 | 0.995 | 0.781 | 0.1300 | 0.887 | 0.1110 | -14.734 | 0.0926 | 108.003 | 0.7200 | 0 | 1 | 0 | 0 | Francisco Canaro |
| 6N6tiFZ9vLTSOIxkj8qKrd | Polonaise-Fantaisie in A-Flat Major, Op. 61 | 687733 | 1928 | 1928 | 0.990 | 0.210 | 0.2040 | 0.908 | 0.0980 | -16.829 | 0.0424 | 62.149 | 0.0693 | 1 | 11 | 1 | 0 | Frédéric Chopin |
| 6N6tiFZ9vLTSOIxkj8qKrd | Polonaise-Fantaisie in A-Flat Major, Op. 61 | 687733 | 1928 | 1928 | 0.990 | 0.210 | 0.2040 | 0.908 | 0.0980 | -16.829 | 0.0424 | 62.149 | 0.0693 | 1 | 11 | 1 | 0 | Vladimir Horowitz |
| 6NxAf7M8DNHOBTmEd3JSO5 | Scherzo a capriccio: Presto | 352600 | 1928 | 1928 | 0.995 | 0.424 | 0.1200 | 0.911 | 0.0915 | -19.242 | 0.0593 | 63.521 | 0.2660 | 0 | 6 | 0 | 0 | Felix Mendelssohn |
| 6NxAf7M8DNHOBTmEd3JSO5 | Scherzo a capriccio: Presto | 352600 | 1928 | 1928 | 0.995 | 0.424 | 0.1200 | 0.911 | 0.0915 | -19.242 | 0.0593 | 63.521 | 0.2660 | 0 | 6 | 0 | 0 | Vladimir Horowitz |
| 6O0puPuyrxPjDTHDUgsWI7 | Valse oubliée No. 1 in F-Sharp Major, S. 215/1 | 136627 | 1928 | 1928 | 0.956 | 0.444 | 0.1970 | 0.435 | 0.0744 | -17.226 | 0.0400 | 80.495 | 0.3050 | 1 | 11 | 0 | 0 | Franz Liszt |
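
The cleaning code for this task is not shown on the page. As a rough sketch only, assuming the raw file keeps all artists of a song in a single `artists` column formatted like a Python list string (that column format, the toy rows, and the names `raw_songs` and `songs_long` are assumptions for illustration), one row per artist-song pair could be produced along these lines:

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Toy stand-in for the raw download; the real column layout may differ (assumption)
raw_songs <- tibble(
  id      = c("a1", "b2"),
  name    = c("Solo Song", "Duet"),
  artists = c("['Artist One']", "['Artist Two', 'Artist Three']")
)

songs_long <- raw_songs %>%
  # One row per comma-separated artist entry
  separate_longer_delim(artists, delim = ",") %>%
  # Drop brackets and quotes, then trim leftover spaces
  mutate(artist = str_trim(str_remove_all(artists, "[\\[\\]']"))) %>%
  select(-artists)

songs_long
```

This simple split would mangle artist names that themselves contain commas or apostrophes, so a real pipeline would need a more careful parser.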

Task 2: Import Playlist Dataset

We download each JSON playlist slice only once, skipping files already on disk, then read every slice and combine the results into a single list for later processing.

Code
load_playlists <- function() {
  library(jsonlite)
  library(purrr)
  
  dir_path <- "data/mp03/data1"
  if (!dir.exists(dir_path)) dir.create(dir_path, recursive = TRUE)
  
  base_url <- "https://raw.githubusercontent.com/DevinOgrady/spotify_million_playlist_dataset/main/data1/"
  starts <- seq(0, 999000, by = 1000)
  file_names <- sprintf("mpd.slice.%d-%d.json", starts, starts + 999)
  file_paths <- file.path(dir_path, file_names)
  
  # download.file() reads its timeout from options(), not from an argument
  old_opts <- options(timeout = 300)
  on.exit(options(old_opts), add = TRUE)

  for (i in seq_along(file_names)) {
    if (!file.exists(file_paths[i])) {
      url <- paste0(base_url, file_names[i])
      tryCatch({
        download.file(url, destfile = file_paths[i], mode = "wb")
      }, error = function(e) {
        message("⚠️ Failed to download: ", file_names[i])
      })
    }
  }

  read_playlist_file <- function(path) {
    tryCatch(
      fromJSON(path)$playlists,
      error = function(e) {
        message("❌ Skipping corrupted file: ", path)
        return(NULL)
      }
    )
  }

  valid_paths <- file_paths[file.exists(file_paths)]
  playlists_list <- map(valid_paths, read_playlist_file)
  playlists_list <- compact(playlists_list)
  
  return(playlists_list)
}

PLAYLISTS_LIST <- load_playlists()
all_playlists <- PLAYLISTS_LIST %>% list_rbind()
DT::datatable(
  head(all_playlists, 10),
  options = list(
    pageLength = 6,
    dom = 'tip',
    scrollX = TRUE
  ),
  class = "display compact stripe hover",
  rownames = FALSE
)
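
To make the slice naming concrete, the short check below rebuilds the file names exactly as `load_playlists()` does and inspects the first and last ones. It touches neither the network nor the disk.

```r
# Slice start indices: 0, 1000, ..., 999000 (1,000 slices of 1,000 playlists each)
starts <- seq(0, 999000, by = 1000)
file_names <- sprintf("mpd.slice.%d-%d.json", starts, starts + 999)

length(file_names)   # number of slices
head(file_names, 2)  # first two file names
tail(file_names, 1)  # last file name
```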

🎼 Task 3: Rectify Playlist Data to Track-Level Format

We flatten the hierarchical playlist JSONs into a clean, rectangular track-level format, stripping unnecessary prefixes and standardizing column names.

Code
strip_spotify_prefix <- function(x){
  # Keep only the ID after the second colon: "spotify:track:<id>" -> "<id>"
  str_replace(x, ".*:.*:(.*)", "\\1")
}

rectified_data <- all_playlists %>%
  select(
    playlist_name = name,
    playlist_id = pid,
    playlist_followers = num_followers,
    tracks
  ) %>%
  unnest(tracks) %>%
  # Number tracks within each playlist rather than across the whole table
  group_by(playlist_id) %>%
  mutate(playlist_position = row_number()) %>%
  ungroup() %>%
  mutate(
    artist_name = map_chr(artist_name, 1, .default = NA_character_),
    artist_id = strip_spotify_prefix(artist_uri),
    track_id = strip_spotify_prefix(track_uri),
    album_id = strip_spotify_prefix(album_uri),
    duration = duration_ms
  ) %>%
  select(
    playlist_name, playlist_id, playlist_position, playlist_followers,
    artist_name, artist_id, track_name, track_id,
    album_name, album_id, duration
  )
spotify_table(head(rectified_data, 10))

| playlist_name | playlist_id | playlist_position | playlist_followers | artist_name | artist_id | track_name | track_id | album_name | album_id | duration |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Throwbacks | 0 | 1 | 1 | Missy Elliott | 2wIVse2owClT7go1WT98tk | Lose Control (feat. Ciara & Fat Man Scoop) | 0UaMYEvWZi0ZqiDOoHU3YI | The Cookbook | 6vV5UrXcfyQD1wu4Qo2I9K | 226863 |
| Throwbacks | 0 | 2 | 1 | Britney Spears | 26dSoYclwsYLMAKD3tpOr4 | Toxic | 6I9VzXrHxO9rA9A5euc8Ak | In The Zone | 0z7pVBGOD7HCIB7S8eLkLI | 198800 |
| Throwbacks | 0 | 3 | 1 | Beyoncé | 6vWDO969PvNqNYHIOW5v0m | Crazy In Love | 0WqIKmW4BTrj3eJFmnCKMv | Dangerously In Love (Alben für die Ewigkeit) | 25hVFAxTlDvXbx2X2QkUkE | 235933 |
| Throwbacks | 0 | 4 | 1 | Justin Timberlake | 31TPClRtHm23RisEBtV3X7 | Rock Your Body | 1AWQoqb9bSvzTjaLralEkT | Justified | 6QPkyl04rXwTGlGlcYaRoW | 267266 |
| Throwbacks | 0 | 5 | 1 | Shaggy | 5EvFsr3kj42KNv97ZEnqij | It Wasn't Me | 1lzr43nnXAijIGYnCT8M8H | Hot Shot | 6NmFmPX56pcLBOFMhIiKvF | 227600 |
| Throwbacks | 0 | 6 | 1 | Usher | 23zg3TcAtWQy7J6upgbUnj | Yeah! | 0XUfyU2QviPAs6bxSpXYG4 | Confessions | 0vO0b1AvY49CPQyVisJLj0 | 250373 |
| Throwbacks | 0 | 7 | 1 | Usher | 23zg3TcAtWQy7J6upgbUnj | My Boo | 68vgtRHr7iZHpzGpon6Jlo | Confessions | 1RM6MGv6bcl6NrAG8PGoZk | 223440 |
| Throwbacks | 0 | 8 | 1 | The Pussycat Dolls | 6wPhSqRtPu1UhRCDX5yaDJ | Buttons | 3BxWKCI06eQ5Od8TY2JBeA | PCD | 5x8e8UcCeOgrOzSnDGuPye | 225560 |
| Throwbacks | 0 | 9 | 1 | Destiny's Child | 1Y8cdNmUJH7yBTd9yOvr5i | Say My Name | 7H6ev70Weq6DdpZyyTmUXk | The Writing's On The Wall | 283NWqNsCA9GwVHrJk59CG | 271333 |
| Throwbacks | 0 | 10 | 1 | OutKast | 1G9G7WwrXka3Z1r7aIDjI7 | Hey Ya! - Radio Mix / Club Mix | 2PpruBYCo4H7WOBJ7Q2EwM | Speakerboxxx/The Love Below | 1UsmQ3bpJTyK6ygoOOjG1r | 235213 |
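
When stripping prefixes from URIs of the form `spotify:track:<id>`, it matters which stringr function receives the capture-group pattern. The small check below (variable names are illustrative) shows that `str_extract()` returns the whole match, prefix included, while `str_match()` and `str_replace()` can isolate the captured ID:

```r
library(stringr)

uri     <- "spotify:track:0UaMYEvWZi0ZqiDOoHU3YI"
pattern <- ".*:.*:(.*)"

# str_extract() ignores the capture group and returns the full match
whole_match <- str_extract(uri, pattern)

# str_match() returns a matrix: column 1 is the full match, column 2 the first group
id_via_match <- str_match(uri, pattern)[, 2]

# str_replace() with a backreference keeps only the captured ID
id_via_replace <- str_replace(uri, pattern, "\\1")

c(whole_match, id_via_match, id_via_replace)
```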

🎧 Task 4: Initial Exploration of Track & Playlist Data

This section investigates core statistics of the combined playlist + song characteristics data set.

Code
strip_spotify_prefix <- function(x){
  stringr::str_replace(x, "spotify:track:", "")
}

rectified_data <- rectified_data %>%
  mutate(track_id = strip_spotify_prefix(track_id)) %>%
  filter(!is.na(track_id) & track_id != "")

SONGS <- SONGS %>%
  filter(!is.na(id) & id != "")

joined_data <- inner_join(rectified_data, SONGS, by = c("track_id" = "id"))
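
Because SONGS holds one row per artist-song pair, joining it to playlist entries repeats each entry once per credited artist. The toy example below (made-up IDs and names, not taken from the dataset) sketches that effect and one possible way to collapse back to one row per playlist entry:

```r
library(dplyr)
library(tibble)

# Two playlist entries, each track appearing once
toy_playlist <- tibble(
  playlist_id = c(1, 1),
  track_id    = c("t1", "t2")
)

# Track t1 has two credited artists, so it occupies two rows
toy_songs <- tibble(
  id     = c("t1", "t1", "t2"),
  artist = c("Artist A", "Artist B", "Artist C")
)

toy_joined <- inner_join(toy_playlist, toy_songs, by = c("track_id" = "id"))
nrow(toy_joined)    # more rows than playlist entries

# Keep one row per playlist entry (the first artist row wins)
toy_entries <- toy_joined %>%
  distinct(playlist_id, track_id, .keep_all = TRUE)
nrow(toy_entries)
```

Any statistic computed with `n()` on the joined table counts artist rows, so per-track counts are best taken after a step like the `distinct()` call here.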

🎵 Q1: How many distinct tracks and artists?

Code
distinct_tracks <- joined_data %>% distinct(track_id) %>% nrow()
distinct_artists <- joined_data %>% distinct(artist_id) %>% nrow()

spotify_table(
  tibble(Metric = c("Distinct Tracks", "Distinct Artists"),
         Count = c(distinct_tracks, distinct_artists))
)
| Metric | Count |
| --- | --- |
| Distinct Tracks | 50684 |
| Distinct Artists | 9609 |

📝 Analysis: After the inner join with SONGS, 50,684 distinct tracks by 9,609 distinct artists remain. Tracks that appear on playlists but are missing from SONGS are excluded from these counts.

🔥 Q2: What are the 5 most common tracks?

Code
top_tracks <- joined_data %>%
  group_by(track_name) %>%
  summarise(Appearances = n(), .groups = "drop") %>%
  arrange(desc(Appearances)) %>%
  slice_head(n = 5)

spotify_table(top_tracks)
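| track_name | Appearances |
| --- | --- |
| Champions | 27888 |
| No Problem (feat. Lil Wayne & 2 Chainz) | 26826 |
| Closer | 25742 |
| F**kin' Problems | 25136 |
| Sucker For Pain (with Wiz Khalifa, Imagine Dragons, Logic & Ty Dolla $ign feat. X Ambassadors) | 25086 |

📝 Analysis: These are the five track names with the most rows in joined_data. Two caveats apply. SONGS has one row per artist-song pair, so a playlist entry is counted once per credited artist, and several of the titles above list multiple featured artists. Grouping by track_name also pools different songs that share a title. The figures are therefore row counts, not numbers of playlists.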
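
As a hedged alternative to counting rows per track name, one could count distinct playlists per track ID. The toy data below is invented for illustration: two different songs share the title "Closer", and one of them has two artist rows per playlist entry.

```r
library(dplyr)
library(tibble)

toy_joined <- tribble(
  ~playlist_id, ~track_id, ~track_name, ~artist,
  1,            "c1",      "Closer",    "Artist A",
  1,            "c1",      "Closer",    "Artist B",
  2,            "c1",      "Closer",    "Artist A",
  2,            "c1",      "Closer",    "Artist B",
  3,            "c2",      "Closer",    "Artist C",
  3,            "x9",      "Other",     "Artist D"
)

# Row count per title: mixes both "Closer" songs and counts artist rows
by_name <- toy_joined %>%
  count(track_name, name = "rows") %>%
  arrange(desc(rows))

# Distinct playlists per track ID: one count per actual song
by_track <- toy_joined %>%
  group_by(track_id, track_name) %>%
  summarise(playlists = n_distinct(playlist_id), .groups = "drop") %>%
  arrange(desc(playlists))

by_name
by_track
```

In this toy case the title "Closer" gets 5 rows, while the most-playlisted single song appears on only 2 playlists.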

❓ Q3: Most Popular Track Not in SONGS

Code
missing_tracks <- rectified_data %>%
  filter(!(track_id %in% SONGS$id)) %>%
  group_by(track_name, track_id) %>%
  summarise(count = n(), .groups = "drop") %>%
  arrange(desc(count)) %>%
  slice_head(n = 1)

spotify_table(missing_tracks)
| track_name | track_id | count |
| --- | --- | --- |
| One Dance | 1xznGGDReH1oQq0xzbwXa3 | 12094 |

📝 Analysis: This track, though highly featured on playlists, is not captured in the SONGS dataset, suggesting data lags or catalog discrepancies.

💃 Q4: Most Danceable Track

Code
most_danceable <- SONGS %>% arrange(desc(danceability)) %>% slice_head(n = 1)

danceable_count <- rectified_data %>%
  filter(track_id == most_danceable$id) %>%
  nrow()

spotify_table(most_danceable %>% 
  select(name, artist, danceability, popularity) %>% 
  mutate(`# of Playlists` = danceable_count))
| name | artist | danceability | popularity | # of Playlists |
| --- | --- | --- | --- | --- |
| Funky Cold Medina | Tone-Loc | 0.988 | 57 | 209 |

📝 Analysis: With high danceability and moderate popularity, this track captures rhythmic excellence while still being somewhat niche.

⏱️ Q5: Playlist with Longest Average Track Duration

Code
longest_avg_playlist <- joined_data %>%
  group_by(playlist_name, playlist_id) %>%
  summarise(avg_duration = mean(duration, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(avg_duration)) %>%
  slice_head(n = 1)

longest_avg_playlist %>%
  mutate(avg_duration_min = round(avg_duration / 60000, 2)) %>%
  select(playlist_name, playlist_id, avg_duration_min) %>%
  spotify_table()
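| playlist_name | playlist_id | avg_duration_min |
| --- | --- | --- |
| Sleep | 611205 | 68.67 |

📝 Analysis: The "Sleep" playlist (ID 611205) averages about 68.7 minutes per track, far longer than a typical song. That suggests long-form audio, such as ambient or sleep recordings, rather than conventional songs.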

⭐ Q6: Most Followed Playlist

Code
most_followed <- joined_data %>%
  select(playlist_id, playlist_name, playlist_followers) %>%
  distinct() %>%
  arrange(desc(playlist_followers)) %>%
  slice_head(n = 1)

spotify_table(most_followed)
| playlist_id | playlist_name | playlist_followers |
| --- | --- | --- |
| 746359 | Breaking Bad | 53519 |

📝 Analysis: "Breaking Bad" (playlist 746359) is the most followed playlist in the joined data, with 53,519 followers. A count this high points to an audience well beyond the playlist's creator.

🎧 Task 5: Visually Identifying Characteristics of Popular Songs

We explore audio features to discover what makes songs popular, including trends over time, genre markers, and playlist impact.


📈 Q1: Is Popularity Correlated with Playlist Appearances?

Code
track_popularity <- joined_data %>%
  group_by(track_id, name, popularity) %>%
  summarise(playlist_appearances = n(), .groups = "drop")

ggplot(track_popularity, aes(x = playlist_appearances, y = popularity)) +
  geom_point(alpha = 0.3, color = "#1DB954") +
  geom_smooth(method = "lm", se = FALSE, color = "white") +
  labs(
    title = "Popularity vs Playlist Appearances",
    x = "Playlist Appearances",
    y = "Popularity"
  ) +
  theme_spotify()

📊 Analysis: Popularity vs Playlist Appearances

While there’s a general trend that more playlist appearances boost popularity, the effect flattens at the top — even tracks in 20K+ playlists rarely reach max popularity. Many mid-popularity songs appear in far fewer playlists, suggesting other drivers like artist fame or viral trends. A few standout hits dominate both metrics, but overall, exposure alone doesn’t guarantee peak popularity. This reveals a diminishing return effect beyond a certain playlist count.

📅 Q2: When Were Popular Songs Released?

Code
joined_data %>%
  filter(popularity >= 70, !is.na(year)) %>%
  count(year) %>%
  ggplot(aes(x = year, y = n)) +
  geom_col(fill = "#1DB954") +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Release Year of Popular Songs", x = "Year", y = "Count") +
  theme_spotify()

📊 Analysis: Release Year of Popular Songs

Most playlist entries for popular songs (popularity of 70 or higher) come from tracks released after 2010, with a sharp surge after 2015. This spike likely reflects both Spotify's growth and a bias in playlist curation toward newer tracks. Songs from earlier decades are present but underrepresented, possibly because older catalog is covered less completely or because users add fewer older tracks. The sharp rise suggests that recency plays a major role in which songs are popular on modern playlists.

💃 Q3: When Did Danceability Peak?

Code
joined_data %>%
  group_by(year) %>%
  summarise(avg_danceability = mean(danceability, na.rm = TRUE)) %>%
  ggplot(aes(x = year, y = avg_danceability)) +
  geom_line(color = "#F1C40F", linewidth = 1.2) +
  labs(title = "Danceability Over Time", x = "Year", y = "Average Danceability") +
  theme_spotify()

🎶 Analysis: Danceability Over Time

Danceability levels show considerable fluctuation before the 1950s, likely due to sparse data and inconsistent genre tracking. From the 1970s onward, there’s a noticeable and steady increase in average danceability, suggesting a shift in musical production toward rhythm-centric, movement-friendly tracks. This trend accelerates post-2000, aligning with the rise of pop, hip-hop, and electronic genres that dominate modern playlists. Overall, the data reflects how music has evolved to favor groove and energy.

📀 Q4: Most Represented Decade

Code
joined_data %>%
  mutate(decade = (year %/% 10) * 10) %>%
  count(decade) %>%
  ggplot(aes(x = as.factor(decade), y = n)) +
  geom_col(fill = "#3498DB") +
  scale_y_continuous(labels = label_comma()) +
  labs(title = "Songs by Decade", x = "Decade", y = "Number of Tracks") +
  theme_spotify()

📊 Analysis: Songs by Decade

The chart counts playlist entries in joined_data by release decade, not distinct tracks released. Entries grow modestly from the 1950s through the 1990s and climb sharply in the 2000s, and the 2010s alone account for over 6 million entries. Since the joined data contains only 50,684 distinct tracks, this pattern shows how heavily playlists favor recent music. It does not measure how much music was produced in each decade.
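
The decade bucket comes from integer division: `(year %/% 10) * 10` maps 2015 to 2010. Since `count(decade)` on joined_data tallies rows, a track that sits on many playlists contributes many times. The toy sketch below (invented rows) contrasts that tally with a count of distinct tracks per decade:

```r
library(dplyr)
library(tibble)

# Track t1 appears on three playlists; t2 and t3 on one each
toy <- tibble(
  track_id = c("t1", "t1", "t1", "t2", "t3"),
  year     = c(2015, 2015, 2015, 1999, 2011)
)

with_decade <- toy %>%
  mutate(decade = (year %/% 10) * 10)

# What the bar chart counts: one unit per playlist entry
entries_per_decade <- with_decade %>%
  count(decade, name = "playlist_entries")

# Alternative: one unit per distinct track
tracks_per_decade <- with_decade %>%
  distinct(track_id, decade) %>%
  count(decade, name = "distinct_tracks")

entries_per_decade
tracks_per_decade
```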

🎹 Q5: Key Frequency (Polar Plot)

Code
joined_data %>%
  count(key) %>%
  mutate(key = as.factor(key)) %>%
  ggplot(aes(x = key, y = n)) +
  geom_col(fill = "#8E44AD") +
  coord_polar() +
  labs(title = "Distribution of Musical Keys", x = "Key", y = "Count") +
  theme_spotify()

🎼 Analysis: Distribution of Musical Keys

This polar plot shows how many playlist entries fall in each musical key (0 to 11), where each number is a pitch class in the chromatic scale (e.g., 0 = C, 1 = C♯/D♭, … 11 = B). The key value does not distinguish major from minor; that is recorded separately in the mode column. C (0) and G♯/A♭ (8) appear to be the most common keys, while F♯ (6) and B♭ (10) are less common. This may reflect production preferences in pop and hip-hop, though the plot alone cannot show why.
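
To label the plot with note names instead of integers, the 0 to 11 codes can be mapped through a lookup vector. This is a small sketch using standard pitch-class order; `key_names` and `key_label()` are illustrative names, not part of the original analysis.

```r
# Pitch classes in chromatic order, indexed by key code 0..11
key_names <- c(
  "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
  "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B"
)

# R vectors are 1-indexed, so shift the 0-based key code by one
key_label <- function(key) key_names[key + 1]

key_label(c(0, 8, 11))
```

Something like `mutate(key = factor(key_label(key), levels = key_names))` before plotting would keep the bars in chromatic order.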

⏱️ Q6: Most Common Track Lengths

Code
joined_data %>%
  mutate(duration_min = duration / 60000) %>%
  filter(duration_min <= 10) %>%  # 🎯 Limit x-axis to songs ≤ 10 minutes
  ggplot(aes(x = duration_min)) +
  geom_histogram(binwidth = 0.25, fill = "#E67E22", color = "black") +
  scale_y_continuous(labels = scales::label_comma()) +
  labs(
    title = "Track Duration Distribution",
    x = "Duration (minutes)",
    y = "Count"
  ) +
  theme_spotify()

⏱️ Analysis: Track Duration Distribution

Most entries cluster between 2.5 and 4.5 minutes, which matches the standard radio-friendly length. The distribution is tightly packed, and tracks longer than 6 minutes are rare. Outliers likely include remixes, intros, or live recordings. The plot is limited to tracks of 10 minutes or less, so very long recordings are not shown.

🎼 Q7: Tempo vs Danceability (Popular Songs)

Code
popular_songs <- joined_data %>% 
  filter(popularity >= 70)

cor_val <- cor(popular_songs$tempo, popular_songs$danceability, use = "complete.obs")

ggplot(popular_songs, aes(x = tempo, y = danceability)) +
  geom_point(alpha = 0.4, color = "#1DB954") +
  geom_smooth(method = "lm", se = TRUE, color = "white") +
  labs(
    title = "Tempo vs Danceability (Popular Songs)",
    subtitle = paste0("Correlation: ", round(cor_val, 2)),
    x = "Tempo (BPM)",
    y = "Danceability"
  ) +
  theme_spotify()

🕺 Analysis: Tempo vs Danceability

The scatterplot reveals a slight negative correlation (r = -0.15) between tempo and danceability among popular songs. Contrary to what one might expect, faster tempos do not necessarily lead to higher danceability. Many highly danceable tracks fall in the 90–120 BPM range, suggesting that groove and rhythm matter more than speed. Extremely fast or slow songs often sacrifice the steady beat that encourages dancing.

📊 Q8: Playlist Followers vs Avg. Popularity

Code
followers_vs_popularity <- joined_data %>%
  group_by(playlist_id, playlist_name, playlist_followers) %>%
  summarise(avg_popularity = mean(popularity, na.rm = TRUE), .groups = "drop")

cor_val <- cor(log1p(followers_vs_popularity$playlist_followers), 
               followers_vs_popularity$avg_popularity, use = "complete.obs")

ggplot(followers_vs_popularity, aes(x = playlist_followers, y = avg_popularity)) +
  geom_point(alpha = 0.2, size = 1.2, color = "#1DB954") +
  geom_smooth(method = "lm", se = TRUE, color = "white") +
  scale_x_log10() +
  labs(
    title = "Followers vs. Avg. Popularity",
    subtitle = paste0("Correlation: ", round(cor_val, 2)),
    x = "Followers (log scale)",
    y = "Average Popularity"
  ) +
  theme_spotify()

📉 Analysis: Followers vs. Average Popularity

Despite the wide range of follower counts (on a log scale), there’s almost no correlation between how many followers a playlist has and how popular its songs are (correlation = -0.01).
This suggests that playlist influence doesn’t directly boost track popularity, or that popular songs are just as likely to appear in smaller playlists.
The dense vertical lines at low follower counts show a long tail of smaller, niche playlists contributing to the ecosystem.

🔍 Task 6: Finding Related Songs

We now build a playlist around two anchor tracks — Drop The World and No Role Modelz — using five custom heuristics to find compatible songs across tempo, mood, popularity, and year.


🎵 Identify Anchor Tracks

Code
anchor_names <- c("Drop The World", "No Role Modelz")
popular_threshold <- 70

anchor_tracks <- joined_data %>%
  filter(track_name %in% anchor_names)

cat("🎵 Anchor rows found (playlist entries):", nrow(anchor_tracks), "\n")
🎵 Anchor rows found (playlist entries): 11902 

🎬 Anchor Tracks – YouTube Preview

These tracks defined the tone of Hustle & Heart. Watch their official drops below. 👇

Drop the World, by Lil Wayne and Eminem

No Role Modelz, by J. Cole

🎧 Heuristic 1: Co-occurring Songs in a Random Playlist

Code
both_anchors_playlists <- joined_data %>%
  filter(track_name %in% anchor_names) %>%
  group_by(playlist_id) %>%
  summarise(anchor_count = n()) %>%
  filter(anchor_count >= 2) %>%
  pull(playlist_id)

set.seed(1010)
chosen_id <- sample(both_anchors_playlists, 1)

co_occurring <- joined_data %>%
  filter(playlist_id == chosen_id, !(track_name %in% anchor_names)) %>%
  distinct(track_id, .keep_all = TRUE)

cat("🎧 Heuristic 1 - Playlist", chosen_id, "→", nrow(co_occurring), "tracks found\n")
🎧 Heuristic 1 - Playlist 974361 → 97 tracks found

🎧 Heuristic 1 applied to Playlist 974361 yielded 97 closely related track candidates based on shared playlist co-occurrence.
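
One caveat on the playlist filter: `n() >= 2` counts rows, so a playlist containing only one anchor could still qualify if that anchor has two rows, for example one per credited artist. The toy sketch below (invented playlist IDs) compares the row-based filter with a stricter one that requires both anchor titles:

```r
library(dplyr)
library(tibble)

anchor_names <- c("Drop The World", "No Role Modelz")

# Playlist 10 holds only one anchor (two artist rows); playlist 20 holds both
toy <- tribble(
  ~playlist_id, ~track_name,      ~artist,
  10,           "Drop The World", "Lil Wayne",
  10,           "Drop The World", "Eminem",
  20,           "Drop The World", "Lil Wayne",
  20,           "Drop The World", "Eminem",
  20,           "No Role Modelz", "J. Cole"
)

# Row-based filter: passes any playlist with two or more anchor rows
by_rows <- toy %>%
  filter(track_name %in% anchor_names) %>%
  group_by(playlist_id) %>%
  summarise(anchor_count = n()) %>%
  filter(anchor_count >= 2) %>%
  pull(playlist_id)

# Title-based filter: requires every anchor title to be present
by_titles <- toy %>%
  filter(track_name %in% anchor_names) %>%
  group_by(playlist_id) %>%
  summarise(anchor_count = n_distinct(track_name)) %>%
  filter(anchor_count == length(anchor_names)) %>%
  pull(playlist_id)

by_rows
by_titles
```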

🎚️ Heuristic 2: Similar Tempo & Key

Code
tempo_key_match <- joined_data %>%
  filter(
    key %in% anchor_tracks$key,
    abs(tempo - mean(anchor_tracks$tempo, na.rm = TRUE)) <= 5,
    !(track_name %in% anchor_names)
  ) %>%
  distinct(track_id, .keep_all = TRUE)

cat("🎚️ Heuristic 2 - Tempo/Key:", nrow(tempo_key_match), "matches\n")
🎚️ Heuristic 2 - Tempo/Key: 829 matches

Tracks that share a key with an anchor and sit within 5 BPM of the anchors' mean tempo should transition smoothly when mixed.

🧑‍🎤 Heuristic 3: Same Artist

Code
same_artist <- joined_data %>%
  filter(artist_name %in% anchor_tracks$artist_name, !(track_name %in% anchor_names)) %>%
  distinct(track_id, .keep_all = TRUE)

cat("🧑‍🎤 Heuristic 3 - Same Artist:", nrow(same_artist), "matches\n")
🧑‍🎤 Heuristic 3 - Same Artist: 92 matches

This heuristic pulls in other songs by the anchor artists: Eminem, J. Cole, and Lil Wayne.

🎛️ Heuristic 4: Acoustic / Energy Profile Match

Code
anchor_year <- unique(anchor_tracks$year)

acoustic_features <- joined_data %>%
  filter(year %in% anchor_year, !(track_name %in% anchor_names)) %>%
  mutate(sim_score = abs(danceability - mean(anchor_tracks$danceability, na.rm = TRUE)) +
           abs(energy - mean(anchor_tracks$energy, na.rm = TRUE)) +
           abs(acousticness - mean(anchor_tracks$acousticness, na.rm = TRUE))) %>%
  arrange(sim_score) %>%
  distinct(track_id, .keep_all = TRUE) %>%
  slice_head(n = 20)

cat("🎛️ Heuristic 4 - Acoustic Profile:", nrow(acoustic_features), "best matches\n")
🎛️ Heuristic 4 - Acoustic Profile: 20 best matches

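These are the 20 tracks from the anchors' release years whose danceability, energy, and acousticness sit closest to the anchor averages, so they should feel similar in vibe and intensity.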
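
The similarity score in Heuristic 4 is a sum of absolute differences across three features, so lower means closer. A worked toy example, with invented feature values and an invented `anchor_profile`, makes the arithmetic explicit:

```r
library(dplyr)
library(tibble)

# Hypothetical anchor averages (not the real values)
anchor_profile <- c(danceability = 0.60, energy = 0.70, acousticness = 0.10)

candidates <- tibble(
  track_name   = c("Track A", "Track B"),
  danceability = c(0.65, 0.30),
  energy       = c(0.70, 0.90),
  acousticness = c(0.15, 0.60)
)

scored <- candidates %>%
  mutate(
    sim_score =
      abs(danceability - anchor_profile[["danceability"]]) +
      abs(energy       - anchor_profile[["energy"]]) +
      abs(acousticness - anchor_profile[["acousticness"]])
  ) %>%
  arrange(sim_score)

# Track A: 0.05 + 0.00 + 0.05 = 0.10
# Track B: 0.30 + 0.20 + 0.50 = 1.00
scored
```

All three features lie on a 0 to 1 scale, so they contribute comparably. Adding a feature on a different scale, such as tempo or loudness, would call for rescaling first.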

🎚️ Heuristic 5: Valence & Loudness

Code
valence_match <- joined_data %>%
  filter(
    abs(valence - mean(anchor_tracks$valence, na.rm = TRUE)) < 0.1,
    abs(loudness - mean(anchor_tracks$loudness, na.rm = TRUE)) < 2,
    !(track_name %in% anchor_names)
  ) %>%
  distinct(track_id, .keep_all = TRUE)

cat("🎚️ Heuristic 5 - Valence + Loudness:", nrow(valence_match), "\n")
🎚️ Heuristic 5 - Valence + Loudness: 4239 

Matching valence (within 0.1) and loudness (within 2 dB) keeps mood and volume consistent from track to track.

🎼 Combine Playlist Candidates

Code
final_playlist <- bind_rows(
  co_occurring,
  tempo_key_match,
  same_artist,
  acoustic_features,
  valence_match
) %>%
  distinct(track_id, .keep_all = TRUE) %>%
  mutate(popular = popularity >= popular_threshold)

cat("🎼 Final Playlist Candidates:", nrow(final_playlist), "\n")
🎼 Final Playlist Candidates: 5157 
Code
cat("📉 Non-popular (<", popular_threshold, "):", sum(!final_playlist$popular), "\n")
📉 Non-popular (< 70 ): 4918 
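
After `distinct(track_id)`, the combined table no longer records which heuristic suggested each track. One possible way to keep that information is to name the inputs and pass `.id` to `bind_rows()`. The sketch uses two tiny invented candidate tables, and the `heuristic` column is an illustrative addition:

```r
library(dplyr)
library(tibble)

# Stand-ins for two heuristic outputs
h1 <- tibble(track_id = c("t1", "t2"))
h2 <- tibble(track_id = c("t2", "t3"))

# .id stores each list element's name in a new column
candidates <- bind_rows(
  list(co_occurring = h1, tempo_key = h2),
  .id = "heuristic"
)

# How many heuristics, and which ones, support each track
support <- candidates %>%
  group_by(track_id) %>%
  summarise(
    n_heuristics = n_distinct(heuristic),
    heuristics   = paste(sort(unique(heuristic)), collapse = ", "),
    .groups = "drop"
  ) %>%
  arrange(desc(n_heuristics), track_id)

support
```

Tracks backed by several heuristics could then be prioritized when narrowing thousands of candidates down to a short playlist.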

📋 Preview of Final Playlist Candidates

Code
final_playlist %>%
  select(track_name, artist_name, popularity, playlist_name) %>%
  distinct() %>%
  slice_head(n = 20) %>%
  spotify_table("🎧 First 20 Playlist Candidates from the 5 Heuristics")
🎧 First 20 Playlist Candidates from the 5 Heuristics
| track_name | artist_name | popularity | playlist_name |
| --- | --- | --- | --- |
| Ignition - Remix | R. Kelly | 70 | throwback |
| Sure Thing | Miguel | 74 | throwback |
| Power Trip | J. Cole | 72 | throwback |
| Whatever You Like | T.I. | 74 | throwback |
| Crooked Smile | J. Cole | 69 | throwback |
| So Good | B.o.B | 65 | throwback |
| Rich As Fuck | Lil Wayne | 62 | throwback |
| Young, Wild & Free (feat. Bruno Mars) - feat. Bruno Mars | Snoop Dogg | 65 | throwback |
| Strange Clouds (feat. Lil Wayne) - feat. Lil Wayne | B.o.B | 60 | throwback |
| The Motto | Drake | 72 | throwback |
| Battle Scars | Lupe Fiasco | 70 | throwback |
| The Show Goes On | Lupe Fiasco | 71 | throwback |
| Mercy | Kanye West | 71 | throwback |
| Satellites | Kevin Gates | 46 | throwback |
| Love Me | Lil Wayne | 66 | throwback |
| No Hands (feat. Roscoe Dash and Wale) - Explicit Album Version | Waka Flocka Flame | 75 | throwback |
| Lollipop | Lil Wayne | 70 | throwback |
| Rock Your Body | Justin Timberlake | 71 | throwback |
| Beautiful Girls | Sean Kingston | 78 | throwback |
| A Milli | Lil Wayne | 72 | throwback |

🎧 Task 7: Curate and Analyze Your Ultimate Playlist – “Hustle & Heart”

Twelve tracks. One vibe. Built from raw energy, emotional drive, and underdog spirit. Featuring rap heavyweights, slept-on gems, and genre-bending transitions, “Hustle & Heart” was crafted using 5 analytical heuristics and a whole lot of gut.

🎶 Evolution of Audio Features in ‘Hustle & Heart’ Playlist

Hustle and Heart 🎧

🧠 Note: While most tracks in Hustle & Heart were selected using a data-driven similarity score, the two foundational songs, “Drop the World” and “No Role Modelz”, were added manually as thematic anchors for their lyrical intensity and motivational energy. Both are present in the data but dropped out during the popularity ranking.

Click ▶️ and enjoy the full curated soundtrack — no skips, no scrolls. 🔥