ClimbWorm

MS in Statistics / UIUC

An easy way to scrape Spotify data ON YOUR OWN

Published Jul 25, 2022

It is often hard to find the exact data you need for an analysis. This article will help you scrape the information and audio features of every song in each user’s playlists. Here we go!

STEP 1: CREATE AN APP

Go to the Spotify for Developers “DASHBOARD” tab: https://developer.spotify.com/dashboard/login

After logging in, click “CREATE AN APP” and name it whatever you like. For example, I named mine “recommendation system”.

Make a note of your “Client ID” and “Client Secret”; you will need them when you scrape your data.
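
If you would rather not paste the Client ID and Client Secret straight into a notebook, one option is to keep them in environment variables. The sketch below is only an illustration: `load_spotify_credentials` is a hypothetical helper, not part of spotipy, although the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are the ones spotipy itself looks for.

```python
import os


def load_spotify_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    Hypothetical helper for illustration; `env` can be any dict-like object.
    """
    client_id = env.get("SPOTIPY_CLIENT_ID")
    client_secret = env.get("SPOTIPY_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET before scraping."
        )
    return client_id, client_secret
```

In Colab you could set the two variables with `os.environ[...] = ...` in a cell that you do not share, then call the helper wherever the credentials are needed.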

STEP 2: GET A TOKEN

Go to this website to get a valid token. Unfortunately, the token expires after 1 hour, so you will need to request a new one if your scraping runs longer than that.

Specifically, scroll down to the bottom of the page and click the green “GET TOKEN” button. You do not need to check any scopes in the next step; just click “REQUEST TOKEN”. Finally, copy and save the token.
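
Because the token only lasts an hour, it can be handy to request one from code instead of copying it from the web page. The following is a minimal sketch of Spotify's Client Credentials flow using only the standard library. It is an alternative to the manual steps above, not the method this article uses, and the helper names `build_token_request` and `fetch_token` are made up for the example. A token obtained this way can read public data only.

```python
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id, client_secret):
    """Build (but do not send) the POST request for a client-credentials token."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(TOKEN_URL, data=body, headers=headers, method="POST")


def fetch_token(client_id, client_secret):
    """Send the request and return (access_token, seconds_until_expiry)."""
    with urllib.request.urlopen(build_token_request(client_id, client_secret)) as resp:
        payload = json.load(resp)
    return payload["access_token"], payload["expires_in"]
```

Calling `fetch_token(client_id, client_secret)` again after the expiry time gives you a fresh token without going back to the website.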

STEP 3: LIST THE USERNAMES/IDS YOU WANT TO SCRAPE

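Collect the usernames or IDs of the users whose playlists you want to scrape, and save them in a .csv file with a single column named “Usernames”, one per row.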
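
If you prefer to build that file from code, here is a small sketch using Python's built-in `csv` module. The helper names, the file name, and the usernames in the usage note are placeholders invented for the example; only the “Usernames” column header comes from the set-up code in Step 4.

```python
import csv


def write_usernames(path, usernames):
    """Write one username per row under a single 'Usernames' header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Usernames"])
        for name in usernames:
            writer.writerow([name])


def read_usernames(path):
    """Read the 'Usernames' column back as a list of strings."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row["Usernames"] for row in csv.DictReader(f)]
```

For example, `write_usernames("users.csv", ["example_user_1", "example_user_2"])` produces a file that the `pd.read_csv` call in Step 4 can read directly.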

Good job! You are all set for scraping!

STEP 4: DOWNLOAD THE NOTEBOOK LINKED BELOW AND RUN

I recommend using Google Colab :)

Set up:

%%capture output
# Install spotipy quietly (%%capture must be the first line of the cell)
!pip install spotipy

import os

import numpy as np
import pandas as pd
import spotipy
from google.colab import drive
from spotipy.oauth2 import SpotifyClientCredentials

# Mount Google Drive and move into your working folder
# NOTE: !cd does not work in Colab, so use os.chdir instead
drive.mount('/content/drive')
os.chdir("drive/MyDrive/YOUR PATH/")  # change it to your path

# SET UP YOUR INFORMATION
token = "ADD YOUR REQUESTED TOKEN HERE"

client_credentials_manager = SpotifyClientCredentials(
    client_id="ADD YOUR CLIENT_ID HERE",
    client_secret="ADD YOUR CLIENT_SECRET HERE",
)

sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager, auth=token)

# USERNAMES YOU WANT TO SCRAPE, SAVED IN ONE COLUMN CALLED "Usernames" IN A .CSV
user = pd.read_csv('YOUR FILE NAME.csv')
user_list = list(user.Usernames)

# Identifier columns first, then the audio features
columns = ['username', 'playlist_uri', 'track_uri', 'track_name', 'artist_uri', 'artist_name', 'album', 'track_pop',
           'acousticness', 'danceability', 'duration_ms', 'energy', 'instrumentalness', 'key', 'liveness',
           'loudness', 'mode', 'speechiness', 'tempo', 'time_signature', 'valence']
dataset = pd.DataFrame(columns=columns)

Main function:

img
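
Since the main function is only shown as an image here, below is a rough sketch of what such a loop could look like. It is not the code from the notebook. It assumes `sp` is the spotipy client from the set-up code and relies on the spotipy methods `user_playlists`, `playlist_items`, `audio_features` and `next`. The function names are invented for the example, and filling the `album` column with the album name is a guess.

```python
ID_COLUMNS = ['username', 'playlist_uri', 'track_uri', 'track_name',
              'artist_uri', 'artist_name', 'album', 'track_pop']
FEATURE_KEYS = ['acousticness', 'danceability', 'duration_ms', 'energy',
                'instrumentalness', 'key', 'liveness', 'loudness', 'mode',
                'speechiness', 'tempo', 'time_signature', 'valence']
COLUMNS = ID_COLUMNS + FEATURE_KEYS


def scrape_playlist(sp, username, playlist_uri):
    """Return one dict per track in a playlist, with its audio features."""
    rows = []
    page = sp.playlist_items(playlist_uri)
    while page:
        # Skip removed or local tracks, which have no Spotify id
        tracks = [item['track'] for item in page['items']
                  if item.get('track') and item['track'].get('id')]
        # One page holds at most 100 items, which fits in one audio_features call
        features = sp.audio_features([t['id'] for t in tracks]) if tracks else []
        for track, feat in zip(tracks, features):
            if feat is None:  # no features available for this track
                continue
            row = {
                'username': username,
                'playlist_uri': playlist_uri,
                'track_uri': track['uri'],
                'track_name': track['name'],
                'artist_uri': track['artists'][0]['uri'],  # first listed artist
                'artist_name': track['artists'][0]['name'],
                'album': track['album']['name'],  # assumption: album name
                'track_pop': track['popularity'],
            }
            row.update({key: feat[key] for key in FEATURE_KEYS})
            rows.append(row)
        page = sp.next(page)  # None once the last page has been read
    return rows


def scrape_user(sp, username):
    """Return rows for every playlist listed for `username`."""
    rows = []
    playlists = sp.user_playlists(username)
    while playlists:
        for playlist in playlists['items']:
            rows.extend(scrape_playlist(sp, username, playlist['uri']))
        playlists = sp.next(playlists)
    return rows
```

With the set-up above, something like `rows = [r for u in user_list for r in scrape_user(sp, u)]` followed by `dataset = pd.DataFrame(rows, columns=columns)` would fill the empty DataFrame.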

The whole notebook can be found here:

ScrapeSpotifyInfo (colab.research.google.com)

Finally, you will end up with a dataset that has one row per track and the columns listed in the set-up code. Cheers!!!

Of course, you can add other features described in the API docs.
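
As a small illustration of adding more fields, the sketch below pulls a few extra values out of a track object as returned by the Spotify Web API. `extra_track_fields` is a hypothetical helper, and the three fields are just examples; check the API docs for what is available to your app.

```python
def extra_track_fields(track):
    """Pick a few additional fields from a Spotify track object (a dict)."""
    album = track.get("album") or {}
    return {
        "explicit": track.get("explicit"),
        "track_number": track.get("track_number"),
        "album_release_date": album.get("release_date"),
    }
```

You would merge the returned dict into each row and add the same names to the `columns` list.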

Thank you for reading this article. If you find it helpful, please CLAP ^_^!

HELPFUL RESOURCES:

Welcome to Spotipy! - spotipy 2.0 documentation (spotipy.readthedocs.io)

Spotify for Developers