Published Jul 25, 2022
It is often hard to find the exact data you need for an analysis. This article will help you scrape the information and audio features of every song in each user’s playlists. Here we go!
STEP 1: CREATE AN APP
Go to the Spotify for Developers “DASHBOARD” tab: https://developer.spotify.com/dashboard/login
After logging in, you will see the dashboard page. Click “CREATE AN APP” and name it whatever you like. For example, I named mine “recommendation system”.
Make a note of your “Client ID” and “Client Secret”; you will need them when you scrape your data.
STEP 2: GET A TOKEN
Go to this website for a valid token. Unfortunately, the token expires after 1 hour, so you will need to request a new one if you work for longer than that.
Specifically, scroll down to the bottom of the page and click the green “GET TOKEN” button. You do not need to check any scopes in the next step; just click “REQUEST TOKEN”. Finally, copy and save the token.
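Because the token only lasts about an hour, it can help to remember when you pasted it. Below is a minimal sketch, not part of the original notebook: the class name PastedToken and the 60-second safety margin are made up for illustration, and the one-hour lifetime is taken from the note above.

```python
import time

TOKEN_LIFETIME_SECONDS = 60 * 60  # the token lasts about one hour


class PastedToken:
    """Remember when a token was pasted so its age can be checked later."""

    def __init__(self, value, obtained_at=None):
        self.value = value
        # Assumption: the token was requested roughly when this object is created.
        self.obtained_at = time.time() if obtained_at is None else obtained_at

    def seconds_left(self, now=None):
        """Seconds until the assumed expiry time, never below zero."""
        now = time.time() if now is None else now
        return max(0.0, self.obtained_at + TOKEN_LIFETIME_SECONDS - now)

    def is_expired(self, now=None, safety_margin=60):
        """Treat the token as expired slightly early to avoid mid-run failures."""
        return self.seconds_left(now) <= safety_margin
```

Before a long scraping run you could check is_expired() and request a fresh token by hand if it returns True.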
STEP 3: LIST THE USERS’ NAME/ID YOU WANT TO SCRAPE
Save them in a .csv file with a single column called “Usernames”, one user per row.
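Judging from the set-up code in Step 4, which reads a single column called "Usernames", a file of the right shape can be produced with a minimal sketch like this. The user IDs and the file name usernames.csv are placeholders, and the two helper functions are hypothetical names used only here.

```python
import csv

# Placeholder user IDs; replace them with the Spotify users you want to scrape.
usernames = ["example_user_1", "example_user_2", "example_user_3"]


def write_usernames_csv(path, names):
    """Write one user per row under a single "Usernames" header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Usernames"])
        for name in names:
            writer.writerow([name])


def read_usernames_csv(path):
    """Read the file back into a plain list, mirroring list(user.Usernames)."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row["Usernames"] for row in csv.DictReader(f)]


write_usernames_csv("usernames.csv", usernames)
```

The resulting file has the header line Usernames followed by one user ID per line, which is what pd.read_csv expects in the set-up code.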
Good job! You are all set for scraping!
STEP 4: DOWNLOAD THE NOTEBOOK LINKED BELOW AND RUN
I recommend using Google Colab :)
Set up:
%%capture output
!pip install spotipy
import pandas as pd
import numpy as np
import os
from google.colab import drive
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
drive.mount('/content/drive')
os.chdir("drive/MyDrive/YOUR PATH/")  # change it to your path
# NOTE: !cd does not work in Colab
# SET UP YOUR INFORMATION
token = "ADD YOUR REQUESTED TOKEN HERE"
client_credentials_manager = SpotifyClientCredentials(client_id="ADD YOUR CLIENT_ID HERE", client_secret="ADD YOUR CLIENT_SECRET HERE")
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager, auth=token)
# USERNAMES YOU WANT TO SCRAPE, SAVED IN ONE COLUMN CALLED "Usernames" IN A .CSV
user = pd.read_csv('YOUR FILE NAME.csv')
user_list = list(user.Usernames)
columns = ['username','playlist_uri','track_uri','track_name','artist_uri','artist_name','album','track_pop','acousticness','danceability','duration_ms','energy','instrumentalness','key','liveness','loudness','mode','speechiness','tempo','time_signature','valence']
dataset = pd.DataFrame(columns=columns)
Main function:
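Here is a minimal sketch of what such a main function could look like; it is not necessarily the notebook's own code. It assumes sp is the spotipy client from the set-up, that your app is allowed to call the audio-features endpoint, and that reading only the first page of each result is enough. The function names scrape_user and scrape_users are hypothetical.

```python
FEATURE_KEYS = [
    "acousticness", "danceability", "duration_ms", "energy",
    "instrumentalness", "key", "liveness", "loudness", "mode",
    "speechiness", "tempo", "time_signature", "valence",
]


def scrape_user(sp, username):
    """Return one dict per track found in the user's public playlists.

    Only the first page of each result is read, so very long playlists
    are truncated in this sketch.
    """
    rows = []
    playlists = sp.user_playlists(username)
    for playlist in playlists["items"]:
        tracks = sp.playlist_tracks(playlist["id"])
        for item in tracks["items"]:
            track = item.get("track")
            if not track or item.get("is_local") or not track.get("uri"):
                continue  # skip removed, local, or unavailable tracks
            features = sp.audio_features([track["uri"]])
            features = features[0] if features and features[0] else {}
            artist = track["artists"][0]  # first listed artist only
            row = {
                "username": username,
                "playlist_uri": playlist["uri"],
                "track_uri": track["uri"],
                "track_name": track["name"],
                "artist_uri": artist["uri"],
                "artist_name": artist["name"],
                "album": track["album"]["name"],
                "track_pop": track["popularity"],
            }
            for key in FEATURE_KEYS:
                row[key] = features.get(key)  # None if a feature is missing
            rows.append(row)
    return rows


def scrape_users(sp, user_list):
    """Collect rows for every username in the list."""
    rows = []
    for username in user_list:
        rows.extend(scrape_user(sp, username))
    return rows
```

With the set-up above, dataset = pd.DataFrame(scrape_users(sp, user_list), columns=columns) would then fill the table using the same column names.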
The whole notebook can be found here:
ScrapeSpotifyInfo (colab.research.google.com)
Finally, you will end up with a lot of data, organized under the columns defined in the set-up. Cheers!!!
Of course, you can add other features described in the API’s docs.
Thank you for reading this article. If you found it helpful, please CLAP ^_^!
HELPFUL RESOURCES:
Welcome to Spotipy! (spotipy 2.0 documentation): spotipy.readthedocs.io