Getting Started with the Spotify API using “SpotiPy”: Metadata

J. Garrecht Metzger Ph.D.
Mar 26, 2021

Preface: This blog post is part of a series of posts about work I did during my time in the 12-week Metis Data Science Bootcamp in Winter 2021. This post focuses on gathering track metadata for import into pandas. It will be updated for clarity as feedback is gathered and as my project expands in scope. In the future, I will link articles related to the construction of an audio content-based Spotify song recommender.

Hardware Used: MacBook Pro (Late 2016)

Software Used: Google Chrome (Version 89.0), JupyterLab (Version 3.0.7), Spotify Desktop App (Version 1.1.54), Spotify Web App, Google Drive, Google Colab

The audio streaming giant, Spotify, boasts a library of more than 70 million tracks [1]. Accessing this data in a meaningful way requires navigating an API (application programming interface), which many data scientists find difficult because each company has its own data storage structure, naming conventions, and response schema. For those newer to programming who are working with Python as their first language (like me), a data format like JSON (JavaScript Object Notation) can be intimidating, as it is not immediately clear how to parse it in Python. In this article I provide a brief overview of how to use SpotiPy, “a lightweight Python library for the Spotify Web API” [2], to start getting data fast. I have included as many screenshots, links, and code blocks as I could to get you over the hurdles I struggled with while building my Spotify content recommender.

1) Setup

Install SpotiPy [2]

pip install spotipy

or upgrade it:

pip install spotipy --upgrade

Choose your preferred Python IDE; any of them should work for smaller projects. I used JupyterLab with local computing for the exploratory part of my project. Later, when I was building a song recommender, more RAM was required, so I used the cloud computing services on Google Colab.

2) Create an App

  • Head to https://developer.spotify.com/dashboard/ and log in to Spotify using whatever Spotify account you prefer. Note: I used a premium account (read: paid) and cannot comment on accessibility and usability with a free account.
  • Select “Create an App”
  • Record and store your “Client ID” and “Client Secret” in a secure location. These will be required for accessing the API. Do not share your “Client ID” or “Client Secret”, to avoid getting your account hacked. You can reset the “Client Secret” if you need to, but it is not advised unless necessary (see screenshot below).
Your “Client ID” will be immediately viewable. Click on “SHOW CLIENT SECRET” to reveal your “Client Secret”.
ONLY Reset the “Client Secret” if necessary.
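
One way to keep the “Client ID” and “Client Secret” out of a notebook is to read them from environment variables. The sketch below assumes the variable names SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET, and SPOTIPY_REDIRECT_URI, which SpotiPy itself looks for; the helper name load_spotify_credentials is made up for this example.

```python
import os


def load_spotify_credentials(env=None):
    """Return (client_id, client_secret, redirect_uri) from an environment-style mapping.

    `load_spotify_credentials` is a hypothetical helper, not part of SpotiPy.
    """
    env = os.environ if env is None else env
    names = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET", "SPOTIPY_REDIRECT_URI")
    # Collect every missing name so the error message lists them all at once
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)
```

Set the three variables in your shell (or in a file that is not checked into version control) before starting Jupyter, and the secret never has to appear in a shared notebook.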

3) Add Redirect URL(s)

  • Click on “Edit Settings”. Add at least one Redirect URL. Depending on your IDE this can be a local address (e.g., “http://localhost:8990/callback/”) or a website (e.g., “http://<yourpersonalwebsite.com>/”). Notice the “/” at the end of the two addresses. Some users reported issues when that “/” was not added at the end. For JupyterLab I was able to use a local host, but for Google Colab I was only successful when using a public website. I didn’t try it myself, but others have reported success using websites like “http://.google.com”.
  • Add it and click outside the box window to save it.
  • Optional: Add a website that links to your project such as a GitHub profile or a personal website.
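
The redirect URL used in your code has to be one of the addresses saved in the dashboard, so a quick sanity check before authorizing can save some head-scratching. This is only a sketch: check_redirect_uri is a hypothetical helper, and the trailing “/” test reflects the user reports mentioned above rather than a documented Spotify rule.

```python
from urllib.parse import urlparse


def check_redirect_uri(uri, registered):
    """Return a list of likely problems; an empty list means the URI looks usable."""
    problems = []
    parts = urlparse(uri)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("not a full http(s) address")
    if not uri.endswith("/"):
        # Assumption taken from the user reports above, not an official requirement
        problems.append("no trailing '/'")
    if uri not in registered:
        problems.append("not in the list registered on the dashboard")
    return problems


# The second argument is the list of URLs you saved under "Edit Settings"
print(check_redirect_uri("http://localhost:8990/callback/",
                         ["http://localhost:8990/callback/"]))
```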

4) Get authorization

Certain actions require authorization from Spotify, and authorization is also recommended for long-running applications. Queries made without authorization are read-only. The following is the code snippet I used for authorization:

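import spotipy
from spotipy.oauth2 import SpotifyOAuth

# Replace each placeholder with the values from your app's dashboard page
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(
    client_id="<YOUROWNCLIENTID>",
    client_secret="<YOUROWNCLIENTSECRET>",
    redirect_uri="<YOUROWNREDIRECTURL>",
    scope="user-library-read"))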
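
For queries that never touch user data, such as the search in the next step, SpotiPy also ships a SpotifyClientCredentials manager that needs no redirect URL or browser login. The sketch below is an alternative to the snippet I used, not a replacement for it; the function name make_readonly_client is made up for this example.

```python
def make_readonly_client(client_id, client_secret):
    """Build a Spotify client using the client-credentials flow (no user login)."""
    # Imported inside the function so the sketch can be pasted in
    # before spotipy is installed; the import runs on first call.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    auth_manager = SpotifyClientCredentials(
        client_id=client_id,
        client_secret=client_secret,
    )
    return spotipy.Spotify(auth_manager=auth_manager)


# Usage, with your own values in place of the placeholders:
# sp = make_readonly_client("<YOUROWNCLIENTID>", "<YOUROWNCLIENTSECRET>")
```

A client built this way cannot read a user's saved library, so the “user-library-read” scope still calls for the SpotifyOAuth route.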

5) Query Results

Query response data are returned in JSON format. An example of a general search query is:

results = sp.search(q='<THINGYOUWANT>', limit=X)

where “X” is the specified result limit. Below is an example of the query results for “congratulations”. Note that the “available_markets” field will be long, so you may want to restrict your results to just a handful at first. I like to use 5 as my starting point.

Query response data for query='congratulations' (Part 1/2)
Query response data for query='congratulations' (Part 2/2)
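
Because “available_markets” dominates the printed output, it can help to strip it before looking at a response. The following is a minimal sketch; without_key is a hypothetical helper and the sample dictionary is a hand-made stand-in with invented values, not real Spotify data.

```python
import json


def without_key(obj, key="available_markets"):
    """Return a copy of nested dicts/lists with every occurrence of `key` removed."""
    if isinstance(obj, dict):
        return {k: without_key(v, key) for k, v in obj.items() if k != key}
    if isinstance(obj, list):
        return [without_key(item, key) for item in obj]
    return obj


# Invented stand-in for `results`, trimmed to a few keys for illustration
sample = {
    "tracks": {
        "items": [
            {
                "name": "Example Song",
                "available_markets": ["AD", "AE"],
                "album": {"name": "Example Album", "available_markets": ["AD"]},
            }
        ]
    }
}

print(json.dumps(without_key(sample), indent=2))
```

The original response is left untouched, so the market list is still there if you need it later.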

6) Parse metadata JSON

The metadata JSON can be a bit cumbersome and intimidating upon first glance, or as I initially put it, look like “nonsense soup”. So, I wrote a function to parse the JSON data into a format ready for a pandas dataframe. The function “tr_md()” takes a track and returns 22 different pieces of metadata.

Code can be found here:

https://gist.github.com/JGarrechtMetzger/21b04767a2b5624dca2629ae8d894355
Metadata JSON parsed using the “tr_md()” function
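
To give a feel for the idea without opening the gist, here is a much smaller sketch of the same kind of flattening. It is not the “tr_md()” function: the names flatten_track and flatten_search are made up, and only a handful of TrackObject keys are pulled out.

```python
def flatten_track(track):
    """Turn one nested track dictionary into a flat dictionary (one future row)."""
    album = track.get("album", {})
    return {
        "track_id": track.get("id"),
        "track_name": track.get("name"),
        # A track can have several artists, so their names are kept as a list
        "artist_names": [artist.get("name") for artist in track.get("artists", [])],
        "album_name": album.get("name"),
        "release_date": album.get("release_date"),
        "duration_ms": track.get("duration_ms"),
        "explicit": track.get("explicit"),
        "popularity": track.get("popularity"),
    }


def flatten_search(results):
    """Flatten every track in a search response into a list of flat dictionaries."""
    return [flatten_track(track) for track in results["tracks"]["items"]]
```

With pandas installed, pandas.DataFrame(flatten_search(results)) should give one row per track. Using .get() means a missing key becomes None instead of raising a KeyError.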

7) Response Data Types

There are many different pieces of information that can be obtained from the Spotify API. The official documentation lists all the possibilities at https://developer.spotify.com/documentation/web-api/reference/#objects-index. I found the TrackObject keys the most helpful as a single group, but your use case will determine which keys are most appropriate. Use the metadata parser function “tr_md()” as a starting point when dealing with the JSON data. Playing around with the bracketing structure ([‘example’]) seen in “tr_md()” is how I navigated the JSON when building the function.
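
If guessing at brackets gets tedious, a small helper can print the bracket path to each value in a response. This is a sketch under the assumption that the items in a list share one shape, so only the first item is inspected; key_paths is a hypothetical name.

```python
def key_paths(obj, prefix="results"):
    """Yield bracket-style paths to the leaf values of nested dicts and lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from key_paths(value, f"{prefix}[{key!r}]")
    elif isinstance(obj, list):
        # Only the first element is walked; empty lists yield nothing
        if obj:
            yield from key_paths(obj[0], f"{prefix}[0]")
    else:
        yield prefix


# Invented stand-in for a search response, trimmed to two values
sample = {"tracks": {"items": [{"name": "Example Song", "album": {"name": "Example Album"}}]}}

for path in key_paths(sample):
    print(path)
```

Each printed path can be pasted back into a cell to retrieve that value from the real response.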

8) Summary

Hopefully this short introduction allowed you to connect to the Spotify API and start exploring. There are innumerable directions a Spotify-based project can take, and this is just a tiny example of the information that can be obtained with a fairly small amount of code.

Resources and Upcoming Posts

  • Pagination for returning lots of results in a single query.
  • A function to return audio feature data and parse the JSON.
  • Functions for getting track data from playlists, artists, albums, and Spotify users.
  • Constructing large libraries of songs efficiently and quickly.
  • A discussion of errors and problems I encountered along the way.
  • A link to, and discussion of, my GitHub project page for my Spotify content-based song recommender.

Lastly, please post your comments, questions, and tips! This is the first in a series of posts on how to build an audio content-based Spotify recommender. It is by no means exhaustive, and I’m sure I’ve overlooked some key issues I had when I first began working with the API. Have fun exploring the song data!

References:

Date format: (MM/DD/YYYY)

[1] https://newsroom.spotify.com/company-info/, accessed on 03/25/2021

[2] https://github.com/plamere/spotipy, accessed on 03/25/2021

[3] https://spotipy.readthedocs.io/en/2.17.1/, accessed on 03/25/2021

[4] https://developer.spotify.com/documentation/web-api/reference/#objects-index, accessed on 03/25/2021

J. Garrecht Metzger Ph.D.

PhD Geologist/Geochemist-turned Data Analyst & Scientist. Electronics hobbyist.