The context: Why is this problem important to solve?
Music is important. I know someone whose life music literally saved: when they were in a very dark time, hearing 'Astral Weeks' by Van Morrison for the first time made them realise the beauty that can exist in the world, and that maybe it was worth sticking around for.
Music is spiritual, psychological, emotional, inspiring and evocative, and it can express what we are feeling when we are unable to. It is not a product to be consumed but a necessity for the soul.
Finding the music that speaks to you is so important, and an artist being able to reach their own niche listeners is vital to keeping new music thriving. Gone are the days when your friend makes you a mixtape or you record the top 40 from the Sunday night chart on the radio!
And with so many of the smaller (and larger!) music venues closed down, venues that once hosted the thriving live scenes where you would discover new bands and artists in your local town or city, streaming apps are now the main way for listeners to find their music and for artists to find their fans.
The objectives: The intended goal is to explore this dataset and come up with ideas on the best ways to give song recommendations to a user.
The key questions: What should a list of recommended songs look like? Should the top ten all be based on the same idea, or should a combination of ideas make up the perfect list of recommended songs?
The problem formulation: We can use data science to help find some of these recommendations. Using large datasets of users listening to songs, we can find users who are similar to other users. For example, a top song of User A will be a great recommendation for User B, who shares a very similar music taste.
We can also look for hidden variables that affect who likes to listen to what - patterns only the maths can find that we can't, though we might be able to make sense of them once the maths finds them! We can use data science to create all sorts of varied models that will hopefully give us a great and varied list of recommended songs that a user will not only be happy to listen to, but that might just change, or even save, their life.
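As a minimal sketch of this user-to-user idea (a toy play-count matrix with made-up numbers, purely for illustration; the real data arrives below), cosine similarity can score how alike two listeners are:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# Toy user-song matrix: rows are users, columns are songs, values are play counts
plays = np.array([
    [5, 3, 0, 1],   # User A
    [4, 3, 0, 1],   # User B - tastes much like User A
    [0, 0, 5, 4],   # User C - very different taste
])
# Pairwise cosine similarity between the users' play-count vectors
print(cosine_similarity(plays).round(2))
# Users A and B come out highly similar, so User A's top songs are
# reasonable recommendations for User B, and vice versa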
The core data is the Taste Profile Subset released by the Echo Nest as part of the Million Song Dataset. There are two files in this dataset. The first file contains the details about the song id, titles, release, artist name, and the year of release. The second file contains the user id, song id, and the play count of users.
song_data
song_id - A unique id given to every song
title - Title of the song
release - Name of the album the song was released on
artist_name - Name of the artist
year - Year of release
count_data
user_id - A unique id given to the user
song_id - A unique id given to the song
play_count - Number of times the song was played
This notebook can be considered a guide to refer to while solving the problem. The evaluation will be as per the rubric shared for each milestone. Unlike previous courses, it does not follow the pattern of graded questions in different sections; instead, it gives you a direction on what steps need to be taken to reach a feasible solution. Please note that this is just one way of doing it. There can be other 'creative' ways to solve the problem, and we encourage you to explore them as an 'optional' exercise.
In the notebook, there are markdown cells called Observations and Insights. It is a good practice to provide observations and extract insights from the outputs.
The naming convention for different variables can vary. Please consider the code provided in this notebook as a sample code.
All the outputs in the notebook are just for reference and can be different if you follow a different approach.
There are sections called Think About It in the notebook that will help you get a better understanding of the reasoning behind a particular technique/step. Interested learners can take alternative approaches if they want to explore different techniques.
# Mounting the drive
from google.colab import drive
drive.mount('/content/drive')
# Used to ignore the warning given as output of the code
import warnings
warnings.filterwarnings('ignore')
# Basic libraries of python for numeric and dataframe computations
import numpy as np
import pandas as pd
# Basic library for data visualization
import matplotlib.pyplot as plt
# Slightly advanced library for data visualization
import seaborn as sns
# To compute the cosine similarity between two vectors
from sklearn.metrics.pairwise import cosine_similarity
# A dictionary subclass that returns a default value instead of raising a KeyError
from collections import defaultdict
# A performance metric from sklearn
from sklearn.metrics import mean_squared_error
# Importing the datasets
count_df = pd.read_csv('/content/drive/MyDrive/count_data.csv')
song_df = pd.read_csv('/content/drive/MyDrive/song_data.csv')
# See top 10 records of count_df data
count_df.head(10)
# See top 10 records of song_df data
song_df.head(10)
# See the info of the count_df
count_df.info()
# We will look at the shape of the count_df
count_df.shape
# Checking for missing values in the count_df
print(count_df.isna().sum())
# See the info of the song_df data
song_df.info()
# We will look at the shape of the song_df
song_df.shape
# Checking for missing values in song_df
print(song_df.isna().sum())
# Let's see how many unique values there are in each column of the count_df
count_df.nunique()
#Let's see how many unique values there are in each column of the song_df
song_df.nunique()
# Let's check to see how many duplicates there are in the song_df
song_df.duplicated().sum()
# Let's check to see how many duplicates there are in the count_df
count_df.duplicated().sum()
In the COUNT dataset we can see that the unnamed index column ('Unnamed: 0') and play_count are integer data types. The song and user IDs are object types, as they are encrypted; we should convert these to numerical IDs later to work with them more easily.
There are no duplicates and no missing values in the Count dataset.
In the SONG dataset we can see that all of the columns are object types, because they hold names or encrypted IDs, apart from the year column, which is an integer giving the year the song was released.
There are 498 duplicates in the Song dataset and 20 missing values.
I propose dropping the duplicate rows and the missing-value rows from this dataset, and converting the user and song IDs to numerical values.
# Dropping the missing value rows from the dataset
song_df = song_df.dropna()
# Checking all columns now have 999980 entries after dropping the missing values
song_df.info()
# Checking there are no longer any missing values
song_df.isna().sum()
# Dropping the duplicates in the song_df
song_df.drop_duplicates(inplace=True)
# Checking the values are now all (999980 - 498) = 999,482 after dropping the duplicates
song_df.info()
Observations and Insights: We now have the Song dataset without the 498 duplicates and without the 20 missing-value rows, with each column having 999,482 entries.
The Song dataset gives us more information about the song - connecting the song_ID with the Title, Artist, Album (release) and Year of release of the song. Whereas the Count dataset shows us which users have played which songs and how many times. It would be nice to merge these two datasets into one so we can use the extra song information for our recommendation models later.
We should also look at encoding the User and Song IDs into numerical values so we can more easily work with them inside our models that we will create.
The column 'Unnamed: 0' is of no use to us, so we will drop it.
# Left merge the count_df and song_df data on "song_id". Drop duplicates from song_df data simultaneously
merged_df = pd.merge(count_df, song_df.drop_duplicates(['song_id']), on="song_id", how="left")
# Drop the column 'Unnamed: 0'
merged_df = merged_df.drop(columns =['Unnamed: 0'])
# Checking the shape of the merged_df
merged_df.shape
# Let's see what the merged dataframe looks like by displaying the first 10 rows
merged_df.head(10)
Think About It: As the user_id and song_id are encrypted, can they be encoded as numeric features?
# Apply label encoding for "user_id" and "song_id"
# Label Encoding
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
# Fit transform the user_id column
le.fit(merged_df['user_id'])
merged_df['user_id'] = le.transform(merged_df['user_id'])
# Fit transform the song_id column
le.fit(merged_df['song_id'])
merged_df['song_id'] = le.transform(merged_df['song_id'])
# Let's check our new user_id and song_id labels by loading up the first 10 rows
merged_df.head(10)
Think About It: The data contains users who have listened to very few songs, and songs that have been listened to by very few users. Is it necessary to filter the data so that it contains only users who have listened to a good number of songs, and songs with a good number of listeners?
# Get the column containing the users
users = merged_df.user_id
# Create a dictionary from users to their number of songs
ratings_count = dict()
for user in users:
    # If we already have the user, just add 1 to their rating count
    if user in ratings_count:
        ratings_count[user] += 1
    # Otherwise, set their rating count to 1
    else:
        ratings_count[user] = 1
# We want our users to have listened to at least 90 songs
RATINGS_CUTOFF = 90
# Create a list of users who need to be removed
remove_users = []
for user, num_ratings in ratings_count.items():
    if num_ratings < RATINGS_CUTOFF:
        remove_users.append(user)

df = merged_df.loc[~merged_df.user_id.isin(remove_users)]
# Get the column containing the songs
songs = df.song_id
# Create a dictionary from songs to their number of users
ratings_count = dict()
for song in songs:
    # If we already have the song, just add 1 to its rating count
    if song in ratings_count:
        ratings_count[song] += 1
    # Otherwise, set its rating count to 1
    else:
        ratings_count[song] = 1
# We want a song to have been listened to by at least 120 users to be considered
RATINGS_CUTOFF = 120
remove_songs = []
for song, num_ratings in ratings_count.items():
    if num_ratings < RATINGS_CUTOFF:
        remove_songs.append(song)

df_final = df.loc[~df.song_id.isin(remove_songs)]
# Keep only records with a play_count of 5 or fewer
df_final = df_final[df_final['play_count'] <= 5]
# Check the shape of the data
df_final.shape
# Let's check that this final_df looks right by loading up the first 10 rows
df_final.head(10)
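As an aside, the counting-and-filtering loops above can be written more compactly with pandas value_counts. A sketch of an equivalent, vectorised version (the variable names here are illustrative, and the cutoffs match the ones used above):
# Keep users with at least 90 interactions
USER_CUTOFF, SONG_CUTOFF = 90, 120
user_counts = merged_df['user_id'].value_counts()
df_v = merged_df[merged_df['user_id'].isin(user_counts[user_counts >= USER_CUTOFF].index)]
# Then keep songs with at least 120 listeners among those users
song_counts = df_v['song_id'].value_counts()
df_v = df_v[df_v['song_id'].isin(song_counts[song_counts >= SONG_CUTOFF].index)]
# Finally, apply the same play-count cap
df_v = df_v[df_v['play_count'] <= 5]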
Total number of unique user id
# Display total number of unique user_id
df_final['user_id'].nunique()
Total number of unique song id
# Display total number of unique song_id
df_final['song_id'].nunique()
# Display total number of unique song titles
df_final['title'].nunique()
Total number of unique artists
# Display total number of unique artists
df_final['artist_name'].nunique()
Observations and Insights: We now have a much smaller and slicker final dataset to work with.
Our user and song IDs are nice, shorter numeric values. We only have songs that have been listened to by at least 120 users and users that have listened to at least 90 songs.
We have also dropped records where a user played a song more than 5 times. This seems counter-intuitive for building our recommendation models, but the incidence of users playing a song more than 5 times is very low, so including these rarer events would make the user-song matrix very big while leaving it sparse.
Comparing the counts above, we can see there are slightly fewer unique titles than unique song IDs: two songs share the same title. This is not many, but we should be careful to use song_id in the models rather than just the title, in case we happen to use a title that is shared by another song.
We can also see that there must be artists with multiple songs, as there are more than double the number of songs as artists.
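These claims are easy to sanity-check directly. A quick sketch (df here is the user-filtered frame from before the play-count cap, so it still contains the higher counts):
# Fraction of records with a play count above 5 (the records dropped by the cap)
print((df['play_count'] > 5).mean())
# Titles shared by more than one song_id
ids_per_title = df_final.groupby('title')['song_id'].nunique()
print(ids_per_title[ids_per_title > 1])
# Average number of distinct songs per artist
print(df_final.groupby('artist_name')['song_id'].nunique().mean())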
# Let's see what the distribution of play counts looks like
plt.figure(figsize = (12, 4))
sns.countplot(x="play_count", data=df_final)
plt.tick_params(labelsize = 10)
plt.title("Distribution of Play Count ", fontsize = 10)
plt.xlabel("Plays", fontsize = 10)
plt.ylabel("Occurence of No. of Plays", fontsize = 10)
plt.show()
Observations: It is interesting that a play count of 1 is by far the most frequent, and that each higher count occurs less often than the one before it. As playing a song once is not a particularly strong indication of liking it, we will have to hope we still have enough data at the higher play counts when we create our recommendation models later.
Most interacted songs
# Display the top 10 songs that have been listened to by the most users
most_played = df_final.groupby(['song_id','title']).size().sort_values(ascending = False)[:10]
most_played
Most interacted users
# Display the top 10 users who have listened to the most songs
#df_final['user_id'].value_counts().head(10)
top_user = df_final.groupby(['user_id']).size().sort_values(ascending = False)[:10]
top_user
# Display the top 10 artists that have had the most listeners
top_artist = df_final.groupby('artist_name').size().sort_values(ascending = False)[:10]
top_artist
df.shape
df_final.shape
Observations and Insights: We can now see some popularity lists taken from the final dataset we are using:
The top 10 songs show the songs that have had the most listeners (not taking into account how many times an individual listener has played each song).
The top 10 users show the users who have listened to the most songs, not how many times they have played those individual songs.
The top 10 artists show the artists that have had the most listeners, again not counting repeat plays. Interestingly, these numbers are much higher than the song counts. Coldplay, for example, have had 5,317 listeners, whereas Yellow, a Coldplay song, has only had 583 of them. This is probably because Coldplay had released 4 studio albums and 2 live albums in the 10 years up to this data being collected: there are a lot of Coldplay songs for users to listen to, the band was still relevant in this period, and it just happens that Yellow, from their earliest release, is the most popular.
For all of the top 10 artists, their plays far exceed any of the top songs, suggesting they have deep back-catalogues rather than a single brand-new hit.
Songs played by year of release
# Count the number of plays per year of release
count_songs = df_final.groupby('year').count()['title']
count = pd.DataFrame(count_songs)
# Drop the first row, which corresponds to songs with no known release year (recorded as year 0)
count.drop(count.index[0], inplace = True)
count.tail()
# Create the plot
# Set the figure size
plt.figure(figsize = (30, 10))
sns.barplot(x = count.index,
y = 'title',
data = count,
estimator = np.median)
# Set the y label of the plot
plt.ylabel('number of titles played')
# Show the plot
plt.show()
Plays drop sharply for songs released in 2010, possibly because those songs had not yet had time to gain popularity. Songs that had been around for a year were the most popular, followed by those from the two years before that. This might say something about the trends and fashions in music at the time: the most popular music of a certain scene released in 2007 was still relevant and similar to the most popular music released in 2009. It is also possible that a new trend was just starting in 2010 but had not got going yet, or that the old trend was dying out and users were starting to diversify what they listened to across other years.
Think About It: What other insights can be drawn using exploratory data analysis?
Potential techniques: What different techniques should be explored?
The objective here is to create a list of 10 songs to recommend to a user. These do not have to be songs they have never heard before, only songs we are quite sure they will want to listen to. I think the best approach is to treat this problem as you would if you were tasked with making a mixtape for someone: you want to include songs you know they already love, songs similar to those, a few crowd-pleasers, and a few discoveries they would be unlikely to find on their own.
Overall solution design: What is the potential solution design?
To accommodate this diverse set of recommendations, I propose combining several different models, such as: a rank-based popularity model; similarity-based collaborative filtering, finding users with tastes like yours (as described in the problem formulation); a model-based approach such as matrix factorisation, to surface the 'hidden variables' mentioned earlier; and a content-based model that uses the song metadata (title, album, artist, year). A sketch of the simplest of these, the rank-based model, follows.
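Here, a rank-based model scores each song by its average play count, keeping only songs with enough listeners for the average to be trustworthy (the function name and the 100-interaction threshold are illustrative assumptions, not fixed requirements):
# Rank songs by average play count, restricted to songs with enough
# interactions for the average to be meaningful
def top_n_songs(data, n=10, min_interactions=100):
    stats = data.groupby('song_id')['play_count'].agg(['mean', 'count'])
    stats = stats[stats['count'] >= min_interactions]
    return stats.sort_values('mean', ascending=False).head(n)
top_n_songs(df_final, n=10)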
Measures of success: What are the key measures of success to compare the different potential techniques?
Regarding the models, we can hold out part of the data for training and testing so that we can measure how well they perform, and we can use RMSE and other metrics to compare tuned models to their counterparts.
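For example, treating the capped play counts as implicit ratings, RMSE on a held-out test set could be computed like this (a sketch; the 'model' here just predicts each song's training-set mean play count and is only a placeholder for the real models):
from sklearn.model_selection import train_test_split
# Hold out 20% of the interactions for evaluation
train, test = train_test_split(df_final, test_size=0.2, random_state=42)
# Placeholder predictions: each song's mean play count in the training set,
# falling back to the global mean for songs unseen in training
song_means = train.groupby('song_id')['play_count'].mean()
preds = test['song_id'].map(song_means).fillna(train['play_count'].mean())
rmse = np.sqrt(mean_squared_error(test['play_count'], preds))
print(f'Baseline RMSE: {rmse:.4f}')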
However, there is something else we must take into consideration at a higher level of this project.
What we must bear in mind is that music should not always be a popularity contest. It may be tempting to use a most-popular-is-best kind of model, but if we do, we risk drowning out the smaller, independent artists and the brand-new ones. We must not be seduced by a bigger-is-better approach. People do not measure their love of a song by whether it is popular; popularity only means they have more chance of hearing it. And indeed, it is something very special to find a song first, or one that is lost to time, or something obscure that you get to be the one to introduce to your friends.