If you have ever come across Discogs, you have probably realized how useful the site is when it comes to mining information about audio releases. This very lame tutorial will demonstrate how to use the Discogs API and Python to get release information, parse it, and serve it as BBCode through Apache (via mod_python).
Parsing the somewhat messy HTML pages is about as useful as a chocolate teapot, so we had better use the API instead: it exposes a RESTful interface and returns structured XML, which is far easier to work with than scraped markup. Before we start, you will need to obtain an API key.
Well… first we need to import some stuff:
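# -*- coding: utf-8 -*-
from mod_python import apache
import urllib2
import gzip
import cStringIO
import simplexml
import sys
import datetime
import time

YOUR_API_KEY = 'your Discogs API key goes here'  # placeholder: paste your own key
error = ''  # last error message, set by get_release()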
For starters, we can write a simple function to fetch the desired XML:
def get_release(release_id):
    global error
    REQUIRED_PARAMS = 'f=xml&api_key=' + YOUR_API_KEY
    url = 'http://www.discogs.com/release/' + str(release_id) + '?' + REQUIRED_PARAMS
    request = urllib2.Request(url)
    # ask Discogs for a gzip-compressed response
    request.add_header('Accept-Encoding', 'gzip')
    try:
        response = urllib2.urlopen(request)
        data = response.read()
        # only gunzip the body if the server actually compressed it
        if response.info().get('Content-Encoding') == 'gzip':
            data = gzip.GzipFile(fileobj=cStringIO.StringIO(data)).read()
        return data
    except urllib2.HTTPError as e:
        # remember what went wrong so handler() can report it
        error = ('Ooops. An error occurred.\n\n'
                 'Are you sure that "%s" is an existing release id?\n\n'
                 'HTTP %d\nRequested URI: %s' % (release_id, e.code, url))
        return False
As you can see, the function builds the request URL from the release id and your API key and requests the page from Discogs. Since we ask for gzip, the response normally arrives as gzip-compressed (not deflated) XML containing the data we want. On an HTTP error, it stores an error message in the global variable error and returns False.
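If you are on Python 3, urllib2 and cStringIO are gone, but the same two steps (building the request URL and gunzipping the body) map onto urllib.parse, urllib.request and gzip from the standard library. Below is a minimal sketch under a few assumptions: build_release_url, decode_body and fetch_release_xml are hypothetical helper names, and the base URL and the f=xml/api_key parameters are simply copied from get_release() above. That endpoint belongs to an old version of the Discogs API, so treat it as illustrative. Keeping the network call separate from the two helpers makes them easy to test offline.

```python
import gzip
import urllib.request
from urllib.parse import urlencode

# Base URL copied from get_release(); an old API endpoint, illustrative only.
BASE_URL = 'http://www.discogs.com/release/'


def build_release_url(release_id, api_key):
    """Release id in the path, f=xml and api_key in the query string."""
    query = urlencode({'f': 'xml', 'api_key': api_key})
    # int() rejects non-numeric ids before they end up in the URL
    return BASE_URL + str(int(release_id)) + '?' + query


def decode_body(data, content_encoding):
    """Gunzip the body only when the server says it is gzip-compressed."""
    if content_encoding == 'gzip':
        return gzip.decompress(data)
    return data


def fetch_release_xml(release_id, api_key):
    """Hypothetical fetcher; raises urllib.error.HTTPError on 4xx/5xx."""
    request = urllib.request.Request(build_release_url(release_id, api_key),
                                     headers={'Accept-Encoding': 'gzip'})
    with urllib.request.urlopen(request) as response:
        return decode_body(response.read(),
                           response.headers.get('Content-Encoding'))
```

A side benefit of urlencode() is that it percent-escapes the key, which plain string concatenation does not.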
And now comes the parsing. I'm used to the SimpleXML extension from PHP, so I used the SimpleXML module (and command-line tool) by Ales Smrcka.
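If you would rather not depend on a third-party module, the standard library's xml.etree.ElementTree can walk the same paths. Its findall() returns every matching element in document order, so the range()/index bookkeeping goes away. A minimal Python 3 sketch, assuming a response laid out like the /resp/release/... paths queried by parse_xml() below; SAMPLE is a made-up document, not real Discogs output, and parse_release is a hypothetical name:

```python
import xml.etree.ElementTree as ET

# Made-up sample shaped after the /resp/release/... paths used by
# parse_xml(); it is NOT a captured Discogs response.
SAMPLE = """<resp stat="ok">
  <release id="1">
    <title>Example Title</title>
    <artists><artist><name>Example Artist</name></artist></artists>
    <labels><label name="Example Label" catno="EX 001"/></labels>
    <genres><genre>Electronic</genre></genres>
    <styles><style>Techno</style><style>Minimal</style></styles>
    <country>UK</country>
    <released>1999-03-15</released>
  </release>
</resp>"""


def parse_release(xml_text):
    """Collect the top-level fields parse_xml() gathers, via ElementTree."""
    root = ET.fromstring(xml_text)  # root element is <resp>
    release = root.find('release')
    return {
        'title': release.findtext('title', default=''),
        'artists': [n.text for n in release.findall('artists/artist/name')],
        # labels as "Name (catalog number)", like the original script
        'labels': ['%s (%s)' % (lab.get('name'), lab.get('catno'))
                   for lab in release.findall('labels/label')],
        'genres': [g.text for g in release.findall('genres/genre')],
        'styles': [s.text for s in release.findall('styles/style')],
        'country': release.findtext('country', default=''),
        'released': release.findtext('released', default=''),
    }
```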
A simple function for parsing the XML:
def parse_xml(dxml, release_id):
    xmldoc = simplexml.xmldoc()
    xmldoc.load_string(dxml)
    release = {'artists': [],
               'extraartists': [],
               'title': '',
               'labels': [],
               'genres': [],
               'styles': [],
               'formats': [],
               'country': '',
               'released': '',
               'tracks': []}
We load our XML into SimpleXML and create an empty dictionary for the release data. Now we need to walk through the XML with SimpleXML:
    # release artists
    for i in range(len(xmldoc.elements('/resp/release/artists/artist'))):
        for name in xmldoc.elements('/resp/release/artists/artist[' + str(i) + ']/name'):
            release['artists'].append(name.data)
    # labels as "Name (catalog number)"
    for i in range(len(xmldoc.elements('/resp/release/labels/label'))):
        for label in xmldoc.elements('/resp/release/labels/label[' + str(i) + ']'):
            release['labels'].append(label.attr('name') + ' (' + label.attr('catno') + ')')
    # release-level credits as "Name (role)"
    for i in range(len(xmldoc.elements('/resp/release/extraartists/artist'))):
        for name in xmldoc.elements('/resp/release/extraartists/artist[' + str(i) + ']/name'):
            out = name.data
            for role in xmldoc.elements('/resp/release/extraartists/artist[' + str(i) + ']/role'):
                release['extraartists'].append(out + ' (' + role.data + ')')
    # genres and styles are plain lists of strings
    for i in range(len(xmldoc.elements('/resp/release/genres/genre'))):
        for genre in xmldoc.elements('/resp/release/genres/genre[' + str(i) + ']'):
            release['genres'].append(genre.data)
    for i in range(len(xmldoc.elements('/resp/release/styles/style'))):
        for style in xmldoc.elements('/resp/release/styles/style[' + str(i) + ']'):
            release['styles'].append(style.data)
    # one line per track: position, track artists, title, duration, credits
    for i in range(len(xmldoc.elements('/resp/release/tracklist/track'))):
        track = '/resp/release/tracklist/track[' + str(i) + ']'
        t = ''  # start empty so a missing <position> cannot leave t undefined
        for position in xmldoc.elements(track + '/position'):
            t = position.data + '\t'
        artists = []
        for j in range(len(xmldoc.elements(track + '/artists/artist'))):
            for name in xmldoc.elements(track + '/artists/artist[' + str(j) + ']/name'):
                artists.append(name.data)
        if artists:
            t += ', '.join(artists) + ' - '
        for title in xmldoc.elements(track + '/title'):
            t += title.data
        for duration in xmldoc.elements(track + '/duration'):
            if duration.data != '':
                t += ' (' + duration.data + ')'
        # per-track credits as "Name - role", printed small under the track
        xtraartists = []
        for k in range(len(xmldoc.elements(track + '/extraartists/artist'))):
            for name in xmldoc.elements(track + '/extraartists/artist[' + str(k) + ']/name'):
                out = name.data
                for role in xmldoc.elements(track + '/extraartists/artist[' + str(k) + ']/role'):
                    out += ' - ' + role.data
                xtraartists.append(out)
        if xtraartists:
            t += '\n\t\t[size=7pt](' + ', '.join(xtraartists) + ')[/size]'
        release['tracks'].append(t)
    # formats: one "QTYxNAME description" entry per description
    for i in range(len(xmldoc.elements('/resp/release/formats/format'))):
        fmt_path = '/resp/release/formats/format[' + str(i) + ']'
        for fmt in xmldoc.elements(fmt_path):
            f = fmt.attr('qty') + 'x' + fmt.attr('name')
            n_desc = len(xmldoc.elements(fmt_path + '/descriptions/description'))
            if n_desc == 0:
                release['formats'].append(f)  # keep formats without descriptions
            for j in range(n_desc):
                for desc in xmldoc.elements(fmt_path + '/descriptions/description[' + str(j) + ']'):
                    release['formats'].append(f + ' ' + desc.data)
    for title in xmldoc.elements('/resp/release/title'):
        release['title'] = title.data
    for rdate in xmldoc.elements('/resp/release/released'):
        release['released'] = rdate.data
    for country in xmldoc.elements('/resp/release/country'):
        release['country'] = country.data
As you can see, you can get any item with the elements() method. All you have to do is walk through the elements with a for loop over range() and store the data somewhere. The last portion of the function builds the final BBCode:
    released = release['released'].split('-')
    out = '[size=14pt][b]' + release['artists'][0] + ' - ' + release['title'] + ' [' + released[0] + '][/b][/size]\n\n'
    out += '[img]release cover goes here[/img]\n\n'
    out += '[url=http://www.discogs.com/release/' + str(release_id) + ']Discogs[/url]\n'
    if release['labels']:
        out += 'Label:\t[i]' + ', '.join(release['labels']) + '[/i]\n'
    if release['formats']:
        out += 'Format:\t[i]' + ', '.join(release['formats']) + '[/i]\n'
    out += 'Country:\t[i]' + release['country'] + '[/i]\n'
    if len(released) == 3:
        # YYYY-MM-DD becomes DD/MM/YYYY
        out += 'Released:\t[i]' + released[2] + '/' + released[1] + '/' + released[0] + '[/i]\n'
    else:
        out += 'Released:\t[i]' + '.'.join(released) + '[/i]\n'
    if release['genres']:
        out += 'Genre:\t[i][b]' + ', '.join(release['genres']) + '[/b][/i]\n'
    if release['styles']:
        out += 'Style:\t[i][b]' + ', '.join(release['styles']) + '[/b][/i]\n'
    if release['extraartists']:
        out += 'Extra nfo:\t[i]' + ', '.join(release['extraartists']) + '[/i]\n'
    # bitrate and size are placeholders to fill in by hand
    out += '\nBitrate:\t[i][b]xxx kbps[/b][/i]\n'
    out += 'Size:\t[i][b]xxx MB[/b][/i]\n'
    out += '\n[b]Tracklisting[/b]:\n'
    out += '\n'.join(release['tracks'])
    return out
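The string building in parse_xml() is easier to test in isolation. Here is a Python 3 sketch of two hypothetical helpers, format_released() and format_track(), that reproduce the date handling and the tracklist line format used above. They take plain strings and lists instead of SimpleXML elements, which is an assumption of this sketch rather than how the original script is organized:

```python
def format_released(released):
    """YYYY-MM-DD becomes DD/MM/YYYY; any other shape keeps its parts
    joined with dots, mirroring the branch in parse_xml()."""
    parts = released.split('-')
    if len(parts) == 3:
        return parts[2] + '/' + parts[1] + '/' + parts[0]
    return '.'.join(parts)


def format_track(position, title, artists=(), duration='', extras=()):
    """Build one tracklist line the way parse_xml() does:
    position<TAB>artists - title (duration), credits on an indented line."""
    line = position + '\t'
    if artists:
        line += ', '.join(artists) + ' - '
    line += title
    if duration:
        line += ' (' + duration + ')'
    if extras:
        line += '\n\t\t[size=7pt](' + ', '.join(extras) + ')[/size]'
    return line
```

For example, format_released('1999-03-15') gives '15/03/1999', while a year-only date such as '1999' passes through unchanged.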
The final part handles the release id parameter obtained via GET and writes the BBCode back to the browser (or, more likely, to some AJAX widget):
def handler(req):
    req.content_type = 'text/plain; charset=utf-8'
    req.send_http_header()
    # the release id is the raw query string, e.g. ?12345
    if req.args is not None and req.args.isdigit():
        dxml = get_release(req.args)
        if dxml is not False:
            out = parse_xml(dxml, req.args)
            req.write(out.encode('utf-8'))
        else:
            req.write(error)
    else:
        req.write('Err: Please identify the release with a number')
    return apache.OK
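If mod_python is not available on your server, the handler's logic maps fairly directly onto a WSGI application. A minimal Python 3 sketch, assuming a hypothetical render_release(release_id) callable that wraps get_release() and parse_xml() and returns the BBCode text; like handler(), it treats the raw query string as the release id:

```python
def make_app(render_release):
    """Return a WSGI app behaving like handler(): the query string is the
    release id and the reply is plain-text BBCode.
    render_release is a hypothetical callable supplied by the caller."""
    def application(environ, start_response):
        release_id = environ.get('QUERY_STRING', '')
        if release_id.isdigit():
            body = render_release(release_id)
        else:
            body = 'Err: Please identify the release with a number'
        start_response('200 OK',
                       [('Content-Type', 'text/plain; charset=utf-8')])
        return [body.encode('utf-8')]
    return application
```

To try it locally you could pass make_app(your_render_function) to wsgiref.simple_server.make_server('', 8000, app) and call serve_forever() on the result. As in the original handler, a bad id is still answered with status 200; returning 400 would arguably be cleaner.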
You can get the whole script's source at http://hits.starenka.net/?item=http://source.starenka.net/getrelease.zip. Have fun!