Downloading Flickr image sets with Python and wget
May 2nd, 2008
Have you ever come across a cool image set on Flickr and wanted to download all of its pictures in the “large” size? Yes, there are a thousand ways to skin a cat… I wrote a very simple “spider” [dl here] in Python to do all the clicking for me. Once I have all the image links, I run wget to retrieve them.
First, let’s write a function that fetches a given URI:
import sys,re,string
import urllib,urllib2

def readFile(uri):
    # Fetch the page at uri; return its body, or None if the request fails
    request = urllib2.Request(uri)
    try:
        response = urllib2.urlopen(request)
    except urllib2.HTTPError, e:
        print 'ERR: ('+str(e.code)+') Error occurred. Current URI:'+uri
    except urllib2.URLError, e:
        # e.reason may be a plain string or a socket error, so print it whole
        print 'ERR: Failed to reach the URI ('+str(e.reason)+')'
    else:
        return response.read()
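The listing above targets Python 2; urllib2 no longer exists in Python 3. For readers on Python 3, a rough equivalent might look like the sketch below. The name read_uri, the timeout parameter and the choice to send error messages to stderr are assumptions of this sketch, not part of the original script.

```python
import sys
import urllib.error
import urllib.request


def read_uri(uri, timeout=30):
    """Return the body at uri as text, or None if the request fails."""
    try:
        with urllib.request.urlopen(uri, timeout=timeout) as response:
            # Fall back to UTF-8 when the server does not declare a charset
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as e:
        # HTTPError is a subclass of URLError, so it has to be caught first
        print("ERR: (%d) Error occurred. Current URI: %s" % (e.code, uri), file=sys.stderr)
    except urllib.error.URLError as e:
        print("ERR: Failed to reach the URI (%s)" % e.reason, file=sys.stderr)
    return None
```

Writing the errors to stderr keeps them out of the link file when the script's output is redirected.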
Then come two simple functions that search for the appropriate hrefs and images. getThumbs extracts all links to the “image detail” pages and passes them to getImages. getImages jumps to the “Available sizes” page and prints the image URI.
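def getThumbs(data):
    # Follow every "image detail" link that belongs to the set
    global thumb_match
    if data is None:
        return
    for match in re.finditer(thumb_match,data):
        getImages(match.group(1))

def getImages(uri):
    # Turn the detail link into its "Available sizes" link and print the image URI
    global set_id,big_match
    pos = uri.find('in/set-'+set_id)
    uri = uri[0:pos-1]+'/sizes/o/'+uri[pos:]
    data = readFile('http://www.flickr.com'+uri)
    if data is None:
        return
    for match in re.finditer(big_match,data):
        print match.group(1)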
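To see what the two patterns do without touching the network, here is a small Python 3 sketch that applies similar expressions to made-up HTML snippets. The helper names and the sample markup are hypothetical, and the patterns are slightly tighter than the originals ([^"] instead of .*?), so this illustrates the idea rather than reproducing the script.

```python
import re


def thumb_links(html, set_id):
    """Return the hrefs of all "image detail" links that belong to the set."""
    pattern = re.compile(
        r'<a[^>]*?href="([^"]*?/in/set-' + re.escape(set_id) + r'/)"',
        re.IGNORECASE,
    )
    return [m.group(1) for m in pattern.finditer(html)]


def sizes_uri(detail_uri, set_id):
    """Insert /sizes/o/ in front of the in/set-<id>/ part, as getImages does."""
    pos = detail_uri.find("in/set-" + set_id)
    return detail_uri[:pos - 1] + "/sizes/o/" + detail_uri[pos:]


def image_uris(html):
    """Return the src of every <p><img ...> hosted on static.flickr.com."""
    pattern = re.compile(
        r'<p><img[^>]*?src="([^"]*?static\.flickr\.com[^"]*?)"',
        re.IGNORECASE,
    )
    return [m.group(1) for m in pattern.finditer(html)]


# Made-up markup, only loosely modelled on a Flickr set page
set_page = (
    '<a href="/photos/pinkponk/123/in/set-42/" title="one"><img src="t1.jpg" /></a>'
    '<a href="/photos/pinkponk/999/" title="not in the set"><img src="t2.jpg" /></a>'
)
sizes_page = '<p><img src="http://farm1.static.flickr.com/1/123_abc_o.jpg" alt="one" /></p>'

for link in thumb_links(set_page, "42"):
    print(sizes_uri(link, "42"))
print(image_uris(sizes_page))
```

Only the first link is picked up, because the second one does not point into the set.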
The last section just parses the given parameters, compiles the regular expressions and starts harvesting links:
if(len(sys.argv) > 1):
    uri = sys.argv[1]
    if(uri.find('?page=') > 0):
        # Split off the page parameter; assumes a '/' directly before '?page='
        pos = uri.find('?page=')
        page = uri[pos:]
        uri = uri[0:pos-1]
        set_id = uri[uri.rfind('/')+1:]
        uri = uri+'/'+page
    else:
        # Drop a trailing slash so the set id is the last path segment
        if(uri[len(uri)-1:] == '/'):
            uri = uri[0:-1]
        set_id = uri[uri.rfind('/')+1:]
    thumb_match = re.compile(r'.*?<a.*?href="(.*?set-'+set_id+'/)".*?>.*?',re.IGNORECASE)
    big_match = re.compile(r'.*?<p><img.*?src="(.*?static.flickr.com.*?)".*? /></p>.*?',re.IGNORECASE)
    getThumbs(readFile(uri))
else:
    print 'lack of params'
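The slicing above relies on a slash sitting directly before ?page=. A sketch of the same parsing with Python 3's urllib.parse, which does not depend on that, could look like this. parse_set_uri is a hypothetical helper name, and the normalisation simply mirrors what the listing does (trailing slash kept only when a page parameter follows).

```python
from urllib.parse import urlsplit, urlunsplit


def parse_set_uri(uri):
    """Return (set_id, normalized_uri) for a Flickr set URI."""
    parts = urlsplit(uri)
    path = parts.path.rstrip("/")
    # The set id is the last path segment
    set_id = path.rsplit("/", 1)[-1]
    if parts.query:
        # Keep the query (e.g. page=2) and put a slash in front of it
        normalized = urlunsplit((parts.scheme, parts.netloc, path + "/", parts.query, ""))
    else:
        normalized = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    return set_id, normalized


print(parse_set_uri("http://www.flickr.com/photos/pinkponk/sets/72157600267969060/?page=2"))
```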
Now I just run the script on each gallery page and let it append the links to a file:
python herrflick.py http://www.flickr.com/photos/pinkponk/sets/72157600267969060/?page=2 >> /down/adverts.txt
python herrflick.py http://www.flickr.com/photos/pinkponk/sets/72157600267969060/?page=3 >> /down/adverts.txt
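With more than a handful of pages, typing each command gets tedious. A small Python 3 sketch that generates the page URIs is shown here. page_uris is a hypothetical helper, and the page range has to be supplied by hand, since the script as listed does not work out how many pages a set has.

```python
def page_uris(set_uri, first_page, last_page):
    """Return the set URI with ?page=N appended for N in first_page..last_page."""
    base = set_uri.rstrip("/") + "/"
    return [base + "?page=" + str(n) for n in range(first_page, last_page + 1)]


for uri in page_uris("http://www.flickr.com/photos/pinkponk/sets/72157600267969060/", 2, 3):
    print(uri)
```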
and let wget download the images:
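A minimal Python 3 sketch of this step, assuming wget is on the PATH: wget's -i option reads the URIs from a file and -P chooses the directory to save into. The function names and the /down/adverts target directory are assumptions made for illustration.

```python
import subprocess


def wget_command(list_path, dest_dir):
    """Build a wget call that fetches every URI listed in list_path into dest_dir."""
    # -i reads the URIs from a file, -P sets the directory to save into
    return ["wget", "-i", list_path, "-P", dest_dir]


def download(list_path, dest_dir):
    """Run wget and raise CalledProcessError if it exits with an error."""
    subprocess.run(wget_command(list_path, dest_dir), check=True)


# Show the command that download() would run for the link file built above
print(" ".join(wget_command("/down/adverts.txt", "/down/adverts")))
```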
PS. It is more convenient to make the script executable (Linux only):
- rename the script to “herrflick”
- make sure its first line is a shebang such as #!/usr/bin/env python
- copy it to /usr/bin
- make it executable (chmod +x /usr/bin/herrflick)
and run it:
