
Steal This Book…

June 4th, 2008

Filed under: other, programming, python, web — starenka @ 11:03

… Abbie Hoffman once wrote on a cover, and then started selling the book. I will write a single line and steal every file in an autoindexed directory on a web server (I don't know exactly what that is actually called ;) - an example will probably make it clearer). Before you start stoning me, consider whether every granny who can knit a balaclava is automatically a thief…

wget -nd -r -l1 -R 'index*','robots.txt' --no-parent http://server.com/adresarsautoindexem/ --directory-prefix=/kam/ulozit

Of course, it wouldn't be me if I hadn't added this to Opera's context menu… (abridged)

[Document Popup Menu]
....
Submenu, "save content to...", savetoc
...
[savetoc]
Item, junk	= Execute program,"konsole","--noframe --notoolbar --nomenubar --vt_sz 90x30 --noclose -e  wget -nd -r -l1 -R 'index*','robots.txt' --no-parent %u --directory-prefix=/junk"
Item, down	= Execute program,"konsole","--noframe --notoolbar --nomenubar --vt_sz 90x30 --noclose -e  wget -nd -r -l1 -R 'index*','robots.txt' --no-parent %u --directory-prefix=/down"
Item, mp3	= Execute program,"konsole","--noframe --notoolbar --nomenubar --vt_sz 90x30 --noclose -e  wget -nd -r -l1 -R 'index*','robots.txt' --no-parent %u --directory-prefix=/mp3/000_new"

And it certainly wouldn't be me if I hadn't first gone about it in a terribly roundabout way. Out of ignorance, and too lazy to read the wget manual, I originally gutted my fairly decent Python spider script and turned it into another one that listed all the links in a directory, which I then wanted to download with wget. Well, what you lack in your head you have to make up for with your fingers. You can take a peek at it in the museum. Closed on Mondays.
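The roundabout approach mentioned above (listing every link in an autoindexed directory before handing the list to wget) can be sketched roughly as follows. This is not the original spider script, which is not shown here; it is a minimal Python 3 sketch using only the standard library. The names `LinkCollector`, `list_files` and the `SAMPLE` page are hypothetical, and the filtering rules (skip sort links, parent links and subdirectories) are an assumption about what an autoindex page typically contains.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_files(index_html, base_url):
    """Return absolute URLs of the files linked from an autoindex page.

    Assumption: sort links start with '?', parent links start with '/'
    or '..', and subdirectories end with '/'. All of those are skipped.
    """
    parser = LinkCollector()
    parser.feed(index_html)
    files = []
    for href in parser.hrefs:
        if href.startswith(("?", "/", "..")) or href.endswith("/"):
            continue
        files.append(urljoin(base_url, href))
    return files


# Hypothetical autoindex output, trimmed to the relevant links.
SAMPLE = """
<a href="?C=N;O=D">Name</a>
<a href="/">Parent Directory</a>
<a href="subdir/">subdir/</a>
<a href="track01.mp3">track01.mp3</a>
<a href="cover.jpg">cover.jpg</a>
"""

for url in list_files(SAMPLE, "http://server.com/adresarsautoindexem/"):
    print(url)
```

The wget one-liner does the same job with less effort, which is the point of the post; the sketch only shows what the detour amounted to.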

Random status message plugin for Pidgin

May 5th, 2008

Filed under: linux, programming — starenka @ 16:55

There was one thing I missed most while using Pidgin: a simple plugin which pops up a random status message once Pidgin is running, at a given time interval. I loved this plugin, which I first saw in Miranda, so much that I decided to stray into terra incognita: plugin writing and Perl. It took me a couple of hours to figure out how it works and to learn the basic Perl syntax; nevertheless, I'm happy to announce that “randomator” is ready for download now…

INSTALLATION

Simply unpack the archive to your ~/.purple/plugins directory and paste your own messages into a file named randomator.txt. Each line of this file is considered to be a message. After starting Pidgin, go to the plugins menu (ctrl+u) and enable the plugin. You will also need to create a “new status” with some content in order to let the plugin change the messages.

USAGE

Well, the plugin looks after itself, so you don't need to do anything else. I guess I should make a preferences dialog for setting your own path to the message file, but I was too lazy for that.

SETTINGS

You can specify the interval between message changes on line 47 of the randomator.pl file: Purple::timeout_add($plugin, 3600, \&set_message); The preset interval is 3600 seconds, i.e. one hour.
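The plugin itself is written in Perl and its source is not reproduced here. Purely to illustrate the message-file format described above (one message per line, one of them picked at random), here is a small Python 3 sketch. The function names and the sample content are hypothetical, and skipping blank lines is an assumption, not something the plugin is known to do.

```python
import random


def load_messages(text):
    """Split the file content into messages, one per non-empty line.

    Assumption: blank lines are ignored.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def pick_message(messages, rng=random):
    """Return one message chosen uniformly at random."""
    return rng.choice(messages)


# Hypothetical content of randomator.txt.
SAMPLE = "away from keyboard\n\ncoding, do not disturb\nout for lunch\n"

messages = load_messages(SAMPLE)
print(pick_message(messages))
```

In the real plugin the equivalent of `pick_message` would be called again each time the timeout registered with Purple::timeout_add fires.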


And where is the calcium here?

March 29th, 2008

Filed under: links to remember, programming — starenka @ 19:26

For many people, regular expressions are a cancerous polyp on the nose of a programming language. On the other hand, there are plenty of people who, understandably, won't hear a word against them. For those who don't want to know too much about how it works, there is this page. Clickers into every monitor.

Measuring time needed to receive content of given URI(s) with Python

December 26th, 2007

Filed under: other, programming, python, web — starenka @ 22:55

My hosting was damn slow today, so I decided to yell at some poor guys out there to get some speed. It's always better to have some arguments on hand, so I thought that showing some numbers would be fine. To demonstrate how slow my domain is, I decided to compare the time needed to obtain my page with the time needed to get a page from Seznam.cz. You are probably starting to laugh at me just now. Why would I do this? It's like comparing elephant's poo with safe water. No match? Correct :> I will share the code nevertheless, just for fun. So let's get started, shall we?

We need a simple function which handles the HTTP request:

import sys
import urllib2

def fetch_page(url):
    try:
        f = urllib2.urlopen(url)
        f.read()  # read the body too, so the content transfer is timed
    except urllib2.HTTPError, e:
        print e.code
        sys.exit()
    except urllib2.URLError, e:
        print e.reason
        sys.exit()

And now we just loop through the URIs in argv and run the Timer.repeat() method on each one. Note that we have to import the fetch_page function in the setup argument in order to use it in stmt. The repeat method calls the timeit method with the number of iterations given as its second argument, and repeats this as many times as its first argument says. With repeat(3,3), each measured value is therefore the total time of three fetches. The minimum of all measured values is printed out.

if __name__ == '__main__':
    import sys
    # one URI is enough; the usage message marks the others as optional
    if len(sys.argv) > 1:
        from timeit import Timer
        for a in range(1, len(sys.argv)):
            print sys.argv[a]
            print '- min: ' + str(min(Timer(setup="from __main__ import fetch_page",
                                            stmt="fetch_page(%s)" % ('"' + sys.argv[a] + '"')).repeat(3, 3))) + 's\n'
    else:
        print '\nERR:lack of parameters\nusage:\n' + sys.argv[0] + \
            ' http://www.first.uri [http://www.second.one] .... [www.n-th.uri]'
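To see what the two arguments of Timer.repeat() do without hitting the network, here is a minimal Python 3 sketch (the post's own code is Python 2). The function `busy_work` is a hypothetical stand-in for fetch_page, and passing the callable through the `globals` argument is just one way to make it visible to stmt; the post uses the setup import instead.

```python
from timeit import Timer


def busy_work():
    """Stand-in for fetch_page(): sums a range instead of fetching a URL."""
    return sum(range(1000))


# repeat=3 separate runs, each run timing number=3 consecutive calls.
timer = Timer(stmt="busy_work()", globals={"busy_work": busy_work})
results = timer.repeat(repeat=3, number=3)

best_total = min(results)       # fastest run, covering 3 calls
best_per_call = best_total / 3  # divide by number to get a per-call figure

print("runs measured:", len(results))
print("best total: %.6fs, per call: %.6fs" % (best_total, best_per_call))
```

Taking the minimum rather than the average is the usual choice with timeit, since the higher values mostly reflect interference from other processes. For network fetches that interference is part of what is being measured, so the minimum should be read as a best case.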

Fetching release information from Discogs

December 22nd, 2007

Filed under: programming, python — starenka @ 22:06

If you have ever come across Discogs, you probably realized how useful this site is when it comes to mining information about audio releases. This very lame tutorial will demonstrate using the Discogs API & Python to get release information, parse it and serve the release bbcode via Apache.

Parsing slightly messy HTML is about as useful as a chocolate teapot, so we had better use the API, which implements a RESTful interface and therefore provides better output: XML. Before we start, you will need to obtain an API key.

Well… we need to import some stuff in the beginning ;)

# -*- coding: utf-8 -*-
from mod_python import apache
import urllib2
import gzip
import cStringIO
import simplexml
import sys
import datetime,time

error = ''

For starters, we can write a simple function to fetch the desired XML:

def get_release(release_id):
    global error
    # YOUR_API_KEY is a placeholder; define it with your own key
    REQUIRED_PARAMS = 'f=xml&api_key='+YOUR_API_KEY
    request = urllib2.Request('http://www.discogs.com/release/'+str(release_id)+'?'+REQUIRED_PARAMS)
    request.add_header('Accept-Encoding', 'gzip')
    try:
        response = urllib2.urlopen(request)
        data = response.read()
        return gzip.GzipFile(fileobj = cStringIO.StringIO(data)).read()
    except urllib2.HTTPError, e:
        # request.get_full_url() replaces the undefined name query_string
        error = 'Ooops. An error occurred :( Are you sure that "'+str(release_id)+\
        '" is an existing release id?\n\nHTTP '+str(e.code)+'\nRequested URI: '+request.get_full_url()
    return False

As you see, the function builds a request string out of the release id and your API key and requests the page from Discogs. The response is gzip-compressed XML with our desired data. On any HTTP error it saves the error string into the global variable error and returns False.
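The in-memory decompression step used by get_release can be tried on its own. The following is a Python 3 sketch (io.BytesIO takes the place of Python 2's cStringIO); the payload is a made-up stand-in for the Discogs response, and the check for the gzip magic bytes is an addition of this sketch, not something the original function does.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decompress_body(data):
    """Return the body as bytes, gunzipping it only if it is gzip data."""
    if data[:2] != GZIP_MAGIC:
        return data  # server ignored Accept-Encoding and sent plain bytes
    return gzip.GzipFile(fileobj=io.BytesIO(data)).read()


# Hypothetical payload standing in for the API response.
xml = b"<resp><release id='1'><title>Example</title></release></resp>"
compressed = gzip.compress(xml)

print(decompress_body(compressed).decode("utf-8"))
print(decompress_body(xml).decode("utf-8"))
```

Wrapping the bytes in a file-like object is needed because GzipFile reads from a file object rather than from a byte string.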

And now comes the parsing. I'm used to the SimpleXML extension from PHP, so I used the SimpleXML module (and command-line tool) by Ales Smrcka.
A simple function for parsing the XML:

def parse_xml(dxml,release_id):
    xmldoc = simplexml.xmldoc()
    xmldoc.load_string(dxml)
    release = {'artists': [],
               'extraartists': [],
               'title': '',
               'labels': [],
               'genres': [],
               'styles': [],
               'formats': [],
               'country': '',
               'released': '',
               'tracks': []}

We send our XML to SimpleXML and create an empty dictionary for our release data. Now we need to parse the XML with SimpleXML:

    for i in range(len(xmldoc.elements('/resp/release/artists/artist'))):
        for name in xmldoc.elements('/resp/release/artists/artist['+str(i)+']/name'):
            release['artists'].append(name.data)
    for i in range(len(xmldoc.elements('/resp/release/labels/label'))):
        for label in xmldoc.elements('/resp/release/labels/label['+str(i)+']'):
            release['labels'].append(label.attr('name')+' ('+label.attr('catno')+')')
    for i in range(len(xmldoc.elements('/resp/release/extraartists/artist'))):
        for name in xmldoc.elements('/resp/release/extraartists/artist['+str(i)+']/name'):
            out = name.data
        for role in xmldoc.elements('/resp/release/extraartists/artist['+str(i)+']/role'):
            release['extraartists'].append(out+' ('+role.data+')')
    for i in range(len(xmldoc.elements('/resp/release/genres/genre'))):
        for genre in xmldoc.elements('/resp/release/genres/genre['+str(i)+']'):
            release['genres'].append(genre.data)
    for i in range(len(xmldoc.elements('/resp/release/styles/style'))):
        for style in xmldoc.elements('/resp/release/styles/style['+str(i)+']'):
            release['styles'].append(style.data)
    # one line of text per track: position, artists, title, duration, credits
    for i in range(len(xmldoc.elements('/resp/release/tracklist/track'))):
        for position in xmldoc.elements('/resp/release/tracklist/track['+str(i)+']/position'): t = position.data+'\t'
        artists = []
        for j in range(len(xmldoc.elements('/resp/release/tracklist/track['+str(i)+']/artists/artist'))):
            for name in xmldoc.elements('/resp/release/tracklist/track['+str(i)+']/artists/artist['+str(j)+']/name'):
                artists.append(name.data)
        if len(artists)>0: t+= ", ".join(artists)+' - '
        for title in xmldoc.elements('/resp/release/tracklist/track['+str(i)+']/title'): t+= title.data
        for duration in xmldoc.elements('/resp/release/tracklist/track['+str(i)+']/duration'):
            if(duration.data != ''): t+= ' ('+duration.data+')'
        xtraartists = []
        for k in range(len(xmldoc.elements('/resp/release/tracklist/track['+str(i)+']/extraartists/artist'))):
            for name in xmldoc.elements('/resp/release/tracklist/track['+str(i)+']/extraartists/artist['+str(k)+']/name'):
                out = name.data
            for role in xmldoc.elements('/resp/release/tracklist/track['+str(i)+']/extraartists/artist['+str(k)+']/role'):
                out+=' - '+role.data
            xtraartists.append(out)
        if len(xtraartists) >0:
            t+= '\n\t\t[size=7pt]('+', '.join(xtraartists)+')[/size]'
        release['tracks'].append(t)
    for i in range(len(xmldoc.elements('/resp/release/formats/format'))):
        for format in xmldoc.elements('/resp/release/formats/format['+str(i)+']'):
            f = format.attr('qty')+'x'+format.attr('name')
            for j in range(len(xmldoc.elements('/resp/release/formats/format['+str(i)+']/descriptions/description'))):
                for desc in xmldoc.elements('/resp/release/formats/format['+str(i)+']/descriptions/description['+str(j)+']'):
                    release['formats'].append(f+' '+desc.data)
    for title in xmldoc.elements('/resp/release/title'): release['title'] = title.data
    for rdate in xmldoc.elements('/resp/release/released'): release['released'] = rdate.data
    for country in xmldoc.elements('/resp/release/country'): release['country'] = country.data
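As an aside for readers who don't have the simplexml module at hand: the same kind of walk over artists, labels and tracks can be sketched with xml.etree.ElementTree from the Python 3 standard library. This swaps in a different parser and is not how the script above works. The sample XML is made up and heavily trimmed; only its element and attribute names follow the paths used above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed response; names follow the paths used above.
SAMPLE = """<resp>
  <release id="1">
    <artists><artist><name>Some Artist</name></artist></artists>
    <title>Some Title</title>
    <labels><label name="Some Label" catno="SL 001"/></labels>
    <tracklist>
      <track><position>A1</position><title>First</title><duration>3:45</duration></track>
      <track><position>A2</position><title>Second</title><duration></duration></track>
    </tracklist>
  </release>
</resp>"""

root = ET.fromstring(SAMPLE)  # root is the <resp> element
release = root.find("release")

artists = [name.text for name in release.findall("artists/artist/name")]
labels = ["%s (%s)" % (label.get("name"), label.get("catno"))
          for label in release.findall("labels/label")]

tracks = []
for track in release.findall("tracklist/track"):
    line = track.findtext("position", "") + "\t" + track.findtext("title", "")
    duration = track.findtext("duration", "")
    if duration:  # empty durations are left out, as in the loop above
        line += " (" + duration + ")"
    tracks.append(line)

print(artists)
print(labels)
print(tracks)
```

Here findall() returns the matching elements directly, so no index-based range() loop is needed.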

As you see, you can get any item by using the elements() method. All you have to do is walk through all the elements using a for loop with range() and store the data somehow… The last portion of the function builds the final bbcode:

    released = release['released'].split('-')
    out = '[size=14pt][b]'+release['artists'][0]+' - '+release['title']+' ['+released[0]+'][/b][/size]\n\n'
    out+= '[img]release cover goes here[/img]\n\n'
    out+= '[url=http://www.discogs.com/release/'+str(release_id)+']Discogs[/url]\n'
    if(len(release['labels'])>0): out+= 'Label:\t[i]'+', '.join(release['labels'])+'[/i]\n'
    if(len(release['formats'])>0): out+= 'Format:\t[i]'+', '.join(release['formats'])+'[/i]\n'
    out+= 'Country:\t[i]'+release['country']+'[/i]\n'
    if(len(released)==3): out+= 'Released:\t[i]'+released[2]+'/'+released[1]+'/'+released[0]+'[/i]\n'
    else: out+= 'Released:\t[i]'+'.'.join(released)+'[/i]\n'
    if(len(release['genres'])>0): out+= 'Genre:\t[i][b]'+', '.join(release['genres'])+'[/b][/i]\n'
    if(len(release['styles'])>0): out+= 'Style:\t[i][b]'+', '.join(release['styles'])+'[/b][/i]\n'
    if(len(release['extraartists'])>0):
        out+= 'Extra nfo:\t[i]'+', '.join(release['extraartists'])+'[/i]\n'
    out+= '\nBitrate:\t[i][b]xxx kbps[/b][/i]\n'
    out+= 'Size:\t[i][b]xxx MB[/b][/i]\n'
    out+= '\n[b]Tracklisting[/b]:\n'
    out+= '\n'.join(release['tracks'])
    return out
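The date handling in the bbcode portion is the least obvious part: a full YYYY-MM-DD date is reordered to DD/MM/YYYY, while a partial date is joined with dots. Pulled out into a standalone Python 3 sketch it looks like this. The function names `format_released` and `header` and the sample values are hypothetical and do not appear in the original script.

```python
def format_released(released):
    """Turn 'YYYY-MM-DD' into 'DD/MM/YYYY'; join partial dates with dots."""
    parts = released.split("-")
    if len(parts) == 3:
        year, month, day = parts
        return "/".join((day, month, year))
    return ".".join(parts)


def header(artist, title, released):
    """Build the first bbcode line: artist, title and release year."""
    year = released.split("-")[0]
    return "[size=14pt][b]%s - %s [%s][/b][/size]" % (artist, title, year)


print(format_released("2007-12-22"))
print(format_released("2007-12"))
print(format_released("2007"))
print(header("Some Artist", "Some Title", "2007-12-22"))
```

Keeping these steps in small functions makes them easy to check without a running Apache or a live API response.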

The final part handles the release id parameter obtained via GET and flushes the bbcode back to the browser (or more likely to some AJAX widget):

def handler(req):
    req.content_type = 'text/plain'
    req.send_http_header()
    if(req.args != None):
        dxml = get_release(req.args)
        if(dxml != False):
            out = parse_xml(dxml,req.args)
            req.write(out.encode("utf-8"))
        else:
            req.write(error)
    else:
        req.write('Err: Please identify the release with a number')
    return apache.OK
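The handler above passes the raw query string straight to get_release, so anything the client sends ends up in the request URL. If you want to enforce the "identify the release with a number" rule before making the request, a check along these lines would do. This is a Python 3 sketch; `parse_release_id` is a hypothetical helper and not part of the original script.

```python
import re


def parse_release_id(args):
    """Return the release id as an int, or None if args is not a plain number."""
    if args is None:
        return None
    args = args.strip()
    # fullmatch with [0-9]+ accepts ASCII digits only and rejects empty input
    if not re.fullmatch(r"[0-9]+", args):
        return None
    return int(args)


for raw in ("12345", " 42 ", "abc", "12&x=1", "", None):
    print(repr(raw), "->", parse_release_id(raw))
```

A handler could call this first and write the error message whenever it returns None.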

You can get the whole script source @ http://hits.starenka.net/?item=http://source.starenka.net/getrelease.zip. Have fun!

© starenka 2oo7, cute alien monster by noizcut, original theme by Laurentiu Piron - customized by starenka | proudly powered by WordPress