at 9:21 PM

Get site ranking with Python and BeautifulSoup

To get information about a site we can use alexa.com, but sometimes we only need a few pieces of it, such as the site's global rank or its rank in a particular country.
That was all I needed in this case, but when I looked up the target site on Alexa I found far more information than that.
I have some knowledge of Python and the BeautifulSoup library, so I tried to extract just those values with them.

First I analyzed the structure of the alexa.com page to find the HTML element that contains the data.



Based on the screenshot, the data sits in div tags with the class data: the first holds the global rank, the second the local rank, followed by reputation and rating. So the script must find the div tags that match that specification and then loop over them.
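
The lookup described above can be tried out on a small hand-written HTML snippet before pointing it at the real page. This is only a sketch: the sample markup, the LABELS list, and the extract_cells helper are assumptions made for illustration, and only the data-row1 row and the div with class data are taken from the script in this post.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: a hand-written stand-in for the Alexa page, not a copy of it.
SAMPLE_HTML = """
<table>
  <tr class="data-row1">
    <td><div class="data">  1,234  </div></td>
    <td><div class="data">
        56
    </div></td>
  </tr>
</table>
"""

# Hypothetical labels, one per cell, in page order.
LABELS = ['Global ranks', 'Local ranks']


def extract_cells(html):
    """Return the cleaned text of every div.data cell in the first data row."""
    soup = BeautifulSoup(html, 'html.parser')
    row = soup.find('tr', {'class': 'data-row1'})
    if row is None:
        # The expected row is missing, so there is nothing to extract.
        return []
    values = []
    for td in row.find_all('td'):
        div = td.find('div', {'class': 'data'})
        if div is not None:
            # Collapse runs of whitespace left over from the page layout.
            values.append(re.sub(r'\s+', ' ', div.get_text().strip()))
    return values


cells = extract_cells(SAMPLE_HTML)
for label, value in zip(LABELS, cells):
    print('%s => %s' % (label, value))
```

Returning an empty list when the row is missing keeps the sketch from crashing if the page layout changes; whether that is the right behavior depends on how the result is used.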

The script needs a few Python libraries: sys to read the command-line arguments, urllib2 to open the URL, BeautifulSoup to parse the HTML, and re to remove the extra whitespace characters in BeautifulSoup's output.
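
urllib2 only exists on Python 2; on Python 3 the same classes live in urllib.request. The sketch below shows one way the request for a site's info page could be assembled so that it works on either version. The build_request helper and the example.com argument are hypothetical names chosen for illustration, and nothing is sent over the network.

```python
try:
    from urllib.request import Request  # Python 3
except ImportError:
    from urllib2 import Request  # Python 2, as used in this post

BASE_URL = 'http://www.alexa.com/siteinfo/'


def build_request(site):
    """Build (but do not send) the request for one site's info page."""
    # A browser-like User-Agent header, the same value the script sets.
    return Request(BASE_URL + site, headers={'User-Agent': 'Mozilla/5.0'})


req = build_request('example.com')
print(req.get_full_url())
print(req.get_header('User-agent'))
```

Passing the request to urlopen from the same module and calling read() on the result would then fetch the page, much as opener.open(_url).read() does in the script.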

OK, this is the source that gets the information.
#!/usr/bin/env python
# -*- coding: utf8 -*-

# alexagrep.py
# Ardi nooneDOTnu1ATgmailDOTcom
# GNU GPL

import sys
import urllib2
from bs4 import BeautifulSoup
from re import sub

ATTR = [
    'Global ranks ',
    'Local ranks  ',
    'Reputation   ',
    'Rating       ',
]

def openLink(url):
    _url = 'http://www.alexa.com/siteinfo/'+ url
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    
    return opener.open(_url).read()

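if __name__ == '__main__':
    # The site to look up is the first command-line argument.
    try:
        url = sys.argv[1]
    except IndexError:
        print 'Error url'
        sys.exit(1)

    html = openLink(url)
    soup = BeautifulSoup(html, 'html.parser')
    # The first data row holds one cell per metric, in the same order as ATTR.
    tds = soup.find('tr', {'class':'data-row1'}).findAll('td')

    for label, td in zip(ATTR, tds):
        # Collapse the whitespace runs left in the text returned by BeautifulSoup.
        print label +' => '+ sub(r'\s+', ' ', td.find('div', {'class':'data'}).text.strip())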
    




To run the script, just type $ python alexagrep.py url, or copy it into /usr/local/bin and add execute permission to it (chmod +x).



