June 07, 2012

Web scraping using Python and BeautifulSoup

BeautifulSoup is a nicely written Python utility for parsing web pages.

It supports CSS selector style queries, so a developer who is used to jQuery selectors will find it very easy to parse HTML/XML tags.

Here is a sample that gets all the links.

Python:
 
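from bs4 import BeautifulSoup

# html_doc is a string holding the HTML of the page to parse
soup = BeautifulSoup(html_doc, 'html.parser')

# returns all <a> tags (links) as a list of Tag objects
soup.find_all('a')

# returns the first node whose id is "link3", or None if there is none
soup.find(id="link3")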
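
For those coming from jQuery, here is a rough sketch of the same lookups done with CSS selectors through select() and select_one(). The html_doc markup, the example.com URLs and the class names are made up for illustration, and the built-in 'html.parser' is just one possible parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration
html_doc = """
<html><body>
<p class="story">
<a href="http://example.com/one" class="sister" id="link1">One</a>
<a href="http://example.com/two" class="sister" id="link2">Two</a>
<a href="http://example.com/three" class="sister" id="link3">Three</a>
</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# select() takes a CSS selector and returns a list of matching tags
links = soup.select("p.story a.sister")

# a tag's attributes are read like dictionary keys
hrefs = [a["href"] for a in links]

# select_one() returns the first match, or None if nothing matches
third = soup.select_one("#link3")
third_text = third.get_text()

print(hrefs)
print(third_text)
```

The selector strings work much like the ones you would pass to $() in jQuery, so "#link3" here finds the same node as soup.find(id="link3").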

Very awesome.

Cheers!