Parsing Tables With Img Tags In Python With BeautifulSoup
I am using BeautifulSoup to parse an HTML page. I need to work on the first table in the page. That table contains a few rows. Each row contains some 'td' tags and one of the
Solution 1:
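You have a nested table, so you need to check where each table sits in the tree before parsing its tr/td/img tags. Calling find_parent("table") on a table returns None for a top-level table and the enclosing table for a nested one.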
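from bs4 import BeautifulSoup

# Read the page; the with-block closes the file automatically.
with open('test.html', 'rb') as f:
    html = f.read()

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, 'html.parser')

for table in soup.find_all('table'):
    # Only process tables that sit inside another table.
    if table.find_parent('table') is not None:
        for tr in table.find_all('tr'):
            # Search the current row rather than the whole table,
            # otherwise every image is printed once per row.
            for td in tr.find_all('td'):
                for img in td.find_all('img'):
                    # img['...'] raises KeyError if the attribute is missing.
                    print(img['id'])
                    print(img['src'])
                    print(img['title'])
                    print(img['alt'])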
It prints the following for your example:
img_id
img_src
img_title
img_alt
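If some images lack one of these attributes, a variation along the following lines may be more forgiving. This is only a sketch: the SAMPLE_HTML markup and the nested_table_images helper are made up for illustration (the original page is not shown), and it assumes tables are nested only one level deep. It uses img.get(), which returns None for a missing attribute instead of raising KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an outer table whose only cell wraps an inner table.
SAMPLE_HTML = """
<table>
  <tr><td>
    <table>
      <tr>
        <td>text</td>
        <td><img id="img_id" src="img_src" title="img_title" alt="img_alt"></td>
      </tr>
      <tr>
        <td><img id="second_id" src="second_src"></td>
      </tr>
    </table>
  </td></tr>
</table>
"""

ATTRIBUTES = ("id", "src", "title", "alt")


def nested_table_images(html):
    """Return one dict of attributes per <img> found inside a nested table."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for table in soup.find_all("table"):
        if table.find_parent("table") is None:
            continue  # top-level table: skip it, its nested tables are visited separately
        for img in table.find_all("img"):
            # .get() yields None when the attribute is absent.
            results.append({name: img.get(name) for name in ATTRIBUTES})
    return results


images = nested_table_images(SAMPLE_HTML)
for attrs in images:
    print(attrs)
```

Collecting the attributes into dicts rather than printing them directly makes it easier to filter or reuse the results later.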