Extract Data From Html Table With Mechanize

May 30, 2024

First of all, here is the sample HTML table: Kangchenjunga 8,586m

Solution 1:

A more succinct version that relies more on the black magic of XPath :)

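require 'nokogiri'
require 'open-uri'

# URI.open is needed on Ruby 3.0+, where a bare open(url) no longer fetches URLs
doc = Nokogiri::HTML(URI.open('http://www.alpineascents.com/8000m-peaks.asp'))
# Find the row whose <td><strong> text equals the argument, then take its fifth cell
last_td = doc.xpath("//tr[td[strong[text()='#{ARGV[0]}']]]/td[5]")

puts last_td.text.gsub(/.*?;/, '').strip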

Solution 2:

I believe this is what you want (you will need to run gem install nokogiri first):

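require 'nokogiri'
require 'open-uri'

# URI.open is needed on Ruby 3.0+, where a bare open(url) no longer fetches URLs
doc = Nokogiri::HTML(URI.open('http://www.alpineascents.com/8000m-peaks.asp'))
# Take the seventh table on the page (index 6) and collect its rows
rows = doc.search('//table')[6].search('tr')
# Drop the first two rows
rows.shift
rows.shift

rows.each do |row|
  if row.text.include? ARGV[0]
    puts row.search('td')[4].text.gsub(/.*?;/, '').strip
  end
end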
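Both solutions finish with the same `gsub(/.*?;/, '').strip` call, so a short sketch of what it does to a cell may help. The first sample string is the Everest text quoted in this thread; the other inputs and the helper name `strip_year` are made up for illustration.

```ruby
# Hypothetical helper wrapping the cleanup step used in the solutions above.
def strip_year(cell_text)
  # .*?; lazily matches up to and including a semicolon. gsub repeats the
  # match, so everything through the last semicolon is removed; strip then
  # trims the leftover whitespace.
  cell_text.gsub(/.*?;/, '').strip
end

puts strip_year('1953; Sir E. Hillary, T. Norgay')
puts strip_year('a; b; c')        # several semicolons: only the tail remains
puts strip_year('no semicolon')   # nothing to remove
```

If a cell ever held more than one semicolon-separated field worth keeping, `sub` (first match only) would probably be the safer choice.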

Solution 3:

The first mistake that I see is that you are calling the following:

p=Mechanize.new.get('www.alpineascents.com/8000m-peaks.asp').body

Unfortunately, calling .body on the page that Mechanize returns just gives you the raw HTML of the response as one big string.

That text is quite annoying to parse, so I would recommend the following instead:

p = Mechanize.new.get('http://www.alpineascents.com/8000m-peaks.asp')

This will return a Mechanize::Page object which you can play with (http://mechanize.rubyforge.org/Mechanize/Page.html).

With that object we can simply perform a search, which is Nokogiri's search, by doing the following:

elems = p.search('tr')

This will return all the tr elements as a Nokogiri::XML::NodeSet of Nokogiri::XML::Element objects, which we can use pretty cleanly to get the information that we want. Note that you may want to play around with all of this in IRB to figure out exactly what you need, but the idea should be clear from the following:

elems.first.search('td').last.text

This will return the text of the final td element in the first tr element we searched for before.

If you have any questions / want me to clarify feel free to ask away.

I have been hacking on things with mechanize for a long while now.

EDIT:

If you want to be able to look up the values using some argument, this is how I imagine you would solve the problem:

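values = {}
elems.each do |e|
  td = e.search('td')
  next if td.empty? # skip rows without td cells, where td.first would be nil
  # Map the first cell's text (the peak name) to the last cell's text
  values[td.first.text] = td.last.text
end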

When you have the values hash filled you can do the following:

if ARGV[0] == "Everest"

then

> values["Everest"] => "1953; Sir E. Hillary, T. Norgay"
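The lookup can be sketched as a tiny script. The hash below is filled by hand with the one entry quoted above rather than scraped, and the helper name `lookup`, the fallback message, and the 'Everest' default are all assumptions for the demo.

```ruby
# Stand-in for the hash built from the table rows; only this entry is
# taken from the thread.
values = { 'Everest' => '1953; Sir E. Hillary, T. Norgay' }

# Hypothetical helper: return the stored text, or a readable message when
# the name is not a key in the hash.
def lookup(values, name)
  values.fetch(name) { "No row found for #{name.inspect}" }
end

# Fall back to 'Everest' when no argument is given (demo assumption).
name = ARGV[0] || 'Everest'
puts lookup(values, name)
```

Using `fetch` with a block avoids silently printing an empty line when the name is misspelled, which is what `values[name]` returning nil would do.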
