
How To Make Nokogiri Transparently Return Un/encoded HTML Entities Untouched?

How can I use Nokogiri while leaving HTML entities (like German umlauts) untouched? I.e.:

# this is fine
node = Nokogiri::HTML.fragment('<p>ö</p>')
node.to_s # =

Solution 1:

OK, my question has been answered by Aaron via Twitter/gist:

require 'rubygems'
require 'nokogiri'

doc = Nokogiri::HTML::Document.new
doc.encoding = 'UTF-8'

# We added a contextual fragment method for the 1.4.2 release. This *might*
# work in 1.4.1. If you want to mess with 1.4.2, build from my github, or
# grab one of our nightly builds:
#
#   $ sudo gem install nokogiri -s http://tenderlovemaking.com/
#
# Also, libxml2 had a bug with encoding when handling UTF-8 fragments, so I
# suggest you also upgrade to libxml2 2.7.7.
#
# Hope that helps!
puts doc.fragment('<p>ö</p>')
