Extract Absolute Links From A Page Using HTMLParser
I'm using the following snippet to extract all the links on a page using HTMLParser. I get quite a few relative URLs. How can I convert these to absolute URLs for a domain e.g. www
Solution 1:
You want
urlparse.urljoin(base, url[, allow_fragments])
http://docs.python.org/library/urlparse.html#urlparse.urljoin
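This takes a base URL (typically the URL of the page you fetched) and resolves a relative URL against it. Paths that start with a slash are resolved from the site root, and other relative paths are resolved from the directory of the base URL. If the second argument is already an absolute URL, it is used in place of the base, so you can pass every href through it without checking first.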
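Since the original snippet is not shown, here is a minimal sketch of how the two pieces might fit together. It assumes Python 3, where urljoin lives in urllib.parse and HTMLParser in html.parser (on Python 2 the imports would be urlparse and HTMLParser instead). The names LinkExtractor and extract_links, and the example.com URLs, are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Hypothetical parser that collects <a href> values as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL of the page being parsed
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors with a missing or empty href.
            if name == "href" and value:
                # Resolve the href against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all links found in html, resolved against base_url."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = (
        '<a href="/about">About</a>'
        '<a href="img/logo.png">Logo</a>'
        '<a href="http://other.example.org/page">Elsewhere</a>'
    )
    for link in extract_links(page, "http://www.example.com/docs/index.html"):
        print(link)
```

Pass the full URL of the page as the base rather than just the domain, so that paths like img/logo.png are resolved relative to the page's own directory.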