How To Extract Content From HTML
I have HTML as a string and I want to extract just the 'post_title' values from it. This is the HTML string:
Single pa
Solution 1:
The wp-admin page is only served to a logged-in user, so you must log in first and send the resulting cookies with your request. Check this Java code:
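import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Place the block below inside a method.
try {
    // Step 1: submit the login form so WordPress issues session cookies
    String url = "https://ssblecturate.wordpress.com/wp-login.php";
    Connection.Response response = Jsoup.connect(url)
            .data("log", "your_login_here") // your wordpress login
            .data("pwd", "your_password_here") // your wordpress password
            .data("rememberme", "forever")
            .data("wp-submit", "Log In")
            .method(Connection.Method.POST)
            .followRedirects(true)
            .execute();
    // Step 2: request the protected page, sending the login cookies along
    Document document = Jsoup.connect("https://ssblecturate.wordpress.com/wp-admin/edit.php")
            .cookies(response.cookies())
            .get();
    // first() returns null when nothing matches, so check before using it
    Element titleElement = document.select("div[class=post_title]").first();
    if (titleElement != null) {
        System.out.println(titleElement.text());
    }
} catch (IOException e) {
    e.printStackTrace();
}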
Solution 2:
Try this, but make sure the HTML text in your String is well formed:
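// Needs: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element, org.jsoup.select.Elements
String html = "<div class=\"hidden\" id=\"inline_49\">"
        + "<div class=\"post_title\">Single parenting</div>"
        + "<div class=\"post_name\">single-parenting</div>"
        + "<div class=\"post_author\">90307285</div>"
        + "</div>"; // close the outer div so the fragment is well formed
Document document = Jsoup.parse(html);
Elements divElements = document.select("div");
for (Element div : divElements) {
    // Print only the div whose class attribute is exactly "post_title"
    if (div.attr("class").equals("post_title")) {
        System.out.println(div.ownText());
    }
}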
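The "well formed" caveat can be made concrete with a different tool. As a rough sketch only (this swaps Jsoup for the JDK's built-in XML parser, which is not what the answer above uses), the same loop can be written without any external library, but the parser then rejects any fragment that is not strictly well-formed XML. The class name PostTitleDom and the method extractPostTitles are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PostTitleDom {

    // Hypothetical helper: collects the text of every <div class="post_title">.
    // Assumes the input is well-formed XML with a single root element.
    static List<String> extractPostTitles(String wellFormedHtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(wellFormedHtml)));
            NodeList divs = document.getElementsByTagName("div");
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < divs.getLength(); i++) {
                Element div = (Element) divs.item(i);
                if ("post_title".equals(div.getAttribute("class"))) {
                    titles.add(div.getTextContent());
                }
            }
            return titles;
        } catch (Exception e) {
            // Unlike Jsoup, the XML parser refuses unclosed or badly nested tags
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String html = "<div class=\"hidden\" id=\"inline_49\">"
                + "<div class=\"post_title\">Single parenting</div>"
                + "<div class=\"post_name\">single-parenting</div>"
                + "</div>";
        System.out.println(extractPostTitles(html));
    }
}
```

Jsoup is usually the more forgiving choice for real-world pages, since it repairs unclosed tags instead of failing on them.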
Solution 3:
Updated! Hope it works for you:
// Needs: java.io.File, org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element
Document doc;
try {
    File input = new File("D:\\JAVA\\J2EE\\Bin\\Bin\\Project\\xml\\src\\demo\\index.html");
    doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
    // Get the first div tag whose class name is 'post_title'
    Element element = doc.select("div.post_title").first();
    // first() returns null when no element matches
    if (element != null) {
        System.out.println(element.html());
    }
} catch (Exception e) {
    e.printStackTrace();
}
Solution 4:
If you have it in a String, you can try a regular expression.
This regex means "everything between the <div> tags with class post_title" (not exactly, but it works for your sample).
String exp = "<div class=\"post_title\">([^<]*)</div>";
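You should be able to get the content with the following (find() must be called first, otherwise group(1) throws an IllegalStateException):
Matcher matcher = Pattern.compile(exp).matcher(yourString);
String post_title = matcher.find() ? matcher.group(1) : null;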
NOTE: I am assuming your post_title does not contain "<". If it did, the markup itself would already be malformed.
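If the string holds more than one post, the matcher can be advanced with find() in a loop to collect every title. A minimal sketch using only java.util.regex; the class name PostTitleRegex and the method extractPostTitles are made up for illustration, and the pattern assumes the attribute is written exactly as class="post_title" and that titles contain no nested tags:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PostTitleRegex {

    // Captures the text between <div class="post_title"> and the next </div>.
    // [^<]* stops at the first '<', so titles with nested tags are not matched.
    private static final Pattern POST_TITLE =
            Pattern.compile("<div\\s+class=\"post_title\"\\s*>([^<]*)</div>");

    // Hypothetical helper: returns every captured title, in document order.
    static List<String> extractPostTitles(String html) {
        List<String> titles = new ArrayList<>();
        Matcher matcher = POST_TITLE.matcher(html);
        while (matcher.find()) { // group(1) is only valid after a successful find()
            titles.add(matcher.group(1).trim());
        }
        return titles;
    }

    public static void main(String[] args) {
        String html = "<div class=\"hidden\" id=\"inline_49\">"
                + "<div class=\"post_title\">Single parenting</div>"
                + "<div class=\"post_name\">single-parenting</div>"
                + "</div>";
        System.out.println(extractPostTitles(html)); // prints [Single parenting]
    }
}
```

A regex like this is brittle against changes in attribute order or quoting, so an HTML parser is the safer option when the markup is not under your control.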