I plan, I design, I built

Learn it, Live it, Love it.


Get HTML Title Tag using Java HTMLDocument Library


The Java HTMLDocument model is designed to support both browsing and editing. If you need to parse an HTML document and retrieve the text inside its title tag, you can't use an iterator to search for the title tag and read its value. But there is an even easier way to do it: the title is available as a property of the document.

//Imports needed.
import java.io.*;
import java.net.*;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

//Sample code that gets the HTML document from the internet.
//A string could be used as the source instead.

URL url = new URL("http://yourwebsitehere.com");
URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream();
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);

HTMLEditorKit htmlKit = new HTMLEditorKit();

//Only starts from the body
HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
…..
….. //Some HTMLEditorKit code here (Not used for this tutorial)
…..
String title = (String) htmlDoc.getProperty(HTMLDocument.TitleProperty);
System.out.println("HTMLDocument Title: " + title);

This should work. Of course, this is only a code snippet, not a complete program.
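
For anyone who wants something that runs end to end, here is a minimal sketch. It assumes the omitted HTMLEditorKit step is a call to htmlKit.read(...), which may not be exactly what was left out above, and it parses from a string instead of a URL so it works offline. The class name TitleExtractor and the method extractTitle are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class; the name is not from any library.
class TitleExtractor {

    // Parses the given HTML and returns the text of its title tag.
    static String extractTitle(String html) {
        HTMLEditorKit htmlKit = new HTMLEditorKit();
        HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();

        // Without this, a <meta> charset declaration can abort parsing
        // when the kit reads from a Reader.
        htmlDoc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

        try (Reader reader = new StringReader(html)) {
            // Assumed stand-in for the elided step: load the HTML into the document.
            htmlKit.read(reader, htmlDoc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalArgumentException("Could not parse HTML", e);
        }

        // The parser stores the title text as a document property.
        return (String) htmlDoc.getProperty(HTMLDocument.TitleProperty);
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Example page</title></head>"
                + "<body><p>Hello</p></body></html>";
        System.out.println("HTMLDocument Title: " + extractTitle(html));
    }
}
```

To read from a website instead, the BufferedReader built from the URLConnection above could be passed to htmlKit.read(...) in place of the StringReader.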

Reference: Java HTMLDocument API

Written by Charles Ling

15 May, 2008 at 3:07 AM

Posted in Java
