Get HTML Title Tag using Java HTMLDocument Library

The Java HTMLDocument model is designed to support both browsing and editing. If you ever need to parse an HTML document and retrieve the text inside its title tag, you could walk the document with an iterator to find the title tag and read its value. But there is an even easier way to do that.

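//Imports needed.
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;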

//Sample code that fetches the HTML document from the internet.
//A String (wrapped in a StringReader) can be used instead.

URL url = new URL("http://yourwebsitehere.com");
URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream();
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);

HTMLEditorKit htmlKit = new HTMLEditorKit();

//Create an empty HTMLDocument for the kit to read the page into
HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
.....
..... //Some HTMLEditorKit code here (Not used for this tutorial)
.....
String title = (String) htmlDoc.getProperty(HTMLDocument.TitleProperty);
System.out.println("HTMLDocument Title: " + title);

//end java code here

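This should work, but keep in mind it is only a snippet: the title property is set only after the HTML has actually been read into the document, so getProperty returns null on a freshly created, empty HTMLDocument.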
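
For anyone who wants something they can compile and run, here is a minimal sketch of the same idea. It is not the code elided from the snippet above, only one possible way to fill it in. It parses the HTML from a String instead of a URL so that it runs without a network connection. The class name TitleExtractor and the helper extractTitle are made up for this example, and the "IgnoreCharsetDirective" property is set on the assumption that you do not want the parser to stop when a page declares its charset in a meta tag.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical class name, used only for this illustration.
class TitleExtractor {

    // Hypothetical helper: reads the HTML text into an HTMLDocument
    // and returns the title property, or null if no title was found.
    static String extractTitle(String html) throws IOException, BadLocationException {
        HTMLEditorKit htmlKit = new HTMLEditorKit();
        HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();

        // Assumption: charset declarations in <meta> tags should be ignored,
        // otherwise the parser may throw ChangedCharSetException.
        htmlDoc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

        // The title property is filled in while the kit reads the HTML.
        try (Reader reader = new StringReader(html)) {
            htmlKit.read(reader, htmlDoc, 0);
        }
        return (String) htmlDoc.getProperty(HTMLDocument.TitleProperty);
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><title>Hello Title</title></head>"
                + "<body><p>Body text</p></body></html>";
        System.out.println("HTMLDocument Title: " + extractTitle(html));
    }
}
```

To read from a website instead, pass the BufferedReader from the snippet above to htmlKit.read in place of the StringReader.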

Reference : Java HTMLDocument API
