Get HTML Title Tag using Java HTMLDocument Library

Java's HTMLDocument models an HTML document in a way that supports both browsing and editing. If you need to parse an HTML document and retrieve the text of its title tag, you could walk the document with an iterator to find the title tag and read its value. But there is an even easier way: once the document has been read, the title is available as a document property.

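//Imports needed.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

//Sample code that gets the HTML document from the internet.
//A StringReader over an HTML string works just as well.

URL url = new URL("http://yourwebsitehere.com");
URLConnection connection = url.openConnection();
BufferedReader br = new BufferedReader(new InputStreamReader(connection.getInputStream()));

HTMLEditorKit htmlKit = new HTMLEditorKit();
HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
//Otherwise a <meta> charset declaration makes read() throw ChangedCharSetException.
htmlDoc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
//Read the page into the document; the parser stores the <title> text as a document property.
htmlKit.read(br, htmlDoc, 0);
br.close();

String title = (String) htmlDoc.getProperty(HTMLDocument.TitleProperty);
System.out.println("HTMLDocument Title: " + title);

//end java code here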

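This should work. Of course, this is only a snippet. Put it inside a method that declares or catches IOException and BadLocationException, because HTMLEditorKit.read can throw both.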
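For a complete program you can compile and run as-is, here is a small sketch that reads the markup from a String instead of a URL, as the snippet's comment suggests. The class name TitleExtractor and the method titleOf are hypothetical names chosen for this example. Trimming the result and returning null when there is no title tag are my own choices, not something the HTMLDocument API requires.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, for illustration only.
class TitleExtractor {

    // Parses the given HTML markup and returns the text of its <title> element,
    // or null when the markup has no title.
    static String titleOf(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // The input is already a String, so ignore any <meta> charset declaration
        // instead of letting the parser throw ChangedCharSetException.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        // The title property stays null until the kit reads markup into the document.
        kit.read(new StringReader(html), doc, 0);
        Object title = doc.getProperty(Document.TitleProperty);
        return title == null ? null : title.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titleOf("<html><head><title>My Page</title></head><body>Hi</body></html>"));
        System.out.println(titleOf("<html><body>No title here</body></html>"));
    }
}
```

The key step is kit.read. Without it the document stays empty and getProperty returns null, which is exactly what the second reader comment below points out.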

Reference: Java HTMLDocument API


About Charles Ling
Web/Android/iPhone developer. Very interested in web architecture, web standards, and how to use the web to improve human social life and do cool stuff.

2 Responses to Get HTML Title Tag using Java HTMLDocument Library

  1. Anonymous says:

    nice code

  2. Anonymous says:

    You didn’t init the htmlDocument with the BufferedReader.
