Get HTML Title Tag using Java HTMLDocument Library

Java's HTMLDocument models an HTML document in a way that supports both browsing and editing. Suppose you need to parse an HTML document and retrieve the text inside its title tag. Rather than walking the document with an iterator to find the title tag, there is an even easier way: while the document is being parsed, HTMLDocument stores the title text as a document property, so you can simply ask for that property.

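// Classes needed
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Sample code that gets the HTML document from the Internet.
// A String could be used instead (wrap it in a java.io.StringReader).

URL url = new URL("http://yourwebsitehere.com");
URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream();
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);

HTMLEditorKit htmlKit = new HTMLEditorKit();

// Create an empty HTMLDocument for the kit to fill in
HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
.....
..... // Some HTMLEditorKit code here (not covered in this tutorial).
..... // The title property is only set once the page has been parsed into htmlDoc, e.g. htmlKit.read(br, htmlDoc, 0).

// HTMLDocument stores the <title> text as a document property
String title = (String) htmlDoc.getProperty(HTMLDocument.TitleProperty);
System.out.println("HTMLDocument Title: " + title);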

//end java code here

This should work, but of course this is only a snippet.
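If you want a complete program to try, here is a minimal sketch that parses an HTML string instead of a downloaded page and then reads the title property. It assumes Java 7 or later. The class name HtmlTitleExample and the helper extractTitle are hypothetical names used for illustration, not part of the HTMLDocument API. The important step is kit.read(...): the title property is only filled in while the document is being parsed, so reading it before that returns null.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class HtmlTitleExample {

    // Hypothetical helper: parses the given HTML and returns the text of its <title> tag.
    static String extractTitle(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Ignore <meta> charset declarations so the parser does not throw ChangedCharSetException
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            // read() runs the parser; the <title> text is stored as a property during parsing
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not parse HTML", e);
        }
        // Document.TitleProperty is the same constant as HTMLDocument.TitleProperty
        return (String) doc.getProperty(Document.TitleProperty);
    }

    public static void main(String[] args) {
        String html = "<html><head><title>My Sample Page</title></head>"
                + "<body><p>Hello</p></body></html>";
        System.out.println("HTMLDocument Title: " + extractTitle(html));
    }
}
```

To read a live page instead, pass the BufferedReader from the snippet to kit.read in place of the StringReader.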

Reference: Java HTMLDocument API

Bypass Server Proxy with Java Code

A few days ago, I was writing a Java web crawler (spider) that needed to get past the university proxy because of the firewall settings. Below is the basic snippet showing how to do that by telling Java to send its HTTP requests through the proxy.

Java libraries needed:
import java.net.*;
import java.io.*;
import java.util.Properties;

We set these Java system properties in code, rather than passing them on the command line when running the program, as below:
UNIX
java -Dhttp.proxyHost=proxyhost [-Dhttp.proxyPort=portNumber] URLReader

DOS shell (Windows 95/NT/XP)
java -Dhttp.proxyHost=proxyhost [-Dhttp.proxyPort=portNumber] URLReader
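Setting the properties from code has the same effect as the -D flags, as long as it runs before the connection is opened. One way to check the configuration without touching the network is to ask the JDK's default ProxySelector which proxy it would choose for a URL. This is a minimal sketch: the class name ProxyPropertiesCheck and its methods are hypothetical, and the host and port are simply the ones used in this post.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

public class ProxyPropertiesCheck {

    // Sets the same properties that -Dhttp.proxyHost / -Dhttp.proxyPort would set
    static void useHttpProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
    }

    // Returns the first proxy the default ProxySelector would use for this URL
    static Proxy proxyFor(String url) {
        List<Proxy> proxies = ProxySelector.getDefault().select(URI.create(url));
        return proxies.get(0);
    }

    public static void main(String[] args) {
        useHttpProxy("bluetongue.cs.rmit.edu.au", 8080);
        Proxy p = proxyFor("http://www.google.com/news");
        InetSocketAddress addr = (InetSocketAddress) p.address();
        System.out.println(p.type() + " proxy " + addr.getHostString() + ":" + addr.getPort());
    }
}
```

If the properties were not picked up, select() returns Proxy.NO_PROXY (type DIRECT) instead.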

Snippet code:
// Place this before you open the connection or any stream reader

Properties systemSettings = System.getProperties();
// "http.proxySet" is a legacy property that modern JDKs ignore; keeping it is harmless
systemSettings.setProperty("http.proxySet", "true");

// Your proxy host server
systemSettings.setProperty("http.proxyHost", "bluetongue.cs.rmit.edu.au");

// Your proxy port
systemSettings.setProperty("http.proxyPort", "8080");

URL url = new URL("http://www.google.com/news");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();

/* Avoid sun.misc.BASE64Encoder: Sun strongly advised not to use sun.misc.*
 * classes since they can change or go away in a future release, and
 * BASE64Encoder was indeed removed in Java 9. java.util.Base64 (Java 8+)
 * is the supported replacement.
 * The username and password below are only encoded, not encrypted, so this
 * is not secure to transmit over the network. But this is just a demo.
 */

String encodedUserPwd = java.util.Base64.getEncoder().encodeToString(
        "mydomain\\username:password".getBytes(java.nio.charset.StandardCharsets.UTF_8));
connection.setRequestProperty("Proxy-Authorization", "Basic " + encodedUserPwd);

That's all; you should now be able to compile and run it.
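For reference, the proxy snippet can also be wrapped into a small self-contained class. This is a sketch assuming Java 8 or later (for java.util.Base64); ProxyAuthExample, basicProxyAuth and openWithProxyAuth are hypothetical names, and the credentials are placeholders. It encodes the credentials as UTF-8, whereas a plain getBytes() call uses the platform default charset.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ProxyAuthExample {

    // Builds a Basic "Proxy-Authorization" header value.
    // Base64 is only an encoding: anyone who sees the request can decode it.
    static String basicProxyAuth(String user, String password) {
        String userPwd = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(userPwd.getBytes(StandardCharsets.UTF_8));
    }

    // Opens (but does not yet send) a connection carrying the proxy credentials
    static HttpURLConnection openWithProxyAuth(URL url, String user, String password)
            throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestProperty("Proxy-Authorization", basicProxyAuth(user, password));
        return connection;
    }

    public static void main(String[] args) throws Exception {
        System.setProperty("http.proxyHost", "bluetongue.cs.rmit.edu.au");
        System.setProperty("http.proxyPort", "8080");
        HttpURLConnection connection = openWithProxyAuth(
                URI.create("http://www.google.com/news").toURL(),
                "mydomain\\username", "password");
        // The request is actually sent here, when the response code is requested
        System.out.println("HTTP status: " + connection.getResponseCode());
    }
}
```

Running main needs network access through the proxy; basicProxyAuth on its own can be tried offline.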