The Hypertext Transfer Protocol (HTTP) allows information to be exchanged between a web server and a web browser. Java allows you to program HTTP directly, so you can create programs that access the web much like a human user would. These programs, called bots, can collect information or automate common web tasks. This book presents a collection of reusable recipes for Java bot programming.

The book covers many topics related to Java HTTP programming. Both secure and insecure HTTP communications are covered, as well as HTTP authentication. Learn to interact with HTML forms and to support both HTTP POST and HTTP GET requests. Collect data from a wide array of HTML constructs, such as tables and lists. Learn about advanced topics that complicate the life of a bot, such as AJAX and JavaScript. Also learn about the ethical use of bots, and when bots should not be used.

The book also introduces the Heaton Research Spider, an open source spider framework. Using the Heaton Research Spider, you can create spiders that crawl a web site, much like a real spider crawls its web. The Heaton Research Spider is available for both Java and Microsoft .NET.
Author(s): Jeff Heaton
Publisher: Heaton Research, Inc.
Year: 2007
Language: English
Pages: 682