|
The goal of this project is to build a scalable Web Crawling
System based on an open-source Web Crawler. This Web Crawler
searches the Internet for pages containing Eiffel or Java
source code using information retrieval methods, and it is
designed to be extendable to other document types. All fetched
pages are stored in a database, where they are available for
various analyses of the fetched code.
In the first part of the project, the Web Crawler and the
information retrieval methods are developed. The second part
consists of collecting data with the Web Crawler and carrying
out an illustrative sample evaluation of the collected data.
|