Type of project:
Semester project (Semesterarbeit), summer semester 2004

Andri Toggenburger

Code Crawler


The goal of this project is to build a scalable Web Crawling System based on an open-source Web Crawler. The Web Crawler uses information retrieval methods to search the Internet for pages containing Eiffel or Java source code, and it is designed to be extensible to other document types. All fetched pages are stored in a database, where they are available for further analyses of the collected code.
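As a rough illustration of how a page might be recognised as containing Java or Eiffel code, the following minimal sketch scores a page by keyword density. This is an assumption made for illustration only: the keyword sets, the function names keyword_density and classify, and the threshold value are hypothetical and are not taken from the project, whose actual information retrieval methods are described in the final report.

```python
import re

# Hypothetical keyword sets, chosen for illustration only.
JAVA_KEYWORDS = {"public", "private", "static", "void", "class",
                 "import", "extends", "implements", "return", "new"}
EIFFEL_KEYWORDS = {"feature", "do", "end", "require", "ensure",
                   "inherit", "create", "deferred", "invariant", "local"}

TOKEN = re.compile(r"[A-Za-z_]+")


def keyword_density(text, keywords):
    """Return the fraction of word tokens in text that are in keywords."""
    tokens = [t.lower() for t in TOKEN.findall(text)]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in keywords)
    return hits / len(tokens)


def classify(text, threshold=0.15):
    """Return "java", "eiffel", or None if neither density reaches the threshold.

    The threshold is an assumed value, not a tuned one.
    """
    scores = {
        "java": keyword_density(text, JAVA_KEYWORDS),
        "eiffel": keyword_density(text, EIFFEL_KEYWORDS),
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


if __name__ == "__main__":
    page = "class HELLO create make feature make do print (\"hi\") end end"
    print(classify(page))
```

A crawler could call such a function on each fetched page and store only the pages for which it returns a language. A real system would likely need stronger features than raw keyword counts, since words such as "do" and "end" also occur in ordinary prose.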

In the first part of the project, the Web Crawler and the information retrieval methods are developed. The second part consists of collecting data with the Web Crawler and carrying out one sample evaluation of that data.

Final report (PDF)