A final project for designing and implementing a web search engine using Java, PHP (Laravel), and MySQL. This project includes components for crawling, indexing, ranking, and searching web pages.
- Web Crawler: Fetches web pages and their links for processing.
- Indexer: Extracts, analyzes, and indexes content from crawled pages.
- Ranker: Implements algorithms to rank web pages based on relevance.
- Search Module: Provides a search interface to query indexed data.

| Component | Programming Language | Framework/Library | Database |
|---|---|---|---|
| Web Crawler | Java | Jsoup | MySQL |
| Indexer | Java | Custom Logic | MySQL |
| Ranker | Java | Custom Logic | MySQL |
| Search Interface | PHP | Laravel | MySQL |

Web Crawler:
- Fetches web pages recursively and extracts links using Jsoup.
- Stores discovered links and their relationships in the database.
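The crawler's two responsibilities, recursive fetching and recording link relationships, can be sketched as a breadth-first traversal. This is a minimal sketch and not the project's actual code: the class name `CrawlSketch` and the `maxDepth` limit are hypothetical, and link extraction is passed in as a function so the traversal runs without network access. With Jsoup, that function would typically wrap `Jsoup.connect(url).get()` and read `abs:href` from each `a[href]` element.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of a crawl loop; names are illustrative, not from the project.
public class CrawlSketch {

    // Breadth-first crawl from seed, fetching pages at most maxDepth hops away.
    // Returns each fetched page mapped to its outgoing links, which is the
    // relationship data a crawler would persist to the database.
    //
    // extractLinks abstracts the fetch step. With Jsoup it could be roughly:
    //   url -> Jsoup.connect(url).get().select("a[href]").eachAttr("abs:href")
    // wrapped in a try/catch for IOException.
    public static Map<String, List<String>> crawl(String seed, int maxDepth,
            Function<String, List<String>> extractLinks) {
        Map<String, List<String>> linkGraph = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        seen.add(seed);
        List<String> frontier = List.of(seed);

        for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String url : frontier) {
                List<String> targets = extractLinks.apply(url);
                linkGraph.put(url, targets);
                for (String target : targets) {
                    // Set.add returns false for URLs already queued or fetched.
                    if (seen.add(target)) {
                        next.add(target);
                    }
                }
            }
            frontier = next;
        }
        return linkGraph;
    }
}
```

The `seen` set is what keeps the recursion from looping on pages that link back to each other; the depth limit is one simple way to bound the crawl.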

Indexer:
- Processes HTML content to extract keywords.
- Removes stop words and applies stemming for indexing.
- Stores indexed data in the MySQL database.
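The indexing steps above (tokenize, drop stop words, stem, count) might look roughly like the following. Everything here is an assumption made for illustration: the class `IndexSketch`, the short stop-word list, and especially the stemmer, which is a naive suffix stripper standing in for whatever stemming algorithm the project actually uses. The input is assumed to be plain text already extracted from HTML, for example with Jsoup's `Jsoup.parse(html).text()`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical indexing sketch; the stop-word list and stemmer are simplified stand-ins.
public class IndexSketch {

    // A tiny illustrative stop-word list; a real one would be much longer.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "of", "to", "in", "is", "for", "on");

    // Naive suffix stripping. This is NOT the Porter algorithm, only a placeholder
    // that keeps at least three characters of the original word.
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Maps each stemmed keyword to its frequency within one page's text.
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Lowercase, then split on any run of non-alphanumeric characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(stem(token), 1, Integer::sum);
        }
        return counts;
    }
}
```

The resulting keyword-to-count map is the kind of per-page record that would then be written to MySQL, one row per page and keyword.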

Ranker:
- Calculates the relevance of pages based on indexed keywords and other metrics.
- Stores the ranking results in the database for efficient retrieval.
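The README does not say which ranking algorithm the project uses, so the following is only one plausible way to score pages from indexed keywords: TF-IDF for a single query keyword. The class `RankSketch`, the in-memory index shape, and the smoothed IDF formula are all assumptions made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical ranking sketch using TF-IDF; the project's actual metrics are not specified.
public class RankSketch {

    // index: page URL -> (keyword -> frequency on that page).
    // Returns page URL -> TF-IDF score for a single query keyword.
    // Pages that do not contain the keyword are omitted from the result.
    public static Map<String, Double> score(Map<String, Map<String, Integer>> index,
                                            String keyword) {
        // Document frequency: number of pages containing the keyword.
        long df = index.values().stream()
                .filter(tf -> tf.containsKey(keyword))
                .count();
        Map<String, Double> scores = new HashMap<>();
        if (df == 0) {
            return scores;
        }

        // Smoothed inverse document frequency, so a keyword present on
        // every page still receives a positive weight.
        double idf = Math.log(1.0 + (double) index.size() / df);

        for (Map.Entry<String, Map<String, Integer>> page : index.entrySet()) {
            Integer count = page.getValue().get(keyword);
            if (count == null) {
                continue;
            }
            // Term frequency normalized by the page's total keyword count.
            int total = page.getValue().values().stream()
                    .mapToInt(Integer::intValue)
                    .sum();
            scores.put(page.getKey(), ((double) count / total) * idf);
        }
        return scores;
    }
}
```

Precomputing scores like these and storing them per page and keyword is what lets the search interface sort results with a simple query instead of recalculating relevance on every request.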

Search Interface:
- Laravel-based front-end for querying the indexed data.
- Displays ranked search results to the user.

- Clone the repository:

      git clone https://github.com/ariantron/WebSearchEngine.git
      cd WebSearchEngine

- Set up the database:
  - Install MySQL and create a database named `search_engine`.
  - Import the schema provided in the project.
- Configure the database connection:
  - Update the connection settings in the Java and Laravel projects.
- Build and run the components:
  - Web Crawler: Compile and execute the Java classes under the `Crawler` package.
  - Indexer: Run the `Indexer` package for indexing web pages.
  - Search Interface: Start the Laravel application for querying and displaying results.
- Start the Laravel development server:

      php artisan serve

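For the database connection step on the Java side, a connection helper might look like the sketch below. The class `DbConfigSketch`, the `db.*` property names, and the default user and port are assumptions for illustration; only the database name `search_engine` comes from the setup steps. The project may store its settings differently.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical connection helper; property names and defaults are assumptions.
public class DbConfigSketch {

    // Builds a MySQL JDBC URL from settings, defaulting to a local server
    // and the search_engine database named in the setup steps.
    public static String jdbcUrl(Properties settings) {
        String host = settings.getProperty("db.host", "localhost");
        String port = settings.getProperty("db.port", "3306");
        String name = settings.getProperty("db.name", "search_engine");
        return "jdbc:mysql://" + host + ":" + port + "/" + name;
    }

    // Opens a connection. This needs the MySQL JDBC driver on the
    // classpath at runtime, and a reachable MySQL server.
    public static Connection connect(Properties settings) throws SQLException {
        return DriverManager.getConnection(
                jdbcUrl(settings),
                settings.getProperty("db.user", "root"),
                settings.getProperty("db.password", ""));
    }
}
```

On the Laravel side, the equivalent settings normally live in the application's `.env` file (`DB_HOST`, `DB_PORT`, `DB_DATABASE`, `DB_USERNAME`, `DB_PASSWORD`), so both halves of the project should point at the same database.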
- Jsoup: For parsing HTML and extracting links (Jsoup on Maven).
- htmlcleaner: For cleaning and processing HTML content (htmlcleaner on Maven).
- MySQL Connector: For connecting Java applications to MySQL (MySQL Connector on Maven).
This project is licensed under the MIT License. See the LICENSE file for details.