As a new immigrant in Israel, I keep hearing that I can forget about buying a home here: prices are high and keep rising. I decided to investigate this further, so I built a web scraper for an Israeli real estate website.

lnros/real-estate-web-scraping

Web scraping OnMap

This web scraper collects data on properties for sale and for rent from the Israeli OnMap website.

The website offers four main listing types: buy, rent, commercial, and new homes.

Listing type   Description
Buy            Properties for sale
Rent           Properties for rent
Commercial     Commercial properties for rent
New homes      Properties in the planning or construction phase

The scraper

The scraper combines Selenium and BeautifulSoup. Selenium scrolls each web page to the bottom so that the entire HTML is loaded before BeautifulSoup parses it.
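A minimal sketch of the scroll-to-bottom step, assuming `driver` is a Selenium WebDriver (only its real `execute_script` method is used). The function name `scroll_to_bottom` and the `pause` parameter are hypothetical and not taken from the project; the optional `limit` mirrors the `--limit` option listed under Usage.

```python
import time


def scroll_to_bottom(driver, limit=None, pause=1.0):
    """Scroll until the page height stops growing or `limit` scrolls are done.

    `driver` is assumed to be a Selenium WebDriver. Returns the number of
    scrolls performed. Hypothetical helper, not the project's own code.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while limit is None or scrolls < limit:
        # Jump to the bottom so the site loads the next batch of content.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give newly loaded content time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the height did not change: nothing more to load
        last_height = new_height
    return scrolls
```

Once scrolling stops, the loaded page can be handed to BeautifulSoup, for example with `BeautifulSoup(driver.page_source, "html.parser")`. Stopping as soon as the page height no longer changes avoids wasting scrolls on pages with few results.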

Installation

Make sure to install all the required packages for the scraper to work:

$ pip install -r requirements.txt

If you plan to store the scraped information in a database, install MySQL.

Then create the database structure:

$ mysql -u <username> -p < db/on_map.sql

Change the values in the DBConfig class in config.py to match your database configuration.
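For illustration only, a configuration object of this kind could look like the sketch below. The field names (`host`, `user`, `password`, `database`) are assumptions chosen to match the keyword arguments of common MySQL drivers such as PyMySQL; check config.py for the actual attributes. The `on_map` default comes from the database name mentioned in the `--database` option.

```python
from dataclasses import asdict, dataclass


@dataclass
class DBConfig:
    """Hypothetical layout; the real attribute names in config.py may differ."""
    host: str = "localhost"
    user: str = "root"        # replace with your MySQL username
    password: str = ""        # replace with your MySQL password
    database: str = "on_map"  # database name mentioned by the --database option

    def connect_kwargs(self):
        # Returns a dict suitable for e.g. pymysql.connect(**kwargs).
        return asdict(self)
```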

Usage

Run web_scraper.py from the command line:

usage: web_scraper.py [-h] [--limit n] [--print] [--save] [--database]
                      [--fetch] [--verbose]
                      {buy,rent,commercial,new_homes,all}

Scraping OnMap website | Checkout https://www.onmap.co.il/en/

positional arguments:
  {buy,rent,commercial,new_homes,all}
                        choose which type of properties you would like to
                        scrape

optional arguments:
  -h, --help            show this help message and exit
  --limit n, -l n       limit to n number of scrolls per page
  --print, -p           print the results to the screen
  --save, -s            save the scraped information into a csv file in the
                        same directory
  --database, -d        inserts new information found into the on_map database
  --fetch, -f           fetches more information for each property using
                        Nominatim API
  --verbose, -v         prints messages during the scraper execution
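The help text above has the layout produced by Python's argparse. A parser with an equivalent interface might be defined as follows. This is a reconstruction from the help output, not the project's actual code, and the destination name `listing` for the positional argument is hypothetical.

```python
import argparse

# Choices copied from the help text above.
LISTING_TYPES = ["buy", "rent", "commercial", "new_homes", "all"]


def build_parser():
    """Build a parser with the same options as the documented CLI (a reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="web_scraper.py",
        description="Scraping OnMap website | Checkout https://www.onmap.co.il/en/",
    )
    # "listing" is a hypothetical destination name for the positional argument.
    parser.add_argument(
        "listing", choices=LISTING_TYPES,
        help="choose which type of properties you would like to scrape",
    )
    parser.add_argument("--limit", "-l", metavar="n", type=int,
                        help="limit to n number of scrolls per page")
    # Boolean switches: each defaults to False and becomes True when given.
    parser.add_argument("--print", "-p", action="store_true",
                        help="print the results to the screen")
    parser.add_argument("--save", "-s", action="store_true",
                        help="save the scraped information into a csv file")
    parser.add_argument("--database", "-d", action="store_true",
                        help="insert new information into the on_map database")
    parser.add_argument("--fetch", "-f", action="store_true",
                        help="fetch more information for each property using Nominatim")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="print messages during the scraper execution")
    return parser
```

For example, `python web_scraper.py rent --limit 5 --save --verbose` would scrape rental listings, scroll at most five times per page, and save the results to a CSV file.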

Fetching additional information

The GeoFetcher class adds more geolocation information to each property. It is based on Geopy and uses Nominatim as the geolocation service. Although the information is fetched asynchronously with asyncio and AioHTTPAdapter, Nominatim is a free service with a low request limit, so some properties may still have None values after the additional information is fetched. If you need more complete results, you can increase DELAY_TIME in conf.py.
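A minimal sketch of the idea, assuming nothing about GeoFetcher's internals: lookups run concurrently with asyncio, their start times are spaced out by a delay (playing the role of DELAY_TIME), and any failed lookup is stored as None, which is how properties end up with missing values. The function name `geocode_all` and the default delay are hypothetical. With geopy, `geocode` could be the `geocode` method of a `Nominatim` geolocator created with `adapter_factory=AioHTTPAdapter`; Nominatim's public usage policy allows at most one request per second.

```python
import asyncio

DELAY_TIME = 1.0  # seconds between request start times (hypothetical default)


async def geocode_all(addresses, geocode, delay=DELAY_TIME):
    """Geocode many addresses concurrently while spacing out request starts.

    `geocode` is any coroutine function that takes an address string.
    Results are returned in input order; failed lookups become None.
    """

    async def one(index, address):
        # Stagger start times so requests do not all hit the service at once.
        await asyncio.sleep(index * delay)
        try:
            return await geocode(address)
        except Exception:  # with geopy, catching geopy.exc.GeopyError is narrower
            return None

    # gather preserves the order of the input addresses.
    return await asyncio.gather(*(one(i, a) for i, a in enumerate(addresses)))
```

A larger `delay` lowers the request rate, trading a longer run for fewer None results, which is the same trade-off as increasing DELAY_TIME.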

The database

The current ERD of this project includes the following tables:

  • In property_types, we store whether the property is an apartment, penthouse, cottage, and so on.

  • In cities, we store the names of the cities where the properties are located.

  • In listings, we store the listing types offered on the website: buy, rent, commercial, and new homes.

  • In properties, each record is a different property on the website, with its address, price, number of rooms, floor, area, and number of available parking spots. If the property is under construction, ConStatus gives its construction status. Latitude, longitude, and details in Hebrew are obtained using GeoPy with the Nominatim service and may be missing for some properties, since Nominatim is a free API with limited requests.
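To illustrate how the four tables relate, here is a sketch using Python's built-in sqlite3 in place of MySQL so that it runs without a server. Apart from the table names and the ConStatus field mentioned above, every column name is an assumption; the authoritative schema is db/on_map.sql. The properties table references the three lookup tables through foreign keys, and the geolocation columns are nullable because Nominatim may return nothing.

```python
import sqlite3

# Hypothetical column names; db/on_map.sql defines the real MySQL schema.
SCHEMA = """
CREATE TABLE property_types (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE cities         (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE listings       (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE properties (
    id               INTEGER PRIMARY KEY,
    listing_id       INTEGER NOT NULL REFERENCES listings(id),
    property_type_id INTEGER NOT NULL REFERENCES property_types(id),
    city_id          INTEGER NOT NULL REFERENCES cities(id),
    address          TEXT,
    price            INTEGER,
    rooms            REAL,
    floor            INTEGER,
    area             REAL,
    parking          INTEGER,
    ConStatus        TEXT,  -- construction status, only for properties under construction
    latitude         REAL,  -- NULL when Nominatim returned nothing
    longitude        REAL,
    hebrew_details   TEXT
);
"""


def create_db():
    """Create the sketch schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn
```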

Known issues

  • The database is not yet fully in third normal form (3NF): the additional data fetched from the API is not normalized.

  • The performance of fetching data from the API can be further improved.

  • On macOS, there is a known error when using geckodriver:

OSError: [Errno 86] Bad CPU type in executable: '/Users/username/.wdm/drivers/geckodriver/macos/v0.30.0/geckodriver'

And the fix is:

$ cd ~/.wdm/drivers/geckodriver/macos/v0.30.0
$ curl -Lsk -O https://github.com/mozilla/geckodriver/releases/download/v0.30.0/geckodriver-v0.30.0-macos.tar.gz
$ ls 
geckodriver
geckodriver-v0.30.0-macos-aarch64.tar.gz
geckodriver-v0.30.0-macos.tar.gz
$ rm geckodriver
$ tar zxvf geckodriver-v0.30.0-macos.tar.gz 

Short presentation

For a short presentation with some data on rent in Israel and specifically in Tel Aviv, click here.

Authors

@lnros - Leonardo Rosenberg
@Shahar9772 - Shahar Shoshany
