We are provided with a hotel bookings dataset.
Our main objective is to perform exploratory data analysis (EDA) on the given dataset and draw useful conclusions about general trends in hotel bookings and how the factors governing hotel bookings interact with each other.
Have you ever wondered about the best time of year to book a hotel room? Or the ideal length of stay to get the best daily rate? What if you wanted to predict whether a hotel is likely to receive an unusually high number of special requests? This hotel booking dataset can help you explore those questions. It contains booking information for a resort hotel and a city hotel, including when the booking was made, the length of stay, the number of adults, children, and/or babies, and the number of parking spaces, among other variables. The data no longer contains any personally identifying information. Explore and analyze the data to find the important factors that govern bookings.
One specific goal is to find the best time of year to book a hotel room.
We are given a hotel bookings dataset. This dataset contains booking information for a city hotel and a resort hotel. It contains the following features.
- hotel: Name of the hotel (City or Resort).
- is_canceled: Whether the booking was canceled (0 for not canceled, 1 for canceled).
- lead_time: Time (in days) between the booking transaction and the actual arrival.
- arrival_date_year: Year of arrival.
- arrival_date_month: Month of arrival.
- arrival_date_week_number: Week number of the arrival date.
- arrival_date_day_of_month: Day of the month of the arrival date.
- stays_in_weekend_nights: No. of weekend nights spent in the hotel.
- stays_in_week_nights: No. of weeknights spent in the hotel.
- adults: No. of adults in a single booking record.
- children: No. of children in a single booking record.
- babies: No. of babies in a single booking record.
- meal: Type of meal chosen.
- country: Country of origin of the customers (as stated by them).
- market_segment: The segment through which the booking was made and for what purpose.
- distribution_channel: The medium through which the booking was made.
- is_repeated_guest: Whether the customer has made any booking before (0 for No, 1 for Yes).
- previous_cancellations: No. of previous canceled bookings.
- previous_bookings_not_canceled: No. of previous non-canceled bookings.
- reserved_room_type: Room type reserved by the customer.
- assigned_room_type: Room type assigned to the customer.
- booking_changes: No. of changes made to the booking by the customer.
- deposit_type: Type of deposit at the time of making the booking (No deposit / Refundable / No refund).
- agent: ID of the agent for the booking.
- company: ID of the company making the booking.
- days_in_waiting_list: No. of days on the waiting list.
- customer_type: Type of customer (Transient, Group, etc.).
- adr: Average daily rate.
- required_car_parking_spaces: No. of car parking spaces requested in the booking.
- total_of_special_requests: Total no. of special requests.
- reservation_status: Whether the customer has checked out, canceled, or not shown up.
- reservation_status_date: Date on which the reservation status was set.
- Total number of rows in data: 119390
- Total number of columns: 32
All duplicate rows were dropped.
- Null values in columns company and agent were replaced by 0.
- Null values in column country were replaced by 'others'.
- Null values in column children were replaced by the mean of the column.
- Changed data type of children, company, agent to int type.
- Changed data type of reservation_status_date to date type.
- One outlier was found in the adr column. Simply dropped it.
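The cleaning steps listed above could be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual notebook code: the function name `clean_bookings`, the `adr_cap` threshold, and the tiny sample frame are all assumptions made for the example (the source does not say how the adr outlier was identified).

```python
import numpy as np
import pandas as pd


def clean_bookings(df: pd.DataFrame, adr_cap: float = 5000.0) -> pd.DataFrame:
    """Apply the cleaning steps described above.

    adr_cap is a hypothetical threshold used here to drop the adr outlier.
    """
    # Drop exact duplicate rows.
    df = df.drop_duplicates().copy()

    # Replace nulls: 0 for company/agent, 'others' for country.
    df = df.fillna({"company": 0, "agent": 0, "country": "others"})

    # Replace nulls in children with the column mean.
    df["children"] = df["children"].fillna(df["children"].mean())

    # Convert ID-like and count columns to int, and the status date to datetime.
    df = df.astype({"children": int, "company": int, "agent": int})
    df["reservation_status_date"] = pd.to_datetime(df["reservation_status_date"])

    # Drop rows whose adr is at or above the assumed outlier threshold.
    df = df[df["adr"] < adr_cap]
    return df.reset_index(drop=True)


# Illustrative sample rows (made up for this sketch, not taken from the dataset).
raw = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "country": ["PRT", "PRT", None, "GBR"],
    "children": [0.0, 0.0, np.nan, 2.0],
    "agent": [9.0, 9.0, np.nan, 14.0],
    "company": [np.nan, np.nan, 40.0, np.nan],
    "adr": [80.0, 80.0, 120.5, 5400.0],
    "reservation_status_date": ["2015-07-01", "2015-07-01", "2016-01-15", "2016-03-25"],
})

clean = clean_bookings(raw)
print(clean)
```

On the real data, the same function would be called on the frame loaded from the bookings CSV. Note that casting children to int after filling with the mean truncates any fractional part.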
We performed EDA and tried to answer the following questions:
Q1) Which countries have the most guests?
Q2) Which hotel has the maximum ADR (Average Daily Rate)?
Q3) What is the average ADR across all bookings?
Q4) What is the average number of nights stayed?
Q5) Which hotel and country are associated with guests who made 5 special requests?
Q6) Which country's guests reserved a hotel with the most babies and children?
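As a rough sketch, each of these questions maps to a short pandas expression. The sample frame below is made up for illustration, and two interpretations are assumptions: Q1 counts booking records per country rather than individual guests, and Q6 sums children and babies per country.

```python
import pandas as pd

# Illustrative sample rows (made up for this sketch, not taken from the dataset).
bookings = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "country": ["PRT", "PRT", "GBR", "FRA", "PRT"],
    "adr": [100.0, 60.0, 150.0, 90.0, 50.0],
    "stays_in_weekend_nights": [2, 0, 1, 2, 0],
    "stays_in_week_nights": [3, 1, 2, 5, 1],
    "children": [1, 0, 0, 2, 2],
    "babies": [0, 0, 0, 0, 1],
    "total_of_special_requests": [1, 0, 5, 0, 2],
})

# Q1: country with the most booking records (assumed proxy for "most guests").
top_country = bookings["country"].value_counts().idxmax()

# Q2: hotel of the booking with the highest ADR.
max_adr_hotel = bookings.loc[bookings["adr"].idxmax(), "hotel"]

# Q3: average ADR over all bookings.
mean_adr = bookings["adr"].mean()

# Q4: average total nights stayed (weekend nights plus weeknights).
total_nights = bookings["stays_in_weekend_nights"] + bookings["stays_in_week_nights"]
mean_nights = total_nights.mean()

# Q5: hotel and country of bookings with exactly 5 special requests.
five_requests = bookings.loc[
    bookings["total_of_special_requests"] == 5, ["hotel", "country"]
]

# Q6: country with the most children and babies in total (assumed interpretation).
kids = bookings["children"] + bookings["babies"]
kids_country = kids.groupby(bookings["country"]).sum().idxmax()

print(top_country, max_adr_hotel, mean_adr, mean_nights, kids_country)
print(five_requests)
```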
Visualization was mainly performed using the Matplotlib and Seaborn libraries, and the following graphs and plots were used:
- Bar Plot.
- Histogram.
- Scatter Plot.
- Pie Chart.
- Line Plot.
- Heatmap.
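As one example of the plot types above, a bar plot of mean lead time per arrival month might be produced along these lines. This sketch uses Matplotlib only; the sample values and the `month_order` list are made up for illustration, and the non-interactive `Agg` backend is chosen just so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative sample rows (made up for this sketch, not taken from the dataset).
sample = pd.DataFrame({
    "arrival_date_month": ["January", "January", "February", "July", "July", "August"],
    "lead_time": [10, 20, 12, 120, 80, 95],
})

# Keep months in calendar order rather than alphabetical order.
month_order = ["January", "February", "July", "August"]
mean_lead = (
    sample.groupby("arrival_date_month")["lead_time"].mean().reindex(month_order)
)

# Draw one bar per month.
fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(mean_lead.index, mean_lead.values)
ax.set_xlabel("Arrival month")
ax.set_ylabel("Mean lead time (days)")
ax.set_title("Mean lead time by arrival month")
fig.tight_layout()
```

In a notebook, `plt.show()` would display the figure. The same grouped series could also be passed to a Seaborn plotting function if preferred.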
(1) 66.4% of the bookings are for the City Hotel and 33.6% are for the Resort Hotel.
(2) Lead time is low in January and February.
(3) Among customer types, Contract has the longest stay duration.
(4) The City Hotel has the most visitors.
(5) Guests from PRT reserved a hotel with the most babies and children.
(6) PRT has the most guests.
(7) The City Hotel has the maximum ADR (Average Daily Rate), which is 5400.0.
(8) Agent no. 9 made the most bookings.
(9) The most demanded room type is A, but rooms of types H, G and C also have better ADR. Hotels should increase the number of rooms of types A and H to maximise revenue.
(1) The dataset contained a lot of duplicate data.
(2) Choosing appropriate visualization techniques was difficult.
(3) The dataset contained a lot of null values.
For a quick preview, see https://kumarmhaske.github.io/EDA_Hotel_Bookings_Data_Analysis/