Predicting Hotel Cancellations with Machine Learning
The purpose of this project is to predict hotel cancellations for two separate hotels in Portugal (H1 and H2), on both a classification and a time series basis. The GitHub repository includes the datasets and notebooks for all models run. The Python version used is 3.6.5.
The original datasets and research by Antonio et al. can be found here: Using Data Science to Predict Hotel Booking Cancellations
The classification models were trained on the H1 dataset, and their predictions were then evaluated against the H2 dataset.
Time series forecasting was conducted on H1 and H2 independently.
XGBoost showed the best recall performance at 94%. In other words, the model correctly identified 94% of all customers in the H2 dataset who cancelled their hotel booking.
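To make the recall figure concrete, here is a minimal sketch of how recall is computed from true and predicted labels. The function name `recall`, the label encoding (1 = cancelled, 0 = not cancelled), and the toy label lists are assumptions for illustration only; they are not taken from the project notebooks or the H2 data.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    pairs = list(zip(y_true, y_pred))
    # Cancellations the model caught.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # Cancellations the model missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive samples")
    return tp / (tp + fn)


# Toy labels (hypothetical): 1 = cancelled, 0 = not cancelled.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1, 0, 1]

example_recall = recall(y_true, y_pred)
print(f"Recall: {example_recall:.2f}")
```

In this toy case, four of the five actual cancellations are flagged. Recall ignores false positives, so a high recall alone says nothing about how many non-cancelling customers were wrongly flagged.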
LSTM showed strong results in predicting the weekly ADR (average daily rate). The mean weekly ADR for H1 was 160.49 with an RMSE of 33.77. The mean weekly ADR for H2 was 131.42 with an RMSE of 38.15.
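One way to put the RMSE figures in context is to express each as a fraction of the corresponding mean weekly ADR. The sketch below uses only the four numbers quoted above; the helper names `rmse` and `relative_rmse` are hypothetical and are not part of the project code.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_rmse(rmse_value, mean_value):
    """RMSE expressed as a fraction of the mean of the target series."""
    return rmse_value / mean_value


# Figures quoted in the summary above.
h1_relative = relative_rmse(33.77, 160.49)
h2_relative = relative_rmse(38.15, 131.42)

print(f"H1: RMSE is {h1_relative:.1%} of the mean weekly ADR")
print(f"H2: RMSE is {h2_relative:.1%} of the mean weekly ADR")
```

By this measure the error is roughly 21% of the mean for H1 and roughly 29% for H2, so the H1 forecast is the tighter of the two relative to its scale.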
ARIMA showed stronger performance in predicting weekly cancellations across H1, while LSTM showed stronger performance across the H2 dataset.
The individual articles, each with its relevant findings, can be accessed below: