Cancellations are among the most frustrating (and costly) irritations in the hotel business. Rooms that are blocked for guests who don’t show up result in lost revenue and ruin forecasts. What if hotels could predict, with a fair degree of accuracy, which bookings will be cancelled? This question is not only interesting from an academic point of view. It also influences the way hotels should think about pricing, inventory and profitability.
In Revenue Management Systems (RMS), where key decisions about pricing, overbooking and inventory management are made, this issue becomes particularly important. These systems rely on forecasts that embed assumptions about how guests will behave, as we saw in Essential analytical features in an RMS.
However, if a reservation is likely to be cancelled and this probability isn’t factored in, the RMS may make mistakes: undervaluing or overvaluing a room, misallocating inventory, or turning away a customer willing to pay a high rate. By factoring expected cancellations into the revenue management policy, hotels can strike a better balance between occupancy and price.
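To make the effect concrete, here is a minimal sketch of how per-booking cancellation probabilities could feed an inventory decision. The probabilities, the 100-room capacity and the helper names (`expected_arrivals`, `extra_rooms_to_offer`) are illustrative assumptions, not part of any particular RMS.

```python
import math

def expected_arrivals(cancel_probs):
    """Expected number of bookings that materialise: sum of (1 - p_cancel)."""
    return sum(1.0 - p for p in cancel_probs)

def extra_rooms_to_offer(cancel_probs, capacity):
    """Rooms that could still be sold so that expected arrivals reach capacity.

    Rounds down to stay conservative. A real overbooking policy would also
    weigh the cost of turning away a guest with a confirmed booking.
    """
    expected_free = capacity - expected_arrivals(cancel_probs)
    # round() guards against floating-point noise before flooring
    return max(0, math.floor(round(expected_free, 9)))

# Hypothetical fully booked 100-room night, grouped by predicted risk
cancel_probs = [0.125] * 40 + [0.25] * 40 + [0.5] * 20

arrivals = expected_arrivals(cancel_probs)
spare = extra_rooms_to_offer(cancel_probs, capacity=100)
print(f"Expected arrivals: {arrivals:.1f}, rooms that could be re-offered: {spare}")
```

With these made-up numbers, a system that ignores cancellation risk would treat the night as sold out, whereas the expected number of arrivals is only 75.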
Techniques for predicting hotel booking cancellations
There are a number of techniques that data scientists use to predict whether a customer will cancel a reservation. The three main methods are Regression models, Machine Learning algorithms and Deep Learning networks. In the following, we will discuss each of them and what they bring to the table.
Regression: Simplicity and Interpretability
Logistic regression is usually the first step in this group of techniques. It is simple, statistically grounded, and produces probability estimates. Using past data such as when the booking was made, who the customer is and which channel the booking came through, it can estimate the probability that a booking will be cancelled.
The benefit of regression is the transparency of its results. Hoteliers can easily interpret what role each feature plays in the final prediction. For example, they may find that last-minute bookings made on mobile devices have a higher cancellation rate. This interpretability builds trust. It also helps to ensure that business insights are taken into account in predictive modelling.
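As an illustration of that transparency, the following sketch fits a small logistic regression by plain gradient descent and reads the coefficients as odds ratios. The two binary features, the toy booking counts and the function names are assumptions made for the example. In practice a library implementation would normally be used instead.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Cancellation probability for one booking."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    weights, bias, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(x, weights, bias) - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

# Hypothetical history: (is_last_minute, is_mobile) -> (cancelled, kept) counts
history = {(1, 1): (8, 2), (1, 0): (5, 5), (0, 1): (4, 6), (0, 0): (2, 8)}
rows, labels = [], []
for features, (cancelled, kept) in history.items():
    rows += [features] * (cancelled + kept)
    labels += [1] * cancelled + [0] * kept

weights, bias = fit_logistic(rows, labels)
for name, w in zip(["last_minute", "mobile"], weights):
    # exp(coefficient) is the multiplicative change in the odds of cancelling
    print(f"{name}: coefficient {w:.2f}, odds ratio {math.exp(w):.2f}")

p_risky = predict_proba((1, 1), weights, bias)
p_safe = predict_proba((0, 0), weights, bias)
```

Each coefficient maps to one feature, so a positive value can be read directly as "this attribute raises the odds of cancellation", which is the kind of statement a revenue manager can check against experience.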
On the other hand, regression has its limitations. Logistic regression assumes that the log-odds of cancellation are a linear function of the features, so it does not cope well with complex interactions among variables unless they are engineered by hand. More flexible models may be needed for problems that display nonlinear patterns and have a higher dimensionality.
Machine Learning: Harnessing Complexity
Random forest and gradient boosting techniques capture nonlinearities and feature interactions with comparatively little preprocessing. They can take advantage of a broader and richer feature set (weather conditions, seasonality, local events) to generate better predictions.
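The following is a minimal sketch of the idea behind gradient boosting for a binary outcome, using one-split trees (stumps) and the standard library only. The lead-time data are invented so that cancellations are high for very short and very long lead times, a U-shaped pattern that a single linear term cannot represent. All names are illustrative, and production work would normally rely on an established library rather than this code. Stumps capture nonlinearity in individual features; capturing interactions between features would require deeper trees.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def stump_value(stump, x):
    feature, threshold, left_value, right_value = stump
    return left_value if x[feature] <= threshold else right_value

def fit_stump(rows, residuals, hessians):
    """Pick the single split that best fits the current residuals."""
    best_gain, best_stump = -1.0, None
    for j in range(len(rows[0])):
        values = sorted(set(x[j] for x in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [i for i, x in enumerate(rows) if x[j] <= threshold]
            right = [i for i, x in enumerate(rows) if x[j] > threshold]
            sum_l = sum(residuals[i] for i in left)
            sum_r = sum(residuals[i] for i in right)
            gain = sum_l ** 2 / len(left) + sum_r ** 2 / len(right)
            if gain > best_gain:
                hess_l = sum(hessians[i] for i in left)
                hess_r = sum(hessians[i] for i in right)
                # Newton-style leaf values for the log-loss
                best_gain = gain
                best_stump = (j, threshold, sum_l / (hess_l + 1e-9),
                              sum_r / (hess_r + 1e-9))
    return best_stump

def fit_boosting(rows, labels, n_rounds=100, learning_rate=0.3):
    base_rate = sum(labels) / len(labels)
    init = math.log(base_rate / (1.0 - base_rate))  # start from the prior log-odds
    scores, stumps = [init] * len(rows), []
    for _ in range(n_rounds):
        probs = [sigmoid(s) for s in scores]
        residuals = [y - p for y, p in zip(labels, probs)]  # negative gradient
        hessians = [p * (1.0 - p) for p in probs]
        stump = fit_stump(rows, residuals, hessians)
        if stump is None:
            break
        stumps.append(stump)
        scores = [s + learning_rate * stump_value(stump, x)
                  for s, x in zip(scores, rows)]
    return init, learning_rate, stumps

def predict_proba(model, x):
    init, learning_rate, stumps = model
    return sigmoid(init + learning_rate * sum(stump_value(s, x) for s in stumps))

# Invented data: lead time in days, 1 = cancelled
lead_times = [1, 3, 5, 7, 20, 30, 45, 60, 75, 90, 150, 200, 250, 300]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosting([(d,) for d in lead_times], labels)

for days in (3, 60, 250):
    print(days, round(predict_proba(model, (days,)), 3))
```

Each round adds a small correction aimed at the bookings the current ensemble gets most wrong, which is how the model builds up a nonlinear response from very simple pieces.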
In practice, machine learning models often predict better than a simple regression model. Many implementations also handle missing values natively, and tree-based models implicitly favour the most informative features when choosing splits. Moreover, they scale to large datasets quite easily.
There is one significant drawback: complexity. This is one of the main reasons why these models are less intuitive. They need to be tuned carefully, with validation built into the process. Otherwise, the model may fail to generalise across different customer segments and booking windows.
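One simple way to check generalisation across segments is to score held-out predictions separately for each segment. The sketch below uses the Brier score (mean squared error between predicted probability and outcome). The segment names, the held-out records and the 0.25 alert threshold are assumptions chosen for illustration.

```python
from collections import defaultdict

def brier_by_segment(records):
    """records: iterable of (segment, predicted_cancel_prob, cancelled as 0/1)."""
    totals = defaultdict(lambda: [0.0, 0])
    for segment, prob, cancelled in records:
        totals[segment][0] += (prob - cancelled) ** 2
        totals[segment][1] += 1
    return {segment: sq_err / n for segment, (sq_err, n) in totals.items()}

# Hypothetical held-out bookings that the model never saw during training
held_out = [
    ("leisure", 0.75, 1),
    ("leisure", 0.25, 0),
    ("corporate", 0.25, 1),
    ("corporate", 0.75, 0),
]

scores = brier_by_segment(held_out)
# 0.25 is the score of always predicting 0.5, used here as a rough alarm level
weak_segments = [seg for seg, score in scores.items() if score > 0.25]
print(scores, weak_segments)
```

A model can look good on average while being poor for one segment, so a per-segment view of this kind is a useful complement to a single overall metric.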
Deep Learning: Capturing the Subtleties
If the data contain highly complex patterns, deep learning is another option. Neural networks can learn very subtle patterns in guest behaviour and booking trends. Deep learning models can ingest not only contextual text data from different sources but also guest stay histories and even customer reviews. All of this adds new angles to cancellation prediction.
In theory, these models are capable of achieving the highest accuracy levels. However, they come with serious drawbacks. They lack transparency, are complex to train, and are often impractical unless there is a large amount of data and solid infrastructure behind them. Their predictions can also be less stable, and they are harder to justify in a business context where explainability is required.
This opacity can be a problem for the RMS as a whole. Pricing decisions often need to be communicated and justified to stakeholders, so neural networks may be rejected outright unless they are combined with some explanatory technique.
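One widely used, model-agnostic explanatory technique is permutation importance: shuffle one feature at a time and measure how much the model's accuracy drops. The sketch below treats the model as a black box. Here `opaque_model` is a hard-coded stand-in rather than a real neural network, and the features and data are invented for the example.

```python
import random

def accuracy(model, rows, labels):
    hits = sum((model(x) >= 0.5) == bool(y) for x, y in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(model, rows, labels, feature, n_repeats=20, seed=0):
    """Average drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [x[feature] for x in rows]
        rng.shuffle(column)
        shuffled = [x[:feature] + (value,) + x[feature + 1:]
                    for x, value in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return sum(drops) / n_repeats

def opaque_model(x):
    """Stand-in for a trained network: only the booking channel matters."""
    booked_via_ota, room_floor = x
    return 0.9 if booked_via_ota == 1 else 0.1

# Invented bookings: (booked_via_ota, room_floor), label 1 = cancelled
rows = [(1, 2), (1, 5), (0, 1), (0, 3), (1, 4), (0, 2), (1, 1), (0, 5)]
labels = [1, 1, 0, 0, 1, 0, 1, 0]

importance_channel = permutation_importance(opaque_model, rows, labels, feature=0)
importance_floor = permutation_importance(opaque_model, rows, labels, feature=1)
print(f"channel: {importance_channel:.3f}, floor: {importance_floor:.3f}")
```

Because the procedure only calls the model's prediction function, the same approach could be applied to a neural network, giving stakeholders at least a ranked view of which inputs drive its predictions.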

How to find the optimal methodology
Generally speaking, no single modelling approach can be singled out as the best. The answer will depend on the needs of the business, the data it has and the risk it is willing to take. For most hotels, a well-tuned gradient boosting model will be adequate. If you are interested in how it works, here is all you need to know about the Gradient Boosting algorithm.
This kind of model provides useful, understandable probability estimates that can be integrated into RMS workflows. It is also flexible enough to represent the nuances of guest behaviour.
Whatever modelling approach is taken, one thing holds true: predicting cancellation probabilities is no longer just a data science challenge; it is a core business application. In environments where margins are slim and competition is fierce, a hotel that understands the intent of the potential guest at the time of booking has a powerful tool in its hands.
Hotels that take up this challenge can recover lost revenue and make better pricing decisions. In this way, they will also be able to deliver a better guest experience.
Conclusion
Regression, machine learning and deep learning are three of the most common methods used by data science teams to make predictions. In this post, we have delved into each of them to find out how they could be practically applied to forecasting booking cancellations in the hotel industry.
If you found this article interesting, we encourage you to visit the Data Science category to see other posts similar to this one and to share it on social networks. See you soon!