Demand forecasting: Using Machine Learning techniques

According to Evan J. Douglas, “Demand forecasting may be defined as the process of finding values for demand in future time periods.” Demand forecasting plays an important role in effective decision making about sales, production, inventory maintenance and similar areas by reducing uncertainty. Time series forecasting uses only demand data captured at equal time intervals, for example every week, month, quarter or year.

Based on the understanding that future predictions depend on past data, time series methods recognize the trend or pattern in the available historical data to make future predictions. These methods do not use events directly in the algorithm to produce the forecast. The most popular are statistical methods such as Simple Moving Average (SMA), Exponential Smoothing and Autoregressive Integrated Moving Average (ARIMA).
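
As a minimal sketch of two of the statistical methods named above, the following shows a Simple Moving Average and simple (non-seasonal) exponential smoothing, each producing a one-step-ahead forecast. The function names, the window size, the smoothing factor alpha and the demand numbers are illustrative assumptions, not values from this article.

```python
def simple_moving_average(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]
    return sum(recent) / window


def simple_exponential_smoothing(history, alpha):
    """Forecast the next value as an exponentially weighted average of the history."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]  # initialise the level with the first observation
    for value in history[1:]:
        # recent observations get weight alpha, the running level keeps the rest
        level = alpha * value + (1 - alpha) * level
    return level


monthly_demand = [120, 130, 125, 140, 150, 145]  # made-up illustrative numbers
sma_forecast = simple_moving_average(monthly_demand, window=3)
ses_forecast = simple_exponential_smoothing(monthly_demand, alpha=0.5)
```

Both functions look only at past values of the series, which is what is meant by not using events directly in the algorithm.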

With more data available, we can also leverage advances in machine learning for forecasting. Machine learning methods can achieve better performance with correct data preparation strategies. As they are computationally intensive, careful evaluation may be needed for the given data.

Each statistical and machine learning method used for forecasting has its own advantages and disadvantages. It is unlikely that a single method will perform well on every dataset. Therefore, we apply several methods to the data, following these general steps:

  1. Choose the correct seasonality value depending on whether the data is weekly, monthly, quarterly, or yearly.
  2. Collect and prepare the data according to the method's requirements. Handle missing values, since the methods need data at equal time intervals.
  3. Check whether the data needs to be transformed, and apply an appropriate transformation to get a better forecast.
  4. Initialize and fit the model on historical data for different sets of parameters. Select the parameters of the best model based on evaluation metrics.
  5. Forecast the desired number of future time steps using the selected model.
  6. Apply the inverse of the data transformation to the forecasted data if it was transformed earlier.
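
Steps 2 to 6 above can be sketched end to end as follows. This is an illustration under stated assumptions, not a prescribed implementation: missing values are filled by linear interpolation, the transformation is a natural log, the model is simple exponential smoothing, and the parameter search is a small grid over the smoothing factor scored by one-step-ahead mean absolute error. Step 1 is left out because this non-seasonal model has no seasonality parameter. All function names and numbers are hypothetical.

```python
import math


def fill_missing(series):
    """Step 2: linearly interpolate gaps marked as None.

    Assumes the first and last observations are present.
    """
    filled = list(series)
    for i, value in enumerate(filled):
        if value is None:
            left = i - 1  # already known or already filled
            right = next(j for j in range(i + 1, len(filled)) if filled[j] is not None)
            step = (filled[right] - filled[left]) / (right - left)
            filled[i] = filled[left] + step
    return filled


def one_step_errors(values, alpha):
    """Absolute one-step-ahead errors of simple exponential smoothing."""
    level = values[0]
    errors = []
    for actual in values[1:]:
        errors.append(abs(actual - level))  # current level is the forecast for this step
        level = alpha * actual + (1 - alpha) * level
    return errors, level


def fit_and_forecast(raw_series, horizon, alphas=(0.2, 0.5, 0.8)):
    filled = fill_missing(raw_series)              # step 2: equal-interval data, no gaps
    logged = [math.log(v) for v in filled]         # step 3: log transform (needs demand > 0)
    best = None
    for alpha in alphas:                           # step 4: fit each candidate parameter
        errors, level = one_step_errors(logged, alpha)
        mae = sum(errors) / len(errors)
        if best is None or mae < best[0]:
            best = (mae, alpha, level)
    _, best_alpha, final_level = best
    log_forecast = [final_level] * horizon         # step 5: this model forecasts a flat line
    return best_alpha, [math.exp(v) for v in log_forecast]  # step 6: inverse transform


demand = [100, 110, None, 130, 145, 160]  # made-up numbers with one missing value
best_alpha, forecast = fit_and_forecast(demand, horizon=3)
```

In practice the in-sample error used here would usually be replaced by an error measured on held-out data, and the candidate set would include seasonal models that make use of the seasonality value from step 1.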


Statistical models focus on understanding the relationships between variables, make specific assumptions about the data, and are often simpler and more interpretable. Machine learning models, on the other hand, focus on predictive accuracy, can handle more complex relationships without strong assumptions, and are often more complex and computationally intensive.

If multiple datasets are available, the model can be trained on various aspects of the data. Comparing univariate and multivariate outcomes helps identify which variables affect the outcome and which setup tracks the historical trends more closely.

Both univariate and multivariate analyses are valuable tools, and the choice between them depends on the research questions and the complexity of the data. Univariate analysis provides a foundational understanding of individual variables, while multivariate analysis offers insights into the relationships and interactions among multiple variables.

Most popular ML techniques:

  • Auto-ARIMA
  • Ensemble
  • Holt-Winters – Additive
  • Holt-Winters – Multiplicative
  • LSTM RNN – Additive Hybrid
  • LSTM RNN – Multiplicative Hybrid
  • DNN Univariate
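
To make the additive and multiplicative labels in the list above concrete, the sketch below shows how a Holt-Winters style forecast combines level, trend and seasonal components in each form, followed by a plain average as one possible ensemble. The averaging rule is an assumption, since the article does not say how its ensemble is built, and all names and numbers are illustrative.

```python
def additive_forecast(level, trend, seasonal_offset, steps_ahead=1):
    """Additive form: the seasonal effect is a fixed amount added to the trend line."""
    return level + steps_ahead * trend + seasonal_offset


def multiplicative_forecast(level, trend, seasonal_factor, steps_ahead=1):
    """Multiplicative form: the seasonal effect scales the trend line."""
    return (level + steps_ahead * trend) * seasonal_factor


def ensemble_forecast(forecasts):
    """Assumed ensemble rule: unweighted mean of the individual forecasts."""
    return sum(forecasts) / len(forecasts)


# Made-up component values for a single future period
add_fc = additive_forecast(level=200, trend=5, seasonal_offset=20)
mul_fc = multiplicative_forecast(level=200, trend=5, seasonal_factor=1.1)
combined = ensemble_forecast([add_fc, mul_fc])
```

The additive form suits data whose seasonal swings stay roughly constant in size, while the multiplicative form suits swings that grow with the level of demand.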


If we want to make demand forecasting a continuous process, we need to refresh these models periodically with new data, because the forecast depends significantly on recent history. We follow the above-mentioned procedure iteratively whenever the actual data for the next time step becomes available. The updated model reduces uncertainty in subsequent forecasts as it gains knowledge of the most recent observations.
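
The refresh cycle can be sketched as a loop that refits each time a new actual arrives. A moving average stands in here as a hypothetical placeholder for whichever model was selected; the function names, window size and numbers are assumptions made for illustration.

```python
def fit_moving_average(history, window=3):
    """Hypothetical stand-in for any model: refitting just recomputes the recent mean."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def rolling_refresh(initial_history, incoming_actuals, window=3):
    """Refit after each new actual and record the forecast made before seeing it."""
    history = list(initial_history)
    log = []
    for actual in incoming_actuals:
        forecast = fit_moving_average(history, window)  # model fitted on data so far
        log.append((forecast, actual))
        history.append(actual)                          # the new actual becomes history
    next_forecast = fit_moving_average(history, window)  # forecast for the period ahead
    return log, next_forecast


log, next_forecast = rolling_refresh([100, 104, 108], [112, 116])
```

Keeping the forecast and actual pairs in a log also gives a running record of out-of-sample errors that can feed the model comparison.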

It becomes important to select the correct method for the data from the pool of available methods. One way is to compare the output of the methods visually, which involves a lot of human intervention. Another effective way is to compare them quantitatively using evaluation metrics such as Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The appropriate metrics can vary by use case. An effective combination of these two approaches can make the selection process faster and more accurate.
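
A minimal sketch of the quantitative comparison follows, using the standard definitions of MAE and RMSE. The model names and numbers are made up; they are chosen so that the two metrics disagree, which illustrates why the choice of metric depends on the use case.

```python
import math


def mean_absolute_error(actuals, forecasts):
    """Average size of the errors, each miss weighted equally."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


def root_mean_squared_error(actuals, forecasts):
    """Square root of the mean squared error, which penalises large misses more."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals))


def select_best(actuals, candidates, metric):
    """Score every candidate forecast and return the name with the lowest error."""
    scores = {name: metric(actuals, fc) for name, fc in candidates.items()}
    return min(scores, key=scores.get), scores


actuals = [100, 110, 120, 130]
candidates = {
    "model_a": [98, 112, 118, 132],   # off by 2 in every period
    "model_b": [100, 110, 120, 136],  # exact except for one miss of 6
}
best_by_mae, mae_scores = select_best(actuals, candidates, mean_absolute_error)
best_by_rmse, rmse_scores = select_best(actuals, candidates, root_mean_squared_error)
```

Here MAE favours model_b and RMSE favours model_a, because RMSE weighs the single large miss more heavily.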

The most common use case for machine learning based forecasting is at the sub-national level. Generally, we forecast at the national level and use a percentage split based on historical trends to determine goals for territories or regions. Using machine learning techniques, we can train the model on territory data or other granular data, check whether an apparent gap is a missing data value or simply the typical pattern for that unit, and trend the values cautiously on that basis. Forecasting at the account, territory, region, ZIP or state level makes it easier to interpret performance gaps at a granular level. For the sub-national level, we can use information from various data sources such as sales data (internal or external) and claims data. Adding multiple data sources helps build multivariate models. The results can be triangulated at various levels.
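
The conventional approach described above, splitting a national forecast by historical share, can be sketched as follows. The region names, history and national figure are made up for illustration, and using each region's share of total historical volume is one assumed way of deriving the split percentage.

```python
def historical_shares(regional_history):
    """Each region's share of total historical volume across all periods."""
    totals = {region: sum(values) for region, values in regional_history.items()}
    national_total = sum(totals.values())
    return {region: total / national_total for region, total in totals.items()}


def split_national_forecast(national_forecast, shares):
    """Allocate a national forecast to regions in proportion to their shares."""
    return {region: national_forecast * share for region, share in shares.items()}


regional_history = {          # made-up demand per region over three periods
    "North": [40, 50, 60],
    "South": [20, 30, 50],
    "West": [70, 80, 100],
}
shares = historical_shares(regional_history)
regional_goals = split_national_forecast(1000, shares)
```

A fixed split like this assumes every region keeps its past share, which is the limitation that training models directly on regional data is meant to address.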

The outcomes of sub-national forecasting can be used for goal setting, inputs for the field, assessing the need for campaigns, etc.

The selected model that gives the best performance could ultimately reduce the risk in the decision-making process.

Sample Dataflow for ML techniques & Use cases of Demand Forecast