Predicting Recession: Forward-looking recession forecast based on real-time variables

By: Yash Jain
Vani Bhasin

This winning article from our research challenge was published on Global Banking & Finance


“Recession” is a recurrent and widely discussed topic among academics and scientists. The 2007-09 global financial crisis was a painful reminder of this phenomenon, and it has recently returned to the limelight amid fears of another downturn in the global economy caused by the COVID-19 pandemic.

Despite its significance for every economy, the most widely accepted definition of recession – to wit, “at least two consecutive quarters of decline in national output” – is not as useful as it could be. Reliable GDP estimates are only available after a time lag, so by this definition a recession can’t be identified until it’s already well underway – effectively the same as being warned a ship’s hull has been compromised when it’s already half sunk. This of course limits opportunities to implement counter policy measures which could have reduced the impact of the crisis much earlier.

For most economists, a more meaningful definition of a recession is an extended period of below-trend or ‘below potential’ growth. Trend or potential growth can be hard to measure, but for countries with high rates of population and/or productivity growth, real GDP growth does not need to be negative for conditions consistent with recession to develop under this approach.

An alternative definition of recession, from the National Bureau of Economic Research (NBER), is “a significant decline in economic activity spread across the economy, lasting more than a few months, normally visible in real GDP, real income, employment, industrial production, and wholesale-retail sales.” As the NBER itself emphasizes, this definition does not require consecutive quarters of negative real GDP growth, even though the two-quarter rule is the one most widely used. Because monthly GDP data is not available, this research attempts to build on NBER’s definition by offering a model for forward-looking recession prediction using other economic variables.

Importance of our study

Various policy institutions and private sector decision makers (e.g., central banks or financial system surveillance authorities for the former, commercial banks and investment funds for the latter) depend on information on the expected level of future economic activity. An accurate forecast of economic outlook is important for banks and fund managers to make timely investment decisions and for policymakers to implement precautionary policies. If a recession can be predicted accurately, decisions can be adjusted accordingly in advance. For example, bankers can improve their cash balance by providing fewer loans and adopting other investment policies while central banks can implement an expansionary monetary policy. Individuals would do better by planning their consumption, savings and investment if they know about the onset of a recession in advance.

Literature review

I. This study develops a unified framework that aims to forecast economic recessions better than the models used in the literature. Several forecasting models are widely used and well known, but our research differs notably from each of them.

  • The New York Fed (1) predicts recession probabilities using the spread between the 10-year and 3-month Treasury rates, and it only provides a 12-month forecast. We built our model using multiple economic variables to answer whether or not the US economy will be in recession in the coming months.
  • Rabobank (2) has a recession probability model, but it is also based on a single variable (the spread between the 10-year and 1-year Treasury rates) and only covers one time period (a 17-month horizon).
  • Wells Fargo Economics (3) has a few recession probability models that use a combination of economic and market data, but the models themselves have not been released, so no one else can use them. Only the forecast results are published; the variables and their impact on the model are not known. Additionally, the forecasts are limited to quarterly results.

II. Most of the existing research uses only one or two primary financial variables (interest rates and spreads, stock prices, monetary aggregates etc.) to predict recession. Restricting the variable set in this way is an attempt to avoid problems such as overfitting, which leads to poor performance of the predictive model and large changes in its output in response to minor fluctuations in the data.

  • Estrella and Mishkin (4) provided one of the foundational analyses of this topic in their 1998 research paper on predicting recession. They used financial variables such as stock prices, interest rates and spreads as leading indicators, and published their results in quarterly form. In 2003, Arturo Estrella, Anthony P. Rodrigues, and Sebastian Schich (5) conducted an empirical study on predicting recession using only the yield curve as an indicator. In 2008, Kauppi and Saikkonen (6) chose the interest rate spread as the driving indicator. In 2012, Rudebusch and Williams (7) focused on the yield curve spread as an indicator and on how overall model precision can be improved by using the yield spread in place of the interest rate spread, which is typically given much more importance. Bellego and Ferrara (2009) draw principal components from a set of several indicators, ending up with three factors representing term spread variables, stock market variables and commodities, respectively, which they use to forecast euro-area recessions. Engemann, Kliesen, and Owyang (2010) find that oil prices have considerable predictive power for US recessions. Wright (2006) shows that models using both the level of the federal funds rate and the term spread predict US recessions and expansions better than models based on the term spread only. Finally, Nyberg (2010) finds that stock market returns and the foreign term spread have additional predictive power to forecast recessions in the United States and Germany beyond the domestic term spread.
  • The literature above suggests that the limited forecasting performance of earlier models is related to their inflexible model specifications and limited use of data. Earlier studies commonly apply a static Probit or Logit model that uses only a few monthly or quarterly explanatory variables to predict the probability of a recession.
  • We extended the earlier research on forecasting recessions with financial variables by:

(i) covering a longer period that includes data for recent years, encompassing the onset of the recent COVID-19 crisis

(ii) investigating the predictive power of additional financial variables as well as non-financial variables

(iii) building a model that is not restricted to quarterly results or to one particular point in the future, but can be used to predict recession for any month 3 months in advance

  • Our approach can improve forecasting performance.

Data Selection and Sources

  • We first prepared an exhaustive list of 35 possible economic and financial variables that could be used in our analysis. As economic data is released at different frequencies (weekly, monthly, quarterly, annually etc.), we chose those variables that are available monthly to get the maximum number of observations. This means we had to forgo some variables like real GDP, real estate market prices, corporate investment, real gross private domestic investment, and net export activity since they are released quarterly.
  • We took recession data from 1959 onwards in order to capture as many recessions as we could. Based on the US recession periods dated by NBER, the full data set includes 9 recessions since 1959.
  • Some variables were first published after 1959, and we eliminated them from our observations. This means we had to exclude potentially useful data that lacked sufficient history. The excluded series are listed below, with the earliest year of available data in parentheses:
    • DJIA Volatility index (1997)
    • Credit spread (2015)
    • Wholesale inventories (1992)
    • S&P 100 (1986)
    • 1-month real interest (1982)
    • Interest rate spread (1976)
  • We then shortlisted the following 12 variables as useful indicators, after eliminating the remaining variables either because monthly data was absent or because the data was first published too late.

Fig 1 – Initially Shortlisted Variables
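The screening rule described above (monthly frequency, history reaching back to 1959) can be sketched in a few lines. This is only an illustration, not the authors' code: the `Candidate` class and `screen` function are hypothetical names, the first-available years for the six excluded series come from the list above, their "monthly" frequency is an assumption, and the start years for real GDP and industrial production are placeholders.

```python
from dataclasses import dataclass

SAMPLE_START_YEAR = 1959  # first year of the recession sample


@dataclass
class Candidate:
    name: str
    frequency: str   # e.g. "monthly" or "quarterly"
    first_year: int  # first year with published data


def screen(candidates, start_year=SAMPLE_START_YEAR):
    """Keep monthly series whose history reaches back to the sample start."""
    kept, dropped = [], {}
    for c in candidates:
        if c.frequency != "monthly":
            dropped[c.name] = "not monthly"
        elif c.first_year > start_year:
            dropped[c.name] = f"starts in {c.first_year}"
        else:
            kept.append(c.name)
    return kept, dropped


candidates = [
    # Frequencies of these six series are assumed to be monthly.
    Candidate("DJIA volatility index", "monthly", 1997),
    Candidate("Credit spread", "monthly", 2015),
    Candidate("Wholesale inventories", "monthly", 1992),
    Candidate("S&P 100", "monthly", 1986),
    Candidate("1-month real interest", "monthly", 1982),
    Candidate("Interest rate spread", "monthly", 1976),
    # Start years below are placeholders for illustration only.
    Candidate("Real GDP", "quarterly", 1947),
    Candidate("Industrial production", "monthly", 1919),
]

kept, dropped = screen(candidates)
print(kept)
print(dropped)
```

Recording the reason for each exclusion makes it easy to revisit the decision later, for example if the sample start date is moved forward to admit shorter series.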

  • We also created new variables from the selected ones. To better model the results and find historical trends in the dataset, we took the percentage change for variables like M1, M2, CPI, nonfarm employees, industrial production etc., which generally increase with time and population. Using the levels of these variables would give spurious results and would not make economic sense.
  • We introduced “months since the last recession” as a new variable. On average, a recession has occurred every 8-10 years. An economy typically grows for 6 to 10 years and is then likely to go into a recession lasting about 6 months to 2 years. Economic recession is thus the declining phase of the business cycle, when a decline in economic activity spreads across the economy.
  • Instead of taking only the percentage change in C&I, we estimated C&I as a percentage of the previous quarter’s GDP. Instead of using the S&P percentage change, we calculated the S&P drawdown (S&P drawdown = 1 - (actual value / maximum over the last 12 months)) to account for stock market volatility.
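The derived variables described above can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the 12-month drawdown window is assumed to include the current month, and "months since the last recession" is assumed to be 0 during a recession month and undefined before the first recession in the sample.

```python
def pct_change(values):
    """Month-over-month percentage change; the first month has no predecessor."""
    out = [None]
    for prev, cur in zip(values, values[1:]):
        out.append(100.0 * (cur - prev) / prev)
    return out


def drawdown(prices, window=12):
    """1 - price / (maximum over the trailing `window` months, current month included)."""
    out = []
    for i, price in enumerate(prices):
        peak = max(prices[max(0, i - window + 1): i + 1])
        out.append(1.0 - price / peak)
    return out


def months_since_last_recession(flags):
    """Months elapsed since the most recent recession month (flag == 1).

    Returns None for months before the first recession in the sample.
    """
    out, counter = [], None
    for flag in flags:
        if flag == 1:
            counter = 0
        elif counter is not None:
            counter += 1
        out.append(counter)
    return out


# Toy inputs, not real data.
print(pct_change([100, 110, 99]))
print(drawdown([100, 120, 90]))
print(months_since_last_recession([0, 1, 1, 0, 0]))
```

A drawdown of 0 means the index is at its 12-month high; larger values mean the index has fallen further from that high.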



  • We assume that future recessionary periods will follow the same trend as past recessionary periods.
  • We also assume that many variables change when the economy is in recession. An individual variable might not be a useful predictor on its own, but combined, the variables can show strong results that can be used to predict recession in the economy.


Data Analysis and Feature Selection 

  • A correlation heatmap was plotted to check for the problem of multicollinearity between the initially selected variables.

Fig. 2 – Correlation matrix for shortlisted variables

  • We observed from the above Pearson correlation matrix that the correlation of the federal funds rate with the 3-month Treasury bill market rate is very high. The same is true for the market yields on US Treasury securities at 1-year, 5-year and 10-year maturity.

Hence, we removed all three market yield variables as well as the 3-month Treasury bill market rate to prevent multicollinearity. We did not eliminate the federal funds effective rate during feature selection because its importance was among the highest in the XGBoost model.

  • We also removed the M1 percentage change due to its high correlation with M2.
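The correlation-based pruning described above can be sketched as a greedy filter. This is an illustration under stated assumptions, not the authors' procedure: the 0.9 threshold, the function names, and the toy series are all hypothetical, and the priority order stands in for the feature-importance ranking that led the authors to keep the federal funds rate.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, priority, threshold=0.9):
    """Walk through `priority` and keep a column only if its absolute
    correlation with every already-kept column is at most `threshold`."""
    kept = []
    for name in priority:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy series, not real data.
columns = {
    "fed_funds": [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0],
    "tbill_3m": [1.1, 2.0, 3.1, 3.9, 5.2, 4.1, 2.9],
    "savings_rate": [5.0, 3.0, 6.0, 2.0, 4.0, 7.0, 3.0],
}
priority = ["fed_funds", "tbill_3m", "savings_rate"]  # most important first

selected = drop_correlated(columns, priority)
print(selected)
```

Listing the most important variable first ensures that, within a highly correlated group, it is the one that survives.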

We then analyzed how far in advance (1-6 months) each of the remaining variables can be used to predict future recessionary periods. Based on the feature importance we found for the variables in different scenarios, we finalized a different time lag for each variable. For example, the inflation rate in a given month is used to predict recession 2 months forward, the savings rate is used to predict recession 6 months forward, and so on.


Fig. 3 – Time lag to predict recession for different variables
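One way to picture the per-variable time lags is as an alignment step: the row for target month t pairs the recession label at t with each variable's value from its own number of months earlier. The sketch below assumes that reading; `build_rows` and the toy series are hypothetical, and only the 2-month and 6-month leads for inflation and savings come from the text.

```python
def build_rows(features, leads, recession):
    """Pair the recession label of month t with feature values observed
    `leads[name]` months earlier. Months without full history are skipped."""
    max_lead = max(leads.values())
    rows = []
    for t in range(max_lead, len(recession)):
        x = {name: series[t - leads[name]] for name, series in features.items()}
        rows.append((x, recession[t]))
    return rows


# Toy monthly series, not real data.
features = {
    "inflation": [10, 11, 12, 13, 14, 15, 16, 17],
    "savings": [0, 1, 2, 3, 4, 5, 6, 7],
}
leads = {"inflation": 2, "savings": 6}  # months ahead each variable predicts
recession = [0, 0, 0, 0, 0, 0, 1, 1]

rows = build_rows(features, leads, recession)
for x, y in rows:
    print(x, y)
```

The first usable target month is determined by the longest lead, so longer leads cost a few observations at the start of the sample.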


a. Different models

Our dependent variable is binary (1 for a recession and 0 for no recession), and we used different classification models: logistic regression, random forest and XGBoost.

We then plotted a ROC curve for each model and compared the areas under the curves (AUC). The Receiver Operating Characteristic (ROC) curve is an evaluation tool for binary classification problems that visualizes how well a machine learning classifier separates the two classes across decision thresholds. In the model benchmarking analysis, XGBoost was found to be the most predictive model, with an AUC score of 0.97.


Fig. 4 – ROC curve for different predictive models
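For intuition about the AUC scores used in this comparison, the area under the ROC curve equals the probability that a randomly chosen recession month receives a higher predicted score than a randomly chosen non-recession month. The sketch below computes it directly from that pairwise definition. The labels and scores are made up for illustration and are not the article's results; `roc_auc`, `model_a` and `model_b` are hypothetical names.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy test set: 1 = recession month, 0 = no recession.
labels = [0, 0, 0, 0, 1, 1]
model_a = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]  # ranks every recession month on top
model_b = [0.1, 0.8, 0.3, 0.2, 0.7, 0.4]  # one false alarm outranks both

print(roc_auc(labels, model_a))
print(roc_auc(labels, model_b))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the score is convenient for comparing classifiers without fixing a probability threshold.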

For better understanding of performance of each model we have also calculated recall, precision and F1 score for the 6-month shift.


Recall, Precision and F1 Score are calculated based on the test set.

Fig. 5 – Evaluation metrics for different models
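The three metrics reported here follow from counts of true positives, false positives and false negatives. A minimal sketch with made-up predictions (not the article's test set) shows the arithmetic; the function name is hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (1 = recession)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: 3 recession months, 2 caught, 1 missed, 1 false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Because recession months are rare, these metrics are more informative than plain accuracy, which a model could score highly on by never predicting a recession.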

From the area under the curve and the F1 scores above, we can observe that XGBoost performs best among these models. The F1 score on the test set is also within range, indicating that our model is not overfitted.


Fig. 6 – Feature importance based on XGboost

The XGBoost feature importance shows the best variables for our research. Hence, we further analyze the impact of the top 3 variables (industrial production % change, S&P drawdown, and the fed funds effective rate) on predicting recession.

b. Feature direction

Fig. 7 – Direction and magnitude of impact of top variables on recession probabilities


Fig. 8 – Fed funds rate across recessionary periods

We can clearly see that the Fed rate increases before a recession and mostly exceeds the high reached since the previous recession. After that, the Fed rate falls rapidly. The model shows the same relation: when the Fed rate increases, the probability of recession also increases.


Fig. 9 – S&P drawdown across recessionary periods

We can see in this graph that the S&P drawdown is low before a recessionary period but increases rapidly during it, given the greater stock market volatility in a recession. The table above also indicates that an increase in the S&P drawdown decreases the predicted probability of recession, because we are predicting recession 6 months ahead.

Observation and Results:

A. Model Result


Fig. 10 – Predicting recession probabilities using XGBoost

Thus, our model proposes that a better method of predicting and understanding recession is to follow our top three variables (the fed funds effective rate, S&P drawdown and the percentage change in industrial production) and observe their trends over a period of 6 months to predict the possibility of recession. The model’s use of data available on a real-time basis is a considerable factor in its favor, especially since this data can be used as an immediate predictor of recession. Using our model, we also try to predict the probability of recession for different months in 2022.

B. Future Recession Prediction


Fig. 11 – Recession prediction for 2022 using XGBoost

As seen in the table above, the probability of recession per our model was very high in February and March. This is consistent with reports that the U.S. economy contracted in the first quarter as the trade deficit widened to a record high and a resurgence in COVID-19 infections curbed spending on services like recreation. The Commerce Department’s third estimate of gross domestic product showed some underlying softness in the economy, with consumer spending revised lower and inventories higher than reported the previous month, and with GDP falling at a 1.6% annualized rate in the quarter. That was the first drop in GDP since the short and sharp pandemic recession nearly two years earlier. Similarly, our model predicts high recession probabilities for July and August. While economic output contracted for two consecutive quarters in the first half of 2022, a strong labor market, with the US adding 528,000 jobs in July, hinted that the economy was likely not in recession.


The model can thus help different macroeconomic and microeconomic stakeholders adjust their policy measures well ahead of time based on the recession probabilities.




  8. FED, World Bank, Yahoo finance