Table of Contents
- Data & Methods
4.1 Exploratory Data Analysis (EDA)
4.2.1 Input data & decomposition
4.2.2 Forecasting with HW Exponential Smoothing
4.2.3 Forecasting with ETS
4.2.4 Forecasting with ARIMA
- Discussion & Conclusion
5.1 Model evaluation
5.2 General conclusions
New house construction and sales play a significant role in the housing economy. Besides generating employment, they also affect the timber, furniture and home appliance markets. New house sales are also an important indicator of a country’s overall economic health and direction. Over the last 50 years, as we will see below, there have been a few significant bumps and turning points in the housing market, which shaped the trajectory of the US economy (and, to a great extent, the global economy).
This notebook is aimed at anyone interested in the historical housing market and its future outlook, and specifically at data scientists interested in time series analysis and forecasting. First I state the objectives, along with the data and methods. This is followed by some exploratory data analysis. In the second part I forecast with 3 different methods, then discuss the results.
The overall objectives are twofold: (1) review and discuss historical patterns in new home sales; and (2) test different forecasting models and run a short-term forecast of new home sales.
3. Data & methods
- The time series data I am working with comes from census.gov. This is a great source for time series data sets on a large number of social, economic and business indicators.
- I downloaded the “New Single-Family House Sold” series. This is not seasonally adjusted; there is also a seasonally adjusted series called “Annual Rate of Single-Family House Sold”. Since the latter has already been treated for seasonal adjustment, I preferred the “raw” data, which is reported monthly and not adjusted.
- Key terms in this dataset are “new”, “single-family house” and “sold”. Check out the definition of these and other related terms.
- I am doing the analysis in R. Python has great ts resources, but for forecasting it is nowhere near R, thanks to the forecast package developed by Rob J Hyndman.
# Required packages
library(fpp2)
library(forecast)
library(readxl)
library(ggplot2)
library(seasonal)
library(dplyr)
# data import
df = read.csv("usnewhousesold.csv", skip=6)
head(df)[1:3,]
# keep only `Value` column
df = df[, c(2)]
# convert the values into a time series object
series = ts(df, start = 1963, frequency = 12)
options(repr.plot.width = 6, repr.plot.height = 3)
# plot the series
autoplot(series) +
  xlab("Time") + ylab("New home sales '000") +
  ggtitle(" Figure 1: New home sales series") +
  theme(plot.title = element_text(size=8))
# Seasonal sub-series plot (the horizontal bar indicates monthly mean values)
options(repr.plot.width = 10, repr.plot.height = 3)
series_season = window(series, start=c(1963,1), end=c(2017,12))
ggsubseriesplot(series_season) +
  ylab(" ") +
  ggtitle("Figure 2: Seasonal subseries plot") +
  theme(plot.title = element_text(size=10))
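The horizontal bars in a sub-series plot are the mean of each calendar month across all years. A minimal base-R sketch of that calculation follows; `toy` is a small synthetic series made up for illustration, not the census data.

```r
# Synthetic stand-in for the sales series (NOT the census data):
# 3 years of monthly values with a late-spring peak and a small yearly rise.
toy <- ts(rep(c(40, 45, 55, 60, 62, 61, 58, 57, 50, 48, 42, 38), times = 3) +
            rep(c(0, 5, 10), each = 12),
          start = c(2015, 1), frequency = 12)

# cycle() gives the month index (1-12) of every observation;
# tapply() then averages the values month by month.
monthly_means <- tapply(toy, cycle(toy), mean)
names(monthly_means) <- month.abb

print(round(monthly_means, 1))
```

Applied to the real series, the same two lines (`cycle()` plus `tapply()`) should give the values that the bars in Figure 2 represent.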
options(repr.plot.width = 6, repr.plot.height = 3)
# remove seasonality (monthly variation) to see yearly changes
series_ma = ma(series, 12)
autoplot(series_ma) +
  xlab("Time") + ylab("New home sales '000") +
  ggtitle("Figure 3: The series after removing seasonality") +
  theme(plot.title = element_text(size=8))
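Why does a 12-month moving average remove seasonality? Each window covers exactly one full year, so the monthly ups and downs cancel out. The sketch below reproduces the idea in base R with a centered 2x12 moving average, which is my understanding of what `ma()` does for an even order with its default centering. The series `toy` and its components are synthetic assumptions, not the census data.

```r
# Synthetic monthly series: linear trend plus a fixed seasonal pattern
# that sums to zero over a year (NOT the census data).
seasonal_pattern <- c(-8, -5, 3, 6, 8, 7, 4, 3, -2, -4, -6, -6)
trend <- seq(50, by = 0.5, length.out = 48)
toy <- ts(trend + rep(seasonal_pattern, 4), start = c(2014, 1), frequency = 12)

# Centered 2x12 moving average: half weight on the two end months,
# full weight on the 11 months in between, so every calendar month
# contributes equally to each average.
w <- c(0.5, rep(1, 11), 0.5) / 12
toy_ma <- stats::filter(toy, filter = w, sides = 2)

# The first and last 6 values are NA because the window is incomplete there.
print(sum(is.na(toy_ma)))

# Away from the edges the seasonal swings cancel and the trend is recovered.
print(max(abs(toy_ma - trend), na.rm = TRUE))
```

On this noise-free toy series the smoothed values match the underlying trend up to floating-point error; on real data the result is a smoothed trend-cycle rather than an exact one.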
options(repr.plot.width = 6, repr.plot.height = 3)
# zooming in to the down time, which is clearly between 2005 and 2012
series_downtime = window(series, start=c(2005,3), end=c(2012,2))
autoplot(series_downtime) +
  xlab("Time") + ylab("New home sales '000") +
  ggtitle(" Figure 4: New home sales down time") +
  theme(plot.title = element_text(size=8))
- Figure 1 shows clear seasonality all the way through, and maybe a bit of cyclic behavior.
- In terms of seasonality, not surprisingly, home sales start to go up in the spring, peak during summer, then go down in the fall. This is predictable in most years, except between 2005 and 2012 (Figures 1 & 3).
- After removing seasonality a few things stand out (Figure 3): there wasn’t much movement (i.e. trend) in new home sales up until 1990, other than the seasonality and a vague 4-5 year cycle. Post-1990 the market saw a boom, a steady growth that continued until 2005, which is when the market started to crash. The downward spiral continued for 8 years until 2012 (Figure 4). There has been a recovery since then, with another period of steady growth and predictable seasonality (also see Figure 5), but with no cycles (mimicking 1990-2005).
- New house sales went down by more than 80%, from an average of 127k/month to 23k/month, during the “crash” years. Currently at 52k/month, they are still 60% down from pre-2005.
- In time series forecasting, historical data are used as predictors of future values. But from Figure 3 it is clear that the distant past isn’t much good for predicting the next 5-10 years. The segment of the series best suited for prediction is the data from 2012 onwards, which is clearly a better predictor than the earlier data (but see the Appendix for a prediction with the whole series).
- The decomposed data show predictable seasonality and trend components (Figure 6), so it does not look like very complex modeling is required. Holt-Winters exponential smoothing or ARIMA should work just fine for forecasting. Nevertheless, I’m using 3 different methods to compare: ETS, HW Exponential Smoothing and ARIMA. There is plenty of literature on the internet on these forecasting methods, so I’m not going to discuss them (and there is another good reason for not discussing theories!).
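The percentage declines quoted in the list above for the crash years can be checked with a few lines of arithmetic. This is only a sketch: the three monthly figures are the ones stated in the text, and `pct_change` is a helper name introduced here for illustration.

```r
# Percentage change from one level to another
pct_change <- function(from, to) 100 * (to - from) / from

pre_crash <- 127  # average monthly sales before 2005 ('000), from the text
trough    <- 23   # monthly sales at the bottom of the crash ('000), from the text
current   <- 52   # monthly sales at the time of writing ('000), from the text

cat("Pre-crash to trough: ", round(pct_change(pre_crash, trough)), "%\n")
cat("Pre-crash to current:", round(pct_change(pre_crash, current)), "%\n")
```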
options(repr.plot.width = 6, repr.plot.height = 3)
# slicing 2012-2018 data as predictor series
onwards2012 = window(series, start=c(2012,1), end=c(2018,9))
autoplot(onwards2012) +
  labs(caption="Figure 5") +
  xlab("Time") + ylab("New home sales '000") +
  ggtitle(" Figure 5: Predictor series 2012-2018") +
  theme(plot.title = element_text(size=8))
# decomposition
options(repr.plot.width = 6, repr.plot.height = 3)
autoplot(decompose(onwards2012)) +
  ggtitle("Figure 6: Decomposition of the series") +
  theme(plot.title = element_text(size=8))
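To make the decomposition less of a black box, here is a small base-R sketch of what `decompose()` returns with its default additive setting: a trend, a repeating seasonal component, and a remainder that add back up to the original series. The series `toy` is synthetic (trend, seasonal pattern and noise are all made up), not the 2012-2018 sales data.

```r
set.seed(1)

# Synthetic monthly series: upward trend + seasonal pattern + noise
# (NOT the census data).
seasonal_pattern <- c(-8, -5, 3, 6, 8, 7, 4, 3, -2, -4, -6, -6)
toy <- ts(seq(30, by = 0.4, length.out = 72) +
            rep(seasonal_pattern, 6) +
            rnorm(72, sd = 1),
          start = c(2012, 1), frequency = 12)

parts <- decompose(toy)  # additive decomposition by default

# Trend and remainder are NA for the first and last 6 months,
# so compare only the interior of the series.
rebuilt <- parts$trend + parts$seasonal + parts$random
print(all.equal(as.numeric(rebuilt[7:66]), as.numeric(toy[7:66])))

# The seasonal component repeats every 12 months and is centered on zero.
print(round(parts$seasonal[1:12], 1))
```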
4.2.2 Forecasting with HW Exponential Smoothing
# model
forecast_hw = hw(onwards2012, seasonal="multiplicative", h=63)
options(repr.plot.width = 10, repr.plot.height = 3)
# plot
autoplot(series, series = " 1963-2011 series") +
  autolayer(onwards2012, series = "Predictor series") +
  autolayer(forecast_hw, series="Holt-Winter forecast") +
  xlab("Time") + ylab("New home sales '000") +
  ggtitle("Figure 7: HW Exponential Smoothing") +
  theme(plot.title = element_text(size=8))
# point forecast for 2023 annual sales of new homes
forecast2023hw = tail(forecast_hw$mean, n=12)
forecast2023hw = sum(forecast2023hw)
round(forecast2023hw)
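The horizon `h=63` and the `tail(..., n=12)` call fit together as follows: the predictor series ends in September 2018, so 63 monthly steps run through December 2023, and the last 12 of them are exactly calendar year 2023. A quick sketch of that date arithmetic, assuming the September 2018 end date from the `window()` call (the variable names are introduced here for illustration):

```r
last_obs <- c(2018, 9)  # year and month of the last observation (assumed from window())
h <- 63                 # forecast horizon in months

# Count months on a single integer scale, then convert back to year and month.
month_index <- last_obs[1] * 12 + (last_obs[2] - 1) + seq_len(h)
fc_year  <- month_index %/% 12
fc_month <- month_index %% 12 + 1

cat("First forecast month:", fc_year[1], month.abb[fc_month[1]], "\n")
cat("Last forecast month: ", fc_year[h], month.abb[fc_month[h]], "\n")

# Number of forecast months falling in each calendar year
print(table(fc_year))
```

Summing the last 12 point forecasts therefore gives an annual total for 2023, which is comparable to the sum of the last 12 observed months.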
# Diagnostics/accuracy test
accuracy(forecast_hw)
# model description
forecast_hw['model']
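When called on a forecast object without a test set, `accuracy()` reports error measures computed on the training data. As a rough guide to what some of the common measures mean, here is a hand-rolled sketch. The `actual` and `fitted` values are hypothetical numbers for illustration, and the lag-1 naive benchmark used for MASE is a simplifying assumption; for monthly data a seasonal (lag-12) naive benchmark is the more usual choice.

```r
# Hypothetical observed values and one-step fitted values ('000)
actual <- c(30, 34, 40, 43, 41, 39, 35, 33)
fitted <- c(31, 33, 38, 44, 42, 37, 36, 32)
err <- actual - fitted

rmse <- sqrt(mean(err^2))             # root mean squared error
mae  <- mean(abs(err))                # mean absolute error
mape <- mean(abs(err / actual)) * 100 # mean absolute percentage error

# MASE scales the MAE by the in-sample MAE of a naive benchmark
# (lag-1 naive here, an assumption for this toy example).
naive_mae <- mean(abs(diff(actual)))
mase <- mae / naive_mae

print(round(c(RMSE = rmse, MAE = mae, MAPE = mape, MASE = mase), 3))
```

A MASE below 1 means the model's errors are smaller, on average, than those of the naive benchmark on the same data.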
4.2.3 Forecasting with ETS method
# model
forecast_ets = forecast(onwards2012, h=63)
options(repr.plot.width = 10, repr.plot.height = 3)
# plot
autoplot(series, series=" 1963-2011 series") +
  autolayer(forecast_ets, series=" ETS forecast") +
  autolayer(onwards2012, series=" Predictor series") +
  ggtitle(" Figure 8: ETS forecasting") +
  theme(plot.title = element_text(size=8))
# point forecast
forecast2023ets = tail(forecast_ets$mean, n=12)
forecast2023ets = sum(forecast2023ets)
round(forecast2023ets)
# model diagnostics
accuracy(forecast_ets)
# model description
forecast_ets['model']
4.2.4 Forecasting with ARIMA
# model
fit.arima = auto.arima(onwards2012, seasonal=TRUE, stepwise = FALSE, approximation = FALSE)
forecast_arima = forecast(fit.arima, h=63)
options(repr.plot.width = 10, repr.plot.height = 3)
# plot
autoplot(series, series=" 1963-2011 series") +
  autolayer(onwards2012, series=" Input series") +
  autolayer(forecast_arima, series=" ARIMA Forecast") +
  ggtitle(" Figure 9: ARIMA forecasting") +
  theme(plot.title = element_text(size=8))
# point forecast
forecast2023arima = tail(forecast_arima$mean, n=12)
forecast2023arima = sum(forecast2023arima)
cat("New house sold in 2023 ('000): ", round(forecast2023arima))
# line break between the two printed values
cat("\n")
# current: total of the last 12 observed months
cat(" Current value ('000): ", sum(tail(onwards2012, n=12)))
5. Model evaluation and Conclusion
- HW: 539.6391
- ETS: 534.3067
- ARIMA: 364.97
- HW: 1.989132
- ETS: 1.880336
- ARIMA: 1.896309
- New home sales saw no growth for about 30 years, from the 1960s until 1990. They then climbed until 2005, when the market started to crash.
- New home sales declined by 75% in the 5 years between 2005 and 2010.
- Sales have been recovering and rising since 2012, but are still far from catching up with pre-crash sales.
- Current sales are about 630k new homes per year.
- The 5-year forecast until 2023 shows total home sales at 870k, a total growth of about 40% (7% per year). This is a business-as-usual scenario, i.e., IF everything continues as is.
- The projected growth is still not even close to the pre-2005 level (>1200k/year). With the current trend, it can take up to 2035 to catch up to the 2005 level.
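The growth figures in the list above follow from simple arithmetic on the two annual totals. A short sketch, using only the 630k and 870k values stated in the text (the variable names are introduced here for illustration):

```r
current_sales  <- 630  # annual sales at the time of writing ('000), from the text
forecast_sales <- 870  # forecast annual sales for 2023 ('000), from the text
years <- 5

total_growth  <- forecast_sales / current_sales - 1
simple_rate   <- total_growth / years                           # growth spread evenly
compound_rate <- (forecast_sales / current_sales)^(1 / years) - 1  # annualized growth

cat("Total growth:      ", round(100 * total_growth, 1), "%\n")
cat("Simple per year:   ", round(100 * simple_rate, 1), "%\n")
cat("Compound per year: ", round(100 * compound_rate, 1), "%\n")
```

Both per-year figures round to roughly 7%, consistent with the summary above; the compound rate is the one to use if the growth is assumed to build on itself year after year.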
As revealed in the report that came out this week, and as also reported in the Wall Street Journal, the seasonally adjusted rate of new home sales declined by 8.9% in October, signaling a market slowdown. Some market analysts expect this to continue, predicting that the post-2012 boom may be over. It will take a couple of years to understand the trend before we can say with higher certainty what the future holds for this important market segment.
The forecast package made life so much easier. Thanks to Rob J Hyndman and collaborators for the great work and for graciously making the book “Forecasting: Principles and Practice” open access.
options(repr.plot.width = 10, repr.plot.height = 3)
# HW forecast using the whole 1963-2018 series as predictor
hw_series = hw(series, seasonal="multiplicative", h=63)
autoplot(series, series=" Predictor series 1963-2018") +
  autolayer(hw_series, series=" HW Forecast ") +
  ylab("New home sales thousands")