Time Series Analysis & Forecasting of New Home Sales

Table of Contents

  1. Introduction
  2. Objectives
  3. Data & Methods
  4. Results
    4.1 Exploratory Data Analysis (EDA)
    4.2 Forecasting
      4.2.1 Input data & decomposition
      4.2.2 Forecasting with HW Exponential Smoothing
      4.2.3 Forecasting with ETS
      4.2.4 Forecasting with ARIMA
  5. Discussion & Conclusion
    5.1 Model evaluation
    5.2 General conclusions
    5.3 Discussion

1. Introduction

New house construction and sales play a significant role in the housing economy. Besides generating employment, they also affect the timber, furniture and home appliance markets. New house sales are also an important indicator of a country’s overall economic health and direction. Over the last 50 years, as we will see below, there have been a few significant bumps and turning points in the housing market, which shaped the trajectory of the US economy (and, to a great extent, the global economy).
This notebook is aimed at anyone interested in the historical housing market and its future outlook, and specifically at data scientists interested in time series analysis and forecasting. I first state the objectives, along with the data and methods. This is followed by some exploratory data analysis. In the second part I forecast with 3 different methods, then discuss the results.

2. Objectives

Overall objectives are two-fold: (1) review and discuss historical patterns in new home sales; and (2) test different forecasting models and run short-term forecasting of new home sales.

3. Data & methods

  • The time series data I am working with comes from census.gov. This is a great source for time series data sets on a large number of social, economic and business indicators.
  • I downloaded the “New Single-Family House Sold” series. This series is not seasonally adjusted. There is also a seasonally adjusted series called “Annual Rate of Single-Family House Sold”, but since it has already been treated for seasonal adjustment, I preferred the “raw” data, which is reported monthly and not adjusted.
  • Key terms in this dataset are “new”, “single-family house” and “sold”. Check out the definition of these and other related terms.
  • I am doing the analysis in R. Python has great time series resources, but for forecasting it is nowhere near R, thanks to the forecast package developed by Rob J Hyndman.
# Required packages
library(fpp2)
library(forecast)
library(readxl)
library(ggplot2)
library(seasonal)
library(dplyr)
# data import (skip = 6 skips the first 6 lines of the file)
df = read.csv("usnewhousesold.csv", skip=6)
head(df, 3)
# keep only the `Value` column (column 2)
df = df[, 2]
# convert the values into a monthly time series starting January 1963
series = ts(df, start = c(1963, 1), frequency = 12)
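To make the `ts()` step concrete, here is a minimal sketch in base R of how `start` and `frequency` map a plain vector onto a monthly calendar. The numbers in `values` are made up for illustration (they are not the census data), and `demo_series` and `spring_1964` are hypothetical names.

```r
# Synthetic stand-in for the `Value` column (hypothetical numbers, not census data)
values <- c(42, 35, 44, 52, 58, 48, 62, 56, 49, 44, 39, 31,   # 1963
            39, 46, 53, 49, 54, 53, 54, 56, 48, 45, 37, 33)   # 1964

# Monthly series starting January 1963: frequency = 12 observations per year
demo_series <- ts(values, start = c(1963, 1), frequency = 12)

print(start(demo_series))      # first period as c(year, month)
print(end(demo_series))        # last period as c(year, month)
print(frequency(demo_series))  # observations per year

# window() slices by (year, month) pairs, the same way the post slices its series
spring_1964 <- window(demo_series, start = c(1964, 3), end = c(1964, 5))
print(spring_1964)
```

Writing the start as `c(1963, 1)` rather than `1963` gives the same result, but makes the starting month explicit.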

4. Results

4.1 EDA

options(repr.plot.width = 6, repr.plot.height = 3)

# plot the series
autoplot(series)+ 
xlab("Time") + ylab("New home sales '000") + ggtitle(" Figure 1: New home sales series")+
theme(plot.title = element_text(size=8))
# Seasonal sub-series plot (the horizontal bar indicates monthly mean values)
options(repr.plot.width = 10, repr.plot.height = 3)

series_season = window(series, start=c(1963,1), end=c(2017,12))
ggsubseriesplot(series_season) + ylab(" ") +
ggtitle("Figure 2: Seasonal subseries plot")+
theme(plot.title = element_text(size=10))
options(repr.plot.width = 6, repr.plot.height = 3)

# remove seasonality (monthly variation) to see yearly changes
series_ma = ma(series, 12)
autoplot(series_ma) + 
xlab("Time") + ylab("New home sales '000")+
ggtitle("Figure 3: The series after removing seasonality" )+
theme(plot.title = element_text(size=8))
options(repr.plot.width = 6, repr.plot.height = 3)

# zoom in on the downturn, which clearly runs from 2005 to 2012
series_downtime = window(series, start=c(2005,3), end=c(2012,2))
autoplot(series_downtime) + 
xlab("Time") + ylab("New home sales '000")+
ggtitle(" Figure 4: New home sales down time")+
theme(plot.title = element_text(size=8))
  • Figure 1 shows clear seasonality all the way through, and maybe a bit of cyclic behavior.
  • In terms of seasonality, not surprisingly, home sales start to go up in the spring, peak during summer, then go down in the fall. This is predictable in most years, except between 2005 and 2012 (Figures 1 & 3).
  • After removing seasonality a few things stand out (Figure 3): there wasn’t much movement (i.e. trend) in new home sales up until 1990, other than the seasonality and a vague 4-5 year cycle. Post-1990 the market saw a boom, a steady growth that continued until 2005, which is when the market started to crash. The downward spiral continued for 8 years until 2012 (Figure 4). There has been a recovery since then, with another period of steady growth and predictable seasonality (also see Figure 5), but with no cycles (mimicking 1990-2005).
  • New house sales went down more than 80%, from an average of 127k/month to 23k/month, during the “crash” years. Currently at 52k/month, they are still 60% down from pre-2005 levels.
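The "removing seasonality" step behind Figure 3 can be sketched in base R. As far as I understand it, `ma(series, 12)` from the forecast package computes a centred 2x12 moving average by default; the sketch below reproduces that idea with `stats::filter` on a synthetic series (a straight-line trend plus a made-up seasonal pattern that sums to zero), so the names `demo`, `pattern` and `smoothed` are all assumptions for illustration.

```r
# Synthetic monthly series: linear trend plus a seasonal pattern that sums to zero
n <- 60
trend <- 50 + 0.5 * seq_len(n)
pattern <- c(-8, -5, 2, 6, 9, 7, 5, 3, -1, -4, -6, -8)
demo <- ts(trend + rep(pattern, length.out = n), start = c(2012, 1), frequency = 12)

# Centred 2x12 moving average: half weight on the two end months,
# full weight on the 11 months in between (13 weights summing to 1)
w <- c(0.5, rep(1, 11), 0.5) / 12

# stats:: is spelled out because dplyr (loaded in the post) masks filter()
smoothed <- stats::filter(demo, filter = w, sides = 2)

# The first and last 6 months are NA because the 13-month window does not fit there
inner <- 7:(n - 6)

# Largest gap between the smoothed values and the true trend (near zero here)
print(max(abs(smoothed[inner] - trend[inner])))
```

Because every 13-month window covers each calendar month with equal total weight, the seasonal pattern averages out and only the trend is left. On real data some noise and cycle would remain as well.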

4.2 Forecasting

  • In time series forecasting, historical data are used as predictors of future values. But Figure 3 makes it clear that the distant past is not much good for predicting the next 5-10 years. The segment of the series most useful for prediction is the data from 2012 onwards, which is clearly a better predictor than the earlier data (but see the Appendix for a forecast using the whole series).
  • The decomposed data show predictable seasonality and trend components (Figure 6), so it does not look like very complex modeling is required. Holt-Winters exponential smoothing or ARIMA should work just fine for forecasting. Nevertheless, I’m using 3 different methods to compare: ETS, HW Exponential Smoothing and ARIMA. There is plenty of literature on the internet on these forecasting methods, so I’m not going to discuss them (and there is another good reason for not discussing theories!).
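As a rough illustration of what "predictable seasonality and trend components" means, here is a minimal sketch of classical decomposition with base R's `decompose()`. The series is synthetic (trend, a made-up monthly pattern and random noise), chosen only to have the same length and dates as the 2012-2018 predictor series; `demo`, `pattern` and `parts` are hypothetical names.

```r
set.seed(1)
n <- 81  # Jan 2012 to Sep 2018, same span as the predictor series
pattern <- c(-8, -5, 2, 6, 9, 7, 5, 3, -1, -4, -6, -8)  # made-up monthly effects
y <- 30 + 0.3 * seq_len(n) + rep(pattern, length.out = n) + rnorm(n, sd = 1)
demo <- ts(y, start = c(2012, 1), frequency = 12)

# Additive decomposition (the default): series = trend + seasonal + random
parts <- decompose(demo)

# The 12 estimated monthly effects, centred so that they sum to zero
print(round(parts$figure, 1))

# Adding the three components back together recovers the series
# (except at the ends, where the moving-average trend is NA)
recomposed <- parts$trend + parts$seasonal + parts$random
ok <- !is.na(recomposed)
print(all.equal(as.numeric(recomposed[ok]), as.numeric(demo[ok])))
```

When the estimated monthly effects are large relative to the random component, as they are by construction here, a simple seasonal model is usually a reasonable first choice.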

4.2.1 Input data & decomposition

options(repr.plot.width = 6, repr.plot.height = 3)

# slicing 2012-2018 data as predictor series
onwards2012 = window(series, start=c(2012,1), end=c(2018,9))
autoplot(onwards2012) + labs(caption="Figure 5")+ xlab("Time") + ylab("New home sales '000")+
ggtitle(" Figure 5: Predictor series 2012-2018")+
theme(plot.title = element_text(size=8))
# decomposition
options(repr.plot.width = 6, repr.plot.height = 3)
autoplot(decompose(onwards2012)) + ggtitle("Figure 6: Decomposition of the series")+
theme(plot.title = element_text(size=8))

4.2.2 Forecasting with HW Exponential Smoothing

# model
forecast_hw=hw(onwards2012, seasonal="multiplicative", h=63)
options(repr.plot.width = 10, repr.plot.height = 3)

# plot
autoplot(series, series = " 1963-2011 series")+ 
autolayer(onwards2012, series = "Predictor series")+
autolayer(forecast_hw, series="Holt-Winter forecast")+
 xlab("Time") + ylab("New home sales '000")+
ggtitle("Figure 7: HW Exponential Smoothing")+
theme(plot.title = element_text(size=8))
# point forecast for 2023 annual sales of new homes
forecast2023hw=tail(forecast_hw$mean, n=12)
forecast2023hw = sum(forecast2023hw)
round(forecast2023hw)
866
# Diagnostics/accuracy test
accuracy(forecast_hw)
# model description
forecast_hw['model']
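For readers without the forecast package, the same idea can be sketched with base R's `HoltWinters()`. This is a swapped-in function, not the `hw()` call used above: it estimates the smoothing parameters differently, so it will generally not reproduce the same numbers. The data are synthetic (made-up seasonal factors and trend), and `demo`, `season` and `annual_2023` are hypothetical names.

```r
set.seed(42)
n <- 81  # Jan 2012 to Sep 2018
season <- c(0.80, 0.88, 1.08, 1.12, 1.15, 1.10, 1.06, 1.04, 0.96, 0.95, 0.88, 0.82)
level <- 30 + 0.3 * seq_len(n)

# Multiplicative structure: level times seasonal factor times small noise
y <- level * rep(season, length.out = n) * exp(rnorm(n, sd = 0.03))
demo <- ts(y, start = c(2012, 1), frequency = 12)

# Holt-Winters with multiplicative seasonality; alpha, beta, gamma are estimated
fit <- HoltWinters(demo, seasonal = "multiplicative")
print(c(alpha = unname(fit$alpha), beta = unname(fit$beta), gamma = unname(fit$gamma)))

# 63 months ahead: October 2018 through December 2023
fc <- predict(fit, n.ahead = 63)

# Annual total for 2023 = sum of the last 12 monthly point forecasts
annual_2023 <- sum(tail(as.numeric(fc), 12))
print(round(annual_2023))
```

Multiplicative seasonality lets the seasonal swings grow with the level of the series, which is why it suits a series whose peaks get taller as sales recover.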

4.2.3 Forecasting with ETS method

# model: forecast() on a monthly ts object selects and fits an ETS model automatically
forecast_ets = forecast(onwards2012, h=63)
options(repr.plot.width = 10, repr.plot.height = 3)

# plot
autoplot(series, series=" 1963-2011 series")+
autolayer(forecast_ets, series=" ETS forecast")+
autolayer(onwards2012, series=" Predictor series")+
ggtitle(" Figure 8: ETS forecasting")+
theme(plot.title = element_text(size=8))
# point forecast
forecast2023ets=tail(forecast_ets$mean, n=12)
forecast2023ets = sum(forecast2023ets)
round(forecast2023ets)
628
# model diagnostics
accuracy(forecast_ets)
# model description
forecast_ets['model']
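All three models use `h=63` and then sum the last 12 forecast values to get a 2023 total. A small base R sketch shows why that works: the predictor series ends in September 2018, so 63 monthly steps run from October 2018 to December 2023, and the final 12 steps are exactly calendar year 2023. The object names `horizon` and `years` are hypothetical.

```r
h <- 63

# Empty monthly index for the forecast horizon, starting the month after Sep 2018
horizon <- ts(rep(NA_real_, h), start = c(2018, 10), frequency = 12)
print(start(horizon))
print(end(horizon))

# Calendar year of each forecast step (small offset guards against rounding error)
years <- floor(as.numeric(time(horizon)) + 1e-8)

# Number of forecast months falling in each calendar year
print(table(years))

# The last 12 steps all belong to 2023
print(unique(tail(years, 12)))
```

With a different end date for the predictor series, `h` would need adjusting so that the tail still lines up with a full calendar year.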

4.2.4 Forecasting with ARIMA

# model
fit.arima = auto.arima(onwards2012, seasonal=TRUE, stepwise = FALSE, approximation = FALSE) 
forecast_arima = forecast(fit.arima, h=63)
options(repr.plot.width = 10, repr.plot.height = 3)

# plot
autoplot(series, series=" 1963-2011 series")+
autolayer(onwards2012, series=" Input series")+
autolayer(forecast_arima, series=" ARIMA Forecast")+
ggtitle(" Figure 9: ARIMA forecasting")+
theme(plot.title = element_text(size=8))
# point forecast
forecast2023arima=tail(forecast_arima$mean, n=12)
forecast2023arima = sum(forecast2023arima)
cat("New house sold in 2023 ('000): ", round(forecast2023arima)) 

print('')

# current
cat(" Current value ('000): ", sum(tail(onwards2012, n=12)))
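For comparison, a seasonal ARIMA can also be fitted with base R's `arima()`. The sketch below is not the `auto.arima()` search used above: it fixes the orders by hand to ARIMA(0,1,1)(0,1,1)[12], a common default for monthly data, purely as an assumption for illustration. The data are synthetic and `demo`, `pattern` and `annual_2023` are hypothetical names.

```r
set.seed(7)
n <- 81  # Jan 2012 to Sep 2018
pattern <- c(-8, -5, 2, 6, 9, 7, 5, 3, -1, -4, -6, -8)  # made-up monthly effects

# Slowly drifting level plus seasonal pattern plus noise
level <- 30 + cumsum(rnorm(n, mean = 0.3, sd = 0.8))
y <- level + rep(pattern, length.out = n) + rnorm(n, sd = 1.5)
demo <- ts(y, start = c(2012, 1), frequency = 12)

# Hand-picked orders (an assumption); auto.arima() would search over orders instead
fit <- arima(demo, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# predict() returns point forecasts ($pred) and their standard errors ($se)
fc <- predict(fit, n.ahead = 63)

# Annual total for 2023 from the monthly point forecasts
annual_2023 <- sum(window(fc$pred, start = c(2023, 1), end = c(2023, 12)))
print(round(annual_2023))

# Approximate 95% interval for the final month; it widens with the horizon
print(round(fc$pred[63] + c(-1.96, 1.96) * fc$se[63], 1))
```

The widening interval is a useful reminder that a point forecast five years out carries a lot of uncertainty.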

5. Model evaluation and Conclusion

Model comparison

  • AIC
    • HW: 539.6391
    • ETS: 534.3067
    • ARIMA: 364.97
  • MAE
    • HW: 1.989132
    • ETS: 1.880336
    • ARIMA: 1.896309
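One caveat on the comparison above: `accuracy()` called on a forecast object without test data reports errors on the training data, and my understanding is that AIC values are not directly comparable between exponential smoothing and ARIMA models, because the likelihoods are computed differently. A hold-out comparison is one way around both issues. The sketch below is only an illustration in base R on synthetic data, using `HoltWinters()` and a hand-specified `arima()` as stand-ins for the models in this post; `mae` is a hypothetical helper.

```r
set.seed(3)
n <- 81  # Jan 2012 to Sep 2018
pattern <- c(-8, -5, 2, 6, 9, 7, 5, 3, -1, -4, -6, -8)  # made-up monthly effects
y <- 30 + 0.3 * seq_len(n) + rep(pattern, length.out = n) + rnorm(n, sd = 1.5)
demo <- ts(y, start = c(2012, 1), frequency = 12)

# Hold out the last 12 months as a test set
train <- window(demo, end = c(2017, 9))
test  <- window(demo, start = c(2017, 10))

# Fit both models on the training data only
hw_fit <- HoltWinters(train, seasonal = "additive")
ar_fit <- arima(train, order = c(0, 1, 1),
                seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast over the test period
hw_fc <- as.numeric(predict(hw_fit, n.ahead = length(test)))
ar_fc <- as.numeric(predict(ar_fit, n.ahead = length(test))$pred)

# Mean absolute error on data the models have not seen
mae <- function(actual, predicted) mean(abs(as.numeric(actual) - predicted))
print(round(c(HW = mae(test, hw_fc), ARIMA = mae(test, ar_fc)), 2))
```

With only 81 observations a 12-month hold-out is small, so the ranking it gives should be read as indicative rather than conclusive.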

Main conclusions

  • New home sales saw no growth for 30 years, from the 1960s until 1990. They then climbed until 2005, when the crash began.
  • New home sales declined by 75% in the 5 years between 2005 and 2010.
  • Sales have been recovering and rising since 2012, but are still far from catching up with pre-crash sales.
  • Current sales are about 630k new homes per year.
  • The 5-year Holt-Winters forecast to 2023 shows total home sales at 870k, a total growth of about 40% (7% per year). This is a business-as-usual scenario, i.e., IF everything goes on as is.
  • The projected growth is still not even close to the pre-2005 level (>1200k/year). At the current trend, it could take until 2035 to catch up to the 2005 level.
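The growth figures can be checked with a few lines of arithmetic. This sketch uses the two totals reported in this post (630k current, 866k from the Holt-Winters forecast) and assumes a five-year gap between them; the variable names are hypothetical.

```r
current_annual  <- 630  # '000 homes, most recent 12 months (from the post)
forecast_annual <- 866  # '000 homes, HW point forecast for 2023 (from the post)
years <- 5              # assumption: treat the gap as five years

# Total growth over the period
total_growth <- forecast_annual / current_annual - 1

# Simple average per year versus compound annual growth rate
simple_rate   <- total_growth / years
compound_rate <- (forecast_annual / current_annual)^(1 / years) - 1

print(round(100 * c(total = total_growth,
                    simple = simple_rate,
                    compound = compound_rate), 1))
```

The simple average comes out slightly higher than the compound rate, and both are roughly in line with the "about 40%" and "7% per year" quoted above.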

Discussion

As revealed in the report that came out this week, and also reported in the Wall Street Journal, the seasonally adjusted rate of new home sales declined by 8.9% in October, signaling a market slowdown. Some market analysts expect this to continue, predicting that the post-2012 boom may be over. It will take a couple of years to understand the trend before we can say with higher certainty what the future holds for this important market segment.

Acknowledgement

The forecast package made life so much easier. Thanks to Rob J Hyndman and collaborators for the great work and for graciously making the book “Forecasting: Principles and Practice” open access.

Appendix

options(repr.plot.width = 10, repr.plot.height = 3)

hw_series=hw(series, seasonal="multiplicative", h=63)
autoplot(series, series=" Predictor series 1963-2018")+
autolayer(hw_series, series=" HW Forecast ")+ylab("New home sales thousands")
