Predicting the demise of retail bookstores: a time series forecasting

Results and Discussion

“The internet is killing retail. Bookstores are just the first to go.” — quoted in the NYT article. Retail bookstores are in death row; looks like it’s just a matter of time for those to be history. eBooks are partly to blame, but with eBook sales leveling off recently, the remaining affect seems to be online book sales, dominated by, with no surprise!, Amazon.
So, exactly how long retail bookstores are going to survive? Is it going to be “Gradually and then suddenly”? Or could it be just 5 years from now? One might wonder: “5 years? no way!!”. But imagine the Amazon effect just few years ago; or the Facebook effect in 2008.

To examine this I’ve got a nice time series dataset from the Census Bereau database. This is a monthly retail sales data (in millions $) from bookstores all across the country. Data are collected monthly as part of Monthly Retail Trade Survey and covers 26 years since 1992.
A number of popular forecasting methods are out there. Since the data has seasonality (shown in Figure X), I choose to run the Holt-Winter Exponential Smoothing (HW) — a popular forecasting method in machine learning field ( I also ran ARIMA, but the results are less definitive (see appendix)). I am not going into technical discussion on forecasting methods as there’s a bunch of materials available online.

So what exactly does the analysis tell? It shows that bookstore sales peaked around 2007, but since then the sales are going downlhill. The HW forecasting predicts that the bookstores may at best survive another 15-20 years from now. This roughly puts the lifeline around 2040.

Signs are everywhere. Book World is closing it’s stores. Barnes & Noble closed 10% of it’s stores in just the last 5/6 years and this February it shedded 1800 jobs. This will keep accelerating in the next few years. That said, some bookstores may well survive beyond 2040, but not as traditional stores, rather as antique books stores.

Code book

# data import 
df = read.csv("../BookSales.csv", skip=6)
# Required packages
# keep only the `Value` column  
df = df[, c(2)]
# convert the values into a time series object  
series = ts(df, start = 1992, frequency =12)
options(repr.plot.width = 6, repr.plot.height = 3)   
# plot the series
autoplot(series)+ xlab(" ") + ylab("Retail sales (million US$)") + ggtitle(" Figure 1: Bookstores sales series")+ theme(plot.title = element_text(size=8))
options(repr.plot.width = 6, repr.plot.height = 3)

# Aggregate annual sales
annual_sales=aggregate(series, nf=1, FUN=sum) # nf=1 > annual; nf=4 > quarterly; nf=12 > monthly 
# Seasonal sub-series plot  
options(repr.plot.width = 10, repr.plot.height = 3)

series_season = window(series, start=c(1992,1), end=c(2018,10))

ggsubseriesplot(series_season) + ylab(" ") + ggtitle("Figure 2: Seasonal sub-series plot (horizontal bars represent monthly mean)")+ ylab("Retail sales (million US$)")+ theme(plot.title = element_text(size=10))
options(repr.plot.width = 6, repr.plot.height = 3)
# remove seasonality (monthly variation) to see yearly changes
series_ma = ma(series, 12)
autoplot(series_ma) +
xlab("Time") + ylab("Retail sales (million US$)")+
ggtitle("Figure 3: The series after removing seasonality" )+
theme(plot.title = element_text(size=8))
options(repr.plot.width = 6, repr.plot.height = 3)
# zooming in to the recent trend
series_downtime = window(series_ma, start=c(2007,1), end=c(2018,10))
autoplot(series_downtime) +
xlab("Time") + ylab("Retail sales (million US$)")+
ggtitle(" Figure 4: Bookstores sales recent trend")+
theme(plot.title = element_text(size=8))
# decomposition
options(repr.plot.width = 6, repr.plot.height = 3)
autoplot(decompose(series)) + ggtitle("Figure 6: Decomposition of the series")+
theme(plot.title = element_text(size=8))
# predictor series
predictor_series = window(series, start=c(2007,1), end=c(2018,10))

# model
forecast_hw=hw(predictor_series, seasonal="multiplicative", h=260)

options(repr.plot.width = 10, repr.plot.height = 3)
# ploting the predictions
autoplot(series, series = " 1963-2011 series")+
autolayer(series_ma, series = "Predictor series")+
autolayer(forecast_hw, series="Holt-Winter forecast")+
xlab("Time") + ylab("Retail sales (million US$)")+
ggtitle("Figure 7: HW Exponential Smoothing")+
theme(plot.title = element_text(size=8))


# Fiting & prediction with ARIMA
fit.arima = auto.arima(series, seasonal=TRUE, stepwise = FALSE, approximation = FALSE)
forecast_arima = forecast(fit.arima, h=160)

options(repr.plot.width = 10, repr.plot.height = 3)
# ploting ARIMA predictions
autoplot(series, series=" 1992-2018 series")+
autolayer(series_ma, series=" Input series")+
autolayer(forecast_arima, series=" ARIMA Forecast")+
ggtitle(" Figure 9: ARIMA forecasting")+
theme(plot.title = element_text(size=8))

5 thoughts on “Predicting the demise of retail bookstores: a time series forecasting

  1. Hi, nice post. Thanks.
    Figure 7 needs some cleanup, however. Red series is 1992-2018, not 1963-2011. Blue one is the MA of the above, not the predictor, as the predictor is the 2007-2018 subsections of the raw series, not the averaged one.

    Liked by 1 person

  2. Hi again,

    If you run ARIMA on the same predictor as HW (i.e. 2007-2018) by the end of 2030 both predict value around 500. Looks like ARIMA puts more “consideration” 🙂 to the slower rate of decrease in sales from 2013 onwards, on the above predictor, than HW does. That might be an insight, as well.




    1. You are absolutely right. I was going to stick with HW, then realized ARIMA is a time-tested model so put it in the appendix just to “honor” it! Thanks again for checking out.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s