## Results and Discussion

“The internet is killing retail. Bookstores are just the first to go,” as quoted in the NYT article. Retail bookstores are on death row; it looks like it’s just a matter of time before they are history. eBooks are partly to blame, but with eBook sales leveling off recently, the remaining effect seems to come from online book sales, dominated, to no one’s surprise, by Amazon.

So, exactly how long are retail bookstores going to survive? Will it be “gradually and then suddenly”? Or could it be just five years from now? One might wonder: “Five years? No way!” But consider the Amazon effect of just a few years ago, or the Facebook effect in 2008.

To examine this, I pulled a nice time series dataset from the Census Bureau database: monthly retail sales (in millions of US$) from bookstores all across the country. The data are collected monthly as part of the Monthly Retail Trade Survey and cover 26 years, starting in 1992.

A number of popular forecasting methods are available. Since the data show seasonality (shown in Figure X), I chose to run Holt-Winters Exponential Smoothing (HW), a popular method for seasonal time series (I also ran ARIMA, but the results were less definitive; see the appendix). I won’t go into a technical discussion of forecasting methods, as plenty of material is available online.
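As a quick sanity check on the seasonality claim, a seasonal plot can be drawn before committing to a seasonal model. A minimal sketch, assuming the `series` object built in the code book below:

```r
library(forecast)
library(ggplot2)

# season-on-season view of the monthly series; a strong repeating
# within-year pattern supports a seasonal model such as Holt-Winters
ggseasonplot(series) +
  ylab("Retail sales (million US$)") +
  ggtitle("Seasonal plot of monthly bookstore sales")
```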

So what exactly does the analysis tell us? It shows that bookstore sales peaked around 2007 and have been going downhill since. The HW forecast suggests that bookstores may, at best, survive another 15-20 years from now, which roughly puts the end of the line around 2040.
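One way to turn the forecast into a date is to ask when the point forecast first drops below some viability floor. A sketch, reusing the `series` object from the code book; the 400 million US$ threshold here is purely an illustrative assumption, not a figure from the analysis:

```r
library(forecast)

# rebuild the HW forecast on the post-peak window (as in the code book)
predictor_series <- window(series, start = c(2007, 1), end = c(2018, 10))
forecast_hw <- hw(predictor_series, seasonal = "multiplicative", h = 260)

# first month where the point forecast falls below a hypothetical floor
floor_sales <- 400
crossing <- time(forecast_hw$mean)[forecast_hw$mean < floor_sales]
if (length(crossing) > 0) print(head(crossing, 1))
```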

The signs are everywhere. Book World is closing its stores. Barnes & Noble closed 10% of its stores in just the last five or six years, and this February it shed 1,800 jobs. This will likely keep accelerating over the next few years. That said, some bookstores may well survive beyond 2040, though not as traditional stores but as antiquarian bookstores.

## Code book

```r
# Required packages
library(fpp2)
library(forecast)
library(readxl)
library(ggplot2)
library(seasonal)
library(dplyr)

# data import
df = read.csv("../BookSales.csv", skip = 6)
```

```r
# keep only the `Value` column
df = df[, 2]

# convert the values into a time series object
series = ts(df, start = 1992, frequency = 12)

options(repr.plot.width = 6, repr.plot.height = 3)

# plot the series
autoplot(series) +
  xlab("") + ylab("Retail sales (million US$)") +
  ggtitle("Figure 1: Bookstore sales series") +
  theme(plot.title = element_text(size = 8))
```

```r
options(repr.plot.width = 6, repr.plot.height = 3)

# aggregate to annual sales (nf = 1: annual; nf = 4: quarterly; nf = 12: monthly)
annual_sales = aggregate(series, nf = 1, FUN = sum)
autoplot(annual_sales)

# seasonal sub-series plot
options(repr.plot.width = 10, repr.plot.height = 3)
series_season = window(series, start = c(1992, 1), end = c(2018, 10))
ggsubseriesplot(series_season) +
  ylab("Retail sales (million US$)") +
  ggtitle("Figure 2: Seasonal sub-series plot (horizontal bars represent monthly means)") +
  theme(plot.title = element_text(size = 10))
```

```r
options(repr.plot.width = 6, repr.plot.height = 3)

# remove seasonality (monthly variation) with a 12-month moving average
series_ma = ma(series, 12)
autoplot(series_ma) +
  xlab("Time") + ylab("Retail sales (million US$)") +
  ggtitle("Figure 3: The series after removing seasonality") +
  theme(plot.title = element_text(size = 8))
```

```r
options(repr.plot.width = 6, repr.plot.height = 3)

# zooming in on the recent trend
series_downtime = window(series_ma, start = c(2007, 1), end = c(2018, 10))
autoplot(series_downtime) +
  xlab("Time") + ylab("Retail sales (million US$)") +
  ggtitle("Figure 4: Recent trend in bookstore sales") +
  theme(plot.title = element_text(size = 8))

# decomposition
options(repr.plot.width = 6, repr.plot.height = 3)
autoplot(decompose(series)) +
  ggtitle("Figure 6: Decomposition of the series") +
  theme(plot.title = element_text(size = 8))
```

```r
# predictor series: the post-peak (2007 onwards) window
predictor_series = window(series, start = c(2007, 1), end = c(2018, 10))

# Holt-Winters model with multiplicative seasonality, forecasting 260 months ahead
forecast_hw = hw(predictor_series, seasonal = "multiplicative", h = 260)
```
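Before trusting the forecast, the model fit is worth a quick look. A minimal diagnostic sketch, assuming the `forecast_hw` object above:

```r
library(forecast)

# residuals should resemble white noise; checkresiduals() also prints
# a Ljung-Box test that flags leftover autocorrelation
checkresiduals(forecast_hw)

# in-sample accuracy measures (RMSE, MAE, MAPE, ...)
accuracy(forecast_hw)
```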

```r
options(repr.plot.width = 10, repr.plot.height = 3)

# plotting the predictions
autoplot(series, series = "1992-2018 series") +
  autolayer(series_ma, series = "12-month moving average") +
  autolayer(forecast_hw, series = "Holt-Winters forecast") +
  xlab("Time") + ylab("Retail sales (million US$)") +
  ggtitle("Figure 7: HW Exponential Smoothing") +
  theme(plot.title = element_text(size = 8))
```

## Appendix

```r
# fitting & prediction with ARIMA
fit.arima = auto.arima(series, seasonal = TRUE, stepwise = FALSE, approximation = FALSE)
forecast_arima = forecast(fit.arima, h = 160)

options(repr.plot.width = 10, repr.plot.height = 3)

# plotting ARIMA predictions
autoplot(series, series = "1992-2018 series") +
  autolayer(series_ma, series = "12-month moving average") +
  autolayer(forecast_arima, series = "ARIMA forecast") +
  ggtitle("Figure 9: ARIMA forecasting") +
  theme(plot.title = element_text(size = 8))
```
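For a like-for-like comparison with the HW model, which is fit on the 2007-2018 window, ARIMA can also be fit on the same predictor series. A sketch, reusing the `predictor_series` object from the code book:

```r
library(forecast)

# ARIMA on the same post-peak window used for the HW model
fit.arima.recent <- auto.arima(predictor_series, seasonal = TRUE)
forecast_arima_recent <- forecast(fit.arima.recent, h = 260)

options(repr.plot.width = 10, repr.plot.height = 3)
autoplot(forecast_arima_recent) +
  ylab("Retail sales (million US$)") +
  ggtitle("ARIMA fit on the 2007-2018 window")
```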

## Comments

**Mariusz:** Hi, nice post. Thanks.

Figure 7 needs some cleanup, however. The red series is 1992-2018, not 1963-2011. The blue one is the MA of the above, not the predictor, since the predictor is the 2007-2018 subsection of the raw series, not the averaged one.


**Author:** Thanks, Mariusz, for spotting the issues; just fixed them.


**Mariusz:** Hi again,

If you run ARIMA on the same predictor as HW (i.e. 2007-2018), then by the end of 2030 both models predict a value around 500. It looks like ARIMA gives more “consideration” 🙂 to the slower rate of decrease in sales from 2013 onwards than HW does. That might be an insight as well.

Cheers,

Mariusz


**Author:** You are absolutely right. I was going to stick with HW, then realized ARIMA is a time-tested model, so I put it in the appendix just to “honor” it! Thanks again for checking it out.
