Benchmark forecasting example: Japanese population by 2030

Disclaimer

I’m not going to discuss time series and forecasting theory, here or anywhere else in this series. I am deliberately avoiding theory as much as possible (if you are curious why, I wrote about it here). If you are interested in theory, there is plenty of material out there (see Rob Hyndman’s extensive work, for example). Instead, in this series I plan to provide lots of forecasting examples with many different types and shapes of real-world data. I’ll pick a dataset, do some analysis, and along the way I may explain why I’m doing what I’m doing.

The dataset

In today’s example I picked a dataset on the population of Japan. I knew that the population of Japan is going down, so out of curiosity I wanted to look at the historical trend and see what the future looks like in a business-as-usual situation. We can talk about some alternatives to the business-as-usual scenario at the end.
First we need to get the data. There are several sources, but the World Bank has some of the richest country-level datasets on numerous indicators. We could download the data from the World Bank website as a csv file, then clean it up and import it here. Fortunately, someone has already done all of this in an R package called wbstats, so we don’t have to.
Below I go step by step, from data preparation to forecasting and all the way to interpretation of the forecasts.

Preparing data

# load `wbstats` package
library(wbstats)
# we also need data wrangling package `dplyr`
library(dplyr)
# import data (newer wbstats releases offer wb_data() and deprecate wb())
jppop = wb(indicator = "SP.POP.TOTL", country = "JP", startdate = 1960, enddate = 2017)
# view just the first 2 rows
head(jppop, 2)
iso3c	date	value	indicatorID	indicator	iso2c	country
JPN	2017	126785797	SP.POP.TOTL	Population, total	JP	Japan
JPN	2016	126994511	SP.POP.TOTL	Population, total	JP	Japan
# from the data frame we'll keep only 2 columns: date & value
jppop = jppop[c("date", "value")]
# the date column may come back as text, so make sure the years are numeric
jppop$date = as.numeric(jppop$date)
# change the order of the years from descending to ascending
jppop = jppop[order(jppop$date),]
# plot the data to see what it looks like
options(repr.plot.width = 7, repr.plot.height = 5) # set figure size (used by Jupyter notebooks)
plot(jppop$value~jppop$date)
# we don't need the date column any more, so build the time series (ts) object
# from the value column alone, starting in 1960
data = ts(jppop$value, start = 1960)
# convert population to millions for easy visuals
data = data/1000000

Forecasting

# now we are in the forecasting business.
# first load the `fpp2` and `forecast` packages (loading just `fpp2` should work, but just in case).
library(fpp2)
library(forecast)
library(ggplot2) # you may or may not need it, but just in case
# plot the ts object we created, this time using the autoplot() function that comes with the `forecast` package
options(repr.plot.width = 7, repr.plot.height = 3) # set figure size
autoplot(data) + ggtitle("Population trend in Japan") + xlab("Year") +  ylab("Millions")

From this plot alone we can say a lot about the population of Japan. Some points are obvious from the figure; others need a little research. Here are a few:
– The current population of Japan is around 127 million (126.8 million in 2017)
– Total population grew until around 2010 and has been declining since
– Apart from some Eastern European countries, Japan is one of the few developed nations to experience a population drop
– The main reasons for this drop are an aging population and a younger generation less willing to have kids

# we are doing forecasting for the year 2030 (13 years into the future from 2017, hence h=13) using five simple models
data.mean=meanf(data, h=13) # mean forecast
data.naive = naive(data, h=13) # naive forecast
data.rwf_drift = rwf(data, h=13, drift=TRUE) # random walk forecast with drift
data.spline = splinef(data, h=13) # local linear forecast
data.ets = forecast(data, h=13) # automatic ETS forecast (Exponential Smoothing)
# view all the forecasts we just made together in one figure
options(repr.plot.width = 12, repr.plot.height = 4.5) # set figure size
autoplot(data) + autolayer(data.naive, series = "Naive", PI=FALSE) + 
autolayer(data.rwf_drift, series = "RWF with drift", PI=FALSE) + 
autolayer(data.mean, series = "Mean forecast", PI=FALSE) + 
autolayer(data.spline, series = "Local linear forecast", PI=FALSE) + 
autolayer(data.ets, series = "Automatic ETS forecast", PI=FALSE)
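
To make three of these methods less of a black box, here is a minimal sketch of what the mean, naive and drift point forecasts work out to, using base R only. The series `y` is a made-up toy vector (an assumption for illustration, not the Japanese data), and the variable names are my own.

```r
# Hand-computed versions of three simple point forecasts (base R only).
# `y` is a made-up toy series, NOT the Japanese population data.
y <- c(100, 104, 107, 109, 110)
h <- 3            # forecast horizon
n <- length(y)

# mean forecast: every future value equals the historical average
mean_fc <- rep(mean(y), h)

# naive forecast: every future value equals the last observation
naive_fc <- rep(y[n], h)

# random walk with drift: extend the straight line that joins
# the first and the last observation
drift <- (y[n] - y[1]) / (n - 1)
drift_fc <- y[n] + drift * (1:h)

print(rbind(mean = mean_fc, naive = naive_fc, drift = drift_fc))
```

The drift slope is averaged over the whole history, so for a series that grew for most of its length and only recently turned down, the drift forecast keeps pointing upward.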

Now it’s up to you as a forecaster (it doesn’t matter whether you are a beginner or an expert) to interpret the forecasts and decide which model to keep and which to let go. Neither the random walk with drift nor the mean forecast looks plausible, maybe in the longer term but not over the next 13 years. The other three forecasts, on the other hand, look quite plausible. But let’s look at the actual forecast values:

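# pull the 2030 forecast (the 13th step ahead) from each method, rounded to whole millions
# (the fc_ prefix avoids reusing the name of base R's mean() function)
fc_mean = round(data.mean$mean[13])
fc_naive = round(data.naive$mean[13])
fc_rwf_drift = round(data.rwf_drift$mean[13])
fc_ets = round(data.ets$mean[13])
fc_local_linear = round(data.spline$mean[13])
t(data.frame(mean = fc_mean, naive = fc_naive, rwf_drift = fc_rwf_drift, ets = fc_ets, local_linear = fc_local_linear))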
-------------------------
mean            118
naive           127
rwf_drift       135
ets             125
local_linear    124
-------------------------
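
If eyeballing the figure feels too subjective, one common option is a holdout check: hide the last few observations, forecast them, and measure the error. The sketch below does this in base R on a made-up toy series (an assumption for illustration, not the Japanese data), with hand-computed mean, naive and drift forecasts. With the real `data` object, something similar could probably be done with `window()` and the `accuracy()` function from the `forecast` package.

```r
# Holdout check on a made-up toy series (NOT the Japanese population data).
y <- c(90, 95, 100, 104, 107, 109, 110, 110, 109, 108)
n_test <- 3
train <- head(y, -n_test)   # first 7 values are used to build the forecasts
test  <- tail(y, n_test)    # last 3 values are held back
n <- length(train)
h <- n_test

# point forecasts for the held-back period
forecasts <- list(
  mean  = rep(mean(train), h),
  naive = rep(train[n], h),
  drift = train[n] + (train[n] - train[1]) / (n - 1) * (1:h)
)

# mean absolute error of each method on the held-back values
mae <- sapply(forecasts, function(fc) mean(abs(test - fc)))
print(round(mae, 2))
print(names(which.min(mae)))
```

A low holdout error does not guarantee a good forecast for 2030, but it is a useful sanity check before trusting any single model.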

Are these models any good?

Across a range of scenarios and uncertainties, the median projection of the UN World Population Prospects (www.population.un.org) is that the population of Japan will be 121.5 million in 2030. The World Bank projection is even lower: 120.2 million by 2030. Of the three forecasts that looked plausible above, the closest to the UN and WB projections is the local linear model, at 124 million. Any of these model predictions could turn out right, depending on how Japan responds to the current population decline (although the random walk with drift projection is probably not going to happen).
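
For a quick numerical view of that comparison, here is a small base R sketch. It only uses the rounded 2030 forecasts from the table above and the two projections quoted in this section; the object names are my own.

```r
# 2030 forecasts from the table above, in millions
forecasts <- c(mean = 118, naive = 127, rwf_drift = 135, ets = 125, local_linear = 124)
# external projections quoted in the text, in millions
projections <- c(UN = 121.5, WB = 120.2)

# gap[i, j] = forecast i minus projection j
gap <- outer(forecasts, projections, "-")
print(gap)

# forecast with the smallest absolute gap to each projection
idx <- apply(abs(gap), 2, which.min)
closest <- rownames(gap)[idx]
names(closest) <- colnames(gap)
print(closest)
```

Taken purely as numbers, the mean forecast sits closest to the World Bank figure, even though its flat path was set aside earlier as implausible. Among the remaining forecasts the local linear model is the closest to both projections.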

Exercise:

Go ahead, choose another country and do your own analysis and interpretation. Just by changing the country code in the data import line you can rerun the whole forecast (type “US” instead of “JP”, then run everything and watch!). You can have unlimited fun by changing the indicator in the same line of code.

Endnote:

I will update this blog to expand on some of the narratives later on.
