I’m not going to discuss time series and forecasting theory, not here and not anywhere else in this writing series. I am deliberately avoiding theory as much as possible (if you are curious why, I wrote about it here). If you are interested in theory, there is plenty of material out there (see Rob Hyndman’s extensive work, for example). Instead, in this series I plan to provide lots of forecasting examples with many different types and shapes of real-world data. I’ll pick a dataset and do some analysis, and along the way I may explain why I’m doing what I’m doing.
In today’s example I picked a dataset on the population of Japan. I knew that the population of Japan is going down, so out of curiosity I wanted to look at the historical trend and see what the future looks like in a business-as-usual situation. We can talk about some alternatives to the business-as-usual scenario at the end.
First we need to get the data. There are several sources, but the World Bank has the richest country-scale datasets on numerous indicators. We could download the data from the World Bank website as a CSV file, then clean it up and import it here. Fortunately, someone has already done all of this in an R package called wbstats, so we don’t have to.
Below I’m going step by step, from data preparation to forecasting and all the way to interpretation of the forecasts.
# load `wbstats` package
library(wbstats)
# we also need data wrangling package `dplyr`
library(dplyr)
# import data
jppop = wb(indicator = "SP.POP.TOTL", country = "JP", startdate = 1960, enddate = 2017)
# view just first 2 rows
head(jppop)[1:2,]

iso3c date value     indicatorID indicator         iso2c country
JPN   2017 126785797 SP.POP.TOTL Population, total JP    Japan
JPN   2016 126994511 SP.POP.TOTL Population, total JP    Japan
# from the dataframe we'll keep only 2 columns: date & value
jppop = jppop[c(2,3)]
# make sure the year is numeric (it may be stored as text) so it sorts and plots as a number
jppop$date = as.numeric(jppop$date)
# change the order of the year from descending to ascending
jppop = jppop[order(jppop$date),]
# plot the data to see what it looks like
options(repr.plot.width = 7, repr.plot.height = 5) # set figure size
plot(jppop$value~jppop$date)
# remove the date column, we don't need it any more
jppop = jppop[c(2)]
# convert the dataframe into a time series (ts) object
data = ts(jppop, start = 1960)
# convert population to millions for easy visuals
data = data/1000000
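To sanity-check a conversion like this, it can help to look at what a ts object actually records. The sketch below uses a short made-up vector (not the World Bank data, and `toy_values` / `toy_ts` are names invented for illustration) to show how `start` sets the first year and how `time()` recovers the year attached to each value.

```r
# Hypothetical yearly values, in persons (illustrative only, not real data)
toy_values <- c(93000000, 94000000, 95000000, 96000000)

# Annual series starting in 1960; frequency defaults to 1 (one value per year)
toy_ts <- ts(toy_values, start = 1960)

# Rescale to millions, as done for the Japan series
toy_ts <- toy_ts / 1000000

start(toy_ts)      # first time point of the series
end(toy_ts)        # last time point of the series
frequency(toy_ts)  # observations per year
time(toy_ts)       # the year attached to each observation
```

Because the date column is dropped before calling `ts()`, the `start` value is the only thing tying the numbers to calendar years, so it is worth confirming that the data really are sorted from oldest to newest first.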
# now we are in the forecasting business
# first load the `fpp2` and `forecast` packages (importing just `fpp2` should work, but just in case)
library(fpp2)
library(forecast)
library(ggplot2) # you may or may not need it, but just in case
# plot the ts object we created, this time using the autoplot() function that comes with the `forecast` package
options(repr.plot.width = 7, repr.plot.height = 3) # set figure size
autoplot(data) +
  ggtitle("Population trend in Japan") +
  xlab("Year") +
  ylab("Millions")
From this plot alone we can say a lot about the population of Japan. Some points are obvious from the figure; others need a little research. Here are a few:
– The current population of Japan is around 126 million
– Total population grew until around 2010 and has been declining since
– Besides some eastern European countries, Japan is the only developed nation to experience a population drop
– The main reasons for the drop are an aging population and a young generation unwilling to have kids
# we are forecasting to the year 2030 (13 years into the future from 2017, hence h=13) using five simple models
data.mean = meanf(data, h=13) # mean forecast
data.naive = naive(data, h=13) # naive forecast
data.rwf_drift = rwf(data, h=13, drift=TRUE) # random walk forecast with drift
data.spline = splinef(data, h=13) # local linear forecast
data.ets = forecast(data, h=13) # automatic ETS forecast (Exponential Smoothing)
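To make the three simplest methods less of a black box, here is a minimal by-hand sketch of the point forecasts they produce. It uses a short made-up series rather than the Japan data, and the names `y`, `mean_fc`, `naive_fc` and `drift_fc` are invented for illustration; the prediction intervals that the `forecast` package also returns are not reproduced here.

```r
# Hypothetical short series (illustrative only)
y <- c(100, 102, 105, 107, 110)
n <- length(y)
h <- 3  # forecast horizon

# Mean method: every future value equals the historical average
mean_fc <- rep(mean(y), h)

# Naive method: every future value equals the last observation
naive_fc <- rep(y[n], h)

# Drift method: extend the straight line joining the first and last observations
slope <- (y[n] - y[1]) / (n - 1)
drift_fc <- y[n] + slope * (1:h)

rbind(mean_fc, naive_fc, drift_fc)
```

Seen this way, the mean forecast ignores where the series currently is, the naive forecast ignores any trend, and the drift forecast carries the average historical change forward unchanged.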
# view all the forecasts we just made together in one figure
options(repr.plot.width = 12, repr.plot.height = 4.5) # set figure size
autoplot(data) +
  autolayer(data.naive, series = "Naive", PI=FALSE) +
  autolayer(data.rwf_drift, series = "RWF with drift", PI=FALSE) +
  autolayer(data.mean, series = "Mean forecast", PI=FALSE) +
  autolayer(data.spline, series = "Local linear forecast", PI=FALSE) +
  autolayer(data.ets, series = "Automatic ETS forecast", PI=FALSE)
Now it’s up to you as a forecaster (it doesn’t matter whether you are a beginner or an expert) to interpret the forecasts and decide which model to keep and which to let go. Neither the RWF with drift nor the mean model looks plausible, maybe in the longer term but not over the next 13 years. The other 3 forecasts, on the other hand, look quite plausible. But let’s look at the actual forecast values.
# find out forecast values of each method
mean = round(data.mean$mean)
naive = round(data.naive$mean)
rwf_drift = round(data.rwf_drift$mean)
ets = round(data.ets$mean)
local_linear = round(data.spline$mean)
t(data.frame(mean, naive, rwf_drift, ets, local_linear))
-------------------------
mean          118
naive         127
rwf_drift     135
ets           125
local_linear  124
-------------------------
Are these models any good?
Across a range of scenarios and uncertainties, the median projection of the UN World Population Prospects (www.population.un.org) is that the population of Japan will be 121.5 million in 2030. The World Bank projection is even more dire: 120.2 million by 2030. Among the forecasts that looked plausible above, the closest to the UN and World Bank projections is the local linear model, at 124 million. Any of these model predictions could turn out right, depending on how Japan responds to the current population decline (the random walk with drift projection is probably not going to happen).
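The comparison can also be written down as a small calculation. The sketch below simply types in the 2030 values from the forecast table and the two projections quoted in the text (all in millions) and computes the absolute gap between them; the object names are invented for illustration.

```r
# 2030 forecasts in millions, copied from the forecast table in this post
forecasts <- c(mean = 118, naive = 127, rwf_drift = 135, ets = 125, local_linear = 124)

# 2030 projections in millions, as quoted in the text
un_median <- 121.5
world_bank <- 120.2

# Absolute gap between each forecast and each projection
gaps <- data.frame(
  forecast = forecasts,
  gap_to_un = abs(forecasts - un_median),
  gap_to_wb = abs(forecasts - world_bank)
)
gaps
```

These gaps are only a rough yardstick. A forecast can land near a projection for the wrong reasons, which is arguably the case for the mean forecast here, since it implies an abrupt drop right after 2017.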
Go ahead, choose another country and do your own analysis and interpretation. Just by changing country in the data import line at the top of the code you can rerun the whole forecast (type “US” instead of “JP”, then run everything and watch!). You can have unlimited fun by changing indicator in the same line of code.
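One way to make that swap less error-prone is to wrap the import and clean-up steps in a small function. The sketch below is an assumption about how that might look, not code from the analysis above: `get_series()` and its `fetch` argument are hypothetical names, and `fetch` defaults to `wbstats::wb` so the download step can be replaced with a stand-in when working offline. It also assumes the returned years are consecutive with no gaps.

```r
# Hypothetical helper: fetch one indicator for one country and return a yearly ts
get_series <- function(country, indicator = "SP.POP.TOTL",
                       startdate = 1960, enddate = 2017,
                       fetch = wbstats::wb) {
  raw <- fetch(indicator = indicator, country = country,
               startdate = startdate, enddate = enddate)
  # keep year and value, sorted from oldest to newest
  raw <- raw[order(as.numeric(raw$date)), c("date", "value")]
  # build an annual series that starts at the earliest year returned
  ts(raw$value, start = min(as.numeric(raw$date)))
}

# Usage (needs the wbstats package and an internet connection):
# uspop <- get_series("US") / 1000000
```

The rest of the forecasting code can then run on the returned series unchanged.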
I will update this blog to expand on some of the narratives later on.