Running a business is all about numbers.
If you know your numbers, keep historical data, and track trends, you can realistically predict your upcoming sales each month. This gives you some peace of mind and shows how much you can afford to expand or invest, whether in new product lines, research and development, or something else.
But get those numbers wrong and you can stretch yourself too thin, or worse, not commit when you could have and miss opportunities for growth.
Fortunately there are tools out there that can help you to more accurately predict these changes, without having to make gut choices.
Tools like Facebook Prophet…
Developed by Facebook's core data science team, Prophet is an open-source algorithm that generates time-series models.
The algorithm aims to provide business users with a powerful and easy-to-use tool to forecast business results without needing an expert in time series analysis.
Simply put, you feed it historical data, seasonality, holidays, and any predicted one-off events, and it estimates the impact of each on your forecast. The more data you have and the further back it goes, the better it performs.
The package is available in R and Python and has comprehensive documentation on the Prophet site.
Sidenote: If you don’t know how to use Python yet but want to start learning it along with tools like this one, check out these courses on ZTM.
Anyways, back to Prophet.
Like all tools, Prophet has pros and cons but it’s easily one of my favorites. It helps business users forecast results quickly and easily, whilst also understanding the drivers of the business.
Before I get into how to actually use this tool, let's first break down and understand what a time-series is.
A time series is a collection of data that tracks a metric over a period of time. Let’s look at a classic example of time series data, the weather.
How do we predict the weather?
Sure we could look outside and make a guess, but how do the actual meteorologists do it?
Simply put, they look at past data to predict future results.
If they know the average temperature in May for the past 10 years, and they track the trends, they have a fairly good idea of how the weather will be this year. Then, they add in factors that can also affect temperature such as the humidity levels, geographic location, altitude, and proximity to large bodies of water.
To determine the impact of each of these factors or ‘independent variables’ on the dependent variable (temperature), scientists may conduct a time series analysis.
This analysis can also identify patterns in the timing of these variables (such as seasonality or overall trends) to understand how they impact the dependent variable over time.
Simple!
Many concepts and algorithms have been developed to understand, study, and predict time series data, and the most common framework is the Structural Time Series.
Again, it sounds complex, but in reality it starts with nothing more than taking the ‘general trend’ of the data and then explaining what is left over.
What does the general trend mean? Well, let’s visualize:
Okay, ignore the two charts on the right for now. We’ll come back to those. We’ll focus on the two charts to the left – “Data” and “Trend”.
Imagine the top-left chart is our time series. It’s a bit all over the place, right? Although the line goes in a general direction (bottom left to top right), there are peaks and troughs.
Now look at the bottom-left chart – “Trend”. Notice how it takes the same path as the “Data” chart but cuts out the peaks and troughs.
In fancy speak, the “Data” chart has been decomposed or cleaned up to form the “Trend” chart.
This way of breaking a series into a “Trend” plus other components is what we call a Structural Time Series.
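To make the "cutting out the peaks and troughs" idea concrete, here is a minimal sketch using a rolling average. The series is made up (a straight-line trend plus a weekly cycle), and a rolling mean is only one simple way to extract a trend; it is not how Prophet does it internally.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: a slow upward trend plus a weekly cycle (made-up numbers)
t = np.arange(730)
data = pd.Series(100 + 0.1 * t + 10 * np.sin(2 * np.pi * t / 7))

# A centered 7-day rolling mean averages out the weekly peaks and troughs
trend = data.rolling(window=7, center=True).mean()

# What is left after removing the trend is the cyclical part
remainder = data - trend

print(trend.dropna().head())
```

Because the window covers exactly one weekly cycle, the smoothed line follows the underlying straight line and the wiggles end up in `remainder`.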
Now, look at the two charts to the right: “Seasonality” and “Exogenous impacts”.
“Seasonality” is the cyclical pattern in the data.
For example, the consumption of ice cream is cyclical since it peaks in the summer months and falls in the winter. Seasonality is one of the reasons we see peaks and troughs in the “Data” chart.
“Exogenous events” can also affect the time series.
Exogenous events can be things like the weather, economic sentiment, and random sales events.
The secret to a good forecast is identifying the exogenous events and working out what drives the data up and down.
From an equation perspective, you have the following:
The time series at time t is the sum of the trend, seasonality, exogenous impacts, and the error term. The error term is a normal part of any prediction, and trying to explain every single instance will likely result in overfitting.
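Written out, the sentence above could look like the following. The component names are my own shorthand for the four parts, not notation taken from the original figure.

```latex
y(t) = \text{trend}(t) + \text{seasonality}(t) + \text{exogenous}(t) + \varepsilon(t)
```

Here $y(t)$ is the observed value at time $t$ and $\varepsilon(t)$ is the error term, the part the other three components do not explain.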
So let’s break this down with another simple example.
Let’s say that you own a surf shop, and sales stay pretty average thanks to a good customer base, and they are slowly growing over time. This is the current trend in your data.
Then as you come into Summer, you predict that sales will rise as more people want to go to the beach.
This is the seasonality.
A hot summer usually means more sales, while bad weather can have a huge negative impact on your bottom line.
Now, suppose an influencer or celebrity talks about how much they love surfing and they happen to be based near you. This exogenous event would directly impact your sales.
Simple right?
BTW, this literally happened when Jonah Hill started surfing. He shared his weight loss and how much surfing affected him, and a lot of people started purchasing surf products that he recommended in interviews.
This is exactly why companies hire celebrities to endorse their products. It can have a huge impact.
However, although they can predict a change in sales, there is always some margin of error. Same for when you’re predicting seasonal effects on current trends.
tl;dr: To recap, a Time Series can be decomposed (at least) into Trend, Seasonality, Exogenous Impacts, and the error term. The error term is basically that which cannot be explained by the first three parts. The “at least” bit is key since we can further split the decomposition, as we will see.
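If it helps to see the recap as code, here is a rough sketch that builds one year of surf shop sales from the four parts. Every number is invented for illustration, including the two-week "celebrity bump".

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable
days = np.arange(365)

# Trend: a loyal customer base with slow, steady growth
trend = 200 + 0.2 * days

# Seasonality: a yearly cycle that peaks around day 171 (northern summer)
seasonality = 50 * np.sin(2 * np.pi * (days - 80) / 365)

# Exogenous impact: a two-week bump after a celebrity mention (hypothetical)
exogenous = np.zeros(365)
exogenous[200:214] = 80

# Error: whatever the first three parts cannot explain
error = rng.normal(0, 10, size=365)

# The time series is simply the sum of the parts
sales = trend + seasonality + exogenous + error
print(sales[:5].round(1))
```

Forecasting is this process in reverse: you start with `sales` and try to recover the parts.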
OK so now that you have a rough understanding of what a Structural Time Series is, Facebook Prophet will be much easier to understand, and you’ll start to see some similarities between the two.
Why am I showing you the framework? Because by having a holistic view of the composition and how they connect, you can tailor it to your problem and improve your models.
So let's take another look and break this down further, before we start to add this data into Prophet.
Now, the exogenous impacts are split into two parts:
If we go back to our previous example, the Holiday Effects h(t) could be a public holiday in summer that means more surf sales, while external regressors can be things like marketing investments, bad weather, or Jonah Hill sharing his favorite surf wax.
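Putting the split into the earlier equation gives something like the line below. The symbols $g(t)$, $s(t)$, and $h(t)$ follow the usual Prophet notation for trend, seasonality, and holidays; the regressor term $\beta^{\top} x(t)$ is my own shorthand for the external regressors and their weights, so treat it as an assumption.

```latex
y(t) = g(t) + s(t) + h(t) + \beta^{\top} x(t) + \varepsilon(t)
```

This is the additive form. With a multiplicative seasonality mode the components are combined differently, but the list of ingredients stays the same.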
Alright so let’s look into how to use this. Now again, I’m going to assume that you already use Python and have some basic experience at manipulating and understanding data - if not, check out the courses I mentioned earlier.
With that out of the way, I highly suggest that you follow along with me by using the code snippets I’ll provide as examples.
Grab Prophet and then we can set it all up. First, let’s get the usual libraries: pandas and numpy.
# Importing libraries
import pandas as pd
import numpy as np
Next, we need to define a date interval to build the time series.
For now we’ll use a daily data set, but you could also use weekly or monthly data by passing freq='W' or freq='M' to the pd.date_range function.
# Define time range for data
start_date = '2020-01-01'
end_date = '2021-12-31'
dates = pd.date_range(start_date, end_date, freq='D')
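Here is a quick sketch of how the frequency changes the number of dates you get for the same two years. One assumption to flag: I use 'MS' (month start) for the monthly range, because newer pandas versions prefer 'ME' over 'M' for month-end dates and 'MS' behaves the same across versions.

```python
import pandas as pd

start_date = '2020-01-01'
end_date = '2021-12-31'

daily = pd.date_range(start_date, end_date, freq='D')
weekly = pd.date_range(start_date, end_date, freq='W')     # weeks ending on Sunday
monthly = pd.date_range(start_date, end_date, freq='MS')   # first day of each month

# Number of rows each frequency would give the time series
print(len(daily), len(weekly), len(monthly))
```

Whatever frequency you pick here, use the same one later when you build the future dataframe.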
The third step is to create the time series and a dataframe.
Important: Unlike SARIMAX, the dates must be in a column, not in the index!
Furthermore, Prophet is annoyingly specific with names. This means that the time series must be called ‘y’ and the dates ‘ds’ or it won’t work.
# Create dummy time series data
y = np.random.normal(loc=100, scale=20, size=len(dates))
df = pd.DataFrame({'ds': dates, 'y': y})
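If your data already has the dates in the index (as you might have prepared it for SARIMAX), a small reshaping step gets it into the layout Prophet wants. This is a minimal sketch; the 'sales' and 'date' names and the values are made up.

```python
import pandas as pd

# A series with dates in the index (made-up values)
dates = pd.date_range('2020-01-01', periods=5, freq='D')
sales = pd.Series([100.0, 102.5, 98.0, 105.0, 110.0], index=dates, name='sales')

# Move the dates into a column and use the names Prophet expects
df_prophet = sales.rename_axis('date').reset_index()
df_prophet = df_prophet.rename(columns={'date': 'ds', 'sales': 'y'})

print(df_prophet)
```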
Lastly, let’s create a couple of regressors to get the most out of this tutorial.
# Create dummy regressor data
regressor_1 = np.random.normal(loc=50, scale=10, size=len(dates))
regressor_2 = np.random.normal(loc=0, scale=5, size=len(dates))
df['regressor_1'] = regressor_1
df['regressor_2'] = regressor_2
# Inspect the dataset
print(df.head())
Easy right?
So let’s recap what we have so far.
Simple!
But we don’t end here. Thanks to the holiday component in Prophet, we can take this forecasting even further.
The holiday component is one of the most exciting parts of Prophet's structural time series approach, given its simplicity and power.
For example, let’s look at a holiday such as Valentine's Day, and how it might affect chocolate sales. For this example, I’m going to assume that the demand from February 11th to 15th might look something like this:
My thoughts are that the demand would grow each day from the 11th, reaching its peak on the 14th. On the 15th, some people would still purchase if they had forgotten, or perhaps to eat their sorrows at not having a date!
Well, with Facebook Prophet, you’ll specify February 14th as the specific holiday date, and then add a lower and upper impact window.
So in this example, you would ask Prophet to find an impact for each day between the 11th and the 15th. It could then look at past data during that time and amend your forecast.
Another really cool thing about Prophet?
With other models and tools, you would need to create a variable for each day during that holiday. This means that you would need 5 variables to represent February 11th to the 15th, but with Prophet you don’t need to. This makes Prophet far more beginner friendly to use.
Let me show you how this would work:
# Create dataframe with holidays
feb14 = pd.DataFrame({
    'holiday': 'feb14',
    'ds': pd.to_datetime(['2020-02-14', '2021-02-14']),
    'lower_window': -3,
    'upper_window': 1
})
Easy!
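To double-check that a window of -3 and 1 really covers the 11th to the 15th, here is a tiny sketch that expands the window by hand. It only illustrates the date arithmetic; it is not Prophet's own code.

```python
import pandas as pd

holiday_date = pd.Timestamp('2021-02-14')
lower_window, upper_window = -3, 1

# Every day from 3 days before the holiday to 1 day after it
affected_days = [holiday_date + pd.Timedelta(days=offset)
                 for offset in range(lower_window, upper_window + 1)]

print([d.strftime('%b %d') for d in affected_days])
```

Each of those five days gets its own estimated effect, without you having to create five separate variables.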
And if you wanted to add a second holiday – let’s say July 4th – you would repeat the process:
july4 = pd.DataFrame({
    'holiday': 'july4',
    'ds': pd.to_datetime(['2020-07-04', '2021-07-04']),
    'lower_window': -2,
    'upper_window': 2
})
Then, you would need to combine both into a new pandas dataframe:
# Merging all holidays
holidays = pd.concat([feb14, july4])
Boom. Done. Let’s roll!
The Facebook Prophet Model has a lot of other parameters for you to tweak, so let’s have a look at them.
We’re going to tune the three prior scale values (seasonality_prior_scale, holidays_prior_scale, and changepoint_prior_scale) later in this section, so don’t worry about them for now.
Also, if this all seems confusing don’t worry as you’ll soon understand these parameters better by repeating and implementing them. So let’s build the forecasting model with the code below:
# FB model
from prophet import Prophet  # older releases were installed as `fbprophet`

m = Prophet(holidays=holidays,
            seasonality_mode='multiplicative',
            seasonality_prior_scale=10,
            holidays_prior_scale=10,
            changepoint_prior_scale=0.05)
# Register the extra regressors before fitting
m.add_regressor('regressor_1')
m.add_regressor('regressor_2')
m.fit(df)
This is the baseline model and we can build on this from here on.
Cross-validation is an important technique for evaluating the performance of time series forecasting models like Prophet.
It divides your data into multiple training and testing sets, fits your model to each training set, and evaluates the accuracy of your predictions on the corresponding testing set.
To perform cross-validation with Prophet, you can do the following:
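# Cross-validation
# (older releases: from fbprophet.diagnostics import ...)
from prophet.diagnostics import cross_validation, performance_metrics

df_cv = cross_validation(m,
                         horizon='31 days',     # how far ahead each fold forecasts
                         period='16 days',      # spacing between cutoff dates
                         initial='365 days',    # minimum training history
                         parallel='processes')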
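If horizon, period, and initial feel abstract, the sketch below works out where the cutoff dates would land for our two years of data. It is my own simplification of the rolling-origin idea (working backwards from the end of the data), not Prophet's actual implementation, and `sketch_cutoffs` is a hypothetical helper name.

```python
import pandas as pd

def sketch_cutoffs(start, end, initial, horizon, period):
    """Rolling-origin cutoffs, worked backwards from the end of the data."""
    cutoffs = []
    cutoff = end - horizon              # the last fold must still fit a full horizon
    while cutoff >= start + initial:    # every fold needs at least `initial` of history
        cutoffs.append(cutoff)
        cutoff -= period                # move the cutoff back by `period`
    return sorted(cutoffs)

cutoffs = sketch_cutoffs(
    start=pd.Timestamp('2020-01-01'),
    end=pd.Timestamp('2021-12-31'),
    initial=pd.Timedelta('365 days'),
    horizon=pd.Timedelta('31 days'),
    period=pd.Timedelta('16 days'),
)

# Number of folds, plus the first and last cutoff dates
print(len(cutoffs), cutoffs[0].date(), cutoffs[-1].date())
```

For each cutoff, the model is trained on everything up to that date and then tested on the following 31 days. A smaller period means more folds and a longer run time.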
Once you've run the cross-validation, you can use a range of metrics to evaluate the performance of your model, such as mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared error (MSE), or root mean squared error (RMSE).
# RMSE and MAPE for the Cross-Validation
df_metrics = performance_metrics(df_cv)  # compute once, reuse for both metrics
print("RMSE: ", round(df_metrics["rmse"].mean(), 1))
print("MAPE: ", round(100 * df_metrics["mape"].mean(), 1), "%")
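In case the two metrics are new to you, here is how they are calculated by hand on three made-up data points. This shows the standard formulas only.

```python
import numpy as np

y_true = np.array([100.0, 110.0, 90.0])   # made-up actual values
y_pred = np.array([90.0, 110.0, 99.0])    # made-up forecasts

# RMSE: square the errors, average them, take the square root
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

# MAPE: average of the absolute errors as a share of the actual values
mape = np.mean(np.abs((y_true - y_pred) / y_true))

print(f"RMSE: {rmse:.1f}")
print(f"MAPE: {100 * mape:.1f}%")
```

RMSE is in the same units as your data and punishes big misses, while MAPE is a percentage, which makes it easier to explain to business users.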
One cool feature is that you can see the error metric per day.
For instance, you can plot the RMSE at the 4th, 15th, or even 27th day. Simply use the following code snippet:
# Plotting the Cross-Validation error per day
# (older releases: from fbprophet.plot import ...)
from prophet.plot import plot_cross_validation_metric
plot_cross_validation_metric(df_cv, metric='rmse');
You would usually expect the error to grow as the time horizon gets longer, whereas in the following example it stays roughly stable.
You can see this in the chart below, where it shows you how reliable the forecast is over a specific time horizon.
That’s the cross-validation done, so let’s continue.
Prophet is a powerful tool for time series forecasting, but you need to make sure that you carefully tune your model parameters to get the best results. The good news of course is that Prophet offers a range of parameters to fine-tune your model and improve its accuracy!
You should tune four main parameters in a Prophet model: seasonality_mode, seasonality_prior_scale, holidays_prior_scale, and changepoint_prior_scale.
# Parameter Grid
from sklearn.model_selection import ParameterGrid
param_grid = {'seasonality_mode': ['additive', 'multiplicative'],
              'seasonality_prior_scale': [1, 5, 10, 20],
              'holidays_prior_scale': [5, 10, 20, 25],
              'changepoint_prior_scale': [0.005, 0.01, 0.05, 0.1]}
grid = ParameterGrid(param_grid)
len(list(grid))
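To see why the grid gets big quickly, here is a sketch that builds the same combinations with plain Python. I use itertools instead of scikit-learn here only to show what happens under the hood, and the order of the combinations may differ from ParameterGrid's.

```python
from itertools import product

param_grid = {'seasonality_mode': ['additive', 'multiplicative'],
              'seasonality_prior_scale': [1, 5, 10, 20],
              'holidays_prior_scale': [5, 10, 20, 25],
              'changepoint_prior_scale': [0.005, 0.01, 0.05, 0.1]}

# Every combination of one value per parameter
combinations = [dict(zip(param_grid, values))
                for values in product(*param_grid.values())]

print(len(combinations))   # 2 * 4 * 4 * 4 combinations
print(combinations[0])
```

Each combination means a full model fit plus a cross-validation run, so adding one more value to any list has a real cost in run time.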
When tuning your Prophet model, it's important to remember that there's no one-size-fits-all solution.
The optimal values for your model parameters will depend on the specific characteristics of your time series and the goals of your analysis, so feel free to experiment with different values and evaluate your model's performance using mean absolute or squared error metrics.
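To find your own optimal values, you must build a loop with three steps: build and fit the model, run the cross-validation, and then measure the error and store the result.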
# Store the results
rmse = []
# Loop over every parameter combination in the grid
for params in grid:
    # Model
    m = Prophet(holidays=holidays,
                seasonality_mode=params['seasonality_mode'],
                seasonality_prior_scale=params['seasonality_prior_scale'],
                holidays_prior_scale=params['holidays_prior_scale'],
                changepoint_prior_scale=params['changepoint_prior_scale'])
    m.add_regressor('regressor_1')
    m.add_regressor('regressor_2')
    m.fit(df)
    # CV
    df_cv = cross_validation(m,
                             horizon='31 days',
                             period='16 days',
                             initial='365 days',
                             parallel='processes')
    # Measure the error and store the result
    error = performance_metrics(df_cv)["rmse"].mean()
    rmse.append(error)
Wow, that’s a lot of code, but it’s super easy to copy and paste 😀.
# Check the results
tuning_results = pd.DataFrame(grid)
tuning_results['rmse'] = rmse
# Keep the row(s) with the lowest RMSE
best_params = tuning_results[tuning_results.rmse == tuning_results.rmse.min()].transpose()
best_params
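One handy way to carry the winning row forward is to turn it into a dictionary of keyword arguments. The sketch below uses a tiny made-up results table (only two of the four parameters, with invented RMSE values) just to show the mechanics.

```python
import pandas as pd

# Made-up tuning results, only to show the mechanics
tuning_results = pd.DataFrame({
    'seasonality_mode': ['additive', 'multiplicative', 'additive'],
    'changepoint_prior_scale': [0.01, 0.05, 0.1],
    'rmse': [21.4, 20.9, 22.3],
})

# Row with the lowest error, turned into keyword arguments
best_row = tuning_results.loc[tuning_results['rmse'].idxmin()]
best_kwargs = best_row.drop('rmse').to_dict()
print(best_kwargs)

# These could then be passed on, e.g. Prophet(holidays=holidays, **best_kwargs)
```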
Now, you just need to use those best parameters to build your newly-tuned Prophet model.
Using historical data and various modeling techniques, Prophet can generate accurate predictions that consider trends, seasonality, and other factors in a time-series.
There are two steps you need to take.
Create a dataframe with the “future” dates and add the regressors. (See below).
This is important. If you used regressors to explain the past, then you must have them to predict the future.
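# Make a future dataframe
# Assumption: future_df holds the regressor values for the forecast window (dummy values here)
future_df = pd.DataFrame({
    'regressor_1': np.random.normal(loc=50, scale=10, size=31),
    'regressor_2': np.random.normal(loc=0, scale=5, size=31)})
future = m.make_future_dataframe(periods=len(future_df), freq='D')  # 'W' for weekly or 'M' for monthly
# Stack historical and future regressor values so every row of `future` has them
regressors = pd.concat([df.iloc[:, 2:], future_df], ignore_index=True)
future = pd.concat([future, regressors], axis=1)
future.head()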
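Before predicting, it is worth checking that no future row is missing a regressor value, since gaps here are a common source of errors. A minimal sketch with a toy frame (four days of history, three days to forecast, made-up values):

```python
import pandas as pd

history = pd.DataFrame({'ds': pd.date_range('2021-12-28', periods=4, freq='D'),
                        'regressor_1': [50.0, 52.0, 49.0, 51.0]})

# Future frame covering the history plus 3 extra days
future_check = pd.DataFrame({'ds': pd.date_range('2021-12-28', periods=7, freq='D')})

# Joining only the historical regressor leaves the new rows empty
future_check = future_check.merge(history, on='ds', how='left')
missing = int(future_check['regressor_1'].isna().sum())
print(f"{missing} future rows have no regressor value")

# Fill them with planned or forecast values (made-up here) before predicting
future_check.loc[future_check['regressor_1'].isna(), 'regressor_1'] = [50.0, 50.0, 50.0]
```

In a real project those filled-in values would come from your plans (for example a marketing budget) or from a separate forecast of the regressor itself.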
Once the 'future' dataframe is ready, you use the predict function and see what comes out.
# Forecasting
forecast = m.predict(future)
forecast.head()
To truly understand your data and model, you must visualize it intuitively and informatively, and that's where data visualization comes in. Prophet makes it easy to create beautiful visualizations of your time series data and forecasts.
For example, with just one line of code, you can generate a plot that shows the trend, seasonality, and uncertainty in your data – like this:
# Plotting
m.plot_components(forecast);
One sample of the output is below. I have two seasonal cycles - weekly and yearly - and I can see the development:
But data visualization isn't just about making pretty pictures! It's about gaining insights and understanding your data in a way that's easy to digest.
With Prophet, you can now use these visualizations to explore your data, identify patterns and trends, and make informed decisions about your model.
And that’s really the end goal right? To predict upcoming events and how to plan for them.
As you can see, Prophet is pretty damn awesome.
And sure, it’s not all positives and rainbows. On the negative side, Prophet works best with long-term predictions and plenty of historical data, so it is less useful when you only have a short history.
It can also be a little complex thanks to all the fine-tuning options, each of which requires optimization and can make Prophet harder to use.
Even then, Prophet is still my absolute favorite tool for time-series forecasting.
Hopefully you can see the value in this tool and how it can help your business predictions, and you’ll start to use it yourself. If you want an in-depth program to teach you time-series forecasting with Facebook Prophet, come and check out my time-series forecasting course.
Likewise, if you saw the value in this but have never coded before, check out ZTM’s Python course and learn from scratch. There is absolutely zero prior programming experience required to get started!