Everything you need to know about Time Series analysis
What is a Time Series?
In short, time series analysis is the systematic study of observations recorded over time, with the aim of understanding what happened in the past and predicting what will happen in the future. How, then, does it differ from regression?
The difference between time series and regression is that, in a time series, the time component is strongly related to the target variable. The mere presence of a time feature in your data does not make it a time series problem. Take property prices, for example. Although property prices increase over time, they are not strongly tied to the time component, because other features also play an important role in price prediction, such as the location of the property and the schools, hospitals and metro stations around it.
Let’s quickly go through some of the important aspects of time series:
Trend
A trend is the movement of a feature toward relatively higher or lower values over a long period of time; it may be an uptrend or a downtrend. A familiar example is internet usage, which has increased significantly over time, whereas sales of newspapers and other print media have slowly declined as technology has advanced and people have moved to digital content.
Seasonality
Seasonality is an upward or downward movement that repeats in a pattern within a fixed time period. During the festive season, for example Christmas, which falls in December every year, people shop more, and this behaviour has repeated every year for decades.
Noise
Noise is a sudden upward or downward movement that is unsystematic in nature. It lasts for a very short time and does not repeat. For example, after an unforeseen natural calamity such as an earthquake, or during a viral epidemic, medicine sales increase for a short period, but the same spike is not observed during the same period in earlier years.
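To make these three components concrete, here is a minimal sketch that builds a synthetic series as trend + seasonality + noise and then pulls the pieces apart again with a centered moving average. The series, the season length of 12 and all the names are assumptions made up for illustration; this is a simple classical-style decomposition, not the only way to do it.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic data is reproducible

PERIOD = 12      # assumed season length: 12 "months" per "year"
N = 5 * PERIOD   # five "years" of monthly observations

# Build a synthetic series as trend + seasonality + noise.
trend = [0.5 * t for t in range(N)]
seasonal = [10 * math.sin(2 * math.pi * t / PERIOD) for t in range(N)]
noise = [random.gauss(0, 1) for _ in range(N)]
series = [trend[t] + seasonal[t] + noise[t] for t in range(N)]


def centered_moving_average(values, period):
    """Estimate the trend by averaging over one full (even-length) period."""
    half = period // 2
    out = [None] * len(values)  # the first and last `half` points have no estimate
    for t in range(half, len(values) - half):
        window = values[t - half:t + half + 1]
        # The two end points get half weight so the weights sum to `period`.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        out[t] = total / period
    return out


trend_est = centered_moving_average(series, PERIOD)

# Detrend, then average the detrended values by position within the period.
buckets = [[] for _ in range(PERIOD)]
for t, (y, m) in enumerate(zip(series, trend_est)):
    if m is not None:
        buckets[t % PERIOD].append(y - m)
seasonal_est = [sum(b) / len(b) for b in buckets]

# Whatever is left after removing trend and seasonality is treated as noise.
residual = [
    series[t] - trend_est[t] - seasonal_est[t % PERIOD]
    for t in range(N)
    if trend_est[t] is not None
]

print("estimated trend at t=30:", round(trend_est[30], 2))
print("estimated seasonal effect per position:", [round(s, 1) for s in seasonal_est])
```

Averaging over exactly one full period is what makes the seasonal swings cancel out, so the moving average tracks the trend alone.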
Exogenous variables
In simple terms, exogenous variables are:
- Independent variables. Their values are fixed when they are provided to the model, and they are not explained or determined by the model.
- Variables that influence the endogenous variables, not the other way round.
Endogenous variables
An endogenous variable is one that is explained by the model. It is the dependent variable that we want to explain.
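As a rough illustration of the two roles, the sketch below fits a straight line by ordinary least squares, with a holiday flag as the exogenous input and daily sales as the endogenous output. The data, the variable names and the choice of a one-variable linear model are all hypothetical; the point is only to show which variable is handed to the model and which one the model explains.

```python
# Hypothetical daily data: the names and numbers are made up for illustration.
is_holiday = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]            # exogenous: supplied to the model
sales = [20, 22, 35, 21, 19, 36, 20, 23, 34, 21]       # endogenous: explained by the model


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(is_holiday, sales)


def predict_sales(holiday_flag):
    """Predict the endogenous variable from a given exogenous value."""
    return intercept + slope * holiday_flag


# The model needs to be told whether the day is a holiday;
# it does not try to predict the holiday flag itself.
print("expected sales on a normal day:", round(predict_sales(0), 1))
print("expected sales on a holiday:", round(predict_sales(1), 1))
```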
Univariate Time Series
Univariate refers to an expression, equation, function or polynomial of only one variable. A time series in which a single variable is recorded is therefore called a univariate time series.
E.g. temperature data collected every minute. Each minute yields a single, one-dimensional value: the temperature.
Multivariate Time Series
An expression, equation, function or polynomial involving more than one variable is called multivariate. A time series consisting of more than one feature or variable is called a multivariate time series.
E.g. you are now collecting temperature, humidity and wind speed every minute. Each minute yields a multi-dimensional value made up of temperature, humidity and wind speed.
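The difference is easy to see in how the data is laid out. The snippet below is a small sketch using made-up readings and a hypothetical `dimensions` helper: the univariate series stores one number per timestamp, while the multivariate series stores several.

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 1, 9, 0)  # arbitrary start time for the illustration

# Univariate: one value (temperature) per timestamp.
univariate = [
    (start + timedelta(minutes=i), temp)
    for i, temp in enumerate([21.0, 21.1, 21.3])
]

# Multivariate: several values (temperature, humidity, wind speed) per timestamp.
multivariate = [
    (start + timedelta(minutes=i), {"temperature": t, "humidity": h, "wind_speed": w})
    for i, (t, h, w) in enumerate([(21.0, 40, 3.2), (21.1, 41, 3.0), (21.3, 41, 2.8)])
]


def dimensions(observation):
    """Number of variables recorded at a single timestamp."""
    _, value = observation
    return len(value) if isinstance(value, dict) else 1


print(dimensions(univariate[0]), dimensions(multivariate[0]))
```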
As you know, there are various models available for time series problems, and beginners often find it confusing to decide which one fits best. Below are some of the basic models that can be used effectively for common time series problems.
But wait…
With all these options available, which model should I use?
Here is a quick matrix to help you choose a suitable model. How well a model fits, however, depends on your implementation. If you have any questions, please reach out to us at contact@intellifysolutions.com
Does your data contain trend and seasonal components?

| | No trend or seasonal component present | Trend component present | Seasonal component present | Both trend and seasonal components present |
|---|---|---|---|---|
| Univariate time series | AR, MA, ARMA, SES | ARIMA, SARIMA, #SARIMAX, HWES | SARIMA, #SARIMAX, HWES | SARIMA, #SARIMAX, HWES |

Multivariate time series: VAR, VARMA, #VARMAX

Note: # marks models that use exogenous variables.
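If you prefer to see the matrix as code, here is a minimal sketch that encodes it as a lookup table. The function name `candidate_models` and its parameters are hypothetical. Two things are assumptions rather than part of the matrix: the multivariate models are returned for every trend/seasonality combination (the matrix lists them only once), and the models marked with # are dropped when you have no exogenous variables.

```python
# Candidate models transcribed from the matrix above.
UNIVARIATE = {
    # (has_trend, has_seasonality): candidate models
    (False, False): ["AR", "MA", "ARMA", "SES"],
    (True, False): ["ARIMA", "SARIMA", "SARIMAX", "HWES"],
    (False, True): ["SARIMA", "SARIMAX", "HWES"],
    (True, True): ["SARIMA", "SARIMAX", "HWES"],
}

# Assumption: the matrix gives no per-column breakdown for multivariate series,
# so the same list is used regardless of trend and seasonality.
MULTIVARIATE = ["VAR", "VARMA", "VARMAX"]

# Models marked with "#" in the matrix (they take exogenous variables).
NEEDS_EXOGENOUS = {"SARIMAX", "VARMAX"}


def candidate_models(multivariate, has_trend, has_seasonality, has_exogenous=False):
    """Return the models the matrix suggests for the given kind of data."""
    models = MULTIVARIATE if multivariate else UNIVARIATE[(has_trend, has_seasonality)]
    if not has_exogenous:
        # Assumption: skip the exogenous-variable models when there is nothing to feed them.
        models = [m for m in models if m not in NEEDS_EXOGENOUS]
    return list(models)


print(candidate_models(multivariate=False, has_trend=True, has_seasonality=True))
```

Treat the result as a shortlist to try out, not as a verdict: you still need to compare the candidates on your own data.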
For a quick explanation of each model in simple terms, you can read this at your leisure: https://machinelearningmastery.com/time-series-forecasting/
Typical steps for addressing a time series problem
- Define the Objective – Decide what you need to forecast, e.g. the temperature for the next week.
- Load Data – Load the historical data. The quantity and quality of your data dictate how accurate the model will be.
- Conduct Exploratory Data Analysis (EDA) – This is a critical process for performing initial investigations on the data. It helps you discover patterns, identify anomalies and test hypotheses using summary statistics and graphical representations such as line charts, histograms and correlation diagrams.
- Define Train and Test Data Sets – In this step the data is separated into two parts, one to train the model and one to test it. The proportion depends on how many data points you have; a typical split is 80% for training and 20% for testing. Because the observations are ordered in time, split chronologically: train on the earlier part and test on the most recent part.
- Choose Algorithm – Choose a suitable algorithm depending on the data and your forecasting needs. Please refer to the matrix above.
- Develop Multiple Models – Develop multiple models, compare their accuracy and choose the one that fits your data best.
- Train the Model and Test Its Accuracy – Train each model on the training set and measure its accuracy on the held-out test set. This is a proven way to assess model accuracy.
- Tune the Model – Tune the model to improve its accuracy using techniques such as feature selection.
- Deploy the Model – Once it performs well enough, the model can be deployed to production for use.
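The steps above can be sketched end to end in a few lines. The example below is only an illustration under several assumptions: the temperature series is synthetic, the three candidate models (a naive forecast, the training mean and a hand-written simple exponential smoothing, SES) were chosen for brevity, mean absolute error is used as the accuracy measure, and all function names are hypothetical. In a real project you would more likely use a forecasting library.

```python
import random

random.seed(42)  # fixed seed so the synthetic data is reproducible

# Objective and data: a hypothetical daily temperature series with no clear
# trend or seasonality (synthetic, for illustration only).
level = 20.0
temperatures = []
for _ in range(100):
    level += random.gauss(0, 0.2)                       # slowly drifting level
    temperatures.append(level + random.gauss(0, 1.0))   # plus day-to-day noise

# Train/test split: chronological 80/20, never shuffled.
split = int(len(temperatures) * 0.8)
train, test = temperatures[:split], temperatures[split:]


def ses_level(values, alpha):
    """Run simple exponential smoothing and return the final smoothed level."""
    smoothed = values[0]
    for y in values[1:]:
        smoothed = alpha * y + (1 - alpha) * smoothed
    return smoothed


def one_step_mae(values, alpha):
    """Mean absolute one-step-ahead error of SES within `values`."""
    smoothed, errors = values[0], []
    for y in values[1:]:
        errors.append(abs(y - smoothed))  # the forecast for this step is the current level
        smoothed = alpha * y + (1 - alpha) * smoothed
    return sum(errors) / len(errors)


def mae(actual, forecast):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Tune: pick the smoothing factor using the training data only.
alphas = [i / 10 for i in range(1, 10)]
best_alpha = min(alphas, key=lambda a: one_step_mae(train, a))

# Develop multiple models and compare them on the held-out test set.
# Each of these simple models produces a flat forecast over the test horizon.
forecasts = {
    "naive (last value)": [train[-1]] * len(test),
    "mean of training data": [sum(train) / len(train)] * len(test),
    f"SES (alpha={best_alpha})": [ses_level(train, best_alpha)] * len(test),
}
scores = {name: mae(test, f) for name, f in forecasts.items()}
best_model = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: MAE = {score:.2f}")
print("best on the test set:", best_model)
```

Note that the smoothing factor is tuned on the training data only, so the test set stays untouched until the final comparison.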