Week 3 Time Series
School
University of South Florida
Course
101
Subject
Statistics
Date
Jan 9, 2024
Type
docx
Pages
26
Uploaded by ssaintclair
Google Cloud – Time Series Modeling: What is a time series?
Introduction
What is a Time Series
Time Series Concepts
Time Series Modeling
Lab Objectives
What is a Time Series
What are some of the basic concepts in Time Series
How do we analyze Time Series data to predict future values from past values
Agenda
Concepts in Time Series
Time series terminology
AR – Auto regressive
MA – Moving Average
Putting it all together
ARIMA Model
No one can predict the future. However, certain kinds of data, namely time series data, allow us to forecast with greater accuracy. In this lesson, we will learn what a time series is and understand the basic concepts of time series modeling. We will also learn some basic terminology and then use some time-tested, no pun intended, techniques to predict the future. We will first try to answer the question: What is a time series and why is it important in finance? We will then discuss how to analyze a time series data set, what the different terms used in time series analysis are, including ARIMA, and how we can use them to make predictions. In this lesson, our goals are to understand what a time series is and which basic concepts in time series we need to know about.
Concepts in Time Series
What is a Time Series?
A time series is a series of data points indexed in time order.
Most commonly, a time series is a sequence of snapshots of a process taken at successive equally spaced points in time.
Thus, it is a sequence of discrete-time data.
Then we will learn how to analyze time series data and build a model to predict a future value from past values. First, let's understand what a time series is. A time series is a series of data points indexed in time order. Most commonly, a time series is a sequence of snapshots of a process taken at successive, equally spaced points in time. Thus, it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average. Now, let's look at the basic terminology that we use in analyzing time series data.
What is Stationary data?
“Stationary” means that the statistical structure of the series is independent of time.
First, we need to understand the concept known as stationarity. Consider time series data such as the chart to the right, US GDP data over the last 200 years. We see that its summary statistics, such as mean and variance, change over time. This is because US GDP has expanded over time, and the average in one 10-year period is not the same as the average a century later. We call such data non-stationary. So what is stationary data then? Any data such that the statistical structure of the series is independent of time is said to be stationary. In simple terms, its mean and standard deviation don't change over time.
How do we know data is Stationary?
Plots
Summary Statistics
Statistical Tests
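As a rough illustration of the second check in the list above, the sketch below splits a series into consecutive chunks and compares the mean and standard deviation of each chunk. The helper name split_stats and both sample series are made up for this example; they are not from the lesson.

```python
from statistics import mean, stdev

def split_stats(series, parts=3):
    """Return (mean, stdev) for each of `parts` consecutive chunks."""
    size = len(series) // parts
    chunks = [series[i * size:(i + 1) * size] for i in range(parts)]
    return [(mean(c), stdev(c)) for c in chunks]

# Hypothetical data: a steadily growing series versus a flat, oscillating one.
trending = [100 + 2 * t for t in range(90)]
flat = [100 + (-1) ** t for t in range(90)]

trend_stats = split_stats(trending)   # chunk means keep rising
flat_stats = split_stats(flat)        # chunk means and spreads stay the same

for label, stats in (("trending", trend_stats), ("flat", flat_stats)):
    print(label, [(round(m, 1), round(s, 1)) for m, s in stats])
```

If the chunk means or spreads differ noticeably, as they do for the trending series, that is a hint that the data is not stationary. It is only a hint; a formal test is still the safer check.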
How can we find out whether a time series is stationary? One way is simply to look at the plot. As you can see, the series in the chart here is non-stationary, meaning that it has a definite trend. Second, you can measure summary statistics such as the average and standard deviation at various points in time in the data and check for obvious or significant differences. Third, you can run statistical tests to check whether the expectations of stationarity are met or have been violated.
How to convert Trendy data to “Stationary” data?
One way to Convert Trendy Data to Stationary Data is to Difference it or De-Mean It!
Suppose your data is trendy like the US GDP chart on the screen. How do you make it stationary? Statistical time series methods and even modern machine learning methods benefit from a clearer signal in the data, which we obtain when we stationarize a time series. One way to make non-stationary time series data stationary is by identifying and removing trends and seasonal effects.
An easy way to do it is to difference one time period from another. That is, we take the difference between each data point and the one before it, and plot the result as we see on screen to the right. How does it look? It still looks like it has some trend, with higher averages over time. Let's try differencing it one more time. After differencing it once more, it appears to be stationary. If you want to confirm that the mean and variance of this series do not depend on time, you can run a statistical test known as the augmented Dickey-Fuller test.
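The differencing step can be sketched in a few lines. The series here is a made-up one with an increasing slope (it is not the GDP data), and the helper name difference is ours.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both points exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical series whose slope keeps increasing (quadratic growth).
series = [t * t for t in range(8)]   # 0, 1, 4, 9, 16, 25, 36, 49
first = difference(series)           # 1, 3, 5, 7, 9, 11, 13 (still rising)
second = difference(first)           # 2, 2, 2, 2, 2, 2 (no trend left)

print(first)
print(second)
```

Each round of differencing shortens the series by one point, which is worth remembering when lining the result up against the original dates.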
How to convert Trendy data to “Stationary” data?
After differencing it twice, this data now appears to be “stationary”.
You can double-check it with an Augmented Dickey-Fuller test*
Without going into details about how the test works, we will give you a hint on how to read the test output. The null hypothesis of the test is that the series is non-stationary. If the p-value of the test is less than a chosen significance level, let's say 0.05, we reject that hypothesis and treat the given time series as stationary. If you need more details, check out the two links on this slide.
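To give a feel for what the test measures, here is a simplified sketch of the plain (not augmented) Dickey-Fuller statistic. It regresses the change in the series on the previous level and looks at the t-statistic of the slope. This is an illustration under our own assumptions: the function name, the simulated data and the cutoff of -2.86 (an approximate large-sample 5% critical value for the version with a constant and no trend) are not from the lesson. In practice one would call a library routine such as adfuller in statsmodels, which also reports a p-value.

```python
import random

def dickey_fuller_stat(y):
    """t-statistic of b in the regression  dy[t] = a + b * y[t-1] + e[t]."""
    x = y[:-1]                                          # lagged levels y[t-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]    # first differences
    n = len(x)
    mx, mdy = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (di - mdy) for xi, di in zip(x, dy)) / sxx
    a = mdy - b * mx
    rss = sum((di - a - b * xi) ** 2 for xi, di in zip(x, dy))
    se_b = (rss / (n - 2) / sxx) ** 0.5                 # standard error of b
    return b / se_b

CRITICAL_5PCT = -2.86  # assumed approximate 5% cutoff (constant, no trend)

rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(300)]           # stationary by construction
walk = [0.0]
for e in noise:
    walk.append(walk[-1] + e)                           # random walk: non-stationary

walk_stat = dickey_fuller_stat(walk)
diffed = [walk[t] - walk[t - 1] for t in range(1, len(walk))]
diffed_stat = dickey_fuller_stat(diffed)                # differencing recovers the noise

print("random walk:", round(walk_stat, 2))
print("after one difference:", round(diffed_stat, 2))
```

A statistic far below the cutoff, as the differenced series produces, is evidence against non-stationarity. That corresponds to a small p-value in the library output.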
Q: With the US GDP we saw that the data had an upward trend. We took the first difference but were not able to eliminate the trend. Why do you think the data required a second differencing?
A: The data had an exponential component (increasing slope). This is very common with data where growth is compounded each year, such as GDP and stock prices. When the slope itself keeps rising, the first difference still trends upward, which is why a second differencing was needed here.
Next, let's look at why stationarity is important in a time series model. There are two reasons. Let's say we want to build a model in which averaging is used. Which mean and standard deviation of your data will you use? If your data is non-stationary, would you choose the mean from the beginning, the middle, or the end? They're all different. Hence, stationarity allows you to build a stable model that uses stable parameters that don't change over time.
Why is Stationarity important in Time Series Modeling?
Stationarity allows preserving model stability, i.e. a model whose parameters and structure are stable over time.
Stationarity matters, because it provides a framework in which averaging (used in AR and MA processes), can be properly used to describe the time series behavior.
What are the other properties of time series data?
Most time series contain one or more of the following components:
o Trend
o Seasonal
o Cyclical
o Irregular (or Residual)
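One way to picture these components is to build a series out of them. The sketch below assumes the components simply add together, which is our simplification, and it leaves out the cyclical part to stay short. All numbers are invented.

```python
import random

rng = random.Random(1)
months = 48

trend = [0.5 * t for t in range(months)]                    # long-run increase
pattern = [0, 1, 3, 5, 8, 10, 12, 10, 7, 4, 2, 1]           # hypothetical yearly shape, mid-year peak
seasonal = [pattern[t % 12] for t in range(months)]         # repeats every 12 months
irregular = [rng.gauss(0, 0.3) for _ in range(months)]      # random shocks

series = [trend[t] + seasonal[t] + irregular[t] for t in range(months)]

# Without the irregular part, the change from one year to the next is
# exactly the trend gained over 12 months, because the seasonal part cancels.
smooth = [trend[t] + seasonal[t] for t in range(months)]
yearly_change = [smooth[t] - smooth[t - 12] for t in range(12, months)]

print(yearly_change[:3])
```

Comparing each month with the same month a year earlier is a simple way to strip out the seasonal pattern, in the same spirit as differencing out a trend.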
In the air passenger traffic chart that we see to the right, we notice some interesting components in time series data. The first component is called trend. A trend is a long-run increase or decrease in a time series. You can see that the chart on screen has a slight upward trend. Second, when data is affected by the time of the year, it is said to be seasonal. In this case, we can see that almost every year the chart tends to peak during the middle of the year and decrease slightly afterwards. Seasonality is most pronounced in retail sales of items such as snow shovels or lawnmowers. Snow shovels tend to sell well in fall and winter and then decline afterward. Third is a cyclical component. A cyclical component is measured over a long time horizon, typically one year or longer. For example, sales at fast food chains may rise during recessions, when consumers are more cost-conscious, and then fall during recoveries. This is tied to the business cycle. Finally, there is an irregular component. Irregular effects are the impacts of random events such as crashes, earthquakes, or sudden changes in the weather. By their very nature, these effects are completely unpredictable.
How do we change stock prices to be more stationary?
Stock prices are typically trending (up or down). In any case, they have a changing mean over time.
Hence we must difference stock prices to get daily, monthly or annual returns to make them “stationary”.
Putting it all together, we can see that a time series is an amalgam of all these components. The definition of stationarity implies that the mean and variance of a process remain constant, that is, they should not change over time. However, looking at a stock chart, we can tell that it is not stationary. Stock prices are typically trending up or down. In this case, they have a rising average price over time. So how do we make it stationary? One way to do that would be to difference the stock prices to get daily, monthly, or annual returns. That should make them stationary, as you can see on the bottom right.
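A minimal sketch of turning prices into returns, assuming that "returns" here means simple percentage returns (the price change divided by the previous price). The prices are made up so that the arithmetic is easy to follow.

```python
def simple_returns(prices):
    """Percentage change from each price to the next."""
    return [(prices[t] - prices[t - 1]) / prices[t - 1]
            for t in range(1, len(prices))]

# Hypothetical prices that grow by 10% each period.
prices = [100.0, 110.0, 121.0, 133.1]
returns = simple_returns(prices)

print([round(r, 4) for r in returns])
```

The prices keep climbing, but every return is about 10%. The level of the series has a changing mean while the returns do not, which is the point of the transformation.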
Q: What is the difference between seasonality and cyclicality?
A: Seasonality refers to cycles that repeat on an annual basis. Cyclicality refers to cycles that can be longer than a year. Seasonality is common with goods or services that are in greater demand at a particular time of year, such as air conditioners or space heaters. Cyclicality refers to data that responds to longer-term changes such as the business cycle.
AR – AUTOREGRESSIVE
Next, we are going to learn how to analyze time series data. One way to do that is to exploit an inherent property of almost all time series data: at any point in time, a data point is slightly or highly dependent on the previous value or values. This property is known as autocorrelation, and we look at it next.
Time Series Terminology: Auto Correlation (AR)
A correlation of a variable with itself at different time periods in the past is known as “Autocorrelation.”
How can we identify what value of lag we should use?
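A minimal sketch of how an autocorrelation at a given lag can be computed: pair the series with a copy of itself shifted by that many points, then take the ordinary Pearson correlation of the pairs. The function names and the sample series are ours. Library ACF routines usually use a slightly different estimator based on the overall mean and variance, so their numbers can differ a little from this pairwise version.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def autocorrelation(series, lag):
    """Correlate the series with itself shifted by `lag` points."""
    if lag == 0:
        return 1.0                 # a series is perfectly correlated with itself
    x = series[:-lag]              # drop the last `lag` points
    y = series[lag:]               # drop the first `lag` points
    return pearson(x, y)

# Hypothetical series of 125 points that repeats every 5 steps.
series = [float(t % 5) for t in range(125)]
acf = [autocorrelation(series, k) for k in range(11)]   # lags 0 to 10

for k, r in enumerate(acf):
    print(k, round(r, 2))
```

With 125 points and a lag of 5, each of the two shifted copies has 120 points. Because this sample series repeats every 5 steps, the correlation at lags 5 and 10 comes out at essentially 1, while other lags are much lower. Scanning the lags this way is one simple way to see which lag values stand out.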
Unlike the data that we use in linear regression, time series data occur at different times. One may wonder whether a time series has some relationship to previous versions of itself. Correlation is a great way to measure this relationship. Before we run a linear regression, we can tell whether two variables are related by calculating the correlation coefficient between them.
In linear regression, the observations are paired, so there is just one way to compute the classical correlation coefficient. This correlation is known as the Pearson correlation coefficient, named after the statistician Karl Pearson. The same idea applies to time series: a series can correlate with itself. You may have seen the movie The Truman Show starring Jim Carrey. In this movie, he is the star of an around-the-clock reality television show, but he doesn't know it. In one classic scene, he sits in his car and observes all the events that occur around him. He notices that after several minutes, all the activities repeat themselves: the same cars driving, the same people biking, the same people talking. This is an example of a time series correlating with itself.
How does this work with a set of training data? Let's do the following experiment. Suppose we have the daily returns of Apple for six months. We would have about 125 data points. We will form two series from these. We form the first series by excluding the last five points; we'll call this X. We form the second series by excluding the first five points; we'll call this Y. That is, we make X by taking the first 120 data points and Y by taking the last 120 data points. We call this lagging. To compute the five-day autocorrelation, we must exclude the first five points of one series and the last five points of the other. Remember, to correlate, we always need the same number of points in both series. When we calculate a five-day autocorrelation using 125 days of data, we have 125 minus 5, or 120, total pairs. We then compute the correlation just as if these were entirely different series. We may wonder, though, why not try an eight-day autocorrelation? Here, we form our X by taking the first 117 points and our Y by taking the last 117 points, then we correlate these.
We come to realize that there are many different autocorrelations we can calculate. We can calculate the one-day autocorrelation, the two-day, the three-day and so on, all the way up to a number of days that is slightly smaller than the number of training points we have. Practically speaking, let's calculate the autocorrelation of daily SPY returns from lag 0 to lag 10. This is real data. You can see from the graph that there is a correlation of about 30 percent for a seven-day lag. We can even include a lag of zero. Why would we do that? To emphasize that the correlation of a series with itself is one. We can then draw a plot of these autocorrelations. This is known as the ACF chart, as you can see on the slide.
In the last slide, we discussed autocorrelation. Correlation has no direction. The correlation between X and Y is the same as the correlation between Y and X. Regression, however, is different. When we run a regression, there is a specific direction. The regression of Y on X differs from the regression of X on Y.
Time Series Terminology: Auto Regression (AR)
An AR process is where autoregression occurs. Our goal is to find the “correct” time lag that best captures the “order” of such an AR process. This is not a one-step procedure but is an Iterative process.
In regression, direction matters. In ARIMA modeling, we start by choosing an order of lag to regress on. We can gain insight from our autocorrelation plot: the autocorrelations give us a sense of which lags matter. An AR process is where autoregression occurs. Our goal is to find the correct time lag that best captures the order of such an AR process. This is not a one-step procedure but an iterative process. It's not very clear, is it?
Auto Regressive (AR) process
Here is what an AR process with a time period lag of 1 looks like.
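As a rough sketch, a lag-1 autoregressive process can be simulated and then fitted by regressing each value on the one before it. We assume the common textbook form x[t] = phi * x[t-1] + noise; the coefficient 0.7, the function names and the simulated data are ours and are not taken from the slide.

```python
import random

def simulate_ar1(phi, n, seed=0):
    """Generate n points of x[t] = phi * x[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x

def estimate_phi(x):
    """Least-squares slope (no intercept) of x[t] regressed on x[t-1]."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

series = simulate_ar1(phi=0.7, n=5000)
phi_hat = estimate_phi(series)

print(round(phi_hat, 2))
```

The estimate should land close to the coefficient used in the simulation. Note the direction of the regression: today's value is explained by yesterday's, not the other way round.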
Let's look at a specific example. Here's a sample time series with the time period of lag 1 to the right of the