title: Intro to Pandas & Time Series Data
duration: 1:25
creator:
  name: Robby Grodin
  city: Boston

Intro to Time Series

Week 9 | Lesson 1.3

LEARNING OBJECTIVES

After this lesson, you will be able to:

  • Understand what time series analysis is and what it is used for
  • Use Pandas to model and manipulate a Time Series
  • Explain the functionality afforded to the DateTime object

STUDENT PRE-WORK

Before this lesson, you should already be able to:

  • Load data into a Pandas DataFrame
  • Access data in a DataFrame object
  • Use Pandas' built-in descriptive statistics functions

INSTRUCTOR PREP

Before this lesson, instructors will need to:

  • Review the GOOG data set from week 2 for familiarity
  • Review both .ipynb notebooks. You will be live-coding the Introduction, Demo, and part of the Discussion sections in these notebooks. Feel free to diverge from the provided solution code; it is only a suggestion. Also, feel free to add more exercises. You know your students best!

LESSON GUIDE

TIMING   TYPE                   TOPIC
5 min    Opening                What Is Time Series Analysis?
15 min   Introduction           The DateTime Object
20 min   Demo                   Time Series In Pandas
15 min   Discussion             Date Ranges and Frequencies
25 min   Independent Practice   Manipulating a Time Series
5 min    Conclusion             Recapitulation

What Is Time Series Analysis? (5 mins)

  • Statistical modeling of time-ordered observations
  • Two main goals:
    • Identifying the underlying mechanisms represented by the sequence of observations
    • Forecasting: predicting the future values of a variable described in the time series
  • Examining multiple time series to model dynamic relationships

Instructor Note: Have the students list possible business uses for time series analysis, e.g. financial analysis and forecasting, retail inventory planning, CDC predictions, neuroscience, signal processing, etc.

Check: Recall the np.correlate function from Week 2, which we used to analyse the relationship between GOOG and AAPL stocks.

The DateTime Object (15 mins)

As our data will be ordered by time, we will need a powerful library for dealing with timestamps. Luckily, Python's standard library provides the datetime module, which gives us both simple and complex ways of manipulating dates and times. The cornerstone of the datetime module is the DateTime object (the datetime class), a container representing a point in time that is either aware or naive. An aware DateTime carries time zone and daylight saving time information; a naive DateTime does not.
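To make the aware/naive distinction concrete, here is a minimal sketch using only the standard library. The choice of UTC as the attached time zone is an assumption made for illustration; any tzinfo would do.

```python
from datetime import datetime, timezone

# A naive datetime carries no time zone information.
naive = datetime(2016, 3, 5, 23, 31, 1)

# An aware datetime carries a tzinfo; UTC is used here purely as an example.
aware = datetime(2016, 3, 5, 23, 31, 1, tzinfo=timezone.utc)

print(naive.tzinfo)       # None
print(aware.tzinfo)       # UTC
print(aware.utcoffset())  # 0:00:00

# Ordering comparisons between naive and aware values are rejected.
mixing_error = None
try:
    naive < aware
except TypeError as err:
    mixing_error = err
print(type(mixing_error).__name__)  # TypeError
```

The last few lines show why the distinction matters in practice: Python refuses to order a naive value against an aware one, so it is usually simplest to pick one kind and use it consistently.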

Let's check out the DateTime Documentation.

from datetime import datetime

# Time this lesson plan was written
lesson_date = datetime(2016, 3, 5, 23, 31, 1, 844089)

The DateTime object exposes its components as attributes. Let's try some!

lesson_date.day    # 5
lesson_date.month  # 3
lesson_date.year   # 2016
lesson_date.hour   # 23

NOTE: See Reference A below for the components that can be extracted from a Pandas Timestamp or DatetimeIndex. A standard library DateTime object exposes only a subset of them.

We can also use a timedelta object to shift a DateTime object. Here's an example:

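from datetime import datetime, timedelta

# An offset of one day and twenty seconds
offset = timedelta(days=1, seconds=20)
offset.days          # 1
offset.seconds       # 20
offset.microseconds  # 0

# Shift the current time forward and backward by the offset
now = datetime.now()
now

now + offset
now - offset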
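The shift also works in reverse: subtracting one DateTime from another yields a timedelta. The sketch below reuses lesson_date from earlier; the one_week_later and normalized names are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timedelta

lesson_date = datetime(2016, 3, 5, 23, 31, 1, 844089)
one_week_later = lesson_date + timedelta(weeks=1)

# Subtracting two datetimes gives back a timedelta.
elapsed = one_week_later - lesson_date
print(elapsed)                  # 7 days, 0:00:00
print(elapsed.days)             # 7
print(elapsed.total_seconds())  # 604800.0

# timedelta stores only days, seconds, and microseconds,
# so other units are normalized into those three fields.
normalized = timedelta(hours=25)
print(normalized.days, normalized.seconds)  # 1 3600
```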

Code: Open the datetime.ipynb notebook and complete the 4 exercises

Time Series In Pandas (20 mins)

Let's switch over to the timeseries.ipynb notebook, and I'll walk you through loading a time series into Pandas. We'll also go over applying the DateTime functionality to the time series.
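For readers without the notebook at hand, the general shape of that walkthrough might look like the sketch below. The data is a made-up stand-in, and the column names Date and Close are assumptions, not the notebook's actual contents.

```python
import pandas as pd

# Hypothetical stand-in for the stock data loaded in the notebook.
raw = pd.DataFrame({
    "Date": ["2016-03-01", "2016-03-02", "2016-03-03", "2016-03-04"],
    "Close": [10.0, 11.0, 10.5, 12.0],
})

# Parse the strings into timestamps and use them as the index.
raw["Date"] = pd.to_datetime(raw["Date"])
ts = raw.set_index("Date").sort_index()

# DateTime-style components are available directly on the index.
print(ts.index.dayofweek.tolist())  # [1, 2, 3, 4]

# A DatetimeIndex can be sliced with date strings (both ends inclusive).
print(ts.loc["2016-03-02":"2016-03-03"])
```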

Date Ranges and Frequencies (15 mins)

Using the Pandas documentation, take a few minutes to read about the asfreq and resample methods.

Instructor Note: Give the students a few minutes to read about these methods. Have a brief discussion about the implications of both.

Let's go back to our timeseries.ipynb notebook and implement the two functions to get a better idea of what they do.

Note that asfreq accepts a method keyword argument. Pad, or ffill, will propagate the last valid observation forward. In other words, it will use the value preceding a range of unknown indices to fill in the unknowns. Conversely, backfill, or bfill, will use the first valid value succeeding a range of unknown indices to fill in the unknowns.
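A small sketch of the two fill methods, using a made-up series observed every other day (the sparse name and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical series with a gap between each observation.
sparse = pd.Series(
    [1.0, 3.0, 5.0],
    index=pd.to_datetime(["2016-03-01", "2016-03-03", "2016-03-05"]),
)

no_fill = sparse.asfreq("D")                     # new days are left as NaN
padded = sparse.asfreq("D", method="ffill")      # carry the previous value forward
backfilled = sparse.asfreq("D", method="bfill")  # pull the next value backward

print(padded.tolist())      # [1.0, 1.0, 3.0, 3.0, 5.0]
print(backfilled.tolist())  # [1.0, 3.0, 3.0, 5.0, 5.0]
```

Without a method argument, the two newly created days hold NaN, which is often the safer default when invented values would mislead later analysis.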

Now, let's discuss the following points:

  • What does asfreq do?
  • What does resample do?
  • What is the difference?
  • When would we want to use each?
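One possible way to ground this discussion is a side-by-side run on toy data. The daily series below is invented for illustration, and "7D" is chosen only because it makes the bins easy to follow.

```python
import pandas as pd

# Hypothetical daily series holding the values 1 through 14.
daily = pd.Series(
    range(1, 15),
    index=pd.date_range("2016-03-01", periods=14, freq="D"),
)

# asfreq selects the value sitting at each new timestamp.
picked = daily.asfreq("7D")

# resample groups observations into bins, then aggregates each bin.
averaged = daily.resample("7D").mean()

print(picked.tolist())    # [1, 8]
print(averaged.tolist())  # [4.0, 11.0]
```

Here asfreq discards twelve of the fourteen observations, while resample summarizes all of them, which is a useful starting point for the "when would we want each?" question.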

We can also create our own date ranges using a built-in function, date_range. The periods and freq keyword arguments grant the user fine-grained control over the resulting values. To snap the start and end dates to midnight before the range is generated, pass normalize=True.
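As a brief sketch of these keyword arguments, the example below builds the same five-day range with and without normalize=True. The start timestamp is the lesson date used earlier; the variable names are assumptions.

```python
import pandas as pd

# Five consecutive calendar days, starting from a timestamp with a time part.
with_time = pd.date_range("2016-03-05 23:31:01", periods=5, freq="D")

# normalize=True snaps the start to midnight before generating the range.
at_midnight = pd.date_range(
    "2016-03-05 23:31:01", periods=5, freq="D", normalize=True
)

print(with_time[0])      # 2016-03-05 23:31:01
print(at_midnight[0])    # 2016-03-05 00:00:00
print(len(at_midnight))  # 5
```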

NOTE: See Reference B below for all of the possible frequency aliases.

We are also given a Period object, which can be used to represent a time interval such as a particular day, month, or quarter. A Period has a start time and an end time, and can be created by providing a date and a frequency.
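A minimal sketch of a Period, assuming a monthly frequency and the month in which this lesson was written:

```python
import pandas as pd

# A Period covering the whole of March 2016.
march = pd.Period("2016-03", freq="M")

print(march.start_time)  # 2016-03-01 00:00:00
print(march.end_time.date())  # 2016-03-31

# Adding an integer moves forward by that many periods.
april = march + 1
print(april)  # 2016-04
```

Unlike a timestamp, the Period stands for the entire span, so the start and end times are derived from it rather than stored separately.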

Manipulating a Time Series (25 mins)

Let's break up into groups and look at the different ways we can manipulate our time series.

Use the following to convert df_goog to daily, weekly, and monthly granularity. Once you have data at a daily level, use Period and date_range to practice retrieving data from the DataFrame for a given range or frequency.

  • asfreq
  • resample
  • Period
  • date_range

BONUS:

  • Create a new DataFrame with the daily change for each column in df_goog (hint: you'll need to reset the index to a daily timeframe)
  • Apply models studied previously to gauge the relationship between a random sampling of columns from df_goog
  • Create an Aware DateTime object with Boston's UTC offset.

Recapitulation (5 mins)

  • Recap the objects and methods discussed
  • Discuss how these techniques will help with the Kaggle challenge
  • Repeat the importance of reading the documentation (does it do what you think it does, are you re-inventing the wheel, etc.)

ADDITIONAL RESOURCES

Reference

A) Time/Date components that can be accessed from a Pandas Timestamp or DatetimeIndex (source)

Alias Description
year The year of the datetime
month The month of the datetime
day The days of the datetime
hour The hour of the datetime
minute The minutes of the datetime
second The seconds of the datetime
microsecond The microseconds of the datetime
nanosecond The nanoseconds of the datetime
date Returns datetime.date
time Returns datetime.time
dayofyear The ordinal day of year
weekofyear The week ordinal of the year
week The week ordinal of the year
dayofweek The day of the week with Monday=0, Sunday=6
weekday The day of the week with Monday=0, Sunday=6
quarter Quarter of the date: Jan-Mar = 1, Apr-Jun = 2, etc.
days_in_month The number of days in the month of the datetime
is_month_start Logical indicating if first day of month (defined by frequency)
is_month_end Logical indicating if last day of month (defined by frequency)
is_quarter_start Logical indicating if first day of quarter (defined by frequency)
is_quarter_end Logical indicating if last day of quarter (defined by frequency)
is_year_start Logical indicating if first day of year (defined by frequency)
is_year_end Logical indicating if last day of year (defined by frequency)
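A few of the components above, read from a single Pandas Timestamp and from a DatetimeIndex. This is only a sketch: the timestamp is the lesson date from earlier, and the variable names are assumptions.

```python
import pandas as pd

stamp = pd.Timestamp("2016-03-05 23:31:01")

print(stamp.dayofyear)       # 65
print(stamp.dayofweek)       # 5
print(stamp.quarter)         # 1
print(stamp.days_in_month)   # 31
print(stamp.is_month_start)  # False

# The same components are available element-wise on a DatetimeIndex.
idx = pd.date_range("2016-03-01", periods=3, freq="D")
print(idx.is_month_start.tolist())  # [True, False, False]
```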

B) Time offset aliases (source)

Alias Description
B business day frequency
C custom business day frequency (experimental)
D calendar day frequency
W weekly frequency
M month end frequency
BM business month end frequency
CBM custom business month end frequency
MS month start frequency
BMS business month start frequency
CBMS custom business month start frequency
Q quarter end frequency
BQ business quarter end frequency
QS quarter start frequency
BQS business quarter start frequency
A year end frequency
BA business year end frequency
AS year start frequency
BAS business year start frequency
BH business hour frequency
H hourly frequency
T, min minutely frequency
S secondly frequency
L, ms milliseconds
U, us microseconds
N nanoseconds
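As a short illustration of how these aliases are used, the sketch below passes three of them (D, B, and MS) to date_range. The date bounds are arbitrary choices for the example.

```python
import pandas as pd

start, end = "2016-03-01", "2016-03-14"

calendar_days = pd.date_range(start, end, freq="D")   # every calendar day
business_days = pd.date_range(start, end, freq="B")   # weekdays only
month_starts = pd.date_range("2016-01-01", "2016-06-30", freq="MS")

print(len(calendar_days))  # 14
print(len(business_days))  # 10
print(len(month_starts))   # 6
```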