
title: DB Lab 1
type: lab
duration: 1:25
creator: Francesco Mosconi (SF)

DB Lab 1

Introduction

Note: This can be a pair programming activity or done independently.

In this lab we will explore a small subset of the Enron email corpus, stored as a database.

This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). The original dataset contains about 0.5M messages from about 150 users, mostly senior management of Enron. The data was originally made public and posted to the web by the Federal Energy Regulatory Commission during its investigation. The subset we use here is much smaller than the full dataset.

Exercise

Requirements

  • Load the data from the enron.db file into Pandas
  • Perform exploratory data analysis
  • Use the merge function from pandas to combine data from different tables
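
The requirements above can be sketched roughly as follows. This is a minimal illustration, not the lab's solution: the table and column names (`employees`, `messages`, `eid`, `from_eid`) are hypothetical stand-ins, since the actual schema of enron.db is not described here, and an in-memory database replaces the file so the sketch runs on its own.

```python
import sqlite3
import pandas as pd

# In the lab you would connect to the provided file instead:
#   conn = sqlite3.connect("enron.db")
conn = sqlite3.connect(":memory:")  # stand-in so the sketch is self-contained

# Hypothetical tables and columns, NOT the actual enron.db schema.
conn.executescript("""
CREATE TABLE employees (eid INTEGER PRIMARY KEY, name TEXT, department TEXT);
CREATE TABLE messages (mid INTEGER PRIMARY KEY, from_eid INTEGER, subject TEXT);
INSERT INTO employees VALUES (1, 'Alice', 'Legal'), (2, 'Bob', 'Trading');
INSERT INTO messages VALUES (10, 1, 'Contract'), (11, 1, 'Review'), (12, 2, 'Prices');
""")

# Discover which tables exist before loading anything.
tables = pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", conn)
table_names = sorted(tables["name"])

# Load each table into its own DataFrame.
employees = pd.read_sql("SELECT * FROM employees", conn)
messages = pd.read_sql("SELECT * FROM messages", conn)
conn.close()

# Quick exploratory checks: shape and column types.
print(messages.shape)
print(messages.dtypes)

# Combine the tables: attach sender details to every message.
merged = messages.merge(employees, left_on="from_eid", right_on="eid", how="left")
print(merged[["mid", "name", "subject"]])
```

A left merge keeps every message even when its sender has no matching row in the other table, which is usually the safer default during exploration.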

Bonus:

  • Answer a few more questions using the merge function from pandas

Starter code

The starter code folder contains a Jupyter notebook with a series of questions. Most questions can be solved in more than one way and using different tools. Feel free to use both SQL and Pandas to tackle them.
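
As an illustration of solving one question both ways, the sketch below counts messages per sender first in SQL and then in Pandas. The `messages` table and its columns are hypothetical, not the real enron.db schema, and the question itself is only an example, not necessarily one from the notebook.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in data; in the lab, connect to enron.db instead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (mid INTEGER PRIMARY KEY, from_eid INTEGER, subject TEXT);
INSERT INTO messages VALUES (10, 1, 'Contract'), (11, 1, 'Review'), (12, 2, 'Prices');
""")

# Approach 1: let the database do the aggregation.
sql_counts = pd.read_sql(
    "SELECT from_eid, COUNT(*) AS n_messages "
    "FROM messages GROUP BY from_eid ORDER BY from_eid",
    conn,
)

# Approach 2: load the raw rows and aggregate in Pandas.
messages = pd.read_sql("SELECT * FROM messages", conn)
pandas_counts = messages.groupby("from_eid").size().reset_index(name="n_messages")
conn.close()

# Both approaches should give the same counts per sender.
same_counts = sql_counts["n_messages"].tolist() == pandas_counts["n_messages"].tolist()
print(sql_counts)
print(pandas_counts)
```

Aggregating in SQL avoids loading every row into memory, while aggregating in Pandas keeps the raw rows available for further exploration.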

Solution Code