title: DB Lab 2
type: lab
duration: 1:25
creator:
  name: Francesco Mosconi
  city: SF

DB Lab 2

Introduction

The goal of this lab is to build a model that predicts adults' salary class from their demographic data. The dataset comes from the UCI Machine Learning Repository and is stored locally in a SQLite database.

The target variable is binary: adults are partitioned into two classes, with salaries above or below 50 thousand dollars.

This is an example of how data science can complement the information we already have and produce new insights. Think, for example, about taking public demographic data and using it for marketing or city planning.

Can you find some other industry examples where demographic data could be used to predict something?

Exercise

Requirements

  • Extract the data from the SQLite database and import it into Pandas
  • Decide what to do with missing values
  • Explore the feature distributions:
    • categorical features
    • numerical features
  • Build a predictive model
  • Evaluate the accuracy of the model and compare to a benchmark
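
The requirements above can be sketched end to end as follows. This is only a minimal illustration, not the lab's solution: the in-memory database, the table name `adult`, the column names, the `>50K` label, and the randomly generated rows are all hypothetical stand-ins for the real SQLite file. Filling missing categories with `"Unknown"`, using logistic regression, and taking the majority class as the benchmark are just some of several reasonable choices.

```python
import random
import sqlite3

import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the lab database: an in-memory SQLite table
# filled with made-up rows. Table and column names are assumptions.
random.seed(0)
rows = []
for _ in range(200):
    age = random.randint(18, 70)
    hours = random.randint(10, 60)
    workclass = random.choice(["Private", "Self-emp", None])  # None becomes NULL
    salary = ">50K" if age + hours + random.randint(-20, 20) > 85 else "<=50K"
    rows.append((age, workclass, hours, salary))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE adult "
    "(age INTEGER, workclass TEXT, hours_per_week INTEGER, salary TEXT)"
)
conn.executemany("INSERT INTO adult VALUES (?, ?, ?, ?)", rows)

# 1. Extract the data from SQLite and import it into Pandas.
df = pd.read_sql("SELECT * FROM adult", conn)
conn.close()

# 2. Decide what to do with missing values: here they become their own category.
missing_before = int(df["workclass"].isna().sum())
df["workclass"] = df["workclass"].fillna("Unknown")

# 3. Explore the feature distributions.
categorical_summary = df["workclass"].value_counts()          # categorical feature
numerical_summary = df[["age", "hours_per_week"]].describe()  # numerical features

# 4. Build a predictive model on one-hot encoded features.
X = pd.get_dummies(
    df[["age", "hours_per_week", "workclass"]], columns=["workclass"], dtype=float
)
y = (df["salary"] == ">50K").astype(int)  # 1 = above 50K, 0 = otherwise
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 5. Evaluate accuracy and compare it to a majority-class benchmark.
benchmark = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model_accuracy = model.score(X_test, y_test)
benchmark_accuracy = benchmark.score(X_test, y_test)
print(f"model: {model_accuracy:.3f}  benchmark: {benchmark_accuracy:.3f}")
```

Comparing against the majority-class benchmark matters here because the two salary classes need not be balanced, so a model that always predicts the more common class can already score well on accuracy.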

Bonus:

  • Tune the model parameters to improve the accuracy score
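
One possible way to approach the bonus is a cross-validated grid search. The sketch below assumes scikit-learn and uses synthetic data from `make_classification` as a placeholder for the prepared lab features; the choice of logistic regression and the candidate values for `C` are illustrative assumptions, not values prescribed by the lab.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic placeholder data; in the lab this would be the encoded features
# and the binary salary target.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Candidate regularization strengths to try (illustrative values).
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# 5-fold cross-validation on the training set only, scored by accuracy.
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

best_params = search.best_params_
cv_accuracy = search.best_score_                 # mean CV accuracy of the best setting
test_accuracy = search.score(X_test, y_test)     # accuracy on held-out data
print(best_params, round(cv_accuracy, 3), round(test_accuracy, 3))
```

Keeping the test set out of the search means the final accuracy is measured on data that played no part in choosing the parameters.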

Starter code

Starter Code

Solution Code