The goal of this project is to accurately predict property prices in England and Wales using models built on features readily available from open data sources.
The first step is to retrieve data from the Land Registry's website and analyse house prices using the limited features it makes available:
- Postcode
- Property type
- Tenure
- Build type
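A minimal loading sketch for this first step, assuming the Land Registry Price Paid CSV is downloaded as a headerless file whose 16 fields follow the Land Registry's published field order. The column names below are our own hypothetical labels; we assume tenure is carried by the `duration` field and build type by the `old_new` field. The sample row is made up purely to show the shape of the data.

```python
import io

import pandas as pd

# Assumed field order of the headerless Price Paid CSV (labels are our own).
PPD_COLUMNS = [
    "transaction_id", "price", "date_of_transfer", "postcode",
    "property_type", "old_new", "duration", "paon", "saon", "street",
    "locality", "town_city", "district", "county", "ppd_category", "record_status",
]

# Columns used in the first analysis: price, date, and the four features
# (postcode, property type, tenure = duration, build type = old_new).
FEATURES = ["price", "date_of_transfer", "postcode", "property_type", "duration", "old_new"]


def load_price_paid(source) -> pd.DataFrame:
    """Read a headerless Price Paid CSV and keep only the initial feature set."""
    df = pd.read_csv(source, header=None, names=PPD_COLUMNS)
    return df[FEATURES]


# Hypothetical example row, for illustration only.
SAMPLE = (
    '"{TEST-0001}","250000","2019-05-01 00:00","SW1A 1AA","F","N","L","1","",'
    '"EXAMPLE STREET","","LONDON","CITY OF WESTMINSTER","GREATER LONDON","A","A"\n'
)
example = load_price_paid(io.StringIO(SAMPLE))
```

With a real download, `load_price_paid("path/to/file.csv")` would be used instead of the in-memory sample.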
Once this initial analysis is complete, the next step is to pull in more features to improve and strengthen the model (in no particular order):
- Number of bedrooms
- Square footage
- Proximity and abundance of transport links
- Local wealth (referencing information)
- Crime rates in the area
- Council tax banding
- Property density
- Property performance (e.g. EPC certificates)
- Rent prices in the area
- Green space
- Average rainfall and/or other weather indicators
- Postcode co-ordinates: https://github.com/dwyl/uk-postcodes-latitude-longitude-complete-csv
- Pricing model: http://www.doc.ic.ac.uk/~mpd37/theses/2015_beng_aaron-ng.pdf
We then format these columns to support further analysis, for example converting the transaction dates to datetime values and extracting the month.
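A short sketch of the date formatting, assuming the transfer date lives in a column we have hypothetically named `date_of_transfer`. It converts the column with `pd.to_datetime` and derives year, month, and a monthly period that can be used for grouping prices over time.

```python
import pandas as pd


def add_date_features(df: pd.DataFrame, date_col: str = "date_of_transfer") -> pd.DataFrame:
    """Convert the transfer date to datetime and derive year/month columns."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    out["year"] = out[date_col].dt.year
    out["month"] = out[date_col].dt.month
    # Monthly period (e.g. 2019-05), convenient for grouping prices by month.
    out["year_month"] = out[date_col].dt.to_period("M")
    return out
```

Working on a copy keeps the raw dataframe untouched, which makes it easier to re-run the formatting step.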
As we want to leverage the location of each property to see whether, and how, it affects the price, we will need to run further transformations on the postcode.
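The exact postcode transformations are not specified here, so the following is only one plausible sketch. It normalises the postcode and splits it into the standard UK components: outward code, inward code, area (leading letters), and sector (outward code plus the first digit of the inward code). Coarser units like area or sector let us group properties geographically without treating every postcode as its own category. The column names are hypothetical.

```python
import pandas as pd


def split_postcode(df: pd.DataFrame, col: str = "postcode") -> pd.DataFrame:
    """Add outward, inward, area and sector columns derived from a postcode column."""
    out = df.copy()
    # Normalise: uppercase and remove all whitespace ("sw1a 1aa" -> "SW1A1AA").
    pc = out[col].str.upper().str.replace(r"\s+", "", regex=True)
    # The inward code is always the last three characters (digit + two letters).
    out["outward"] = pc.str[:-3]
    out["inward"] = pc.str[-3:]
    # Area: the one or two leading letters of the outward code ("SW" from "SW1A").
    out["area"] = out["outward"].str.extract(r"^([A-Z]{1,2})", expand=False)
    # Sector: outward code plus first digit of the inward code ("SW1A 1").
    out["sector"] = out["outward"] + " " + out["inward"].str[0]
    return out
```

Missing postcodes propagate as NaN through every derived column rather than raising an error.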
Next, we add longitude and latitude values by joining them onto the postcodes.
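A sketch of the join, assuming the coordinates CSV has `postcode`, `latitude` and `longitude` columns (worth checking against the downloaded file). Both sides are normalised to the same "OUTWARD INWARD" format before merging, since differences in case or spacing are a common cause of failed joins. `indicator=True` adds a `_merge` column so unmatched rows can be counted afterwards. The key column `pc_key` is a hypothetical name.

```python
import pandas as pd


def normalise_postcode(s: pd.Series) -> pd.Series:
    """Uppercase, strip whitespace, then re-insert a single space before the inward code."""
    s = s.str.upper().str.replace(r"\s+", "", regex=True)
    return s.str[:-3] + " " + s.str[-3:]


def add_coordinates(prices: pd.DataFrame, coords: pd.DataFrame) -> pd.DataFrame:
    """Left-join latitude/longitude onto the price data by normalised postcode."""
    left = prices.assign(pc_key=normalise_postcode(prices["postcode"]))
    right = (
        coords.assign(pc_key=normalise_postcode(coords["postcode"]))
        .dropna(subset=["pc_key"])          # pandas would otherwise match NaN keys to NaN keys
        .drop_duplicates(subset="pc_key")   # avoid duplicating price rows
        [["pc_key", "latitude", "longitude"]]
    )
    return left.merge(right, on="pc_key", how="left", indicator=True)
```

The number of failed joins is then `(merged["_merge"] == "left_only").sum()`.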
A number of postcodes are missing from the dataframe, and close to 90% of the rows with a missing postcode have the property type 'other'. As this property type is ambiguous and accounts for the majority of missing postcodes, we will drop all observations with this property type.
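A sketch of how the share of missing postcodes by property type can be checked before dropping the 'other' category. Column names are hypothetical, and we assume that in the raw Price Paid file 'other' may be encoded as a single letter (e.g. `"O"`), so the value to drop is passed in rather than hard-coded.

```python
import pandas as pd


def missing_postcode_share(df: pd.DataFrame, type_col: str = "property_type",
                           pc_col: str = "postcode") -> pd.Series:
    """Proportion of rows with a missing postcode, broken down by property type."""
    missing = df[df[pc_col].isna()]
    return missing[type_col].value_counts(normalize=True)


def drop_property_type(df: pd.DataFrame, value: str,
                       type_col: str = "property_type") -> pd.DataFrame:
    """Remove every observation whose property type equals `value`."""
    return df[df[type_col] != value].copy()
```

Checking the share first confirms that dropping the category removes most of the missing postcodes, rather than discarding it blindly.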
In addition, the latitude/longitude CSV failed to join on 64,321 rows; we will need to investigate further why so many properties are missing coordinates.
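One possible first diagnostic for the failed joins, assuming the join was done as a pandas left merge with `indicator=True` on a normalised key column (hypothetically named `pc_key`). It splits unmatched rows into three buckets: no postcode at all, a malformed postcode, or a well-formed postcode that is simply absent from the lookup file (e.g. a newer or retired postcode). The regex is a simplified UK postcode pattern, not the full official validation rules.

```python
import pandas as pd

# Simplified UK postcode pattern: outward code, single space, inward code.
POSTCODE_RE = r"^[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}$"


def diagnose_unmatched(merged: pd.DataFrame, key_col: str = "pc_key",
                       indicator_col: str = "_merge") -> pd.Series:
    """Count unmatched rows by likely cause of the failed join."""
    unmatched = merged[merged[indicator_col] == "left_only"]
    key = unmatched[key_col]
    # Missing keys count as not well-formed; astype(bool) keeps ~ a boolean negation.
    well_formed = key.str.match(POSTCODE_RE, na=False).astype(bool)
    return pd.Series({
        "missing_postcode": int(key.isna().sum()),
        "malformed_postcode": int((key.notna() & ~well_formed).sum()),
        "well_formed_not_in_lookup": int(well_formed.sum()),
    })
```

If most failures land in the last bucket, the lookup file is the likely gap; if they are malformed, the normalisation step needs attention.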
I ran some basic visualisations to get a sense of the data and found that property prices are heavily right-skewed. As such, I applied a mask to include only properties with a price of less than £1 million.
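A minimal sketch of the £1 million mask, also reporting the sample skewness (`Series.skew`) before and after so the effect of the cap can be quantified. The `price` column name is an assumption.

```python
import pandas as pd

PRICE_CAP = 1_000_000  # £1 million threshold


def cap_prices(df: pd.DataFrame, price_col: str = "price", cap: float = PRICE_CAP):
    """Keep rows priced below `cap`; return the filtered frame and skewness before/after."""
    skew_before = df[price_col].skew()
    capped = df[df[price_col] < cap].copy()
    skew_after = capped[price_col].skew()
    return capped, skew_before, skew_after
```

A positive skewness that shrinks towards zero after masking is consistent with the right-skew seen in the visualisations.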