Zillow Home Value Prediction

Introduction

Improving Zestimate

Zillow currently employs its ML-based Zestimate tool, which uses numerous data points collected on houses across the US to estimate each home's market value. The tool has proven to be very powerful, with predictions achieving a median margin of error of less than 5%. Because the tool is free and effective, Zillow has disrupted conventional real estate and stands as one of the future leaders in the industry.

Zillow started the “Zillow Prize: Zillow Home Value Prediction (Zestimate)” competition to offer participants an opportunity to improve Zestimate for the chance to win $1.2M. The competition took place in two phases:

  1. A qualifying round where participants aimed at reducing Zestimate’s residual error.
    • Submissions were evaluated on the mean absolute error between the predicted log error and actual log error.
    • In essence, participants were trying to predict the difference between the Zestimate and the actual home price once it sold.
  2. Those who qualified for the second round built a home valuation model and it was tested on the market for 3 months.
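The phase-one target and its scoring rule can be sketched as follows. This is a minimal illustration with made-up numbers, not competition data: the target is the log error, log(Zestimate) − log(actual sale price), and submissions are scored by the mean absolute error between predicted and actual log errors.

```python
import numpy as np

def log_error(zestimate, sale_price):
    """The competition target: log(Zestimate) - log(actual sale price)."""
    return np.log(zestimate) - np.log(sale_price)

def mae(predicted, actual):
    """Mean absolute error, the phase-one evaluation metric."""
    return float(np.mean(np.abs(np.asarray(predicted) - np.asarray(actual))))

# Hypothetical example: three homes with Zestimates and eventual sale prices.
actual = log_error(np.array([310_000, 495_000, 720_000]),
                   np.array([300_000, 510_000, 700_000]))
predicted = np.array([0.03, -0.02, 0.02])  # a model's guesses at the log error
score = mae(predicted, actual)
```

A perfect model would predict each home's log error exactly, giving a score of 0; lower scores rank higher on the leaderboard.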

We will address phase one of this project and explore notebooks whose aim was to achieve the lowest residual error on the given data.

Data

The data was provided by Kaggle and was sourced from 2016 and 2017 in three counties in the Los Angeles, CA area. For each year, there are two CSV files: a test dataset with over 2M houses and numerous variables describing them, and a training dataset that contains the actual log error between the Zestimate and the sale price. The instances are connected through the parcelid variable, which serves as a unique primary key for each observation.
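The join described above can be sketched with pandas. The frames below are tiny hypothetical stand-ins for the Kaggle CSVs (column names other than parcelid and logerror are illustrative): a properties table describing each home and a training table holding the observed log error for homes that sold.

```python
import pandas as pd

# Hypothetical stand-in for the per-home features file.
properties = pd.DataFrame({
    "parcelid": [101, 102, 103],
    "bedroomcnt": [3, 4, 2],
    "bathroomcnt": [2.0, 3.0, 1.0],
})

# Hypothetical stand-in for the training file: only sold homes appear here.
train = pd.DataFrame({
    "parcelid": [101, 103],
    "logerror": [0.025, -0.041],
})

# Inner join on the shared primary key pairs each sold home's features
# with its target log error, yielding the training table for a model.
train_df = train.merge(properties, on="parcelid", how="inner")
```

An inner join keeps only the homes with a recorded sale, which is the subset a phase-one model can actually be trained on.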