Housing Price Prediction

Welcome to my very first Machine Learning / Data Science Project.
This post is a continuation of this project's earlier parts: Part 1 - Data Extraction, Part 2 - EDA and Visualization, Part 3 - Preprocessing, and Part 4 - Modeling. Please check them out if you haven't already.
I will be sharing the process and updates through these blog posts.
In this blog post I give an overview of the whole project and focus on the last, but not least, part of a Machine Learning / Data Science project: Deployment!!
You can also view this project on Google Colab.
Overview
This Project Notebook covers all the necessary steps to complete the Machine Learning task of predicting housing prices on the California Housing dataset available in scikit-learn. We will perform the following steps to successfully create a model for house price prediction:
1. Data Extraction (See Details in Previous Blog; a quick code sketch follows this list)
Import libraries
Import Dataset from scikit-learn
Understanding the given Description of Data and the problem Statement
Taking a look at the different Inputs and details available with the dataset
Storing the obtained dataset in a Pandas DataFrame
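For reference, here is a minimal sketch of this extraction step (the variable name cal_housing_dataset matches the one used in the Deployment section below; full details are in the Part 1 blog):

```python
from sklearn.datasets import fetch_california_housing

# Fetch the California Housing dataset with features/target as pandas objects
cal_housing_dataset = fetch_california_housing(as_frame=True)

# Read the provided description of the data and the problem statement
print(cal_housing_dataset.DESCR)

# Store the obtained dataset (inputs + target) in a single Pandas DataFrame
df = cal_housing_dataset.frame
print(df.head())
```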
2. EDA (Exploratory Data Analysis) and Visualization (See Details in Previous Blog; a quick code sketch follows this list)
Getting a closer Look at the obtained Data
Exploring different Statistics of the Data (Summary and Distributions)
Looking at Correlations (between individual features and between Input features and Target)
Geospatial Data / Coordinates - Longitude and Latitude features
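A condensed sketch of these EDA steps (the column names come from the scikit-learn dataset; the full analysis with all plots is in the Part 2 blog):

```python
import matplotlib.pyplot as plt

# Summary statistics and distributions of the features
print(df.describe())
df.hist(bins=50, figsize=(12, 8))

# Correlations between individual features and with the target (MedHouseVal)
print(df.corr()["MedHouseVal"].sort_values(ascending=False))

# Geospatial view: Longitude and Latitude colored by median house value
df.plot.scatter(x="Longitude", y="Latitude", c="MedHouseVal",
                cmap="viridis", alpha=0.4)
plt.show()
```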
3. Preprocessing (See Details in Previous Blog; a quick code sketch follows this list)
Dealing with Duplicate and Null (NaN) values
Dealing with Categorical features (e.g. Dummy coding)
Dealing with Outlier values
Visualization (Box-Plots)
Using IQR
Using Z-Score
Separating Target and Input Features
Target feature Normalization (Plots and Tests)
Splitting Dataset into train and test sets
Feature Scaling (Feature Transformation)
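A condensed sketch of this preprocessing pipeline (the columns chosen for outlier removal and the 1.5×IQR rule are illustrative assumptions here; see the Part 3 blog for the actual choices). The scaler created here is the one reused in the Deployment section:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Drop duplicate rows and rows with null (NaN) values
df = df.drop_duplicates().dropna()

# Remove outliers using the IQR rule (illustrative choice of columns)
for col in ["AveRooms", "AveBedrms", "Population", "AveOccup"]:
    q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
    iqr = q3 - q1
    df = df[(df[col] >= q1 - 1.5 * iqr) & (df[col] <= q3 + 1.5 * iqr)]

# Separate target and input features
X = df.drop(columns="MedHouseVal")
y = df["MedHouseVal"]

# Split into train and test sets, then scale the features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```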
4. Modeling (See Details in Previous Blog; a quick code sketch follows this list)
Specifying Evaluation Metric R squared (using Cross-Validation)
Model Training - trying multiple models and hyperparameters:
Linear Regression
Polynomial Regression
Ridge Regression
Decision Trees Regressor
Random Forests Regressor
Gradient Boosted Regressor
eXtreme Gradient Boosting (XGBoost) Regressor
Support Vector Regressor
Model Selection (by comparing evaluation metrics)
Learning Feature Importances and Relations
Prediction
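A compact sketch of the model comparison, continuing from the preprocessing sketch above (only a few of the models are shown and the hyperparameters are illustrative; the full comparison is in the Part 4 blog):

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Random Forest": RandomForestRegressor(random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

# Compare models using cross-validated R squared on the training set
for name, model in models.items():
    scores = cross_val_score(model, X_train_scaled, y_train, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")

# Fit the selected model and look at its feature importances
gradient_boosting_model = GradientBoostingRegressor(random_state=42)
gradient_boosting_model.fit(X_train_scaled, y_train)
for feature, importance in zip(X.columns, gradient_boosting_model.feature_importances_):
    print(f"{feature}: {importance:.3f}")
```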
5. Deployment
Exporting the trained model to be used for later predictions (by storing the model object as a byte file - Pickling). A pickle file is a serialized form of the model object that can be deployed on web servers.

```python
import pickle

# Save the trained model object as a byte file (wb = write bytes)
pickle.dump(gradient_boosting_model, open("gradient_boosting_model.pkl", "wb"))

# Load and use the pickle file (model object) (rb = read bytes)
pickled_model = pickle.load(open("gradient_boosting_model.pkl", "rb"))

# Make a prediction using the loaded pickled_model:
# scale the first record of the dataset the same way the training data was scaled
new_data = scaler.transform(cal_housing_dataset.data.loc[0].values.reshape(1, -1))
new_data
```
Output:

```
array([[ 2.35209557,  0.98489275,  0.5903677 , -0.14943682, -0.98178766,
        -0.04651314,  1.0518404 , -1.32288385]])
```
```python
## Prediction
pickled_model.predict(new_data)
```

Output:

```
array([4.50939797])
```

The prediction is the same as the one made before pickling, confirming the model was saved and loaded correctly.
We can download this trained model and scaler and use them for deployment on the server.
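As a pointer for that server deployment, here is a minimal sketch of serving the pickled model with Flask. The endpoint name, the JSON input format, and the file name scaler.pkl (this assumes the fitted scaler was pickled the same way as the model) are illustrative assumptions, not part of the project itself:

```python
import pickle

import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the serialized model and scaler once, at startup
model = pickle.load(open("gradient_boosting_model.pkl", "rb"))
scaler = pickle.load(open("scaler.pkl", "rb"))  # assumes the scaler was also pickled

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [8.3252, 41.0, 6.98, 1.02, 322.0, 2.56, 37.88, -122.23]}
    features = np.array(request.json["features"]).reshape(1, -1)
    scaled = scaler.transform(features)  # scale exactly like the training data
    prediction = model.predict(scaled)
    return jsonify({"prediction": float(prediction[0])})

if __name__ == "__main__":
    app.run(debug=True)
```

A POST request to /predict with the eight feature values would then return the predicted median house value as JSON.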
Thank you for your time!! The notebook with the project's progress so far is available via the Google Colab link above.
Did you like my Notebook and my approach??