Programmingempire
In this article on Visualizing Regression Models with lmplot() and residplot() in Seaborn, I will explain these two functions of the Seaborn package.
What is a FacetGrid?
A FacetGrid is a class in the Seaborn package that lays out a grid of subplots (panels), where each panel shows the same kind of plot for a different subset of the data. This lets us visualize the distribution of one variable, or the relationship between several variables, across those subsets. A FacetGrid has up to three dimensions: row, col, and hue.
Example of Using lmplot() Method
The following code example uses a Stroke Prediction Dataset, read from the file stroke_data.csv. The stroke dataset contains the following fields:
['id', 'gender', 'age', 'hypertension', 'heart_disease', 'ever_married', 'work_type', 'Residence_type', 'avg_glucose_level', 'bmi', 'smoking_status', 'stroke']
The function lmplot() draws a scatterplot together with a fitted regression line, and it does so on a FacetGrid. The arguments x and y name the columns of the dataset to plot, and the data parameter takes the data frame itself. The hue parameter adds another dimension: it represents a further column in the same plot by color, and a separate regression line is fitted for each value of that column. The code example given below shows the relationship between two variables, with hue representing a third column of the CSV:
import seaborn as sb
from matplotlib import pyplot as plt
import pandas as pd
df1=pd.read_csv("stroke_data.csv")
print(df1.head())
print(list(df1))
# Example of using lmplot(): draw each x/y pair once for every hue column
pairs = [("age", "bmi"), ("heart_disease", "hypertension"), ("bmi", "hypertension")]
for hue in ["gender", "ever_married", "work_type", "smoking_status"]:
    for x, y in pairs:
        sb.lmplot(x=x, y=y, hue=hue, data=df1)
        plt.show()
Output
Residual of a Linear Regression
When we draw the regression line in a scatter plot, most points usually do not fall exactly on the line. The vertical distance from a data point to the regression line is known as the residual of that point, that is, the observed value minus the value predicted by the line. A point above the regression line has a positive residual. Similarly, points below the line have a negative residual, and points on the line have zero residual.
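The definition above can be sketched in a few lines of NumPy. The five observations are invented for illustration, and the line is fitted with numpy.polyfit rather than with Seaborn, so this only mirrors the idea behind the plots and not Seaborn's internals.

```python
import numpy as np

# Made-up observations (not taken from the stroke or temperature datasets)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.5, 5.5, 8.5, 9.5])

# Fit a straight line y = slope * x + intercept by least squares
slope, intercept = np.polyfit(x, y, deg=1)

predicted = slope * x + intercept
residuals = y - predicted  # observed value minus predicted value

for xi, r in zip(x, residuals):
    position = "above" if r > 0 else "below" if r < 0 else "on"
    print(f"x={xi:.0f}: residual={r:+.2f} ({position} the line)")
```

For a least-squares line with an intercept, the positive and negative residuals cancel out, so their sum is zero up to rounding error.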
Benefits of Determining Residuals
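Since residuals tell us how far the actual values deviate from the predicted values, they help us judge the accuracy of our regression model. In a residual plot, points scattered randomly around zero indicate a reasonable fit, whereas a visible pattern suggests that a straight line does not describe the data well.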
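One way to turn residuals into a single accuracy figure is to compute the root mean squared error (RMSE) and the coefficient of determination (R squared). The sketch below is an illustration under assumptions: the helper name fit_quality and both small datasets are hypothetical and do not come from the article's data.

```python
import numpy as np

def fit_quality(x, y):
    """Return (rmse, r_squared) for a least-squares straight-line fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)        # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    rmse = np.sqrt(ss_res / len(y))
    r_squared = 1.0 - ss_res / ss_tot
    return rmse, r_squared

x = [1, 2, 3, 4, 5]
# Roughly linear data: small residuals
rmse_line, r2_line = fit_quality(x, [2.0, 4.5, 5.5, 8.5, 9.5])
# Curved data (y = x squared): a straight line leaves larger residuals
rmse_curve, r2_curve = fit_quality(x, [1.0, 4.0, 9.0, 16.0, 25.0])

print(f"linear data: RMSE={rmse_line:.3f}, R^2={r2_line:.3f}")
print(f"curved data: RMSE={rmse_curve:.3f}, R^2={r2_curve:.3f}")
```

A smaller RMSE and an R squared closer to 1 both mean that the points lie closer to the fitted line.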
Example of Using residplot() Method
The following example uses a dataset of the Daily Temperature of Major Cities. Since the dataset has a huge number of rows, we apply certain filters. First, the rows of the Asia region are retrieved. After that, India is selected for the Country field and Delhi for the City field. Next, we select the rows for the years 2020 and 1995 in two separate data frames, keeping only the first five months for 1995. Then we draw the regression plots and the residual plots for both data frames.
import seaborn as sb
from matplotlib import pyplot as plt
import pandas as pd
df1=pd.read_csv("city_temperature.csv")
print(df1.head())
print(list(df1))
#Printing unique values of the Region field
print(df1['Region'].unique())
print(df1.loc[df1["Region"]=="Asia"])
#Retrieve data where the Region is Asia
df2=df1.loc[df1["Region"]=="Asia"]
print(df2['Country'].unique())
# From the Asia region, retrieve the data where Country is India
df3=df2.loc[df2["Country"]=="India"]
print(df3)
# Fetch data from the City=Delhi
df4=df3.loc[df3["City"]=="Delhi"]
print(df4)
# Data from the Year=2020
df5=df4.loc[df4["Year"]==2020]
print(df5)
# Draw Regression Plot
sb.regplot(y="AvgTemperature", x="Month", data=df5)
plt.title("Regression Plot on Month-wise Average Temperature for the year 2020")
plt.show()
# Data from the Year=1995
df6=df4.loc[df4["Year"]==1995]
print(df6)
# Data from Year=1995 and months up to 5
df7=df6.loc[df6["Month"]<=5]
print(df7)
# Draw Regression Plot
lim=sb.regplot(y="AvgTemperature", x="Month", data=df7)
lim.set(ylim=(50, 100))
plt.title("Regression Plot on Month-wise Average Temperature for the year 1995")
plt.show()
# Draw Residual Plot
sb.residplot(y="AvgTemperature", x="Month", data=df5)
plt.title("Residual Plot on Month-wise Average Temperature for the year 2020")
plt.show()
# Draw Residual Plot
sb.residplot(y="AvgTemperature", x="Month", data=df7)
plt.title("Residual Plot on Month-wise Average Temperature for the year 1995")
plt.show()
Output (figure: Filtering the Dataset)
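The step-by-step filtering used in the example can also be written with combined boolean masks. The sketch below assumes a small made-up data frame that reuses the column names of the temperature dataset (Region, Country, City, Year, Month, AvgTemperature). The values are invented for illustration only.

```python
import pandas as pd

# Hypothetical rows with the same column names as city_temperature.csv
toy = pd.DataFrame({
    "Region":  ["Asia", "Asia", "Asia", "Europe", "Asia"],
    "Country": ["India", "India", "Japan", "France", "India"],
    "City":    ["Delhi", "Mumbai", "Tokyo", "Paris", "Delhi"],
    "Year":    [2020, 2020, 2020, 2020, 1995],
    "Month":   [1, 1, 1, 1, 3],
    "AvgTemperature": [57.2, 75.1, 44.6, 41.0, 70.3],
})

# Combine the three conditions with & (each one needs its own parentheses)
mask = (
    (toy["Region"] == "Asia")
    & (toy["Country"] == "India")
    & (toy["City"] == "Delhi")
)
delhi = toy.loc[mask]

# Split by year, keeping only months up to May for 1995
delhi_2020 = delhi.loc[delhi["Year"] == 2020]
delhi_1995_to_may = delhi.loc[(delhi["Year"] == 1995) & (delhi["Month"] <= 5)]

print(delhi_2020)
print(delhi_1995_to_may)
```

This gives the same rows as filtering one condition at a time, but without the intermediate data frames.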
Summary
This article on Visualizing Regression Models with lmplot() and residplot() in Seaborn demonstrates the use of both of these functions, which belong to the regression API of the Seaborn package. In general, the lmplot() function shows the relationship between two variables along with a fitted regression line, whereas the residplot() function plots the residuals, which helps us assess how well the regression model fits.