
How to Implement Linear Regression With Multiple Variables?

This article describes how to implement linear regression with multiple variables.

Problem Statement

Use the sklearn library to create a linear regression model with multiple variables. Load a well-known dataset, split it into training and testing sets, and then train the model to predict a target variable based on one or more features. For instance, you can use the following dataset.

https://raw.githubusercontent.com/sachinmotwani20/NPTEL-ML_Datasets/main/ScoresPrediction.
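
For context, "multiple variables" means the model predicts the target as a weighted sum of several features plus an intercept. In generic notation (the symbols below are standard textbook names, not column names from the dataset), the fitted model has the form:

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p
```

Here $x_1, \dots, x_p$ are the feature values, $\beta_0$ is the intercept, and $\beta_1, \dots, \beta_p$ are the coefficients. scikit-learn's LinearRegression estimates them by ordinary least squares, that is, by minimizing the sum of squared differences between the predicted and actual target values on the training data.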

Solution

To create a linear regression model with multiple variables using the scikit-learn (sklearn) library, follow these steps.

  1. Import the necessary libraries.
  2. Load the dataset from the provided URL.
  3. Split the dataset into training and testing sets.
  4. Create and train the linear regression model.
  5. Evaluate the model’s performance.

The following Python code example does this.

# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Load the dataset from the provided URL (pandas can read a CSV directly from a URL)
url = "https://raw.githubusercontent.com/sachinmotwani20/NPTEL-ML_Datasets/main/ScoresPrediction.csv"
data = pd.read_csv(url)

# Check the first few rows of the dataset
print(data.head())

# Separate the features (X) and target variable (y)
X = data.drop('FinalYrScore', axis=1)  # Assuming 'FinalYrScore' is the target variable
y = data['FinalYrScore']

# Split the dataset into training and testing sets (e.g., 80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error (MSE): {mse}")
print(f"R-squared (R2) Score: {r2}")

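# Plot actual and predicted scores against one feature.
# With several features, the predictions do not fall on a single line in this
# 2-D view, so both series are drawn as points rather than joined by a line.
plt.figure(figsize=(10, 6))
plt.scatter(X_test['FirstYrScore'], y_test, color='blue', label='Actual Scores')
plt.scatter(X_test['FirstYrScore'], y_pred, color='red', marker='x', label='Predicted Scores')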
plt.xlabel('First Year Score')
plt.ylabel('Final Year Score')
plt.title('Linear Regression for Score Prediction')
plt.legend()
plt.grid(True)
plt.show()

Output

[Figure: Evaluating Model Performance]
[Figure: Linear Regression With Multiple Variables]

This code loads the dataset, splits it into training and testing sets, trains a linear regression model, evaluates its performance using the mean squared error (MSE) and the R-squared score, and then plots the model's predictions alongside the actual data points.
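
To see what the trained model has learned, you can inspect one coefficient per feature together with the intercept. The following is a minimal sketch rather than part of the original solution: it uses synthetic data with two hypothetical feature names (feature_a, feature_b) and made-up weights instead of the ScoresPrediction dataset, so that the recovered coefficients can be compared with known values. It also recomputes MSE and R-squared from their definitions to show what the sklearn metric functions return.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the ScoresPrediction dataset).
# The feature names and weights below are hypothetical, chosen for illustration.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "feature_a": rng.uniform(0, 100, size=200),
    "feature_b": rng.uniform(0, 100, size=200),
})
# Target built from known weights plus a little random noise
y = 5.0 + 0.6 * X["feature_a"] + 0.3 * X["feature_b"] + rng.normal(0, 1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# coef_ holds one coefficient per feature, in the same order as the columns of X
coefficients = pd.Series(model.coef_, index=X.columns)
print(coefficients)
print("Intercept:", model.intercept_)

# Evaluate on the test set with sklearn's metric functions
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# The same metrics computed directly from their definitions
residuals = y_test - y_pred
mse_manual = np.mean(residuals ** 2)
r2_manual = 1 - np.sum(residuals ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print("MSE:", mse, "manual:", mse_manual)
print("R2:", r2, "manual:", r2_manual)
```

On the real dataset, the same two attributes (model.coef_ and model.intercept_) are available after calling fit on the model from the main example. Each coefficient is the change in the predicted target for a one-unit increase in that feature, with the other features held fixed.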


