Linear Regression

Linear Regression using sklearn

Simple Linear Regression using scikit-learn
Linear regression is a statistical model that fits a linear relationship between a dependent variable and one or more independent variables, and uses that relationship to make predictions.
Here we demonstrate a simple linear regression model using the scikit-learn library in Python.
Scikit-learn (imported as sklearn) is a Python library with many efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction. It provides algorithms such as support vector machines, random forests, and k-nearest neighbors.
The dataset used for this model contains the years of experience and the salary of employees, where the salary depends on the employee's years of experience. We are going to derive a linear relationship between years of experience and salary.
You can download the dataset here: Dataset
Implementing the linear regression model with sklearn:

Step 1: Import all the required libraries for the model.

# importing the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Step 2: Import the dataset.

# read the dataset using pandas
Data = pd.read_csv("Salary_Data.csv")
print("Data imported successfully")
The dataset is read with the read_csv() function from pandas. If the file is read without an error, the message "Data imported successfully" is printed.
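pd.read_csv() accepts a file-like object as well as a file path. The sketch below reads a tiny CSV from an in-memory string, so it runs without the download. The column names and values are hypothetical placeholders, not the real contents of Salary_Data.csv.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for Salary_Data.csv (made-up values)
csv_text = """Experience,Salary
1.0,40000
2.0,45000
3.0,52000
"""

# read_csv parses the header line into column names and the rest into rows
data = pd.read_csv(io.StringIO(csv_text))
print(data.shape)          # (rows, columns)
print(list(data.columns))  # names taken from the header line
```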
Data.head()    # displays the top 5 rows of the data
  • The head() method of a pandas DataFrame returns the first n rows. If no argument is passed, n defaults to 5.
[Output: the first five rows of the dataset]
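To show how head() behaves with and without an argument, here is a short sketch on a small, hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical 8-row DataFrame used only to illustrate head()
df = pd.DataFrame({
    "Experience": [1, 2, 3, 4, 5, 6, 7, 8],
    "Salary": [40, 45, 52, 56, 63, 67, 74, 80],
})

first_five = df.head()    # no argument: n defaults to 5
first_three = df.head(3)  # explicit n
print(len(first_five), len(first_three))  # 5 3
```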

Step 3: Plotting the data in a graph

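plt.scatter(Data.iloc[:, 0], Data.iloc[:, 1])
plt.title("Salary Prediction with experience")
plt.xlabel("Experience in years")
plt.ylabel("Salary")
  • A scatter plot of the data shows how the values are distributed and whether a straight line is a reasonable fit.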

[Figure: "Salary Prediction with experience", scatter plot of salary against experience in years]

# Separate the feature (years of experience) from the target (salary)
X = Data.iloc[:, :-1].values
Y = Data.iloc[:, 1].values
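The two iloc selections above return arrays of different shapes. iloc[:, :-1] keeps every column except the last and gives a 2-D array, which is the layout sklearn expects for features. iloc[:, 1] picks a single column and gives a 1-D array for the target. The sketch below checks this on a hypothetical two-column DataFrame.

```python
import pandas as pd

# Hypothetical two-column DataFrame: feature first, target last
data = pd.DataFrame({
    "Experience": [1.0, 2.0, 3.0, 4.0],
    "Salary": [40000, 45000, 52000, 56000],
})

X = data.iloc[:, :-1].values  # every column except the last -> 2-D array
Y = data.iloc[:, 1].values    # second column only -> 1-D array

print(X.shape)  # (4, 1)
print(Y.shape)  # (4,)
```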

Step 4: Splitting the data into training and testing data

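# Split the data for train and test
from sklearn.model_selection import train_test_split
# test_size and random_state are assumed values; the original split is not shown
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
  • To split the data into training and test sets, import the train_test_split function from sklearn.model_selection.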
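As a sketch of how train_test_split divides the rows, the example below splits 10 hypothetical samples with test_size=0.2. Both that value and random_state=0 are illustrative choices. Features and targets are shuffled together, so each x stays paired with its y.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples, one feature, target y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)
Y = 2 * X.ravel() + 1

# 20% of the rows go to the test set; random_state makes the shuffle repeatable
x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)
print(x_train.shape, x_test.shape)  # (8, 1) (2, 1)
```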

Step 5: Train the model

# Training the model
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)
print("Training successful")
  • The model is trained with the fit() method of sklearn's LinearRegression class. "Training successful" is printed once fit() finishes without raising an error.
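To see what fit() learns, this sketch trains LinearRegression on hypothetical, perfectly linear data (y = 3x + 5). The fitted coef_ and intercept_ should recover the slope and the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, perfectly linear data: y = 3x + 5
x_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = 3 * x_train.ravel() + 5

regressor = LinearRegression()
regressor.fit(x_train, y_train)  # fit() returns the fitted estimator itself

print(regressor.coef_)       # approximately [3.]
print(regressor.intercept_)  # approximately 5.0
```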

Step 6: Plot Regression line

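# plotting the regression line
plt.scatter(x_train, y_train, color="blue")
plt.plot(x_train, regressor.predict(x_train), color="red")
[Figure: training data with the fitted regression line]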
# Intercept and coefficient of the line
print('Intercept of the model:', regressor.intercept_)
print('Coefficient of the line:', regressor.coef_)
  • A simple linear regression model is represented by the equation y = mx + c, where:
  • m = coefficient (slope) of the line, available as regressor.coef_
  • c = intercept of the line, available as regressor.intercept_
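With a single feature, ordinary least squares has a closed form: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and c = ȳ − m·x̄. The sketch below computes m and c this way on hypothetical noisy data and checks that LinearRegression arrives at the same values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noisy data (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([42.0, 47.0, 50.0, 58.0, 61.0, 66.0])

# Closed-form least-squares estimates for a single feature
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c = y_mean - m * x_mean

# LinearRegression expects a 2-D feature array, hence reshape(-1, 1)
model = LinearRegression().fit(x.reshape(-1, 1), y)
print(m, model.coef_[0])
print(c, model.intercept_)
```

For these values, m = 85 / 17.5 ≈ 4.857 and c = 54 − 4.857 × 3.5 = 37.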


Substituting the fitted values, the equation for this model is:
  • y = (26780.09)x + 9312.5
print(x_test)  # print the test data

Step 7: Predicting the result

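y_pred = regressor.predict(x_test)  # predicted salaries for the test data
  • The predict() method returns the model's predictions for the inputs it is given. Here we pass the test data and store the predictions in y_pred.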
year = float(input("Enter number of years : "))
year = np.array(year).reshape(-1, 1)
own_pred = regressor.predict(year)
print("Predicted Salary = {}".format(own_pred[0]))
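  • We can also predict a salary for a value entered manually. reshape(-1, 1) converts the single number into a 2-D array with one row and one column, which is the shape predict() expects.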
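The following sketch shows what predict() computes for a single manual input. It is trained on hypothetical data (y = 2x + 10) and uses a fixed year instead of input(). The reshaped array has shape (1, 1), and the prediction equals coef_ · x + intercept_.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: y = 2x + 10
x_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = 2 * x_train.ravel() + 10
regressor = LinearRegression().fit(x_train, y_train)

year = 5.0  # stands in for the value read with input()
year_2d = np.array(year).reshape(-1, 1)  # shape (1, 1): one sample, one feature
own_pred = regressor.predict(year_2d)

# For a linear model, predict() computes coef_ * x + intercept_
manual = regressor.coef_[0] * year + regressor.intercept_
print(year_2d.shape, own_pred[0], manual)
```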

Step 8: Calculating Error

  • The mean absolute error (MAE) is the average of the absolute differences between the predicted and true values. It is expressed in the same units as the target (here, salary).

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - x_i \rvert$$

  • y_i = predicted value
  • x_i = true value
  • n = total number of data points
from sklearn import metrics
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
  • The mean absolute error is calculated with the mean_absolute_error() function in sklearn.metrics. It takes the true values first and the predicted values second.

Mean Absolute Error: 2446.1723690465055
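To see what mean_absolute_error() computes, the sketch below evaluates the MAE formula by hand with NumPy on a few hypothetical values and compares the result with sklearn's. The absolute errors are 10, 10, and 5, so both calculations give 25 / 3.

```python
import numpy as np
from sklearn import metrics

# Hypothetical true values and predictions
y_test = np.array([100.0, 150.0, 200.0])
y_pred = np.array([110.0, 140.0, 205.0])

# MAE by hand: mean of the absolute differences
mae_manual = np.mean(np.abs(y_test - y_pred))
mae_sklearn = metrics.mean_absolute_error(y_test, y_pred)
print(mae_manual, mae_sklearn)  # both approximately 8.333
```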
