# Linear Regression using sklearn

###### Simple Linear Regression using scikit-learn
Linear regression is a statistical model that describes the linear relationship between a dependent variable and one or more independent variables, and uses that relationship for prediction.
Here we demonstrate a simple linear regression model using the scikit-learn library in Python.
Scikit-learn, imported as sklearn, is a Python library that provides efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction. It features algorithms such as support vector machines, random forests, and k-nearest neighbors.
The dataset used for this model contains the experience and salary of employees, with salary depending on years of experience. We are going to fit a linear relationship between years of experience and salary.
The implementation of the linear regression model using sklearn is as follows:

### Step 1: Import all the required libraries for the model.

# importing the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Step 2: Import the dataset.

# read the dataset using pandas
Data = pd.read_csv("<path to dataset>.csv")    # replace with the path to the CSV file (not given here)
print("Data imported successfully")
The dataset is imported using the read_csv function in the pandas library. If the dataset file is imported successfully, the script prints "Data imported successfully".
Data.head()    # displays the first 5 rows of the data
• The head() method of a pandas DataFrame displays the first n rows of the data. By default, it displays 5 rows when no argument is passed.
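
As a minimal sketch of the loading step, the block below reads a small CSV from an in-memory string rather than from a file, since the dataset's file name is not given here. The column names match the ones used later in this article; the rows themselves are made up for illustration and are not the article's data.

```python
import io
import pandas as pd

# Hypothetical rows for illustration only; not the article's dataset.
csv_text = """YearsExperience,Salary
1.0,40000
2.0,50000
3.0,60000
4.0,70000
5.0,80000
6.0,90000
"""

# read_csv accepts a file path or any file-like object.
Data = pd.read_csv(io.StringIO(csv_text))

first_rows = Data.head()    # first 5 rows by default
first_two = Data.head(2)    # first 2 rows when n=2 is passed
print(first_rows)
print(len(first_rows), len(first_two))
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the path to the CSV.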

Data.info()    # Provides information regarding the columns in the data

### Step 3: Plotting the data in a graph

Data.plot(x='YearsExperience', y='Salary', style='o')
plt.title("Salary Prediction with experience")
plt.xlabel("Experience in years")
plt.ylabel("Salary")
plt.show()
• The plot function draws the data as a scatter of points so we can see how the values are distributed.

# Separate the feature column (X) from the target column (Y)
X = Data.iloc[:, :-1].values
Y = Data.iloc[:, 1].values
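
The two `iloc` selections above produce arrays of different shapes, which matters because scikit-learn estimators expect a 2-D feature array. The sketch below illustrates this on a small hypothetical frame (the values are invented, only the column names come from this article).

```python
import pandas as pd

# Hypothetical two-column frame standing in for the salary dataset.
Data = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0, 4.0],
    "Salary": [40000, 50000, 60000, 70000],
})

X = Data.iloc[:, :-1].values  # every column except the last -> 2-D array
Y = Data.iloc[:, 1].values    # second column only -> 1-D array

print(X.shape)
print(Y.shape)
```

Slicing with `:-1` keeps the column dimension, so `X` stays two-dimensional even with a single feature.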

### Step 4: Splitting the data into training and testing data

# Split the data for train and test
from sklearn.model_selection import train_test_split
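x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
• To split the data into training and testing sets, import the train_test_split function from sklearn.model_selection. Here test_size=0.2 holds out 20% of the rows for testing, and random_state=0 makes the split reproducible.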
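
To make the effect of the split concrete, here is a small sketch on hypothetical data with 10 samples. The array names with a `_demo` suffix are placeholders introduced for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with a single feature.
X_demo = np.arange(1, 11, dtype=float).reshape(-1, 1)
Y_demo = 30000 + 10000 * X_demo.ravel()

x_train_demo, x_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, Y_demo, test_size=0.2, random_state=0
)

# With test_size=0.2, 2 of the 10 rows go to the test set and 8 to training.
print(x_train_demo.shape, x_test_demo.shape)
```

Because `random_state` is fixed, running the split again selects the same rows, which keeps results comparable between runs.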

### Step 5: Train the model

# Training the model
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)
print("Training successful")
• Train the model using the fit() method of the LinearRegression class in sklearn. The script prints "Training successful" once fitting completes.
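
The following sketch shows what fitting produces, using hypothetical noiseless data generated from y = 2x + 1 so the fitted parameters are easy to check. The names `x_demo`, `y_demo`, and `demo_model` are placeholders for this example only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noiseless data generated from y = 2x + 1.
x_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([1.0, 3.0, 5.0, 7.0])

demo_model = LinearRegression()
demo_model.fit(x_demo, y_demo)

slope = demo_model.coef_[0]        # coef_ holds one value per feature
intercept = demo_model.intercept_  # a single float for a 1-D target
print(round(slope, 6), round(intercept, 6))
```

On data that lie exactly on a line, the fitted slope and intercept recover that line up to floating-point error.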

### Step 6: Plot Regression line

# plotting the regression line
line=regressor.coef_*X+regressor.intercept_
plt.scatter(X,Y)
plt.plot(X,line)
plt.show()

# Intercept and coefficient of the line
print('Intercept of the model:', regressor.intercept_)
print('Coefficient of the line:', regressor.coef_)
• The linear regression model is represented by the equation y = mx + c
• m = coefficient (slope) of the line
• c = intercept of the line

Therefore, with the coefficient as m and the intercept as c, the equation for this model is
• y = (9312.5)x + 26780.09
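
For a single feature, the ordinary least-squares values of m and c have a closed form: m is the ratio of the x-y covariance sum to the x variance sum, and c = mean(y) - m * mean(x). The sketch below computes them in plain Python; `fit_line` is a hypothetical helper written for this illustration, and the points are invented.

```python
def fit_line(xs, ys):
    """Return (m, c) of the least-squares line y = m*x + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    c = y_mean - m * x_mean
    return m, c

# Hypothetical points, not the article's dataset.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(m, c)
```

LinearRegression also fits by ordinary least squares, so on the same single-feature data its coef_ and intercept_ should agree with these values up to floating-point error.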
print(x_test)  # print the test data

### Step 7: Predicting the result

y_pred = regressor.predict(x_test)
df = pd.DataFrame({'actual': y_test, 'predicted': y_pred})
df
• The predict() method is used to predict the result. Here the test data is passed as input.

year = float(input("Enter number of years: "))
year = np.array(year).reshape(-1, 1)
own_pred = regressor.predict(year)
print("Predicted Salary = {}".format(own_pred[0]))
• We can also predict the output by giving the input manually.
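
The reshape step is what turns a single number into the 2-D array that predict() expects. The sketch below shows it without interactive input, on a hypothetical model trained on invented data following y = 1000x + 500.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data following y = 1000x + 500.
x_demo = np.array([[1.0], [2.0], [3.0]])
y_demo = np.array([1500.0, 2500.0, 3500.0])
demo_model = LinearRegression().fit(x_demo, y_demo)

year = 4.0                               # stands in for the user's input
year_2d = np.array(year).reshape(-1, 1)  # shape (1, 1): one sample, one feature
own_pred = demo_model.predict(year_2d)   # 1-D array with one prediction
print(own_pred[0])
```

Passing the bare float to predict() would raise an error, because scikit-learn requires a 2-D array of shape (n_samples, n_features).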

### Step 8: Calculating Error

•  Mean absolute error (MAE) is a measure of errors between prediction and true values.

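$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$

where $\hat{y}_i$ = prediction value, $y_i$ = true value, $n$ = total number of data points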
from sklearn import metrics
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
• The mean absolute error is calculated by the mean_absolute_error() function in sklearn.metrics.

Mean Absolute Error: 2446.1723690465055
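
As a quick check of what this metric computes, the sketch below takes the mean of the absolute differences by hand and compares it with sklearn's result. The arrays are hypothetical values chosen for easy arithmetic, not the article's test set.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical values, not the article's test set.
y_true_demo = np.array([100.0, 200.0, 300.0])
y_pred_demo = np.array([110.0, 190.0, 330.0])

# Mean of |prediction - true value| over all samples: (10 + 10 + 30) / 3.
manual_mae = np.mean(np.abs(y_pred_demo - y_true_demo))
sklearn_mae = mean_absolute_error(y_true_demo, y_pred_demo)
print(manual_mae, sklearn_mae)
```

MAE is expressed in the same units as the target, so for this article's model it reads as an average error in salary units.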