Machine Learning : Simple Linear Regression

In mathematics, regression describes the relationship between an output value and the input values it depends on. Regression models in machine learning are used to predict real (continuous) values. Suppose we have a dataset that describes the salary trend in an organization as a function of each employee's experience in years. With experience as the independent variable, we can use a regression model to predict the salaries of new employees who join with different levels of experience, or to estimate salaries that already exist but are missing from the dataset.

Simple linear regression fits a straight line to the data, which can be written mathematically as:

y = mx+c

y = Dependent Variable. Its value is dependent upon the value of x.

x = Independent Variable.

m = Slope of the line

c = Constant (the y-intercept, i.e. the value of y when x = 0)


For our dataset, the simple linear equation is as below:

Salary = m * Experience + c

The constant c is the base salary the model gives to a fresher, i.e. an employee who joins the organization with 0 years of experience.
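As a minimal sketch of how this equation is used, the snippet below evaluates the line for a given experience value. The function name predict_salary and the values of m and c are hypothetical, chosen only for illustration; they are not fitted from the dataset.

```python
def predict_salary(experience_years, m, c):
    """Evaluate the line Salary = m * Experience + c."""
    return m * experience_years + c


# Hypothetical coefficients, chosen only for illustration
m = 9000.0   # salary increase per additional year of experience
c = 26000.0  # salary at 0 years of experience

fresher_salary = predict_salary(0, m, c)      # equals c, since m * 0 = 0
five_year_salary = predict_salary(5, m, c)    # m * 5 + c

print(fresher_salary)
print(five_year_salary)
```

With 0 years of experience the slope term vanishes, which is why c can be read as the starting salary.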

Ordinary Least Squares Method

The ordinary least squares (OLS) method is the technique used to find the best-fitting line in simple linear regression. For each data point, the difference between the actual salary and the salary given by the model line (y = mx + c) is calculated. Suppose that at some point the actual salary is $50000, while the model gives $48500. The difference between the two, y2 - y1 (here $1500), is taken and then squared, i.e. (y2 - y1)^2. These squared differences are calculated for all the available data points and added together. The line with the minimum sum of squared differences is called the best-fitting line.
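The idea can be sketched in plain Python. For a single independent variable, the line that minimizes the sum of squared differences has a well-known closed-form solution: m is the sum of (x - mean_x) * (y - mean_y) divided by the sum of (x - mean_x)^2, and c is mean_y - m * mean_x. The function names and the five data points below are hypothetical and only illustrate the calculation.

```python
def fit_line(xs, ys):
    """Return (m, c) of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    c = y_mean - m * x_mean
    return m, c


def sum_squared_errors(xs, ys, m, c):
    """Sum of squared differences between actual and modelled values."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: years of experience and salary
experience = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [40000.0, 43000.0, 52000.0, 55000.0, 61000.0]

m, c = fit_line(experience, salary)
best_sse = sum_squared_errors(experience, salary, m, c)

print(m, c)
print(best_sse)
```

Any other slope or constant gives a larger sum of squared differences on the same points, which is what makes this line the best-fitting one.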

To build a machine learning model that follows the simple linear regression approach, the first step is to preprocess the data. We will reuse the data preprocessing template we already have, editing it to suit our dataset. Once this is done, we will have the matrix of features X and the dependent variable vector y, each split into a training set and a test set. We then use the training set to train our model with simple linear regression. Once trained, the model will have learned the correlation between experience and salary and will be able to predict the salary for a given experience value.
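The shapes that come out of preprocessing can be sketched as follows. This is only an illustration: the in-memory DataFrame stands in for the CSV file, and the column names and all values are hypothetical. The split uses train_test_split from sklearn.model_selection, which is where current scikit-learn versions provide it.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for the salary CSV file
dataset = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.0, 5.1, 6.0,
                        7.1, 8.2, 9.0, 10.3, 11.0, 12.1],
    "Salary": [39000, 43000, 54000, 56000, 66000, 75000,
               82000, 91000, 98000, 110000, 115000, 124000],
})

X = dataset.iloc[:, :-1].values  # matrix of features, 2-D: one column per feature
y = dataset.iloc[:, -1].values   # dependent variable vector, 1-D

# Hold out a quarter of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X.shape, y.shape)
print(X_train.shape, X_test.shape)
```

Note that X stays two-dimensional even with a single feature, since scikit-learn estimators expect the features as a matrix.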

The first step of the regression technique is to create a regressor. A regressor is an object of the regression class (here, scikit-learn's LinearRegression). In machine learning terms, the machine is the simple linear regression model, and the learning is the process in which this model learns the correlation between experience and salary from the observations in the training set, composed of X_train and y_train (the independent and dependent variables respectively).

# Fitting simple linear regression to the Training Set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train,y_train)
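After fitting, the learned slope and constant can be read from the regressor's coef_ and intercept_ attributes, which correspond to m and c in y = mx + c. The sketch below assumes hypothetical, noise-free data generated from Salary = 5000 * Experience + 30000, so the fitted values should match those numbers closely.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data lying exactly on Salary = 5000 * Experience + 30000
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = 5000.0 * X_train.ravel() + 30000.0

regressor = LinearRegression()
regressor.fit(X_train, y_train)

m = regressor.coef_[0]      # slope: one coefficient per feature
c = regressor.intercept_    # constant: value at 0 years of experience

print(m, c)
```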

To obtain predictions, we create a vector called y_pred to hold the results. y_pred is the vector of predicted values of the dependent variable, obtained by calling the predict method of the regressor object on the test set. Finally, we use the matplotlib library to plot the predicted salaries against experience.

# Predicting the Test set results
y_pred = regressor.predict(X_test)
y_train_pred = regressor.predict(X_train)
 
#Visualizing the Training set results
plt.scatter(X_train, y_train, color = 'red') #scatter graph
plt.plot(X_train,y_train_pred,color = 'blue') #regression line
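The same predict method can be used for new experience values that are in neither set. The sketch below is an assumption-laden illustration: the training data and the new experience values are hypothetical. The point it shows is that predict expects a 2-D array with one row per employee, so a plain list of values has to be reshaped first.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: years of experience and salary
X_train = np.array([[1.0], [3.0], [5.0], [7.0], [9.0]])
y_train = np.array([39000.0, 52000.0, 60000.0, 75000.0, 88000.0])

regressor = LinearRegression().fit(X_train, y_train)

# New employees with 2.5 and 6 years of experience, as a column (2 rows, 1 feature)
new_experience = np.array([2.5, 6.0]).reshape(-1, 1)
new_salaries = regressor.predict(new_experience)

print(new_salaries)
```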

See the complete Python code for simple linear regression below:

# Data Preprocessing Template
 
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
 
# Importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
 
# Splitting the dataset into the Training set and Test set
# (train_test_split lives in sklearn.model_selection; sklearn.cross_validation was removed)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)
 
# Fitting simple linear regression to the Training Set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train,y_train)
 
# Predicting the Test set results
y_pred = regressor.predict(X_test)
y_train_pred = regressor.predict(X_train)
 
#Visualizing the Training set results
plt.scatter(X_train, y_train, color = 'red') #scatter graph
plt.plot(X_train,y_train_pred,color = 'blue') #regression line
 
#labelling the plot
plt.title('Salary Vs Experience - Training Set')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show() #to mark the end of graph
 
 
#Visualizing the Test set results
plt.scatter(X_test, y_test, color = 'red') #scatter graph
plt.plot(X_train,y_train_pred,color = 'blue') #same regression line, fitted on the training set
 
#labelling the plot
plt.title('Salary Vs Experience - Test Set')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show() #to mark the end of graph


Anirudh Sethi
