 ### Gaussian Distribution

Gaussian distribution is a bell-shaped curve, it follows the normal distribution with the equal number of measurements right side and left side of the mean value. Mean is situated in the centre of the curve, the right side values from the mean are greater than the mean value and the left side values from the mean are smaller than the mean. It is used for mean, median, and mode for continuous values. You all know the basic meaning of mean, median, and mod. The mean is an average of the values, the median is the centre value of the distribution and the mode is the value of the distribution which is frequently occurred. In the normal distribution, the values of mean, median, and are all same. If the values generate skewness then it is not normally distributed. The normal distribution is very important in statistics because it fits for many occurrences such as heights, blood pressure, measurement error, and many numerical values. A gaussian and normal distribution is the same in statistics theory. Gaussian distribution is also known as a normal distribution. The curve is made with the help of probability density function with the random values. F(x) is the PDF function and x is the value of gaussian & used to represent the real values of random variables having unknown distribution.

There is a property of Gaussian distribution which is known as Empirical formula which shows that in which confidence interval the value comes under. The normal distribution contains the mean value as 0 and standard deviation 1. The empirical rule also referred to as the three-sigma rule or 68-95-99.7 rule, is a statistical rule which states that for a normal distribution, almost all data falls within three standard deviations (denoted by σ) of the mean (denoted by µ). Broken down, the empirical rule shows that 68% falls within the first standard deviation (µ ± σ), 95% within the first two standard deviations (µ ± 2σ), and 99.7% within the first three standard deviations (µ ± 3σ).

Python code for plotting the gaussian graph:

``````import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import math
mu = 0
variance = 1
sigma = math.sqrt(variance)
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
plt.plot(x, stats.norm.pdf(x, mu, sigma))
plt.show()``` ```

The above code shows the Gaussian distribution with 99% of the confidence interval with a standard deviation of 3 with mean 0.

Learnbay provides industry accredited data science courses in Bangalore. We understand the conjugation of technology in the field of Data science hence we offer significant courses like Machine learning, Tensor Flow, IBM Watson, Google Cloud platform, Tableau, Hadoop, time series, R and Python. With authentic real-time industry projects. Students will be efficient by being certified by IBM. Around hundreds of students are placed in promising companies for data science roles. Choosing Learnbay you will reach the most aspiring job of present and future.
Learnbay data science course covers Data Science with Python, Artificial Intelligence with Python, Deep Learning using Tensor-Flow. These topics are covered and co-developed with IBM. ### What is Supervised, Unsupervised Learning, and Reinforcement Learning in Machine Learning

The Supervised Unsupervised And Reinforcement Learning algorithm is widely used in the industries to predict the business outcome, and forecasting the result on the basis of historical data. The output of any supervised learning depends on the target variables. It allows the numerical, categorical, discrete, linear datasets to build a machine learning model. The target variable is known for building the model and that model predicts the outcome on the basis of the given target variable if any new data point comes to the dataset.

The supervised learning model is used to teach the machine to predict the result for the unseen input. It contains a known dataset to train the machine and its performance during the training time of a model. And then the model predicts the response of testing data when it is fed to the trained model. There are different machine learning models (Supervised Unsupervised And Reinforcement Learning)that are suitable for different kinds of datasets. The supervised algorithm uses regression and classification techniques for building predictive models.

For example, you have a bucket of fruits and there are different types of fruits in the bucket. You need to separate the fruits according to their features and you know the name of the fruits follow up its corresponding features the features of the fruits are independent variables and name of fruits are dependent variable that is out target variable. We can build a predicting model to determine the fruit name.

There are various types of Supervised learning:

1. Linear regression
2. Logistic regression
3. Decision tree
4. Random forest
5. support vector machine
6. k-Nearest neighbors

Linear and logistic regression is used when we have continuous data. Linear regression defines the relationship between the variables where we have independent and dependent variables. For example, what would be the performance percentage of a student after studying a number of hours? The numbers of hours are in an independent feature and the performance of students in the dependent features. The linear regression is also categorized in types
those are simple linear regression, multiple linear regression, polynomial regression.

Classification algorithms help to classify the categorical values. It is used for the categorical values, discrete values, or the values which belong to a particular class. Decision tree and Random forest and KNN all are used for the categorical dataset. Popular or major applications of classification include bank credit scoring, medical imaging, and speech recognition. Also, handwriting recognition uses classification to recognize letters and numbers, to check whether an email is genuine or spam, or even to detect whether a tumor is benign or cancerous and for recommender systems.

The support vector machine is used for both classification and regression problems. It uses the regression method to create a hyperplane to classify the category of the datapoint. sentiment analysis of a subject is determined with the help of SVM whether the statement is positive or negative.

Unsupervised learning algorithms

Unsupervised learning is a technique in which we need to supervise the model as we have not any target variable or labeled dataset. It discovers its own information to predict the outcome. It is used for the unlabeled datasets. Unsupervised learning algorithms allow you to perform more complex processing tasks compared to supervised learning. Although, unsupervised learning can be more unpredictable compared with other natural learning methods. It is easier to get unlabeled data from a computer than labeled data, which needs manual intervention.

For example, We have a bucket of fruits and we need to separate them accordingly, and there no target variable available to determine whether the fruit is apple, orange, or banana. Unsupervised learning categorizes these fruits to make a prediction when new data comes.

Types of unsupervised learning:

1. Hierarchical clustering
2. K-means clustering
3. K-NN (k nearest neighbors)
4. Principal Component Analysis
5. Singular Value Decomposition
6. Independent Component Analysis

Hierarchical clustering is an algorithm that builds a hierarchy of clusters. It begins with all the data which is assigned to a cluster of their own. Here, two close clusters are going to be in the same cluster. This algorithm ends when there is only one cluster left.

K-means and KNN is also a clustering method to classify the dataset. k-means is an iterative method of clustering and also used to find the highest value for every iteration, we can select the numbers of clusters. You need to define the k cluster for making a good predictive model. K- nearest neighbour is the simplest of all machine learning classifiers. It differs from other machine learning techniques, in that it doesn’t produce a model. It is a simple algorithm that stores all available cases and classifies new instances based on a similarity measure.

PCA(Principal component analysis) is a dimensionality reduction algorithm. For example, you have a dataset with 200 of the features/columns. You need to reduce the number of features for the model with only an important feature. It maintains the complexity of the dataset.

Reinforcement learning is also a type of Machine learning algorithm. It provides a suitable action in a particular situation, and it is used to maximize the reward. The reward could be positive or negative based on the behavior of the object. Reinforcement learning is employed by various software and machines to find the best possible behavior in a situation.

Main points in Reinforcement learning –

• Input: The input should be an initial state from which the model will start
• Output: There are much possible output as there are a variety of solution to a particular problem
• Training: The training is based upon the input, The model will return a state and the user will decide to reward or punish the model based on its output.
• The model keeps continues to learn.
• The best solution is decided based on the maximum reward.

Learnbay provides industry accredited data science courses in Bangalore. We understand the conjugation of technology in the field of Data science hence we offer significant courses like Machine learning, Tensor Flow, IBM Watson,Supervised Unsupervised  ,And Reinforcement Learning , Google Cloud platform, Tableau, Hadoop, time series, R and Python. With authentic real-time industry projects. Students will be efficient by being certified by IBM. Around hundreds of students are placed in promising companies for data science roles. Choosing Learnbay you will reach the most aspiring job of present and future.
Learnbay data science course covers Data Science with Python, Artificial Intelligence with Python, Deep Learning using Tensor-Flow. These topics are covered and co-developed with IBM.

### Gradient Descent for Machine Learning

It is an optimization algorithm to find the values of parameters(coefficient) of a function(f) that minimizes the cost function. It is used when parameters can not be calculated by Linear algebra. To minimize the cost function J(w) parameterized by a model parameter w. It tells about the slope of the cost function. To minimize the cost function we move in the direction of Gradient descent. It helps to scale the large dataset.

We can fit the line with linear regression which is a straight line and squiggle with Logistic regression now we can fit the data into the line with the Gradient descent. Gradient descent optimizes these things and many more.

For example, we have a simple dataset:

We have weight in X-axis and Height in Y-axis with values (x1,y1)=(0.4, 1.3)
x2,y2= (1.2, 1.6)
x3,y3= (2, 3.1)

When we fit a line with linear regression we optimize the intercept and slope.

Height = intercpet+slope*Weight (simple line equation)

Here we can find the initialize the slope as 0.64 to find the intercept so for that just plug in the Least square estimate for the slope 0.64 and intercept as 0. Height = intercpet+slope*Weight
Height = intercept+0.64*weight

The very first step we do is pick a random value for the intercept. This is an initial guess that gradient descent something to improve upon.

Predicted Height = 0+0.64*0.4
Predicted Height = 0.25

Residual = Observed – Predicted
1.4-0.25 = (1.15)

Similarly, we will calculate the residuals for all the three points in the dataset, so have all three predicted values are 1.15, 0.84, 1.82

Now we will calculate Sum of Squared  Residuals = (1.15)^2+(0.84)^2+(1.82)^2 = 5.34

This is the new value for y-axis If we want to plot the graph with value 5.34 on the y-axis , we have This graph represents the  Sum of squared residuals with intercept zero, If we have intercept 0.5 the sum of squared residuals comes down on the graph. We can find the sum of squared residuals by changing the value of the intercept and residuals. Gradient descent does only a few calculations to find the optimal values and increases the number of calculations closer to the optimal values.

It identifies the optimal values with big steps if the values are far from each other and it takes baby steps when values are close to each other. After getting all values of Sum of squared residuals now we have an equation of a curve, Thus we can take a derivative of this function & determine the slope & value of the intercept.

d/d intercept =  d/d intercept*(1.3 – (intercept+ 0.64*0.4))^2

+ d/d intercept*(1.6 – (intercept+ 0.64*0.1.2)^2

+ d/d intercept*(3.1 – (intercept+ 0.64*2))^2

we will apply the chian rule to solve the derivative

d/d inetrcept = 2(1.4-(intercept+0.64*0.3))*(-1) so that we have

d/d inetrcept = -2(1.4-(intercept+0.64*0.3))

+ -2(1.6 – (intercept+ 0.64*1.2)

+ -2(3.1 – (intercept+ 0.64*2)

Now we have the derivative so with the help of this derivative we can find that where the sum of squared residuals is lowest. With the help of the least square method to find the optimal number of intercepts we only determine the value where the value of slope would be 0, but with the help of gradient descent, we can find the minimum value by taking steps from the initial guess until we reach the best value.

This makes the GD very efficient when it is not possible to solve for where the derivative is 0, The closer we get to the optimal value for the intercept the closer the slope of the curve gets to zero 0. This means that the slope of the curve is 0.

We need to take baby steps when we close to the optimal value when the slope is near to 0 and if the slope is far from 0 then we need to take big steps as we are far from the optimal values.

d/d intercept = -2(1.4-(intercept+0.64*0.3))

+ -2(1.6 – (intercept+ 0.64*1.2)

+ -2(3.1 – (intercept+ 0.64*2)

d/d inetrcept = -2(1.4-(0+0.64*0.3))

+ -2(1.6 – (0+ 0.64*1.2)

+ -2(3.1 – (0+ 0.64*2) = -7.71

Step size = slope * learning rate

Step size = -7.71*0.01 = -0.77

New parameter =  Old parameters – step size

= 0-(-0.77)= 0.77

What is learning rate?

In machine learning and statistics, the learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function.

There are following steps to solve the Gradient descent :

1. Take the derivative of the loss function
2.  Pick a random value of parameters
3. Plug the parameters to the derivative.
4.  Calculate the step size: Step size = slope * learning rate
5. Calculate the new parameters: New parameteres =  Old parameters – step size Learnbay provides industry accredited data science courses in Bangalore. We understand the conjugation of technology in the field of Data science hence we offer significant courses like Machine learning, Tensor Flow, IBM Watson, Google Cloud platform, Tableau, Hadoop, time series, R and Python. With authentic real-time industry projects. Students will be efficient by being certified by IBM. Around hundreds of students are placed in promising companies for data science roles. Choosing Learnbay you will reach the most aspiring job of present and future.
Learnbay data science course covers Data Science with Python, Artificial Intelligence with Python, Deep Learning using Tensor-Flow. These topics are covered and co-developed with IBM.

#iguru_button_628c6da7caaa1 .wgl_button_link { color: rgba(255,255,255,1); }#iguru_button_628c6da7caaa1 .wgl_button_link:hover { color: rgba(45,151,222,1); }#iguru_button_628c6da7caaa1 .wgl_button_link { border-color: rgba(45,151,222,1); background-color: rgba(45,151,222,1); }#iguru_button_628c6da7caaa1 .wgl_button_link:hover { border-color: rgba(45,151,222,1); background-color: rgba(255,255,255,1); }#iguru_button_628c6da7cee6e .wgl_button_link { color: rgba(102,75,196,1); }#iguru_button_628c6da7cee6e .wgl_button_link:hover { color: rgba(255,255,255,1); }#iguru_button_628c6da7cee6e .wgl_button_link { border-color: rgba(102,75,196,1); background-color: transparent; }#iguru_button_628c6da7cee6e .wgl_button_link:hover { border-color: rgba(102,75,196,1); background-color: rgba(102,75,196,1); } 