Call WhatsApp Enquiry

Win the COVID-19

If you slightly change your perspective towards the lock-down situation you can find hope of this pandemic to end and can hope of a brighter than ever future. Go for Data Science, it will be worth it.

What is Supervised, unsupervised learning, and reinforcement learning in Machine learning

The supervised learning algorithm is widely used in the industries to predict the business outcome, and forecasting the result on the basis of historical data. The output of any supervised learning depends on the target variables. It allows the numerical, categorical, discrete, linear datasets to build a machine learning model. The target variable is known for building the model and that model predicts the outcome on the basis of the given target variable if any new data point comes to the dataset.

The supervised learning model is used to teach the machine to predict the result for the unseen input. It contains a known dataset to train the machine and its performance during the training time of a model. And then the model predicts the response of testing data when it is fed to the trained model. There are different machine learning models that are suitable for different kinds of datasets. The supervised algorithm uses regression and classification techniques for building predictive models.

For example, you have a bucket of fruits and there are different types of fruits in the bucket. You need to separate the fruits according to their features and you know the name of the fruits follow up its corresponding features the features of the fruits are independent variables and name of fruits are dependent variable that is out target variable. We can build a predicting model to determine the fruit name.

There are various types of Supervised learning:

  1. Linear regression
  2. Logistic regression
  3. Decision tree
  4. Random forest
  5. support vector machine
  6. k-Nearest neighbors

Linear and logistic regression is used when we have continuous data. Linear regression defines the relationship between the variables where we have independent and dependent variables. For example, what would be the performance percentage of a student after studying a number of hours? The numbers of hours are in an independent feature and the performance of students in the dependent features. The linear regression is also categorized in types
those are simple linear regression, multiple linear regression, polynomial regression. 

Classification algorithms help to classify the categorical values. It is used for the categorical values, discrete values, or the values which belong to a particular class. Decision tree and Random forest and KNN all are used for the categorical dataset. Popular or major applications of classification include bank credit scoring, medical imaging, and speech recognition. Also, handwriting recognition uses classification to recognize letters and numbers, to check whether an email is genuine or spam, or even to detect whether a tumor is benign or cancerous and for recommender systems.

The support vector machine is used for both classification and regression problems. It uses the regression method to create a hyperplane to classify the category of the datapoint. sentiment analysis of a subject is determined with the help of SVM whether the statement is positive or negative.

Unsupervised learning algorithms

Unsupervised learning is a technique in which we need to supervise the model as we have not any target variable or labeled dataset. It discovers its own information to predict the outcome. It is used for the unlabeled datasets. Unsupervised learning algorithms allow you to perform more complex processing tasks compared to supervised learning. Although, unsupervised learning can be more unpredictable compared with other natural learning methods. It is easier to get unlabeled data from a computer than labeled data, which needs manual intervention.

For example, We have a bucket of fruits and we need to separate them accordingly, and there no target variable available to determine whether the fruit is apple, orange, or banana. Unsupervised learning categorizes these fruits to make a prediction when new data comes.

Types of unsupervised learning:

  1. Hierarchical clustering
  2. K-means clustering
  3. K-NN (k nearest neighbors)
  4. Principal Component Analysis
  5. Singular Value Decomposition
  6. Independent Component Analysis

Hierarchical clustering is an algorithm that builds a hierarchy of clusters. It begins with all the data which is assigned to a cluster of their own. Here, two close clusters are going to be in the same cluster. This algorithm ends when there is only one cluster left.

K-means and KNN is also a clustering method to classify the dataset. k-means is an iterative method of clustering and also used to find the highest value for every iteration, we can select the numbers of clusters. You need to define the k cluster for making a good predictive model. K- nearest neighbour is the simplest of all machine learning classifiers. It differs from other machine learning techniques, in that it doesn’t produce a model. It is a simple algorithm that stores all available cases and classifies new instances based on a similarity measure.

PCA(Principal component analysis) is a dimensionality reduction algorithm. For example, you have a dataset with 200 of the features/columns. You need to reduce the number of features for the model with only an important feature. It maintains the complexity of the dataset.

Reinforcement learning is also a type of Machine learning algorithm. It provides a suitable action in a particular situation, and it is used to maximize the reward. The reward could be positive or negative based on the behavior of the object. Reinforcement learning is employed by various software and machines to find the best possible behavior in a situation.

Main points in Reinforcement learning –

  • Input: The input should be an initial state from which the model will start
  • Output: There are much possible output as there are a variety of solution to a particular problem
  • Training: The training is based upon the input, The model will return a state and the user will decide to reward or punish the model based on its output.
  • The model keeps continues to learn.
  • The best solution is decided based on the maximum reward.

Learnbay provides industry accredited data science courses in Bangalore. We understand the conjugation of technology in the field of Data science hence we offer significant courses like Machine learning, Tensor Flow, IBM Watson, Google Cloud platform, Tableau, Hadoop, time series, R and Python. With authentic real-time industry projects. Students will be efficient by being certified by IBM. Around hundreds of students are placed in promising companies for data science roles. Choosing Learnbay you will reach the most aspiring job of present and future.
Learnbay data science course covers Data Science with Python, Artificial Intelligence with Python, Deep Learning using Tensor-Flow. These topics are covered and co-developed with IBM.

XGBoost Classifier

What is the XGboost classifier?

XGBoost classifier is a Machine learning algorithm that is applied for structured and tabular data. XGBoost is an implementation of gradient boosted decision trees designed for speed and performance. XGBoost is an extreme gradient boost algorithm. And that means it’s a big Machine learning algorithm with lots of parts. XGBoost works with large complicated datasets. XGBoost is an ensemble modeling technique.

What is ensemble modeling?

XGBoost is an ensemble learning method. Sometimes, it may not be sufficient to rely upon the results of just one machine learning model. Ensemble learning offers a systematic solution to combine the predictive power of multiple learners. The resultant is a single model that gives the aggregated output from several models.

The models that form the ensemble, also known as base learners, could be either from the same learning algorithm or different learning algorithms. Bagging and boosting are two widely used ensemble learners. Though these two techniques can be used with several statistical models, the most predominant usage has been with decision trees.

Unique features of XGBoost:

XGBoost is a popular implementation of gradient boosting. Let’s discuss some features of XGBoost that make it so interesting.

    • Regularization: XGBoost has an option to penalize complex models through both L1 and L2 regularization. Regularization helps in preventing overfitting
    • Handling sparse data: Missing values or data processing steps like one-hot encoding make data sparse. XGBoost incorporates a sparsity-aware split finding algorithm to handle different types of sparsity patterns in the data
    • Weighted quantile sketch: Most existing tree based algorithms can find the split points when the data points are of equal weights (using a quantile sketch algorithm). However, they are not equipped to handle weighted data. XGBoost has a distributed weighted quantile sketch algorithm to effectively handle weighted data
    • Block structure for parallel learning: For faster computing, XGBoost can make use of multiple cores on the CPU. This is possible because of a block structure in its system design. Data is sorted and stored in in-memory units called blocks. Unlike other algorithms, this enables the data layout to be reused by subsequent iterations, instead of computing it again. This feature also serves useful for steps like split finding and column sub-sampling
    • Cache awareness: In XGBoost, non-continuous memory access is required to get the gradient statistics by row index. Hence, XGBoost has been designed to make optimal use of hardware. This is done by allocating internal buffers in each thread, where the gradient statistics can be stored
    • Out-of-core computing: This feature optimizes the available disk space and maximizes its usage when handling huge datasets that do not fit into memory.

Solve the XGBoost mathematically:


Here we will use simple Training Data, which has a Drug dosage on the x-axis and Drug effectiveness in the y-axis. These above two observations(6.5, 7.5) have a relatively large value for Drug Effectiveness and that means that the drug was helpful and these below two observations(-10.5, -7.5) have a relatively negative value for Drug Effectiveness, and that means that the drug did more harm than good.

The very 1st step in fitting XGBoost to the training data is to make an initial prediction. This prediction could be anything but by default, it is 0.5, regardless of whether you are using XGBoost for Regression or Classification.

The prediction 0.5 corresponds to the thick black horizontal line.


Unlike unextreme Gradient Boost which typically uses regular off-the-shelf, Regression Trees. XGBoost uses a unique Regression tree that is called an XGBoost Tree.

Now we need to calculate the Quality score or Similarity score for the Residuals.

Here λ  is a regularization parameter.

So we split the observations into two groups, based on whether or not the Dosage<15.


The observation on the left is the only one with Dosage<15. All of the other residuals go to the leaf on the right.

When we calculate the similarity score for the observations –10.5,-7.5,6.5,7.5 while putting λ =0
we got similarity =4  and


Hence the result we got is:

Learnbay provides industry accredited data science courses in Bangalore. We understand the conjugation of technology in the field of Data science hence we offer significant courses like Machine learning, Tensor Flow, IBM Watson, Google Cloud platform, Tableau, Hadoop, time series, R and Python. With authentic real-time industry projects. Students will be efficient by being certified by IBM. Around hundreds of students are placed in promising companies for data science roles. Choosing Learnbay you will reach the most aspiring job of present and future.
Learnbay data science course covers Data Science with Python, Artificial Intelligence with Python, Deep Learning using Tensor-Flow. These topics are covered and co-developed with IBM.

Human activity recognition with smart phone

Human Activity recognition:

In this case study, we design a model by which a smartphone can detect its owner’s activity precisely. Human activity recognition with a smartphone is a very famous ML project. It is a wellness approach for a human.  Human activity is a very exciting project for AI.

Most of the smartphones have two smart sensors accelerometer and gyroscope, which is an IoT sensor. With the help of the IoT devices captures the activity of a human. The data of human activity collected through the IoT sensor. The two smartphone sensors are accelerometer and gyroscope. Accelerometer collects the data of mobile movement such as move landscape and portrait when playing mobile games and gyroscope measure the rotational movement.

An example that a smartphone has an android app that reads the accelerometers and gyroscope which can predict the human activity that he/she walking normally, walking upstairs, walking downstairs, laying down, sitting all these are the human activities.  Some of the accelerometer and gyroscope measures heart rate, calories burned, etc. by reading all the human activities these tells how much work have done in a day by the human this is also the area of the internet of things(IoT).

Working of Human activity project:

  1. Human activity recognition: With the help of sensors we collect the data of body movement which is captured by the smartphone. Movements are often indoor activities such as walking, walking upstairs, walking downstairs, lying down, sitting and standing. The data have recorded for the prediction of the data.

      2. Data set collection of activity: The data was collected from the 30 volunteers aged between 19 to 48                     performing the activities mentioned above while wearing a smartphone on waist. The example video is given below to understand Subject performing the activities and the movement data was labeled manually.

3. Human Activity Recognition Using Smartphones Data Set: The experiments have been carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz. The experiments have been video-recorded to label the data manually. The obtained dataset has been randomly partitioned into two sets, where 70% of the volunteers were selected for generating the training data and 30% the test data. The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low-frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain.

4.Download the Dataset:

  • There are “train” and “test” folders containing the split portions of the data for modeling (e.g. 70%/30%).
  • There is a “txt” file that contains a detailed technical description of the dataset and the contents of the unzipped files.
  • There is a “txt” file that contains a technical description of the engineered features.

The contents of the “train” and “test” folders are similar (e.g. folders and file names), although with differences in the specific data they contain.

Load  set data and process it:

Important libraries to import for data processing

#start with some necessary imports
import numpy as np
import pandas as pd
from google.colab import files
uploaded = files.upload()

google.colab used to fetch the data from the collaborator files.

train_data = pd.read_csv("train.csv")

we select the training data set for the modeling.


The above function defines how many rows and columns the dataset have.


It describes that there are (8 rows and 563 columns) with all the features of the data. For numeric data, the result’s index will include countmeanstdminmax as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.

uploaded = files.upload()
test_data = pd.read_csv('test.csv')

Here we read the csv file to analyze the data set and the operation which is supposed to be programmed. head()
shows the first 5 rows with their respective columns so here we have (5 rows and 563 columns).

# suffling data
from sklearn.utils import shuffle

# test = shuffle(test)
train_data = shuffle(train_data)

Shuffling data serves the purpose of reducing variance and making sure that models remain general and overfit less.
The obvious case where you’d shuffle your data is if your data is sorted by their class/target. Here, you will want to shuffle to make sure that your training/test/validation sets are representative of the overall distribution of the data.

# separating data inputs and output lables
trainData = train_data.drop('Activity' , axis=1).values
trainLabel = train_data.Activity.values

testData = test_data.drop('Activity' , axis=1).values
testLabel = test_data.Activity.values

By using the above code we separate the input and output, here it determines the human activities which are captured by the IoT device. The human activities walking, standing, walking upstairs, walking downstairs, sitting and lying down are got separated to optimize the result.

# encoding labels
from sklearn import preprocessing

encoder = preprocessing.LabelEncoder()
# encoding test labels
testLabelE = encoder.transform(testLabel)

# encoding train labels
trainLabelE = encoder.transform(trainLabel)

Holds the label for each class. encode categorical features using a one-hot or ordinal encoding scheme. It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.

# applying supervised neural network using multi-layer preceptron
import sklearn.neural_network as nn
mlpSGD = nn.MLPClassifier(hidden_layer_sizes=(90,) \
, max_iter=1000 , alpha=1e-4 \
, solver='sgd' , verbose=10 \
, tol=1e-19 , random_state=1 \
, learning_rate_init=.001) 

mlpADAM = nn.MLPClassifier(hidden_layer_sizes=(90,) \
, max_iter=1000 , alpha=1e-4 \
, solver='adam' , verbose=10 \
, tol=1e-19 , random_state=1 \
, learning_rate_init=.001)
nnModelSGD = , trainLabelE)
y_pred = mlpSGD.predict(testData).reshape(-1,1)
from sklearn.metrics import classification_report
print(classification_report(testLabelE, y_pred))

import matplotlib.pyplot as plt
import seaborn as sns
fig = plt.figure(figsize=(32,24))
ax1 = fig.add_subplot(221)
ax1 = sns.stripplot(x='Activity', y=sub_01.iloc[:,0], data=sub_01, jitter=True)
ax2 = fig.add_subplot(222)
ax2 = sns.stripplot(x='Activity', y=sub_01.iloc[:,1], data=sub_01, jitter=True) 


fig = plt.figure(figsize=(32,24))
ax1 = fig.add_subplot(221)
ax1 = sns.stripplot(x='Activity', y=sub_01.iloc[:,2], data=sub_01, jitter=True)
ax2 = fig.add_subplot(222)
ax2 = sns.stripplot(x='Activity', y=sub_01.iloc[:,3], data=sub_01, jitter=True)


Click here to watch the video:

Learnbay provides industry accredited data science courses in Bangalore. We understand the conjugation of technology in the field of Data science hence we offer significant courses like Machine learning, Tensor Flow, IBM Watson, Google Cloud platform, Tableau, Hadoop, time series, R and Python. With authentic real-time industry projects. Students will be efficient by being certified by IBM. Around hundreds of students are placed in promising companies for data science roles. Choosing Learnbay you will reach the most aspiring job of present and future.
Learnbay data science course covers Data Science with Python, Artificial Intelligence with Python, Deep Learning using Tensor-Flow. These topics are covered and co-developed with IBM.

Data Preprocessing

Data Preprocessing:

Introduction to Preprocessing:- Before modeling the data we need to clean the data to get a training sample for the modeling. Data preprocessing is a data mining technique that involves transforming the raw data into an understandable format. It provides the technique for cleaning the data from the real world which is often incomplete, inconsistent, lacking accuracy and more likely to contain many errors. Preprocessing provides a clean the data before it gets to the modeling phase.

Preprocessing of data in a stepwise fashion in scikit learn.

1.Introduction to Preprocessing:

  • Learning algorithms have an affinity towards a certain pattern of data.
  • Unscaled or unstandardized data have might have an unacceptable prediction.
  • Learning algorithms understand the only number, converting text image to number is required.
  • Preprocessing refers to transformation before feeding to machine learning.

2. StandardScaler

  • The StandardScaler assumes your data is normally distributed within each feature and will scale them such that the distribution is now centered around 0, with a standard deviation of 1.
  • Calculate – Subtract mean of column & div by the standard deviation
  • If data is not normally distributed, this is not the best scaler to use.

3. MinMaxScaler

  • Calculate – Subtract min of column & div by the difference between max & min
  • Data shifts between 0 & 1
  • If distribution not suitable for StandardScaler, this scaler works out.
  • Sensitive to outliers.

4. Robust Scaler

  • Suited for data with outliers
  • Calculate by subtracting 1st-quartile & div by difference between 3rd-quartile & 1st-quartile.

5. Normalizer

  • Each parameter value is obtained by dividing by magnitude.
  • Enabling you to more easily compare data from different places.

6. Binarization

  • Thresholding numerical values to binary values ( 0 or 1 )
  • A few learning algorithms assume data to be in Bernoulli distribution – Bernoulli’s Naive Bayes

7. Encoding Categorical Value

  • Ordinal Values – Low, Medium & High. Relationship between values
  • LabelEncoding with the right mapping

8. Imputation

  • Missing values cannot be processed by learning algorithms
  • Imputers can be used to infer the value of missing data from existing data

9. Polynomial Features

  • Deriving non-linear feature by converting data into a higher degree
  • Used with linear regression to learn a model of higher degree

10. Custom Transformer

  • Often, you will want to convert an existing Python function into a transformer to assist in data cleaning or processing.
  • FunctionTransformer is used to create one Transformer
  • validate = False, is required for the string column.

11. Text Processing

  • Perhaps one of the most common information
  • Learning algorithms don’t understand the text but only numbers
  • Below methods convert text to numbers

12. CountVectorizer

  • Each column represents one word, count refers to the frequency of the word
  • A sequence of words is not maintained


  • n_grams – Number of words considered for each column
  • stop_words – words not considered
  • vocabulary – only words considered

13. TfIdfVectorizer

  • Words occurring more frequently in a doc versus entire corpus is considered more important
  • The importance is on the scale of 0 & 1

14. HashingVectorizer

  • All the above techniques convert data into a table where each word is converted to column
  • Learning on data with lakhs of columns is difficult to process
  • HashingVectorizer is a useful technique for out-of-core learning
  • Multiple words are hashed to limited column
  • Limitation – Hashed value to word mapping is not possible

15. Image Processing using skimage

  • skimage doesn’t come with anaconda. install with ‘pip install skimage’
  • Images should be converted from 0-255 scale to 0-1 scale.
  • skimage takes image path & returns numpy array
  • images consist of 3 dimensions.

Future of Education in hands of Machine Learning

Machine Learning is not only doing its magic in the world of technology but also in Education sector of today and future, know about it.

#iguru_button_61747ea7d3542 .wgl_button_link { color: rgba(255,255,255,1); }#iguru_button_61747ea7d3542 .wgl_button_link:hover { color: rgba(255,255,255,1); }#iguru_button_61747ea7d3542 .wgl_button_link { border-color: transparent; background-color: rgba(255,149,98,1); }#iguru_button_61747ea7d3542 .wgl_button_link:hover { border-color: rgba(230,95,42,1); background-color: rgba(253,185,0,1); }#iguru_button_61747ea7d4a5c .wgl_button_link { color: rgba(255,255,255,1); }#iguru_button_61747ea7d4a5c .wgl_button_link:hover { color: rgba(255,255,255,1); }#iguru_button_61747ea7d4a5c .wgl_button_link { border-color: rgba(218,0,0,1); background-color: rgba(218,0,0,1); }#iguru_button_61747ea7d4a5c .wgl_button_link:hover { border-color: rgba(218,0,0,1); background-color: rgba(218,0,0,1); }#iguru_button_61747ea7d92c7 .wgl_button_link { color: rgba(241,241,241,1); }#iguru_button_61747ea7d92c7 .wgl_button_link:hover { color: rgba(250,249,249,1); }#iguru_button_61747ea7d92c7 .wgl_button_link { border-color: rgba(102,75,196,1); background-color: rgba(48,90,169,1); }#iguru_button_61747ea7d92c7 .wgl_button_link:hover { border-color: rgba(102,75,196,1); background-color: rgba(57,83,146,1); }#iguru_soc_icon_wrap_61747ea7e7a3d a{ background: transparent; }#iguru_soc_icon_wrap_61747ea7e7a3d a:hover{ background: transparent; border-color: #3aa0e8; }#iguru_soc_icon_wrap_61747ea7e7a3d a{ color: #acacae; }#iguru_soc_icon_wrap_61747ea7e7a3d a:hover{ color: #ffffff; }#iguru_soc_icon_wrap_61747ea7e7a3d { display: inline-block; }.iguru_module_social #soc_icon_61747ea7e7a6e1{ color: #ffffff; }.iguru_module_social #soc_icon_61747ea7e7a6e1:hover{ color: #ffffff; }.iguru_module_social #soc_icon_61747ea7e7a6e1{ background: #44b1e4; }.iguru_module_social #soc_icon_61747ea7e7a6e1:hover{ background: #44b1e4; }
Get The Learnbay Advantage For Your Career
Note : Our programs are suitable for working professionals(any domain). Fresh graduates are not eligible.
Overlay Image
Note : Our programs are suitable for working professionals(any domain). Fresh graduates are not eligible.