
Data Science for working professionals

Securing a job in any domain takes a lot of preparation: you have to be trained for the role and have solid knowledge of the field, and people usually dedicate years to preparing for their desired roles. Shifting from a domain you have prepared for to a different one is rarely easy, and a strong gust of skepticism is bound to follow. The process is hard in general, and it gets harder when learning data science as a working professional, because you have to prepare for the new role while keeping up with your current one.

Only if you plan the whole process of shifting domains in an organised and rational way can you turn it into a win-win situation.

Have a vision and plan your strategy

You have to win at both games, learning and working. That means strategising so that the time you spend learning data science does not collide with your work life, and vice versa, because both activities demand immense attention and deserve their own space.

Let us start from scratch. Here are some common concerns of a working professional:

  1. Time management
  2. Balancing the energy between two activities
  3. Scheduling
  4. Risk of affording a wrong move
  5. Risk of inefficient or improper execution

As a working professional you will have to manage your responsibilities so that you stay in control of everything on your plate. With proper planning and the right approach, the concerns listed above can be easily tamed.

Firmly state your purpose of learning data science
Why do you want to move into data science when you already have a job? Define the purpose firmly. Understand that by shifting to data science a great deal will change: you will have to develop new skill sets for the role you are targeting, your workflow will be different, and your future role will have different goals and aims. Act consciously when you risk giving up the comfort and expertise of your current job, and be very sure why you are doing it. That clarity eliminates the skepticism around leaving your comfort zone. The effort you put into learning data science will never go to waste, because you will learn currently trending technologies and tools that will help you survive not only in data science but anywhere in the IT industry.

Have a soft target
People think only the role of 'data scientist' matters, but there are several other roles in data science that matter just as much. Choose one role you want to move into and start preparing for it. This works well for starters, because you do not have to master every tool ever used in the field; instead, smartly target the topics that are essential for your chosen role. When you work towards a specific role, you get the chance to understand it and its importance in the field thoroughly. It is also a smart move because you will not be confused about what exactly to study in the vast field of data science, and the industry generally prioritises people with deep expertise in a specific area. So be very clear about the role you want to serve in.

Plan the execution
To execute the plan well you first have to design it, and do so wisely and rationally. Review your daily routine and reschedule it to balance learning and working.

Look at how you spend time on everyday things and revise it according to your schedule. Make a note of your tasks every day, decide how much time you will invest in each, and do your best to act as planned. In other words, this is discipline: to have a structured day you will have to practise discipline in every possible way. Review everything from your sleeping habits to your break sessions and reschedule them so that things fall into place on their own. Set targets, set your own deadlines, and design the way you want things to work.

Networking and understanding the field
Engage with people from the data science field and learn the insider story of how it works. Field knowledge matters; remember that once you get into data science you will work in teams, so practise communication skills and build confidence. Ask people how they reached the field; this way you build good connections and pick up great suggestions as well. Start associating yourself with people in data science, because you will need to get used to that world.

A good course
Everything you do and every effort you put in is aimed at learning data science, but if you make the mistake of choosing the wrong course, all of it can go to waste. Your purpose is to shift your domain into data science, and you cannot do that without the help of a good course. The course you choose should not only give you sound knowledge of data science but also fit around your planned schedule. There are many data science courses built specifically for working professionals; it helps greatly if you choose the right one among them.

Conclusion
With the right approach and proper planning you can succeed in learning data science while holding a full-time job. Stick to your plans and preparation, take the help of a good course, practise as much as you can, and start involving yourself with the field. If you manage to execute the plan every day, you will reach your destination with ease.

Learnbay could help you
Learnbay's data science course is specially designed for working professionals, and its benefits will help you balance your schedule. Learnbay, powered by IBM, will support you throughout the journey of learning and experiencing data science.

Win the COVID-19

If you slightly change your perspective on the lockdown situation, you can find hope that this pandemic will end and look forward to a brighter future than ever. Go for data science; it will be worth it.

Exploratory Data Analysis on Iris dataset

What is EDA?

Exploratory Data Analysis (EDA) refers to the critical process of performing initial investigations on data to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.

It is always good to explore and compare a data set with multiple exploratory techniques. After exploratory data analysis you will have enough confidence in your data to engage a machine learning algorithm; another benefit of EDA is that it guides the selection of the feature variables that will later be used for machine learning.
In this post, we take the Iris dataset to walk through the EDA process.

Importing libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Loading the Iris data:

iris_data = pd.read_csv("Iris.csv")




Understand the data:

iris_data.shape
(150, 5)

iris_data['species'].value_counts()
setosa        50
virginica     50
versicolor    50
Name: species, dtype: int64

iris_data.columns
Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], dtype='object')

1D scatter plot of the iris data:

iris_setosa = iris_data.loc[iris_data["species"] == "setosa"]
iris_virginica = iris_data.loc[iris_data["species"] == "virginica"]
iris_versicolor = iris_data.loc[iris_data["species"] == "versicolor"]

plt.plot(iris_setosa["petal_length"], np.zeros_like(iris_setosa["petal_length"]), 'o')
plt.plot(iris_versicolor["petal_length"], np.zeros_like(iris_versicolor["petal_length"]), 'o')
plt.plot(iris_virginica["petal_length"], np.zeros_like(iris_virginica["petal_length"]), 'o')
plt.grid()
plt.show()

2D scatter plot:

iris_data.plot(kind="scatter", x="sepal_length", y="sepal_width")
plt.show()

2D scatter plot with the seaborn library:

import seaborn as sns
sns.set_style("whitegrid")
sns.FacetGrid(iris_data, hue="species", height=4) \
    .map(plt.scatter, "sepal_length", "sepal_width") \
    .add_legend()
plt.show()

 Conclusion

  • Blue points can be easily separated from red and green by drawing a line.
  • But red and green data points cannot be easily separated.
  • Using sepal_length and sepal_width features, we can distinguish Setosa flowers from others.
  • Separating Versicolor from Virginica is much harder as they have considerable overlap.

Pair Plot:

A pairs plot allows us to see both the distribution of single variables and relationships between two variables. For example, let’s say we have four features ‘sepal length’, ‘sepal width’, ‘petal length’ and ‘petal width’ in our iris dataset. In that case, we will have 4C2 plots i.e. 6 unique plots. The pairs, in this case, will be :

  •  Sepal length, sepal width
  • sepal length, petal length
  • sepal length, petal width
  • sepal width, petal length
  • sepal width, petal width
  • petal length, petal width

So, instead of trying to visualise four dimensions at once, which is not possible, we will look at these six 2D plots and try to understand the 4-dimensional data in the form of a matrix.
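As a quick illustration, the six pairs can be enumerated programmatically; a minimal sketch (the feature list simply mirrors the column names used above):

from itertools import combinations

features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
for pair in combinations(features, 2):
    print(pair)   # prints the 6 unique feature pairs listed above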

sns.set_style("whitegrid")
sns.pairplot(iris_data, hue="species", height=3)
plt.show()

Conclusion:

  1. Petal length and petal width are the most useful features for identifying the various flower types.
  2. While Setosa can be easily identified (linearly separable), Virginica and Versicolor have some overlap (almost linearly separable).
  3. We can find “lines” and “if-else” conditions to build a simple model to classify the flower types (a minimal sketch follows below).
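Here is a minimal sketch of such an if-else classifier; the thresholds are illustrative values picked by eye from the plots, not tuned cut-offs:

def classify_flower(petal_length, petal_width):
    # thresholds are rough, eyeballed values for illustration only
    if petal_length < 2.0:
        return 'setosa'        # setosa separates cleanly on petal length
    elif petal_width < 1.7:
        return 'versicolor'    # approximate boundary; some overlap with virginica remains
    else:
        return 'virginica'

print(classify_flower(1.4, 0.2))   # -> setosa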

Cumulative distribution function:

iris_setosa = iris_data.loc[iris_data["species"] == "setosa"]
iris_virginica = iris_data.loc[iris_data["species"] == "virginica"]
iris_versicolor = iris_data.loc[iris_data["species"] == "versicolor"]

# histogram counts for setosa petal length, normalised to a density
counts, bin_edges = np.histogram(iris_setosa['petal_length'], bins=10, density=True)
pdf = counts / sum(counts)
print(pdf)
>>> [0.02 0.02 0.04 0.14 0.24 0.28 0.14 0.08 0.   0.04]
print(bin_edges)
>>> [1.   1.09 1.18 1.27 1.36 1.45 1.54 1.63 1.72 1.81 1.9 ]

# the CDF is the running sum of the PDF
cdf = np.cumsum(pdf)
plt.grid()
plt.plot(bin_edges[1:], pdf)
plt.plot(bin_edges[1:], cdf)
plt.show()

Mean, Median, and Std-Dev:

print("Means:")
print(np.mean(iris_setosa["petal_length"]))
print(np.mean(np.append(iris_setosa["petal_length"],50)));
print(np.mean(iris_virginica["petal_length"]))
print(np.mean(iris_versicolor["petal_length"]))
print("\nStd-dev:");
print(np.std(iris_setosa["petal_length"]))
print(np.std(iris_virginica["petal_length"]))
print(np.std(iris_versicolor["petal_length"])) OutPut: - Means: 1.464 2.4156862745098038 5.5520000000000005 4.26

Std-dev:
0.17176728442867112
0.546347874526844
0.4651881339845203
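The heading above also mentions the median; as a small addition in the same pattern, the median (and other percentiles) can be computed with NumPy, and unlike the mean it barely moves when the single outlier of 50 is appended:

print("\nMedians:")
print(np.median(iris_setosa["petal_length"]))
print(np.median(np.append(iris_setosa["petal_length"], 50)))   # the median is robust to the outlier
print(np.median(iris_virginica["petal_length"]))
print(np.median(iris_versicolor["petal_length"]))

print("\nQuantiles:")
print(np.percentile(iris_setosa["petal_length"], [25, 50, 75]))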

Learnbay provides industry-accredited data science courses in Bangalore. We understand how technology blends into the field of data science, so we offer courses on Machine Learning, TensorFlow, IBM Watson, Google Cloud Platform, Tableau, Hadoop, time series, R, and Python, along with authentic real-time industry projects. Students gain an edge by being certified by IBM, and hundreds of students have been placed in promising companies for data science roles. By choosing Learnbay you can reach one of the most aspirational jobs of the present and the future.
The Learnbay data science course covers Data Science with Python, Artificial Intelligence with Python, and Deep Learning using TensorFlow. These topics are covered and co-developed with IBM.

Human activity recognition with smart phone

Human Activity recognition:

In this case study we design a model by which a smartphone can detect its owner's activity precisely. Human activity recognition with a smartphone is a well-known machine learning project, it takes a wellness-oriented view of human behaviour, and it is an exciting application of AI.

Most smartphones have two smart sensors, an accelerometer and a gyroscope, both of which are IoT sensors, and the data on human activity is collected through them. The accelerometer captures linear movement of the phone, such as switching between landscape and portrait while playing mobile games, while the gyroscope measures rotational movement.

For example, an Android app that reads the accelerometer and gyroscope can predict whether the person is walking normally, walking upstairs, walking downstairs, lying down, sitting or standing. Some devices also use these sensors to estimate heart rate, calories burned, and so on; by reading all these activities they can tell how much work a person has done in a day, which is also an Internet of Things (IoT) application.

Working of Human activity project:

  1. Human activity recognition: With the help of sensors we collect body-movement data captured by the smartphone. The movements are often indoor activities such as walking, walking upstairs, walking downstairs, lying down, sitting and standing. The data are recorded so that these activities can be predicted.

  2. Data set collection of activity: The data was collected from 30 volunteers aged between 19 and 48 performing the activities mentioned above while wearing a smartphone on the waist. The subjects were recorded on video while performing the activities so that the movement data could be labelled manually.

  3. Human Activity Recognition Using Smartphones Data Set: The experiments have been carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz. The experiments have been video-recorded to label the data manually. The obtained dataset has been randomly partitioned into two sets, where 70% of the volunteers were selected for generating the training data and 30% the test data. The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low-frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain. (A minimal sketch of this fixed-width windowing is shown after this list.)

  4. Download the Dataset:

  • There are “train” and “test” folders containing the split portions of the data for modeling (e.g. 70%/30%).
  • There is a “txt” file that contains a detailed technical description of the dataset and the contents of the unzipped files.
  • There is a “txt” file that contains a technical description of the engineered features.

The contents of the “train” and “test” folders are similar (e.g. folders and file names), although with differences in the specific data they contain.
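To make the windowing in step 3 concrete, here is a minimal sketch of fixed-width sliding windows with 50% overlap; the signal array is a made-up stand-in, since the downloadable dataset already ships pre-windowed features:

import numpy as np

window = int(2.56 * 50)    # 2.56 s at 50 Hz = 128 readings per window
step = window // 2         # 50% overlap -> advance by 64 readings

signal = np.arange(1000)   # stand-in for one pre-processed sensor channel
windows = [signal[i:i + window] for i in range(0, len(signal) - window + 1, step)]
print(len(windows), len(windows[0]))   # number of windows, 128 readings each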

Load the dataset and process it:

Important libraries to import for data processing

#start with some necessary imports
import numpy as np
import pandas as pd
from google.colab import files
uploaded = files.upload()

google.colab's files module is used to upload the data files into the Colab environment.


train_data = pd.read_csv("train.csv")
train_data.head()

We load the training data set that will be used for modeling; head() shows its first few rows.

train_data.Activity.value_counts()
train_data.shape

value_counts() shows how many examples of each activity the training set contains, and shape gives the number of rows and columns in the dataset.


train_data.describe()  

describe() summarises the data: for this set it returns 8 summary rows for each of the 563 columns. For numeric data the result's index includes count, mean, std, min and max, as well as the lower, 50th and upper percentiles. By default the lower percentile is the 25th and the upper percentile is the 75th; the 50th percentile is the same as the median.
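If other quantiles are of interest, describe() also accepts a percentiles argument; a small sketch:

# request the 10th, 50th and 90th percentiles instead of the defaults
train_data.describe(percentiles=[0.1, 0.5, 0.9])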


uploaded = files.upload()
test_data = pd.read_csv('test.csv')
test_data.head()

Here we read the CSV file containing the test set. head() shows the first 5 rows with their respective columns, so here we have 5 rows and 563 columns.

# shuffling data
from sklearn.utils import shuffle

# test = shuffle(test)
train_data = shuffle(train_data)

Shuffling data serves the purpose of reducing variance and making sure that models remain general and overfit less.
The obvious case where you’d shuffle your data is if your data is sorted by their class/target. Here, you will want to shuffle to make sure that your training/test/validation sets are representative of the overall distribution of the data.
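As a small aside, sklearn's shuffle also takes a random_state, so the shuffling can be reproduced across runs; resetting the index keeps the shuffled frame tidy (a minimal sketch):

# shuffle with a fixed seed and rebuild a clean 0..n-1 index
train_data = shuffle(train_data, random_state=42).reset_index(drop=True)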

# separating data inputs and output labels
trainData = train_data.drop('Activity' , axis=1).values
trainLabel = train_data.Activity.values

testData = test_data.drop('Activity' , axis=1).values
testLabel = test_data.Activity.values
print(testLabel)

The code above separates the inputs from the output labels, which are the human activities captured by the sensors. The activities walking, standing, walking upstairs, walking downstairs, sitting and lying down become the target to be predicted.

# encoding labels
from sklearn import preprocessing

encoder = preprocessing.LabelEncoder()
# encoding test labels
encoder.fit(testLabel)
testLabelE = encoder.transform(testLabel)

# encoding train labels
encoder.fit(trainLabel)
trainLabelE = encoder.transform(trainLabel)

LabelEncoder holds the label for each class: it transforms non-numerical labels (as long as they are hashable and comparable) into numerical labels, which is what the classifier expects. (For input features, scikit-learn also offers one-hot and ordinal encoders.)
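A quick way to inspect the mapping the encoder has learned (a small sketch):

# the classes, in the order of their integer codes (0, 1, 2, ...)
print(encoder.classes_)
# map a few encoded labels back to their original activity names
print(encoder.inverse_transform(trainLabelE[:5]))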

# applying a supervised neural network using a multi-layer perceptron
import sklearn.neural_network as nn

# one hidden layer of 90 units, trained with plain SGD
mlpSGD = nn.MLPClassifier(hidden_layer_sizes=(90,),
                          max_iter=1000, alpha=1e-4,
                          solver='sgd', verbose=10,
                          tol=1e-19, random_state=1,
                          learning_rate_init=.001)

# the same architecture trained with the Adam optimiser
mlpADAM = nn.MLPClassifier(hidden_layer_sizes=(90,),
                           max_iter=1000, alpha=1e-4,
                           solver='adam', verbose=10,
                           tol=1e-19, random_state=1,
                           learning_rate_init=.001)

nnModelSGD = mlpSGD.fit(trainData, trainLabelE)
y_pred = mlpSGD.predict(testData).reshape(-1, 1)
# print(y_pred)

from sklearn.metrics import classification_report
print(classification_report(testLabelE, y_pred))
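Beyond the classification report, a confusion matrix shows which activities get confused with which; a minimal sketch using the same sklearn.metrics module:

from sklearn.metrics import confusion_matrix

# rows are the true activities, columns the predicted ones
cm = confusion_matrix(testLabelE, y_pred.ravel())
print(cm)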
 

import matplotlib.pyplot as plt
import seaborn as sns

# sub_01 is assumed here to be the rows for a single subject, e.g. volunteer 1
# (the training data contains a 'subject' column identifying each volunteer)
sub_01 = train_data[train_data['subject'] == 1]

fig = plt.figure(figsize=(32, 24))
ax1 = fig.add_subplot(221)
ax1 = sns.stripplot(x='Activity', y=sub_01.iloc[:, 0], data=sub_01, jitter=True)
ax2 = fig.add_subplot(222)
ax2 = sns.stripplot(x='Activity', y=sub_01.iloc[:, 1], data=sub_01, jitter=True)
plt.show()

 

fig = plt.figure(figsize=(32,24))
ax1 = fig.add_subplot(221)
ax1 = sns.stripplot(x='Activity', y=sub_01.iloc[:,2], data=sub_01, jitter=True)
ax2 = fig.add_subplot(222)
ax2 = sns.stripplot(x='Activity', y=sub_01.iloc[:,3], data=sub_01, jitter=True)
plt.show()

 



Young Data Scientists

There is a common notion among adults that data science is too big a field to handle, but surprisingly two young techies, aged 12 and 14, are already working as data scientists. Find out what they have that you might still need to acquire for the sexiest job.

Decision Tree

Decision tree:

A decision tree is a classification algorithm in machine learning (ML). It is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.

To understand the algorithm of the decision tree we need to know about the classification.

What is Classification?

Classification is the process of dividing the datasets into different categories or groups by adding a label. It adds the data point to a particular labeled group on the basis of some condition.

For example, in daily life email is grouped into three categories (Spam, Promotions, Personal) so that the right information is easy to find. A decision tree can be used to classify the mail type and place each mail into the proper category.

Types of classification algorithms

  • DECISION TREE
  • RANDOM FOREST
  • NAIVE BAYES
  • KNN

Decision tree:

  1. Graphical representation of all the possible solutions to a decision.
  2. A decision is based on some conditions.
  3. The decision made can be easily explained.

Building a decision tree involves the following steps:

1. Entropy:

Entropy is the measure used to build the tree; we compute it for a class or attribute. A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to calculate the homogeneity of a sample.
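For reference, the entropy of a set S with class proportions p_i is Entropy(S) = – ∑ p_i . log2(p_i). A minimal helper, assuming the labels are passed as a pandas Series (with numpy and pandas imported as in the example below):

def entropy(labels):
    # proportion of each class in the column
    probs = labels.value_counts(normalize=True)
    return -(probs * np.log2(probs)).sum()

# e.g. entropy(play_data.play) gives roughly 0.940 for the tennis data used below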

2. Information Gain:

The information gain is based on the decrease in entropy after a data set is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).

  • Gain(S, A) = Entropy(S) – ∑ [ p(S|A) . Entropy(S|A) ]
  • We choose to split on the attribute whose information gain is the highest.
  • The next step is to calculate the information gain for every attribute.
Here is a short example of a decision tree:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# load the toy "play tennis" dataset
play_data = pd.read_csv('data/tennis.csv.txt')
play_data

Output:

outlook temp humidity windy play
0 sunny hot high False no
1 sunny hot high True no
2 overcast hot high False yes
3 rainy mild high False yes
4 rainy cool normal False yes
5 rainy cool normal True no
6 overcast cool normal True yes
7 sunny mild high False no
8 sunny cool normal False yes
9 rainy mild normal False yes
10 sunny mild normal True yes
11 overcast mild high True yes
12 overcast hot normal False yes
13 rainy mild high True no 

Entropy of play:

  • Entropy(play) = – p(Yes) . log2p(Yes) – p(No) . log2p(No)

play_data.play.value_counts()
# 9 "yes" and 5 "no" rows out of 14
Entropy_play = -(9/14)*np.log2(9/14) - (5/14)*np.log2(5/14)
print(Entropy_play)

output:
0.94028595867063114

Information Gain on splitting by Outlook

  • Gain(Play, Outlook) = Entropy(Play) – ∑ [ p(Play|Outlook) . Entropy(Play|Outlook) ]
  • Gain(Play, Outlook) = Entropy(Play) – [ p(Play|Outlook=Sunny) . Entropy(Play|Outlook=Sunny) ] – [ p(Play|Outlook=Overcast) . Entropy(Play|Outlook=Overcast) ] – [ p(Play|Outlook=Rain) . Entropy(Play|Outlook=Rain) ]

play_data[play_data.outlook == 'sunny']

# Entropy(Play|Outlook=Sunny): 3 "no" and 2 "yes" out of 5
Entropy_Play_Outlook_Sunny = -(3/5)*np.log2(3/5) - (2/5)*np.log2(2/5)
print(Entropy_Play_Outlook_Sunny)

play_data[play_data.outlook == 'overcast']
# Entropy(Play|Outlook=Overcast): the subset is homogeneous, so its entropy is 0

play_data[play_data.outlook == 'rainy']
# Entropy(Play|Outlook=Rainy): 3 "yes" and 2 "no" out of 5
Entropy_Play_Outlook_Rain = -(2/5)*np.log2(2/5) - (3/5)*np.log2(3/5)
print(Entropy_Play_Outlook_Rain)

# Gain(Play, Outlook) = Entropy(Play) – [ p(Play|Outlook=Sunny) . Entropy(Play|Outlook=Sunny) ]
# – [ p(Play|Outlook=Overcast) . Entropy(Play|Outlook=Overcast) ] – [ p(Play|Outlook=Rain) . Entropy(Play|Outlook=Rain) ]

Other gains

  • Gain(Play, Temperature) – 0.029
  • Gain(Play, Humidity) – 0.151
  • Gain(Play, Wind) – 0.048
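As a sanity check, the gains for outlook and humidity can be computed explicitly from the entropies above; a small sketch (the class counts per humidity value, 3 yes / 4 no for 'high' and 6 yes / 1 no for 'normal', are read off the table printed earlier):

# Gain(Play, Outlook): overcast is homogeneous, so its entropy term is 0
Gain_Outlook = Entropy_play - (5/14)*Entropy_Play_Outlook_Sunny - (4/14)*0 - (5/14)*Entropy_Play_Outlook_Rain
print(Gain_Outlook)      # ~0.247

# Gain(Play, Humidity)
Entropy_Humidity_High   = -(3/7)*np.log2(3/7) - (4/7)*np.log2(4/7)
Entropy_Humidity_Normal = -(6/7)*np.log2(6/7) - (1/7)*np.log2(1/7)
Gain_Humidity = Entropy_play - (7/14)*Entropy_Humidity_High - (7/14)*Entropy_Humidity_Normal
print(Gain_Humidity)     # ~0.151, matching the value listed above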

Conclusion: Outlook gives the highest gain and thus becomes the root of the tree.

Time to find the next splitting criterion.

play_data[play_data.outlook == 'overcast']
play_data[play_data.outlook == 'sunny']

# Entropy(Play|Outlook=Sunny), needed for the gains on the sunny branch
Entropy_Play_Outlook_Sunny = -(3/5)*np.log2(3/5) - (2/5)*np.log2(2/5)
print(Entropy_Play_Outlook_Sunny)

Information Gain for humidity

# On the sunny branch, humidity 'high' -> 3 rows, all "no"; 'normal' -> 2 rows, all "yes".
# Both subsets are homogeneous, so their entropies are 0.
Entropy_Play_Outlook_Sunny - (3/5)*0 - (2/5)*0

Information Gain for windy

  • False -> 3 -> [1+ 2-]
  • True -> 2 -> [1+ 1-]

Entropy_Wind_False = -(1/3)*np.log2(1/3) - (2/3)*np.log2(2/3)
print(Entropy_Wind_False)
Entropy_Play_Outlook_Sunny - (3/5)* Entropy_Wind_False - (2/5)*1  

Information Gain for temperature

  • hot -> 2 -> [2- 0+]
  • mild -> 2 -> [1+ 1-]
  • cool -> 1 -> [1+ 0-]

Entropy_Play_Outlook_Sunny - (2/5)*0 - (1/5)*0 - (2/5)*1

Conclusion: Humidity is the best choice on the sunny branch:

play_data[(play_data.outlook == 'sunny') & (play_data.humidity == 'high')] 

Output:

outlook temp humidity windy play
0 sunny hot high False no
1 sunny hot high True no
7 sunny mild high False no 

play_data[(play_data.outlook == 'sunny') & (play_data.humidity == 'normal')]

Output:
outlook temp humidity windy play
8 sunny cool normal False yes
10 sunny mild normal True yes

Splitting the rainy branch:

play_data[play_data.outlook == 'rainy']

Output:

outlook temp humidity windy play
3 rainy mild high False yes
4 rainy cool normal False yes
5 rainy cool normal True no
9 rainy mild normal False yes
13 rainy mild high True no

# Entropy(Play|Outlook=Rainy): 3 "yes" and 2 "no" out of 5
Entropy_Play_Outlook_Rainy = -(3/5)*np.log2(3/5) - (2/5)*np.log2(2/5)

Information Gain for temp

  • mild -> 3 [2+ 1-]
  • cool -> 2 [1+ 1-]

Entropy_Play_Outlook_Rainy - (3/5)*0.918 - (2/5)*1

Output:
0.020150594454668602

Information Gain for Windy:

Entropy_Play_Outlook_Rainy - (2/5)*0 - (3/5)*0

Output:
0.97095059445466858 

Information Gain for Humidity

  • High -> 2 -> [1+ 1-]
  • Normal -> 3 -> [2+ 1-]

Entropy_Play_Outlook_Rainy_Normal = -(1/3)*np.log2(1/3) - (2/3)*np.log2(2/3)
print(Entropy_Play_Outlook_Rainy_Normal)
# gain for humidity on the rainy branch
print(Entropy_Play_Outlook_Rainy - (2/5)*1 - (3/5)*Entropy_Play_Outlook_Rainy_Normal)

Output:
0.91829583405448956
0.019973094021974891

Conclusion: Windy gives the highest gain on the rainy branch, so it becomes the splitting attribute there.

Final tree:

Decision trees are popular among non-statisticians because they produce a model that is very easy to interpret. Each leaf node can be read as an if/then rule, and cases that satisfy the rule are placed in that node. Decision trees are non-parametric and therefore do not require normality assumptions about the data; parametric models specify the form of the relationship between predictors and response (for example, a linear relationship in regression), but in many cases the nature of the relationship is unknown, and that is where non-parametric models are useful. Trees can handle data of different types, including continuous, categorical, ordinal and binary, and transformations of the data are not required. They are useful for detecting important variables and interactions and for identifying outliers. They also handle missing data by identifying surrogate splits in the modelling process (surrogate splits are splits highly associated with the primary split), whereas in many other models records with missing values are omitted by default.
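For completeness, the hand calculation above can be automated; a minimal sketch using scikit-learn's DecisionTreeClassifier with the entropy criterion on the same play-tennis data (the one-hot encoding of the categorical columns is an implementation choice here, since sklearn trees need numeric inputs):

from sklearn.tree import DecisionTreeClassifier, export_text

# one-hot encode the categorical features so the tree can consume them
X = pd.get_dummies(play_data[['outlook', 'temp', 'humidity', 'windy']])
y = play_data['play']

tree = DecisionTreeClassifier(criterion='entropy', random_state=0)
tree.fit(X, y)

# print the learned if/then rules, which mirror the splits derived by hand above
print(export_text(tree, feature_names=list(X.columns)))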


Evolution of Data Science in India

Data Science may indeed be the sexiest job, but it is not a brand-new concept: it has been in use for decades. Know more about it.
