### Decision Tree

A decision tree is a classification algorithm in machine learning (ML). It is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that contains only conditional control statements.

To understand the decision tree algorithm, we first need to understand classification.

### What is Classification?

Classification is the process of dividing a dataset into different categories or groups by assigning labels. A data point is added to a particular labeled group on the basis of some condition.

In daily life, for example, emails fall into three categories (Spam, Promotions, Personal); they are classified so the right information ends up in the right place. A decision tree can be used to classify an incoming mail and assign it to the proper category.

#### Types of classification

• Decision Tree
• Random Forest
• Naive Bayes
• KNN (k-nearest neighbors)

Decision tree:

1. A graphical representation of all the possible solutions to a decision.
2. A decision is made based on some conditions.
3. The decision made can be easily explained.

The following steps are used to build a decision tree:

1. Entropy:

Entropy is used to decide how to build the tree. We compute the entropy of an attribute or class. A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to calculate the homogeneity of a sample.

2. Information Gain:

The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).

• Gain(S, A) = Entropy(S) – ∑ [ p(S|A) · Entropy(S|A) ]
• We intend to choose the attribute that yields the largest information gain when split on.
• The next step is calculating the information gain for all attributes.
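As a sketch, the entropy and information-gain formulas above can be implemented directly in plain Python (the helper names `entropy` and `information_gain` are my own, not part of the original walk-through):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Gain(S, A) = Entropy(S) - sum over values v of p(S_v) * Entropy(S_v)."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(attribute_values):
        subset = [l for l, v in zip(labels, attribute_values) if v == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# The play-tennis data used in the example below
play = ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
        'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
outlook = ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast',
           'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy']

print(entropy(play))                    # ~0.940
print(information_gain(play, outlook))  # ~0.247
```

These two numbers match the entropy of `play` and the gain for `outlook` computed step by step below.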
Here is a short example of a decision tree:

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

play_data = pd.read_csv('data/tennis.csv.txt')
print(play_data)
```

Output:

```
     outlook  temp humidity  windy play
0      sunny   hot     high  False   no
1      sunny   hot     high   True   no
2   overcast   hot     high  False  yes
3      rainy  mild     high  False  yes
4      rainy  cool   normal  False  yes
5      rainy  cool   normal   True   no
6   overcast  cool   normal   True  yes
7      sunny  mild     high  False   no
8      sunny  cool   normal  False  yes
9      rainy  mild   normal  False  yes
10     sunny  mild   normal   True  yes
11  overcast  mild     high   True  yes
12  overcast   hot   normal  False  yes
13     rainy  mild     high   True   no
```

Entropy of play:

• Entropy(play) = – p(Yes) . log2p(Yes) – p(No) . log2p(No)

```
play_data.play.value_counts()
Entropy_play = -(9/14)*np.log2(9/14) - (5/14)*np.log2(5/14)
print(Entropy_play)
```

Output:
`0.94028595867063114`

Information Gain on splitting by Outlook

• Gain(Play, Outlook) = Entropy(Play) – ∑ [ p(Play|Outlook) . Entropy(Play|Outlook) ]
• Gain(Play, Outlook) = Entropy(Play) – [ p(Play|Outlook=Sunny) . Entropy(Play|Outlook=Sunny) ] – [ p(Play|Outlook=Overcast) . Entropy(Play|Outlook=Overcast) ] – [ p(Play|Outlook=Rain) . Entropy(Play|Outlook=Rain) ]

`play_data[play_data.outlook == 'sunny']`

```
# Entropy(Play|Outlook=Sunny)
Entropy_Play_Outlook_Sunny = -(3/5)*np.log2(3/5) - (2/5)*np.log2(2/5)
print(Entropy_Play_Outlook_Sunny)

play_data[play_data.outlook == 'overcast']
# Entropy(Play|Outlook=Overcast)
# Since the data is homogeneous, the entropy is 0

play_data[play_data.outlook == 'rainy']
# Entropy(Play|Outlook=Rainy)
Entropy_Play_Outlook_Rain = -(2/5)*np.log2(2/5) - (3/5)*np.log2(3/5)
print(Entropy_Play_Outlook_Rain)

# Gain(Play, Outlook) = Entropy(Play)
#   - p(Outlook=Sunny) * Entropy(Play|Outlook=Sunny)
#   - p(Outlook=Overcast) * Entropy(Play|Outlook=Overcast)
#   - p(Outlook=Rain) * Entropy(Play|Outlook=Rain)
Gain_Play_Outlook = (Entropy_play - (5/14)*Entropy_Play_Outlook_Sunny
                     - (4/14)*0 - (5/14)*Entropy_Play_Outlook_Rain)
print(Gain_Play_Outlook)  # ~0.247
```

Gains for the other attributes:

• Gain(Play, Temperature) = 0.029
• Gain(Play, Humidity) = 0.151
• Gain(Play, Windy) = 0.048

Conclusion: Outlook has the highest gain and thus becomes the root of the tree. Time to find the next splitting criterion.

```
play_data[play_data.outlook == 'overcast']
play_data[play_data.outlook == 'sunny']

# Entropy(Play|Outlook=Sunny)
Entropy_Play_Outlook_Sunny = -(3/5)*np.log2(3/5) - (2/5)*np.log2(2/5)
print(Entropy_Play_Outlook_Sunny)
```

Information Gain for humidity

```
# Entropy for humidity == 'high' is 0, and entropy for humidity == 'normal' is also 0
Entropy_Play_Outlook_Sunny - (3/5)*0 - (2/5)*0
```

Information Gain for windy

• False -> 3 -> [1+ 2-]
• True -> 2 -> [1+ 1-]

```
Entropy_Wind_False = -(1/3)*np.log2(1/3) - (2/3)*np.log2(2/3)
print(Entropy_Wind_False)
Entropy_Play_Outlook_Sunny - (3/5)*Entropy_Wind_False - (2/5)*1
```

Information Gain for temperature

• hot -> 2 -> [2- 0+]
• mild -> 2 -> [1+ 1-]
• cool -> 1 -> [1+ 0-]

`Entropy_Play_Outlook_Sunny - (2/5)*0 - (1/5)*0 - (2/5)*1`

Conclusion: Humidity is the best choice on the sunny branch:

```
play_data[(play_data.outlook == 'sunny') & (play_data.humidity == 'high')]
```

Output:

```
  outlook  temp humidity  windy play
0   sunny   hot     high  False   no
1   sunny   hot     high   True   no
7   sunny  mild     high  False   no
```

`play_data[(play_data.outlook == 'sunny') & (play_data.humidity == 'normal')]`

Output:
```
   outlook  temp humidity  windy play
8    sunny  cool   normal  False  yes
10   sunny  mild   normal   True  yes
```

Splitting the rainy branch:

```
play_data[play_data.outlook == 'rainy']
# Entropy(Play|Outlook=Rainy)
Entropy_Play_Outlook_Rainy = -(3/5)*np.log2(3/5) - (2/5)*np.log2(2/5)
```

Output:

```
   outlook  temp humidity  windy play
3    rainy  mild     high  False  yes
4    rainy  cool   normal  False  yes
5    rainy  cool   normal   True   no
9    rainy  mild   normal  False  yes
13   rainy  mild     high   True   no
```

Information Gain for temp

• mild -> 3 [2+ 1-]
• cool -> 2 [1+ 1-]

`Entropy_Play_Outlook_Rainy - (3/5)*0.918 - (2/5)*1`

Output:
`0.020150594454668602`

Information Gain for Windy:

`Entropy_Play_Outlook_Rainy - (2/5)*0 - (3/5)*0`

Output:
`0.97095059445466858 `

Information Gain for Humidity

• High -> 2 -> [1+ 1-]
• Normal -> 3 -> [2+ 1-]

```
Entropy_Play_Outlook_Rainy_Normal = -(1/3)*np.log2(1/3) - (2/3)*np.log2(2/3)
print(Entropy_Play_Outlook_Rainy_Normal)
Entropy_Play_Outlook_Rainy - (2/5)*1 - (3/5)*Entropy_Play_Outlook_Rainy_Normal
```

Output:
```
0.91829583405448956
0.019973094021974891
```

Conclusion: Windy is the best choice on the rainy branch (gain 0.971).

Final tree: Decision trees are popular among non-statisticians because they produce a model that is very easy to interpret:

• Each leaf node is presented as an if/then rule; cases that satisfy the if/then statement are placed in that node.
• They are non-parametric and therefore do not require normality assumptions about the data. Parametric models specify the form of the relationship between predictors and response, for example a linear relationship in regression; in many cases, however, the nature of the relationship is unknown, and this is where non-parametric models are useful.
• They can handle data of different types, including continuous, categorical, ordinal, and binary. Transformations of the data are not required.
• They can be useful for detecting important variables and interactions, and for identifying outliers.
• They handle missing data by identifying surrogate splits in the modeling process (surrogate splits are splits highly associated with the primary split). In other models, records with missing values are omitted by default.
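For comparison, the whole hand calculation above can be reproduced with scikit-learn's `DecisionTreeClassifier` (a minimal sketch; the DataFrame below simply re-creates the play-tennis table inline, since the original `data/tennis.csv.txt` file is not available here):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Re-create the play-tennis table from the walk-through above
play_data = pd.DataFrame({
    'outlook':  ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast',
                 'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy'],
    'temp':     ['hot', 'hot', 'hot', 'mild', 'cool', 'cool', 'cool',
                 'mild', 'cool', 'mild', 'mild', 'mild', 'hot', 'mild'],
    'humidity': ['high', 'high', 'high', 'high', 'normal', 'normal', 'normal',
                 'high', 'normal', 'normal', 'normal', 'high', 'normal', 'high'],
    'windy':    [False, True, False, False, False, True, True,
                 False, False, False, True, True, False, True],
    'play':     ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
                 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no'],
})

# Encode the categorical columns as integers
X = play_data.drop('play', axis=1).apply(LabelEncoder().fit_transform)
y = play_data['play']

# criterion='entropy' uses the same information-gain measure as ID3
tree = DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```

Note that scikit-learn builds binary splits rather than ID3's multiway splits, so the printed tree may be shaped differently even though it uses the same entropy measure.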

Learnbay provides industry-accredited data science courses in Bangalore. We understand the convergence of technology in the field of data science, hence we offer significant courses like Machine Learning, TensorFlow, IBM Watson, Google Cloud Platform, Tableau, Hadoop, time series, R, and Python, with authentic real-time industry projects. Students benefit from being certified by IBM, and hundreds of students have been placed in promising companies for data science roles. Learnbay's data science course covers Data Science with Python, Artificial Intelligence with Python, and Deep Learning using TensorFlow; these topics are covered and co-developed with IBM.

## Introduction to Support Vector Machines

Support vector machines (SVMs) are a particularly powerful and flexible class of supervised algorithms for both classification and regression.

SVMs were introduced in the 1960s and later refined in the 1990s. However, it is only now that they are becoming extremely popular, owing to their ability to achieve excellent results. SVMs are implemented differently from other machine learning algorithms.

A support vector machine (SVM) is a supervised learning algorithm that is used to classify data into different classes. Unlike most algorithms, SVM makes use of a hyperplane which acts as a decision boundary between the various classes. In general, SVM can generate multiple separating hyperplanes so that the data is divided into segments, each containing only one kind of data. SVM is typically used to classify data into two different segments depending on the features of the data.

### Features of a Support Vector Machine (SVM)

SVM studies the labeled data and then classifies any new input data depending on what it learned in the training phase.

It can be used for both classification and regression problems: SVC stands for support vector classification and SVR for support vector regression. One of the main features of SVM is the kernel function, which allows it to handle nonlinear data via the kernel trick. The kernel trick works by transforming the data into another dimension so that a hyperplane can be drawn to classify the data.

How does SVM work?

SVM works by mapping data to a high-dimensional feature space so that data points can be classified, even when the data are not linearly separable. A separator between the classes is found, then the data are transformed in such a way that the separator can be drawn as a hyperplane. Following this, the characteristics of new data can be used to predict the group to which a new record should belong.

Importing Libraries:
```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

bankdata = pd.read_csv("D:/Datasets/bill_authentication.csv")
```

Exploratory Data Analysis:

```
bankdata.shape
bankdata.head()
```

```
   Variance  Skewness  Curtosis  Entropy  Class
0   3.62160    8.6661   -2.8073 -0.44699      0
1   4.54590    8.1674   -2.4586 -1.46210      0
2   3.86600   -2.6383    1.9242  0.10645      0
3   3.45660    9.5228   -4.0112 -3.59440      0
4   0.32924   -4.4552    4.5718 -0.98880      0
```

Data preprocessing:

```
X = bankdata.drop('Class', axis=1)
y = bankdata['Class']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
```

Training the Algorithm:

```
from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)
```

Making prediction

`y_pred = svclassifier.predict(X_test)`

Evaluating the Algorithm:

```
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

Output:

```
[[152   0]
 [  1 122]]
             precision    recall  f1-score   support

          0       0.99      1.00      1.00       152
          1       1.00      0.99      1.00       123

avg / total       1.00      1.00      1.00       275
```

### SVM Linear Classifier:

In the linear classifier model, we assume that the training examples, when plotted in space, can be separated by an apparent gap. The model predicts a straight hyperplane dividing the two classes. The primary focus while drawing the hyperplane is on maximizing the distance from the hyperplane to the nearest data point of either class. The resulting hyperplane is called a maximum-margin hyperplane.

### SVM Non-Linear Classifier:

In the real world, datasets are generally somewhat dispersed, so separating the data into different classes with a straight linear hyperplane is not a good choice. For this reason, Vapnik suggested creating non-linear classifiers by applying the kernel trick to maximum-margin hyperplanes. In non-linear SVM classification, the data points are plotted in a higher-dimensional space.

## Data Preprocessing
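As an illustrative sketch of the kernel trick (the dataset and parameters here are my own choices, not from the original text), an RBF-kernel SVC can separate data that no straight line can:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: impossible to separate with a straight line
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel='linear').fit(X, y)
rbf_svm = SVC(kernel='rbf').fit(X, y)  # kernel trick: implicit higher-dimensional space

print("linear accuracy:", linear_svm.score(X, y))
print("rbf accuracy:   ", rbf_svm.score(X, y))
```

The RBF kernel implicitly lifts the points into a space where the inner ring and outer ring become linearly separable, which is exactly Vapnik's idea described above.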

Introduction to Data Preprocessing: Before modeling, we need to clean the data to get a usable training sample. Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. It provides techniques for cleaning data from the real world, which is often incomplete, inconsistent, lacking in accuracy, and likely to contain many errors. Preprocessing delivers clean data before it reaches the modeling phase.

### Preprocessing of data in a stepwise fashion in scikit-learn

1. Introduction to Preprocessing:

• Learning algorithms have an affinity towards a certain pattern of data.
• Unscaled or unstandardized data might yield unacceptable predictions.
• Learning algorithms understand only numbers, so converting text and images to numbers is required.
• Preprocessing refers to the transformations applied before feeding data to a machine learning algorithm.

2. StandardScaler

• The StandardScaler assumes your data is normally distributed within each feature and scales it so that the distribution is centered around 0 with a standard deviation of 1.
• Calculation: subtract the mean of the column and divide by the standard deviation.
• If the data is not normally distributed, this is not the best scaler to use.

3. MinMaxScaler

• Calculation: subtract the min of the column and divide by the difference between the max and min.
• Data is shifted into the range between 0 and 1.
• If the distribution is not suitable for StandardScaler, this scaler works well.
• Sensitive to outliers.

4. RobustScaler

• Suited for data with outliers.
• Calculation: subtract the 1st quartile and divide by the difference between the 3rd and 1st quartiles (the interquartile range).

5. Normalizer

• Each sample is divided by its magnitude (norm), so every row has unit length.
• This makes it easier to compare data from different places.

6. Binarization

• Thresholding numerical values to binary values (0 or 1).
• A few learning algorithms assume the data follows a Bernoulli distribution, e.g., Bernoulli Naive Bayes.
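A quick sketch of the scalers from items 2–6 above, applied to a small toy array (the numbers are invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import (StandardScaler, MinMaxScaler,
                                   RobustScaler, Normalizer, Binarizer)

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 1000.0]])  # second column contains an outlier-ish value

X_std = StandardScaler().fit_transform(X)          # zero mean, unit variance per column
X_mm = MinMaxScaler().fit_transform(X)             # each column scaled into [0, 1]
X_rob = RobustScaler().fit_transform(X)            # centered on the median, scaled by IQR
X_norm = Normalizer().fit_transform(X)             # each *row* scaled to unit length
X_bin = Binarizer(threshold=2.5).fit_transform(X)  # 1 where value > 2.5, else 0

for name, arr in [('standard', X_std), ('minmax', X_mm), ('robust', X_rob),
                  ('normalizer', X_norm), ('binarizer', X_bin)]:
    print(name, arr.round(3), sep='\n')
```

Note how the outlier in the second column stretches the MinMaxScaler output but has far less influence on the RobustScaler output, which is exactly why RobustScaler is recommended for data with outliers.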

7. Encoding Categorical Value

• Ordinal values, e.g. Low, Medium & High, have a relationship between the values.
• Use label encoding with the right mapping so that the order is preserved.
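A minimal sketch of ordinal encoding with an explicit mapping (the column name and mapping are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'size': ['Low', 'High', 'Medium', 'Low']})

# An explicit mapping preserves the ordinal relationship Low < Medium < High
mapping = {'Low': 0, 'Medium': 1, 'High': 2}
df['size_encoded'] = df['size'].map(mapping)
print(df)
```

An explicit mapping is safer than a plain LabelEncoder here, because LabelEncoder assigns codes alphabetically, which would not respect the Low < Medium < High ordering.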

8. Imputation

• Missing values cannot be processed by most learning algorithms.
• Imputers can be used to infer missing values from the existing data.
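A sketch with scikit-learn's `SimpleImputer` on toy data (`strategy='mean'` is just one of several strategies; median and most-frequent also exist):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

imputer = SimpleImputer(strategy='mean')  # replace each NaN with its column's mean
X_filled = imputer.fit_transform(X)
print(X_filled)
```

Here the missing value in the first column becomes the column mean of 1.0 and 7.0, and the one in the second column becomes the mean of 2.0 and 3.0.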

9. Polynomial Features

• Derives non-linear features by raising existing features to a higher degree.
• Used with linear regression to learn a model of higher degree.
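A sketch of `PolynomialFeatures` expanding two features to degree 2 (the input values are invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])
poly = PolynomialFeatures(degree=2)
X2 = poly.fit_transform(X)
print(X2)  # columns: [1, x1, x2, x1^2, x1*x2, x2^2]
```

Feeding these expanded columns into an ordinary linear regression lets it fit a quadratic relationship, which is the trick described in the bullets above.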

10. Custom Transformer

• Often, you will want to convert an existing Python function into a transformer to assist in data cleaning or processing.
• FunctionTransformer is used to create such a transformer.
• `validate=False` is required for string columns.
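A sketch using `FunctionTransformer` to wrap an ordinary function (my choice of `np.log1p` here is just for illustration):

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Wrap a plain function as a scikit-learn transformer
log_transformer = FunctionTransformer(np.log1p, validate=True)

X = np.array([[0.0, 1.0],
              [9.0, 99.0]])
X_log = log_transformer.fit_transform(X)  # elementwise log(1 + x)
print(X_log)
```

Because it follows the transformer interface, the wrapped function can be dropped into a `Pipeline` alongside scalers and estimators.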

11. Text Processing

• Text is perhaps one of the most common forms of data.
• Learning algorithms don't understand text, only numbers.
• The methods below convert text to numbers.

12. CountVectorizer

• Each column represents one word; the count refers to the frequency of the word.
• The sequence of words is not maintained.

13. Hyperparameters

• ngram_range – the number of consecutive words considered for each column
• stop_words – words not considered
• vocabulary – only these words are considered

14. TfidfVectorizer

• Words occurring frequently in a document but rarely in the entire corpus are considered more important.
• The importance is on a scale of 0 to 1.

15. HashingVectorizer

• All the above techniques convert text into a table where each word becomes a column.
• Learning on data with hundreds of thousands of columns is difficult to process.
• HashingVectorizer is a useful technique for out-of-core learning.
• Multiple words are hashed into a limited number of columns.
• Limitation – the mapping from hashed value back to word is not possible.
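A sketch of `HashingVectorizer` with a deliberately small, fixed number of columns (`n_features=16` is my choice for illustration):

```python
from sklearn.feature_extraction.text import HashingVectorizer

docs = ["the cat sat", "the cat sat on the mat"]

# n_features fixes the output width up front, so no vocabulary has to be stored
vec = HashingVectorizer(n_features=16)
X = vec.transform(docs)  # stateless: no fit step is needed
print(X.shape)
```

Because it keeps no vocabulary, the transformer uses constant memory no matter how many documents stream through it, which is what makes it suitable for out-of-core learning; the trade-off is that columns cannot be mapped back to words.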

16. Image Processing using skimage

• skimage doesn’t come with anaconda. install with ‘pip install skimage’
• Images should be converted from 0-255 scale to 0-1 scale.
• skimage takes image path & returns numpy array
• images consist of 3 dimensions. ### Understanding Different Job Opportunities of Data Science

Data Science offers many more job opportunities than just "Data Scientist", and they are just as exciting and interesting as data science itself.
