Classification:
Classification identifies which of a set of predefined categories each new observation in the test dataset belongs to, based on a model built from a training dataset. The training dataset contains known class labels for the attribute whose value has to be deduced in the test dataset. As in clustering, observations that belong to the same category are similar to each other; unlike clustering, the category labels are predefined in the training dataset.
Flow: Feed training dataset -> Train -> Get class labels for test dataset
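The flow above can be sketched in a few lines of Python. This is only an illustration of the train-then-predict pattern, not how Ideata Analytics implements it: the nearest-centroid rule, the function names `train` and `predict`, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict

def train(rows, labels):
    """Build a model: one centroid (mean feature vector) per class label."""
    grouped = defaultdict(list)
    for features, label in zip(rows, labels):
        grouped[label].append(features)
    return {
        label: [sum(column) / len(column) for column in zip(*members)]
        for label, members in grouped.items()
    }

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(model, features):
    """Return the class label whose centroid is closest to the observation."""
    return min(model, key=lambda label: squared_distance(model[label], features))

# Hypothetical training dataset: two numeric features plus a known class label.
train_rows = [(150, 50), (155, 55), (180, 85), (185, 90)]
train_labels = ["small", "small", "large", "large"]

# Feed training dataset -> Train
model = train(train_rows, train_labels)

# Get class labels for the test dataset (labels unknown beforehand)
test_rows = [(152, 52), (182, 88)]
predicted = [predict(model, row) for row in test_rows]
print(predicted)
```

The key point is the separation of the two phases: the labels are only needed while training, and the resulting model is then applied to rows whose labels are unknown.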
Follow these steps to run classification in Ideata Analytics:
Step 1: Create a new connection, for example as shown in Connect To hadoop. You will be taken to the Preview screen.
Step 2: On the Preview screen, click the Advance Analytics tab in the right-hand panel.
Step 3: Click Classification in the list. A pop-up for Classification opens.
Step 4: Fill in the required details listed below and click "Classify".
- PREDICT: The attribute in the test dataset whose class label is to be predicted, based on the known class labels of the same attribute in the training dataset.
- MODEL TYPE:
  - SVM - A support vector machine model. The model generated over the training dataset contains only two class labels, i.e. the entire dataset is categorized into no more than two classes.
  - LOGISTIC REGRESSION - A statistical method for analyzing a dataset in which one or more independent variables determine a categorical outcome (here, the class label).
  - DECISION TREE - A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm made up of conditional tests. Used as a classifier, it tests feature values at each branch and assigns a class label at each leaf.
- NO OF ITERATIONS: The number of iterations used to build the training model over the training dataset; the class labels are then determined from this model. Higher values generally give a more precise classification but take longer to run.
- SELECT FEATURES: Specify the features on which the class labels in the test dataset are to be determined. Observations belonging to the same class label will have related values for the selected features.
- REMOVE FEATURES: Clears all the features you have selected for classification.
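To make the NO OF ITERATIONS and SELECT FEATURES settings more concrete, here is a minimal sketch of an iteratively trained binary classifier. It is not Ideata Analytics' implementation: logistic regression fitted by plain gradient descent is used only as an illustration, and the function names (`train_logistic`, `classify`), the `learning_rate` parameter, and the toy columns `id`, `usage`, and `spend` are all hypothetical.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, features, iterations, learning_rate=0.5):
    """Fit weights by batch gradient descent, using only the selected features."""
    weights = [0.0] * len(features)
    bias = 0.0
    n = len(rows)
    for _ in range(iterations):
        grad_w = [0.0] * len(features)
        grad_b = 0.0
        for row, y in zip(rows, labels):
            x = [row[name] for name in features]
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

def classify(model, row, features):
    """Return the binary class label (0 or 1) for one observation."""
    weights, bias = model
    score = sum(w * row[name] for w, name in zip(weights, features)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical training data; "id" is present but deliberately not selected.
train_rows = [
    {"id": 1, "usage": 0.1, "spend": 0.2},
    {"id": 2, "usage": 0.3, "spend": 0.1},
    {"id": 3, "usage": 0.8, "spend": 0.9},
    {"id": 4, "usage": 0.9, "spend": 0.7},
]
train_labels = [0, 0, 1, 1]
selected = ["usage", "spend"]  # plays the role of SELECT FEATURES

few = train_logistic(train_rows, train_labels, selected, iterations=1)
many = train_logistic(train_rows, train_labels, selected, iterations=2000)

few_pred = [classify(few, row, selected) for row in train_rows]
many_pred = [classify(many, row, selected) for row in train_rows]

# Unlabeled test rows, classified with the longer-trained model.
test_rows = [
    {"id": 5, "usage": 0.2, "spend": 0.15},
    {"id": 6, "usage": 0.85, "spend": 0.8},
]
test_pred = [classify(many, row, selected) for row in test_rows]
print(few_pred, many_pred, test_pred)
```

With a single iteration the model has barely moved from its starting point, while the larger iteration count lets it separate the two classes in this toy data, at the cost of more computation. Columns left out of `selected` have no influence on the result.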
Step 5: Once you click "Classify", the system applies the prediction algorithm to the whole dataset using the selected features, and the column to be predicted (the one you chose in the PREDICT box above) is populated with a binary classification, either 0 or 1. You can also save the result of the classification algorithm by clicking "Save". The classified result column then appears immediately adjacent to the column the algorithm was applied to.
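The resulting layout can be pictured with a small sketch. This is an assumption about how the output is arranged, based only on the description above: the helper `add_adjacent_column`, the column name `churn_predicted`, and the sample table are hypothetical and not part of Ideata Analytics.

```python
def add_adjacent_column(header, rows, target, new_name, values):
    """Return a copy of the table with a new column placed right after `target`."""
    position = header.index(target) + 1
    new_header = header[:position] + [new_name] + header[position:]
    new_rows = [
        row[:position] + [value] + row[position:]
        for row, value in zip(rows, values)
    ]
    return new_header, new_rows

# Hypothetical table: "churn" is the column chosen in the PREDICT box.
header = ["age", "churn", "region"]
rows = [[34, None, "north"], [51, None, "south"]]
predictions = [0, 1]  # binary class labels produced by the classifier

new_header, new_rows = add_adjacent_column(
    header, rows, "churn", "churn_predicted", predictions
)
print(new_header)
for row in new_rows:
    print(row)
```

The original columns keep their order; only the classified column is inserted next to the one it was derived from.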