Clustering


Clustering is a data mining (machine learning) technique that places data elements into related groups without advance knowledge of the group definitions. The dataset is grouped so that objects in the same group (called a cluster) are more similar to each other, by some chosen measure, than to objects in other groups.
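To make "more similar" concrete, here is a minimal sketch that assumes similarity is measured by Euclidean distance between numeric attributes. The measure and the sample values are assumptions for illustration only; this page does not state which measure Ideata Analytics uses.

```python
import math

# Hypothetical rows with two numeric attributes each.
a, b = (1.0, 2.0), (1.5, 1.8)   # intended to fall in the same group
c = (8.0, 9.0)                  # intended to fall in a different group

within = math.dist(a, b)    # distance between two rows of one group
between = math.dist(a, c)   # distance between rows of different groups

# A good grouping keeps within-group distances smaller than between-group ones.
print(within < between)
```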



Follow these steps to perform clustering in Ideata Analytics:


Step 1: When you create a new connection (for example, as shown in Connect To hadoop), you are taken to the Preview screen.

 

Step 2: On the Preview screen, click the Advance Analytics tab in the right-hand panel.

 

 


Step 3: Click Clustering in the list. A pop-up for Clustering opens.






Step 4: Fill in the required details listed below and click "Apply":

 

  • SELECT COLUMNS: Select the attribute or attributes in the dataset by which the data is to be divided into clusters (numeric attributes only).
  • NO OF CLUSTERS: Provide the number of clusters to be formed.
  • NO OF ITERATIONS: The data is clustered using a training model built over the specified number of iterations. Higher values generally give more precise clustering of the dataset but take longer to run.
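The following sketch shows how the three settings above could interact. It assumes a k-means style algorithm, which this page does not confirm as the method Ideata Analytics uses. The function name kmeans, its parameters, and the sample data are hypothetical.

```python
import math

def kmeans(rows, k, iterations):
    """Cluster numeric rows into k groups using a fixed number of passes."""
    # Deterministic start for illustration: the first k rows are the initial centroids.
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(row, centroids[c]))
            for row in rows
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical data: two selected numeric columns per row.
data = [(25, 30), (27, 32), (24, 29), (52, 90), (55, 95), (50, 88)]

labels, centroids = kmeans(data, k=2, iterations=10)
print(labels)                               # [0, 0, 0, 1, 1, 1]
print(kmeans(data, k=2, iterations=1)[0])   # [0, 1, 0, 1, 1, 1]
```

With a single iteration the second row is still attached to the wrong group. Further passes correct it, which is the trade-off between precision and run time described for NO OF ITERATIONS.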

Step 5: Once you click Apply, the system applies the clustering algorithm to the whole dataset using the selected columns and divides the data into the specified number of clusters. You can rerun it until you are satisfied with the result.



Step 6: The clusters are shown in a clustered chart in the same window. Hover over the chart to see further details.



Step 7: Once you click Save, a new column is added to the data, holding the corresponding cluster number for each row.
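The effect of saving can be pictured with a small sketch. The column name "cluster", the helper add_cluster_column, and the sample rows are assumptions made for illustration. The actual column name and storage format used by Ideata Analytics are not stated here.

```python
def add_cluster_column(rows, labels, column="cluster"):
    """Return copies of the rows with each row's cluster number attached."""
    if len(rows) != len(labels):
        raise ValueError("need exactly one cluster label per row")
    # Build new dicts so the original rows are left unchanged.
    return [{**row, column: label} for row, label in zip(rows, labels)]

# Hypothetical rows and the cluster numbers assigned to them.
rows = [
    {"age": 25, "income": 30},
    {"age": 52, "income": 90},
    {"age": 27, "income": 32},
]
labels = [0, 1, 0]

saved = add_cluster_column(rows, labels)
for row in saved:
    print(row)
```

Each row keeps its original attributes and gains one extra value, so the cluster number can later be used like any other column, for example to filter or group the data.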