Step 1: When you create a new connection (for example, as shown in the Connect To Hadoop article), you arrive at the Preview screen.

 

Step 2: On the Preview screen, select "Remove Duplicates" from the Others section of the drop-down list.

 

Step 3: The operation works row by row and compares the entire row. A row is treated as a duplicate only if every one of its columns matches another row.

Step 4: As soon as you click Remove Duplicates, the system applies the operation and deletes the duplicate rows.
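
The whole-row comparison described in Step 3 can be illustrated with a small Python sketch. This is not the tool's actual implementation. The function name remove_duplicates and the sample rows are hypothetical, and the sketch assumes the first occurrence of each row is kept, which the steps above do not state.

```python
def remove_duplicates(rows):
    """Drop rows whose every column matches an earlier row.

    Assumption: the first occurrence of each row is kept.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # the whole row is the comparison key
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data for illustration only.
rows = [
    ("1", "alice", "NY"),
    ("2", "bob", "LA"),
    ("1", "alice", "NY"),  # matches the first row in every column: removed
    ("1", "alice", "SF"),  # differs in one column: kept
]

deduped = remove_duplicates(rows)
print(deduped)
```

Note that the fourth row survives because it differs from the first in a single column. Only rows that match in all columns count as duplicates.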