Step 1: When you create a new connection (for example, as shown in Connect To Hadoop), you are taken to the Preview screen.

Step 2: On the Preview screen, in the right panel, you will see a section named "Other Operations".

Step 3: The following options are available under this section:

  • Remove Duplicate Rows - Selecting this option under the "Other Operations" section removes all duplicate records from the dataset.
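
To illustrate what removing duplicate rows means for a dataset, here is a minimal Python sketch. It is not the product's implementation. It assumes that a row counts as a duplicate only when every column value matches an earlier row, and that the first occurrence is kept. The function name remove_duplicate_rows and the sample data are hypothetical.

```python
def remove_duplicate_rows(rows):
    """Return rows with exact duplicates dropped, keeping the first occurrence.

    Assumption: two rows are duplicates only if all column values match.
    """
    seen = set()      # rows already encountered, stored as tuples
    unique = []       # result, in original order
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data: the third row repeats the first.
sample = [
    ("1", "alice", "NY"),
    ("2", "bob", "LA"),
    ("1", "alice", "NY"),
]

deduped = remove_duplicate_rows(sample)
for row in deduped:
    print(row)
```

Under these assumptions the repeated row is dropped and the remaining rows keep their original order.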