Step 1: When you create a new connection (for example, as described in the Connect To Hadoop article), you arrive at the Preview screen.
Step 2: On the Preview screen, select "remove duplicates" from the Others section of the drop-down list, as shown.
Step 3: The operation works row by row and compares the whole row. A row is considered a duplicate only if every one of its columns matches the corresponding column of another row.
Step 4: As soon as you click "remove duplicates", the system applies the operation and deletes the duplicate rows.
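
The whole-row comparison described in Step 3 can be illustrated with a short Python sketch. This is not the tool's actual implementation: the function name `remove_duplicate_rows` and the sample data are hypothetical, and the sketch assumes the first occurrence of each row is the one kept, which the steps above do not specify.

```python
def remove_duplicate_rows(rows):
    """Return rows with whole-row duplicates dropped.

    Assumption: the first occurrence of each row is kept.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # every column takes part in the comparison
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data: (id, name, city)
rows = [
    ("1", "alice", "NY"),
    ("2", "bob", "LA"),
    ("1", "alice", "NY"),  # identical in every column: treated as a duplicate
    ("1", "alice", "LA"),  # differs in one column: kept
]

deduped = remove_duplicate_rows(rows)
print(deduped)
```

Note that the last sample row survives because it differs from the first in a single column; only rows that match in all columns are removed.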