Step 1: When you create a new connection (for example, as shown in Connect To hadoop), you are taken to the Preview screen.
Step 2: On the Preview screen, in the right panel, you will see a section called "Other Operations".
Step 3: The following options are available under this section:
- Remove Duplicate Rows - Selecting this option under the "Other Operations" section removes all duplicate records from the dataset.
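
To make the effect of Remove Duplicate Rows concrete, here is a minimal Python sketch of what such an operation typically does. It is not the product's implementation: the function name `remove_duplicate_rows`, the sample data, and the choice to treat two rows as duplicates only when every column matches (keeping the first occurrence) are all assumptions made for illustration.

```python
def remove_duplicate_rows(rows):
    """Return rows with exact duplicates dropped, keeping the first occurrence.

    Assumption: a duplicate is a row whose values match an earlier row
    in every column. The tool itself may define duplicates differently.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # values must be hashable (strings, numbers, ...)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical dataset: the third row repeats the first.
sample = [
    ("1", "alice", "NY"),
    ("2", "bob", "LA"),
    ("1", "alice", "NY"),
]

deduped = remove_duplicate_rows(sample)
print(deduped)
```

The original row order is preserved, so the preview after the operation would show the same records in the same sequence, minus the repeats.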