Step 1: When you create a new connection (for example, as shown in "Connect To hadoop"), you are taken to the Preview screen.
Step 2: On the Preview screen, in the right panel, you will see a section called "Other Operations".
Step 3: The following options are available under this section:
- Delete Multiple Columns - When you click this option, a pop-up appears listing all the columns in the dataset, with a check box to the left of each column. Tick the columns you wish to delete. To select every column at once, tick the check box next to "Column Name"; checking it selects all columns and unchecking it deselects them all. Then click "Delete Columns" to delete the selected columns.
- Advanced Formula - When you click this option under the "Other Operations" section on the right panel, a pop-up appears listing the advanced formulas, such as subtraction, multiplication, addition, trigonometric functions (cos, tan), exponents, floor, square root, and many other mathematical operations. Supply the parameters (column names) on which the operation is to be performed. Click "Validate" to check whether the selected columns are eligible for the operation. Once the selection is validated, click "Apply" to obtain the result. The result appears as a new column named "formula" at the rightmost side of the dataset.
- Union data with another table - Selecting "Union Data with another table" from the "Other Operations" section unions a dataset of your choice with the existing dataset. A pop-up appears in which you select the dataset to union with. Then click "Apply" to see the result.
- Remove Duplicate Rows - Selecting this option under the "Other Operations" section removes all duplicate records from the dataset.
- Aggregate -
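
For readers who think in code, the column deletion, formula, union, and duplicate-removal operations described above can be approximated with the minimal Python sketch below. It only illustrates the semantics and is not the tool's implementation: the function names (`delete_columns`, `validate`, `apply_formula`, `union`, `remove_duplicates`), the list-of-dictionaries table layout, and the sample data are all hypothetical. It also assumes that a union keeps repeated rows (as SQL `UNION ALL` does) and that "eligible" columns are numeric ones; the tool itself may behave differently.

```python
import math


def delete_columns(rows, columns):
    """Return the rows without the selected columns."""
    drop = set(columns)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


def validate(rows, columns):
    """Assumed eligibility rule: every selected column exists and is numeric."""
    return all(
        col in row
        and isinstance(row[col], (int, float))
        and not isinstance(row[col], bool)
        for row in rows
        for col in columns
    )


def apply_formula(rows, func, columns, name="formula"):
    """Append func(selected column values) as a new rightmost column."""
    if not validate(rows, columns):
        raise ValueError("selected columns are not eligible for this operation")
    return [{**row, name: func(*(row[c] for c in columns))} for row in rows]


def union(rows, other_rows):
    """Stack the rows of a second dataset under the first (duplicates kept)."""
    return rows + other_rows


def remove_duplicates(rows):
    """Keep the first occurrence of each distinct row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # independent of column order
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data
sales = [
    {"item": "pen", "price": 2.0, "qty": 3},
    {"item": "ink", "price": 5.0, "qty": 1},
]
more_sales = [{"item": "pen", "price": 2.0, "qty": 3}]

with_total = apply_formula(sales, lambda p, q: p * q, ["price", "qty"])
roots = apply_formula(sales, math.sqrt, ["price"])
combined = remove_duplicates(union(sales, more_sales))
trimmed = delete_columns(combined, ["qty"])
```

Each function returns a new list rather than modifying its input, which mirrors the way each operation in the panel produces an updated view of the dataset.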