Hadoop Pig Operations
The following Pig operations are available:
- Sort: Sorts the data in alphabetical order. The sort operation is described in detail in Sorting Input Records.
- Filter: Keeps only the records that meet the conditions you specify. The filter operation is described in detail in Filtering Input Records.
- Aggregate: Performs statistical operations, such as Sum and Count, on the data. Select the aggregate operations to apply to each field:
  - Sum: Calculates the sum of the values in the field.
  - Average: Calculates the average of the values in the field.
  - Max: Calculates the maximum value in the field.
  - Min: Calculates the minimum value in the field.
  - Count: Counts the values in the field. If you also select Distinct, only unique values are counted.
  - Distinct: Causes the Count operation to count only the unique values in the field.
- Limit: Limits the number of records processed to the value you enter. The value must be greater than zero.
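The following Python sketch illustrates the Sort operation described above. It is an illustration, not the tool's implementation. It assumes each record is a Python dict keyed by field name. `sort_records` is a hypothetical helper and is not part of Pig or the tool. In Pig Latin, the comparable statement is `ORDER ... BY`. The case-insensitive comparison is also an assumption, because the text does not say how uppercase and lowercase letters are ordered.

```python
from typing import Any

# Assumption: a record is a dict mapping field names to values.
Record = dict[str, Any]

def sort_records(records: list[Record], field: str) -> list[Record]:
    """Return a new list ordered alphabetically by `field` (hypothetical helper).

    casefold() makes the comparison case-insensitive; this is an assumption,
    since a plain code-point sort would place "Zoe" before "adam".
    """
    return sorted(records, key=lambda r: str(r[field]).casefold())

rows = [{"name": "carol"}, {"name": "Alice"}, {"name": "bob"}]
print([r["name"] for r in sort_records(rows, "name")])  # ['Alice', 'bob', 'carol']
```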
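The Filter operation can be pictured as keeping only the records that satisfy a condition, roughly what Pig Latin's `FILTER ... BY` does. This minimal sketch is not the tool's implementation. The dict-based record layout, the hypothetical `filter_records` helper, and the sample `age >= 18` condition are all assumptions for illustration.

```python
from typing import Any, Callable

# Assumption: a record is a dict mapping field names to values.
Record = dict[str, Any]

def filter_records(records: list[Record], condition: Callable[[Record], bool]) -> list[Record]:
    """Keep only the records for which `condition` returns True (hypothetical helper)."""
    return [r for r in records if condition(r)]

rows = [
    {"name": "alice", "age": 34},
    {"name": "bob", "age": 17},
    {"name": "carol", "age": 52},
]
adults = filter_records(rows, lambda r: r["age"] >= 18)
print([r["name"] for r in adults])  # ['alice', 'carol']
```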
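The aggregate operations and the Distinct option can be sketched as one Python function applied to the values of a single field. This is not the tool's implementation. The function name `aggregate` is an assumption, as are the operation strings and the choice to skip `None` values. Skipping nulls mirrors how Pig's built-in COUNT, SUM, AVG, MAX and MIN ignore them, but the text does not describe the tool's exact behavior. As in the text, Distinct affects only Count.

```python
from statistics import mean
from typing import Any, Iterable

def aggregate(values: Iterable[Any], op: str, distinct: bool = False) -> Any:
    """Apply one aggregate operation to the values of a field (hypothetical helper).

    `distinct` only affects "count", matching the Distinct option above.
    None values are skipped (an assumption modeled on Pig's null handling).
    """
    present = [v for v in values if v is not None]
    if op == "count":
        # Distinct: count each unique value once.
        return len(set(present)) if distinct else len(present)
    if not present:
        return None  # nothing to aggregate
    if op == "sum":
        return sum(present)
    if op == "average":
        return mean(present)
    if op == "max":
        return max(present)
    if op == "min":
        return min(present)
    raise ValueError(f"unknown aggregate operation: {op!r}")

prices = [10, 25, 10, None, 5]  # hypothetical field values; None is a missing value
print(aggregate(prices, "sum"))                   # 50
print(aggregate(prices, "average"))               # 12.5
print(aggregate(prices, "count"))                 # 4
print(aggregate(prices, "count", distinct=True))  # 3
```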
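The following sketch illustrates the Limit operation. It takes at most n records and rejects values that are not greater than zero. `limit_records` is a hypothetical helper. Raising `ValueError` for an invalid value is an assumption about how such an entry might be handled.

```python
from itertools import islice
from typing import Iterable, TypeVar

T = TypeVar("T")

def limit_records(records: Iterable[T], n: int) -> list[T]:
    """Return at most `n` records (hypothetical helper); `n` must be greater than zero."""
    if n <= 0:
        raise ValueError("limit must be greater than zero")
    # islice stops after n items, so the rest of the input is never read.
    return list(islice(records, n))

print(limit_records(["r1", "r2", "r3", "r4"], 2))  # ['r1', 'r2']
print(limit_records(["r1"], 5))                    # ['r1']
```

In Pig Latin, `LIMIT` on its own does not guarantee which records are returned. Placing an `ORDER` before it makes the result predictable. This sketch simply keeps the first n records in whatever order it receives them.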