Hadoop Sort / Merge / By-Group Processing
Workarounds
Even if you pre-sort the data in Hadoop and then import it into RRE, there is no guarantee that the splits will contain whole by-groups or that they will be processed in the correct order. Hence, the options narrow to CSV input and:
1) Hive or Pig for sort, merge, and by-group processing.
2) rmr2 or plyrmr for by-group processing in R.
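
To illustrate why pre-sorting alone is not enough, the following is a minimal Python sketch, not RRE or Hadoop code. It assumes a small made-up data set and hypothetical helper names (fixed_size_splits, sum_by_group, shuffle_by_key). It cuts sorted records into fixed-size splits, shows that one by-group ends up divided across two splits, and then shows that routing rows by key, which is roughly what a MapReduce shuffle (as used by Hive, Pig, or rmr2) does, keeps each by-group whole.

```python
from collections import defaultdict

# Hypothetical records: (group key, value), already sorted by key.
records = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("b", 5), ("c", 6)]


def fixed_size_splits(rows, size):
    """Cut rows into consecutive chunks, ignoring group boundaries."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def sum_by_group(rows):
    """Sum the values per key within one chunk of rows."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def shuffle_by_key(rows, n_partitions):
    """Send every row with the same key to the same partition."""
    partitions = [[] for _ in range(n_partitions)]
    for key, value in rows:
        # Deterministic stand-in for a partitioner: byte sum of the key.
        index = sum(key.encode("utf-8")) % n_partitions
        partitions[index].append((key, value))
    return partitions


# Sorted input, but the split boundary falls inside group "b".
per_split = [sum_by_group(s) for s in fixed_size_splits(records, 4)]
print(per_split)  # [{'a': 6, 'b': 4}, {'b': 5, 'c': 6}]

# After routing by key, each group is complete within one partition.
per_partition = [sum_by_group(p) for p in shuffle_by_key(records, 2)]
print(per_partition)  # [{'b': 9}, {'a': 6, 'c': 6}]
```

In the first result, group "b" yields two partial sums (4 and 5) because the split boundary ignores the key. In the second, every key lands in exactly one partition, so each per-group result is complete.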