Suppose an RDD has ten partitions. When you call groupBy on this RDD, you get a new RDD. Is all the data with the same grouping key placed in the same partition?
My tests show that all records with the same grouping key end up in the same partition, and that records with different keys can also share a partition.
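To make that observation concrete, here is a minimal sketch in plain Scala (no Spark dependency) of how a hash partitioner assigns keys to partitions. It assumes groupBy falls back to Spark's default HashPartitioner, which places a key at its hashCode modulo the number of partitions, made non-negative. The names HashPartitionSketch, partitionOf and layout are hypothetical helpers for illustration, not Spark APIs.

```scala
object HashPartitionSketch {
  // Mirrors the idea behind Spark's HashPartitioner (assumption: the default
  // partitioner is in use): null goes to partition 0, every other key goes to
  // a non-negative (hashCode mod numPartitions).
  def partitionOf(key: Any, numPartitions: Int): Int = key match {
    case null => 0
    case _ =>
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
  }

  // Hypothetical helper: group the distinct keys by the partition they map to.
  def layout[K](keys: Seq[K], numPartitions: Int): Map[Int, Seq[K]] =
    keys.distinct.groupBy(k => partitionOf(k, numPartitions))

  def main(args: Array[String]): Unit = {
    // 25 distinct integer keys spread over 10 partitions.
    val keys = 0 until 25
    layout(keys, 10).toSeq.sortBy(_._1).foreach { case (p, ks) =>
      println(s"partition $p -> keys ${ks.mkString(", ")}")
    }
  }
}
```

A given key always maps to one partition, so all of its values meet there after the shuffle. With more distinct keys than partitions, several keys necessarily share a partition: in this sketch, keys 3, 13 and 23 all map to partition 3.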
Follow-up question:
(1) If all records with the same key are in the same partition, then groupByRdd.mapValues, called on the result of groupBy, receives all the values for that key at once. When the amount of data is large,
groupByRdd.mapValues(_.toList.sortBy(...)) will cause an out-of-memory error.
Is this understanding correct?