Types Of RDDs
The first RDD in the lineage, a HadoopRDD, was created by calling sc.textFile(). The last RDD in the lineage is the ShuffledRDD created by reduceByKey.
The scheduler splits the RDD graph into stages based on the transformations. Narrow transformations (those that need no data movement across partitions) are grouped (pipelined) together into a single stage, while a transformation that requires a shuffle starts a new stage. This physical plan has two stages, with everything before the ShuffledRDD in the first stage.
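The stage split described above can be illustrated with a small toy model in plain Python. This is only a sketch of the idea, not Spark's scheduler: the `RDDNode` class, the `wide` flag, and the `split_into_stages` function are hypothetical names introduced for illustration, and the intermediate `flatMap` and `map` steps are an assumed word-count-style job that the text does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RDDNode:
    """One step in a linear lineage (toy model, not a Spark class)."""
    name: str           # RDD type, e.g. "HadoopRDD"
    created_by: str     # the call that produced this RDD
    wide: bool = False  # True if it depends on its parent through a shuffle


def split_into_stages(lineage):
    """Group a linear lineage into stages, starting a new stage at each wide node."""
    stages = []
    current = []
    for node in lineage:
        # A shuffle dependency closes the current stage; the wide node
        # becomes the first RDD of the next stage.
        if node.wide and current:
            stages.append(current)
            current = []
        current.append(node)
    if current:
        stages.append(current)
    return stages


# Assumed lineage: textFile -> flatMap -> map -> reduceByKey
lineage = [
    RDDNode("HadoopRDD", "sc.textFile"),
    RDDNode("MapPartitionsRDD", "flatMap"),
    RDDNode("MapPartitionsRDD", "map"),
    RDDNode("ShuffledRDD", "reduceByKey", wide=True),
]

stages = split_into_stages(lineage)
for i, stage in enumerate(stages):
    print(f"Stage {i}: " + " -> ".join(node.name for node in stage))
```

In this model the three narrow steps end up pipelined in the first stage and the ShuffledRDD starts the second, matching the two-stage plan described above. On a real job, calling `toDebugString()` on the final RDD prints the actual lineage, with indentation marking the shuffle boundary.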