Types Of RDDs

https://mapr.com/blog/getting-started-spark-web-ui/

The first RDD, HadoopRDD, was created by calling sc.textFile(); the last RDD in the lineage is the ShuffledRDD created by reduceByKey.

The scheduler splits the RDD graph into stages based on the transformations. Narrow transformations (those that need no shuffle, i.e. no data movement between partitions) are grouped (pipelined) together into a single stage. This physical plan has two stages, with everything before the ShuffledRDD in the first stage.
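
The stage split described above can be sketched with a small toy model in plain Python. This is not Spark's scheduler, only an illustration of the rule "start a new stage at each shuffle". The function name `split_into_stages` and the two intermediate MapPartitionsRDD entries (from `flatMap` and `map`) are assumptions made for the example; the text only names HadoopRDD and ShuffledRDD.

```python
# Toy model of stage splitting; NOT Spark's actual scheduler.
# Each entry: (RDD type, creating call, needs_shuffle).
# The two middle entries are assumed for illustration; the text above only
# names HadoopRDD (sc.textFile) and ShuffledRDD (reduceByKey).
LINEAGE = [
    ("HadoopRDD", "textFile", False),
    ("MapPartitionsRDD", "flatMap", False),
    ("MapPartitionsRDD", "map", False),
    ("ShuffledRDD", "reduceByKey", True),
]


def split_into_stages(lineage):
    """Group a linear lineage into stages, opening a new stage at each shuffle."""
    stages = [[]]
    for rdd_type, call, needs_shuffle in lineage:
        # A shuffle marks a stage boundary: close the current stage first.
        if needs_shuffle and stages[-1]:
            stages.append([])
        stages[-1].append(f"{rdd_type} ({call})")
    return stages


stages = split_into_stages(LINEAGE)
for i, stage in enumerate(stages):
    print(f"Stage {i}: " + " -> ".join(stage))
```

With this assumed lineage, the three narrow steps land in the first stage and the ShuffledRDD alone in the second, which matches the two-stage plan described above. In a real application, the actual lineage can be inspected with the RDD's toDebugString method.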


