Types Of RDDs
https://mapr.com/blog/getting-started-spark-web-ui/
The first RDD in the lineage, a HadoopRDD, was created by calling sc.textFile(); the last RDD in the lineage is the ShuffledRDD created by reduceByKey().
The scheduler splits the RDD graph into stages based on the transformations. Narrow transformations (those that need no shuffle, so no data moves between partitions) are pipelined together into a single stage, while each wide transformation that requires a shuffle starts a new stage. This physical plan has two stages, with everything before the ShuffledRDD in the first stage.
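The stage split described above can be illustrated with a small toy model in plain Python. This is only a sketch of the idea and not Spark's actual scheduler code: the `Step` class and `split_into_stages` function are hypothetical names made up for this example, and the intermediate `flatMap` and `map` steps are an assumed word-count-style lineage that the text does not spell out. Only the HadoopRDD from sc.textFile() and the ShuffledRDD from reduceByKey come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One RDD in a linear lineage (hypothetical helper, not a Spark class)."""
    rdd: str          # RDD class name as it would appear in the lineage
    created_by: str   # call that produced this RDD
    shuffle: bool     # True if it depends on its parent through a shuffle


def split_into_stages(lineage: List[Step]) -> List[List[str]]:
    """Group a linear lineage into stages, cutting at every shuffle boundary."""
    stages: List[List[str]] = []
    current: List[str] = []
    for step in lineage:
        # A shuffle dependency closes the current stage and opens a new one.
        if step.shuffle and current:
            stages.append(current)
            current = []
        current.append(step.rdd)
    if current:
        stages.append(current)
    return stages


# Assumed word-count-style lineage; the flatMap and map steps are illustrative.
lineage = [
    Step("HadoopRDD", "sc.textFile", shuffle=False),
    Step("MapPartitionsRDD", "sc.textFile", shuffle=False),
    Step("MapPartitionsRDD", "flatMap", shuffle=False),
    Step("MapPartitionsRDD", "map", shuffle=False),
    Step("ShuffledRDD", "reduceByKey", shuffle=True),
]

stages = split_into_stages(lineage)
for index, rdds in enumerate(stages):
    print(f"Stage {index}: {' -> '.join(rdds)}")
```

With this lineage the toy model yields two stages: the narrow steps are grouped together in the first, and the ShuffledRDD sits alone in the second. On a real cluster, calling toDebugString on the final RDD shows the actual lineage, with indentation marking the shuffle boundary.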