Notes

https://stackoverflow.com/questions/38687053/how-to-know-which-piece-of-code-runs-on-driver-or-executor/46427703#46427703

Any Spark application consists of a single Driver process and one or more Executor processes. The Executor processes run on the Worker nodes. The Driver process runs wherever the application is launched: on the submitting machine in client deploy mode, or on a node inside the cluster in cluster deploy mode.

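Transformations run on the executors. Actions are called from the driver: the work they trigger still runs on the executors, but the value they return (for example the result of collect() or count()) is sent back to the driver.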
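
A toy model of this split may help. It is plain Python, not Spark: ToyRDD is a hypothetical class, a thread pool stands in for the executor processes, and the calling code plays the driver. map only records the function (Spark transformations are likewise lazy), and collect is the action that runs the recorded functions inside the pool and gathers the results back on the driver.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRDD:
    """Hypothetical, heavily simplified stand-in for an RDD (not Spark's API)."""

    def __init__(self, partitions, funcs=()):
        self.partitions = partitions  # list of lists, one list per task
        self.funcs = list(funcs)      # transformations recorded by the "driver"

    def map(self, func):
        # Transformation: nothing runs yet, the driver only records the function.
        return ToyRDD(self.partitions, self.funcs + [func])

    def _run_task(self, partition):
        # "Executor" side: apply every recorded function to one partition.
        out = list(partition)
        for f in self.funcs:
            out = [f(x) for x in out]
        return out

    def collect(self):
        # Action: run one task per partition in the pool, then gather the
        # results back into a single list on the "driver".
        with ThreadPoolExecutor(max_workers=len(self.partitions)) as pool:
            results = list(pool.map(self._run_task, self.partitions))
        return [x for part in results for x in part]


rdd = ToyRDD([[1, 2, 3], [4, 5], [6]])
result_rdd = rdd.map(lambda x: x * 2).map(lambda x: x + 1)  # no work done yet
print(result_rdd.collect())  # [3, 5, 7, 9, 11, 13]
```

The lambdas passed to map are the code that would run on the executors; only the list returned by collect lands back on the driver.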

From Spark in Action:

You can access an accumulator’s value only from within the driver. If you try to access it from an executor, an exception will be thrown.
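
The accumulator rule can be modelled the same way. This is again a plain-Python sketch with hypothetical names (ToyAccumulator, TaskSideAccumulator, count_blank_lines), not Spark's API: each task adds to its own local copy, only the partial totals travel back, and the driver merges them, so the full total only ever exists on the driver. Reading value from the task-side copy raises, mirroring the behaviour the book describes.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyAccumulator:
    """Hypothetical driver-side accumulator (a model, not Spark's class)."""

    def __init__(self, initial=0):
        self._value = initial

    @property
    def value(self):
        # Reading is allowed: this object lives on the "driver".
        return self._value

    def merge(self, partial):
        # The driver folds in each finished task's partial total.
        self._value += partial


class TaskSideAccumulator:
    """What a task sees: a fresh local copy that supports add() only."""

    def __init__(self):
        self.partial = 0

    def add(self, amount):
        self.partial += amount

    @property
    def value(self):
        # Mirrors the rule quoted above: no reading from inside a task.
        raise RuntimeError("accumulator value can only be read on the driver")


def count_blank_lines(partitions, acc):
    """Hypothetical job: one task per partition, each counting blank lines."""

    def task(partition):
        local = TaskSideAccumulator()
        for line in partition:
            if not line.strip():
                local.add(1)
        return local.partial  # sent back to the driver with the task result

    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        for partial in pool.map(task, partitions):
            acc.merge(partial)


blank_lines = ToyAccumulator()
count_blank_lines([["a", "", "b"], ["", " "], ["c"]], blank_lines)
print(blank_lines.value)  # 3
```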

How do you read the DAG shown in the Spark UI?

reduceByKey to find averages: the solution uses tuples; work out how they are used.

https://gist.github.com/ytjia/2e42e3ccc0367c39afa7
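
The tuple trick can be sketched in plain Python. This does not reproduce the gist: reduce_by_key is a hypothetical stand-in for RDD.reduceByKey and the sample pairs are made up. Each value becomes a (sum, count) tuple, the tuples are added element-wise per key, and only at the end is the sum divided by the count.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample data: (key, value) pairs, as they would sit in a pair RDD.
pairs = [("a", 3), ("b", 10), ("a", 5), ("b", 20), ("a", 7)]


def reduce_by_key(kv_pairs, func):
    """Plain-Python stand-in for reduceByKey: fold all values of a key with func."""
    grouped = defaultdict(list)
    for key, value in kv_pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


# Step 1 (like mapValues): turn every value into a (sum, count) tuple.
with_counts = [(k, (v, 1)) for k, v in pairs]

# Step 2 (like reduceByKey): add sums and counts element-wise. Spark requires
# this function to be associative and commutative, because it is applied within
# each partition first and then again across partitions.
totals = reduce_by_key(with_counts, lambda a, b: (a[0] + b[0], a[1] + b[1]))

# Step 3 (like mapValues): divide each sum by its count.
averages = {k: s / c for k, (s, c) in totals.items()}
print(averages)  # {'a': 5.0, 'b': 15.0}

# The same three steps on a PySpark pair RDD would be roughly:
#   rdd.mapValues(lambda v: (v, 1)) \
#      .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1])) \
#      .mapValues(lambda t: t[0] / t[1])
```

Carrying the count is what makes this work. Averaging directly inside the reduce function gives the wrong answer, because averages do not combine: for key a, averaging 3 and 5 first and then averaging the result with 7 yields (4 + 7) / 2 = 5.5, while the true mean is 15 / 3 = 5.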
