Jobs

A Spark job is the highest element of Spark’s execution hierarchy. Each Spark job corresponds to one action, and each action is called by the driver program of a Spark application. As we discussed in “Functions on RDDs: Transformations Versus Actions”, one way to conceptualize an action is as something that brings data out of the RDD world of Spark into some other system (usually by bringing data to the driver or writing to stable storage).
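To make the distinction concrete, here is a minimal sketch in Scala (the application name, local master, and sample data are illustrative, not from the text). The transformations only build up the execution graph; the single action at the end is what launches a job:

```scala
import org.apache.spark.sql.SparkSession

object JobExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JobExample")   // illustrative name
      .master("local[*]")      // illustrative local deployment
      .getOrCreate()
    val sc = spark.sparkContext

    // Transformations: lazily extend the execution graph; no job runs yet.
    val numbers = sc.parallelize(1 to 100)
    val squares = numbers.map(n => n * n)
    val evens   = squares.filter(_ % 2 == 0)

    // Action: collect() brings data back to the driver, launching one job.
    val result = evens.collect()
    println(result.mkString(", "))

    spark.stop()
  }
}
```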

The edges of the Spark execution graph are based on dependencies between the partitions in RDD transformations (as illustrated by Figures 2-2 and 2-3). Thus, an operation that returns something other than an RDD cannot have any children. In graph theory, we would say the action forms a “leaf” in the DAG. An arbitrarily large set of transformations may therefore be associated with one execution graph. However, as soon as an action is called, Spark can no longer add to that graph. The application launches a job including those transformations that were needed to evaluate the final RDD on which the action was called.
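The one-job-per-action correspondence also means that calling two actions on the same lineage launches two separate jobs. A minimal sketch, assuming an existing SparkContext `sc` and illustrative data:

```scala
// Transformations only grow the graph; nothing executes yet.
val words = sc.parallelize(Seq("spark", "jobs", "stages", "tasks"))
val upper = words.map(_.toUpperCase)
val long  = upper.filter(_.length > 4)

// Each action launches its own job, re-running the transformations
// needed to evaluate `long` (unless the RDD is cached).
val n      = long.count()   // job 1
val sample = long.collect() // job 2
```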
