# Spark Execution Flow

*High Performance Spark*, 1st Edition

**What happens when we start a `SparkContext`?**


First, the driver program contacts the cluster manager and requests resources for launching executors.

The cluster manager then launches a number of Spark executors (JVM processes) on the worker nodes of the cluster.

One node can have multiple Spark executors, but an executor cannot span multiple nodes.

An RDD is evaluated across the executors in partitions.

Each executor can have multiple partitions, but a partition cannot be spread across multiple executors.
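As a minimal sketch of this setup (run locally, with a made-up app name and partition count), creating a `SparkContext` and a partitioned RDD looks roughly like this:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A local "cluster" with 4 worker threads standing in for executors;
// on a real cluster the master URL would point at the cluster manager
// (e.g., YARN or a standalone master) instead of local[4].
val conf = new SparkConf()
  .setAppName("execution-flow-sketch") // hypothetical app name
  .setMaster("local[4]")
val sc = new SparkContext(conf)

// An RDD split into 8 partitions: each partition is computed by one
// task, and one executor may hold several partitions.
val rdd = sc.parallelize(1 to 1000, numSlices = 8)
println(rdd.getNumPartitions) // 8
```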

In the Spark lazy evaluation paradigm, a Spark application doesn't "do anything" until the driver program calls an action. With each action, the Spark scheduler builds an execution graph and launches a *Spark job*. Each job consists of *stages*, which are steps in the transformation of the data needed to materialize the final RDD. Each stage consists of a collection of *tasks* that represent each parallel computation and are performed on the executors.
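A small sketch (reusing the `sc` from above, with throwaway data) makes the laziness visible: the transformations below only record lineage, and nothing runs on the executors until the action at the end.

```scala
// Transformations build up the lineage of the final RDD; no job yet.
val words  = sc.parallelize(Seq("spark", "lazy", "evaluation"))
val upper  = words.map(_.toUpperCase)    // no job launched
val longer = upper.filter(_.length > 4)  // still no job

// The action forces evaluation: the scheduler builds the execution
// graph and launches a Spark job on the executors.
val n = longer.count() // job runs here
```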

Figure 2-5 in the book shows a tree of the different components of a Spark application and how these correspond to the API calls. An application corresponds to starting a `SparkContext`/`SparkSession`. Each *application* may contain many jobs, each corresponding to one RDD action. Each *job* may contain several stages, one for each wide transformation. Each *stage* is composed of one or more tasks, each representing a parallelizable unit of computation; there is one *task* for each partition in the resulting RDD of that stage.
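To make the hierarchy concrete, here is a sketch of a word count (the input path is hypothetical): the one action means one job, and the wide transformation `reduceByKey` introduces a shuffle, splitting that job into two stages.

```scala
val counts = sc
  .textFile("hdfs:///tmp/input.txt", 4)  // hypothetical path; >= 4 partitions
  .flatMap(_.split("\\s+"))              // narrow: stays in stage 1
  .map(word => (word, 1))                // narrow: stays in stage 1
  .reduceByKey(_ + _)                    // wide: shuffle => stage boundary
  .collect()                             // action: launches the one job
```

Each of the two stages then runs one task per partition of its input.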


For a guided tour of how jobs, stages, and tasks appear in the Spark web UI, see: <https://mapr.com/blog/getting-started-spark-web-ui/>
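As a quick pointer (reusing the `sc` from the sketches above), the running application's UI address can be read off the context; by default it serves on port 4040 of the driver.

```scala
// Prints the web UI URL of the running application, if the UI is enabled.
sc.uiWebUrl.foreach(url => println(s"Spark UI at $url"))
```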


