Spark Deployment Modes


See chapters 10 and 11 of the Spark in Action book for a more detailed discussion.

Spark supports four cluster deployment modes (local, standalone, YARN, and Mesos), each with its own characteristics with respect to where Spark's components run within a Spark cluster. Of the four, local mode, which runs on a single host, is by far the simplest.

As a beginner or intermediate developer, you don't need to know the full deployment-mode matrix right away; it's here for reference, and the links provide additional information. Furthermore, Step 5 of the guide referenced below is a deep dive into all aspects of Spark architecture.

Standalone Mode

  • A simple cluster manager included with Spark that makes it easy to set up a cluster.

  • In addition to running on the Mesos or YARN cluster managers, Spark also provides a simple standalone deploy mode. You can launch a standalone cluster either manually, by starting a master and workers by hand, or by using the provided launch scripts (sketched below). It is also possible to run these daemons on a single machine for testing.
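
As a sketch of the launch-script route: assuming the worker hosts are listed in conf/slaves (renamed conf/workers in newer Spark releases), the whole cluster can be brought up and torn down from the master node with:

./sbin/start-all.sh
./sbin/stop-all.sh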

Starting a Cluster Manually

You can start a standalone master server by executing:

./sbin/start-master.sh

Once started, the master will print out a spark://HOST:PORT URL for itself, which you can use to connect workers to it, or pass as the “master” argument to SparkContext. You can also find this URL on the master’s web UI, which is http://localhost:8080 by default.
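
For example, here is a minimal Scala sketch that connects an application to the standalone master. The app name and master URL are placeholders, not values from these notes; use the spark://HOST:PORT URL your master actually printed:

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder app name and master URL; substitute your own.
val conf = new SparkConf()
  .setAppName("standalone-example")
  .setMaster("spark://master-host:7077") // URL printed by start-master.sh
val sc = new SparkContext(conf)

// Quick sanity check that work is actually running on the cluster.
println(sc.parallelize(1 to 100).sum())
sc.stop()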

Similarly, you can start one or more workers and connect them to the master via:

./sbin/start-slave.sh <master-spark-URL>

Once you have started a worker, look at the master’s web UI (http://localhost:8080 by default). You should see the new node listed there, along with its number of CPUs and memory (minus one gigabyte left for the OS).

Finally, configuration options such as the bind host and port, the web UI port, and (for workers) the number of cores and amount of memory can be passed to the master and worker scripts.
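
An illustrative sketch of passing such options; master-host and the resource values here are placeholders, and the worker is assumed to connect to a master already running at spark://master-host:7077:

./sbin/start-master.sh --host master-host --port 7077 --webui-port 8080
./sbin/start-slave.sh spark://master-host:7077 --cores 4 --memory 8g --webui-port 8081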

From: http://www.kdnuggets.com/2016/09/7-steps-mastering-apache-spark.html