Spark Deployment Modes


See chapters 10 and 11 of the Spark in Action book for a more detailed discussion.

Spark supports four cluster deployment modes (local, standalone, YARN, and Mesos), each with its own characteristics with respect to where Spark's components run within a Spark cluster. Of the four, local mode, which runs on a single host, is by far the simplest.

As a beginner or intermediate developer, you don't need to know the full deployment-mode matrix right away; it's here for reference, and the links provide additional information. Furthermore, Step 5 of the guide referenced below is a deep dive into all aspects of Spark architecture.

Standalone Mode

  • A simple cluster manager included with Spark that makes it easy to set up a cluster.

  • In addition to running on the Mesos or YARN cluster managers, Spark also provides a simple standalone deploy mode. You can launch a standalone cluster either manually, by starting a master and workers by hand, or by using the provided launch scripts (sketched below). It is also possible to run these daemons on a single machine for testing.
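
As a sketch of the launch-script route: assuming the worker hosts are listed in conf/slaves (renamed conf/workers in newer Spark releases), the whole cluster can be brought up and torn down from the master node with:

./sbin/start-all.sh
./sbin/stop-all.sh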

Starting a Cluster Manually

You can start a standalone master server by executing:

./sbin/start-master.sh

Once started, the master will print out a spark://HOST:PORT URL for itself, which you can use to connect workers to it, or pass as the “master” argument to SparkContext. You can also find this URL on the master’s web UI, which is http://localhost:8080 by default.
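
For example, here is a minimal Scala sketch that connects an application to the standalone master. The app name and master URL are placeholders, not values from these notes; use the spark://HOST:PORT URL your master actually printed:

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder app name and master URL; substitute your own.
val conf = new SparkConf()
  .setAppName("standalone-example")
  .setMaster("spark://master-host:7077") // URL printed by start-master.sh
val sc = new SparkContext(conf)

// Quick sanity check that work is actually running on the cluster.
println(sc.parallelize(1 to 100).sum())
sc.stop()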

Similarly, you can start one or more workers and connect them to the master via:

./sbin/start-slave.sh <master-spark-URL>

Once you have started a worker, look at the master’s web UI (http://localhost:8080 by default). You should see the new node listed there, along with its number of CPUs and memory (minus one gigabyte left for the OS).

Finally, configuration options such as the bind host and port, the web UI port, and (for workers) the number of cores and amount of memory can be passed to the master and worker scripts.
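
An illustrative sketch of passing such options; master-host and the resource values here are placeholders, and the worker is assumed to connect to a master already running at spark://master-host:7077:

./sbin/start-master.sh --host master-host --port 7077 --webui-port 8080
./sbin/start-slave.sh spark://master-host:7077 --cores 4 --memory 8g --webui-port 8081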

From: http://www.kdnuggets.com/2016/09/7-steps-mastering-apache-spark.html