spark notes
  • Introduction
  • Databricks
  • Concepts
  • Spark Execution Flow
    • SparkContext and SparkSession
  • Resilient Distributed Dataset (RDD)
    • Caching
    • Pair RDDs
    • Transformations
      • Depedency Resolution
    • Actions
    • Persistence
    • RDD lineage
    • Types of RDDs
    • Loading Data into RDDs
    • Data Locality with RDDs
    • How Many Partitions Does An RDD Have
  • Spark job submission breakdown
  • Why Cluster Manager
  • SparkContext and its components
  • Spark Architecture
    • Stages
    • Tasks
    • Executors
    • RDD
    • DAG
    • Jobs
    • Partitions
  • Spark Deployment Modes
  • Running Modes
  • Spark Execution Flow
  • DataFrames, Datasets,RDDs
  • SparkSQL
    • Architecture
    • Spark Session
  • Where Does Map Reduce Does not Fit
  • Actions
    • reduceByKey
    • count
    • collect, take, top, and first Actions
    • take
    • top
    • first
    • The reduce and fold Actions
  • DataSets
  • Spark Application Garbage Collector
  • How Mapreduce works in spark
  • Notes
  • Scala
  • Spark 2.0
  • Types Of RDDs
    • MapPartitionsRDD
  • Spark UI
  • Optimization
    • Tungsten
  • Spark Streaming
    • Notes
    • Flow
  • FlatMap - Different Variations
  • Examples
  • Testing Spark
  • Passing functions to Spark
  • CONFIGURATION, MONITORING, AND TUNING
  • References
Powered by GitBook
On this page

Was this helpful?

Notes

PreviousHow Mapreduce works in sparkNextScala

Last updated 5 years ago

Was this helpful?

Any Spark application consists of a single Driver process and one or more Executor processes. The Driver process will run on the Master node of your cluster and the Executor processes run on the Worker nodes.

Transformations run on executors & actions runs on driver because it needs to return value.

From Spark In action

You can access an accumulator’s value only from within the driver. If you try to access it from an executor, an exception will be thrown.

How to understand the DAG shown in Spark UI ?

ReduceByKey to find averages - the solution uses tuples, understand how it is used ..

https://gist.github.com/ytjia/2e42e3ccc0367c39afa7
https://stackoverflow.com/questions/38687053/how-to-know-which-piece-of-code-runs-on-driver-or-executor/46427703#46427703