spark notes

FlatMap - Different Variations

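// rawDstream: a DStream[String] of text lines (defined elsewhere)
val wordCounts = rawDstream
  .flatMap(line => line.split(" "))          // split each line into words
  .map(word => (word, 1))                    // pair each word with a count of 1
  .reduceByKey(_ + _)                        // sum the counts per word within each batch
  .map(item => item.swap)                    // (word, count) -> (count, word) so the count is the key
  .transform(rdd => rdd.sortByKey(false))    // sort each batch's RDD by count, descending

// foreachRDD is an output operation that returns Unit, so apply it to the stream separately
wordCounts.foreachRDD { rdd =>
  rdd.take(10).foreach(x => println("List : " + x))  // print the 10 most frequent words of each batch
}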
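To see what each step of the streaming pipeline does without a cluster, here is a minimal sketch using ordinary Scala collections on one made-up "batch" of lines, assuming Scala 2.13+ (for groupMapReduce). This is a plain-Scala analogue, not Spark: groupMapReduce stands in for reduceByKey(_ + _), and sortBy with a reversed Ordering stands in for sortByKey(false).

```scala
// Plain-Scala analogue of the DStream word count above (no Spark involved).
// The sample lines are made up for illustration.
val lines = List("the quick brown fox", "the lazy dog", "the quick dog")

val top: List[(Int, String)] = lines
  .flatMap(line => line.split(" "))        // one element per word
  .map(word => (word, 1))                  // (word, 1) pairs
  .groupMapReduce(_._1)(_._2)(_ + _)       // like reduceByKey(_ + _): Map(word -> total)
  .toList
  .map(item => item.swap)                  // (count, word)
  .sortBy(_._1)(Ordering[Int].reverse)     // like sortByKey(false): highest count first
  .take(10)

top.foreach(x => println("List : " + x))
```

The swap before sorting is the same trick as in the streaming code: sortByKey can only sort on the key, so the count has to be moved into the key position first.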

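// words: a collection of text lines (e.g. an RDD[String]); "\\W+" splits on runs of non-word characters
val wordsFlatMap = words.flatMap(_.split("\\W+"))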
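A minimal sketch of a few flatMap variations on plain Scala collections, assuming Scala 2.13+ (for toIntOption and Option being IterableOnce); the input strings are made up. RDD.flatMap and DStream.flatMap follow the same rule per element: each input may produce zero or more outputs, and all of them are flattened into one collection. Note the doubled backslash in "\\W+", which a normal Scala string literal requires.

```scala
val sentences = List("Hello, world!", "Spark  notes")

// map: exactly one output per input, so the result is a List of Arrays
val nested: List[Array[String]] = sentences.map(_.split("\\W+"))

// flatMap: the Arrays are flattened into a single List of words
val words: List[String] = sentences.flatMap(_.split("\\W+"))

// flatMap with an Option-returning function keeps the Somes and drops the Nones
val numbers: List[Int] = List("1", "two", "3").flatMap(_.toIntOption)

// flatMap with an identity-like function just removes one level of nesting (same as .flatten)
val flattened: List[Int] = List(List(1, 2), Nil, List(3)).flatMap(inner => inner)
```

The Option variation is a common way to parse records and silently skip the ones that fail, without a separate filter step.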

val evens = x.filter(i => i % 2 == 0)

can be replaced with

val evens = x.filter(_ % 2 == 0)

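Scala lets you use the _ placeholder instead of a named parameter when each parameter appears only once in the function body. Each _ stands for a new parameter, in order, so _+_ in the reduceByKey call above means (a, b) => a + b.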
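A short sketch of the placeholder rules on a plain List, assuming x is a collection of Ints (the notes do not define x, so the values here are made up):

```scala
val x = List(1, 2, 3, 4, 5, 6)

// Explicit parameter name
val evens1 = x.filter(i => i % 2 == 0)
// Placeholder syntax: _ stands for the single parameter
val evens2 = x.filter(_ % 2 == 0)

// Two underscores are two different parameters, in order: (a, b) => a + b
val sum = x.reduce(_ + _)

// A parameter used twice needs a name; _ * _ would mean (a, b) => a * b instead
val squares = x.map(i => i * i)
```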


Last updated 5 years ago
