spark notes
  • Introduction
  • Databricks
  • Concepts
  • Spark Execution Flow
    • SparkContext and SparkSession
  • Resilient Distributed Dataset (RDD)
    • Caching
    • Pair RDDs
    • Transformations
      • Depedency Resolution
    • Actions
    • Persistence
    • RDD lineage
    • Types of RDDs
    • Loading Data into RDDs
    • Data Locality with RDDs
    • How Many Partitions Does An RDD Have
  • Spark job submission breakdown
  • Why Cluster Manager
  • SparkContext and its components
  • Spark Architecture
    • Stages
    • Tasks
    • Executors
    • RDD
    • DAG
    • Jobs
    • Partitions
  • Spark Deployment Modes
  • Running Modes
  • Spark Execution Flow
  • DataFrames, Datasets,RDDs
  • SparkSQL
    • Architecture
    • Spark Session
  • Where Does Map Reduce Does not Fit
  • Actions
    • reduceByKey
    • count
    • collect, take, top, and first Actions
    • take
    • top
    • first
    • The reduce and fold Actions
  • DataSets
  • Spark Application Garbage Collector
  • How Mapreduce works in spark
  • Notes
  • Scala
  • Spark 2.0
  • Types Of RDDs
    • MapPartitionsRDD
  • Spark UI
  • Optimization
    • Tungsten
  • Spark Streaming
    • Notes
    • Flow
  • FlatMap - Different Variations
  • Examples
  • Testing Spark
  • Passing functions to Spark
  • CONFIGURATION, MONITORING, AND TUNING
  • References
Powered by GitBook
On this page

Was this helpful?

  1. Actions

collect, take, top, and first Actions

Whereas the count action returns a count of data elements in the RDD, the collect, take, top, and first actions return data from the RDD.

collect()

Syntax:

RDD.collect()

The collect action returns a list that contains all of the elements in an RDD to the Spark driver. Because collect does not restrict the output, which can be quite large and can potentially cause out-of-memory errors on the driver, it is typically useful only for small RDDs or development.

PreviouscountNexttake

Last updated 5 years ago

Was this helpful?