SparkSQL

Spark Session

Just as the SparkContext is the entry point for all Spark applications and the StreamingContext is the entry point for Spark Streaming applications, the SparkSession serves as the entry point for Spark SQL.

If you are using the Spark shell, a SparkSession is preconfigured for you and available as the variable spark, alongside the SparkContext available as sc. In your own programs you have to construct it yourself:

import org.apache.spark.sql.SparkSession

// getOrCreate() returns the existing SparkSession or builds a new one
val spark = SparkSession.builder().getOrCreate()

SparkSession is a wrapper around SparkContext and SQLContext; the SQLContext was used directly for constructing DataFrames in versions prior to Spark 2.0. The Builder object lets you specify the master, the application name, and other configuration options, but the defaults will do.
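As a slightly fuller sketch (the application name, master URL, and configuration key below are only illustrative placeholders), the Builder can be configured fluently before calling getOrCreate(), and the resulting SparkSession can then be used to create DataFrames and run SQL:

import org.apache.spark.sql.SparkSession

// Illustrative values only: the appName, master, and config key are placeholders
val spark = SparkSession.builder()
  .appName("spark-notes-example")
  .master("local[*]")                           // in a cluster, usually set by spark-submit instead
  .config("spark.sql.shuffle.partitions", "8")  // any Spark SQL option can be set here
  .getOrCreate()

// SparkSession is the Spark SQL entry point: DataFrames, temp views, SQL queries
val ids = spark.range(0, 10).toDF("id")
ids.createOrReplaceTempView("ids")
spark.sql("SELECT count(*) AS n FROM ids").show()

// The wrapped SparkContext is still accessible when lower-level RDD APIs are needed
val sc = spark.sparkContext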
