SparkContext and SparkSession

SparkContext

SparkContext is the first object that a Spark program must create to access the cluster. In spark-shell, it is available directly as sc, and since Spark 2.x it can also be reached via spark.sparkContext.

Here's how you can programmatically create a SparkContext in your Scala code:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

val conf = new SparkConf().setAppName("my app").setMaster("master url")
val sc = new SparkContext(conf)
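
Once the SparkContext exists, it is used to build RDDs and run jobs against them. Here is a minimal sketch; the local[*] master and the sample data are illustrative assumptions, not part of the original example:

import org.apache.spark.{SparkConf, SparkContext}

object SparkContextExample {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark locally with all cores; replace with your cluster's master URL.
    val conf = new SparkConf().setAppName("my app").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Build an RDD from a local collection and run a simple action on it.
    val numbers = sc.parallelize(1 to 100)
    println(s"sum = ${numbers.sum()}")

    sc.stop()
  }
}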

SparkSession - Starting from Spark 2.x

SparkContext, though still supported, was most relevant when working with RDDs. As you will see, different libraries have their own wrappers around SparkContext, for example, HiveContext/SQLContext for Spark SQL, StreamingContext for Streaming, and so on.

As all of these libraries are moving toward Dataset/DataFrame, it makes sense to have a unified entry point for them as well, and that is SparkSession. SparkSession is available as spark in the spark-shell.

Here's how you create a SparkSession:

import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder.master("master url").appName("my app").getOrCreate()
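
Once created, the SparkSession is the entry point for the DataFrame and Dataset APIs, and the underlying SparkContext remains reachable from it. Here is a minimal sketch; the local[*] master, column names, and sample rows are illustrative assumptions:

import org.apache.spark.sql.SparkSession

object SparkSessionExample {
  def main(args: Array[String]): Unit = {
    // "local[*]" is used here for local testing; use your cluster's master URL instead.
    val spark = SparkSession.builder.master("local[*]").appName("my app").getOrCreate()
    import spark.implicits._

    // Build a small DataFrame from a local collection and query it.
    val people = Seq(("Alice", 29), ("Bob", 35)).toDF("name", "age")
    people.filter($"age" > 30).show()

    // The old entry point is still available when an RDD is needed.
    val sc = spark.sparkContext
    println(sc.parallelize(1 to 10).count())

    spark.stop()
  }
}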
