SparkContext and SparkSession
SparkContext
SparkContext is the first object that a Spark program must create in order to access the cluster. In the spark-shell it is already created for you and is available as sc, or via spark.sparkContext from Spark 2.x onwards.
Here's how you can programmatically create a SparkContext in your Scala code:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

// Build the configuration, then create the context from it
val conf = new SparkConf().setAppName("my app").setMaster("master url")
val sc = new SparkContext(conf)
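Once the context exists, it is the handle for creating and operating on RDDs. A minimal sketch of such usage, assuming a local[*] master and some toy data (neither is from the original text):

import org.apache.spark.{SparkConf, SparkContext}

// Assumed local master and sample data, for illustration only
val sc = new SparkContext(new SparkConf().setAppName("rdd demo").setMaster("local[*]"))
// Distribute a local collection as an RDD and run a simple transformation + action
val numbers = sc.parallelize(1 to 100)
println(numbers.filter(_ % 2 == 0).count())  // 50
sc.stop()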
SparkSession - Starting from Spark 2.x
SparkContext, though still supported, is mainly relevant when working with RDDs directly. As you will see, the different libraries historically added their own wrappers around SparkContext, for example HiveContext/SQLContext for Spark SQL, StreamingContext for Spark Streaming, and so on.
As all of these libraries are moving toward Dataset/DataFrame, it makes sense to have a unified entry point for them as well, and that is SparkSession. SparkSession is available as spark in the spark-shell.
Here's how you create one programmatically:
import org.apache.spark.sql.SparkSession

// The builder creates a new session or returns the existing one
val sparkSession = SparkSession.builder.master("master url").appName("my app").getOrCreate()
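As a quick sketch of the unified entry point in use, assuming a local[*] master and some toy data (both are illustrative, not from the original):

import org.apache.spark.sql.SparkSession

// Assumed local master and sample rows, for illustration only
val spark = SparkSession.builder.master("local[*]").appName("my app").getOrCreate()
import spark.implicits._

// DataFrame/Dataset work goes through the session directly
val people = Seq(("alice", 1), ("bob", 2)).toDF("name", "id")
people.show()

// The underlying SparkContext is still reachable when an RDD is needed
val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3))
println(rdd.sum())  // 6.0

spark.stop()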