The reduce and fold Actions

The reduce and fold actions are aggregate actions. Each executes a commutative and associative operation, such as summing a list of values, against an RDD.

Commutative and associative are the operative terms here. An operation with both properties produces the same result regardless of the order in which its inputs are combined. This is integral to distributed processing, where the order in which partitions are processed and their results merged cannot be guaranteed.

The commutative and associative characteristics are summarized here in general form:

commutative ⇒ x + y = y + x
associative ⇒ (x + y) + z = x + (y + z)
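To make the two properties concrete, the following plain-Python sketch checks them by brute force over a small set of sample values. It does not use Spark. The helper names is_commutative and is_associative and the sample values are assumptions chosen for illustration, and passing the check on a few values is evidence rather than proof.

```python
from itertools import product
import operator


def is_commutative(op, values):
    """Return True if op(x, y) == op(y, x) for every pair in values."""
    return all(op(x, y) == op(y, x) for x, y in product(values, repeat=2))


def is_associative(op, values):
    """Return True if op(op(x, y), z) == op(x, op(y, z)) for every triple."""
    return all(
        op(op(x, y), z) == op(x, op(y, z))
        for x, y, z in product(values, repeat=3)
    )


# Arbitrary sample values, chosen only for illustration.
sample = [1, 2, 3, 4]

add_ok = is_commutative(operator.add, sample) and is_associative(operator.add, sample)
sub_ok = is_commutative(operator.sub, sample) and is_associative(operator.sub, sample)

print(add_ok)  # True: addition satisfies both properties on the sample
print(sub_ok)  # False: subtraction fails, e.g. 1 - 2 != 2 - 1
```

Addition passes both checks, which is why it is safe to use in an aggregate action. Subtraction fails both, so its result would depend on how the data happened to be ordered and grouped.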

Take a look at the main Spark actions that perform aggregations.

reduce()

Syntax:

RDD.reduce(<function>)

The reduce action reduces the elements of an RDD to a single value using a specified commutative and associative operator. The supplied function takes two inputs (lambda x, y: ...) that represent values in a sequence from the specified RDD, and returns one value of the same type.

numbers = sc.parallelize([1,2,3,4,5,6,7,8,9])
numbers.reduce(lambda x, y: x + y)
# 45
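The sketch below imitates, in plain Python, the general idea of reducing each partition separately and then combining the partial results. It is an illustration under stated assumptions, not Spark's actual implementation: split_into_partitions and partitioned_reduce are hypothetical helpers, and the contiguous chunking is a simplification of how an RDD is partitioned.

```python
from functools import reduce


def split_into_partitions(values, num_partitions):
    """Split values into num_partitions contiguous chunks (hypothetical helper)."""
    size, extra = divmod(len(values), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(values[start:end])
        start = end
    return partitions


def partitioned_reduce(func, values, num_partitions):
    """Reduce each non-empty partition, then reduce the partial results."""
    partials = [
        reduce(func, part)
        for part in split_into_partitions(values, num_partitions)
        if part
    ]
    return reduce(func, partials)


numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Addition: the result is the same for 1, 2, or 3 partitions.
sums = [partitioned_reduce(lambda x, y: x + y, numbers, n) for n in (1, 2, 3)]

# Subtraction: the result changes with the partitioning.
diffs = [partitioned_reduce(lambda x, y: x - y, numbers, n) for n in (1, 2, 3)]

print(sums)   # [45, 45, 45]
print(diffs)  # [-43, 5, 13]
```

With addition the answer is 45 however the data is split. With subtraction, which is neither commutative nor associative, the answer depends on the number of partitions, which is why such a function should not be passed to reduce.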

Note that, despite the similar name, reduceByKey is a transformation that returns an RDD, whereas reduce is an action that returns a value.
