take

The take action returns the first n elements of an RDD to the driver. The elements are not sorted or otherwise guaranteed to be in any particular order; in fact, the result of a take action is non-deterministic, meaning it can differ when the same action is run again (particularly in a fully distributed environment).

A similar Spark function, takeOrdered, returns the first n elements after ordering them, either in ascending order or by a key supplied through an optional key function.

For RDDs that span more than one partition, take first scans a single partition and uses the number of elements it yields to estimate how many additional partitions must be scanned to satisfy the number requested.
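The partition-scanning behavior described above can be sketched in plain Python. This is a simplified model under stated assumptions, not Spark's actual implementation: partitions are represented as ordinary lists, and the names take_sketch and scale_up_factor, as well as the 1.5 overestimate multiplier, are hypothetical choices made for illustration.

```python
def take_sketch(partitions, n, scale_up_factor=4):
    """Simplified model of an incremental take(n) over a list of partitions.

    Returns the elements taken and the number of partitions scanned per pass.
    All names and constants here are illustrative assumptions.
    """
    taken = []
    scanned = 0   # partitions scanned so far
    passes = []   # partitions scanned in each pass
    total = len(partitions)

    while len(taken) < n and scanned < total:
        if scanned == 0:
            # First pass: look at a single partition only.
            to_scan = 1
        elif not taken:
            # Nothing found yet: grow the scan by a fixed factor.
            to_scan = scanned * scale_up_factor
        else:
            # Estimate how many more partitions are needed from the yield
            # so far, overestimating by 50% and capping the growth.
            estimate = int(1.5 * n * scanned / len(taken)) - scanned
            to_scan = max(1, min(estimate, scanned * scale_up_factor))

        batch = partitions[scanned:scanned + to_scan]
        for part in batch:
            # Only take as many elements as are still missing.
            taken.extend(part[: n - len(taken)])
        scanned += len(batch)
        passes.append(len(batch))

    return taken, passes


partitions = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
result, passes = take_sketch(partitions, 5)
print(result, passes)
```

In this model, a request for 5 elements scans one partition first, then estimates that two more are needed, so the last two partitions are never read.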

RDD.take(n)
