RDD Basic Transformations Operations

9 October 2017

Spark CheatSheet Utils

Just a simple, basic cheat sheet on Spark RDDs.

Intro

Basic Transformations on an RDD containing {1,2,3,3}

Function Name	Example	Result
map()	rdd.map(x => x +1)	{2,3,4,4}
flatMap()	rdd.flatMap(x => x.to(3))	{1,2,3,2,3,3,3}
filter()	rdd.filter(x => x != 1 )	{2,3,3}
distinct()	rdd.distinct()	{1,2,3}
sample()	rdd.sample(false,0.5)	Nondeterministic
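
You can check the transformations in this table with a short script. This is a minimal sketch. It assumes you paste it into spark-shell or run it as a Scala 2 script with spark-core on the classpath. `SparkContext.getOrCreate` reuses the shell's existing `sc` if there is one. The app name `rdd-cheatsheet` and the variable names are illustrative only. Transformations are lazy, so each line ends in `collect()` to actually run the job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local-mode context; in spark-shell, getOrCreate returns the already running `sc`.
val conf = new SparkConf().setAppName("rdd-cheatsheet").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

// The example RDD from the table: {1,2,3,3}
val rdd = sc.parallelize(Seq(1, 2, 3, 3))

// map/flatMap/filter keep partition order, so collect() returns elements in input order.
val mapped = rdd.map(x => x + 1).collect()          // Array(2, 3, 4, 4)
val flat = rdd.flatMap(x => x.to(3)).collect()      // Array(1, 2, 3, 2, 3, 3, 3)
val filtered = rdd.filter(x => x != 1).collect()    // Array(2, 3, 3)

// distinct() shuffles, so output order is not guaranteed; sort before reading it.
val unique = rdd.distinct().collect().sorted        // Array(1, 2, 3)

// sample() is random; a fixed seed makes reruns repeatable, but the exact
// subset returned is not something to rely on.
val sampled = rdd.sample(withReplacement = false, fraction = 0.5, seed = 42L).collect()
```

`distinct()` moves data between partitions, so sorting after `collect()` keeps the comparison with the table stable.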

Basic two-RDD transformations on RDDs containing {1,2,3} (rdd) and {3,4,5} (other)

Function Name	Example	Result
union()	rdd.union(other)	{1,2,3,3,4,5}
intersection()	rdd.intersection(other)	{3}
subtract()	rdd.subtract(other)	{1,2}
cartesian()	rdd.cartesian(other)	{(1,3),(1,4)…(3,5)}
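
Here is a similar sketch for the two-RDD transformations, under the same assumptions: spark-shell or a Scala 2 script with spark-core available, and illustrative names. `rdd` holds {1,2,3} and `other` holds {3,4,5}.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("rdd-cheatsheet").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

val rdd = sc.parallelize(Seq(1, 2, 3))
val other = sc.parallelize(Seq(3, 4, 5))

// union() does not shuffle: the partitions of `rdd` come first, then those of `other`.
val unioned = rdd.union(other).collect()               // Array(1, 2, 3, 3, 4, 5)

// intersection() and subtract() shuffle, so sort the collected result before reading it.
val common = rdd.intersection(other).collect().sorted  // Array(3)
val diff = rdd.subtract(other).collect().sorted        // Array(1, 2)

// cartesian() pairs every element of `rdd` with every element of `other`: 3 x 3 = 9 pairs.
val pairs = rdd.cartesian(other).collect().sorted      // 9 pairs, from (1,3) to (3,5)
```

`union()` keeps duplicates, so 3 appears twice. `intersection()` returns each common element only once.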

Basic Actions on an RDD containing {1,2,3,3}

Function Name	Example	Result
collect()	rdd.collect()	{1,2,3,3}
count()	rdd.count()	4
countByValue()	rdd.countByValue()	{(1,1),(2,1),(3,2)}
take()	rdd.take(2)	{1,2}
top()	rdd.top(2)	{3,3}
takeOrdered()	rdd.takeOrdered(2)(Ordering[Int].reverse)	{3,3}
takeSample()	rdd.takeSample(false,1)	Nondeterministic
reduce()	rdd.reduce((x,y) => x + y )	9
fold()	rdd.fold(0)((x,y) => x + y)	9
foreach()	rdd.foreach(func)	No result (returns Unit)
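
You can exercise the actions the same way. This is again a minimal sketch that assumes spark-shell or a Scala 2 script with spark-core on the classpath, and the variable names are illustrative. Unlike transformations, each action runs a job immediately and returns a value to the driver. The one exception is `foreach()`, which returns nothing.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("rdd-cheatsheet").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

val rdd = sc.parallelize(Seq(1, 2, 3, 3))

val all = rdd.collect()                     // Array(1, 2, 3, 3)
val n = rdd.count()                         // 4 (a Long)
val counts = rdd.countByValue()             // counts per value: 1 -> 1, 2 -> 1, 3 -> 2 (Long values)
val firstTwo = rdd.take(2)                  // Array(1, 2), scanning partitions in order
val largest = rdd.top(2)                    // Array(3, 3), largest first

// takeOrdered() sorts ascending by default; a reversed Ordering matches top().
val smallest = rdd.takeOrdered(2)                            // Array(1, 2)
val largestToo = rdd.takeOrdered(2)(Ordering[Int].reverse)   // Array(3, 3)

val one = rdd.takeSample(withReplacement = false, num = 1)   // one random element

val sum = rdd.reduce((x, y) => x + y)       // 9
// fold() applies the zero value once per partition and again when merging,
// so it must be neutral for the operation (0 for +).
val sumFold = rdd.fold(0)((x, y) => x + y)  // 9

// foreach() returns Unit; the function runs on the executors, so on a cluster
// this println output lands in executor logs rather than the driver console.
rdd.foreach(x => println(x))
```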