rramos.github.io

09 Oct, 2017

Basic RDD Transformations and Actions

Intro

A simple cheat sheet of basic Spark RDD operations.

  • Basic transformations on an RDD containing: {1,2,3,3}
Function Name   Example                        Result
map()           rdd.map(x => x + 1)            {2,3,4,4}
flatMap()       rdd.flatMap(x => x.to(3))      {1,2,3,2,3,3,3}
filter()        rdd.filter(x => x != 1)        {2,3,3}
distinct()      rdd.distinct()                 {1,2,3}
sample()        rdd.sample(false, 0.5)         Nondeterministic
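To try the single-RDD transformations from the table, here is a minimal sketch. It assumes Spark is on the classpath and that the snippet runs as a script or is pasted into spark-shell; the app name, the `local[2]` master, and the seed are illustrative choices, not part of the original cheat sheet.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for experimenting (assumed setup); inside spark-shell
// getOrCreate simply returns the context that already exists.
val conf = new SparkConf().setAppName("rdd-cheatsheet").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

val rdd = sc.parallelize(Seq(1, 2, 3, 3))

// Transformations are lazy; collect() is the action that triggers them.
val mapped   = rdd.map(x => x + 1).collect()        // Array(2, 3, 4, 4)
val flat     = rdd.flatMap(x => x.to(3)).collect()  // Array(1, 2, 3, 2, 3, 3, 3)
val filtered = rdd.filter(x => x != 1).collect()    // Array(2, 3, 3)

// distinct() shuffles data, so the result is sorted here for a stable order.
val unique = rdd.distinct().collect().sorted        // Array(1, 2, 3)

// withReplacement = false, fraction = 0.5; the seed only makes reruns repeatable.
val sampled = rdd.sample(withReplacement = false, fraction = 0.5, seed = 42L).collect()
```

Note that `fraction` is the probability of keeping each element, not an exact size, so `sampled` can hold anywhere from zero to four elements.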
  • Basic two-RDD transformations on RDDs: {1,2,3} and {3,4,5}
Function Name    Example                    Result
union()          rdd.union(other)           {1,2,3,3,4,5}
intersection()   rdd.intersection(other)    {3}
subtract()       rdd.subtract(other)        {1,2}
cartesian()      rdd.cartesian(other)       {(1,3),(1,4)…(3,5)}
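The two-RDD transformations can be checked the same way. This is a sketch under the same assumptions (Spark on the classpath, run as a script or in spark-shell); the context setup and the value names are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumed local setup; reuses an existing context if one is already running.
val conf = new SparkConf().setAppName("rdd-cheatsheet").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

val rdd   = sc.parallelize(Seq(1, 2, 3))
val other = sc.parallelize(Seq(3, 4, 5))

// union() keeps duplicates: 3 appears once per input RDD.
val unioned = rdd.union(other).collect().sorted      // Array(1, 2, 3, 3, 4, 5)

// intersection() and subtract() shuffle data; results are sorted for a stable order.
val common    = rdd.intersection(other).collect().sorted  // Array(3)
val onlyInRdd = rdd.subtract(other).collect().sorted      // Array(1, 2)

// cartesian() pairs every element of rdd with every element of other: 3 x 3 = 9 pairs.
val pairs = rdd.cartesian(other).collect()
```

`cartesian()` grows with the product of the two sizes, so it is best kept to small RDDs.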
  • Basic actions on an RDD containing: {1,2,3,3}
Function Name    Example                          Result
collect()        rdd.collect()                    {1,2,3,3}
count()          rdd.count()                      4
countByValue()   rdd.countByValue()               {(1,1),(2,1),(3,2)}
take()           rdd.take(2)                      {1,2}
top()            rdd.top(2)                       {3,3}
takeOrdered()    rdd.takeOrdered(2)(myOrdering)   {3,3} (if myOrdering sorts descending)
takeSample()     rdd.takeSample(false, 1)         Nondeterministic
reduce()         rdd.reduce((x, y) => x + y)      9
fold()           rdd.fold(0)((x, y) => x + y)     9
foreach()        rdd.foreach(func)                Nothing (returns Unit)
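The actions can be sketched as follows, again assuming Spark is on the classpath and the snippet runs as a script or in spark-shell. The table does not define `myOrdering`; here it is assumed to be a descending ordering, which is what makes `takeOrdered(2)` return {3,3}.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumed local setup; reuses an existing context if one is already running.
val conf = new SparkConf().setAppName("rdd-cheatsheet").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

val rdd = sc.parallelize(Seq(1, 2, 3, 3))

val all      = rdd.collect()        // Array(1, 2, 3, 3)
val n        = rdd.count()          // 4
val counts   = rdd.countByValue()   // Map(1 -> 1, 2 -> 1, 3 -> 2)
val firstTwo = rdd.take(2)          // Array(1, 2)
val topTwo   = rdd.top(2)           // Array(3, 3), largest first

// Hypothetical ordering (not defined in the table): descending Ints.
val myOrdering = Ordering[Int].reverse
val largestTwo = rdd.takeOrdered(2)(myOrdering)  // Array(3, 3)

// One random element; the result varies between runs.
val one = rdd.takeSample(withReplacement = false, num = 1)

val sum     = rdd.reduce((x, y) => x + y)    // 9
val foldSum = rdd.fold(0)((x, y) => x + y)   // 9

// foreach() runs on the executors and returns Unit; print order is not guaranteed.
rdd.foreach(x => println(x))
```

With the default ordering, `rdd.takeOrdered(2)` would return the two smallest elements, {1,2}, which is why the ordering passed in matters for the result shown in the table.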