# Just a Simple Cheat Sheet on Spark RDDs
- Basic transformations on an RDD containing {1,2,3,3}

| Function Name | Example | Result |
|---|---|---|
| map() | rdd.map(x => x + 1) | {2,3,4,4} |
| flatMap() | rdd.flatMap(x => x.to(3)) | {1,2,3,2,3,3,3} |
| filter() | rdd.filter(x => x != 1) | {2,3,3} |
| distinct() | rdd.distinct() | {1,2,3} (order not guaranteed) |
| sample() | rdd.sample(false,0.5) | Nondeterministic |
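
The transformations above can be reproduced with a minimal Scala sketch. It assumes spark-core (Spark 2.x or 3.x) is on the classpath and uses a local master, `local[2]`; the object name `SingleRddTransformations` and its `run()` helper are hypothetical, chosen only for illustration. Because transformations are lazy, each result is materialized with `collect()`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: assumes spark-core is on the classpath and a local master.
// SingleRddTransformations and run() are hypothetical names for illustration.
object SingleRddTransformations {
  def run(): List[(String, List[Int])] = {
    val conf = new SparkConf().setAppName("rdd-transformations").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(1, 2, 3, 3))
      // Transformations are lazy; collect() triggers the actual computation.
      List(
        "map"      -> rdd.map(x => x + 1).collect().toList,
        "flatMap"  -> rdd.flatMap(x => x.to(3)).collect().toList,
        "filter"   -> rdd.filter(x => x != 1).collect().toList,
        // distinct() shuffles, so its output order is not guaranteed; sort it.
        "distinct" -> rdd.distinct().collect().sorted.toList,
        // Without replacement, each element is kept with probability 0.5.
        // The seed only makes reruns repeatable.
        "sample"   -> rdd.sample(withReplacement = false, fraction = 0.5, seed = 42L).collect().toList
      )
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    run().foreach { case (name, values) => println(s"$name: ${values.mkString("{", ",", "}")}") }
}
```

The fixed seed passed to `sample()` only makes reruns repeatable; which elements survive is still random, which is why the table lists it as nondeterministic.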
- Basic two-RDD transformations, with rdd containing {1,2,3} and other containing {3,4,5}

| Function Name | Example | Result |
|---|---|---|
| union() | rdd.union(other) | {1,2,3,3,4,5} |
| intersection() | rdd.intersection(other) | {3} |
| subtract() | rdd.subtract(other) | {1,2} |
| cartesian() | rdd.cartesian(other) | {(1,3),(1,4)…(3,5)} |
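
A similar sketch covers the two-RDD transformations, under the same assumptions (spark-core on the classpath, a local master). `TwoRddTransformations`, `TwoRddResults` and `run()` are hypothetical names. Every result is sorted before it is returned because `intersection()` and `subtract()` shuffle data, so their output order is not guaranteed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: assumes spark-core is on the classpath and a local master.
// TwoRddTransformations, TwoRddResults and run() are hypothetical names.
object TwoRddTransformations {
  final case class TwoRddResults(
      union: List[Int],
      intersection: List[Int],
      subtract: List[Int],
      cartesian: List[(Int, Int)]
  )

  def run(): TwoRddResults = {
    val conf = new SparkConf().setAppName("two-rdd-transformations").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(1, 2, 3))
      val other = sc.parallelize(Seq(3, 4, 5))
      // Results are sorted only so they are easy to compare with the table.
      TwoRddResults(
        union = rdd.union(other).collect().sorted.toList,               // keeps the duplicate 3
        intersection = rdd.intersection(other).collect().sorted.toList, // output has no duplicates
        subtract = rdd.subtract(other).collect().sorted.toList,         // elements of rdd not in other
        cartesian = rdd.cartesian(other).collect().sorted.toList        // all 3 x 3 = 9 pairs
      )
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

`union()` keeps duplicates and does not shuffle, whereas `intersection()` removes duplicates. `cartesian()` produces every pair, so its output grows as the product of the two RDD sizes.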
- Basic actions on an RDD containing {1,2,3,3}

| Function Name | Example | Result |
|---|---|---|
| collect() | rdd.collect() | {1,2,3,3} |
| count() | rdd.count() | 4 |
| countByValue() | rdd.countByValue() | {(1,1),(2,1),(3,2)} |
| take() | rdd.take(2) | {1,2} |
| top() | rdd.top(2) | {3,3} |
| takeOrdered() | rdd.takeOrdered(2)(myOrdering) | {3,3} if myOrdering sorts in descending order; {1,2} with the default ascending ordering |
| takeSample() | rdd.takeSample(false,1) | Nondeterministic |
| reduce() | rdd.reduce((x,y) => x + y) | 9 |
| fold() | rdd.fold(0)((x,y) => x + y) | 9 |
| foreach() | rdd.foreach(func) | Nothing (returns Unit; func runs on each element) |
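
The actions can be checked the same way. This sketch again assumes spark-core on the classpath and a local master; `BasicActions`, `ActionResults` and `run()` are hypothetical names. The table never defines `myOrdering`, so the sketch assumes a descending ordering (`Ordering[Int].reverse`), which is what makes `takeOrdered(2)` return {3,3}. To show that `foreach()` returns nothing, the function it runs adds each element to a `longAccumulator`, the pattern the Spark programming guide uses to get values back from executors.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: assumes spark-core is on the classpath and a local master.
// BasicActions, ActionResults and run() are hypothetical names.
object BasicActions {
  final case class ActionResults(
      collected: List[Int],
      count: Long,
      countByValue: Map[Int, Long],
      take: List[Int],
      top: List[Int],
      takeOrdered: List[Int],
      takeSample: List[Int],
      reduce: Int,
      fold: Int,
      foreachSum: Long
  )

  // Hypothetical stand-in for the table's undefined myOrdering: descending order.
  val myOrdering: Ordering[Int] = Ordering[Int].reverse

  def run(): ActionResults = {
    val conf = new SparkConf().setAppName("rdd-actions").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(1, 2, 3, 3))
      // foreach() returns Unit; its side effects happen on the executors,
      // so an accumulator carries the sum back to the driver.
      val acc = sc.longAccumulator("foreachSum")
      rdd.foreach(x => acc.add(x))
      ActionResults(
        collected = rdd.collect().toList,
        count = rdd.count(),
        countByValue = rdd.countByValue().toMap,
        take = rdd.take(2).toList,
        top = rdd.top(2).toList,
        takeOrdered = rdd.takeOrdered(2)(myOrdering).toList,
        takeSample = rdd.takeSample(withReplacement = false, num = 1).toList, // random element
        reduce = rdd.reduce((x, y) => x + y),
        fold = rdd.fold(0)((x, y) => x + y), // zero value 0 is neutral for addition
        foreachSum = acc.sum
      )
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Since `foreach()` runs on the executors, a `println` inside it prints to the executor logs rather than the driver console when running on a cluster.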