07-06-2016 01:17 AM
By adding real-time capabilities to Hadoop, Apache Spark is opening the world of big data to possibilities previously unheard of. Spark and Hadoop will empower companies of all sizes across all industries to convert streaming big data and sensor information into immediately actionable insights, enabling use cases such as personalized recommendations, predictive pricing, proactive patient care, and more.
In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark.
Download our complimentary book excerpt to read about:
10-09-2017 01:08 AM
First of all, before starting with any big data analytics tool such as Spark or Flink, you need to be familiar with the Map, Reduce, and Filter operations [there are many more, but these are the basics]. I'll briefly describe them using the word-count example shown further down [Source: Examples | Apache Spark].
First, you specify the input path from which the data will be read when the job is executed. Next, the flatMap operation splits each line on spaces and returns the results as a collection of words. This is analogous to mapping functions you have probably encountered [for example, Python's built-in map]. The difference is that map returns exactly one output element per input element, whereas flatMap can return several (or none), which are then flattened into a single collection.
Next, the map operation pairs every word from the previous stage with an initial count of 1, and the reduceByKey call groups identical words and sums their counts using the _ + _ function.
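To make the map versus flatMap difference concrete, here is a minimal sketch on an ordinary Scala List rather than a Spark dataset. The object and method names (MapVsFlatMap, perLine, words) are made up for illustration; the collection methods behave the same way conceptually as their Spark counterparts.

```scala
object MapVsFlatMap {
  // map: exactly one output element per input line (here, one list of words per line).
  def perLine(lines: List[String]): List[List[String]] =
    lines.map(line => line.split(" ").toList)

  // flatMap: each line may yield several words, flattened into a single list.
  def words(lines: List[String]): List[String] =
    lines.flatMap(line => line.split(" "))

  def main(args: Array[String]): Unit = {
    val lines = List("to be or", "not to be")
    println(perLine(lines)) // nested result: one inner list per line
    println(words(lines))   // flat result: every word is its own element
  }
}
```

With map you get as many results as input lines; with flatMap you get as many results as words.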
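If you want to see what the pairing and summing steps do without a cluster, the sketch below imitates them on a local Scala collection. Plain collections have no reduceByKey, so this uses groupBy followed by a reduce per group as a stand-in; LocalWordCount and countWords are hypothetical names, not part of Spark.

```scala
object LocalWordCount {
  // Local stand-in for map(word => (word, 1)).reduceByKey(_ + _).
  def countWords(words: Seq[String]): Map[String, Int] =
    words
      .map(word => (word, 1))            // pair every word with a count of 1
      .groupBy { case (word, _) => word } // bring identical words together
      .map { case (word, pairs) =>
        // sum the 1s of each group with the same _ + _ function
        (word, pairs.map { case (_, one) => one }.reduce(_ + _))
      }

  def main(args: Array[String]): Unit = {
    val words = Seq("to", "be", "or", "not", "to", "be")
    countWords(words).foreach { case (word, n) => println(s"$word: $n") }
  }
}
```

In Spark the grouping and the summing happen in one distributed step, but the result per word is the same idea.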
I think that clarifies the most fundamental operations you would need to start doing anything on your data.
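// sc is the SparkContext (predefined in spark-shell)
val textFile = sc.textFile("foo.txt")
val counts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
// Writes a directory named result.txt containing one part file per partition
counts.saveAsTextFile("result.txt")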
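Filter is the one basic operation the word count does not use. As a rough sketch on a local Scala collection (FilterSketch and longWords are made-up names, and the length threshold is arbitrary), it keeps only the elements for which a predicate returns true:

```scala
object FilterSketch {
  // Keep only the words longer than two characters; everything else is dropped.
  def longWords(words: Seq[String]): Seq[String] =
    words.filter(word => word.length > 2)

  def main(args: Array[String]): Unit = {
    val words = Seq("to", "be", "or", "not", "spark")
    println(longWords(words))
  }
}
```

On a Spark dataset you would call filter with the same kind of predicate, for example between the flatMap and map steps of the word count.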
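import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.util.MLUtils
val data = MLUtils.loadLibSVMFile(sc, "data.libsvm") // load data in the libsvm format (hypothetical path)
val model = SVMWithSGD.train(data, 100) // second argument: number of iterations (100 is an arbitrary choice)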