FreeComputerBooks.com
Links to Free Computer, Mathematics, Technical Books all over the World
 
The Internals of Apache Spark
  • Title: The Internals of Apache Spark
  • Author(s) Jacek Laskowski
  • Publisher: japila-books
  • Paperback: N/A
  • eBook: HTML
  • Language: English
  • ISBN-10: N/A
  • ISBN-13: N/A

Book Description

This book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.

Updated to include Spark 3.x, this book shows data engineers and data scientists why structure and unification in Spark matter. Specifically, it explains how to perform simple and complex data analytics and employ machine learning algorithms.

  • How Spark SQL’s new interfaces improve performance over Spark's RDD data structure
  • The choice between data joins in Core Spark and Spark SQL
  • Techniques for getting the most out of standard RDD transformations
  • How to work around performance issues in Spark's key/value pair paradigm
  • Writing high-performance Spark code without Scala or the JVM
  • How to test for functionality and performance when applying suggested improvements
  • Using Spark MLlib and Spark ML machine learning libraries
  • Spark's Streaming components and external community packages
About the Author
  • Jacek Laskowski is an independent consultant who is passionate about software development and about teaching people to use Apache Spark, Scala, sbt, and Apache Kafka effectively (with a bit of Hadoop YARN, Apache Mesos, and Docker).