Course Summary: Advanced Data Analytics with Spark
Our Advanced Data Analytics with Spark cohort is a 4-week evening course.
Apache Spark is a fast and general engine for large-scale data processing.
Spark was developed as an alternative to the traditional MapReduce processing paradigm. By keeping
data in memory, Spark can run up to 100X faster than Hadoop MapReduce, and up to 10X faster when
running on disk. Spark is well suited to iterative processing, which many machine learning
algorithms rely on.
Spark runs on top of Hadoop, as a standalone platform, or in the cloud. It is easy to use, fast, and has a
powerful stack of libraries, including SQL and DataFrames. The course requires some experience
programming in Python.
Course Details: Advanced Data Analytics with Spark
Week 1: Spark Fundamentals
- Introduction to Spark
- Why Spark?
- Introduction to RDDs
- Data Sharing
- Data Partitioning
Week 2: Spark SQL
- Working with the Spark Shell
- What is Spark SQL?
- Spark SQL vs Spark Core
- DataFrames API
Week 3: Spark Streaming
- DStreams
- Transformations: Stateless and Stateful
- Checkpointing and Output Operations
- Tuning and Debugging Spark
Learning Objectives: Advanced Data Analytics with Spark
- Become familiar with Spark fundamentals. Learn about the different components of Spark.
- Use Spark on an HDFS cluster. Gain experience working with RDDs.
- Learn how to tune and debug Spark.
- Tools used: Python, Spark
Next Steps:
- Drop us a note to schedule an interview and see if this course is a good fit for you.
- Enroll@bitbootcamp.com
Campus
Next Cohort
- January 10th, 2017 - February 2nd, 2017
Tuesday and Thursday: 6:30 PM to 9:30 PM
Tuition
Financing
Financing options available with:
Pave