Course Summary: Data Science & Machine Learning on Big Data
Our Data Science and Machine Learning fully immersive cohort is an 8-week course.
Decision science is the brains behind big data analytics. It's not a new field; people have been
doing decision science for decades.
In the "old days," data was limited, so complex algorithms were needed to extract useful insights
from it. With the shift in the data paradigm, we no longer need very complex algorithms.
Instead, we need to run simple methods at scale.
At the end of the day, data science is all about counting smart. In this course, we will learn the
essential coding skills for data science, implement various industry-standard algorithms on large
datasets, and learn to communicate the results through strong visualizations. All the while,
we will keep it simple.
Decision science is a mix of computer science, statistics, and management skills. In this bootcamp,
we will focus on Computer Science (C) and Statistics (S).
Course Details: Fully Immersive Data Science & Machine Learning on Big Data
Week 1 : Develop Essential Skills
- S: Introduction to Machine Learning
- C: Unix for Data Science
- C: SQL for Data Science
- C: Essential Python for Data Science
- C: Excel for Data Science
- C: Git & R for Data Science
Week 2 : Warm-up
- C: Advanced Unix/Linux
- C: Advanced Python for Data Science
- C: Data Visualization with Python
- S: Classical Statistics
- S: Linear Algebra and Vector Mathematics
Week 3 : Getting Started with Machine Learning
- C: Machine Learning Ecosystem
- S: Decision Trees : Classification
- S: Naive Bayes : Text Mining
- S: Random Forest : Classification & Regression
- C: Review Core Programming
Week 4 : Introduction to Big Data
- C: Introduction to Big Data, HDFS
- C: Build a 5-node physical cluster in a lab environment
- C: Core Java MapReduce
- C: Advanced Java & Advanced MapReduce
- C: Hive, SQL interface to Hadoop
- C: Hadoop ETL
Week 5 : Classical Models
- S: Apriori Algorithm
- S: Gradient Boosting Machines (GBM)
- S: Generalized Linear Models: Linear Regression, Regularization, Logistic Regression
- S: k-Nearest Neighbors (KNN) and K-Means Clustering
- Review: Hadoop and ML Algorithms
Week 6 : Spark & In-Memory Data Processing
- C: Introduction to Scala
- C: Introduction to Spark, RDD
- C: Spark SQL
- C: Spark with the Python API (PySpark)
- C: Kafka and Hadoop Streaming
Week 7 : Models, Models, Models
- C: Stable Marriage Algorithm : Stable matching
- S: Principal Component Analysis : Dimensionality reduction
- C: Data Fusion and Fuzzy Matching
- C: Recommendation Engine
- Review: Spark and Machine Learning Algorithms
Week 8 : Putting it all together
- C: Graph Analysis
- C: Storytelling with D3 & Seaborn
- S: Support Vector Machines
- C: Machine Learning in production
- Wrap-up : Future of Data Science
Data Science Fully Immersive Learning Objectives
- Use Unix to manipulate data and solve problems in serial and parallel.
- Master basic coding problems typically asked during data science interviews.
- Learn to apply machine learning techniques to solve data-driven problems, and understand
why they work.
- Learn to apply machine learning techniques to large datasets, using industrial-strength
solutions.
- Tools used: Python, scikit-learn, H2O.ai, Skytree.net, SQL, and Linux
- Most important: learn to converse with data, keep it simple, and use common
sense.
Next Steps:
- Drop us a note to schedule an interview and see if this course is a good fit for
you.
- Enroll@bitbootcamp.com
Campus
Next Cohort
- April 3rd, 2017 - May 26th, 2017
Monday - Friday, 9 AM to 5 PM
Tuition
Financing
Financing options available with:
Pave