This repository provides a set of self-study tutorials on machine learning for big data using Apache Spark (PySpark), from the basics (DataFrames and SQL) to advanced topics (the Machine Learning Library, MLlib) ...