Back by even MORE popular demand, we will be running our 6-part, 100-level series once again! Thanks to our friends at Mozilla for hosting.

Learn to use the open-source Apache Spark framework with Python to process large datasets, with the ultimate goal of building predictive models from real-world data. In this lesson, we’ll introduce you to essential concepts and practices for using Apache Spark.

Before arriving, please follow the instructions in this document to set up your environment: https://docs.google.com/document/d/1ayIjeM0SUA3BuF2CTOhcsyVgL9MP13NVHkPlrw7_MSQ/edit

If you are having trouble with the above, please show up at 6:30pm. We will be available to help attendees in the half-hour before the lesson starts.

After attending Lesson 1, you’ll have learned:

• How to set up Spark with Zeppelin to pull data

• Basic operators: maps, joins, filters, etc. (see the first sketch after this list)

• Spark as a tool for data exploration: notebooks and workflow

• DataFrames and Spark SQL (see the second sketch after this list)

• Data visualization for dashboarding and analysis from Spark pipelines
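To give you a taste of the “basic operators” topic, here is a minimal PySpark sketch of map, filter, and join on small in-memory data. This is illustrative only, not the workshop’s actual notebook code; the app name and the sample numbers are assumptions made for the example.

```python
from pyspark.sql import SparkSession

# Start a local Spark session (in the workshop you'll run inside Zeppelin instead)
spark = SparkSession.builder.master("local[*]").appName("lesson1-operators").getOrCreate()
sc = spark.sparkContext

# map: square every number; filter: keep only the even results
nums = sc.parallelize([1, 2, 3, 4, 5])
evens = nums.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(evens.collect())  # [4, 16]

# join: combine two key-value RDDs on their shared keys
users = sc.parallelize([(1, "alice"), (2, "bob")])
scores = sc.parallelize([(1, 90), (2, 75)])
print(users.join(scores).collect())  # [(1, ('alice', 90)), (2, ('bob', 75))]

spark.stop()
```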
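And here is a similarly hedged sketch of the DataFrames and Spark SQL material. The “people” table, its columns, and the query are made up for illustration and are not taken from the workshop dataset.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("lesson1-sql").getOrCreate()

# A tiny DataFrame standing in for a real-world dataset
df = spark.createDataFrame(
    [("alice", 34), ("bob", 45), ("carol", 29)],
    ["name", "age"],
)

# DataFrame API: filter rows, then select a column
df.filter(df.age > 30).select("name").show()

# Spark SQL: register the DataFrame as a temp view and query it with SQL
df.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people WHERE age > 30 ORDER BY age").show()

spark.stop()
```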

Please bring your laptop and be on time. Food and drinks will be served.

Lesson 1/6: Apache Spark, PySpark and Apache Zeppelin workshop