This repository contains Apache Spark programs implemented in Python. These programs are part of my learning process for Apache Spark and are intended to serve as examples for anyone who is also learning or working with Apache Spark.
Before running these programs, you need Apache Spark and PySpark installed on your system. Spark runs on the JVM, so a Java runtime is required as well. To download and install the latest version of Apache Spark, follow the instructions on the official website: https://spark.apache.org/downloads.html
Once you have installed Apache Spark, you can install PySpark using pip:
pip install pyspark
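As a quick sanity check after installing, a small script along the following lines can report whether the prerequisites are visible from your Python environment. This is only a sketch: the function name `check_environment` is made up for illustration and is not part of this repository. A "missing" result for `java` is only a hint, since Spark can also locate Java through the `JAVA_HOME` environment variable.

```python
import importlib.util
import shutil


def check_environment():
    """Report which prerequisites can be found from this Python environment."""
    return {
        # True if the pyspark package is importable by this interpreter.
        "pyspark": importlib.util.find_spec("pyspark") is not None,
        # True if a spark-submit launcher is on PATH.
        "spark-submit": shutil.which("spark-submit") is not None,
        # True if a java executable is on PATH (Spark runs on the JVM).
        "java": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'missing'}")
```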
To run any of the programs in this repository, navigate to the program's directory and run the following command:
spark-submit program-name.py
Replace program-name.py with the file name of the program you want to run.
Here is a list of all the programs in this repository:
- Total Spent By Customer (sorted and SparkSQL version)
- Calculate Average Friends By Age
- Filtering RDDs and finding Minimum Temperature
- Movie Ratings Counter
- Word Count using FlatMap
- Calculating Min and Max Temperature using DataFrames
- Social Graph Analysis using Marvel Superheroes
- Calculating Average Friends By Age using SparkSQL
- Calculating Total Spent By Customer using DataFrames
- Word Count using SparkSQL
- Calculating Average Friends By Age using DataFrames
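
To give a flavor of what these programs look like, here is a minimal sketch of a word count built on `flatMap`. It is not the code from this repository: the function names, the app name `WordCountSketch`, and the tokenizing rule (lowercase, then keep runs of letters, digits, and apostrophes) are assumptions chosen for illustration. The input file is passed on the command line.

```python
import re
import sys


def tokenize(line):
    """Split a line into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", line.lower())


def main(path):
    # Imported inside main so tokenize() can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WordCountSketch").getOrCreate()
    lines = spark.sparkContext.textFile(path)
    counts = (
        lines.flatMap(tokenize)               # one record per word
        .map(lambda word: (word, 1))          # pair each word with a count of 1
        .reduceByKey(lambda a, b: a + b)      # sum the counts per word
        .sortBy(lambda pair: pair[1], ascending=False)
    )
    # Print the 20 most frequent words.
    for word, count in counts.take(20):
        print(word, count)
    spark.stop()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: spark-submit word_count_sketch.py <input-text-file>")
```

`flatMap` is used rather than `map` because each input line yields several words, and the result should be one flat collection of words rather than a collection of lists.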
If you have any suggestions or ideas for new Apache Spark programs, feel free to open an issue or submit a pull request.
This repository is licensed under the MIT License. See the LICENSE file for more information.