This repository has been archived by the owner on Apr 27, 2022. It is now read-only.

A repository for ATDS course semester project at ECE, NTUA, 2022

kitsorfan/ATDS2022


Advanced Topics in Database Systems

Exercises in Python/SQL, semester project for Advanced Topics in Database Systems course at ECE⚡, NTUA🎓, academic year 2021-2022

Python Spark SQL Hadoop Ubuntu Server

📋Description

The dataset used for this project is the Full MovieLens Dataset.

The project consists of two main parts:

  1. Implement and test five requested queries using the RDD API and Spark SQL
  2. Analyze the performance of the Reduce-Side join and Map-Side join implementations
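The two join strategies named in part 2 can be illustrated with a minimal plain-Python sketch. It is not the project's Spark code: the `movies` and `ratings` toy tables and both function names are hypothetical, and the "shuffle" is simulated with an in-memory dictionary. A reduce-side (repartition) join tags records from both inputs, groups them by key, and pairs them in the reducer. A map-side (broadcast) join keeps the smaller table in a hash map and probes it while scanning the larger one, so no shuffle is needed.

```python
from collections import defaultdict

# Hypothetical toy tables; names and values are illustrative only.
movies = [(1, "Movie A"), (2, "Movie B"), (3, "Movie C")]  # (movie_id, title)
ratings = [(1, 4.0), (1, 5.0), (2, 3.5), (4, 2.0)]          # (movie_id, rating)


def reduce_side_join(left, right):
    """Repartition join: tag records by source, group by key, pair up per key."""
    # Map phase: emit (key, (tag, value)) for every record of both inputs.
    tagged = [(k, ("L", v)) for k, v in left] + [(k, ("R", v)) for k, v in right]

    # Shuffle phase (simulated): bring all records with the same key together.
    groups = defaultdict(list)
    for key, tagged_value in tagged:
        groups[key].append(tagged_value)

    # Reduce phase: per key, combine every left value with every right value.
    joined = []
    for key, values in groups.items():
        lefts = [v for tag, v in values if tag == "L"]
        rights = [v for tag, v in values if tag == "R"]
        joined.extend((key, (l, r)) for l in lefts for r in rights)
    return joined


def map_side_join(small, large):
    """Broadcast join: hash the small table, then probe it while scanning the large one."""
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    # Map-only pass over the large input; no shuffle is required.
    return [(key, (s, v)) for key, v in large for s in lookup.get(key, [])]


reduce_result = sorted(reduce_side_join(movies, ratings))
map_result = sorted(map_side_join(movies, ratings))
print(reduce_result == map_result)
```

Both functions compute the same inner join. What differs is where the work happens: the reduce-side variant moves every record of both inputs through the grouping step, while the map-side variant only works when the smaller table fits in memory on each worker.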

Details:

  • We used 3 VMs for our cluster (1 NameNode, 2 DataNodes)
  • Dataset formats used: CSV, DataFrame, Parquet

Project Goals

  • get familiar with Spark API
  • evaluate performance for a list of queries
  • compare different join algorithms in Spark Map-Reduce

The project's assignment and report are written in Greek.

👔Team Members

Name - GitHub            Email
Stylianos Kandylakis     gmail
Kitsos Orfanopoulos      protonmail
Christos Tsoufis         gmail

🖥Specifications of VM

OS                          CPUs   RAM    Disk space
Ubuntu 16.04 LTS (Xenial)   2      2 GB   30 GB

🔗Sources
