YSU Big Data Milestone 2

Creator

Quinn Mong

Created

2024 May 2 4:49

Editor

Quinn Mong

Edited

2024 May 7 5:42

Refs

for presentation:

Observations from other groups' presentations. They talked about:

Group 5:

Setting up EC2 instance

Talked in detail about how they set up TPC on EC2
Installed dbgen in a directory on the system
Generated the data with dbgen at scale factor 100
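
A minimal sketch of how the generation step could be scripted, assuming the TPC-H dbgen tool, whose -s flag sets the scale factor. The helper name build_dbgen_command and the ./dbgen path are hypothetical; Group 5 did not show their exact command.

```python
import shlex


def build_dbgen_command(scale_factor, dbgen_path="./dbgen"):
    """Return the argument list for a dbgen run at the given scale factor.

    dbgen_path is an assumed location of the compiled binary.
    """
    if scale_factor <= 0:
        raise ValueError("scale factor must be positive")
    # -s sets the scale factor; for TPC-H, 100 is roughly 100 GB of raw data.
    return [dbgen_path, "-s", str(scale_factor)]


command = build_dbgen_command(100)
print(shlex.join(command))  # ./dbgen -s 100
```

The returned list can be passed to subprocess.run(command, check=True) from the directory where dbgen was built.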

Started their databases and created tables (MySQL, PostgreSQL, MongoDB, Cassandra)
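
A rough sketch of the create-and-load step. It uses Python's built-in sqlite3 as a stand-in, not any of the databases Group 5 used, and assumes the TPC-H REGION table with '|'-delimited .tbl rows that end in a trailing delimiter. The sample rows are made up for illustration.

```python
import sqlite3


def parse_tbl_line(line):
    """Split one '|'-delimited dbgen row, dropping the single trailing delimiter."""
    line = line.rstrip("\n")
    if line.endswith("|"):
        line = line[:-1]
    return line.split("|")


# Made-up rows in the shape of a region.tbl file (not real dbgen output).
sample_rows = [
    "0|AFRICA|example comment|\n",
    "1|AMERICA|another example comment|\n",
]

conn = sqlite3.connect(":memory:")  # stand-in for MySQL/PostgreSQL
conn.execute(
    "CREATE TABLE region ("
    "r_regionkey INTEGER PRIMARY KEY, "
    "r_name TEXT NOT NULL, "
    "r_comment TEXT)"
)

# Convert the key column to int before inserting; the rest stay as text.
records = []
for row in sample_rows:
    key, name, comment = parse_tbl_line(row)
    records.append((int(key), name, comment))

conn.executemany("INSERT INTO region VALUES (?, ?, ?)", records)
row_count = conn.execute("SELECT COUNT(*) FROM region").fetchone()[0]
print(row_count)
```

For a scale factor 100 dataset, a real setup would more likely use each database's bulk loader than row-by-row inserts.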

TL;DR: Group 5 mostly talked about setting up EC2 and loading the data into their databases.

Group 6:

Talked about the Apache Hadoop structure

HDFS and HBase
MapReduce
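
For reference, the MapReduce model can be sketched in plain Python as a word count. This is a toy, single-process illustration of the map, shuffle, and reduce phases, not Hadoop's API; all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "big cluster"])
print(counts)  # {'big': 2, 'data': 1, 'cluster': 1}
```

In Hadoop the map and reduce calls run on different nodes and the shuffle moves data across the network; here everything runs in one process.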

Apache Spark

Driver
Cluster
Executor
Worker node
DAG Scheduler
Task Scheduler

System Environment

Local Environment
They only talked about a single node

Data Generation

Explained the structure of how the data is generated
PDGF
Schema definition and distribution setup (for the config phase)

Group 6 talked about something we already did in milestone 1? (Running and benchmarking on a single node)

Talked about Cloudera (their database?)

They said they will use Docker and AWS EC2 (milestone 3)

Recommendations
