YSU Big Data Milestone 2

Creator: Quinn Mong
Created: 2024 May 2 4:49
Editor: Quinn Mong
Edited: 2024 May 7 5:42
For presentation:
Observations from other groups (what they talked about):
Group 5:
  • Setting up an EC2 instance
    • Talked in detail about how they set up TPC on EC2
    • Installed dbgen in a directory on the instance
    • Generated the data with dbgen at scale factor 100
  • Started their databases and created tables (MySQL, PostgreSQL, MongoDB, Cassandra)
TL;DR: Group 5 mostly talked about setting up EC2 and loading the data into their databases.
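A minimal Python sketch of Group 5's generation steps, assuming "tpc" means TPC-H and that `dbgen` has been built in a known directory (the notes only say "dbgen"). `dbgen_command`, `estimated_rows`, `run_dbgen` and `parse_tbl_line` are hypothetical helpers. The flags used are standard TPC-H dbgen options: `-s` sets the scale factor, `-f` overwrites existing files, `-C` splits the data set into chunks, and `-S` selects which chunk to build. In TPC-H, SF 1 is roughly 1 GB of raw data, so SF 100 is roughly 100 GB.

```python
import subprocess
from typing import Optional

# ASSUMPTION: "tpc" in the notes refers to TPC-H, whose generator is dbgen.
# Base row counts per unit of scale factor (SF) from the TPC-H spec;
# lineitem is approximate, and nation/region do not scale with SF.
ROWS_PER_SF = {
    "supplier": 10_000,
    "part": 200_000,
    "partsupp": 800_000,
    "customer": 150_000,
    "orders": 1_500_000,
    "lineitem": 6_000_000,
}
FIXED_ROWS = {"nation": 25, "region": 5}


def estimated_rows(scale_factor: int) -> dict:
    """Approximate row count per table for a given scale factor."""
    rows = {table: n * scale_factor for table, n in ROWS_PER_SF.items()}
    rows.update(FIXED_ROWS)
    return rows


def dbgen_command(scale_factor: int, chunks: int = 1, step: Optional[int] = None) -> list:
    """Build a dbgen argv: -s sets the SF, -f overwrites existing .tbl files."""
    cmd = ["./dbgen", "-s", str(scale_factor), "-f"]
    if chunks > 1:
        # -C splits the data set into chunks; -S picks which chunk to build.
        if step is None or not 1 <= step <= chunks:
            raise ValueError("step must be between 1 and chunks")
        cmd += ["-C", str(chunks), "-S", str(step)]
    return cmd


def run_dbgen(scale_factor: int, dbgen_dir: str) -> None:
    """Hypothetical helper: run dbgen inside the directory it was built in."""
    subprocess.run(dbgen_command(scale_factor), cwd=dbgen_dir, check=True)


def parse_tbl_line(line: str) -> list:
    """Split one '|'-delimited .tbl row and drop the trailing empty field."""
    fields = line.rstrip("\n").split("|")
    if fields and fields[-1] == "":
        fields.pop()
    return fields


if __name__ == "__main__":
    print(" ".join(dbgen_command(100)))     # ./dbgen -s 100 -f
    print(estimated_rows(100)["lineitem"])  # 600000000
```

Splitting with `-C`/`-S` lets several processes build parts of a large SF in parallel. dbgen ends every row with a `|`, so `parse_tbl_line` drops the resulting empty last field, which would otherwise show up as an extra column when loading the rows into a database.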
 
Group 6:
  • Talked about the Apache Hadoop architecture
    • HDFS and HBase
    • MapReduce
  • Apache Spark
    • Driver
    • Cluster
    • Executor
    • Worker node
    • DAG Scheduler
    • Task Scheduler
  • System Environment
    • Local environment
    • Covered a single-node setup only
  • Data Generation
    • Explained how the data is generated
    • PDGF (Parallel Data Generation Framework)
    • Schema definition and distribution setup (configuration phase)
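The configuration phase Group 6 described (a schema definition plus a distribution per field) can be sketched in Python. The core idea behind PDGF, as we understand it, is that every value is computed deterministically from a hierarchy of seeds, so any worker can regenerate any row without coordinating with the others; this toy reproduces only that idea. `SCHEMA`, the column names, the distributions and all helper functions are hypothetical, and this is not PDGF's actual configuration format or API.

```python
import hashlib
import random

# Hypothetical schema definition (NOT PDGF's config format): each field maps
# to a generator that draws from a distribution using a per-cell RNG.
SCHEMA = {
    "customer": {
        "c_custkey": lambda rng, row: row + 1,                          # sequential key
        "c_age": lambda rng, row: rng.randint(18, 90),                  # uniform integer
        "c_balance": lambda rng, row: round(rng.gauss(5000.0, 1500.0), 2),  # normal
        "c_segment": lambda rng, row: rng.choices(
            ["A", "B", "C"], weights=[0.6, 0.3, 0.1])[0],               # weighted categorical
    }
}


def cell_seed(base_seed: int, table: str, column: str, row: int) -> int:
    """Derive a stable seed for one cell from the (seed, table, column, row) path.

    hashlib is used instead of hash() because str hashing is randomized
    per Python process, which would break reproducibility across workers.
    """
    key = f"{base_seed}:{table}:{column}:{row}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def generate_row(base_seed: int, table: str, row: int) -> dict:
    """Generate one row; each cell gets its own freshly seeded RNG."""
    out = {}
    for column, gen in SCHEMA[table].items():
        rng = random.Random(cell_seed(base_seed, table, column, row))
        out[column] = gen(rng, row)
    return out


def generate_range(base_seed: int, table: str, start: int, stop: int) -> list:
    """Generate rows [start, stop); disjoint ranges can go to different workers."""
    return [generate_row(base_seed, table, r) for r in range(start, stop)]


if __name__ == "__main__":
    for r in generate_range(42, "customer", 0, 3):
        print(r)
```

Because each cell depends only on (seed, table, column, row), one worker can generate rows 0 to 4 and another rows 5 to 9, and the combined output matches a single-process run.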
Group 6 seemed to cover what we already did in Milestone 1 (running and benchmarking on a single node)?
  • Talked about Cloudera (their platform of choice? Cloudera is a Hadoop-based data platform rather than a database)
  • Said they will use Docker and AWS EC2 in Milestone 3
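The Spark components Group 6 listed (driver, DAG scheduler, task scheduler, executors) can be illustrated with a small pure-Python toy model; this is not Spark's API. In Spark, the driver's DAG scheduler cuts a job's chain of transformations into stages at shuffle (wide) dependencies, and the task scheduler launches one task per partition of each stage on the executors. `Op`, `split_into_stages` and `schedule_tasks` are hypothetical names. The toy simplifies in two ways: it only handles a linear chain, and it assigns tasks round-robin, whereas Spark's task scheduler also takes data locality into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    wide: bool  # True if the op needs a shuffle (e.g. reduceByKey, groupByKey)


def split_into_stages(ops: list) -> list:
    """Group a linear chain of ops into stages, cutting before each wide op.

    Narrow ops (map, filter, flatMap) are pipelined into the current stage;
    a wide op starts a new stage because it must wait for shuffle output.
    """
    stages, current = [], []
    for op in ops:
        if op.wide and current:
            stages.append(current)
            current = []
        current.append(op.name)
    if current:
        stages.append(current)
    return stages


def schedule_tasks(stages: list, num_partitions: int, executors: list) -> list:
    """Toy task scheduler: one task per partition per stage, round-robin.

    Returns (stage_id, partition, executor) tuples in launch order; stage N
    only starts after stage N - 1 has finished writing its shuffle output.
    """
    plan = []
    for stage_id in range(len(stages)):
        for p in range(num_partitions):
            plan.append((stage_id, p, executors[p % len(executors)]))
    return plan


if __name__ == "__main__":
    word_count = [Op("textFile", False), Op("flatMap", False), Op("map", False),
                  Op("reduceByKey", True), Op("mapValues", False)]
    stages = split_into_stages(word_count)
    print(stages)  # [['textFile', 'flatMap', 'map'], ['reduceByKey', 'mapValues']]
    for task in schedule_tasks(stages, 3, ["exec-1", "exec-2"]):
        print(task)
```

The word-count chain yields two stages because `reduceByKey` is the only shuffle; adding more partitions adds more tasks per stage, not more stages.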
       
       

Recommendations