For presentation:
Observations from the other groups' talks:
Group 5:
- Setting up an EC2 instance
- Talked in detail about how they set up TPC on EC2
- Installed dbgen in a directory on the system
- Generated data with dbgen at scale factor 100
- Started their DBs and created tables (MySQL, PostgreSQL, MongoDB, Cassandra)
TL;DR: Group 5 mostly talked about setting up EC2 and loading the data into their databases.
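Sketch of the generate-and-load steps from Group 5's talk, for our own reference. Assumptions (not from their slides): they used the TPC-H kit, since dbgen is its data generator; the binary sits at ./dbgen; the helper names below are made up for illustration. The `-s` flag sets the scale factor, and dbgen writes pipe-delimited .tbl rows that end with a trailing "|", which has to be dropped before loading.

```python
from typing import List


def build_dbgen_command(scale_factor: int, dbgen_path: str = "./dbgen") -> List[str]:
    """Build the argv for a dbgen run at the given scale factor.

    dbgen_path is an assumed location; adjust to wherever the kit was built.
    """
    if scale_factor <= 0:
        raise ValueError("scale factor must be positive")
    return [dbgen_path, "-s", str(scale_factor)]


def parse_tbl_line(line: str) -> List[str]:
    """Split one row of a dbgen .tbl file into its column values.

    Rows are pipe-delimited and end with a trailing '|', so that last
    delimiter is removed before splitting.
    """
    line = line.rstrip("\n")
    if line.endswith("|"):
        line = line[:-1]
    return line.split("|")


if __name__ == "__main__":
    # Only prints the command; running it would be
    # subprocess.run(cmd, cwd=<kit directory>, check=True).
    print(build_dbgen_command(100))
    print(parse_tbl_line("0|AFRICA|sample comment|\n"))
```

The parsed rows can then go to whatever bulk loader each database offers; which loader Group 5 actually used was not stated.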
Group 6:
- Talked about the Apache Hadoop structure
  - HDFS and HBase
  - MapReduce
- Apache Spark
  - Driver
  - Cluster
  - Executor
  - Worker node
  - DAG Scheduler
  - Task Scheduler
- System Environment
  - Local Environment
  - They talked about single node only
- Data Generation
  - Explained the structure of how data is generated
  - PDGF
  - Schema definition, distribution setup (for config phase)
Group 6 seems to have covered what we already did in milestone 1 (running and benchmarking on a single node)?
- Talked about Cloudera (their database?)
- They said they will use Docker and AWS EC2 (milestone 3)
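
Quick reminder of the MapReduce model that Group 6 covered, as a word count in plain Python. This is a single-process sketch of the three phases (map, shuffle, reduce), not the Hadoop API; all function names are made up for illustration, and a real job would run the map and reduce steps in parallel across worker nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: List[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["spark hadoop", "Hadoop hdfs hadoop"]))
```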