Big Data Distributed Parallel Computing Software Framework
- The core of MapReduce is two functions: map and reduce
- Big data processing should be kept as simple as possible
- The reduce operation must be commutative and associative so partial results can be merged in any order
- MapReduce is a method to distribute tasks across multiple nodes
- For large tasks, repeatedly fork to divide the work, then join to merge results once the pieces are small enough (see the sketch after this list)
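
A minimal single-process sketch of the two functions, using word count as the example. The names `map_fn`, `reduce_fn`, and `run` are illustrative, not the API of any real framework; the shuffle step is simulated with a dictionary.

```python
from collections import defaultdict
from functools import reduce
from operator import add

# map: emit (word, 1) pairs for each word in a line of text
def map_fn(line: str):
    return [(word, 1) for word in line.split()]

# reduce: merge counts for one key; addition is commutative and
# associative, so partial results can be combined in any order
def reduce_fn(key: str, counts: list) -> int:
    return reduce(add, counts)

# single-process driver that mimics the shuffle between the two phases
def run(lines: list) -> dict:
    shuffled = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):    # map phase
            shuffled[key].append(value)    # shuffle: group values by key
    return {key: reduce_fn(key, values)    # reduce phase
            for key, values in shuffled.items()}

print(run(["big data big compute", "data data"]))
# {'big': 2, 'data': 3, 'compute': 1}
```

Because the reduce is commutative and associative, the same `reduce_fn` could also run as a combiner on each node's partial output before the shuffle.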
Limitations of MapReduce
- A large number of small data sets is inefficient, since memory is wasted holding their metadata
- The reduce phase cannot start until all map tasks are complete (see the sketch below)
- Standard MapReduce cannot start a new map task before the reduce task of the previous job has finished
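The map-to-reduce barrier can be seen in a toy driver: the `list()` over the map results forces every map task to finish before grouping and reducing begin. This is a sketch in which threads stand in for cluster nodes; `map_task` and `reduce_task` are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_task(chunk: str):
    return [(word, 1) for word in chunk.split()]

def reduce_task(key, values):
    return key, sum(values)

chunks = ["big data big", "data data compute"]

with ThreadPoolExecutor() as pool:
    # Map phase: list() blocks until all map tasks finish (the barrier),
    # because grouping by key needs every map output.
    map_results = list(pool.map(map_task, chunks))

    # Shuffle: group intermediate values by key.
    grouped = defaultdict(list)
    for pairs in map_results:
        for key, value in pairs:
            grouped[key].append(value)

    # Reduce phase: only now can reducers start.
    counts = dict(pool.map(lambda kv: reduce_task(*kv), grouped.items()))

print(counts)  # {'big': 2, 'data': 3, 'compute': 1}
```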