From: A comparative study of big data use in Egyptian agriculture
Features | Hadoop | Spark |
---|---|---|
Processing type | Batch | Hybrid |
Computing cluster architecture | YARN | YARN and Mesos |
Data Flow | MapReduce data flow | A sequence of RDDs (a DStream), processed one at a time as micro-batches |
Data Processing Model | MapReduce | Micro-batch (exactly-once semantics) |
Fault Tolerance | Yes | Yes (using lineage) |
Latency | High | Low |
Scalability | Yes | Yes (on user demand) |
Back-pressure Mechanism | No | Yes |
Programming Languages | Mostly Java | APIs for Scala, Java, Python, and R |
Support for Machine Learning | Yes | Yes (Spark MLlib) |
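
The "Data Flow" row describes Spark's streaming model as a sequence of small batches processed one at a time. The sketch below illustrates that idea in plain Python, not with Spark's API: timestamped records are grouped into fixed-width intervals and each interval is processed as an ordinary batch job. The record shape, the function names (`to_micro_batches`, `word_count`, `process_stream`), and the word-count workload are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (arrival time in seconds, word).
Record = Tuple[float, str]


def to_micro_batches(records: Iterable[Record], batch_interval: float) -> List[List[str]]:
    """Group timestamped records into consecutive fixed-width batches."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for arrival_time, word in records:
        # Records arriving in the same interval land in the same batch.
        batches[int(arrival_time // batch_interval)].append(word)
    if not batches:
        return []
    # Emit every interval in order, including empty ones.
    return [batches.get(i, []) for i in range(max(batches) + 1)]


def word_count(batch: List[str]) -> Dict[str, int]:
    """The per-batch job: an ordinary batch computation over one interval."""
    counts: Dict[str, int] = defaultdict(int)
    for word in batch:
        counts[word] += 1
    return dict(counts)


def process_stream(records: Iterable[Record], batch_interval: float) -> List[Dict[str, int]]:
    """Process the stream one micro-batch at a time, in arrival order."""
    return [word_count(batch) for batch in to_micro_batches(records, batch_interval)]


if __name__ == "__main__":
    readings = [(0.2, "wheat"), (0.9, "rice"), (1.1, "wheat"), (3.5, "wheat")]
    for index, counts in enumerate(process_stream(readings, batch_interval=1.0)):
        print(index, counts)
```

In this simplified model a result for a record is available only after its interval closes, so the batch interval sets a floor on end-to-end latency.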