
An overview of data processing technologies and ecosystems that might be interesting for us.

Spark

Seems to be the current favorite; it is widely recommended over Hadoop.
Has a programming model for both streaming and batch (map-reduce style) processing.
Supports exploratory queries through Spark SQL. Designed to support ML algorithms.
Supported on Amazon out of the box (Elastic MapReduce)
Very strong community
No Ruby: the APIs are Scala and Java (plus Python), and there seems to be no plan for JRuby
Has very good Elasticsearch support
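
Spark covers both streaming and batch because its streaming API splits a stream into small micro-batches and runs batch-style operations on each one. The sketch below illustrates that idea in plain Python; it is not Spark code. The names `count_words`, `micro_batches` and `streaming_count` are hypothetical, and a real job would use Spark's own APIs.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def count_words(lines: Iterable[str]) -> Counter:
    """Batch job: count word occurrences in a bounded collection of lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Chop a (possibly unbounded) stream into consecutive fixed-size batches."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def streaming_count(stream: Iterable[str], batch_size: int) -> Iterator[Counter]:
    """Streaming job: reuse the batch function on every micro-batch and
    yield a snapshot of the running totals after each one."""
    totals: Counter = Counter()
    for batch in micro_batches(stream, batch_size):
        totals.update(count_words(batch))
        yield totals.copy()


if __name__ == "__main__":
    lines = ["a b", "b c", "c c", "a"]
    print(count_words(lines))
    for snapshot in streaming_count(lines, batch_size=2):
        print(snapshot)
```

With `batch_size=2` over the four sample lines, the last snapshot equals the batch result. The appeal is one piece of logic with two execution modes.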

Fluentd

Data collection framework designed for logs.
Can route data to several endpoints, HDFS being one of them
In-memory aggregations?
http://docs.fluentd.org/articles/cep-norikra
Complex event processing with Norikra (JRuby-based), including SQL queries over streams
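
A typical aggregation that a streaming SQL query in a CEP engine computes, such as "requests per path per minute", is a tumbling-window count. The plain-Python sketch below illustrates that technique only; it is not Fluentd or Norikra code. The function name and the `(timestamp, key)` event shape are assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (unix timestamp in whole seconds, key such as an HTTP path).
Event = Tuple[int, str]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Iterator[Tuple[int, Counter]]:
    """Count events per key in fixed, non-overlapping time windows.

    Only the current window is kept in memory. Assumes events arrive in
    timestamp order: a window is emitted as soon as an event from a later
    window shows up, and the last window is flushed when the input ends.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        # Align the event to the start of the window that contains it.
        start = ts - ts % window_seconds
        if current_start is not None and start != current_start:
            yield current_start, counts
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts


if __name__ == "__main__":
    access_log = [(0, "/a"), (10, "/b"), (59, "/a"), (60, "/a"), (125, "/b")]
    for window_start, per_path in tumbling_window_counts(access_log, window_seconds=60):
        print(window_start, dict(per_path))
```

Keeping only the open window in memory is what makes this workable on an unbounded stream. The trade-off is the in-order assumption: late events would need buffering or a grace period.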

Hadoop

No concept of streams: batch processing only
Old. Familiar. Mature.
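
Hadoop's batch model is MapReduce. A map step emits key/value pairs, the framework groups them by key (the shuffle), and a reduce step combines each group. The job runs over a bounded input, which is why there is no notion of a stream. The single-process Python sketch below only mimics that data flow with hypothetical function names; a real Hadoop job implements `Mapper` and `Reducer` classes in Java and runs distributed across a cluster.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over a bounded input."""
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(intermediate)
    # Keys are reduced in sorted order, as Hadoop sorts keys before reducing.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["a b", "b c", "c c", "a"]))
```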

Tutorials

https://www.youtube.com/watch?v=Txjp37mR7xw
A tutorial on big data processing on Google Cloud with Fluentd and Norikra. The relevant part starts at 1h 30min.
