An overview of data processing technologies and ecosystems that might be interesting for us.

Spark

  • Seems to be the current favorite. It is widely recommended over Hadoop.
  • Has a model for both streaming and batch (map-reduce) processing
  • Supports exploratory queries through Spark SQL. Designed to support ML algorithms.
  • Supported on Amazon out of the box (Elastic MapReduce)
  • Very strong community
  • No Ruby. The APIs are Scala, Java, and Python, and there doesn't seem to be a plan for JRuby
  • Has good support for Elasticsearch
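
To make the batch versus streaming distinction concrete, here is a minimal sketch in plain Python. It does not use Spark's API; the function names and the micro-batch size are assumptions made up for illustration. The batch version sees the whole input at once, while the streaming version consumes small batches and keeps running totals, which is loosely how micro-batch streaming works.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(counts, pair):
    """Reduce step: fold one (word, n) pair into the running totals."""
    word, n = pair
    counts[word] += n
    return counts


def batch_word_count(lines):
    """Batch: the whole input is available before processing starts."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce(reduce_phase, pairs, Counter())


def streaming_word_count(line_source, batch_size=2):
    """Streaming: consume lines in small batches, yielding running totals.

    batch_size is a hypothetical knob, not a Spark setting.
    """
    totals = Counter()
    batch = []
    for line in line_source:
        batch.append(line)
        if len(batch) == batch_size:
            totals.update(batch_word_count(batch))
            yield dict(totals)
            batch = []
    if batch:  # flush the final partial batch
        totals.update(batch_word_count(batch))
        yield dict(totals)


lines = ["spark and hadoop", "spark streaming", "hadoop batch"]
batch_result = dict(batch_word_count(lines))
stream_results = list(streaming_word_count(iter(lines)))
```

Once the stream is exhausted, the last running total equals the batch result. The difference is that the streaming version has intermediate answers available while data is still arriving.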

Fluentd

  • Data collection framework made for logs.
  • Can route data to several endpoints, one being HDFS
  • In-memory aggregations?
  • http://docs.fluentd.org/articles/cep-norikra
  • Complex event processing with JRuby, including SQL queries of streams
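
The idea of sending the same log data to several endpoints can be sketched in a few lines of plain Python. This is not Fluentd's configuration or plugin API; `MemorySink` and `fan_out` are hypothetical names, and the sinks only stand in for real endpoints such as HDFS or a search index.

```python
class MemorySink:
    """Hypothetical endpoint that just keeps the events it receives."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def write(self, event):
        self.events.append(event)


def fan_out(events, sinks):
    """Deliver a copy of each event to every sink; return the event count."""
    count = 0
    for event in events:
        for sink in sinks:
            # Copy so one sink cannot mutate what another sink stored.
            sink.write(dict(event))
        count += 1
    return count


hdfs_like = MemorySink("hdfs")        # stand-in for an HDFS endpoint
search_like = MemorySink("search")    # stand-in for a search index
events = [
    {"level": "info", "msg": "started"},
    {"level": "error", "msg": "boom"},
]
delivered = fan_out(events, [hdfs_like, search_like])
```

Each sink ends up with its own copy of every event, so long-term storage and search indexing can be fed from the same collection point.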

Hadoop

  • No concept of streams
  • Old. Familiar. Mature.

Tutorials
