Steps in a data engineering project

  1. Define which data sources are interesting, and decide what service your application can offer users with that data.
  2. Data ingestion: RabbitMQ and Kafka.
  3. Batch processing: the Hadoop framework, with Hive and Yelp’s MrJob used to write the aggregation functions.
  4. Real-time processing: Storm and Spark. (Native Storm processes records one at a time; Spark processes them in micro-batches.)
  5. Database: HBase.
  6. Front end: Flask.
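
The difference between record-by-record and micro-batch processing mentioned in step 4 can be illustrated with a minimal pure-Python sketch. It does not use Storm or Spark; the function names `record_by_record` and `micro_batch`, the `batch_size` parameter, and the sample events are all hypothetical. Batching by count is a simplification assumed here for brevity, since Spark's streaming batches are defined by a time interval.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def record_by_record(events: Iterable[str]) -> Iterator[Counter]:
    """Update the running counts after every single event (Storm-like)."""
    counts: Counter = Counter()
    for event in events:
        counts[event] += 1
        yield Counter(counts)  # emit a snapshot after each record


def micro_batch(events: Iterable[str], batch_size: int) -> Iterator[Counter]:
    """Buffer events and update the counts once per batch (Spark-like)."""
    counts: Counter = Counter()
    batch: List[str] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            counts.update(batch)
            yield Counter(counts)  # emit a snapshot after each full batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        counts.update(batch)
        yield Counter(counts)


# Hypothetical sample stream of event types.
events = ["click", "view", "click", "view", "click"]
per_record = list(record_by_record(events))
per_batch = list(micro_batch(events, batch_size=2))

print(len(per_record), len(per_batch))  # 5 snapshots vs 3 snapshots
print(per_record[-1] == per_batch[-1])  # True: the final counts agree
```

Both approaches arrive at the same final aggregate. The record-by-record version produces a fresh result after every event, which gives lower latency, while the micro-batch version produces fewer, coarser updates, which reduces per-record overhead.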