knowledgegigs: Big Data Project with Microservices (and Task) Description

28 June 2016

This description was collected and forwarded by Artech, a staffing firm.

Build the Event Hubs (EH) integration with the Service Fabric microservices implementation, streaming the processed files from blobs into EH for downstream processing.
1. Anonymized files (~1000 of them, ~GB in size) will be given as input.
2. The Service Fabric portion of the code will be provided.
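As a rough illustration of the blob-to-EH streaming step, the sketch below groups the records of a file into payloads that stay under a size cap before they are handed to a sender. It is only a sketch under stated assumptions: the function name split_into_events, the newline-delimited record format and the 200 KB cap are hypothetical and not part of the project description, and no Event Hubs or Service Fabric API is called.

```python
from typing import Iterable, Iterator, List

# Hypothetical cap on one event payload. The real limit depends on the
# Event Hubs tier and has to be checked against the service documentation.
MAX_EVENT_BYTES = 200 * 1024


def split_into_events(
    records: Iterable[bytes], max_bytes: int = MAX_EVENT_BYTES
) -> Iterator[bytes]:
    """Group records into payloads of at most max_bytes, preserving order."""
    batch: List[bytes] = []
    size = 0
    for record in records:
        if len(record) > max_bytes:
            # A single oversized record cannot be sent as one event.
            raise ValueError("single record exceeds the event size cap")
        if batch and size + len(record) > max_bytes:
            # Adding this record would overflow, so emit the current batch.
            yield b"".join(batch)
            batch, size = [], 0
        batch.append(record)
        size += len(record)
    if batch:
        # Emit whatever is left after the last record.
        yield b"".join(batch)


if __name__ == "__main__":
    sample = [b"a,1\n", b"b,2\n", b"c,3\n"]
    for payload in split_into_events(sample, max_bytes=8):
        print(payload)
```

Each yielded payload would then be passed to whatever sender the provided Service Fabric code exposes. Keeping the batching separate from the sending makes this step easy to test without a live Event Hub.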
Build the Spark processing that reads from Event Hubs; an implementation in either Python or Scala would suffice.
1. Look at the caching needs; leverage .cache to retain appropriate intermediate results in Spark executors so that repeated Spark ‘Actions’ can reuse them instead of recomputing them
Our team will evaluate a set of data stores that would serve as the landing spot after Spark, with Blobs being a required one. We will pick 1 or 2 from this list: SQL DW, Azure SQL DB, Cassandra and DocumentDB are the other candidate stores, and we will have code snippets and/or guidance.

Integration & Deployment

Integrate the items from above with completed items (Azure Data Factory with ARM provisioning, picking up from the ADF pipeline which lands files onto blobs)
Apply best practices for capacity planning and deployment of the end-to-end (E2E) pipeline
Integrate the deployment with the existing set of tools and processes.

Testing

Build a unit test framework that can test each building block in isolation (ADF → Blobs, Blobs → Service Fabric, Service Fabric → EH, EH → Spark, Spark → <Data Store>)
Build an E2E test environment with telemetry on latency and throughput, with percentiles. Leverage APM tools as appropriate.
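To make the telemetry requirement concrete, here is a minimal sketch of how latency percentiles and throughput could be summarized from samples collected during an E2E test run. The function names, the millisecond unit, the choice of p50/p95/p99 and the nearest-rank percentile method are assumptions made for illustration; an APM tool may compute percentiles differently.

```python
import math
from typing import Dict, List, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of an ascending, non-empty sequence (0 < p <= 100)."""
    if not sorted_values:
        raise ValueError("no samples")
    # Nearest-rank: the smallest value with at least p percent of samples at or below it.
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_ms: List[float], window_seconds: float) -> Dict[str, float]:
    """Summarize one measurement window: events per second plus latency percentiles."""
    ordered = sorted(latencies_ms)
    return {
        "throughput_per_s": len(ordered) / window_seconds,
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
    }


if __name__ == "__main__":
    # Five hypothetical end-to-end latencies observed in a 2-second window.
    print(summarize([12.0, 15.0, 11.0, 40.0, 13.0], window_seconds=2.0))
```

Reporting percentiles rather than only an average keeps slow outliers visible, such as the 40 ms sample in the usage example, which an average would partly hide.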
