Framework for data acquisition, ingestion. COE Setup. Domain Consulting / Use Case Identification

Self Service Data

Framework for Data Discovery and Self Service. Analytics Enablement

Model Development

Pre Built Models in Customer and Marketing Domain.


Advanced Visualizations, Pre Built Dashboards for IT Operations.

Data Ingestion Framework

When implementing any kind of data platform, the very first task is extracting data from a variety of sources. The starting point for enterprises scaling up their Big Data platform is to create large data lakes that house their internal and external data.

One of the key facets of delivering value through a data lake is ensuring you ingest data efficiently and build a sustainable, automated process that can ingest all types of data sources: EDW, RDBMS, flat files, REST APIs, and external data feeds.

To ensure that business analysts, data analysts, and data scientists can derive maximum benefit from data wrangling, it is critical that they start with a solid foundation of consistent data. Yet ingestion is often treated as an afterthought, and the complexity of moving data from source to store is often greatly underestimated. It becomes a time-consuming effort: we are seeing as much as 80% of the time spent on ingesting data from disparate data sources into a Big Data environment before any data science or analytics effort begins, and only 20% spent on advanced analytics. Our foundational services are aimed at scaling Big Data platforms to be enterprise-ready, with robust foundational capabilities that will accelerate business value realization.

The re:code approach aims to reverse this time allocation. Our data ingestion framework has reusable templates for varied file formats that can support dynamic and new data sources, along with templates for data transformation and aggregation. This helps in faster onboarding of use cases.
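As an illustration of what a template-driven approach could look like, here is a minimal Python sketch. It is not the re:code framework itself; the names SourceTemplate, PARSERS, register and ingest are hypothetical and chosen only for this example. It assumes that each source is described by a small template (a format name plus a column-rename mapping as a stand-in for transformation), and that new file formats are supported by registering one more parser.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical registry: format name -> function that parses raw text into records.
PARSERS: Dict[str, Callable[[str], List[Record]]] = {}


def register(fmt: str):
    """Decorator that adds a parser to the registry under the given format name."""
    def wrap(fn: Callable[[str], List[Record]]):
        PARSERS[fmt] = fn
        return fn
    return wrap


@register("csv")
def parse_csv(text: str) -> List[Record]:
    # The first line is treated as the header row.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


@register("jsonl")
def parse_jsonl(text: str) -> List[Record]:
    # One JSON object per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


@dataclass
class SourceTemplate:
    """Illustrative description of one source: its format and a simple transformation."""
    name: str
    fmt: str
    rename: Dict[str, str] = field(default_factory=dict)


def ingest(template: SourceTemplate, raw_text: str) -> List[Record]:
    """Parse raw_text with the parser for template.fmt, then apply the column renames."""
    records = PARSERS[template.fmt](raw_text)
    return [
        {template.rename.get(key, key): value for key, value in rec.items()}
        for rec in records
    ]


# Onboarding a new source means writing a template, not a new pipeline.
orders = SourceTemplate("orders", "csv", rename={"cust_id": "customer_id"})
rows = ingest(orders, "cust_id,amount\n42,9.50\n")
events = ingest(SourceTemplate("events", "jsonl"), '{"id": 1}\n{"id": 2}\n')
```

In this sketch, supporting a new file format is a matter of registering one more parser, and existing templates are left unchanged. Aggregation, scheduling and error handling are deliberately left out.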

When these challenges go unrecognized, the usual approach to data lake ingestion is to assign a data engineer or two to code up some pipelines using Sqoop, Flume or Kafka. And that’s fine until the sources start proliferating and the data starts changing. These changes, combined with an explosion of sources, mean that data engineers spend all of their time patching low-level code.

The data ingestion layer in the data lake must be highly available and flexible enough to process data from any current or future data source, of any pattern (structured or unstructured) and any frequency (batch or incremental, including real-time), without compromising performance. This calls for building flexible data ingestion frameworks to ensure the overall quality and success of the ingestion process.