Autogenic Data Platform

Introduction

Businesses have recognized the need to be data-obsessed to deliver unmatched customer experiences and product innovations at the velocity required by the modern marketplace.

The goal is to deliver transformational change and lasting capabilities to power the data innovation flywheel for generations. At the heart of this new data ecosystem is a masterfully crafted data platform with best-in-class experiences that unburden value creators while eliminating long-standing barriers that have hamstrung data organizations.

Leveraging innovative architecture, the platform needs to create easy-to-use experiences for a diverse data community that includes technical and non-technical users. Lowered skill barriers are required to instantly broaden the value-creator tent, bringing a new pool of talent into the innovation ecosystem. New self-service capabilities are also needed to allow talent to focus on high-value work, producing rapid productivity gains while automation significantly reduces time to value. Value creators will then have direct and simplified access to locate, understand, move, and refine data, enabling value creation at a velocity never experienced before.

The foundational building blocks of a data-driven organization that can deliver products at an ever-increasing velocity include:

  1. Data – a wide variety of internal and external data
  2. Skills – a data community with diverse skills and experiences
  3. Data Platform – a hyper-automated, self-sustaining and integrated ecosystem
  4. Agile processes – that promote the rapid delivery of business value

This article dives into a new way of approaching the creation of the data platform that services modern enterprise data needs.


Data Platform Fundamentals

The foundational idea is this: the data platform is a set of capabilities and services; the product is data; the direct users are the data journey teams; and the features are what the journey teams require to exploit the data and create business value.

Legacy approaches to solving the platform problem include warehouses, lakes, lakehouses and, more recently, meshes. While these approaches have made incremental improvements, significant challenges related to velocity, complexity, and usability continue to bedevil enterprises.

This is where the Autogenic Data Platform comes in.


So, what exactly is the Autogenic Data Platform (ADP)?


The ADP is a context-oriented, self-creating, and self-managing data ecosystem. It is the much-needed innovation that solves the significant challenges associated with legacy architectures by supporting multi-business, multi-source, multi-format data for rapid exploitation by disparate data journey teams.


Who are the users of the ADP?


The users of the ADP are data athletes. They can be technical users like business analysts, data scientists, data engineers, data stewards, and data analysts, or non-technical users like business managers and executives.


What does a user journey look like in the ADP?

In an ADP-based ecosystem, users are responsible only for the last mile of the data journey: the refinement of data for exploitation. All the complex, tedious, time-consuming, and costly process components are dynamically managed by the ADP's hyper-automation.


What does ADP look like conceptually?


The ADP’s feature set includes:

1. Source Discovery – flexible source-intake services supporting scanning, discovery, and acquisition of metadata from traditional source applications and from novel sources of data in the future. Examples include application databases, file systems, SaaS applications, and APIs.

2. Comprehension – services that make finding and understanding data as simple as surfing popular e-commerce websites. Built-in support for both automated and manual curation of metadata further enriches the experience to the delight of users. In large enterprises, centralizing tribal source-application knowledge is key to reducing time-to-value for journey teams.

3. Order Fulfillment – simple experiences mirroring the popular e-commerce checkout process are presented to journey teams interested in onboarding new data onto the ADP. The complexities of source connectivity, transfer methods, processing, encryption, storage, and organization are all abstracted away from the user, who is presented with easy-to-follow, configurable options to customize their orders. Approval workflows are built into order fulfillment for governance and security.

4. Unassisted Processing – hidden inside the order-fulfillment service for data onboarding are both the highest-value features and the toughest challenges for the ADP: capabilities that free journey teams from the tedious, costly, and slow processes involved in sourcing, shipping across platforms, organizing, cleansing, automating, and disseminating data.

5. Access Provisioning – requesting data access is a simple exercise mirroring the common checkout processes used in online shopping. Abstracted away from the user is a robust governance protocol that rapidly provisions access to data in a secure manner.

6. Refinement – journey teams are provided a bring-your-own-code environment for the refinement and mining of data. Tedious and time-consuming data engineering tasks have been eliminated to allow scarce, expensive, and talented resources to focus on high-value development, while the ADP's backend engineering dynamically creates the storage and processing components with built-in audit, balance, and control.

7. Interoperability – a multitude of services to seamlessly operate with other platforms and applications, including business intelligence, machine learning, chatbots, SaaS applications, and API services.
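To make the order-fulfillment idea above more concrete, here is a minimal sketch of how a data "order" might be modeled, with user-configurable options and a built-in governance approval gate. All class, field, and value names below are hypothetical illustrations, not the ADP's actual design.

```python
from dataclasses import dataclass

# Hypothetical sketch of a data "order" as an order-fulfillment service
# might capture it; all field names and options are illustrative only.

SUPPORTED_TRANSFER_MODES = {"batch", "near_real_time"}

@dataclass
class DataOrder:
    source: str                    # a registered source system (assumed name)
    datasets: list                 # tables or files requested
    transfer_mode: str = "batch"   # user-configurable option
    encrypt_at_rest: bool = True
    approved: bool = False         # flipped by the governance workflow
    approver: str = ""

    def validate(self) -> None:
        # Guardrails the platform would enforce before fulfillment begins.
        if self.transfer_mode not in SUPPORTED_TRANSFER_MODES:
            raise ValueError(f"unsupported transfer mode: {self.transfer_mode}")
        if not self.datasets:
            raise ValueError("an order must request at least one dataset")

    def approve(self, approver: str) -> None:
        """Approval gate: governance signs off before pipelines are built."""
        self.validate()
        self.approved = True
        self.approver = approver

order = DataOrder(source="crm_db", datasets=["customers", "orders"])
order.approve("data-steward")
print(order.approved)  # True
```

The point of the sketch is the separation of concerns: the user only fills in a small, validated spec, while connectivity, encryption, and pipeline creation stay behind the abstraction.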


What’s behind the magic?

The ADP requires a series of engineering innovations, each focused on bringing value to distinct stages of the data lifecycle. This includes:

1. Ingestion – an extensible ecosystem of pre-built patterns and dynamically created pipelines and infrastructure to transfer data from a variety of sources such as filesystems, databases, SaaS applications, SFTP servers, and APIs, in both batch and real time.

2. Discovery – provide an automated scanning and cataloging service for a variety of source application types to make searching and understanding data a simple browser-based experience.

3. Order fulfillment – marketplace-based order-fulfillment services place control of data onboarding directly in the hands of the journey teams.

4. Autogenic backend – sophisticated, self-organizing ecosystem for organizing and storing data with dynamic-code engineering that auto-creates infrastructure and pipelines necessary to move and organize data within the ADP.

5. Change data – dynamically generated change-data detection for time-series data management.

6. Multi-mode – supports unassisted and assisted pipelines.

7. Multi-speed – supports both batch and near real-time processing.

8. Trust and Governance – centralized governance, coupled with stringent data quality processes for robust, fine-grained security and trusted data.
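As a rough illustration of the change-data capability above, the following is a minimal hash-based change-detection sketch. This is one possible approach, not the ADP's actual engineering: each keyed row is hashed, and comparing the current snapshot's hashes against the previous snapshot's classifies every key as an insert, update, or delete.

```python
import hashlib

# Minimal hash-based change-data-detection sketch (illustrative only):
# compare a new snapshot of keyed rows against the previous one and
# classify each key as an insert, update, or delete.

def row_hash(row: dict) -> str:
    # Stable hash of the row's attributes, in sorted-key order.
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode()).hexdigest()

def detect_changes(previous: dict, current: dict) -> dict:
    """Both arguments map primary key -> row dict."""
    prev_hashes = {k: row_hash(v) for k, v in previous.items()}
    curr_hashes = {k: row_hash(v) for k, v in current.items()}
    inserts = [k for k in curr_hashes if k not in prev_hashes]
    deletes = [k for k in prev_hashes if k not in curr_hashes]
    updates = [k for k in curr_hashes
               if k in prev_hashes and curr_hashes[k] != prev_hashes[k]]
    return {"insert": inserts, "update": updates, "delete": deletes}

prev = {1: {"name": "Ada", "city": "NYC"}, 2: {"name": "Grace", "city": "DC"}}
curr = {1: {"name": "Ada", "city": "SF"}, 3: {"name": "Alan", "city": "LDN"}}
print(detect_changes(prev, curr))
# {'insert': [3], 'update': [1], 'delete': [2]}
```

In a real platform this comparison would be generated per dataset and run inside the unassisted pipelines, feeding the time-series organization of data downstream.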


How does the journey team benefit from ADP?

The true north star is to make the user experience delightful through a deep understanding of users' needs and friction points. Journey team benefits include:

  1. Velocity – faster innovation cycles for both technical and non-technical users.
  2. Ease of use – lower barriers to entry with everyday interactions supported via browser-based experiences.
  3. Ease of access – democratized access to data with increased autonomy and self-service experiences covering data access, data acquisition, movement, and organization.
  4. Efficiency – reduced development time with creative engineering abstracting and eliminating manual, tedious and time-consuming tasks.
  5. Costs – reduced tax burden across the entire data and analytics lifecycle with intelligent automation and backend engineering.
  6. Flexibility – open system with choice of connectors to sources, speed, formats, and processing environments.
  7. Unified data – integrated data environment eliminates data silos for multi-source data exploitation.
  8. Safety – secure environment with built-in security and compliance guardrails.
  9. Reliability – continuous availability of data via robust business continuity support.

What about business value?

Below are a few quantitative and qualitative benefits:

[Image summarizing the quantitative and qualitative business benefits]