What is a Data Mesh?

July 1, 2021 | Data Quality


Data Mesh Defined

In the era of data as a first-class citizen, every organization strives to be data-driven, making huge investments in data infrastructure and enablement. However, the traditional data warehouse or data lake, with its limited real-time streaming capabilities, is no match for ever-increasing data demands.

The need for democratization and scalability in data pipelines exposes the faults of legacy systems and conflicting business priorities. Fortunately, a new enterprise data architecture has emerged that gives a new lease of life to heavy, fragile data pipelines. Data Mesh offers a way of seeing data not as a by-product but as a set of decentralized, self-contained data products.

Data mesh is an architectural paradigm that unlocks analytical data at scale, rapidly opening access to an ever-growing number of distributed domain data sets for a proliferation of consumption scenarios such as machine learning, analytics, or data-intensive applications across the organization. It addresses the standard failure modes of the traditional centralized data lake or data platform architecture, shifting away from the centralized paradigm of the lake, or its predecessor, the data warehouse. Data mesh instead draws from modern distributed architecture:

  • considering domains as the first-class concern,
  • applying platform thinking to create self-serve data infrastructure,
  • treating data as a product, and
  • implementing open standardization to enable an ecosystem of interoperable distributed data products.

Data Mesh adoption requires a very high level of automation in infrastructure provisioning, realizing a true self-service infrastructure. Every Data Product team should be able to provision what it needs autonomously. However, even though teams are independent in their technology choices and provisioning, they should not develop their products with unrestricted access to the full range of technologies the landscape provides.

A critical point that makes a data mesh platform successful is federated computational governance, which provides interoperability through global standardization. Federated computational governance is a group of data product owners with the challenging task of defining rules and simplifying conformity to them. What the group decides should follow DevOps and Infrastructure-as-Code practices.
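To make the "computational" part of governance concrete, here is a minimal sketch of a policy check expressed as code. All names (the metadata fields, the approved formats) are illustrative assumptions, not a standard API; the point is only that globally agreed rules can be automated and run in a pipeline like any other code.

```python
from dataclasses import dataclass

# Hypothetical data product metadata, for illustration only.
@dataclass
class DataProductMeta:
    name: str
    owner: str
    output_format: str
    pii_tagged: bool

# Global standards agreed by the federated governance group (example values).
APPROVED_FORMATS = {"avro", "parquet"}

def check_compliance(meta):
    """Return a list of policy violations; an empty list means compliant."""
    violations = []
    if not meta.owner:
        violations.append("every data product must declare an owner")
    if meta.output_format not in APPROVED_FORMATS:
        violations.append("output format '%s' is not an agreed standard" % meta.output_format)
    if not meta.pii_tagged:
        violations.append("PII classification is missing")
    return violations
```

A check like this can run automatically on every deployment, so conformity is verified by the platform rather than by manual review.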

Each Data Product exposes its capabilities through a catalog by defining its input and output ports. A Data Mesh platform should nonetheless provide scaffolding to implement such input and output ports, choosing technology-agnostic standards wherever possible; this includes setting standards for analytical as well as event-based access to data. Keep in mind that the platform should promote and ease the internally agreed standards but never lock product teams into technology cages. Federated computational governance should also be open to change, letting the platform evolve with its users. Unlike a centralized data warehouse, data mesh solves these challenges:

  • Unclear ownership of data
  • Poor data quality, by ensuring the teams handling the data actually know it
  • Scaling of a business or organization, by preventing the central team from becoming a bottleneck.
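The catalog and port model described above can be sketched in a few lines. This is a simplified illustration under assumed names (`Port`, `DataProduct`, `Catalog` are not a real library); it shows only the idea that products advertise technology-agnostic input and output ports for discovery.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Port:
    name: str
    protocol: str  # technology-agnostic label, e.g. "sql", "events", "files"

@dataclass
class DataProduct:
    domain: str
    name: str
    input_ports: List[Port] = field(default_factory=list)
    output_ports: List[Port] = field(default_factory=list)

class Catalog:
    """Where data products advertise their ports so consumers can discover them."""
    def __init__(self):
        self._products: Dict[Tuple[str, str], DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        self._products[(product.domain, product.name)] = product

    def find(self, domain: str, name: str) -> Optional[DataProduct]:
        return self._products.get((domain, name))
```

A consumer then discovers a product by domain and name, and reads its output ports to learn how to connect, without the producer and consumer sharing any technology choice.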

The goal of data mesh is to turn organizational data into a product. This way, every source of the organization’s data becomes autonomous, is assigned an owner, and becomes a building block of the mesh.

Data infrastructure is the other building block of a data mesh. It entails providing access control to data, storage, pipelines, and a data catalog. The main goal of the data infrastructure is to avert any duplication of data in the organization. Every data product team focuses on building its own data products faster and independently. This way, the data infrastructure platform remains compatible with different data domain types.
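One way the platform can avert duplication is to make provisioning idempotent: a team asking twice for the same data product's storage gets the same location back. The sketch below is a hypothetical self-serve API (the class and the `mesh://` location scheme are invented for illustration), not a real platform interface.

```python
class SelfServePlatform:
    """Minimal sketch of a self-serve infrastructure platform (hypothetical API)."""

    def __init__(self):
        self._storage = {}

    def provision_storage(self, domain, product):
        key = (domain, product)
        # Idempotent: a second request returns the existing location,
        # so the same data set is never duplicated across the mesh.
        if key not in self._storage:
            self._storage[key] = "mesh://{}/{}".format(domain, product)
        return self._storage[key]
```

Because the platform, not each team, owns this logic, every domain gets the same domain-agnostic provisioning behavior for free.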

What are the advantages of using data mesh?

  • Allowing greater autonomy and flexibility for data owners, facilitating greater data experimentation and innovation while lessening the burden on data teams to field the needs of every data consumer through a single pipeline.
  • Data meshes’ self-serve infrastructure-as-a-platform provides data teams with a universal, domain-agnostic, and often automated approach to data standardization, data product lineage, data product monitoring, alerting, logging, and data product quality metrics (in other words, data collection and sharing).
  • Provides a competitive edge compared to traditional data architectures, which are often hamstrung by the lack of data standardization between data producers and consumers.

Data engineers of a data mesh node remain single-domain experts, but for the entire node, not just for the source system. This minimizes communication problems and misinterpretations of data. Furthermore, a data mesh node’s interface must support any form of data usage, from straightforward requests to advanced forms of analytics. Data virtualization technology has been developed to build interfaces on almost any kind of system. It makes it possible to access data through multiple interfaces, including record-oriented and set-oriented interfaces.
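The distinction between record-oriented and set-oriented access can be shown over one small data set. Both functions below read the same underlying data; the names and the sample records are invented for illustration, standing in for interfaces a virtualization layer would generate.

```python
# The same underlying data set, exposed through two interface styles.
ORDERS = [
    {"id": 1, "amount": 250.0},
    {"id": 2, "amount": 99.5},
    {"id": 3, "amount": 410.0},
]

def get_order(order_id):
    """Record-oriented interface: look up a single record by key."""
    return next((o for o in ORDERS if o["id"] == order_id), None)

def orders_over(min_amount):
    """Set-oriented interface: a declarative filter over the whole set."""
    return [o for o in ORDERS if o["amount"] >= min_amount]
```

An operational application would typically call the record-oriented interface, while an analytics workload would use the set-oriented one; the consumer never needs to know how the underlying system stores the data.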

Additionally, data virtualization helps implement data security and privacy rules and provides users with metadata. Several technologies are available for developing those interfaces. What data virtualization brings to the table is that it already supports most of the required technology, enabling the rapid development of data meshes. If you want to know more, reach us at dqlabs.ai, and we’ll be glad to answer all your queries.