What is data mesh?
Data mesh is an enterprise data management framework that defines how to manage business-domain-specific data so that business domains can own and operate their data. It empowers domain-specific data producers and consumers to collect, store, and analyze data and to manage their own data pipelines without relying on an intermediary data-management team.
Data mesh has its origins in distributed computing, where software components are shared among multiple computers running together as a system. With data mesh, the ownership of data is distributed across different business domains, and each domain is responsible for creating its data products. Data mesh also makes it easier to put data in context and generate deeper insights, and it encourages domain owners to collaborate on solutions tailored to specific business needs.
How is data mesh defined?
Data mesh is an architectural approach to designing data platforms that implements decentralized, distributed data analytics and data sharing.
How does data mesh work?
In a data mesh architecture, data is stored across multiple sources, and a data formation service makes the data products available as permissioned tables. The data owner may also create and expose APIs that other users can consume. Data mesh also has a data catalog that stores metadata, such as table names, columns, and user-defined tags.
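As a rough illustration of the catalog and permissioned tables described above, the following Python sketch models a catalog entry that holds a table name, its columns, and user-defined tags, along with a simple read-permission check. It is a minimal in-memory sketch, not the API of any particular product: the names CatalogEntry, DataCatalog, allowed_roles, and the sample table are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata for one data product table (hypothetical shape)."""
    table_name: str
    owner_domain: str
    columns: list[str]
    tags: set[str] = field(default_factory=set)           # user-defined tags
    allowed_roles: set[str] = field(default_factory=set)  # who may read the table


class DataCatalog:
    """In-memory stand-in for a data catalog (assumption, for illustration)."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.table_name] = entry

    def search_by_tag(self, tag: str) -> list[str]:
        # Return the names of all tables carrying the given tag, sorted.
        return sorted(e.table_name for e in self._entries.values() if tag in e.tags)

    def can_read(self, table_name: str, role: str) -> bool:
        # A table is "permissioned": only roles listed on its entry may read it.
        entry = self._entries.get(table_name)
        return entry is not None and role in entry.allowed_roles


catalog = DataCatalog()
catalog.register(CatalogEntry(
    table_name="sales.orders",
    owner_domain="sales",
    columns=["order_id", "customer_id", "amount"],
    tags={"pii", "revenue"},
    allowed_roles={"finance_analyst"},
))
```

Here a consumer could call catalog.search_by_tag("revenue") to discover the table and catalog.can_read("sales.orders", "finance_analyst") to check access before querying it.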
What are the data mesh principles?
Data mesh rests on four principles: decentralization via domain ownership, data as a product, self-serve data infrastructure, and federated computational governance. Together, these principles describe what a data mesh is, and they are what allows growing companies to get value from their data and agility from a modern architecture.
Data mesh principle #1: domain ownership
This describes the decentralization of data ownership, i.e., responsibility for the data, to the business domains that are closest to it. Essentially, business domains own their data rather than a centralized IT function. However, IT may still play a role in helping business domains harness and extract value from their data. Domain ownership is critical for companies to scale and to avoid the bottlenecks that a centralized data flow structure creates.
Data mesh principle #2: data as a product
With a decentralized domain-owned (or domain-oriented) structure, each domain treats the data it shares as a product whose customers are the other users and consumers interested in that data. Examples of data as a product may include a data set for analytics or data for a delivered service. Domain owners may share data as they see fit to produce a desired business outcome. At a minimum, data as a product should be discoverable, addressable, understandable, trustworthy, truthful, and secure.
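The minimum characteristics listed above can be turned into a simple checklist. The Python sketch below is one hypothetical way to do that: the DataProduct fields, the mesh:// address scheme, and the way each characteristic is mapped to a field are illustrative assumptions, not a standard.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataProduct:
    """Hypothetical descriptor for a domain's data product."""
    name: str
    address: str           # stable location, e.g. a URI (addressable)
    description: str       # human-readable documentation (understandable)
    in_catalog: bool       # registered in the data catalog (discoverable)
    quality_checked: bool  # passed the domain's quality checks (trustworthy, truthful)
    access_policy: str     # name of the policy guarding access (secure)


def missing_characteristics(product: DataProduct) -> list[str]:
    """Return the characteristics the product does not yet satisfy."""
    checks = {
        "discoverable": product.in_catalog,
        "addressable": bool(product.address),
        "understandable": bool(product.description),
        "trustworthy and truthful": product.quality_checked,
        "secure": bool(product.access_policy),
    }
    return [name for name, ok in checks.items() if not ok]


orders = DataProduct(
    name="orders",
    address="mesh://sales/orders",  # made-up address scheme
    description="",                 # documentation not written yet
    in_catalog=True,
    quality_checked=True,
    access_policy="role:finance_analyst",
)
print(missing_characteristics(orders))  # ['understandable']
```

A domain team could run a check like this before publishing, so that a product without documentation or an access policy is flagged early.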
Data mesh principle #3: self-serve data platform
To deliver data as a product and share it with others, business domains must be empowered to do so. The goal of self-service is to remove friction from the end-to-end data journey, from source to consumption. Business domains or individual data owners are then in a position to develop and enhance the data and to define the terms under which it is shared. Platform infrastructure capabilities and automated governance policies make self-service possible.
Data mesh principle #4: federated computational governance
This is a broad principle that spells out the data governance operating model, which is based on federated decision-making and accountability and covers security, legal, and compliance policies, among others. Motivations for this principle include the desire to attain higher-order value from aggregated data and to counter potential undesirable consequences of a domain-oriented, decentralized infrastructure.
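The "computational" part of this principle is often understood as expressing policies in code so that they can be applied automatically to every domain's data products. The Python sketch below illustrates that idea under stated assumptions: the policy names, the metadata keys (owner_domain, contains_pii, encrypted, description), and the violations helper are all hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional

# A policy takes a data product's metadata and returns True when it complies.
Policy = Callable[[dict], bool]

# Global policies agreed on by the federated governance group (hypothetical).
GLOBAL_POLICIES: dict[str, Policy] = {
    "has_owner": lambda p: bool(p.get("owner_domain")),
    "pii_is_encrypted": lambda p: bool(not p.get("contains_pii") or p.get("encrypted")),
}


def violations(product: dict, domain_policies: Optional[dict[str, Policy]] = None) -> list[str]:
    """Return the names of failed policies: global ones first, then domain-specific ones."""
    failed = [name for name, check in GLOBAL_POLICIES.items() if not check(product)]
    failed += [name for name, check in (domain_policies or {}).items() if not check(product)]
    return failed


orders = {"owner_domain": "sales", "contains_pii": True, "encrypted": False}

# A domain may add its own rules on top of the global ones, but cannot remove them.
sales_policies = {"has_description": lambda p: bool(p.get("description"))}

print(violations(orders))                  # ['pii_is_encrypted']
print(violations(orders, sales_policies))  # ['pii_is_encrypted', 'has_description']
```

Keeping the global policies in one shared mapping while letting each domain pass its own additions mirrors the federated model: common rules are decided together, and local rules stay with the domain.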
What are the benefits of data mesh?
- Decentralizing data ownership and data operations to accelerate the agility of business domains to make relevant decisions
- Providing domain teams with the independence to choose the data technology stack that best meets their needs
- Delivering transparency across cross-functional teams by reducing the likelihood of isolated data teams
- Facilitating data sovereignty and data residency to ensure alignment with data governance regulations
Frequently asked data mesh questions
1. How does data mesh address challenges related to data quality, consistency, and standardization across decentralized domains?
In the context of data mesh, ensuring data quality, consistency, and standardization across decentralized domains involves the implementation of robust data governance practices. This includes defining clear metadata standards, data validation processes, and collaborative efforts among domain owners to establish and adhere to common data quality metrics. While data ownership is distributed, collaborative frameworks and automated tools can be employed to enforce standardized data practices, ensuring that data remains accurate, trustworthy, and aligned with organizational standards.
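As one example of a common data quality metric that domains could agree on, the Python sketch below computes completeness: the share of rows in which every required field is present and non-empty. The function name, the sample rows, and the 0.95 threshold are illustrative assumptions rather than an established standard.

```python
from __future__ import annotations


def completeness(rows: list[dict], required: list[str]) -> float:
    """Share of rows in which every required field is present and non-empty."""
    if not rows:
        return 0.0
    complete_rows = sum(
        all(row.get(name) not in (None, "") for name in required)
        for row in rows
    )
    return complete_rows / len(rows)


# Sample records from a hypothetical "orders" data product.
rows = [
    {"order_id": 1, "customer_id": "c1", "amount": 10.0},
    {"order_id": 2, "customer_id": "", "amount": 5.0},  # missing customer
    {"order_id": 3, "customer_id": "c3", "amount": 7.5},
    {"order_id": 4, "customer_id": "c4", "amount": 2.0},
]

THRESHOLD = 0.95  # hypothetical organization-wide minimum
score = completeness(rows, ["order_id", "customer_id", "amount"])
print(score, score >= THRESHOLD)  # 0.75 False
```

Because every domain would compute the metric the same way, scores stay comparable across data products even though each domain owns and runs its own checks.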
2. What specific tools or technologies complement the implementation of a data mesh architecture?
The practical implementation of a data mesh architecture often involves a combination of various tools and technologies to support different principles. For domain ownership, tools that enable efficient data cataloging, metadata management, and access control are crucial. Self-serve data platforms can leverage data integration tools, cloud services, and automation solutions to empower business domains. Federated computational governance may involve the use of policy management tools, blockchain for accountability, and frameworks for legal and compliance adherence. The specific tooling may vary based on organizational requirements, technology stacks, and the nature of data products within each domain.
3. Are there any notable challenges or potential drawbacks associated with adopting a data mesh approach?
Challenges can arise in the transition to a decentralized model. The most significant include managing cultural shifts, ensuring consistent adoption of data standards across domains, and addressing data security concerns. Additionally, organizations may face complexities in aligning federated decision-making processes, navigating legal and compliance requirements, and establishing effective communication channels among decentralized domain teams. Organizations considering data mesh should conduct thorough assessments, invest in change management, and anticipate and address challenges throughout the implementation process.