A data mesh flips the script on centralized, monolithic data architecture by decentralizing data management to the various business domains of the enterprise.
Using data is hard, as the Annual Survey of Business Leaders on Big Data and AI and similar studies clearly illustrate. Companies know this and have spent the last three decades trying to make it easier, eagerly gravitating towards the next “platform of the day” that promises greater access to data and analytical insights. First came enterprise data warehouses (EDWs), then the various cloud-built warehouses and lakes, and now data mesh is all the rage.
Time and time again, these approaches have produced different schools of thought, each with its own thought leaders and industry advocates, on how companies should manage and run their data. Separating the hype from what should be embraced can be overwhelming. And we can expect to keep seeing more solutions emerge until organizations tackle a fundamental and overlooked challenge within their data stacks: data usability.
The centralized school of thought
The traditional EDW introduced the idea of integrating structured data in one place, making it easier to access for Business Intelligence (BI) reports. The data was highly curated, meaning that organizations populated their EDWs only with data deemed necessary for specific BI reports. While this saved resources and costs, it also meant leaving out other valuable, related data that could have provided deeper insights.
Seeking to aggregate and leverage even more of their data, companies then moved the EDW concept to the cloud. In large part, these companies saw and wanted to emulate the successes of the world’s digital-native FAANG companies, which outpaced the competition by using comprehensive cloud data to guide business decisions and hyper-personalize products and services for customers. But cloud-based EDWs were still limited to structured data, leaving out the vast wealth of unstructured data in the modern enterprise. As a result, most organizations ended up replicating their existing on-premises BI reports rather than achieving anything transformational.
Around 2010, data lakes emerged as a promising solution: organizations consolidated all of their raw, unstructured, semi-structured and structured data in one central location, usable for analytics, predictive modeling, machine learning, and more. However, data lakes have also been nicknamed “data swamps,” because poor configuration, governance and management often turn them into expensive dumping grounds for all data. The data ends up far from usable, creating mistrust in its quality and in the resulting insights or solutions.
Anyone who has experienced limitations in BI reporting or a data swamp will not be surprised to learn that, in a TDWI research study of 244 enterprises using a cloud data warehouse or lake, 76% experienced most, if not all, of the same challenges they had in their on-premises environments.
Decentralize with a data mesh
Originally proposed by Zhamak Dehghani of ThoughtWorks, the data mesh flips the script on centralized, monolithic data architecture by decentralizing data management to the various business domains of the enterprise. The goal of a data mesh is for each business domain to treat its data as a product that it can transform, use, and make available to users in other domains.
The idea is that domain experts know best whether their information is up-to-date, accurate and reliable, and can therefore better deliver the right data at the right time. In a fully centralized approach, they would depend on data teams, who are often short on resources and have to juggle many competing demands from other business units, which can lead to delays. With a data mesh, however, there is no longer a need to query data from a huge data lake, so users can act on the data closer to its source, shortening the time to insight and value. What weaves the mesh together is federated governance: essential organization-wide standards, rules, and regulations that ensure interoperability between domain units and their data products.
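To make the governance idea concrete, here is a minimal sketch of how an organization-wide standard might be checked against each domain's data product. The required fields, the 24-hour freshness policy, and the `contract_violations` helper are all illustrative assumptions, not part of any data mesh specification.

```python
# Hypothetical organization-wide standard every data product must meet.
REQUIRED_FIELDS = {"name", "owner_domain", "schema", "freshness_hours"}
MAX_FRESHNESS_HOURS = 24  # assumed policy, chosen only for illustration


def contract_violations(product):
    """Return a list of readable violations; an empty list means compliant."""
    problems = []
    # Report every required descriptor the domain forgot to publish.
    for field in sorted(REQUIRED_FIELDS - set(product)):
        problems.append(f"missing field: {field}")
    # Enforce the shared freshness rule when a value is declared.
    freshness = product.get("freshness_hours")
    if freshness is not None and freshness > MAX_FRESHNESS_HOURS:
        problems.append(
            f"freshness {freshness}h exceeds {MAX_FRESHNESS_HOURS}h"
        )
    return problems


# Two hypothetical data products owned by different domains.
orders = {
    "name": "orders",
    "owner_domain": "sales",
    "schema": {"order_id": "string", "total": "decimal"},
    "freshness_hours": 6,
}
returns = {"name": "returns", "owner_domain": "logistics", "freshness_hours": 48}

print(contract_violations(orders))
print(contract_violations(returns))
```

Each domain keeps ownership of its product, while the shared check is the only piece that is defined centrally; that split is the point of federated governance.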
It is important to note that data mesh is not a single out-of-the-box solution, but rather an organizational approach that can span multiple technologies and can even include a data lake. Since the approach is radically different from what organizations are used to, change management is necessary, including getting buy-in from domain experts who are used to consuming reports rather than doing the data engineering and data science work themselves. Building up data skills within the domain units will therefore be necessary for this decentralized model to succeed.
Data usability is still a predominant issue
While data mesh may look fundamentally different from the cloud data warehouses and lakes that have long dominated the industry, all of these approaches share challenges that underscore the need for data usability.
The fundamental problem is that data in its raw form is unusable. You have vast chunks of data filled with errors, duplicate information, inconsistencies, and varying formats, all floating in isolation across disparate systems. With cloud data warehouses and lakes, these bits are usually just moved from their on-premises environments to the cloud along with their existing issues, warts and all. The data is still isolated and siloed, except now it’s all in one place. This is why people end up encountering the same challenges in the cloud that they had on-premises. These floating bits must eventually be ingested, integrated, and enriched to become usable.
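As a rough illustration of what making raw data usable involves, the sketch below normalizes formats, drops unusable rows, and removes duplicates. The sample records, the two assumed date formats, and the `clean` helper are hypothetical and stand in for whatever a real pipeline would handle.

```python
from datetime import datetime

# Hypothetical raw records, as they might arrive from two source systems.
RAW_RECORDS = [
    {"email": " Ana@Example.com ", "signup": "2021-03-04"},
    {"email": "ana@example.com", "signup": "04/03/2021"},
    {"email": "bo@example.com", "signup": "2022-11-30"},
    {"email": "", "signup": "2022-01-01"},
]

# Assumed formats: ISO dates and day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each known format; return an ISO date string, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(records):
    """Normalize formats, drop unusable rows, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        email = rec["email"].strip().lower()
        signup = parse_date(rec["signup"])
        if not email or signup is None:
            continue  # error or gap: the row cannot be trusted
        key = (email, signup)
        if key in seen:
            continue  # same fact already recorded in another format
        seen.add(key)
        cleaned.append({"email": email, "signup": signup})
    return cleaned


for row in clean(RAW_RECORDS):
    print(row)
```

The first two rows describe the same customer in different formats, which is exactly the kind of duplication that survives a simple lift-and-shift to the cloud.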
The same transformation needs to happen in a data mesh; the difference is that, rather than central data teams doing the work, each business domain becomes responsible for its own data. The decentralized nature of a data mesh can also introduce new complexities. For example, it can cause business domains to duplicate effort and resources on the same datasets. Additionally, data products from one business domain can, and often do, benefit other domains. Thus, beyond discovering relationships between datasets, users also need to reconcile data product entities across domains, for example when assembling data from different systems to form a complete picture of a customer.
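The customer example can be sketched in a few lines. This is a simplified illustration that assumes a normalized email address is a good enough match key; real entity reconciliation usually needs fuzzier matching. The domain names, fields, and the `unified_customers` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical data products published by two different domains.
SALES = [
    {"customer_id": "S-1", "email": "Ana@Example.com", "lifetime_value": 1200},
    {"customer_id": "S-2", "email": "bo@example.com", "lifetime_value": 300},
]
SUPPORT = [
    {"ticket_user": "U-77", "email": "ana@example.com ", "open_tickets": 2},
]


def match_key(record):
    """Assumed matching rule: the normalized email address."""
    return record["email"].strip().lower()


def unified_customers(**domains):
    """Merge records from each domain under a shared match key."""
    merged = defaultdict(dict)
    for domain, records in domains.items():
        for rec in records:
            key = match_key(rec)
            for field, value in rec.items():
                if field != "email":
                    # Prefix with the domain so provenance stays visible.
                    merged[key][f"{domain}.{field}"] = value
    return dict(merged)


customers = unified_customers(sales=SALES, support=SUPPORT)
for email, profile in customers.items():
    print(email, profile)
```

Keeping the domain prefix on every field is a deliberate choice here: the combined view stays traceable to the domain that owns each value.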
We mentioned the need to upskill business users within a data mesh. A move towards more citizen data scientists may be needed even among companies that do not adopt data mesh, simply because of the widespread shortage of data scientists, with the latest estimates indicating a gap of 250,000 between job openings and job searches. The shortage of talent, coupled with the proliferating amount of data in modern enterprises, has left few organizations able to use their data effectively at scale.
Establish a data usability layer
Whether your organization chooses a centralized or a decentralized approach to managing enterprise data, ultimately you need a way to connect, integrate, and make sense of all the information across your entire company. If you don’t have the talent available to do this critical work and the volume of data is overwhelming, then automation is worth considering.
Today, AI can be applied to automate the ingestion, enrichment, and distribution of data from all sources, managing every step necessary to obtain usable data assets. You move from fragmented, floating information to linked and merged information within a metadata layer, or data usability layer, in your data stack, preparing the data for use in reports, analytics, products and services by any user.
A data usability layer sits alongside any cloud data warehouse, data lake, or data mesh environment. It enables companies to get the most out of whichever strategy they choose by allowing them to understand, use and monetize every bit of their data at scale.