As more processes move online during the pandemic, businesses are embracing analytics to better understand their operations. According to a 2021 survey commissioned by Starburst and Red Hat, 53% of organizations believe access to data has become “more critical” throughout the pandemic. The findings align with those from ManageEngine, Zoho’s IT division, which found in a 2021 survey that more than 20% of organizations reported increasing their use of business analytics relative to the global average.
Thirty-five percent of Starburst and Red Hat survey respondents said they seek real-time business risk analysis, while 36% said they seek growth and revenue generation through “smarter” customer engagements. But highlighting the challenges of analytics, more than 37% of respondents said they were not confident in their ability to access “timely and relevant data for decision-making,” whether due to disparate storage sources or data pipeline development issues.
Two emerging concepts have been presented as answers to these obstacles in data analysis and management. One is the “data fabric,” a data integration approach that includes an architecture (and services running on that architecture) to help organizations orchestrate data. The other is the “data mesh,” which aims to mitigate data availability challenges by providing a layer of decentralized connectivity that lets businesses access data from different sources across different locations.
Data fabrics and data meshes can serve a wide range of business, technical, and organizational purposes. For example, they can save data scientists time by automating repetitive data transformation tasks while powering self-service data access tools. Data fabrics and data meshes can also integrate with and augment data management software already in use, for increased cost effectiveness.
A combination of technologies including AI and machine learning, a data fabric is like a weave that stretches to connect data sources, types, and locations with data access methods. Gartner describes it as an analysis of “existing, discoverable, and inferred metadata assets” to support the “design, deployment, and use” of data across on-premises, edge, and data center environments.
A data fabric continuously identifies, connects, cleans, and enriches real-time data from different applications to uncover relationships between data points. For example, a data fabric can monitor various data pipelines — the sets of actions that ingest raw data from a source and move it to a destination — to suggest better alternatives before automating the most repeatable tasks. A data fabric can also “fix” failed data integration jobs, handle more complex aspects of data management such as creating and profiling datasets, and offer ways to govern and secure data by limiting who can access what data and infrastructure.
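The “fix failed integration tasks” behavior can be sketched in miniature. The snippet below is a hedged illustration, not any vendor’s actual API: the task names, the retry policy, and the idea of modeling a pipeline as a list of callable steps are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PipelineTask:
    """One step in a data pipeline, e.g. ingest, clean, or load."""
    name: str
    run: Callable[[dict], dict]
    max_retries: int = 2  # illustrative: a fabric might retry failed steps

def execute_pipeline(tasks: list[PipelineTask], record: dict) -> dict:
    """Run each task in order, retrying failures — a toy stand-in for a
    fabric's automated handling of failed integration jobs."""
    for task in tasks:
        for attempt in range(task.max_retries + 1):
            try:
                record = task.run(record)
                break
            except ValueError:
                if attempt == task.max_retries:
                    raise  # give up after the retry budget is spent
    return record

# Example: a cleaning step that normalizes a customer email field.
clean = PipelineTask("clean", lambda r: {**r, "email": r["email"].strip().lower()})
result = execute_pipeline([clean], {"email": "  Alice@Example.COM "})
print(result["email"])  # alice@example.com
```

A real fabric would attach this kind of supervision to pipelines it discovers via metadata, rather than requiring them to be declared up front.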
To discover relationships between data, a data fabric constructs a graph that stores interconnected descriptions of data such as objects, events, situations, and concepts. Algorithms can use this graph for different business analysis purposes, such as making predictions and surfacing previously hard-to-find datasets.
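A minimal version of such a graph can be expressed as subject–predicate–object triples. This is a sketch of the general idea only; the entity names and relationship labels below are invented for illustration.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Tiny triple store: subject -> list of (predicate, object) edges."""
    def __init__(self):
        self.edges = defaultdict(list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.edges[subject].append((predicate, obj))

    def related(self, subject: str, predicate: str) -> list[str]:
        """Return all objects linked to a subject by a given relationship."""
        return [o for p, o in self.edges[subject] if p == predicate]

g = KnowledgeGraph()
g.add("customer:42", "purchased", "product:widget")
g.add("customer:42", "located_in", "region:emea")
g.add("product:widget", "stored_in", "dataset:sales_2021")

# Traverse the graph to surface the dataset behind a purchase history.
for product in g.related("customer:42", "purchased"):
    print(g.related(product, "stored_in"))  # ['dataset:sales_2021']
```

Chaining `related` calls is what lets an algorithm walk from a business entity to a hard-to-find dataset, as the article describes.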
As K2 View, a data fabric solution provider, explains: “The data fabric continuously provisions…data based on a 360-degree view of business entities, such as a certain customer segment, a company’s product line, or all outlets in a specific geographic area… Using this data, data scientists create and refine machine learning models, while data analysts use business intelligence to analyze trends, segment customers, and perform root-cause analysis. The refined machine learning model is deployed into the data fabric, to be executed in real time for an individual entity (customer, product, location, etc.) — thus ‘operationalizing’ the machine learning algorithm. The data fabric runs the machine learning model on demand, in real time, feeding it the complete and current data of the individual entity. The machine learning output is instantly sent back to the requesting application and persisted in the data fabric, as part of the entity, for future analysis.”
Data fabrics often work with a range of data types, including technical, business, and operational data. Ideally, they are also compatible with many different data delivery “styles,” such as replication, streaming, and virtualization. Beyond that, the best data fabric solutions provide robust visualization tools that make their technical infrastructure easier to interpret, allowing companies to monitor storage costs, performance, efficiency, and security no matter where their data and apps live.
Beyond analytics, a data fabric offers a number of benefits to organizations, including minimizing disruption when switching between cloud providers and compute resources. A data fabric also enables enterprises — and the data analytics, sales, marketing, network architecture, and security teams within them — to adapt their infrastructure to changing technology needs by connecting infrastructure endpoints regardless of where data lives.
In a 2020 report, Forrester found that IBM’s data fabric solution could accelerate data delivery 60x while driving a 459% increase in ROI. But data fabrics have their drawbacks, chief among them implementation complexity. For example, a data fabric requires exposing and integrating disparate systems and data sources, which may format data differently. This lack of native interoperability can add friction, like the need to harmonize and deduplicate data.
A data mesh, on the other hand, breaks down large enterprise data architectures into subsystems managed by dedicated teams. Unlike a data fabric, which relies on metadata to generate recommendations for things like data delivery, data meshes leverage the expertise of subject matter experts who oversee “domains” within the mesh.
“Domains” are independently deployable clusters of related microservices that communicate with users or other domains through different interfaces. (A microservices architecture is made up of many small, loosely coupled, independently deployable services.)
Domains typically include code, workflows, a technical team, and an environment, and teams working within domains treat data as a product. Clean, fresh, and complete data is delivered to any data consumer based on permissions and roles, while “data products” are built for specific analytical and operational purposes.
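The “data as a product” idea can be sketched as a typed, permissioned interface that a domain team publishes instead of exposing raw tables. Everything here is illustrative: the product name, the owning domain, and the role names are assumptions, not a real mesh platform’s API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProduct:
    """A domain-owned data product served under role-based permissions."""
    name: str
    owner_domain: str
    allowed_roles: frozenset

    def read(self, role: str, rows: list[dict]) -> list[dict]:
        """Serve data only to consumers whose role is permitted."""
        if role not in self.allowed_roles:
            raise PermissionError(f"{role} may not read {self.name}")
        return rows

# Hypothetical product owned by a "sales" domain team.
orders = DataProduct(
    name="orders.daily_summary",
    owner_domain="sales",
    allowed_roles=frozenset({"analyst", "data_scientist"}),
)
rows = [{"day": "2021-06-01", "revenue": 1200}]
print(orders.read("analyst", rows))  # permitted role: rows are served
```

The point of the design is that consumers depend on the product’s interface and its permission contract, not on the domain’s internal storage.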
To add value to a data mesh, engineers must develop a deep understanding of their datasets. They become responsible for serving data consumers and for organizing around the domain, i.e., testing, deploying, monitoring, and maintaining it. Beyond that, they need to ensure that the different domains stay connected through a layer of interoperability and consistent data governance, standards, and observability.
On the plus side, data meshes promote decentralization, allowing teams to focus on specific problem sets. They can also bolster analytics by leading with business context rather than jargon-heavy technical knowledge.
But data meshes have their drawbacks. For example, domains can unintentionally duplicate data, which wastes resources. And unless the mesh is sufficiently independent of the underlying infrastructure, its distributed structure may require more technical experts to scale than centralized approaches. Technical debt can also increase as domains create their own data pipelines.
Working with Data Meshes and Fabrics
When weighing the pros and cons, it’s important to keep in mind that data meshes and data fabrics are concepts, not technologies, and they are not mutually exclusive. An organization can adopt both a data mesh and a data fabric approach in some or all departments, as appropriate. For James Serra, a former big data and data warehousing solutions architect at Microsoft, the difference between the two concepts lies in how users access the data.
“A data fabric and a data mesh both provide an architecture to access data across multiple technologies and platforms, but a data fabric is technology-centric, while a data mesh focuses on organizational change,” he wrote in a blog post (via Datanami). “[A] data mesh is more about people and process than architecture, while a data fabric is an architectural approach that tackles the complexity of data and metadata in a smart way that works well together.”
Eckerson Group analyst David Wells cautions against obsessing over the differences, which he says are far less important than the components that need to be in place to achieve desired business goals. “These are architectural frameworks, not architectures,” Wells writes in a recent blog post (also via Datanami). “You don’t have an architecture until the frameworks are adapted and customized to your needs, data, processes and terminology.”
All this is to say that data fabrics and data meshes will remain equally relevant for the foreseeable future. While each involves different elements, both serve the same purpose: bringing better analytics to organizations with sprawling, growing data infrastructure.