(amiak/Shutterstock)

In addition to exploding data volumes, many organizations are grappling with an explosion in the number of data sources and data silos. Managing data in this fluid, ever-changing environment is a major challenge for organizations that want to be data-driven, but one pattern that offers potential salvation for the stressed-out data architect is the data fabric.

Data fabrics are not new. We’ve been writing about them for several years here at Datanami. At first, the definition of a data fabric was a bit vague. But lately it has started to firm up, and the core elements of a data fabric have coalesced into a configuration that is gaining real-world traction.

Forrester analyst Noel Yuhanna was an early proponent of the data fabric. In the latest Forrester Wave: Enterprise Data Fabric, Q2 2022, Yuhanna dove into the benefits of data fabrics and dissected the offerings of 15 data fabric vendors.

“Late information today can have a devastating effect on a company’s ability to win, serve and retain customers,” Yuhanna wrote in the Wave report. “Organizations want real-time, consistent, connected and reliable data to support their critical business operations and information. However, new data sources, slow movement of data between platforms, rigid data transformation workflows and governance rules, increasing data volume, and distribution of data across clouds and sites can cause organizations to fail when executing their data strategy.”

Centralizing all data in a data lake such as Hadoop or Amazon S3 was supposed to solve many of these problems, but it didn’t work out that way. Not all data belongs in a lake, owing to bandwidth and storage costs as well as convenience. Technological advancement also continues to produce new digital innovations, and people are more than happy to try them out, which usually results in yet another data silo.

Data fabrics connect disparate data sources in a federated fashion (agsandrew/Shutterstock)

Data silos appear to be here to stay. Just as Edwin Hubble’s raisin pudding analogy illustrated how the expansion of the universe moves matter apart, the big data boom seems to be pushing data repositories apart, even as the global volume of data continues to grow at a geometric rate. A data fabric is a way to layer connective tissue among these sweet nuggets of data.

As Yuhanna wrote:

“The data fabric provides a unified, integrated and intelligent end-to-end data platform to support new and emerging use cases. It automates all data management functions, including ingestion, transformation, orchestration, governance, security, preparation, quality, and curation, enabling insights and analytics to quickly accelerate use cases.”

Data fabrics are essentially pre-integrated super-suites of data management tools. Instead of cobbling together separate products to handle the data functions Yuhanna lists above (not to mention data catalogs), data fabrics provide these functions through a single product, ensuring consistency and repeatability of big data management processes, which helps to build trust in the data and the analyses derived from it.

Yuhanna currently sees many data fabrics deployed in cloud and hybrid cloud environments, especially to support applications such as Customer 360, Business 360, fraud detection, IoT analytics, and real-time insights. Data fabrics are being deployed across multiple industries, including financial services, retail, healthcare, manufacturing, oil and gas, and energy, he wrote.

Data fabrics are also being deployed in the life sciences industry, where they can help bring disparate data silos together into a seamless whole. One life sciences company betting big on data fabrics is eClinical Solutions, a Massachusetts-based provider of software for conducting clinical trials.

In the past, clinical trials could involve three or four disparate data sources, according to Raj Indupuri, CEO of eClinical Solutions.

“But now with the research we end up with for every trial, you could have over 15 different sources, different data streams, different structures, different formats, different systems,” Indupuri said. “So the problem in terms of data chaos – we call it data chaos – has only exploded or increased.”

According to Indupuri, the data fabric is a natural evolution of the data lake, or lakehouse. These flexible data repositories are capable of ingesting and storing just about any type of data, giving customers or stakeholders the ability to transform, prepare and analyze data when they need it. But when data spans multiple data lakes (or warehouses or lakehouses), that’s where data fabrics play an important role.

“A big difference would be, instead of having everything in one centralized place, with the data fabric it is how do you actually combine different stores,” he told Datanami in a recent interview. “They could be distributed. But in addition, we have a fabric that, through governance and other capabilities, allows us to effectively deliver analytics to end stakeholders, to deliver it downstream to different stakeholders in different systems.”

eClinical Solutions has already integrated some components of a data fabric solution into its offering. According to Indupuri, the company built an end-to-end data pipeline in AWS that automatically extracts metadata and catalogs it when new data comes into the system. The company’s solution also includes a data management workshop where data stewards can review and cleanse data.
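To make the idea of cataloging metadata on arrival more concrete, here is a minimal Python sketch. It is not eClinical Solutions’ pipeline, whose internals the article does not describe. The `Catalog` and `CatalogEntry` classes, the `ingest_csv` function, and the sample lab data are all hypothetical names invented for illustration, and an in-memory dictionary stands in for a real catalog service.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata recorded for one ingested dataset (hypothetical schema)."""
    name: str
    source: str
    columns: list[str]
    row_count: int


@dataclass
class Catalog:
    """In-memory stand-in for a metadata catalog service."""
    entries: dict[str, CatalogEntry] = field(default_factory=dict)

    def register(self, entry: CatalogEntry) -> None:
        self.entries[entry.name] = entry


def ingest_csv(name: str, source: str, payload: str, catalog: Catalog) -> CatalogEntry:
    """Extract basic metadata from a CSV payload and record it in the catalog."""
    reader = csv.reader(io.StringIO(payload))
    header = next(reader, [])           # first row holds the column names
    row_count = sum(1 for _ in reader)  # remaining rows are data records
    entry = CatalogEntry(name=name, source=source, columns=header, row_count=row_count)
    catalog.register(entry)
    return entry


# Illustrative data only: a tiny lab-results feed arriving from one site.
catalog = Catalog()
labs = ingest_csv(
    "lab_results",
    "site_a_lab_system",
    "subject_id,test,value\n001,ALT,32\n002,ALT,41\n",
    catalog,
)
```

In this sketch, every new dataset is described in the catalog at the moment it is ingested, so a data steward could later look up its columns and size without opening the data itself.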

“We have evolved considerably in ten years,” he said. “When we started, it was kind of a report. Then we evolved into a kind of data lake, a kind of architecture where you can stage any data, whatever. Then we built in capabilities where, based on metadata, you can actually transform and publish data marts to our data cloud.”

Where it gets tricky is dealing with the data repositories of eClinical Solutions’ own customers, which are pharmaceutical companies or drug discovery companies. These customers often have separate data lakes for clinical research, for operational data, for safety data, and for regulatory data, and are loath to move or copy data between them.

“You can actually allow them to access data from these data stores, or these distributed data clouds, data lakes or data warehouses,” Indupuri said. “This is where the data fabric can help.”
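The access pattern Indupuri describes, reading from several distributed stores without moving or copying data between them, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not any vendor’s implementation: the `Fabric` class, the `make_store` and `subject_overview` helpers, and the table names are hypothetical, and two in-memory SQLite databases stand in for a customer’s separate clinical and safety data lakes.

```python
import sqlite3


def make_store(table: str, columns: str, rows: list[tuple]) -> sqlite3.Connection:
    """Create an independent in-memory database standing in for one data lake."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({columns})")
    placeholders = ", ".join("?" for _ in rows[0])
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return conn


class Fabric:
    """Hypothetical access layer that queries each registered store in place."""

    def __init__(self) -> None:
        self.stores: dict[str, sqlite3.Connection] = {}

    def register(self, name: str, conn: sqlite3.Connection) -> None:
        self.stores[name] = conn

    def query(self, store: str, sql: str, params: tuple = ()) -> list[tuple]:
        return self.stores[store].execute(sql, params).fetchall()


def subject_overview(fabric: Fabric, subject_id: str) -> dict:
    """Combine results from two stores at query time; no data is copied between them."""
    arms = fabric.query(
        "clinical", "SELECT arm FROM enrollment WHERE subject_id = ?", (subject_id,)
    )
    events = fabric.query(
        "safety", "SELECT event FROM adverse_events WHERE subject_id = ?", (subject_id,)
    )
    return {
        "subject_id": subject_id,
        "arm": arms[0][0] if arms else None,
        "events": [event for (event,) in events],
    }


# Illustrative data only: two separate stores that never exchange rows.
fabric = Fabric()
fabric.register(
    "clinical",
    make_store("enrollment", "subject_id TEXT, arm TEXT",
               [("001", "treatment"), ("002", "placebo")]),
)
fabric.register(
    "safety",
    make_store("adverse_events", "subject_id TEXT, event TEXT",
               [("001", "headache")]),
)
overview = subject_overview(fabric, "001")
```

The design choice worth noting is that the stores stay where they are and only query results travel, which is the property that matters to customers who are loath to move or copy data between lakes.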
