If your organization spends more time scratching its head over data quality or operational issues than asking unique and interesting questions, then you’re not just wasting time, you’re actively eroding the future of your business.

The problem with systems and applications isn’t just the amount of data your organization is handling today or will likely be handling in the future. It isn’t even the stability or resilience of the systems and applications that create, store, and manage all of this data. Observability can help.

Why? The problem is in the data itself and in knowing the answer to a deceptively simple question: can you trust it? For many organizations, the answer is a resounding no. According to a study by HFS Research, “75% of business leaders do not have a high level of trust in their data and 70% do not consider their data architecture to be world-class.” The trouble with this lack of trust is that if your employees don’t trust your data, they won’t trust any of the insights derived from it. Such issues impact everything from the reliability of results to application performance.

Whether your organization manages a few hundred gigabytes of data or several petabytes, you need to start building the sets of systems, tools, and philosophies that will help your employees understand, manage, and improve the health of your data. One of these is the growing world of data observability.

The State of Data Observability

In an interview with our Editor-in-Chief Salvatore Salamone, Rohit Choudhary, Founder and CEO of Acceldata.io, described data observability as an approach that “helps modern data businesses manage the complexity that so many different data sources bring into the business. This gives them the ability to control the quality, reliability, and performance of their overall data and data systems.”

Evgeny Shulman, co-founder and CTO of Databand.ai, says, “Data observability goes beyond monitoring by adding more context to system metrics, providing deeper insight into system operations, and showing if engineers need to step in and apply a fix… the observability tells you that its current state is associated with critical failures and you need to take action.”

According to Choudhary, the world of data observability is a direct descendant of the rapid deployment of computer applications between 2000 and 2010, followed by a wave of more data-intensive applications between 2010 and 2015. Now that many applications collect data for a decade or more, organizations need new operational and analytical techniques to understand the vastness of what they’ve already gathered and prepare their pipelines for what’s next.

The industry around data observability is also snowballing. There are already established players, like Monte Carlo and Bigeye, alongside a host of newer startups like Cribl, Acceldata, Databand, Datafold, and Soda, some of which also build or support open source tools. They each try to address the “black box” feeling of complex data pipelines and architectures that can move data but cannot easily be monitored.

Overcoming cultural and philosophical pitfalls

While many organizations might simply get in touch with one of these data observability vendors and stand up a new suite of tools, your goal isn’t to adopt new tools because everyone else is doing it. In our conversations with observability experts, we’ve uncovered a few flawed assumptions that have led other organizations into situations where they know how to look into the black box that is their data pipeline, but have no way to “translate” what they see to the rest of their organization.

  • There is no end goal for data. In the past, data was collected as a prerequisite for a specific question an organization wanted to ask. For example, marketing teams use tools like Google Analytics to understand the demographics of those who visit their website. But now our ability to derive new questions from old data is unprecedented; answers lie hidden under layers of data waiting to be correlated. Your data observability practice must account for the fact that no dataset is ever “complete”: it simply waits for new questions to reveal new information.
  • There is no end goal to the quality of your data. Choudhary says that in the past, data quality efforts “felt like a centralized, once-a-year goal managed by the CTO’s office.” This has now completely changed as confidence in data quality erodes and the speed of analysis increases. Data quality becomes a real-time concern, the kind of metrics executives might want to see splashed across a monitor in the office or in an easily accessible dashboard.
  • Bad data is everyone’s problem. This isn’t about blaming it all on DataOps or data science teams – it’s a reminder that almost everyone in an organization these days runs data-driven analysis, not just data scientists.
  • Downtime is more than inaccessible data. In observability, we usually think of downtime as anything that affects customers or end users, but data downtime is better described as any time your data is “partial, erroneous, missing or otherwise inaccurate.” The data may still be accessible, but its condition is actively eroding already-fragile trust.
  • APM ≠ data observability. If you thought you could adapt your existing application performance monitoring (APM) solutions for data observability, you’d probably be disappointed. When it comes to data pipelines, you need to know more than whether an application is running. If a dataset doesn’t arrive when you expect it, you need to trace the lineage of that data to understand which step went wrong, something endpoint monitoring tools aren’t able to do.
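To make the notion of “data downtime” above more concrete, here is a minimal sketch of the kinds of checks a data observability tool might run on an arriving dataset. This is not any vendor’s API: the thresholds, field names, and the `check_data_downtime` helper are all hypothetical, assuming tabular records that arrive on a schedule.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds -- in practice these are tuned per dataset.
MAX_AGE = timedelta(hours=24)   # data older than this is considered stale
MAX_NULL_RATE = 0.05            # more than 5% nulls means "partial" data

def check_data_downtime(rows, last_updated, now, expected_fields):
    """Return a list of data-downtime signals: stale, missing, or partial."""
    issues = []
    # Freshness check: did the data arrive when we expected it?
    if now - last_updated > MAX_AGE:
        issues.append("stale")
    # Volume check: is the dataset missing entirely?
    if not rows:
        issues.append("missing")
    else:
        # Completeness check: how many expected values are null?
        total = len(rows) * len(expected_fields)
        nulls = sum(1 for r in rows for f in expected_fields if r.get(f) is None)
        if nulls / total > MAX_NULL_RATE:
            issues.append("partial")
    return issues

# Example: two records, one with a null amount (25% null rate).
rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
now = datetime(2024, 1, 2)
print(check_data_downtime(rows, datetime(2024, 1, 1, 12), now, ["id", "amount"]))
# → ['partial']
```

Real observability platforms layer lineage and alerting on top of checks like these, so that a “partial” signal points back to the pipeline step that produced it rather than just the symptom.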

Ultimately, the goal should be to make your data engineers shine. If your organization spends more time scratching its head over data quality or operational issues than asking unique and interesting questions, you’re not just wasting time, you’re actively eroding the future of your business. A team of data engineers free to work on complex but valuable problems is a team that will generate more value than your competitors can derive from similar information. And to get there, they must first rebuild trust in the data they observe.