

The modern data stack (MDS) is fundamental to digital disruptors. Think Netflix. The company pioneered a new business model around video as a service, but much of its success relies on real-time streaming data.

Netflix uses analytics to offer highly relevant recommendations to viewers. It monitors data in real time to maintain constant visibility into network performance, and it syncs its database of movies and shows with Elasticsearch so users can quickly find what they’re looking for.

All of this has to happen in real time, and it has to be 100% accurate. Old-fashioned batch extract, transform, load (ETL) is simply too slow. To meet this need, Netflix built a change data capture (CDC) tool called DBLog, which captures changes in MySQL, PostgreSQL, and other data sources and streams them to target data stores for search and analysis.

Netflix required high availability and real-time synchronization, and it needed to minimize the impact on its operational databases. CDC reads the database logs and replicates changes to target databases in the order in which they occurred. Because it reads the log rather than querying the source tables, it captures changes as they happen without locking records or bogging down the source database.
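To make the log-based approach concrete, here is a minimal Python sketch of the general technique, not Netflix’s DBLog implementation. It assumes each change event carries a monotonically increasing log sequence number (the hypothetical `lsn` field), an operation type, a primary key, and the new row image. The replica applies events strictly in log order and remembers the last position it applied, so replaying the same log after a restart does not apply anything twice. The class names, fields, and sample rows are all illustrative.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class ChangeEvent:
    lsn: int                       # hypothetical log sequence number, increasing per commit
    op: str                        # "insert", "update" or "delete"
    key: str                       # primary key of the changed row
    row: Optional[dict[str, Any]]  # new row image; None for deletes


class TargetReplica:
    """Hypothetical target store kept in sync by replaying change events."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, Any]] = {}
        self.last_lsn = 0  # position of the last event applied

    def apply(self, event: ChangeEvent) -> bool:
        # Events at or before the last applied position were already handled,
        # so replaying the log after a restart does not apply them twice.
        if event.lsn <= self.last_lsn:
            return False
        if event.op == "delete":
            self.rows.pop(event.key, None)
        elif event.op in ("insert", "update"):
            if event.row is None:
                raise ValueError(f"{event.op} event at lsn {event.lsn} has no row image")
            self.rows[event.key] = dict(event.row)
        else:
            raise ValueError(f"unknown operation {event.op!r}")
        self.last_lsn = event.lsn
        return True


def replicate(log: Iterable[ChangeEvent], target: TargetReplica) -> int:
    """Apply events in the order they appear in the log; return how many were applied."""
    # Only the log is read here; the source tables are never queried or locked.
    applied = 0
    for event in log:
        if target.apply(event):
            applied += 1
    return applied


log = [
    ChangeEvent(1, "insert", "m1", {"title": "Movie A"}),
    ChangeEvent(2, "update", "m1", {"title": "Movie A (Remastered)"}),
    ChangeEvent(3, "insert", "m2", {"title": "Show B"}),
    ChangeEvent(4, "delete", "m2", None),
]
replica = TargetReplica()
print(replicate(log, replica))  # 4
print(replica.rows)             # {'m1': {'title': 'Movie A (Remastered)'}}
print(replicate(log, replica))  # 0: every event was already applied
```

Tracking the last applied position is one common way to make replay safe; a production system also has to persist that position durably and handle schema changes, which this sketch leaves out.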


Data is at the heart of what Netflix does, but they are not alone in this regard. Companies like Uber, Amazon, Airbnb and Meta thrive because they really know how to leverage data. Data management and analytics are strategic pillars for these organizations, and CDC technology plays a central role in their ability to carry out their core missions.

The same can be said of almost any business operating at the top of its game today. If you want your business to compete at that level, you need to modernize and master your data. Your competitors are certainly already doing so.

Sub-second data synchronization is the new normal at Airbnb and Uber

In today’s world, a strong customer experience requires real-time data feeds. Airbnb has recognized the value of CDC technology in creating an excellent experience for its guests and hosts. It, too, has built its own CDC platform, called SpinalTap. Airbnb’s dynamic pricing, listing availability, and reservation status require uncompromising accuracy and consistency across all systems. When a guest books a stay, they expect the booking workflow to be fast and 100% accurate.

For Uber, immediacy is arguably even more important. Whether a customer is waiting for a ride to the airport or ordering food delivery, timing is everything. Like Netflix and Airbnb, Uber developed its own CDC platform to synchronize data across multiple data stores in real time. Again, a common set of requirements emerged: Uber needed its solution to be extremely fast and fault-tolerant, with no data loss, and it needed a solution that would not degrade the performance of its source databases.

Change data capture for the rest of us

Once again, CDC fits the bill. In the past, overnight batch-mode ETL may have been enough to produce daily management updates or operational reports. Today, real time is increasingly the norm. If information is power, then immediate access to information is turbo power.

This is why CDC is quickly becoming a fundamental requirement for the modern data stack. Large companies like Netflix, Airbnb, and Uber have the resources to build custom CDC platforms, but what about everyone else?

Off-the-shelf CDC solutions fill this gap by providing the same low-latency, high-quality streaming pipelines without the need to build from scratch.

Unfortunately, they are not all created equal. Most companies run a mix of systems for enterprise resource planning (ERP), customer relationship management (CRM), and specialized operational functions such as purchasing or HR. These run on different database platforms with incompatible data models. A company that operates mainframe systems is likely dealing with obscure data structures that do not integrate easily with modern relational data.

This makes heterogeneous integration particularly important. It requires connecting to multiple data sources and targets, including enterprise applications such as SAP and Salesforce and transactional databases such as Oracle and IBM Db2. It also means delivering real-time streaming data to platforms like Databricks, Apache Kafka, Snowflake, Amazon DocumentDB, and Azure Synapse Analytics.

Real-time CDC automation

To drive artificial intelligence (AI) and advanced analytics, companies need to move their data to a common MDS platform. This means ingesting information from various sources, transforming it to fit a unified analytics model, and delivering it to a modern cloud-based data platform.
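As an illustration of transforming data to fit a unified analytics model, the Python sketch below maps customer records from two source systems into one shared shape. The source names `erp` and `crm`, and every field name (`CUST_NO`, `CUST_NAME`, `accountId`, `name`), are hypothetical; real ERP and CRM schemas are far larger and differ by vendor.

```python
from typing import Any, Callable

Record = dict[str, Any]


def from_erp(record: Record) -> Record:
    # Hypothetical ERP layout: numeric customer number, upper-case name.
    return {
        "customer_id": str(record["CUST_NO"]),
        "name": record["CUST_NAME"].title(),
        "source": "erp",
    }


def from_crm(record: Record) -> Record:
    # Hypothetical CRM layout: string account ID, display name already cased.
    return {
        "customer_id": record["accountId"],
        "name": record["name"],
        "source": "crm",
    }


# One mapper per source system; supporting a new source means adding one entry.
MAPPERS: dict[str, Callable[[Record], Record]] = {"erp": from_erp, "crm": from_crm}


def to_unified(system: str, record: Record) -> Record:
    """Translate a source-specific record into the shared analytics model."""
    try:
        mapper = MAPPERS[system]
    except KeyError:
        raise ValueError(f"no mapping defined for source system {system!r}") from None
    return mapper(record)


print(to_unified("erp", {"CUST_NO": 42, "CUST_NAME": "ACME CORP"}))
# {'customer_id': '42', 'name': 'Acme Corp', 'source': 'erp'}
print(to_unified("crm", {"accountId": "42", "name": "Acme Corp"}))
# {'customer_id': '42', 'name': 'Acme Corp', 'source': 'crm'}
```

Both records end up with the same `customer_id` and `name`, which is what lets downstream analytics treat them as one customer; a real pipeline would also need rules for resolving conflicts when the sources disagree.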

Change data capture technology serves as a vital link in the data-driven value chain, first by automating the ingestion of data from source systems, then transforming it on the fly and delivering it to a cloud data platform. Real-time CDC automation ensures the right information gets to the right place immediately.
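The ingest, transform, and deliver stages described in this paragraph can be sketched as a chain of Python generators, so each change event flows through all three stages as soon as it is read instead of waiting for a batch to finish. This is a minimal sketch under stated assumptions: events are plain dicts, the in-memory list stands in for a cloud data platform, and `add_amount_usd` is a hypothetical transformation.

```python
from typing import Any, Callable, Iterable, Iterator

# A change event is modeled as a plain dict (an assumption for this sketch).
Event = dict[str, Any]


def ingest(source: Iterable[Event]) -> Iterator[Event]:
    # Stand-in for reading a source system's change stream; any iterable works here.
    for event in source:
        yield event


def transform(events: Iterable[Event], fn: Callable[[Event], Event]) -> Iterator[Event]:
    # Each event is transformed as soon as it arrives, not after a whole batch is collected.
    for event in events:
        yield fn(event)


def deliver(events: Iterable[Event], sink: list[Event]) -> int:
    # Hypothetical sink: a list standing in for a cloud data platform's write API.
    delivered = 0
    for event in events:
        sink.append(event)
        delivered += 1
    return delivered


def add_amount_usd(event: Event) -> Event:
    # Hypothetical transformation: derive a dollar amount from an integer cents field.
    return {**event, "amount_usd": event["amount_cents"] / 100}


source_events = [{"id": 1, "amount_cents": 1999}, {"id": 2, "amount_cents": 500}]
platform: list[Event] = []
count = deliver(transform(ingest(source_events), add_amount_usd), platform)
print(count)        # 2
print(platform[0])  # {'id': 1, 'amount_cents': 1999, 'amount_usd': 19.99}
```

Because generators are lazy, nothing is buffered between stages; the transformation also returns a new dict rather than mutating the source event.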

Because they focus only on data that has changed, streaming CDC pipelines offer huge efficiency advantages over the batch mode operations of the past. The best CDC solutions can deliver over 100 terabytes of data from source to target in less than 30 minutes, without any data loss.
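Two quick calculations help put this paragraph in perspective. The first converts the figure of 100 terabytes in 30 minutes into the average rate it implies, using decimal units (1 TB = 1,000 GB). The second compares the work of a full batch reload with an incremental sync, using hypothetical numbers: a 50-million-row table in which 250,000 rows change between syncs.

```python
def implied_throughput_gb_per_s(terabytes: float, minutes: float) -> float:
    """Average transfer rate needed to move `terabytes` (decimal TB) in `minutes`."""
    gigabytes = terabytes * 1_000  # 1 TB = 1,000 GB in decimal units
    return gigabytes / (minutes * 60)


def incremental_fraction(total_rows: int, changed_rows: int) -> float:
    """Share of a full batch reload that an incremental (CDC) sync has to process."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return changed_rows / total_rows


print(round(implied_throughput_gb_per_s(100, 30), 1))  # 55.6
# Hypothetical table: 50 million rows, 250,000 of them changed since the last sync.
print(incremental_fraction(50_000_000, 250_000))       # 0.005
```

In the hypothetical table, an incremental sync touches 0.5% of the rows that a full batch reload would reread.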

The transition to cloud computing is well underway. Cloud analytics, in particular, offers distinct advantages to companies that truly understand the transformational role of data. Leading companies across industries are aligning their strategic visions around data analytics. They digitize their interactions with customers and use algorithms to study data, extract insights and take action. AI and machine learning ingest large amounts of information, uncover correlations and identify anomalies.

Whether you’re leading the way in digital disruption or simply trying to keep pace, CDC technology will play a pivotal role in making the modern data stack a reality and opening the door to digital transformation.

Gary Hagmueller is CEO of Arcion.

DataDecisionMakers

Welcome to the VentureBeat community!

DataDecisionMakers is where experts, including the technical people doing data work, can share data-related insights and innovations.
