A DataOps strategy relies heavily on collaboration, as data flows between data managers and data consumers across the enterprise. Because collaboration is so critical to DataOps’ success, it’s important to start with the right team to drive these initiatives.

It’s natural to think of DataOps as just DevOps for data, but that’s not quite right. It would be more accurate to say that DataOps tries to achieve for data what DevOps achieves for code: a dramatic improvement in productivity and quality. DataOps, however, has other issues to address, including how to keep a critical system in continuous production.

The distinction is important when thinking about building a DataOps team. The DevOps approach, with its product managers, scrum masters, and developers, is a useful model, but its focus is ultimately on delivery. DataOps must also focus on ongoing maintenance, and it needs other frameworks to draw on.

One of the main influences on DataOps has been lean manufacturing techniques. Managers often borrow terms from Toyota’s classic production system, which has been much studied and imitated. Terms like “data factory” also crop up when talking about data pipelines in production.

This approach requires a separate team structure. Let’s first look at some roles within a DataOps team.

Key roles for DataOps

The roles described here are for a DataOps team deploying data science in critical production.

What about less data science-focused teams? Do they also need DataOps, for example for a data warehouse? Granted, some of the techniques may be similar, but a traditional team of extract, transform, and load (ETL) developers and data architects will likely work well. A data warehouse, by its nature, is less dynamic and more constant than an Agile pipelined data environment. The following DataOps team roles manage the rather more volatile world of pipelines, algorithms, and self-service users.

Nonetheless, DataOps techniques are becoming increasingly relevant as data warehouse teams push to be increasingly agile, especially with cloud deployments and data lakehouse architectures.

Let’s start by defining the roles required for these new analysis techniques.

The data scientist

Data scientists do research. If an organization knows what it wants and just needs someone to implement a predictive process, hire a developer who is well versed in algorithms. The data scientist, on the other hand, explores for a living, discovering what is relevant and meaningful as they go.

During exploration, a data scientist may try many algorithms, often in diverse model sets. They can even write their own algorithms.
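That exploration loop can be sketched in miniature: score several candidate models against the same validation split and keep the best. Everything here is invented for illustration, with trivial rule-based “models” standing in for real algorithms:

```python
# Illustrative sketch: comparing candidate models on one validation split.
# The dataset, candidate rules, and names are all made up for this example.
import random

random.seed(0)

# Toy dataset: label is 1 when the sum of the three features is positive.
points = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
data = [(x, 1 if sum(x) > 0 else 0) for x in points]
train, valid = data[:150], data[50:]

# Candidate "models" stood in for by simple scoring rules.
def always_one(x):
    return 1

def sum_rule(x):
    return 1 if sum(x) > 0 else 0

def first_feature_rule(x):
    return 1 if x[0] > 0 else 0

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

candidates = {
    "baseline": always_one,
    "sum_rule": sum_rule,
    "first_feature": first_feature_rule,
}
scores = {name: accuracy(m, valid) for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(best)  # sum_rule matches how the toy labels were built
```

In practice this loop would use a real library’s cross-validation utilities, but the shape is the same: many candidates, one fair comparison, one winner to carry forward.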


Key attributes of this role are an insatiable curiosity and interest in the field, as well as technical acumen – especially in statistics – to understand the significance of what they are discovering and the real impact of their work.

This diligence matters. It is not enough to find a good model and stop there, because business conditions change rapidly. Additionally, while not everyone works in fields with compelling ethical dilemmas, data scientists in all fields sooner or later encounter personal or commercial privacy issues.

This is a technical role, but don’t overlook the human side, especially if the organization only hires one data scientist. A good data scientist is a good communicator who can explain results to a non-technical audience, often executives, while being upfront about what is and isn’t possible.

Finally, the data scientist, especially one working in a field that is new to them, is unlikely to know all the operational data sources – ERP, CRM, HR systems, etc. – but they certainly need to work with that data. In a well-governed system, they may not have direct access to all of a company’s raw data, so they must work with other roles who better understand the source systems.

The data engineer

Typically, it is the data engineer who moves data between operational systems and the data lake – and from there between areas of the lake such as raw, cleansed, and production areas.
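That movement between zones can be sketched with the lake modeled as plain Python structures; the zone names mirror the ones above, while the records and cleansing rules are invented for illustration:

```python
# Minimal sketch of a raw -> cleansed -> production pipeline.
# Zone names follow the article; records and rules are illustrative.
lake = {
    "raw": [
        {"id": 1, "amount": "100.50", "region": " EU "},
        {"id": 2, "amount": "n/a", "region": "US"},    # unparseable: dropped
        {"id": 3, "amount": "42.00", "region": "apac"},
    ],
    "cleansed": [],
    "production": [],
}

def cleanse(record):
    """Normalise types and text; return None when the record is unusable."""
    try:
        amount = float(record["amount"])
    except ValueError:
        return None
    return {
        "id": record["id"],
        "amount": amount,
        "region": record["region"].strip().upper(),
    }

# raw -> cleansed: validate and normalise every record.
lake["cleansed"] = [r for r in (cleanse(rec) for rec in lake["raw"]) if r]

# cleansed -> production: keep only what downstream consumers need.
lake["production"] = [r for r in lake["cleansed"] if r["amount"] > 0]

print(len(lake["production"]))  # 2 records survive to production
```

A real pipeline would do this with an orchestration and storage stack rather than in-memory lists, but the staged shape, with each zone adding guarantees, is the core of the job.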

The data engineer also supports the data warehouse, which can be a demanding task on its own as they must maintain a history for reporting and analysis while ensuring ongoing development.

At one time, the data engineer may have been called a data warehouse architect or ETL developer, depending on their expertise. But data engineer is the new term of art, and it better captures the operational focus of the role in DataOps.

The DataOps engineer

Another engineer? Yes, and one focused on operations. But the DataOps engineer has another area of expertise: supporting the data scientist.

Data scientist skills focus on modeling and extracting insights from data. However, it is common to find that what works well on the bench can be difficult or expensive to deploy in production. Sometimes an algorithm runs too slowly on a production data set, or uses too much compute or storage to scale efficiently. The DataOps engineer helps here by testing, tweaking, and maintaining models for production.

Within this framework, the DataOps engineer knows how to maintain a sufficiently accurate model score over time as the data drifts. They also know when to retrain the model, or when it needs to be rethought entirely, although that latter job falls to the data scientist.
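That drift check can be sketched as a simple monitoring rule: track the model’s accuracy on each batch of labelled production data and flag a retrain when it stays below a threshold. The threshold, patience value, and score history here are all illustrative assumptions:

```python
# Sketch of drift monitoring: flag a retrain when per-batch accuracy
# stays under a threshold. All numbers are invented for illustration.
THRESHOLD = 0.85

def needs_retrain(batch_accuracies, threshold=THRESHOLD, patience=2):
    """Flag a retrain once accuracy is under the threshold for
    `patience` consecutive batches (avoids reacting to one noisy batch)."""
    below = 0
    for acc in batch_accuracies:
        below = below + 1 if acc < threshold else 0
        if below >= patience:
            return True
    return False

# Scores slowly degrade as production data drifts from the training set.
history = [0.93, 0.91, 0.88, 0.84, 0.86, 0.83, 0.82]
print(needs_retrain(history))  # True: the last two batches are below 0.85
```

The patience parameter is the interesting design choice: without it, a single noisy batch would trigger an expensive retrain; with it too high, real drift goes unanswered for too long.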

The DataOps engineer keeps the models running within budget and resource constraints, which they probably understand better than anyone else on the team.

The data analyst

In a modern organization, the data analyst can have a wide range of skills, ranging from technical knowledge to aesthetic understanding of visualization to so-called soft skills, such as collaboration. They are also less likely to have extensive technical training compared to, say, a database developer.

Their ownership of data – and their influence – may depend less on their position in the organizational hierarchy and more on their personal commitment and willingness to take ownership of an issue.

These people are in every department. Look around you: someone is the de facto “data steward” who, regardless of their job title, knows where the data is, how to use it, and how to present it effectively.

To be fair, this role is becoming more formalized today, but there are still a large number of data analysts who grew up in the role from a business background rather than a technical one.

The executive sponsor

Is the executive sponsor part of the team? Maybe not directly, but the team won’t get far without one. A C-level sponsor can be essential in aligning the specific work of a DataOps team with the company’s strategic vision and tactical decisions. They can also make sure the team has a budget and resources aligned with long-term goals.

[Image: The difference between DevOps and DataOps]

Adapt the team to the organization

Few organizations can, or will, immediately build a team of four or more just for DataOps. The capabilities and value of the team must grow over time.

How, then, should a team grow? Who should be the first hire? It all depends on where the organization starts. But there has to be an executive sponsor from day zero.

The team is unlikely to start from scratch. Organizations need DataOps precisely because they already have work in progress that needs to be better operationalized. They may have started looking at DataOps because they have data scientists pushing the boundaries of what they can handle today.

If so, the first hire should be a DataOps engineer, as it is their role to operationalize data science and make it manageable, scalable, and robust enough to be mission critical.

On the other hand, it is possible that an organization has a traditional data warehouse, and there are data engineers involved and data analysts downstream. In this case, the first position on the DataOps team would be a data scientist for advanced analytics.

An important question is whether to create a formal organization or a virtual team. This is another reason the executive sponsor matters: they may have a lot of say in the answer. Many DataOps teams start as virtual groups that work across organizational boundaries to ensure that data and data streams are reliable and trustworthy.

Whether loosely or tightly organized, these discrete disciplines grow in strength and impact over time, and their strategic focus and use of resources cohere within a common framework of exploration and delivery. When this happens, the organization can add more engineers for scale and governance and more scientists and analysts for insight. At this point, regardless of where the organization started, the team is likely to become more formally organized and recognized.

It is an exciting process. The DataOps team can be the difference between a company that sometimes does cool stuff with data and a company that runs efficiently and reliably on data, analytics and insights.