Does data lineage matter?

Jul 13, 2022

From the humble beginnings of ARPANET, the development of the computer and the internet has revolutionized the ways in which we perceive the world. As the endless deluge of information streams across our screens, one can’t help but feel overwhelmed. Indeed, one of the challenges brought on by the Information Age is the exponential growth of data. In 2020, the total amount of data created and consumed by humans was estimated to have reached around 60 zettabytes. That’s 60 billion terabytes! By some estimates, this figure will more than triple in the next five years!

Like the rest of us, businesses are increasingly finding themselves drowning in a sea of data. Technological advancements have allowed both for the capture of more data and the creation of new sources of data. In the modern utilities industry, for example, the rising number of IoT sensors, smart assets, and DERs (Distributed Energy Resources) contributes to a never-ceasing torrent of information.

The potential for chaos is significant. Without data lineage, we run the risk of bad data. Similar to a game of “Telephone” where a message gets progressively more garbled as it passes from person to person, the same can happen in an enterprise with poor data management. As the data travels and morphs through an enterprise’s systems and applications, how does a business user understand and trust the data? Can they quickly determine its origin and history? These questions are at the heart of what data lineage seeks to address.

What is data lineage?

Simply put, data lineage is the ability to track the lifecycle of data. Where did it come from? Where is it going? What is changing the data, and how is it changing? By capturing metadata along the way, we can understand the full data flow. A modern lineage implementation will allow the user to monitor, collect, analyze, and visualize the data as it travels through the processes of an enterprise while also managing the complete data lifecycle.
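
To make the idea of "capturing metadata along the way" concrete, here is a minimal sketch in Python. The LineageEvent class, its field names, and the dataset names are all hypothetical, chosen only for illustration; a real lineage tool would define its own schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    """One hop in a dataset's history (hypothetical schema for illustration)."""
    source: str          # where the data came from
    target: str          # where the data is going
    transformation: str  # what changed the data, and how
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def describe(events):
    """Render each recorded hop as a readable line."""
    return [f"{e.source} -> {e.target} via {e.transformation}" for e in events]


# Hypothetical utility-industry datasets.
events = [
    LineageEvent("meter_readings_raw", "meter_readings_clean", "drop duplicate readings"),
    LineageEvent("meter_readings_clean", "daily_usage", "aggregate by day"),
]
flow = describe(events)
```

Each event answers the three questions above for a single hop: origin, destination, and transformation. Chaining the events together gives the full data flow.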

How do I benefit?

Because data lineage keeps track of the origin, destination, and intermediate transformation logic, it allows users to trace data both upstream and downstream and gain a better vantage point on the entire story behind the data.

Data quality

Data is not useful unless it can be trusted. By knowing where the data came from and how it was transformed, businesses can have increased confidence in its meaning and validity. This in turn helps optimize business systems and processes, enabling smarter strategic decisions.

Data presentation

Being able to run analytics and report on the data is critical for a multitude of reasons such as compliance, auditing, risk management, error tracking, and discovery.

Impact analysis

Data requirements and processes evolve over time. When a change or data migration occurs, how do we reconcile it with the existing data that may already be in use by other systems? By tracing the downstream processes, data lineage can easily provide this insight and help minimize the risk of the change.
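
As a rough illustration of tracing downstream processes, the sketch below walks a small lineage map breadth-first to list everything affected by a change. The DOWNSTREAM map, the dataset names, and the impacted_by function are assumptions made up for this example, not part of any particular lineage product.

```python
from collections import deque

# Hypothetical lineage map: each dataset points to the datasets built directly from it.
DOWNSTREAM = {
    "meter_readings_raw": ["meter_readings_clean"],
    "meter_readings_clean": ["daily_usage", "outage_alerts"],
    "daily_usage": ["billing_report"],
    "outage_alerts": [],
    "billing_report": [],
}


def impacted_by(changed, downstream=DOWNSTREAM):
    """Return every dataset reachable downstream of `changed`, nearest first."""
    seen = {changed}  # guards against revisiting nodes if the map contains cycles
    order = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


affected = impacted_by("meter_readings_clean")
```

Here a change to meter_readings_clean would flag daily_usage, outage_alerts, and billing_report for review before the change goes live. Reversing the direction of the map would give the upstream trace in the same way.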

Governance

By providing the ability to navigate, query, and focus on the data and metadata, lineage tools enable more effective data governance and give a better view and understanding of the enterprise processes.

Implementation considerations

There’s likely some tradeoff when implementing lineage on unmanaged or poorly managed data. The truest form of lineage requires parsing through the transformation logic between systems. While the result of this process is a reliable data map, it is nevertheless often an expensive and complex undertaking that requires significant manual effort.

On the other end of the spectrum, rules and pattern matching can be employed to implement automated lineage analysis. The result is likely to be less accurate and the understanding not as deep, but the reduced cost and effort may be a worthwhile compromise. Advancements in machine learning can further boost the accuracy of such a process and produce a better outcome.
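
A minimal sketch of the pattern-matching approach, assuming the transformation logic is available as SQL text: two regular expressions pull out the target table and the source tables of an INSERT ... SELECT statement. The patterns, the extract_edges function, and the table names are hypothetical and deliberately simple.

```python
import re

# Deliberately simple patterns: they match plain table names only.
TARGET_RE = re.compile(r"\bINSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_edges(sql):
    """Guess (source, target) lineage edges from one INSERT ... SELECT statement."""
    target = TARGET_RE.search(sql)
    if target is None:
        return []
    sources = sorted(set(SOURCE_RE.findall(sql)))
    return [(source, target.group(1)) for source in sources]


# Hypothetical transformation script.
sql = """
INSERT INTO daily_usage
SELECT r.meter_id, SUM(r.kwh)
FROM meter_readings_clean r
JOIN meters m ON m.id = r.meter_id
GROUP BY r.meter_id
"""
edges = extract_edges(sql)
```

This shows the compromise described above. The rules are cheap to write and run, but they know nothing about SQL structure, so subqueries, common table expressions, or dynamically built statements can be missed or misread where a full parser would handle them.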

Xtensible Solutions recognizes the rising challenges that organizations are facing and is actively making advancements in this area through Affirma, incorporating lineage to deliver both useful and reliable data.

Interested in learning more about data lineage and data management practices? Contact us at Sales@Xtensible.net.

John Wang is a Senior Developer at Xtensible.