#5 - The why, how and what of metadata

Identify why metadata is important and learn the best ways to utilise it

Nov 26, 2023

Hi there,

Following last week’s mini-literature review on Data Contracts, this week, let's continue with another stream of reading focusing on Metadata.

My top-of-mind voice on this topic is none other than Prukalpa, a veteran data leader, passionate advocate of active metadata platforms, co-founder of Atlan and author of Metadata Weekly.

This Week’s Top Picks, therefore, organises and summarises Prukalpa’s best articles to help you grasp the most important ideas on metadata. (Yes, it’s another mini-literature review!)

But first, let’s take a moment to reflect on This Week’s Wisdom.

This Week’s Wisdom

“New opinions are always suspected, and usually opposed, without any other reason but because they are not already common.”
― John Locke, An Essay concerning Human Understanding (1690)

Data leaders, if you are bringing new opinions and proposals to the board room, have you addressed the most common reasons that they might be opposed?

This Week’s Top Picks

Dive into the Why, How, and What of Metadata - summarised insights from Prukalpa

THE PROBLEM (WHY):

Data Governance has a serious branding problem

Traditional data governance is distant, driven by regulations, and often an afterthought. However, data governance should be celebrated, as it’s about creating a better data team rather than controlling it.

Data Documentation Woes? Here’s a Framework

Without good documentation on data, the team relies on a small number of data engineers and data owners to run the show.

The future of data catalogs

The traditional data catalog is passive and expensive, and it makes user adoption hard to drive.

THE SOLUTION (HOW):

The rise of the metadata lake

Metadata can change how data systems operate (e.g. automatic error detection and automatic data pipeline tuning).
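To make "automatic error detection" a little more concrete, here is a minimal sketch in Python. It assumes you already collect one piece of operational metadata, the number of rows loaded by each pipeline run, and it flags runs that stray too far from the typical value. The function name, the sample numbers and the 50% tolerance are all illustrative assumptions, not something taken from Prukalpa's article or from any particular platform.

```python
from statistics import median


def flag_suspicious_runs(row_counts, tolerance=0.5):
    """Return the indexes of runs whose row count deviates from the
    median by more than `tolerance` (expressed as a fraction)."""
    baseline = median(row_counts)
    if baseline == 0:
        # With a zero baseline, any non-empty run is the odd one out.
        return [i for i, n in enumerate(row_counts) if n != 0]
    return [
        i
        for i, n in enumerate(row_counts)
        if abs(n - baseline) / baseline > tolerance
    ]


# Hypothetical run metadata: rows loaded per daily run of one pipeline.
daily_row_counts = [10_200, 9_950, 10_480, 120, 10_105]
suspicious = flag_suspicious_runs(daily_row_counts)
print(suspicious)  # the fourth run (index 3) loaded far fewer rows
```

A real platform would look at many more signals (schema changes, freshness, lineage), but the idea is the same: metadata collected passively becomes a trigger for action.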

The Gartner Magic Quadrant for metadata management was just scrapped

Traditional metadata platforms are passive and don’t drive actions. The new era sees the emergence of active metadata platforms.

What is active metadata and why does it matter

Active metadata is always on, intelligent, action-oriented, open, and accessible by default. This facilitates embedded collaboration and enhances productivity.

Use cases of active metadata include purging unused data assets, dynamically allocating compute resources, enriching the user experience in BI tools, identifying popular assets, and automatically notifying downstream consumers.
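Two of these use cases, purging unused assets and identifying popular ones, can be sketched with nothing more than a query log. The Python example below is a rough illustration under stated assumptions: the asset names, the log format (asset name plus query date) and the 90-day staleness window are hypothetical, and the code does not represent the API of Atlan or any other catalog.

```python
from collections import Counter
from datetime import date, timedelta


def classify_assets(all_assets, query_log, today, stale_after_days=90, top_n=2):
    """Use usage metadata to split assets into purge candidates
    (no queries inside the window) and the most-queried assets."""
    cutoff = today - timedelta(days=stale_after_days)
    recent = Counter(
        asset for asset, queried_on in query_log if queried_on >= cutoff
    )
    purge_candidates = sorted(a for a in all_assets if recent[a] == 0)
    popular = [asset for asset, _ in recent.most_common(top_n)]
    return purge_candidates, popular


# Hypothetical catalog and query log.
assets = ["orders", "customers", "legacy_export"]
log = [
    ("orders", date(2023, 11, 20)),
    ("orders", date(2023, 11, 21)),
    ("customers", date(2023, 11, 2)),
    ("legacy_export", date(2023, 1, 15)),
]

purge, popular = classify_assets(assets, log, today=date(2023, 11, 26))
print("Purge candidates:", purge)
print("Popular assets:", popular)
```

In this sample, `legacy_export` has not been queried within the window and surfaces as a purge candidate, while `orders` and `customers` rank as the popular assets. An active metadata platform would go one step further and act on the result, for example by archiving the table or notifying its owner.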

The future of data catalogs

The secret to magical data experiences lies in flow (i.e. no context switching). Therefore, the future of data catalogs involves active metadata that’s embedded in the workflows of data teams.

Forrester changed the way they think about data catalogs and here’s what you need to know

A modern data catalog should be built to enable DataOps, which is a modern framework for managing the data and analytics product portfolio, as well as the provisioning of data policies and controls.

The metadata foundation that your data mesh needs

One of the key concepts behind a Data Mesh is federated computational governance, which is a system that uses feedback loops and bottom-up input from across the organization to naturally federate and govern data products. Metadata is the key to producing those feedback loops and bottom-up inputs.

THE EXECUTION (WHAT):

The anatomy of an active metadata platform

An active metadata platform consists of the metadata lake, programmable intelligence bots, embedded collaboration plugins, data process automation, and reverse metadata.

Our learnings from 3 failures over 5 years to set up a data catalog:

The real challenge lies in building relevancy into data discovery, i.e. meaningful relationships between data that enable the algorithm to discover what data is actually relevant to a user.
No one person ever has full context about data.
Designing the interface and user experience of a data tool should not be an afterthought.
Embedded collaboration is about work happening where you are, with the least amount of friction.

Editor’s takeaway:

This week, I’m also quite excited to share my takeaway after diving into this topic - as it is perhaps simpler than you would expect:

“Treat data about data (metadata) the same as data about other operational processes or entities. Use it for descriptive, diagnostic, predictive, and prescriptive purposes to enhance your data products, assets, and cultures. Where possible, use data to enable automation (process optimisation), decision making, and monetisation.”

That’s it for this week! If you enjoy the newsletter and the publication, subscribe and stay tuned for more!

Data & Beyond
