I am reading the book The Information: A History, a Theory, a Flood by James Gleick, which recounts a dialogue written by the ancient Chinese philosopher Gongsun Long known as When a White Horse is Not a Horse:
“Horses certainly have color. Hence, there are white horses. If it were the case that horses had no color, there would simply be horses, and then how could one select a white horse? And so it follows that a horse and a white horse are different. Hence, I say that a white horse is not a horse.
Furthermore, a white horse is a horse and white, but horse is that by means of which one names the shape, and white is that by means of which one names the color. What names the color is not what names the shape. Hence, I say that a white horse is not a horse.”
“On its face, this is unfathomable,” explained Gleick, “but it begins to come into focus as a statement about language and logic. Paradoxes like this formed part of what Chinese historians called the language crisis, a running debate over the nature of language. Names are not the things they name.”
One of my favorite topics is how data is not the real world it describes. But perhaps a better data management example of how “names are not the things they name” is metadata. Julie Hunt blogged about this in her post Stumbling Over Metadata, which explored better definitions than the oversimplified “metadata is data about data.”
Metadata can be thought of as a label that provides a definition, description, and context for data. Common examples include relational table definitions and flat file layouts. More detailed examples of metadata include conceptual and logical data models.
Therefore, metadata, among its many other uses, often plays an integral role in determining how your data is used. Although it’s often overlooked, there is a strong relationship between metadata and data quality, and by extension, between metadata and data-driven decision making, since a business intelligence report’s metadata often provides the framing effect for its data.
I have often witnessed what could be called the metadata crisis: a running debate within many organizations over the meaning of commonly used terms like revenue. This debate complicates what seem on the surface like straightforward business questions, such as how much revenue was generated during a particular fiscal reporting period.
A metadata management version of When a White Horse is Not a Horse might be When Recognized Revenue is Not Revenue.
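To make that concrete, here is a minimal sketch of how one set of orders can answer the same revenue question two different ways. Everything in it is hypothetical and invented for illustration: the `Order` fields, the sample amounts and dates, and the two definitions (booked on the contract date versus recognized on the date the revenue is earned). Real revenue recognition rules are far more involved.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Order:
    order_id: str
    booked_on: date      # hypothetical: date the contract was signed
    recognized_on: date  # hypothetical: date the revenue counts as earned
    amount: int          # whole dollars


# Invented sample data, not taken from any real system.
ORDERS = [
    Order("A-1", date(2024, 3, 15), date(2024, 3, 20), 1000),
    Order("A-2", date(2024, 3, 28), date(2024, 4, 5), 2500),
    Order("A-3", date(2024, 4, 2), date(2024, 4, 10), 400),
]

Q1_START = date(2024, 1, 1)
Q1_END = date(2024, 3, 31)


def in_q1(day: date) -> bool:
    """Return True when the day falls inside the sample reporting period."""
    return Q1_START <= day <= Q1_END


def booked_revenue(orders: list[Order]) -> int:
    """Definition 1: revenue counts in the period the order was booked."""
    return sum(o.amount for o in orders if in_q1(o.booked_on))


def recognized_revenue(orders: list[Order]) -> int:
    """Definition 2: revenue counts in the period it was recognized."""
    return sum(o.amount for o in orders if in_q1(o.recognized_on))


if __name__ == "__main__":
    print("Q1 booked revenue:    ", booked_revenue(ORDERS))
    print("Q1 recognized revenue:", recognized_revenue(ORDERS))
```

Both functions read exactly the same rows, yet they disagree about the quarter, because order A-2 was booked in March but recognized in April. The data is the same in both cases. Only the metadata, meaning the definition behind the label "revenue," differs.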
However, the complexities of revenue recognition probably pale in comparison with the metadata crisis that can be caused by what David Loshin calls the most dangerous question in data management: What is the definition of customer?
What examples of the metadata crisis have you encountered in your organization?