Open MIKE Podcast — Episode 09

Method for an Integrated Knowledge Environment (MIKE2.0) is an open source delivery framework for Enterprise Information Management, which provides a comprehensive methodology that can be applied across a number of different projects within the Information Management space.  For more information, click on this link: openmethodology.org/wiki/What_is_MIKE2.0

The Open MIKE Podcast is a video podcast show, hosted by Jim Harris, which discusses aspects of the MIKE2.0 framework and features content contributed to the MIKE2.0 wiki articles, blog posts, and discussion forums.

 

Episode 09: Enterprise Data Management Strategy

If you’re having trouble viewing this video, you can watch it on Vimeo by clicking on this link: Open MIKE Podcast on Vimeo

 

MIKE2.0 Content Featured in or Related to this Podcast

Enterprise Data Management Strategy: openmethodology.org/wiki/Enterprise_Data_Management_Strategy_Solution_Offering

Executive Overview on EDM Strategy: openmethodology.org/w/images/6/6c/Executive_Overview_on_EDM_Strategy.pdf

You can also find the videos and blog post summaries for every episode of the Open MIKE Podcast at: ocdqblog.com/MIKE

The Wisdom of Crowds, Friends, and Experts

I recently finished reading the TED Book by Jim Hornthal, A Haystack Full of Needles, which included an overview of the different predictive approaches taken by one of the most common forms of data-driven decision making in the era of big data, namely, the recommendation engines increasingly provided by websites, social networks, and mobile apps.

These recommendation engines primarily employ one of three techniques, choosing to base their data-driven recommendations on the “wisdom” provided by either crowds, friends, or experts.

 

The Wisdom of Crowds

In his book The Wisdom of Crowds, James Surowiecki explained that the four conditions characterizing wise crowds are diversity of opinion, independent thinking, decentralization, and aggregation.  Amazon’s recommendation engine is a great example of this approach, which assumes that a sufficiently large population of buyers is a good proxy for your purchasing decisions.

For example, Amazon tells you that people who bought James Surowiecki’s bestselling book also bought Thinking, Fast and Slow by Daniel Kahneman, Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business by Jeff Howe, and Wikinomics: How Mass Collaboration Changes Everything by Don Tapscott.  However, Amazon neither provides nor possesses knowledge of why people bought all four of these books, nor any qualification of those readers’ subject matter expertise.
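Amazon’s actual algorithms are proprietary, but the crowd-aggregation idea behind “people who bought this also bought” can be sketched as a simple item co-occurrence counter.  The baskets, book identifiers, and `also_bought` helper below are all invented for illustration:

```python
from collections import Counter

# Hypothetical purchase baskets: each set is one customer's book order.
baskets = [
    {"wisdom_of_crowds", "thinking_fast_and_slow"},
    {"wisdom_of_crowds", "crowdsourcing", "wikinomics"},
    {"wisdom_of_crowds", "thinking_fast_and_slow", "wikinomics"},
    {"crowdsourcing", "wikinomics"},
]

def also_bought(target, baskets, top_n=3):
    """Count how often other items co-occur with `target` across all baskets."""
    co_counts = Counter()
    for basket in baskets:
        if target in basket:
            co_counts.update(basket - {target})
    return co_counts.most_common(top_n)

print(also_bought("wisdom_of_crowds", baskets))
```

Note what the aggregate counts capture and what they don’t: the crowd’s collective behavior stands in for your preferences, but no individual buyer’s reasons or expertise are recorded anywhere.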

These concerns, which we could think of as potential data quality issues, would be exacerbated in a small amount of transaction data, where the eclectic tastes and idiosyncrasies of individual readers would not help us decide what books to buy.  Within a large amount of transaction data, however, we achieve the Wisdom of Crowds effect: taken in aggregate, we receive a general sense of what books we might like to read based on what a diverse group of readers collectively makes popular.

As I blogged about in my post Sometimes it’s Okay to be Shallow, sometimes the aggregated, general sentiment of a large group of unknown, unqualified strangers will be sufficient to effectively make certain decisions.

 

The Wisdom of Friends

Although the influence of our friends and family is the oldest form of data-driven decision making, historically this influence was delivered by word of mouth, which required you to either be there to hear those influential words when they were spoken, or have a large enough network of people you knew that would be able to eventually pass along those words to you.

But the rise of social networking services, such as Twitter and Facebook, has transformed word of mouth into word of data by transcribing our words into short bursts of social data, such as status updates, online reviews, and blog posts.

Facebook “Likes” are a great example of a recommendation engine that uses the Wisdom of Friends, where our decision to buy a book, see a movie, or listen to a song might be based on whether or not our friends like it.  Of course, “friends” is used in a very loose sense in a social network, and not just on Facebook, since it combines strong connections such as actual friends and family, with weak connections such as acquaintances, friends of friends, and total strangers from the periphery of our social network.

Social influence has never ended with the people we know well, as Nicholas Christakis and James Fowler explained in their book Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives.  But the hyper-connected world enabled by the Internet, and further facilitated by mobile devices, has strengthened the social influence of weak connections, and these friends form a smaller crowd whose wisdom is involved in more of our decisions than we may even be aware of.
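As a toy sketch of how a “Wisdom of Friends” score might weight strong and weak ties differently, consider the following hypothetical example.  The names, tie strengths, items, and `friend_score` helper are all invented; this is not Facebook’s actual ranking logic:

```python
# Hypothetical social graph: tie strengths in [0, 1] approximate strong
# connections (family, close friends) versus weak ones (acquaintances,
# friends of friends). All names, weights, and items are invented.
friends = {
    "alice": 0.9,   # close friend
    "bob":   0.7,   # family
    "carol": 0.2,   # acquaintance
    "dave":  0.1,   # friend of a friend
}

likes = {
    "alice": {"book_a"},
    "bob":   {"book_a", "movie_x"},
    "carol": {"movie_x"},
    "dave":  {"movie_x", "song_y"},
}

def friend_score(item):
    """Sum the tie strengths of every friend who liked the item."""
    return sum(w for f, w in friends.items() if item in likes.get(f, set()))

# Two strong ties outweigh three mostly-weak ones:
print(friend_score("book_a"), friend_score("movie_x"))
```

The design choice here mirrors the point above: weak connections still contribute to the score, so the periphery of our social network quietly participates in more of our decisions than we may realize.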

 

The Wisdom of Experts

Since it’s more common to associate wisdom with expertise, Pandora is a great example of a recommendation engine that uses the Wisdom of Experts.  Pandora used a team of musicologists (professional musicians and scholars with advanced degrees in music theory) to deconstruct more than 800,000 songs into 450 musical elements that make up each performance, including qualities of melody, harmony, rhythm, form, composition, and lyrics, as part of what Pandora calls the Music Genome Project.

As Pandora explains, their methodology uses precisely defined terminology, a consistent frame of reference, redundant analysis, and ongoing quality control to ensure that data integrity remains reliably high, believing that delivering a great radio experience to each and every listener requires an incredibly broad and deep understanding of music.

Essentially, experts form the smallest crowd of wisdom.  Of course, experts are not always right.  At the very least, experts are not right about every one of their predictions.  Nor do experts always agree with each other, which is why I imagine that one of the most challenging aspects of the Music Genome Project is getting music experts to consistently apply precisely the same methodology.

Pandora also acknowledges that each individual has a unique relationship with music (i.e., no one else has tastes exactly like yours), and allows you to “Thumbs Up” or “Thumbs Down” songs without affecting other users, producing more personalized results than either the popularity predicted by the Wisdom of Crowds or the similarity predicted by the Wisdom of Friends.
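Pandora’s actual scoring is proprietary, but the expert-attribute approach can be sketched as a small content-based recommender: each song is a vector of expert-assigned attribute scores, similarity is measured between vectors, and a listener’s “Thumbs Down” simply filters that listener’s own candidates.  All song names, attribute counts, and scores below are invented:

```python
import math

# Hypothetical attribute vectors: a three-element stand-in for the hundreds
# of expert-scored musical elements in the Music Genome Project.
songs = {
    "song_a": [0.9, 0.1, 0.8],   # e.g., melody, rhythm, harmony scores
    "song_b": [0.8, 0.2, 0.9],
    "song_c": [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

def recommend(seed, thumbs_down=()):
    """Rank the other songs by similarity to the seed, honoring a listener's
    personal thumbs-down list without affecting anyone else's rankings."""
    candidates = [s for s in songs if s != seed and s not in thumbs_down]
    return sorted(candidates, key=lambda s: cosine(songs[seed], songs[s]), reverse=True)

print(recommend("song_a"))                          # song_b ranks first
print(recommend("song_a", thumbs_down={"song_b"}))  # personalized filtering
```

Because the thumbs-down set is per-listener, the filtering personalizes results without feeding back into anyone else’s rankings, which is the behavior described above.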

 

The Future of Wisdom

It’s interesting to note that the Wisdom of Experts is the only one of these approaches that relies on what data management and business intelligence professionals would consider a rigorous approach to data quality and decision quality best practices.  But this is also why the Wisdom of Experts is the most time-consuming and expensive approach to data-driven decision making.

In the past, the Wisdom of Crowds and Friends was ignored in data-driven decision making for the simple reason that this potential wisdom wasn’t digitized.  But now, in the era of big data, not only are crowds and friends digitized, but technological advancements combined with cost-effective options via open source (data and software) and cloud computing make these approaches quicker and cheaper than the Wisdom of Experts.  And despite the potential data quality and decision quality issues, the Wisdom of Crowds and/or Friends is proving itself a viable option for more categories of data-driven decision making.

I predict that the future of wisdom will increasingly become an amalgamation of experts, friends, and crowds, with the data and techniques from all three potential sources of wisdom often acknowledged as contributors to data-driven decision making.

 

Related Posts

Sometimes it’s Okay to be Shallow

Word of Mouth has become Word of Data

The Wisdom of the Social Media Crowd

Data Management: The Next Generation

Exercise Better Data Management

Darth Vader, Big Data, and Predictive Analytics

Data-Driven Intuition

The Big Data Theory

Finding a Needle in a Needle Stack

Big Data, Predictive Analytics, and the Ideal Chronicler

The Limitations of Historical Analysis

Magic Elephants, Data Psychics, and Invisible Gorillas

OCDQ Radio - Data Quality and Big Data

Big Data: Structure and Quality

HoardaBytes and the Big Data Lebowski

The Data-Decision Symphony

OCDQ Radio - Decision Management Systems

A Tale of Two Datas

Open MIKE Podcast — Episode 08

Method for an Integrated Knowledge Environment (MIKE2.0) is an open source delivery framework for Enterprise Information Management, which provides a comprehensive methodology that can be applied across a number of different projects within the Information Management space.  For more information, click on this link: openmethodology.org/wiki/What_is_MIKE2.0

The Open MIKE Podcast is a video podcast show, hosted by Jim Harris, which discusses aspects of the MIKE2.0 framework and features content contributed to the MIKE2.0 wiki articles, blog posts, and discussion forums.

 

Episode 08: Information Lifecycle Management

If you’re having trouble viewing this video, you can watch it on Vimeo by clicking on this link: Open MIKE Podcast on Vimeo

 

MIKE2.0 Content Featured in or Related to this Podcast

Information Asset Management: openmethodology.org/wiki/Information_Asset_Management_Offering_Group

Information Lifecycle Management: openmethodology.org/wiki/Information_Lifecycle_Management_Solution_Offering

You can also find the videos and blog post summaries for every episode of the Open MIKE Podcast at: ocdqblog.com/MIKE

The Limitations of Historical Analysis

This blog post is sponsored by the Enterprise CIO Forum and HP.

“Those who cannot remember the past are condemned to repeat it,” wrote George Santayana in the early 20th century to caution us against failing to learn the lessons of history.  But with the arrival of the era of big data and the dawn of the data scientist in the early 21st century, it seems like we no longer have to worry about this problem, since big data is allowing us to digitize history, and data science is building sophisticated statistical models with which we can analyze history in order to predict the future.

However, “every model is based on historical assumptions and perceptual biases,” Daniel Rasmus blogged. “Regardless of the sophistication of the science, we often create models that help us see what we want to see, using data selected as a good indicator of such a perception.”  Although perceptual bias is a form of the data silence I previously blogged about, even absent such a bias, there are limitations to what we can predict about the future based on our analysis of the past.

“We must remember that all data is historical,” Rasmus continued. “There is no data from or about the future.  Future context changes cannot be built into a model because they cannot be anticipated.”  Rasmus used the example that no models of retail supply chains in 1962 could have predicted the disruption eventually caused by that year’s debut of a small retailer in Arkansas called Wal-Mart.  And no models of retail supply chains in 1995 could have predicted the disruption eventually caused by that year’s debut of an online retailer called Amazon.  “Not only must we remember that all data is historical,” Rasmus explained, “but we must also remember that at some point historical data becomes irrelevant when the context changes.”

As I previously blogged, despite what its name implies, predictive analytics can’t predict what’s going to happen with certainty, but it can predict some of the possible things that could happen with a certain probability.  Another important distinction is that “there is a difference between being uncertain about the future and the future itself being uncertain,” Duncan Watts explained in his book Everything is Obvious (Once You Know the Answer).  “The former is really just a lack of information — something we don’t know — whereas the latter implies that the information is, in principle, unknowable.  The former is an orderly universe, where if we just try hard enough, if we’re just smart enough, we can predict the future.  The latter is an essentially random world, where the best we can ever hope for is to express our predictions of various outcomes as probabilities.”

“When we look back to the past,” Watts explained, “we do not wish that we had predicted what the search market share for Google would be in 1999.  Instead we would end up wishing we’d been able to predict on the day of Google’s IPO that within a few years its stock price would peak above $500, because then we could have invested in it and become rich.  If our prediction does not somehow help to bring about larger results, then it is of little interest or value to us.  We care about things that matter, yet it is precisely these larger, more significant predictions about the future that pose the greatest difficulties.”

Although we should heed Santayana’s caution and try to learn history’s lessons in order to factor into our predictions about the future what was relevant from the past, as Watts cautioned, there will be many times when “what is relevant can’t be known until later, and this fundamental relevance problem can’t be eliminated simply by having more information or a smarter algorithm.”

Although big data and data science can certainly help enterprises learn from the past in order to predict some probable futures, the future does not always resemble the past.  So, remember the past, but also remember the limitations of historical analysis.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

Data Silence

Magic Elephants, Data Psychics, and Invisible Gorillas

OCDQ Radio - Data Quality and Big Data

Big Data: Structure and Quality

WYSIWYG and WYSIATI

Will Big Data be Blinded by Data Science?

Big Data el Memorioso

Information Overload Revisited

HoardaBytes and the Big Data Lebowski

The Data-Decision Symphony

OCDQ Radio - Decision Management Systems

The Big Data Theory

Finding a Needle in a Needle Stack

Darth Vader, Big Data, and Predictive Analytics

Data-Driven Intuition

A Tale of Two Datas

Data Silence

This blog post is sponsored by the Enterprise CIO Forum and HP.

In the era of big data, information optimization is becoming a major topic of discussion.  But when some people discuss the big potential of big data analytics under the umbrella term of data science, they make it sound like since we have access to all the data we would ever need, all we have to do is ask the Data Psychic the right question and then listen intently to the answer.

However, in his recent blog post Silence Isn’t Always Golden, Bradley S. Fordham, PhD explained that “listening to what the data does not say is often as important as listening to what it does.  There can be various types of silences in data that we must get past to take the right actions.”  Fordham described these data silences as various potential gaps in our analysis.

One data silence is syntactic gaps, where a proportionately small amount of data in a very large data set “will not parse (be converted from raw data into meaningful observations with semantics or meaning) in the standard way.  A common response is to ignore them under the assumption there are too few to really matter.  The problem is that oftentimes these items fail to parse for similar reasons and therefore bear relationships to each other.  So, even though it may only be .1% of the overall population, it is a coherent sub-population that could be telling us something if we took the time to fix the syntactic problems.”
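The key move Fordham recommends is to classify parse failures by shared cause rather than discard them as noise.  A minimal sketch, using an invented record format and failure taxonomy:

```python
import re
from collections import Counter

# Hypothetical raw records: most parse as "name,YYYY-MM-DD,amount", but a
# small sub-population fails in a *shared* way.
records = [
    "alice,2023-01-15,100.00",
    "bob,2023-02-20,250.50",
    "carol,15/01/2023,75.25",   # fails to parse: non-ISO date
    "dave,20/02/2023,80.00",    # fails to parse: non-ISO date
    "eve,2023-03-10,xyz",       # fails to parse: non-numeric amount
]

ROW = re.compile(r"^[^,]+,\d{4}-\d{2}-\d{2},\d+\.\d{2}$")

def classify_failure(record):
    """Attribute a parse failure to its likely cause."""
    _name, date, _amount = record.split(",")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        return "non-ISO date"
    return "non-numeric amount"

failures = [r for r in records if not ROW.match(r)]
reasons = Counter(classify_failure(r) for r in failures)
print(reasons)  # the related "non-ISO date" failures form a coherent sub-population
```

Grouped this way, the cluster of same-cause failures surfaces as a sub-population worth fixing rather than a rounding error worth ignoring.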

This data silence reminded me of my podcast discussion with Thomas C. Redman, PhD about big data and data quality, during which we discussed how some people erroneously assume that data quality issues can be ignored in larger data sets.

Another data silence is inferential gaps, which involves basing an inference on only one variable in a data set.  The example Fordham used is from a data set showing that 41% of the cars sold during the first quarter of the year were blue, from which we might be tempted to infer that customers bought more blue cars because they preferred blue.  However, by looking at additional variables in the data set and noticing that “70% of the blue cars sold were from the previous model year, it is likely they were discounted to clear them off the lots, thereby inflating the proportion of blue cars sold.  So, maybe blue wasn’t so popular after all.”
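Fordham’s blue-car example can be sketched with hypothetical sales records constructed to mirror his numbers: grouping on color alone yields the misleading 41%, while a second variable (model year) surfaces the discounting explanation:

```python
# Hypothetical car-sales records as (color, model_year) pairs, constructed
# so that 41% of sales are blue and roughly 70% of the blue cars sold are
# prior-model-year stock.
sales = (
    [("blue", 2011)] * 29 + [("blue", 2012)] * 12 +
    [("red", 2012)] * 35 + [("silver", 2012)] * 24
)

blue = [s for s in sales if s[0] == "blue"]
blue_share = len(blue) / len(sales)                    # 0.41
prior_year_share = sum(
    1 for _, year in blue if year == 2011
) / len(blue)                                          # roughly 0.71

# Grouping on color alone suggests blue is popular; the second variable
# (model_year) suggests discounted old stock inflated blue's share.
print(blue_share, prior_year_share)
```

The inference changes entirely once the second variable enters the analysis, which is the whole point of listening for inferential gaps.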

Another data silence Fordham described using the same data set is gaps in field of view.  “At first glance, knowing everything on the window sticker of every car sold in the first quarter seems to provide a great set of data to understand what customers wanted and therefore were buying.  At least it did until we got a sinking feeling in our stomachs because we realized that this data only considers what the auto manufacturer actually built.  That field of view is too limited to answer the important customer desire and motivation questions being asked.  We need to break the silence around all the things customers wanted that were not built.”

This data silence reminded me of WYSIATI, which is an acronym coined by Daniel Kahneman to describe how the data you are looking at can greatly influence you to jump to the comforting, but false, conclusion that “what you see is all there is,” thereby preventing you from expanding your field of view to notice what data might be missing from your analysis.

As Fordham concluded, “we need to be careful to listen to all the relevant data, especially the data that is silent within our current analyses.  Applying that discipline will help avoid many costly mistakes that companies make by taking the wrong actions from data even with the best of techniques and intentions.”

Therefore, in order for your enterprise to leverage big data analytics for business success, you not only need to adopt a mindset that embraces the principles of data science, you also need to make sure that your ears are set to listen for data silence.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

Magic Elephants, Data Psychics, and Invisible Gorillas

OCDQ Radio - Data Quality and Big Data

Big Data: Structure and Quality

WYSIWYG and WYSIATI

Will Big Data be Blinded by Data Science?

Big Data el Memorioso

Information Overload Revisited

HoardaBytes and the Big Data Lebowski

The Data-Decision Symphony

OCDQ Radio - Decision Management Systems

The Big Data Theory

Finding a Needle in a Needle Stack

Darth Vader, Big Data, and Predictive Analytics

Data-Driven Intuition

A Tale of Two Datas

Open MIKE Podcast — Episode 06

Method for an Integrated Knowledge Environment (MIKE2.0) is an open source delivery framework for Enterprise Information Management, which provides a comprehensive methodology that can be applied across a number of different projects within the Information Management space.  For more information, click on this link: openmethodology.org/wiki/What_is_MIKE2.0

The Open MIKE Podcast is a video podcast show, hosted by Jim Harris, which discusses aspects of the MIKE2.0 framework and features content contributed to the MIKE2.0 wiki articles, blog posts, and discussion forums.

 

Episode 06: Getting to Know NoSQL

If you’re having trouble viewing this video, you can watch it on Vimeo by clicking on this link: Open MIKE Podcast on Vimeo

 

MIKE2.0 Content Featured in or Related to this Podcast

Big Data Solution Offering: openmethodology.org/wiki/Big_Data_Solution_Offering

Preparing for NoSQL: openmethodology.org/wiki/Preparing_for_NoSQL

Hadoop and the Enterprise Debates: openmethodology.org/wiki/Hadoop_and_the_Enterprise_Debates

Big Data Definition: openmethodology.org/wiki/Big_Data_Definition

Big Sensor Data: openmethodology.org/wiki/Big_sensor_data

You can also find the videos and blog post summaries for every episode of the Open MIKE Podcast at: ocdqblog.com/MIKE

 

Related Posts

Data Management: The Next Generation

Is DW before BI going Bye-Bye?

Our Increasingly Data-Constructed World

Dot Collectors and Dot Connectors

HoardaBytes and the Big Data Lebowski

OCDQ Radio - Data Quality and Big Data

Exercise Better Data Management

A Tale of Two Datas

Big Data Lessons from Orbitz

The Graystone Effects of Big Data

Will Big Data be Blinded by Data Science?

Magic Elephants, Data Psychics, and Invisible Gorillas

Big Data el Memorioso

Information Overload Revisited

Finding a Needle in a Needle Stack

Darth Vader, Big Data, and Predictive Analytics

Swimming in Big Data

The Big Data Theory

Big Data: Structure and Quality

Sometimes it’s Okay to be Shallow

Availability Bias and Data Quality Improvement

The availability heuristic is a mental shortcut that occurs when people make judgments based on the ease with which examples come to mind.  Although this heuristic can be beneficial, such as when it helps us recall examples of a dangerous activity to avoid, sometimes it leads to availability bias, where we’re affected more strongly by the ease of retrieval than by the content retrieved.

In his thought-provoking book Thinking, Fast and Slow, Daniel Kahneman explained how availability bias works by recounting an experiment where different groups of college students were asked to rate a course they had taken the previous semester by listing ways to improve the course — while varying the number of improvements that different groups were required to list.

Counterintuitively, students in the group required to list more necessary improvements gave the course a higher rating, whereas students in the group required to list fewer necessary improvements gave the course a lower rating.

According to Kahneman, the extra cognitive effort expended by the students required to list more improvements biased them into believing it was difficult to list necessary improvements, leading them to conclude that the course didn’t need much improvement.  Conversely, the little cognitive effort expended by the students required to list fewer improvements biased them into concluding that, since it was so easy to list necessary improvements, the course obviously needed improvement.

This is counterintuitive because you’d think that the students would rate the course based on an assessment of the information retrieved from their memory, regardless of how easy that information was to retrieve.  It would have made more sense for the course to be rated higher for needing fewer improvements, but availability bias led the students to the opposite conclusion.

Availability bias can also affect an organization’s discussions about the need for data quality improvement.

If you asked stakeholders to rate the organization’s data quality by listing business-impacting incidents of poor data quality, would they reach a different conclusion if you asked them to list one incident versus asking them to list at least ten incidents?

In my experience, an event where poor data quality negatively impacted the organization, such as a regulatory compliance failure, is often easily dismissed by stakeholders as an isolated incident to be corrected by a one-time data cleansing project.

But would forcing stakeholders to list ten business-impacting incidents of poor data quality make them concede that data quality improvement should be supported by an ongoing program?  Or would the extra cognitive effort bias them into concluding, since it was so difficult to list ten incidents, that the organization’s data quality doesn’t really need much improvement?

I think that the availability heuristic helps explain why most organizations easily approve reactive data cleansing projects, and availability bias helps explain why most organizations usually resist proactively initiating a data quality improvement program.

 

Related Posts

DQ-View: The Five Stages of Data Quality

Data Quality: Quo Vadimus?

Data Quality and Chicken Little Syndrome

The Data Quality Wager

You only get a Return from something you actually Invest in

“Some is not a number and soon is not a time”

Why isn’t our data quality worse?

Data Quality and the Bystander Effect

Data Quality and the Q Test

Perception Filters and Data Quality

Predictably Poor Data Quality

WYSIWYG and WYSIATI

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Redefining Data Quality — Guest Peter Perera discusses his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Open MIKE Podcast — Episode 05

Method for an Integrated Knowledge Environment (MIKE2.0) is an open source delivery framework for Enterprise Information Management, which provides a comprehensive methodology that can be applied across a number of different projects within the Information Management space.  For more information, click on this link: openmethodology.org/wiki/What_is_MIKE2.0

The Open MIKE Podcast is a video podcast show, hosted by Jim Harris, which discusses aspects of the MIKE2.0 framework and features content contributed to the MIKE2.0 wiki articles, blog posts, and discussion forums.

 

Episode 05: Defining Big Data

If you’re having trouble viewing this video, you can watch it on Vimeo by clicking on this link: Open MIKE Podcast on Vimeo

 

MIKE2.0 Content Featured in or Related to this Podcast

Big Data Definition: openmethodology.org/wiki/Big_Data_Definition

Big Sensor Data: openmethodology.org/wiki/Big_sensor_data

Hadoop and the Enterprise Debates: openmethodology.org/wiki/Hadoop_and_the_Enterprise_Debates

Preparing for NoSQL: openmethodology.org/wiki/Preparing_for_NoSQL

Big Data Solution Offering: openmethodology.org/wiki/Big_Data_Solution_Offering

You can also find the videos and blog post summaries for every episode of the Open MIKE Podcast at: ocdqblog.com/MIKE

 

Related Posts

Our Increasingly Data-Constructed World

Dot Collectors and Dot Connectors

HoardaBytes and the Big Data Lebowski

OCDQ Radio - Data Quality and Big Data

Exercise Better Data Management

A Tale of Two Datas

Big Data Lessons from Orbitz

The Graystone Effects of Big Data

Will Big Data be Blinded by Data Science?

Magic Elephants, Data Psychics, and Invisible Gorillas

Big Data el Memorioso

Information Overload Revisited

Finding a Needle in a Needle Stack

Darth Vader, Big Data, and Predictive Analytics

Why Can’t We Predict the Weather?

Swimming in Big Data

The Big Data Theory

Big Data: Structure and Quality

Sometimes it’s Okay to be Shallow

Small Data and VRM

A Tale of Two Datas

Is big data more than just lots and lots of data?  Is big data unstructured and not-so-big data structured?  Malcolm Chisholm explored these questions in his recent Information Management column, where he posited that there are, in fact, two datas.

“One type of data,” Chisholm explained, “represents non-material entities in vast computerized ecosystems that humans create and manage.  The other data consists of observations of events, which may concern material or non-material entities.”

Providing an example of the first type, Chisholm explained, “my bank account is not a physical thing at all; it is essentially an agreed upon idea between myself, the bank, the legal system, and the regulatory authorities.  It only exists insofar as it is represented, and it is represented in data.  The balance in my bank account is not some estimate with a positive and negative tolerance; it is exact.  The non-material entities of the financial sector are orderly human constructs.  Because they are orderly, we can more easily manage them in computerized environments.”

The orderly human constructs represented in data, and the stories told by data (including the stories data tell about us and the stories we tell data), form one of my favorite topics.  In our increasingly data-constructed world, it’s important to occasionally remind ourselves that data and the real world are not the same thing, especially when data represents non-material entities since, with the possible exception of Makers using 3-D printers, data-represented entities do not re-materialize into the real world.

Describing the second type, Chisholm explained, “a measurement is usually a comparison of a characteristic using some criteria, a count of certain instances, or the comparison of two characteristics.  A measurement can generally be quantified, although sometimes it’s expressed in a qualitative manner.  I think that big data goes beyond mere measurement, to observations.”

Chisholm called the first type the Data of Representation, and the second type the Data of Observation.

The data of representation tends to be structured, in the relational sense, but doesn’t need to be (e.g., graph databases), and the data of observation tends to be unstructured, but it can also be structured (e.g., the structured observations generated by either a data profiling tool analyzing structured relational tables or flat files, or a word-counting algorithm analyzing unstructured text).
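As a minimal sketch of such structured observations, here is a toy data profiler over hypothetical rows; it is not modeled on any particular profiling tool’s output:

```python
# A toy data profiler: raw rows in, structured observations out
# (per-column null count, distinct count, and numeric-ness).
rows = [
    {"name": "alice", "age": "34"},
    {"name": "bob",   "age": None},
    {"name": "carol", "age": "28"},
]

def profile(rows):
    """Emit per-column observations about the data set."""
    observations = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        non_null = [v for v in values if v is not None]
        observations[col] = {
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "all_numeric": all(v.isdigit() for v in non_null),
        }
    return observations

print(profile(rows))
```

Whatever form the input takes, the profiler’s output is itself orderly, structured data of observation, which illustrates why structure describes form rather than essence.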

“Structured and unstructured,” Chisholm concluded, “describe form, not essence, and I suggest that representation and observation describe the essences of the two datas.  I would also submit that both datas need different data management approaches.  We have a good idea what these are for the data of representation, but much less so for the data of observation.”

I agree that there are two types of data (i.e., representation and observation, not big and not-so-big) and that different data uses will require different data management approaches.  Although data modeling is still important and data quality still matters, how much data modeling and data quality is needed before data can be effectively used for specific business purposes will vary.

In order to move our discussions forward regarding “big data” and its data management and business intelligence challenges, we have to stop fiercely defending our traditional perspectives about structure and quality in order to effectively manage both the form and essence of the two datas.  We also have to stop fiercely defending our traditional perspectives about data analytics, since there will be some data use cases where depth and detailed analysis may not be necessary to provide business insight.

 

A Tale of Two Datas

In conclusion, and with apologies to Charles Dickens and his A Tale of Two Cities, I offer the following A Tale of Two Datas:

It was the best of times, it was the worst of times.
It was the age of Structured Data, it was the age of Unstructured Data.
It was the epoch of SQL, it was the epoch of NoSQL.
It was the season of Representation, it was the season of Observation.
It was the spring of Big Data Myth, it was the winter of Big Data Reality.
We had everything before us, we had nothing before us,
We were all going direct to hoarding data, we were all going direct the other way.
In short, the period was so far like the present period, that some of its noisiest authorities insisted on its being signaled, for Big Data or for not-so-big data, in the superlative degree of comparison only.

Related Posts

HoardaBytes and the Big Data Lebowski

The Idea of Order in Data

The Most August Imagination

Song of My Data

The Lies We Tell Data

Our Increasingly Data-Constructed World

Plato’s Data

OCDQ Radio - Demystifying Master Data Management

OCDQ Radio - Data Quality and Big Data

Big Data: Structure and Quality

Swimming in Big Data

Sometimes it’s Okay to be Shallow

Darth Vader, Big Data, and Predictive Analytics

The Big Data Theory

Finding a Needle in a Needle Stack

Exercise Better Data Management

Magic Elephants, Data Psychics, and Invisible Gorillas

Why Can’t We Predict the Weather?

Data and its Relationships with Quality

A Tale of Two Q’s

A Tale of Two G’s

Turning the M Upside Down

I am often asked about the critical success factors for enterprise initiatives, such as data quality, master data management, and data governance.

Although there is no one thing that can guarantee success, if forced to choose one critical success factor to rule them all, I would choose collaboration.

But, of course, whenever I say this, everyone rolls their eyes at me (yes, I can see you doing it right now through your computer) because it sounds like I’m dodging the complex concepts underlying enterprise initiatives by choosing collaboration.

The importance of collaboration is a very simple concept but, as Amy Ray and Emily Saliers taught me, “the hardest to learn was the least complicated.”

 

The Pronoun Test

Although all organizations must define the success of enterprise initiatives in business terms (e.g., mitigated risks, reduced costs, or increased revenue), collaborative organizations understand that the most important factor for enduring business success is the willingness of people all across the enterprise to mutually pledge to each other their communication, cooperation, and trust.

These organizations pass what Robert Reich calls the Pronoun Test.  When their employees make references to the company, it’s done with the pronoun We and not They.  The latter suggests at least some amount of disengagement, and perhaps even alienation, whereas the former suggests the opposite — employees feel like part of something significant and meaningful.

An even more basic form of the Pronoun Test is whether or not people can look beyond their too often self-centered motivations and selflessly include themselves in a collaborative effort.  “It’s amazing how much can be accomplished if no one cares who gets the credit” is an old quote for which, with an appropriate irony, it is rather difficult to identify the original source.

Collaboration requires a simple, but powerful, paradigm shift that I call Turning the M Upside Down — turning Me into We.

 

Related Posts

The Algebra of Collaboration

The Business versus IT—Tear down this wall!

The Road of Collaboration

Dot Collectors and Dot Connectors

No Datum is an Island of Serendip

The Three Most Important Letters in Data Governance

The Stakeholder’s Dilemma

Shining a Social Light on Data Quality

Data Quality and the Bystander Effect

The Family Circus and Data Quality

The Year of the Datechnibus

Being Horizontally Vertical

The Collaborative Culture of Data Governance

Collaboration isn’t Brain Surgery

Are you Building Bridges or Digging Moats?

Open MIKE Podcast — Episode 03


 

Episode 03: Data Quality Improvement and Data Investigation

If you’re having trouble viewing this video, you can watch it on Vimeo by clicking on this link: Open MIKE Podcast on Vimeo

 

MIKE2.0 Content Featured in or Related to this Podcast

Enterprise Data Management: openmethodology.org/wiki/Enterprise_Data_Management_Offering_Group

Data Quality Improvement: openmethodology.org/wiki/Data_Quality_Improvement_Solution_Offering

Data Investigation: openmethodology.org/wiki/Category:Data_Investigation_and_Re-Engineering

You can also find the videos and blog post summaries for every episode of the Open MIKE Podcast at: ocdqblog.com/MIKE

Cooks, Chefs, and Data Governance

In their book Practical Wisdom, Barry Schwartz and Kenneth Sharpe quoted retired Lieutenant Colonel Leonard Wong, who is a Research Professor of Military Strategy in the Strategic Studies Institute at the United States Army War College, focusing on the human and organizational dimensions of the military.

“Innovation,” Wong explained, “develops when an officer is given a minimal number of parameters (e.g., task, condition, and standards) and the requisite time to plan and execute the training.  Giving the commanders time to create their own training develops confidence in operating within the boundaries of a higher commander’s intent without constant supervision.”

According to Wong, too many rules and requirements “remove all discretion, resulting in reactive instead of proactive thought, compliance instead of creativity, and adherence instead of audacity.”  Wong believed that it came down to a difference between cooks, those who are quite adept at carrying out a recipe, and chefs, those who can look at the ingredients available to them and create a meal.  A successful military strategy is executed by officers who are trained to be chefs, not cooks.

Data Governance’s Kitchen

Data governance requires coordinating a myriad of factors, including executive sponsorship, funding, decision rights, arbitration of conflicting priorities, policy definition, policy implementation, data quality remediation, data stewardship, business process optimization, technology enablement, and, perhaps most notably, policy enforcement.

Because of this complexity, many organizations think the only way to run data governance’s kitchen is to institute a bureaucracy that dictates policies and demands compliance.  In other words, data governance policies are recipes and employees are cooks.

Although implementing data governance policies does occasionally require a cook’s adeptness at carrying out a recipe, the long-term success of a data governance program also requires chefs.  The dynamic challenges faced, and overcome daily, by business analysts, data stewards, technical architects, and others exemplify today’s constantly changing business world, which cannot be successfully governed by forcing employees to systematically apply rules or follow rigid procedures.

Data governance requires chefs who are empowered with an understanding of the principles behind the policies, and who are trusted to figure out how best to implement those policies in a particular business context, combining rules with the organizational ingredients available to them to create a flexible procedure that operates within the boundaries of the policy’s principles.

But, of course, just as a military cannot be staffed entirely by officers, and a kitchen cannot be staffed entirely by chefs, implementing a data governance program successfully requires both cooks and chefs.

Similar to how data governance is neither all-top-down nor all-bottom-up, it’s also neither all-cook nor all-chef.

Only the unique corporate culture of your organization can determine how to best staff your data governance kitchen.

Open MIKE Podcast — Episode 02


 

Episode 02: Information Governance and Distributing Power

If you’re having trouble viewing this video, you can watch it on Vimeo by clicking on this link: Open MIKE Podcast on Vimeo

 

MIKE2.0 Content Featured in or Related to this Podcast

Information Governance: openmethodology.org/wiki/Information_Governance_Solution_Offering

Governance 2.0: openmethodology.org/wiki/Governance_2.0_Solution_Offering

You can also find the videos and blog post summaries for every episode of the Open MIKE Podcast at: ocdqblog.com/MIKE