11 dark secrets of data management | 7wData

Some call data the new oil. Others call it the new gold. Philosophers and economists may argue about the quality of the metaphor, but there’s no doubt that organizing and analyzing data is a vital endeavor for any enterprise looking to deliver on the promise of data-driven decision-making.

And to do so, a solid data management strategy is key. Encompassing data governance, data ops, data warehousing, data engineering, data analytics, data science, and more, data management, when done right, can provide businesses in every industry a competitive edge.

The good news is that many facets of data management are well understood and grounded in sound principles that have evolved over decades. They may not be easy to apply or simple to comprehend, but thanks to bench scientists and mathematicians alike, companies now have a range of logistical frameworks for analyzing data and reaching conclusions. More importantly, we also have statistical models that draw error bars delineating the limits of our analysis.

But for all the good that’s come out of the study of data science and the various disciplines that fuel it, sometimes we’re still left scratching our heads. Enterprises often bump up against the limits of the field. Some of the paradoxes relate to the practical challenges of gathering and organizing so much data. Others are philosophical, testing our ability to reason about abstract qualities. And then there is the rise of privacy concerns around so much data being collected in the first place.

Following are some of the dark secrets that make data management such a challenge for so many enterprises.

Much of the data stored away in the corporate archives doesn’t have much structure at all. One of my friends yearns to use an AI to search through the text notes taken by call center staff at his bank. These sentences may contain insights that could help improve the bank’s lending and services. Perhaps. But the notes were taken by hundreds of different people with different ideas of what to write down about a given call. Moreover, staff members have different writing styles and abilities. Some didn’t write much at all. Others wrote down too much information about their calls. Text by itself doesn’t have much structure to begin with, and when you’ve got a pile of text written by hundreds or thousands of employees over dozens of years, whatever structure there is might be even weaker.

Good scientists and database administrators guide databases by specifying the type and structure of each field. Sometimes, in the name of even more structure, they limit the values in a given field to integers in certain ranges or to predefined choices. Even then, the people filling out the forms that the database stores find ways to add wrinkles and glitches. Sometimes fields are left empty. Other people put in a dash or the abbreviation “n.a.” when they think a question doesn’t apply. People even spell their names differently from year to year, day to day, or even line to line on the same form. Good developers can catch some of these issues through validation. Good data scientists can also reduce some of this uncertainty through cleansing. But it’s still maddening that even the most structured tables have questionable entries, and that those questionable entries can introduce unknowns and even errors in analysis.
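To make the validation and cleansing steps concrete, here is a minimal Python sketch. The placeholder list, the function names, and the age range are all assumptions chosen for illustration, not a standard or anything a particular database prescribes. Note that the name normalization only evens out case and spacing; it would not reconcile genuinely different spellings.

```python
import re

# Hypothetical set of placeholder strings people type when a question
# does not apply; a real list would come from inspecting the data.
PLACEHOLDERS = {"", "-", "--", "n.a.", "n/a", "na", "none"}

def clean_field(raw):
    """Return a trimmed value, or None when the entry is empty or a placeholder."""
    if raw is None:
        return None
    value = raw.strip()
    if value.lower() in PLACEHOLDERS:
        return None
    return value

def normalize_name(raw):
    """Collapse whitespace and case so trivially different entries compare equal."""
    value = clean_field(raw)
    if value is None:
        return None
    return re.sub(r"\s+", " ", value).casefold()

def validate_age(raw, low=0, high=120):
    """Return the age as an int if it parses and falls in [low, high], else None."""
    value = clean_field(raw)
    if value is None:
        return None
    try:
        age = int(value)
    except ValueError:
        return None
    return age if low <= age <= high else None
```

Cleansing of this kind reduces the noise, but it also makes judgment calls (is “none” a placeholder or a real answer?), which is one way questionable entries turn into unknowns in later analysis.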

No matter how hard data teams try to spell out schema constraints, the resulting schemas for defining the values in the various data fields are either too strict or too loose. If the data team adds tight constraints, users complain that their answers aren’t found on the narrow list of acceptable values. If the schema is too accommodating, users can add strange values with little consistency. It’s almost impossible to tune the schema just right.
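The trade-off can be sketched in a few lines of Python. The `ContactReason` values and the sample answers below are hypothetical, invented only to show the two failure modes side by side: a strict schema that rejects reasonable answers, and a loose one that accepts everything and ends up with inconsistent values.

```python
from enum import Enum

class ContactReason(Enum):
    # Hypothetical list of acceptable values for illustration only.
    BILLING = "billing"
    LOAN = "loan"
    CARD = "card"

def strict_parse(answer):
    """Strict schema: accept only listed values; raises ValueError otherwise."""
    return ContactReason(answer.strip().lower())

def loose_parse(answer):
    """Loose schema: accept any text exactly as typed."""
    return answer

# Made-up sample answers from a form.
answers = ["Billing", "mortgage question", "loan", "LOAN!!", "card "]

# Strict: collect the answers users would complain about being unable to enter.
rejected = []
for a in answers:
    try:
        strict_parse(a)
    except ValueError:
        rejected.append(a)

# Loose: every variant survives as its own distinct value.
distinct_loose = {loose_parse(a) for a in answers}
```

Under the strict schema, two of the five answers are turned away; under the loose one, all five are stored as separate values even though several mean the same thing.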

Laws about privacy and data protection are strong and are only getting stronger.
