Data Governance and AI Model Drift

February 5, 2024

By Ronald Zurawski, Prolifics Data Governance Strategist and Solution Architect 

One of the new terms spreading into the Data Governance vernacular is “drift.” How do we, as Data Governance specialists, adjust our programs to accommodate drift and the regulatory and compliance issues it may raise?

First, let’s look at what “drift” is. Suppose the developers have built an AI model and trained it to do a specific task. Over time, new inputs may need to be added for the model to evaluate. For example, in the Health Insurance industry, claims are evaluated against ICD-10 codes, and these codes change over time. If developers simply add the new codes to the training set and put the model back into learning mode, how will the model’s future output change? It will change, and that change is called drift. The business will need to evaluate whether this drift is good or bad for the model’s intended purpose.
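One way to make “how will the output change?” measurable is to compare the distribution of the model’s outputs before and after retraining. The minimal sketch below uses the Population Stability Index (PSI), a commonly used drift metric; it is one technique among several, not a method the article prescribes. The claim decision categories, the counts and the function names are hypothetical.

```python
import math
from collections import Counter


def output_distribution(predictions, categories):
    """Share of predictions that fall in each category (e.g. claim decisions)."""
    counts = Counter(predictions)
    total = len(predictions)
    return [counts[c] / total for c in categories]


def population_stability_index(baseline, current, categories, eps=1e-6):
    """PSI between two samples of categorical model output.

    eps keeps log() defined when a category is absent from one sample.
    """
    base = output_distribution(baseline, categories)
    curr = output_distribution(current, categories)
    psi = 0.0
    for pb, pc in zip(base, curr):
        pb, pc = max(pb, eps), max(pc, eps)
        psi += (pc - pb) * math.log(pc / pb)
    return psi


# Hypothetical claim decisions from the same model before and after retraining.
categories = ["approve", "deny", "review"]
before = ["approve"] * 70 + ["deny"] * 20 + ["review"] * 10
after = ["approve"] * 55 + ["deny"] * 30 + ["review"] * 15

psi = population_stability_index(before, after, categories)
print(f"PSI = {psi:.3f}")
# A commonly quoted rule of thumb: below 0.1 stable, 0.1 to 0.25 moderate shift,
# above 0.25 major shift. Whether a shift is acceptable is still a business call.
```

A PSI value only says that the output moved; deciding whether the movement is good or bad still falls to the business, SMEs and data owners.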

From a Data Governance perspective, we might profile the output data from the AI model and then work with the SMEs (subject matter experts) and data owners to determine whether the results are acceptable. Going back to our Health Insurance example, some ICD-10 codes may be added and others deprecated. We know that smaller providers may not update their systems immediately and may keep submitting deprecated codes, yet some of those claims may still need to be forwarded to other entities, such as the government, for reimbursement.
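To make the profiling step concrete, here is a minimal sketch that tallies the model’s decisions by whether each claim’s code is current, deprecated or unrecognized, and sets aside the non-current claims for SME and data-owner review. The code sets, claim records and function names are hypothetical placeholders, not real diagnosis-code release data.

```python
from collections import Counter

# Hypothetical code-set release: placeholders, not real ICD-10 codes.
CURRENT_CODES = {"HYP.01", "HYP.02", "HYP.03"}  # valid in the current release
DEPRECATED_CODES = {"OLD.99"}                   # retired in the current release


def code_status(code):
    """Classify a claim's diagnosis code against the current release."""
    if code in CURRENT_CODES:
        return "current"
    if code in DEPRECATED_CODES:
        return "deprecated"
    return "unknown"


def profile_model_output(claims):
    """Tally model decisions by code status and collect claims needing review."""
    tally = Counter()
    needs_review = []
    for claim in claims:
        status = code_status(claim["code"])
        tally[(status, claim["decision"])] += 1
        if status != "current":
            needs_review.append(claim)
    return tally, needs_review


# Hypothetical model output: one row per scored claim.
claims = [
    {"claim_id": 1, "code": "HYP.01", "decision": "approve"},
    {"claim_id": 2, "code": "HYP.02", "decision": "deny"},
    {"claim_id": 3, "code": "OLD.99", "decision": "deny"},  # provider not yet updated
    {"claim_id": 4, "code": "ZZZ.00", "decision": "approve"},
]

tally, needs_review = profile_model_output(claims)
for (status, decision), count in sorted(tally.items()):
    print(f"{status:<10} {decision:<8} {count}")
print("For SME review:", [c["claim_id"] for c in needs_review])
```

Grouping by code status first keeps the review list small: SMEs look only at the claims where the code set and the model’s training may disagree.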

How do we, as Data Governance specialists, interact with these AI models? Do we mandate that AI model code and training data sets be secured and archived before the model is placed into production? Will this become a regulatory requirement? Do we laboriously review multiple iterations of large AI model outputs to confirm that drift has not caused the model to stray from its initial accuracy or purpose?
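If archiving becomes a requirement, one lightweight control is a manifest that records a cryptographic hash of the model code and training data at the moment the model goes into production, so any later change can be detected. The sketch below assumes this hashing approach; the model name, artifact names and functions are hypothetical, and the artifacts are held in memory only to keep the example self-contained.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Content fingerprint for one artifact."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(model_name, version, artifacts):
    """Record a SHA-256 hash for each artifact (name -> bytes) at archive time."""
    return {
        "model": model_name,
        "version": version,
        "archived_at": datetime.now(timezone.utc).isoformat(),
        "artifacts": {name: sha256_hex(blob) for name, blob in sorted(artifacts.items())},
    }


def verify(manifest, artifacts):
    """Names of artifacts that changed, were added, or went missing since archiving."""
    recorded = manifest["artifacts"]
    changed = {name for name, blob in artifacts.items()
               if recorded.get(name) != sha256_hex(blob)}
    missing = set(recorded) - set(artifacts)
    return sorted(changed | missing)


# Hypothetical artifacts; in practice these would be read from the
# model's code repository and training-data store.
artifacts = {
    "model_code.py": b"def predict(claim): ...\n",
    "training_set.csv": b"claim_id,code,label\n1,HYP.01,approve\n",
}
manifest = build_manifest("claims-model", "1.0.0", artifacts)
print(json.dumps(manifest, indent=2))

# Later, someone adds new codes to the training set before retraining.
artifacts["training_set.csv"] += b"2,HYP.02,deny\n"
print("Changed since archive:", verify(manifest, artifacts))
```

Hashes prove what was archived without storing a second copy in the manifest itself; the archived artifacts still need to be retained separately if the model must be reproduced.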

In the past, our software, operating systems and underlying infrastructure components were built to be deterministic: they always produced the same result. We were assured that an If-Then statement would return the same result every time, and we could build test systems to confirm this. Now we are working with large numbers of floating-point calculations interacting inside a neural network. If we change the order in which the underlying AI model evaluates its neural network, can we guarantee the same results?
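The concern about evaluation order is concrete: floating-point addition is not associative, so summing the same numbers in a different order can produce a slightly different result, and parallel hardware does not always sum in a fixed order. This minimal sketch shows the effect with three values and a hypothetical approval threshold; the numbers and the threshold are illustrative, not taken from any real model.

```python
import math

# Three hypothetical partial scores that a model might add together.
a, b, c = 0.1, 0.2, 0.3
THRESHOLD = 0.6  # hypothetical cut-off: approve the claim if score > THRESHOLD

score_left = (a + b) + c   # one evaluation order
score_right = a + (b + c)  # same numbers, different order

print(repr(score_left), repr(score_right))
print("equal?", score_left == score_right)
print("approve (left order)?", score_left > THRESHOLD)
print("approve (right order)?", score_right > THRESHOLD)

# math.fsum returns the correctly rounded sum, so it does not depend on order.
print("fsum order-independent?", math.fsum([a, b, c]) == math.fsum([c, b, a]))
```

The two scores differ only in the last bit, yet that is enough to flip a decision that sits right at the threshold, which is why bit-for-bit test expectations from the If-Then era do not carry over directly.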

AI drift is opening up a whole new set of challenges for Data Governance. We will need to take the lessons learned from Glossary builds, Data Quality assessments and Data Lineage pulls and apply them to the new world of AI. This is going to be fun!


Ronald Zurawski

Ron is a Data Governance Strategist and Solution Architect at Prolifics. His experience includes more than 10 years working in policy-driven data governance and more than 20 years in enterprise database and data warehousing systems. His industry expertise includes finance, health care and consumer product goods. Ron’s expertise is in strategic planning, systems architecture, and program and project management. His tactical experience includes analytics development, architecture, ETL and database administration. Ron has experience in both Big 4 and boutique professional services organizations. Ron holds an MBA from the University of California and an MSCS from the University of Colorado.