
By: Chris Bednar, CRM, IGP | Managing Director, IG Consulting, Knowledge Preservation, LLC
We all know that the big rage in the business world for more than a year has been the use, and the implications, of generative AI tools such as Microsoft Copilot and ChatGPT. As with most large-scale “transformative” technologies that came before, such as Teams/SharePoint, blockchain (to an extent), intranets, the internet, and email, many organizations are jumping in with their eyes closed, forgetting the lessons learned from those earlier tools. In most of those rollouts, it took firms years to realize that they needed some form of lifecycle management.
Where GenAI is a bit different is that AI technologies rely on data to generate insights – and that is what is hampering many who try to run it within their corporate environments.
According to a quarterly survey of IT executives conducted by Bain & Company, many organizations find themselves unable to successfully implement a generative AI program due to multiple, related challenges such as “company data not ready” (32% of respondents), “data quality and accuracy concerns” (44%), “data security/privacy” (38%), and “lack of expertise or resources” (44%). Looking at these challenges as a whole suggests that these organizations may not have the internal expertise or bandwidth to review, clean, and tag data within their company.
History of poor information management
The old adage “garbage in, garbage out” applies in spades to GenAI. When the tool runs in a closed corporate environment, this problem is relatively fixable: to get the results you want, the data you feed the tool needs to be clean and accurate.
When looking at the data residing inside your organization, you’re likely to find three issues you may not have planned for:
There are a LOT of copies and numerous versions of documents, including email, sitting in your repositories
Depending on the nature and age of your business there may be a vast mine of data only in paper form (especially thinking about R&D/lab notebooks within the LifeScience industries)
Proper access controls may not be in place, which could lead to instances where employees could see information that should be restricted to a certain department or handful of individuals
What measures can your company take to address these issues and help ensure that your GenAI program uses clean, accurate data?
Identify and classify the data in your repositories
Identify and remove ROT/junk data
Tighten up access controls
Consider a strategic approach to digitization of paper-only data (lab notebooks, batch records, tax records, membership documents, etc.)
Identify and classify your data
Your data may be classified by your IT security team as Highly Confidential, Moderately Private, or Public, but is its lifecycle managed as per legal or regulatory requirements? Is it then purged (or moved to an archive) when its lifecycle has expired? Can you develop a data map that accurately reflects the data within your organization today? Keep in mind that this picture changes rapidly. The more you know about where your data is and how it’s structured, the more specifically you can point your AI capture tool.
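As a rough illustration of the “has its lifecycle expired?” question, the check can be sketched in a few lines of Python. This is a minimal sketch, not a records-management system: the record types and retention periods in RETENTION_YEARS are hypothetical placeholders, and real values must come from your own legal and regulatory retention schedule.

```python
from datetime import date

# Hypothetical retention schedule in years. These numbers are placeholders,
# not legal guidance; use your organization's approved schedule.
RETENTION_YEARS = {"tax_record": 7, "lunch_email": 1}


def retention_expiry(created: date, years: int) -> date:
    """Return the date on which a record created on `created` expires."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return created.replace(year=created.year + years, day=28)


def is_past_retention(record_type: str, created: date, today: date) -> bool:
    """True when the record is due to be purged or moved to an archive."""
    return today >= retention_expiry(created, RETENTION_YEARS[record_type])


# Example: a 2022 lunch email is past its (hypothetical) one-year period.
lunch_expired = is_past_retention("lunch_email", date(2022, 3, 1), date(2024, 1, 1))
tax_expired = is_past_retention("tax_record", date(2022, 3, 1), date(2024, 1, 1))
```

Records flagged this way are candidates to exclude from whatever your AI tool is allowed to read, subject to any legal holds.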
Clean up your junk
How is your version control? Can you exclude an email thread from 2022 about where to go for lunch? Are you retaining, and therefore making available to your AI capture, first drafts of documents that might have seen significant changes during their development? In fact, some early drafts may contain inaccurate information or reach the wrong conclusions.
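One common first pass at finding copies is to hash each document’s contents and group identical hashes. The sketch below assumes the documents have already been read into memory; the function name find_duplicate_groups and the sample file names are hypothetical. Note that it only catches byte-for-byte copies. Early drafts and near-duplicates differ in content, so they need version metadata or similarity tooling instead.

```python
import hashlib
from collections import defaultdict


def find_duplicate_groups(documents: dict[str, bytes]) -> list[list[str]]:
    """Group document names whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    # Keep only the digests shared by more than one document.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical sample standing in for files pulled from your repositories.
sample = {
    "shared/report_final.docx": b"Q3 results",
    "email/attachments/report_final (1).docx": b"Q3 results",
    "shared/report_draft.docx": b"Q3 results (draft)",
}
duplicates = find_duplicate_groups(sample)
```

Here the two identical copies are grouped together, while the draft is left alone because its contents differ.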
Tighten your access controls
Do you restrict access to certain types of data and does that restriction stay with a document no matter where it might be copied to? Do you review your restrictions periodically so that if an employee changes roles, they don’t retain access to their old department’s data? You probably don’t want the latest memo listing employees being RIFed to be available to your entire firm.
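The periodic review described above can be sketched as a simple comparison between who holds access and where they work today. This is an illustrative sketch under stated assumptions: the Grant structure, the grants_to_review function, and the sample names are all hypothetical, and a real review would pull this data from your identity and document management systems.

```python
from typing import NamedTuple


class Grant(NamedTuple):
    user: str
    resource: str
    owning_department: str


def grants_to_review(grants: list[Grant], current_department: dict[str, str]) -> list[Grant]:
    """Return grants held by users who are not in the owning department.

    Users missing from `current_department` (for example, people who have
    left the firm) are flagged as well.
    """
    return [
        g for g in grants
        if current_department.get(g.user) != g.owning_department
    ]


# Hypothetical example: bob moved from HR to Finance but kept his HR access.
grants = [
    Grant("alice", "HR/rif_memo.docx", "HR"),
    Grant("bob", "HR/rif_memo.docx", "HR"),
]
departments = {"alice": "HR", "bob": "Finance"}
flagged = grants_to_review(grants, departments)
```

The output is a list for a human to review rather than a list to revoke automatically, since some cross-department access is legitimate.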
Evaluate your paper documents
Does your data science group want to establish historical trend analysis that reaches back before this information was kept electronically? How much of your company’s intellectual property resides solely in paper lab notebooks, accessible to no one?
Summary
Through a combination of my own experiences and the Bain team’s survey, it’s apparent that a good portion of the bottlenecks preventing organizations from successfully implementing GenAI stems from poor data management, which likely includes a lack of qualified personnel to review and fix the data issues.
A properly functioning information governance program, which includes your records, data governance, IT, privacy, legal, and risk groups, is critical to ensuring that your data is accurate, complete, and continues to be managed as laws and business requirements change.
Many organizations have few, if any, qualified experts on staff to take on this type of program. If your company has someone with this knowledge, bring them into your circle without delay! Otherwise, this is a perfect scenario to bring in outside help.
Feel free to reach out to me directly for further dialog and/or to arrange a discussion on how we can help you solve these challenges. Also, please follow ARMA.org and related associations as we truly are THE subject matter experts in this area and are honest, caring, helpful folks 😊
Image courtesy of: bottlenek_James_Steidl_dreamstime_promo2.603566abeafa0.png (1200×630)