Ensuring Data Integrity in the AI Era

Over the last six months, I’ve had maybe twenty to thirty different conversations with people about AI, data governance, data ownership, and protecting our data in this new landscape. You’ve probably had some of the same discussions. I’ve boiled all of them down to four key questions you’ll probably get asked and the answers you should provide.

As artificial intelligence (AI) continues to open new frontiers in data utilization, our organizations must stay ahead in safeguarding our informational assets. Below is a detailed dialogue that explains how AI and data protection strategies work together to keep your organizational data secure and compliant.


Are AI engines possibly utilizing our company data beyond our control?

Absolutely. AI systems need a broad range of data to function well, and that range can include company-specific information. Your data may be absorbed into AI platforms through indirect means such as web scraping or third-party collaborations. As in any multi-step program, the first step is admitting the possibility. From there, you can build data governance frameworks that monitor and control your data's journey in the AI domain.

What in the heck is ‘federated data’ and how does it apply in the AI paradigm?

'Federated data' refers to a decentralized data management scheme that allows for the collective training of AI models while data custodianship remains with the original owner. 

In practice, at least some of your data is heading out of the barn whether you want it to or not, so you will be contributing to the world of AI either way. That makes it essential to add levels of protection that you can control.

In the AI milieu, your organization can contribute to AI developments without forfeiting data autonomy, thus preserving confidentiality and adhering to regulatory mandates. Federated data systems epitomize a conscientious approach to AI integration, balancing innovation with data privacy.
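One common way to train a shared model while each owner keeps its data is federated averaging. The sketch below is only an illustration of that idea, assuming a toy linear model and hypothetical function names (`local_update`, `federated_average`, `train`) that do not come from any particular product. Each party computes an update on its own records, and only the model weights, never the raw data, are sent for averaging.

```python
from typing import List, Sequence, Tuple

Weights = Tuple[float, float]          # (slope, intercept) of y = w*x + b
Record = Tuple[float, float]           # one private (x, y) observation


def local_update(weights: Weights, data: Sequence[Record], lr: float) -> Weights:
    """Run one gradient step on a party's private data and return new weights."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b)


def federated_average(updates: Sequence[Weights], sizes: Sequence[int]) -> Weights:
    """Combine party updates, weighting each by how many records it holds."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train(parties: List[Sequence[Record]], rounds: int = 500, lr: float = 0.05) -> Weights:
    """Coordinate training: raw records stay in `parties`; only weights move."""
    weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data, lr) for data in parties]
        weights = federated_average(updates, [len(d) for d in parties])
    return weights


if __name__ == "__main__":
    # Two hypothetical organizations whose records both follow y = 2x + 1.
    org_a = [(0.0, 1.0), (1.0, 3.0)]
    org_b = [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
    print(train([org_a, org_b]))
```

In this sketch the coordinator in `train` sees only weight pairs. A real deployment would run `local_update` inside each organization's own environment and would add the access controls and licensing terms discussed in the rest of this post.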

How do different licenses intersect with federated data?

These Creative Commons (CC) license clauses offer a suite of protections for federated data:

- Attribution (BY): Requires anyone using your data to credit its origin, safeguarding your brand's reputation.
- ShareAlike (SA): Requires that adaptations of your data be shared under the same terms, fostering a community of shared progress.
- NoDerivatives (ND): Prevents others from distributing modified versions of your data, thus preserving its original form.
- NonCommercial (NC): Restricts the use of your data for profit without your consent, protecting your commercial interests.

These conditions serve as a bulwark in the stewardship of federated data, delineating its usage in an interconnected AI environment.

This means that for any data (including the prompts your staff will undoubtedly create), you must decide two things: how you want attribution to apply once the data leaves your site, and what others may do with that data afterward (the SA, ND, and NC clauses).
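To make those decisions concrete, it can help to record the chosen clauses next to each dataset and check proposed uses against them. The sketch below is a simplified illustration, not a reading of the license text: the `LicenseTerms`, `ProposedUse`, and `violations` names are hypothetical, and the one rule it enforces up front is that SA and ND are not combined, since the standard CC licenses never pair them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LicenseTerms:
    """Clauses chosen for a dataset (hypothetical model for illustration)."""
    attribution: bool = True       # BY
    non_commercial: bool = False   # NC
    no_derivatives: bool = False   # ND
    share_alike: bool = False      # SA

    def __post_init__(self) -> None:
        # SA governs how adaptations are shared; ND forbids sharing them at all.
        if self.share_alike and self.no_derivatives:
            raise ValueError("SA and ND cannot be combined")

    def code(self) -> str:
        """Return a short tag such as 'BY-NC-SA' for labeling data."""
        flags = [("BY", self.attribution), ("NC", self.non_commercial),
                 ("ND", self.no_derivatives), ("SA", self.share_alike)]
        return "-".join(name for name, enabled in flags if enabled)


@dataclass(frozen=True)
class ProposedUse:
    """What a downstream party says it will do with the data."""
    gives_credit: bool
    commercial: bool
    modifies_data: bool
    same_license_for_derivatives: bool


def violations(terms: LicenseTerms, use: ProposedUse) -> List[str]:
    """List every clause the proposed use would conflict with."""
    problems: List[str] = []
    if terms.attribution and not use.gives_credit:
        problems.append("BY: source is not credited")
    if terms.non_commercial and use.commercial:
        problems.append("NC: commercial use is not permitted")
    if terms.no_derivatives and use.modifies_data:
        problems.append("ND: modified versions may not be shared")
    if (terms.share_alike and use.modifies_data
            and not use.same_license_for_derivatives):
        problems.append("SA: adaptations must carry the same license")
    return problems


if __name__ == "__main__":
    terms = LicenseTerms(non_commercial=True, share_alike=True)
    use = ProposedUse(gives_credit=True, commercial=True,
                      modifies_data=True, same_license_for_derivatives=False)
    print(terms.code(), violations(terms, use))
```

A check like this does not replace legal review, but it gives staff a consistent way to tag outgoing data and to flag requests that need a closer look.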

Once you’ve thought this through, you’ll want to adopt an extended license, such as the Common Generative License, either as is or with added restrictions.

What are the constraints of CC licenses for AI use, and what makes a Common Generative License more apt for our organization?

While pioneering in digital content sharing, CC licenses fall short in addressing the complexities of AI, such as the creation of derivatives and data amalgamation. Enter the Common Generative License, a bespoke solution for the AI sphere that provides:

- Adaptability: Tailored to accommodate AI’s dynamic and generative essence, it allows for derivative innovation while protecting data provenance. 
- Simplicity: It simplifies licensing within AI frameworks, ensuring straightforward, comprehensible agreements for all participants. 
- Regulatory alignment: Conceived with global data protection regulations in mind, it helps maintain compliance across international AI collaborations.

Opting for a Common Generative License could offer a refined and robust legal structure for your organization's data, championing innovation and AI protection.

What do we do now?

As AI continues to evolve, it's imperative for you to preemptively address these concerns with forward-thinking policies and licenses that resonate with the nuances of AI technology. Employing visual aids such as data flow diagrams and compliance checklists can further elucidate these concepts, providing a clear roadmap for protecting your data in the AI age.

Here’s the path we recommend:

1. Make a license. We have worked directly with Perkins Coie to create a Federated Data License that you can fill out and download for your business. Fill it out and route it around so that you are protected!

2. Sign up for the CCH and start learning PlantUML (a simple text-based system) to create data flow diagrams to help you understand where your data is going. To help you get started, we’ve created a diagram that walks you through questions you need to ask about your content and what you need to think about regarding your content’s usage for AI purposes. Check out the diagram.
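If you keep an inventory of where data moves, you can generate PlantUML source from it rather than writing diagrams by hand. The following is a minimal sketch, assuming a hypothetical `to_plantuml` helper and made-up system names. It emits PlantUML's component-diagram arrow syntax, which you can then render with whatever PlantUML tool you use.

```python
from typing import Iterable, Tuple

# (source system, destination system, what data moves); names are illustrative.
Flow = Tuple[str, str, str]


def to_plantuml(flows: Iterable[Flow], title: str = "Data flows") -> str:
    """Render data flows as PlantUML component-diagram source text."""
    lines = ["@startuml", f"title {title}"]
    for source, destination, label in flows:
        # [A] --> [B] : label draws a labeled arrow between two components.
        lines.append(f"[{source}] --> [{destination}] : {label}")
    lines.append("@enduml")
    return "\n".join(lines)


if __name__ == "__main__":
    example_flows = [
        ("Staff", "Vendor chatbot", "prompts"),
        ("Public website", "Web scrapers", "published content"),
        ("CRM", "Analytics partner", "customer records"),
    ]
    print(to_plantuml(example_flows, title="Where our data goes"))
```

Each arrow in the resulting diagram is a place to ask the licensing questions above: who gets credit, whether the data can be modified, and whether it can be used commercially.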