As developers, our role extends beyond coding to ensuring compliance, particularly when working with Authority Documents in Governance, Risk, and Compliance (GRC) systems. The increasing use of Language Learning Models (LLMs) to analyze these documents underscores the importance of managing strict licensing requirements. JSON-LD is a crucial tool that represents licensing terms in a machine-readable format, automates compliance, and mitigates legal risks. Given the evolving legal landscape for Text and Data Mining, where there is no automatic ‘right to mine’ data, licenses must explicitly permit such activities. This makes understanding and adhering to legal requirements essential to avoid the risks associated with unauthorized data usage.
One of the biggest challenges in LLM development is determining whether a model can be trained on a given dataset without explicit permission, and this complexity is heightened by licensing issues for derived models created from mined data. JSON-LD’s standardized format for reporting licensing metadata offers a crucial solution, enabling clear communication of licenses and license types. Understanding the different types of licenses—such as Public Domain, Attribution, ShareAlike, NoDerivatives, NonCommercial, Copyright, and Federated Data License—is essential for managing content legally and ethically.
The process of determining the appropriate license for each authority document begins with an API call that analyzes the document’s origin, rights statements, and usage rights. If cleared, the document is added to the corpus, tokenized, and meticulously tracked to ensure compliance. JSON-LD provides unique identifiers for licensing elements and Boolean fields for rights declarations, making terms machine-readable and enforceable, which is vital for automated systems in GRC environments.
As GRC systems scale, managing thousands of documents becomes daunting, but JSON-LD offers a scalable solution for consistent licensing management across vast datasets, ensuring robust legal safeguards. Proper license reporting is necessary for GRC compliance, and JSON-LD structures enable automated license management, streamlining processes, reducing manual oversight, and ensuring the use of only properly licensed data, making these capabilities invaluable in the fast-paced world of GRC.
Here’s the path we recommend:
1. Make a license. We have worked directly with Perkins Couie to create a Federated Data License that you can fill out and download for your business. Fill it out and route it around so that you are protected!
2. Sign up for the CCH and start learning PlantUML (a simple text-based system) to create data flow diagrams to help you understand where your data is going. To help you get started, we’ve created a diagram that walks you through questions you need to ask about your content and what you need to think about regarding your content’s usage for AI purposes. Check out the diagram and the code to create it.