Official UC blog

Data Protections and Licenses Affecting Text and Data Mining for Machine Learning

Written by Dorian C. | Dec 21, 2023 8:04:02 PM

Machine learning has become an increasingly important tool in many fields, including natural language processing, text mining, and data analysis. However, the legal implications of using machine learning on copyrighted materials are still unclear. This post will explore the current licensing standards and international laws about machine readability and the rights you have to tagged data and trained models.

Introduction

Machine learning has recently become popular in natural language processing, text mining, and data analysis. However, the legal implications of using machine learning on copyrighted materials are still unclear. In this post, we'll explore the current licensing standards and international laws about machine readability and the rights you have to tagged data and trained models.

Machine Readability and Copyright

According to our paper titled "Data Protections and Licenses Affecting Text and Data Mining for Machine Learning," from which this article is drawn, it is currently unclear whether any publicly available copyright or licensing models exist that cover machine readability, including Text and Data Mining, Natural Language Processing, and Machine Learning. This means that it's unclear whether you have the right to use copyrighted materials for machine learning purposes.

However, the paper suggests that it is possible to write permissions into data and work licensing to specifically allow for machine readability of that data or those works. This means that you may be able to obtain approval for machine readability through licensing, even if existing copyright or licensing models do not currently cover it.

Rights to Tagged Data and Trained Models

Once you have permission to use copyrighted materials for machine learning, you may wonder what rights you have to the resulting tagged data and trained models. According to the same paper, the rights to tagged data and trained models boil down to a three-fold question:

- Can the model be trained from a corpus without specific authorization in the first place?

- Which license (if any) can or must be assigned to the model (or distinct parts thereof)?

- In which cases do the license(s) of the original corpus and attributions affect the licensing of the model?

The paper suggests that current licensing standards and international laws focus on attribution and licensing of existing content, either allowing or forbidding derivatives of the original. However, according to the paper, no license or law can prohibit the creation of a transformed work. This means that while there may be limitations on what you can do with the original copyrighted materials, you may be able to create a transformed work using machine learning processes.

Obtaining Permission for Machine Readability

As we mentioned earlier, obtaining permission for machine readability may be possible through licensing. However, obtaining permission can be complex and may require legal expertise. Here are some steps you can take to obtain permission for machine readability:

  1. Identify the copyrighted materials you want to use for machine learning.
  2. Determine who owns the copyright to those materials.
  3. Contact the copyright owner and request permission for machine readability.
  4. Negotiate the terms of the permission, including any licensing fees or restrictions.
  5. Obtain written permission from the copyright owner.
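
If you are clearing many works at once, it can help to track where each request stands. The following Python code is only a minimal sketch of the five steps above: the `Step` and `PermissionRequest` names and their fields are hypothetical, and the sketch records progress rather than deciding any legal question.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Step(IntEnum):
    """The five steps listed above, in order (names are illustrative)."""
    IDENTIFIED = 1
    OWNER_DETERMINED = 2
    PERMISSION_REQUESTED = 3
    TERMS_NEGOTIATED = 4
    WRITTEN_PERMISSION_OBTAINED = 5


@dataclass
class PermissionRequest:
    """Tracks one copyrighted work through the permission process."""
    work_title: str
    copyright_owner: Optional[str] = None
    step: Step = Step.IDENTIFIED
    notes: List[str] = field(default_factory=list)

    def advance(self, note: str = "") -> None:
        """Move to the next step, refusing to skip the owner lookup."""
        if self.step is Step.WRITTEN_PERMISSION_OBTAINED:
            raise ValueError("already at the final step")
        if self.step is Step.IDENTIFIED and self.copyright_owner is None:
            raise ValueError("record the copyright owner before moving on")
        self.step = Step(self.step + 1)
        if note:
            self.notes.append(note)

    @property
    def cleared_for_training(self) -> bool:
        """True only once written permission has been recorded."""
        return self.step is Step.WRITTEN_PERMISSION_OBTAINED
```

A request starts at `Step.IDENTIFIED`. Once `copyright_owner` is set, each call to `advance()` moves it one step forward, and `cleared_for_training` stays false until the written permission step is reached.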

Assigning Licenses to Tagged Data and Trained Models

Once you have permission to use copyrighted materials for machine learning, you may need to assign licenses to the tagged data and trained models you create. The paper suggests that there are several factors to consider when assigning licenses, including:

  1. The original license(s) of the copyrighted materials.
  2. The attribution requirements of the original license(s).
  3. The intended use of the tagged data and trained models.

The paper suggests that it may be possible to assign a Creative Commons license to the tagged data and trained models, allowing others to use and build upon your work as long as they give you credit. However, the specific license you choose will depend on the factors listed above.
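
Before choosing a license, it may help to gather the three factors above in one place for whoever reviews them. The Python code below is a rough sketch under stated assumptions: `CorpusSource`, `licensing_summary`, and every field name are hypothetical, the license identifiers are whatever you recorded for each source, and `attribution_required` reflects your own reading of each license. The code only collects information and does not determine which license may be assigned.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CorpusSource:
    """One source in the training corpus (all fields supplied by you)."""
    title: str
    rights_holder: str
    license_id: str             # as recorded by you, e.g. "CC-BY-4.0"
    attribution_required: bool  # your reading of the license, not computed


def licensing_summary(sources: List[CorpusSource], intended_use: str) -> dict:
    """Collect original licenses, attribution lines, and intended use."""
    licenses = sorted({s.license_id for s in sources})
    credits = [
        f"{s.title} ({s.rights_holder}), {s.license_id}"
        for s in sources
        if s.attribution_required
    ]
    return {
        "intended_use": intended_use,
        "original_licenses": licenses,
        "attribution_lines": credits,
        # More than one license is a signal to look closer, nothing more.
        "mixed_licenses": len(licenses) > 1,
    }


sources = [
    CorpusSource("Article A", "Publisher X", "CC-BY-4.0", True),
    CorpusSource("Dataset B", "Lab Y", "CC0-1.0", False),
]
summary = licensing_summary(sources, intended_use="research")
```

The resulting `summary` dictionary is meant as input for a human or legal review, not as a decision in itself.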

Creating Transformed Works

As mentioned earlier, no license or law can prohibit the creation of a transformed work. This means that while there may be limitations on what you can do with the original copyrighted materials, you may be able to create a transformed work using machine learning processes.

The paper suggests that creating a transformed work using machine learning processes is a gray area of copyright law. However, the paper also indicates that creating a transformed work may be protected under fair use or fair dealing statutes, which allow for using copyrighted materials for specific purposes, such as criticism, commentary, news reporting, teaching, scholarship, or research.

Where do we go from here?

The legal implications of using machine learning on copyrighted materials are still unclear, but obtaining permission for machine readability through licensing may be possible. Once you have permission, the rights to tagged data and trained models depend on various factors, including the original license(s) and attributions. While there may be limitations on what you can do with the original materials, you may be able to create a transformed work using machine learning processes. The specific license you choose for the tagged data and trained models will depend on the factors mentioned above, but a Creative Commons license may be a good option.

It's important to note that obtaining permission for machine readability and assigning licenses to tagged data and trained models can be complex and may require legal expertise. If you're unsure about the legal implications of using machine learning on copyrighted materials, consulting with a lawyer specializing in intellectual property law is a good idea.

In short, machine learning in natural language processing, text mining, and data analysis has become increasingly popular in recent years, but the legal implications of using it on copyrighted materials are still unclear. While obtaining permission for machine readability through licensing may be possible, the rights to tagged data and trained models depend on various factors, including the original license(s) and attributions.

Here’s the path we recommend:

1. Make a license. We have worked directly with Perkins Coie to create a Federated Data License that you can fill out and download for your business. Fill it out and route it around so that you are protected!

2. Sign up for the CCH and start learning PlantUML (a simple text-based system) to create data flow diagrams to help you understand where your data is going. To help you get started, we’ve created a diagram that walks you through questions you need to ask about your content and what you need to think about regarding your content’s usage for AI purposes. Check out the diagram.
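
To give a feel for how little text a PlantUML diagram needs, here is a small Python sketch that writes PlantUML source for a data flow. The stage names and labels are hypothetical placeholders, not the diagram mentioned above, and `to_plantuml` is an illustrative helper rather than part of any tool. The sketch assumes PlantUML's bracketed component notation with labeled arrows.

```python
from typing import List, Tuple

# Each flow is (from_stage, to_stage, what_moves_between_them).
Flow = Tuple[str, str, str]


def to_plantuml(flows: List[Flow]) -> str:
    """Render data flows as PlantUML source using component notation."""
    lines = ["@startuml"]
    for source, target, label in flows:
        lines.append(f"[{source}] --> [{target}] : {label}")
    lines.append("@enduml")
    return "\n".join(lines)


# Hypothetical stages for content that ends up in a trained model.
flows = [
    ("Licensed corpus", "Tagging", "source text"),
    ("Tagging", "Model training", "tagged data"),
    ("Model training", "Trained model", "learned weights"),
]

diagram_source = to_plantuml(flows)
print(diagram_source)
```

Pasting the printed text into any PlantUML renderer should draw one box per stage, with an arrow for each flow. Labeling every arrow is one way to make yourself state what data moves where and under which license.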