A practical approach to creative content and AI training

by oqtey
Artificial intelligence is accelerating progress in profound ways, reshaping everything from our daily routines to the frontiers of scientific discovery and creativity. But as AI breakthroughs accelerate, how should we promote a balanced approach to the use of creative content in training AI models? This goes beyond legal technicalities to touch on the future of AI innovation and human creativity.

Every new technology for the creation or transmission of knowledge and art, from the printing press to the internet and cable TV, has raised questions about how to create and share value. In the case of AI, developers can take a number of steps to support the creative industries and help build a thriving AI ecosystem that benefits everyone. What approaches make sense for the outputs of AI models, the training of those models, and the new ways AI can create shared value?

Assessing AI outputs

Whether words are created with a pen, a typewriter or AI, or artwork is created with a paintbrush, computer graphics or AI, the question is whether a new work infringes the copyright of an earlier one. This judgment can be complex, depending on factors like how similar the new work is to the older one, the nature of the two works and whether the new work competes in the market for the original. Tools like output filters can help restrict substantially similar outputs even as models themselves learn to make more nuanced assessments of these factors.

And provenance information, like watermarks or metadata, can reduce the risk of deception about the creator of particular material. For example, Google pioneered the industry-leading SynthID tool, and has joined the steering committee of the Coalition for Content Provenance and Authenticity (C2PA). These kinds of efforts can help consumers make informed assessments about the content they see.

Training AI models responsibly

While training foundational AI models on the content available on the open web is a transformational fair use under U.S. copyright law, and many other countries have text and data mining exceptions that similarly promote new uses of information, good practices can help build acceptance of new AI uses of existing content.

It’s important to acquire content responsibly and lawfully, such as by giving websites the ability to opt out of having content or information on their sites used for AI training. Existing industry standards governing web crawling are an important way to accomplish this. These standards are simple and scalable, and build on the long-established, machine-readable robots.txt protocol, which websites widely use to control how web crawlers access their content. And now thousands of web publishers are also using the Google-Extended protocol and similar AI-specific protocols offered by other companies. AI developers should remain open to evolving those standards as the ecosystem progresses, and should take reasonable steps to avoid improperly training general purpose AI models in ways that circumvent those standards or similar technical measures like paywalls.
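
As a rough illustration of how such an opt-out can be honored, the sketch below uses Python's standard urllib.robotparser module to check a robots.txt file before a page is considered for AI training. The robots.txt content, the example.com URL and the may_use_for_training helper are assumptions made up for this example, not any company's actual pipeline. Google-Extended appears here only as the user-agent token a publisher would name in robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: the publisher opts out of
# AI training via the Google-Extended token but allows all other crawlers.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def may_use_for_training(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to access url.

    Illustrative helper (not part of any real crawler): it parses the
    robots.txt text and applies the standard allow/disallow matching.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, may_use_for_training(ROBOTS_TXT, agent, page))
```

With this file, the check refuses the page for the Google-Extended token while still permitting an ordinary search crawler, which is the kind of separation between search indexing and AI training that these AI-specific controls are meant to give publishers.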

When it comes to avoiding use of individuals’ voices and likenesses, legislative frameworks can build on existing “notice-and-removal” systems for copyright, including proper safeguards to prevent abuse. New tools can also help creators harness AI’s creative potential while letting them keep control over their voice and likeness.

Sharing value, expanding opportunity

AI has the potential to benefit everyone, and collaboration between AI developers and content publishers can expand the market and generate new income for creative industries.

AI developers are looking to share the value of outputs by sending related traffic to content providers. And the ecosystem is working together to find new ways to create value from emerging AI applications. For example, there may be opportunities for commercial partnerships when AI services “ground” responses on facts from websites.

AI developers and content publishers are also working together on new content agreements for the use of specialized or non-public data for training purposes. AI developers are increasingly learning how to assess the usefulness of individual content for different AI applications. For our part, Google has already entered into agreements with several publishers for broad data rights and we continue to explore new opportunities.

AI developers are actively working with media and creative industries to design new generative AI tools that add value to these industries. For example, Pinpoint, an AI tool for journalists, helps reporters search through text, audio, image, and video files to see patterns in data, identify new angles, or find a quote in a video or audio file.

AI is a shared opportunity, with the potential to expand the realms of science, commerce and creativity. We’re committed to working with all the stakeholders in the ecosystem to create a shared framework where both creators’ rights and innovation flourish.
