Should AI-Generated Content Be Watermarked?

Since November of 2022, the world has been captivated by ChatGPT, the artificial intelligence chatbot created by OpenAI. ChatGPT’s meteoric rise in popularity – reaching 100 million users in just two months – has brought attention to generative AI in general. As its name suggests, generative AI is capable of creating text, images, and other forms of content in seconds that some consider indistinguishable from what a human would create.

Unsurprisingly, generative AI has been described as both a productivity enhancer and a job destroyer, sometimes simultaneously. It has raised serious issues in academia, with some suggesting that the college essay has become obsolete. In addition, AI’s generative capabilities may fundamentally change the way white-collar professionals work and may even threaten their jobs.

Should Readers Know Whether Content is AI-Generated?

One issue that surfaces regularly in conversations around AI is whether people should be told when content was created by AI rather than by a human. Knowing the provenance of content seems like a fair request, especially when an individual is relying on the information for a serious matter such as their health, financial well-being, or safety.

One proposed solution is watermarking AI-generated content, allowing people and search engines to recognize it as such. In fact, at Google I/O, the company said that it would voluntarily watermark images created by its generative AI so that people could spot fakes. Microsoft made a similar announcement a few weeks later.

Inaccurate and Fake Content Can Have Real World Effects

It is becoming increasingly clear that misleading content generated by AI can have real-world effects – and cause real-world harm. For example, on May 22 of this year, a false report of an explosion at the Pentagon, accompanied by an image likely generated by AI, caused a significant dip in the stock market.

Similarly, many experts warn that AI-generated misinformation poses a risk to elections. Speaking at a World Economic Forum event earlier this year, Microsoft’s chief economist, Michael Schwarz, cautioned that “Before AI could take all your jobs, it could certainly do a lot of damage in the hands of spammers, people who want to manipulate elections.”

Bad actors could generate misinformation at a scale never seen before in the form of social media posts, fake news stories, fake images, and even deep fake videos of candidates that are indistinguishable from reality.

Perhaps most troublingly, some observers think that the rise of generative AI risks a future of human incompetence. What does the world look like if all we have to do to demonstrate competence is to ask an AI to do it for us? As US DOJ National Security & Cybercrime Coordinator Matt Cronin recently put it in The Hill:

For even the most brilliant minds, mastering a domain and deeply understanding a topic takes significant time and effort. While ultimately rewarding, this stressful process risks failure and often takes thousands of hours. For the first time in history, an entire generation can skip this process and still progress (at least for a time) in school and work. They can press the magic box and suddenly have work product that rivals the best in their cohort. That is a tempting arrangement, particularly since their peers will likely use AI even if they do not.

Like most Faustian bargains, however, reliance on generative AI comes with a hidden price. Every time you press the box, you are not truly learning — at least not in a way that meaningfully benefits you. You are developing the AI’s neural network, not your own.

Cronin argues that incompetence will grow over time as we lean on AI, comparing its use to having someone else work out for you and expecting to get fit as a result.

Consider a hypothetical generation of surgeons who have been raised on AI and suddenly do not have internet access – do you want them operating on you? Do you want a lawyer who got through law school learning how to correctly “prompt” AI representing you in court? Of course, for most of us, the answer is “no.”

The fact is that generative AI allows people to seemingly demonstrate knowledge or expertise they do not have. While this clearly presents an issue in academia, where students are expected to demonstrate knowledge in writing assignments, it also raises the question of whether consumers can trust that knowledge-based professionals like lawyers, physicians, and mental health providers actually possess the expertise their website content claims.

What Does Watermarking AI-Generated Content Look Like?

You are probably already familiar with the idea of watermarking as it relates to visual content. For example, go to iStock and see how it displays the pictures it has for sale. To keep you from simply right-clicking and saving an image to your desktop, each one has “iStock by Getty Images” superimposed on top of it.
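This kind of visible watermark is simple to produce programmatically. Below is a minimal sketch using Python’s Pillow library (the label text and styling are illustrative, not iStock’s actual process) that overlays semi-transparent text on an image:

```python
from PIL import Image, ImageDraw, ImageFont

def add_visible_watermark(path: str, label: str = "SAMPLE WATERMARK") -> Image.Image:
    """Overlay semi-transparent text on an image, as stock photo sites do."""
    img = Image.open(path).convert("RGBA")
    overlay = Image.new("RGBA", img.size, (0, 0, 0, 0))  # fully transparent layer
    draw = ImageDraw.Draw(overlay)
    font = ImageFont.load_default()
    # Half-transparent white text; the alpha channel lets the image show through.
    draw.text((img.width // 4, img.height // 2), label,
              fill=(255, 255, 255, 128), font=font)
    return Image.alpha_composite(img, overlay)

# Example usage:
# add_visible_watermark("photo.png").convert("RGB").save("marked.jpg")
```

Because the mark is baked into the pixels, removing it cleanly requires editing the image itself, which is exactly why stock agencies use this approach.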

Google is taking watermarking AI-generated images a step further and embedding data that will mark them as AI-generated. In a May 10th blog post on The Keyword, Google explained that:

“. . .as we begin to roll out generative image capabilities, we will ensure that every one of our AI-generated images has a markup in the original file to give you context if you come across it outside of our platforms. Creators and publishers will be able to add similar markups, so you’ll be able to see a label in images in Google Search, marking them as AI-generated. You can expect to see these from several publishers including Midjourney, Shutterstock, and others in the coming months.”
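Google’s post does not publish the exact markup schema, but the general idea, a machine-readable label embedded in the image file itself, can be sketched with ordinary PNG metadata. In this hypothetical Python example using the Pillow library, the ai_generated key is invented for illustration and is not Google’s actual format:

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

# Write a hypothetical "AI-generated" label into the file's metadata.
img = Image.new("RGB", (256, 256), "gray")  # stand-in for a generated image
meta = PngInfo()
meta.add_text("ai_generated", "true")       # key name is illustrative only
img.save("generated.png", pnginfo=meta)

# Any downstream tool (a search engine, a browser) could read the label back.
labeled = Image.open("generated.png")
print(labeled.text.get("ai_generated"))     # -> "true"
```

Unlike a visible watermark, a metadata label like this is invisible to viewers but trivially readable by software, though it can also be stripped by anyone who re-saves the file without it.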

Watermarking Content Presents Special Challenges

Of course, watermarking AI-generated text would be different from watermarking images. One idea discussed by AI developers like OpenAI and other stakeholders is cryptographic watermarking: embedding a pattern or code into the text in a way that allows software to detect whether the content was generated by AI.

Hany Farid, a Professor of Computer Science at the University of California, Berkeley, recently explained how watermarking text may work in a piece for GCN:

Generated text can be watermarked by secretly tagging a subset of words and then biasing the selection of a word to be a synonymous tagged word. For example, the tagged word “comprehend” can be used instead of “understand.” By periodically biasing word selection in this way, a body of text is watermarked based on a particular distribution of tagged words. This approach won’t work for short tweets but is generally effective with text of 800 or more words depending on the specific watermark details.
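As a toy illustration of the scheme Farid describes, the sketch below embeds a watermark by biasing word choice toward a “tagged” synonym list, then checks for an improbable concentration of tagged words. The word pairs, bias rate, and detection threshold are all invented for the example; a real system would derive the tagged set from a cryptographic key and use proper statistical tests:

```python
import random

# Toy synonym pairs: the value is the secretly "tagged" variant of the key.
# A real scheme would derive the tagged set from a cryptographic key.
TAGGED_SYNONYMS = {
    "understand": "comprehend",
    "use": "utilize",
    "show": "demonstrate",
    "help": "facilitate",
}
TAGGED_WORDS = set(TAGGED_SYNONYMS.values())

def embed_watermark(text: str, bias: float = 0.9, seed: int = 0) -> str:
    """Bias word choice toward tagged synonyms, as a generator would."""
    rng = random.Random(seed)
    words = []
    for word in text.split():
        key = word.lower().strip(".,;:")
        if key in TAGGED_SYNONYMS and rng.random() < bias:
            suffix = word[len(key):]  # keep trailing punctuation
            words.append(TAGGED_SYNONYMS[key] + suffix)
        else:
            words.append(word)
    return " ".join(words)

def looks_watermarked(text: str, threshold: float = 0.6) -> bool:
    """Flag text whose tagged-variant rate far exceeds a natural baseline."""
    tokens = [w.lower().strip(".,;:") for w in text.split()]
    in_scheme = [w for w in tokens if w in TAGGED_SYNONYMS or w in TAGGED_WORDS]
    if not in_scheme:
        return False  # too little signal to decide
    tagged = sum(1 for w in in_scheme if w in TAGGED_WORDS)
    return tagged / len(in_scheme) >= threshold

marked = embed_watermark("They use examples to help readers understand.")
print(looks_watermarked(marked))  # -> True
```

Note that on a short tweet the detector sees too few in-scheme words to decide, which is exactly the limitation Farid points out for texts below roughly 800 words.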

This idea has gained traction in many circles. Professor Farid believes that all AI-generated content should be watermarked, as does Matt Cronin (mentioned earlier in this article). Additionally, Fedscoop’s Nihal Krishan reports that the Deputy National Security Adviser for Cyber and Emerging Technology met privately with tech executives at the RSA Conference – including those from OpenAI and Microsoft – and urged them to consider watermarking any content their AI models generate.

Conclusion

While the future of AI-content watermarking remains unclear, what is clear is that generative AI can pose risks to individuals as well as society as a whole. Misinformation has been a problem before, but the difference now is the scale and speed with which it can be produced.

One way to handle the issue would be for AI companies to watermark all of the content their models generate so that everyone has a clear idea of its provenance. This would allow for the use of AI in academia without the fear of an incompetent workforce, the use of AI in journalism without eroding public trust, and the use of AI in marketing with transparency.

In light of the risks posed by the proliferation of AI-generated content and the potential erosion of human competence, watermarking provides a practical measure to ensure transparency and accountability. By implementing watermarking practices, content creators and publishers can contribute to a more informed and discerning society, enabling individuals to make better decisions based on the origin and authenticity of the content they encounter.