AI search could break the web
Oct 31, 2024
In late October, News Corp filed a lawsuit against Perplexity AI, a popular AI search engine. At first glance, this might seem unremarkable. After all, the lawsuit joins more than two dozen similar cases seeking credit, consent, or compensation for the use of data by AI developers. Yet this particular dispute is different, and it might be the most consequential of them all.
At stake is the future of AI search—that is, chatbots that summarize information from across the web. If their growing popularity is any indication, these AI “answer engines” could replace traditional search engines as our default gateway to the internet. While ordinary AI chatbots can reproduce—often unreliably—information learned through training, AI search tools like Perplexity, Google’s Gemini, or OpenAI’s now-public SearchGPT aim to retrieve and repackage information from third-party websites. They return a short digest to users along with links to a handful of sources, ranging from research papers to Wikipedia articles and YouTube transcripts. The AI system does the reading and writing, but the information comes from outside.
At its best, AI search can better infer a user’s intent, amplify quality content, and synthesize information from diverse sources. But if AI search becomes our primary portal to the web, it threatens to disrupt an already precarious digital economy. Today, the production of content online depends on a fragile set of incentives tied to virtual foot traffic: ads, subscriptions, donations, sales, or brand exposure. By shielding the web behind an all-knowing chatbot, AI search could deprive creators of the visits and “eyeballs” they need to survive.
If AI search breaks up this ecosystem, existing law is unlikely to help. Governments already believe that content is falling through cracks in the legal system, and they are learning to regulate the flow of value across the web in other ways. The AI industry should use this narrow window of opportunity to build a smarter content marketplace before governments fall back on interventions that are ineffective, benefit only a select few, or hamper the free flow of ideas across the web.
Copyright isn’t the answer to AI search disruption
News Corp argues that using its content to extract information for AI search amounts to copyright infringement, claiming that Perplexity AI “compete[s] for readers while simultaneously freeriding” on publishers. That sentiment is likely shared by the New York Times, which sent a cease-and-desist letter to Perplexity AI in mid-October.
In some respects, the case against AI search is stronger than other cases that involve AI training. In training, content has the biggest impact when it is unexceptional and repetitive; an AI model learns generalizable behaviors by observing recurring patterns in vast data sets, and the contribution of any single piece of content is limited. In search, content has the most impact when it is novel or distinctive, or when the creator is uniquely authoritative. By design, AI search aims to reproduce specific features from that underlying data, invoke the credentials of the original creator, and stand in place of the original content.
Even so, News Corp faces an uphill battle to prove that Perplexity AI infringes copyright when it processes and summarizes information. Copyright doesn’t protect mere facts, or the creative, journalistic, and academic labor needed to produce them. US courts have historically favored tech defendants who use content for sufficiently transformative purposes, and this pattern seems likely to continue. And if News Corp were to succeed, the implications would extend far beyond Perplexity AI. Restricting the use of information-rich content for noncreative or nonexpressive purposes could limit access to abundant, diverse, and high-quality data, hindering wider efforts to improve the safety and reliability of AI systems.
Governments are learning to regulate the distribution of value online
If existing law is unable to resolve these challenges, governments may look to new laws. Emboldened by recent disputes with traditional search and social media platforms, governments could pursue aggressive reforms modeled on the media bargaining codes enacted in Australia and Canada or proposed in California and the US Congress. These reforms compel designated platforms to pay certain media organizations for displaying their content, such as in news snippets or knowledge panels. The EU imposed similar obligations through copyright reform, while the UK has introduced broad competition powers that could be used to enforce bargaining.
In short, governments have shown they are willing to regulate the flow of value between content producers and content aggregators, abandoning their traditional reluctance to interfere with the internet.

However, mandatory bargaining is a blunt solution for a complex problem. These reforms favor a narrow class of news organizations, operating on the assumption that platforms like Google and Meta exploit publishers. In practice, it’s unclear how much of these platforms’ traffic is truly attributable to news, with estimates ranging from 2% to 35% of search queries and just 3% of social media feeds. At the same time, platforms offer significant benefits to publishers by amplifying their content, and there is little consensus about how to fairly apportion this two-way value. Controversially, the four bargaining codes regulate the mere act of indexing or linking to news content, not just reproducing it. This threatens the “ability to link freely” that underpins the web. Moreover, bargaining rules focused on legacy media—just 1,400 publications in Canada, 1,500 in the EU, and 62 organizations in Australia—ignore the countless everyday creators and users who contribute the posts, blogs, images, videos, podcasts, and comments that drive platform traffic.
Yet for all its pitfalls, mandatory bargaining may become an attractive response to AI search. For one thing, the case is stronger. Unlike traditional search—which indexes, links, and displays brief snippets from sources to help a user decide whether to click through—AI search could directly substitute generated summaries for the underlying source material, potentially draining traffic, eyeballs, and exposure from downstream websites. More than a third of Google sessions end without a click, and the proportion is likely to be significantly higher in AI search. AI search also simplifies the economic calculus: Since only a few sources contribute to each response, platforms—and arbitrators—can more accurately track how much specific creators drive engagement and revenue.
Ultimately, the devil is in the details. Well-meaning but poorly designed mandatory bargaining rules might do little to fix the problem, protect only a select few, and potentially cripple the free exchange of information across the web.
Industry has a narrow window to build a fairer reward system
However, the mere threat of intervention could have a bigger impact than actual reform. AI firms quietly recognize the risk that litigation will escalate into regulation. For example, Perplexity AI, OpenAI, and Google are already striking deals with publishers and content platforms, some covering AI training and others focusing on AI search. But like early bargaining laws, these agreements benefit only a handful of firms, some of which (such as Reddit) haven’t yet committed to sharing that revenue with their own creators.
This policy of selective appeasement is untenable. It neglects the vast majority of creators online, who cannot readily opt out of AI search and who do not have the bargaining power of a legacy publisher. It takes the urgency out of reform by mollifying the loudest critics. It legitimizes a few AI firms through confidential and intricate commercial deals, making it difficult for new entrants to obtain equal terms or equal indemnity and potentially entrenching a new wave of search monopolists. In the long term, it could create perverse incentives for AI firms to favor low-cost and low-quality sources over high-quality but more expensive news or content, fostering a culture of uncritical information consumption in the process.

Instead, the AI industry should invest in frameworks that reward creators of all kinds for sharing valuable content. From YouTube to TikTok to X, tech platforms have proven they can administer novel rewards for distributed creators in complex content marketplaces. Indeed, fairer monetization of everyday content is a core objective of the “web3” movement celebrated by venture capitalists. The same reasoning carries over to AI search. If queries yield lucrative engagement but users don’t click through to sources, commercial AI search platforms should find ways to attribute that value to creators and share it back at scale.
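To illustrate what attributing value at scale might look like mechanically, here is a deliberately simplified sketch of a per-query royalty ledger. Because each AI-generated answer cites only a handful of sources, the accounting is tractable. The even split, the revenue figures, and the domains below are assumptions chosen for illustration, not a proposal for actual rates or a description of any platform's real accounting:

```python
# Toy sketch of per-query revenue attribution for AI search.
# Numbers and domains are invented; real apportionment would be negotiated.

from collections import defaultdict


def attribute_revenue(query_revenue: float, cited_sources: list[str]) -> dict[str, float]:
    """Split one query's revenue evenly across the sources cited in its answer."""
    share = query_revenue / len(cited_sources)
    return {source: share for source in cited_sources}


ledger: defaultdict[str, float] = defaultdict(float)

# Each tuple is (revenue earned from the query, sources cited in the answer).
queries = [
    (0.12, ["nytimes.com", "wikipedia.org"]),
    (0.08, ["wsj.com"]),
    (0.10, ["wikipedia.org", "wsj.com"]),
]

for revenue, sources in queries:
    for source, share in attribute_revenue(revenue, sources).items():
        ledger[source] += share  # accumulate each creator's payout over time

for source, payout in sorted(ledger.items()):
    print(f"{source}: ${payout:.2f}")
```

Even this toy version surfaces the real design questions: whether to split by citation count, prominence in the answer, or downstream engagement, and how to audit the ledger so creators can trust it.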
Of course, it’s possible that our digital economy was broken from the start. Subsistence on trickle-down ad revenue may be unsustainable, and the attention economy has inflicted real harm on privacy, integrity, and democracy online. Supporting quality news and fresh content may require other forms of investment or incentives.
But we shouldn’t give up on the prospect of a fairer digital economy. If anything, while AI search makes content bargaining more urgent, it also makes it more feasible than ever before. AI pioneers should seize this opportunity to lay the foundations for a smart, equitable, and scalable reward system. If they don’t, governments now have the frameworks—and confidence—to impose their own vision of shared value.
Benjamin Brooks is a fellow at the Berkman Klein Center at Harvard scrutinizing the regulatory and legislative response to AI. He previously led public policy for Stability AI, a developer of open models for image, language, audio, and video generation. His views do not necessarily represent those of any affiliated organization, past or present.