A few weeks ago, when I was at the digital rights conference RightsCon in Taiwan, I watched in real time as civil society organizations from around the world, including the US, grappled with the loss of one of the biggest funders of global digital rights work: the United States government.
As I w
rote in my dispatch, the Trump administration’s shocking, rapid gutting of the US government (and its push into what some prominent political scientists call “competitive authoritarianism”) also affects the operations and policies of American tech companies—many of which, of course, have users far beyond US borders. People at RightsCon said they were already seeing changes in these companies’ willingness to engage with and invest in communities that have smaller user bases—especially non-English-speaking ones.
As a result, some policymakers and business leaders—in Europe, in particular—are reconsidering their reliance on US-based tech and asking whether they can quickly spin up better, homegrown alternatives. This is particularly true for AI.
One of the clearest examples of this is in social media. Yasmin Curzi, a Brazilian law professor who researches domestic tech policy, put it to me this way: “Since Trump’s second administration, we cannot count on [American social media platforms] to do even the bare minimum anymore.”
Social media content moderation systems—which already use automation and are also experimenting with deploying large language models to flag problematic posts—are failing to detect gender-based violence in places as varied as India, South Africa, and Brazil. If platforms begin to rely even more on LLMs for content moderation, this problem will likely get worse, says Marlena Wisniak, a human rights lawyer who focuses on AI governance at the European Center for Not-for-Profit Law. “The LLMs are moderated poorly, and the poorly moderated LLMs are then also used to moderate other content,” she tells me. “It’s so circular, and the errors just keep repeating and amplifying.”
Part of the problem is that the systems are trained primarily on data from the English-speaking world (and American English at that), and as a result, they perform less well with local languages and context.
Even multilingual language models, which are meant to process multiple languages at once, still perform poorly with non-Western languages. For instance, one evaluation of ChatGPT’s response to health-care queries found that results were far worse in Chinese and Hindi, which are less well represented in North American data sets, than in English and Spanish.
For many at RightsCon, this validates their calls for more community-driven approaches to AI—both in and out of the social media context. These could include small language models, chatbots, and data sets designed for particular uses and specific to particular languages and cultural contexts. These systems could be trained to recognize slang usages and slurs, interpret words or phrases written in a mix of languages and even alphabets, and identify “reclaimed language” (onetime slurs that the targeted group has decided to embrace). All of these tend to be missed or miscategorized by language models and automated systems trained primarily on Anglo-American English. The founder of the startup Shhor AI, for example, hosted a panel at RightsCon and talked about its new content moderation API focused on Indian vernacular languages.
Many similar solutions have been in development for years—and we’ve covered a number of them, including a Mozilla-facilitated volunteer-led effort to collect training data in languages other than English, and promising startups like Lelapa AI, which is building AI for African languages. Earlier this year, we even included small language models on our 2025 list of top 10 breakthrough technologies.
Still, this moment feels a little different. The second Trump administration, which shapes the actions and policies of American tech companies, is obviously a major factor. But there are others at play.
First, recent research and development on language models has reached the point where data set size is no longer a predictor of performance, meaning that more people can create them. In fact, “smaller language models might be worthy competitors of multilingual language models in specific, low-resource languages,” says Aliya Bhatia, a visiting fellow at the Center for Democracy & Technology who researches automated content moderation.
Then there’s the global landscape. AI competition was a major theme of the recent Paris AI Summit, which took place the week before RightsCon. Since then, there’s been a steady stream of announcements about “sovereign AI” initiatives that aim to give a country (or organization) full control over all aspects of AI development.
AI sovereignty is just one part of the desire for broader “tech sovereignty” that’s also been gaining steam, growing out of more sweeping concerns about the privacy and security of data transferred to the United States. The European Union appointed its first commissioner for tech sovereignty, security, and democracy last November and has been working on plans for a “Euro Stack,” or “digital public infrastructure.” The definition of this is still somewhat fluid, but it could include the energy, water, chips, cloud services, software, data, and AI needed to support modern society and future innovation. All these are largely provided by US tech companies today. Europe’s efforts are partly modeled after “India Stack,” that country’s digital infrastructure that includes the biometric identity system Aadhaar. Just last week, Dutch lawmakers passed several motions to untangle the country from US tech providers.
This all fits in with what Andy Yen, CEO of the Switzerland-based digital privacy company Proton, told me at RightsCon. Trump, he said, is “causing Europe to move faster … to come to the realization that Europe needs to regain its tech sovereignty.” This is partly because of the leverage that the president has over tech CEOs, Yen said, and also simply “because tech is where the future economic growth of any country is.”
But just because governments get involved doesn’t mean that issues around inclusion in language models will go away. “I think there needs to be guardrails about what the role of the government here is. Where it gets tricky is if the government decides ‘These are the languages we want to advance’ or ‘These are the types of views we want represented in a data set,’” Bhatia says. “Fundamentally, the training data a model trains on is akin to the worldview it develops.”
It’s still too early to know what this will all look like, and how much of it will prove to be hype. But no matter what happens, this is a space we’ll be watching.
This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here. ...read more read less