Study: AI models still susceptible to propaganda

A new Estonian study has found that many AI systems can still be steered into producing propaganda by leading requests, with results varying widely depending on the model and its provider, writes ERR News.

As part of a broader study on the reliability of large language models (LLMs) in the Estonian context, researchers at the Estonian Language Institute (EKI) tested how artificial intelligence systems respond to narratives tied to Kremlin information campaigns targeting Estonia. The benchmark was developed together with disinformation experts from Propastop, who helped identify the key messages of Russian influence operations.

While LLMs often appear reliable when asked neutral questions, biased or targeted requests can lead some models to make misleading statements. The main conclusion is that the models' weaknesses only become apparent when users steer conversations with loaded questions or deliberately seek out biased content. In some cases, the AI was almost twice as likely to generate propaganda-style responses echoing Kremlin rhetoric.

The EKI researchers tested the systems in Estonian, English and Russian. While the higher-end models were largely resistant to manipulation in all three languages, the cheaper and open-source models performed significantly worse when Russian was used. EKI researcher Krister Kruusma said that open-source models are the only option for many institutions, yet they still fall short of the needs of Estonia's information space, a gap that needs to be addressed.

The tests found that the more expensive commercial systems were the most resistant to propaganda, with Anthropic's models showing some of the best results. Other systems were less stable. The performance of Google's Gemini models was inconsistent, despite their strong Estonian on other tasks. Older systems, such as GPT-3.5 and GPT-4o Mini, received lower ratings, as did open-source models such as Meta's Llama and those of the French developer Mistral.

For the higher-rated models, responses varied only slightly depending on the language used; for the weaker systems, the differences reached as much as 15%. Arvi Tavast, director of the Estonian Institute of Information and Communication Technologies, warned that foreign troll farms could generate false content designed to distort the information that AI models provide. He called this a dangerous trend and urged action to keep Estonia's information space balanced.

Kruusma stressed that Russia is systematically trying to seed the data used to train artificial intelligence models with propaganda. Large amounts of content are currently being produced not for human readers at all, but for bots that crawl web pages for information. However, he said the problem can be addressed through better filtering, data evaluation and coordination among developers.

The EKI also assessed how well the LLMs handle the Estonian language and culture. So far, the results show uneven quality and little improvement, with some of the latest models performing even worse than their predecessors. Kruusma said that smaller languages are not a priority for big tech companies. He explained that even if developers wanted to support Estonian, they would still depend on data available online, and Estonia has not yet made enough language model training material available.
