The Decline of AI Intelligence: How Social Media Introduces Chaos into Machine Learning Models

Subpar content leads to the degradation of large language models (LLMs). This conclusion was reached by researchers from the University of Texas and Purdue University.

The scientists fed a selection of viral posts from X to four popular AI models over the course of a month and noted the following changes:

The effect intensified in proportion to the amount of low-quality data. Remarkably, even after retraining on clean, high-quality content, it was not possible to fully eliminate cognitive biases.

In the course of the experiment, the authors proposed and tested the "decay hypothesis for AI models." This posits that continuous exposure to "garbage" information results in persistent deterioration of large language models.

To identify low-grade content, the researchers developed two metrics:

With the number of tokens and training operations held constant, the results showed that continual fine-tuning of four LLMs on a low-quality dataset, compared with a control dataset, led to declines in reasoning ability, long-text comprehension, and safety.
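For a concrete picture of what such a matched-budget comparison involves, here is a minimal sketch, not the authors' code: the `gpt2` checkpoint, file paths, and hyperparameters are illustrative assumptions, and the point is only that both runs share the same step count and batch size, so the corpus quality is the only variable.

```python
# Minimal sketch of a matched-budget fine-tuning comparison (illustrative only).
# "gpt2" stands in for one of the four LLMs; paths and hyperparameters are assumptions.

from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

BASE_MODEL = "gpt2"   # placeholder for the base checkpoint
MAX_STEPS = 500       # identical number of training operations per run
BATCH_SIZE = 4        # identical token budget per run

def continual_finetune(corpus_path: str, output_dir: str):
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

    # Plain-text corpus: one post per line.
    raw = load_dataset("text", data_files=corpus_path, split="train")
    ds = raw.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                 batched=True, remove_columns=["text"])

    args = TrainingArguments(output_dir=output_dir,
                             max_steps=MAX_STEPS,
                             per_device_train_batch_size=BATCH_SIZE,
                             learning_rate=2e-5,
                             report_to="none")
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    Trainer(model=model, args=args, train_dataset=ds,
            data_collator=collator).train()
    return model

# Same budget, different data: only the quality of the corpus differs.
junk_model = continual_finetune("junk_posts.txt", "ckpt-junk")
control_model = continual_finetune("control_posts.txt", "ckpt-control")
```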

Gradually mixing the "garbage" dataset into the control data also caused a drop in cognitive abilities. For example, under the M1 metric, as the share of low-quality data increased from 0% to 100%, the score on ARC-Challenge decreased from 74.9 to 57.2, while on RULER-CWE it fell from 84.4 to 52.3.
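The dose-response idea can be sketched in the same spirit: build training mixtures with a growing share of junk posts and score the model fine-tuned on each one. In the sketch below, `finetune` and `evaluate` are placeholders for a fixed-budget fine-tuning run and a benchmark harness such as ARC-Challenge, not functions from the paper.

```python
# Rough sketch of the mixing experiment (placeholder functions, not the study's pipeline).

import random

def build_mixture(junk_posts, clean_posts, junk_ratio, size, seed=0):
    """Sample a fixed-size training set containing the given fraction of junk posts."""
    rng = random.Random(seed)
    n_junk = int(size * junk_ratio)
    mix = rng.sample(junk_posts, n_junk) + rng.sample(clean_posts, size - n_junk)
    rng.shuffle(mix)
    return mix

def dose_response(junk_posts, clean_posts, finetune, evaluate, size=10_000):
    """Fine-tune on 0%..100% junk mixtures and collect one benchmark score per mixture."""
    scores = {}
    for junk_ratio in (0.0, 0.2, 0.5, 0.8, 1.0):
        mixture = build_mixture(junk_posts, clean_posts, junk_ratio, size)
        model = finetune(mixture)             # same training budget for every mixture
        scores[junk_ratio] = evaluate(model)  # e.g. ARC-Challenge accuracy
    return scores
```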

The models also exhibited a decline in ethical consistency. The researchers noted that models exposed to low-quality data became less reliable and more overconfident in their incorrect responses.

LLMs began to skip logical steps in reasoning, providing superficial results instead of detailed explanations.

The researchers urged AI developers to systematically monitor the cognitive health of their models and recommended three key steps:

They stated that such measures are essential to prevent significant damage, since models continue to be trained on data collected from the internet. Without proper oversight, AI risks inheriting biases from AI-generated content, setting off a cycle of degradation.
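What routine monitoring of a model's "cognitive health" might look like in practice is easy to illustrate, although the evaluation suite and the tolerance below are assumptions rather than the researchers' recommendations: re-run a fixed set of benchmarks after every training refresh and flag any score that regresses beyond a threshold.

```python
# Hypothetical "cognitive health" check: compare fresh benchmark scores with a stored
# baseline and warn on regressions. Suite contents, score scale, and the 5-point
# tolerance are assumptions for illustration only.

from typing import Callable, Dict

def check_cognitive_health(model,
                           eval_suite: Dict[str, Callable],
                           baseline: Dict[str, float],
                           max_drop: float = 5.0) -> Dict[str, float]:
    """Run every benchmark in the suite and flag drops larger than max_drop."""
    current = {name: run(model) for name, run in eval_suite.items()}
    for name, score in current.items():
        if baseline.get(name, score) - score > max_drop:
            print(f"WARNING: {name} regressed from {baseline[name]:.1f} to {score:.1f}")
    return current

# Usage sketch with placeholder benchmark callables:
# suite = {"arc_challenge": run_arc, "ruler_cwe": run_ruler}
# scores = check_cognitive_health(model, suite, baseline=last_release_scores)
```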

Separately, experts from NewsGuard found that OpenAI's Sora 2 is prone to generating deepfakes.