China’s Ministry of Public Security warns of data pollution from unsafe, questionable AI training data
(Global Times/BRNN) -- China’s Ministry of Public Security (MPS) on Tuesday issued a safety advisory, warning that artificial intelligence (AI) training data varies widely in quality and often contains false information, fabricated content, and biased views.
The advisory noted that if just 0.01 percent of training data consists of false text, harmful AI model output can increase by 11.2 percent.
The three core elements of AI are algorithms, computing power, and data. Among them, data is the fundamental element for training AI models and a key resource for AI applications. It provides the raw material for AI models, influences AI performance, and drives AI usage, said the MPS in an article published on its official WeChat account.
High-quality data can significantly enhance the accuracy and reliability of AI models. However, compromised data may lead to faulty decisions or even system failures, posing serious safety risks, it noted.
Studies show that even a tiny amount of false text in training data can sharply increase harmful outputs. For instance, just 0.001 percent of false text can raise harmful output by 7.2 percent, and at 0.01 percent, the increase reaches 11.2 percent, said the article.
False content generated from polluted data can be reused in future training, creating a lasting “pollution legacy effect.” AI-generated content now far exceeds human-created content in volume, and the prevalence of low-quality, biased data will compound errors in training, ultimately distorting models’ understanding over time, it said.
Data pollution can lead to real-world risks, the ministry warned. In finance, it may trigger abnormal market fluctuations; in public safety, it can mislead public opinion and spark panic; and in health care, it may result in incorrect diagnoses, endanger lives, and promote pseudoscience.
To enhance oversight and prevent data pollution at the source, China has implemented a classification and grading system for AI data, based on legislation such as the Cybersecurity Law, Data Security Law, and Personal Information Protection Law.
The goal is to curb the generation of polluted data at the source and mitigate AI-related data security risks. Authorities are now enhancing risk assessments, improving safeguards for data flow, and implementing end-point correction mechanisms within a structured framework, noted the article.
(Latest Update August 6, 2025)