A malicious repository on Hugging Face climbed to the top of the platform’s trending list by impersonating a legitimate model released by OpenAI, exposing a growing risk in the artificial intelligence supply chain. The project, titled “Open OSS privacy filter,” mimicked OpenAI’s official Privacy Filter open-weight model and replicated its description to appear authentic, ultimately tricking users into downloading a harmful payload. Before it was taken down, the repository had reached the number one trending position, with around 244,000 downloads and hundreds of user engagements within a short span. Access to the repository has since been disabled, but the incident highlights how easily threat actors can exploit trust in widely used developer platforms.
The legitimate Privacy Filter model, introduced by OpenAI in April 2026, was designed to identify and redact personally identifiable information in unstructured text, supporting stronger privacy and security integration in applications. The malicious version leveraged this credibility by copying the model card almost word for word while embedding hidden functionality. According to findings from HiddenLayer, the fake repository included a Python-based loader that executed an information stealer targeting Windows systems. Users were instructed to clone the repository and run scripts, such as a batch file or a Python loader, to configure dependencies and launch the model; these actions instead initiated a chain of malicious activity.
Once executed, the loader disabled SSL verification and decoded a Base64-encoded link hosted on a public JSON paste service, which then delivered commands for execution via PowerShell. This approach allowed the attackers to modify payloads dynamically without altering the repository itself. The infection chain involved downloading additional scripts from a remote domain and escalating privileges through User Account Control prompts. It also modified Microsoft Defender settings to avoid detection and created a scheduled task to run further malicious code. The final payload functioned as an information stealer capable of capturing screenshots and extracting sensitive data from sources including Discord accounts, cryptocurrency wallets, browser data, system files, and configuration details. It also attempted to evade detection by identifying sandbox environments, disabling security interfaces such as the Antimalware Scan Interface (AMSI) and Event Tracing for Windows (ETW), and verifying that it was not running in a virtual machine.
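The loader behaviors described above, disabled TLS verification, Base64-encoded URLs resolved at run time, and a hand-off to PowerShell, are exactly the kind of indicators a reviewer can check for before running a model repository’s setup scripts. The sketch below is a minimal defensive heuristic, not the attackers’ code: all pattern names and the sample snippet are illustrative assumptions, and a real review would go far beyond this.

```python
import base64
import re

# Heuristic indicators drawn from the loader behavior described above.
# Pattern names and coverage are illustrative, not derived from the
# actual malware's source.
INDICATOR_PATTERNS = {
    "ssl_verification_disabled": re.compile(
        r"verify\s*=\s*False|ssl\._create_unverified_context"
    ),
    "powershell_invocation": re.compile(r"powershell(\.exe)?", re.IGNORECASE),
    "base64_decode_call": re.compile(r"base64\.b64decode|FromBase64String"),
}

# Long runs of Base64 alphabet characters that may hide a URL.
_B64_BLOB = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def _hidden_urls(source: str) -> list:
    """Decode long Base64 blobs and keep any that turn out to be URLs."""
    urls = []
    for blob in _B64_BLOB.findall(source):
        try:
            decoded = base64.b64decode(blob, validate=True).decode("utf-8")
        except Exception:
            continue  # not valid Base64 text, ignore
        if decoded.startswith(("http://", "https://")):
            urls.append(decoded)
    return urls


def scan_loader_script(source: str) -> dict:
    """Return the loader indicators found in a script's source text."""
    hits = [name for name, pat in INDICATOR_PATTERNS.items() if pat.search(source)]
    return {"indicators": hits, "hidden_urls": _hidden_urls(source)}


# Hypothetical snippet resembling the reported loader pattern.
sample = (
    "import requests, base64\n"
    "u = base64.b64decode('aHR0cHM6Ly9leGFtcGxlLmNvbS9wYXlsb2Fk').decode()\n"
    "requests.get(u, verify=False)\n"
)
print(scan_loader_script(sample))
```

A scanner like this only flags scripts for human review; since attackers can trivially change encodings, it complements, rather than replaces, provenance checks such as verifying the publishing organization behind a trending model.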
Further investigation revealed that the attack extended beyond a single repository: at least six additional projects were identified using similar techniques to deploy the same stealer. The infrastructure supporting the operation included domains used for both payload delivery and command-and-control communications, some of which had previously been linked to other malware campaigns. In one instance, a related domain was connected to the distribution of a Windows executable tied to earlier activity involving a malicious npm package used to deliver the ValleyRAT malware. Researchers noted that the high download count associated with the fake model may have been artificially inflated to create a sense of legitimacy and encourage adoption. The incident underscores the need for increased scrutiny when downloading machine learning models and shows how threat actors are adapting traditional malware delivery tactics to target developers and AI practitioners through trusted platforms.