In a move towards fostering responsible development of generative AI models, Meta has launched the Purple Llama project, introducing open-source tools for developers to assess and enhance the trustworthiness and safety of AI models prior to public use.
Emphasizing the collaborative nature required for addressing AI safety challenges, Meta envisions Purple Llama as a platform to establish a shared foundation for the development of safer generative AI. The initiative responds to growing concerns surrounding the deployment of large language models and other AI technologies.
In a blog post, Meta stated, “The people building AI systems can’t address the challenges of AI in a vacuum, which is why we want to level the playing field and create a center of mass for open trust and safety.”
Gareth Lindahl-Wise, Chief Information Security Officer at cybersecurity firm Ontinue, applauded Purple Llama as a “positive and proactive” step towards safer AI. While acknowledging potential skepticism, he highlighted the benefit of improved consumer-level protection and the positive impact on the ecosystem.
Purple Llama’s collaborative efforts involve partnerships with AI developers, cloud services like AWS and Google Cloud, semiconductor companies including Intel, AMD, and Nvidia, as well as software firms like Microsoft. This collective effort aims to produce tools for both research and commercial applications, facilitating the testing of AI models’ capabilities and the identification of safety risks.
The initial release of tools under Purple Llama includes CyberSecEval, designed to assess cybersecurity risks in AI-generated software. Featuring a language model, CyberSecEval identifies inappropriate or harmful text, enabling developers to test the susceptibility of their AI models to generating insecure code or aiding cyberattacks. Meta’s research underscores the significance of continuous testing and improvement for AI security, as large language models often suggest vulnerable code.
Another tool in the Purple Llama suite is Llama Guard, a large language model trained to detect potentially harmful or offensive language. Developers can leverage Llama Guard to evaluate whether their models produce or accept unsafe content, assisting in filtering out prompts that might lead to inappropriate outputs.