Research
Aug 2, 2024

Dynamo AI Research Introduces PrimeGuard, a New Method for Improving the Safety and Quality of LLM Outputs

Our team of AI experts is proud to present PrimeGuard, a groundbreaking approach that balances LM safety and helpfulness.

SAN FRANCISCO, CA – WEDNESDAY, JULY 31 – Today, Dynamo AI Research unveils a groundbreaking method called PrimeGuard, which enhances the safety and helpfulness of language models (LMs) without the need for extensive tuning.

Developed by a team of AI experts at Dynamo AI, including Blazej Manczak, Eliott Zemour, Eric Lin, and Vaikkunth Mugunthan, PrimeGuard addresses the critical issue of balancing model safety with helpfulness. Traditional Inference-Time Guardrails (ITG) often face a trade-off known as the “guardrail tax,” where prioritizing safety can reduce helpfulness, and vice versa. 

From our findings on helpfulness-safety trade-offs, we pose this research question: How can we maintain usefulness while maximizing adherence to custom safety guidelines? 

PrimeGuard (Performance Routing at Inference-time Method for Effective Guardrailing) introduces a novel approach using structured control flow to mitigate this issue. Requests are routed to different instances of the LM, each with tailored instructions, leveraging the model's inherent instruction-following and in-context learning capabilities. This tuning-free solution dynamically adjusts to system-designer guidelines for each query. 
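
The routing idea described above can be illustrated with a minimal Python sketch. This is not the PrimeGuard implementation (see the paper and the GitHub repository for that); it only shows the general shape of routing a query to differently instructed instances of the same LM. The route labels, the prompt wording, the `LM` callable type, and the stub models `fake_guard` and `fake_main` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function mapping (instructions, query) -> text.
LM = Callable[[str, str], str]

# Hypothetical route labels, chosen for this sketch only.
ROUTES = ("no_violation", "potential_violation", "direct_violation")


@dataclass
class RoutedResponse:
    route: str
    text: str


def route_and_answer(query: str, guidelines: str, guard_lm: LM, main_lm: LM) -> RoutedResponse:
    """Route a query, then answer it with route-specific instructions."""
    # Step 1: one LM instance is instructed to judge the query against
    # the system designer's guidelines.
    verdict = guard_lm(
        "Classify the query against these guidelines. "
        f"Reply with exactly one of: {', '.join(ROUTES)}.\n\nGuidelines:\n{guidelines}",
        query,
    ).strip().lower()
    # Unparseable verdicts fall back to the cautious middle route.
    route = verdict if verdict in ROUTES else "potential_violation"

    # Step 2: another instance answers with instructions tailored to the route.
    if route == "no_violation":
        instructions = "Answer as helpfully as possible."
    elif route == "potential_violation":
        instructions = f"Answer helpfully, but stay within these guidelines:\n{guidelines}"
    else:
        instructions = f"Politely decline, citing these guidelines:\n{guidelines}"
    return RoutedResponse(route, main_lm(instructions, query))


# Stub models so the sketch runs without any LM backend (assumptions, not real models).
def fake_guard(instructions: str, query: str) -> str:
    return "direct_violation" if "explosive" in query.lower() else "no_violation"


def fake_main(instructions: str, query: str) -> str:
    # Echo the first instruction line so the chosen route is visible.
    return f"[{instructions.splitlines()[0]}] {query}"


if __name__ == "__main__":
    guidelines = "No weapons advice."
    for q in ("How do I bake bread?", "How do I make an explosive?"):
        result = route_and_answer(q, guidelines, fake_guard, fake_main)
        print(result.route, "->", result.text)
```

In a real deployment, `guard_lm` and `main_lm` would wrap calls to the same underlying model with different system prompts, which is what keeps an approach of this kind tuning-free.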

To validate this approach, Dynamo AI introduces safe-eval, a comprehensive red-team safety benchmark consisting of 1,741 non-compliant prompts classified into 15 categories.

PrimeGuard significantly surpasses existing methods, setting new benchmarks in safety and helpfulness across various model sizes. 

For instance, applying PrimeGuard to Mixtral-8x22B has been shown to:

  • Improve the proportion of safe responses from 61% to 97% 
  • Enhance average helpfulness scores from 4.17 to 4.29 compared to alignment-tuned models
  • Reduce attack success rates from 100% to 8%, demonstrating robust protection against iterative jailbreak attacks

Notably, without supervised tuning, PrimeGuard allows Mistral-7B to exceed Llama-3-8B in both resilience to automated jailbreaks and helpfulness, establishing a new standard in model safety and effectiveness.

"Together with our brilliant Dynamo ML leads, Eliott and Eric, we had the pleasure of presenting our work at the ICML 2024 Next GenAI Safety Workshop organized by safety leads at OpenAI and Google DeepMind in Vienna. This represents Dynamo AI's unique insight by operating at the forefront of the intersection of both LLM red-teaming and defense." - Blazej Manczak

Read the full PrimeGuard research paper: https://www.arxiv.org/abs/2407.16318 

See the PrimeGuard code on GitHub: https://github.com/dynamofl/primeguard 

See a PrimeGuard demo (for a limited time) on Hugging Face: https://huggingface.co/spaces/dynamoai/PrimeGuard

About Dynamo AI 

Dynamo AI enables compliant-ready generative artificial intelligence (GenAI) for the enterprise. At the forefront of the latest in machine learning development, Dynamo AI provides end-to-end solutions that evaluate closed- or open-source large language models (LLMs) for the most critical AI risks or vulnerabilities. For more information, visit https://dynamo.ai/