Product
Mar 27, 2025

Frustrated by Model Refusals? Your Users are Too.

Frustrated by Model Refusals? Your Users are Too.

Low-code tools are going mainstream

Purus suspendisse a ornare non erat pellentesque arcu mi arcu eget tortor eu praesent curabitur porttitor ultrices sit sit amet purus urna enim eget. Habitant massa lectus tristique dictum lacus in bibendum. Velit ut viverra feugiat dui eu nisl sit massa viverra sed vitae nec sed. Nunc ornare consequat massa sagittis pellentesque tincidunt vel lacus integer risu.

  1. Vitae et erat tincidunt sed orci eget egestas facilisis amet ornare
  2. Sollicitudin integer  velit aliquet viverra urna orci semper velit dolor sit amet
  3. Vitae quis ut  luctus lobortis urna adipiscing bibendum
  4. Vitae quis ut  luctus lobortis urna adipiscing bibendum

Multilingual NLP will grow

Mauris posuere arcu lectus congue. Sed eget semper mollis felis ante. Congue risus vulputate nunc porttitor dignissim cursus viverra quis. Condimentum nisl ut sed diam lacus sed. Cursus hac massa amet cursus diam. Consequat sodales non nulla ac id bibendum eu justo condimentum. Arcu elementum non suscipit amet vitae. Consectetur penatibus diam enim eget arcu et ut a congue arcu.

Vitae quis ut  luctus lobortis urna adipiscing bibendum

Combining supervised and unsupervised machine learning methods

Vitae vitae sollicitudin diam sed. Aliquam tellus libero a velit quam ut suscipit. Vitae adipiscing amet faucibus nec in ut. Tortor nulla aliquam commodo sit ultricies a nunc ultrices consectetur. Nibh magna arcu blandit quisque. In lorem sit turpis interdum facilisi.

  • Dolor duis lorem enim eu turpis potenti nulla  laoreet volutpat semper sed.
  • Lorem a eget blandit ac neque amet amet non dapibus pulvinar.
  • Pellentesque non integer ac id imperdiet blandit sit bibendum.
  • Sit leo lorem elementum vitae faucibus quam feugiat hendrerit lectus.
Automating customer service: Tagging tickets and new era of chatbots

Vitae vitae sollicitudin diam sed. Aliquam tellus libero a velit quam ut suscipit. Vitae adipiscing amet faucibus nec in ut. Tortor nulla aliquam commodo sit ultricies a nunc ultrices consectetur. Nibh magna arcu blandit quisque. In lorem sit turpis interdum facilisi.

“Nisi consectetur velit bibendum a convallis arcu morbi lectus aecenas ultrices massa vel ut ultricies lectus elit arcu non id mattis libero amet mattis congue ipsum nibh odio in lacinia non”
Detecting fake news and cyber-bullying

Nunc ut facilisi volutpat neque est diam id sem erat aliquam elementum dolor tortor commodo et massa dictumst egestas tempor duis eget odio eu egestas nec amet suscipit posuere fames ded tortor ac ut fermentum odio ut amet urna posuere ligula volutpat cursus enim libero libero pretium faucibus nunc arcu mauris sed scelerisque cursus felis arcu sed aenean pharetra vitae suspendisse ac.

While working with enterprises to deploy AI-powered tools, we've found that poorly designed guardrails can lead to high rates of incorrect refusals, driving end users away from otherwise valuable AI applications.

Many companies implement AI guardrails to protect against risks, only to discover that their end users are constantly being blocked by false flags and model refusals. Users feel like they're talking to a brick wall, and may churn from the AI application altogether. This happens because when companies first deploy an AI use case, they often implement broad guardrails that fail to capture the nuances of real-world LLM interactions.

At Dynamo AI, we've developed a three-pronged approach to optimize guardrails and significantly reduce false refusals while maintaining appropriate safety boundaries: finetuning on golden datasets, continuous guardrail improvement through human-in-the-loop monitoring, and thresholding.

In this article, we will refer to guardrail false positives and incorrect refusals as the same, since when a guardrail incorrectly flags and blocks a prompt, it will result in the end user experiencing a refusal message, such as “Sorry, I am unable to help with this request.”

Fine-Tuning on Golden Datasets to Align Guardrail Behavior

Many guardrail providers provide their own generic definitions of toxicity to deploy, with no ability to customize what they will flag. Using a non-customizable, generic guardrail means enterprise users have no ability to fine-tune to their specific use case. This often leads to high false positive rates, since an enterprise definition of acceptable content might differ significantly from these standard guardrail definitions. As an enterprise, you don't want to be beholden to how a third-party defines toxicity—you need the ability to customize guardrails to your specific use case.

Dynamo AI solves this by enabling enterprises to both define their own guardrail parameters using natural language and fine-tune the guardrail model on domain-specific golden datasets. We recommend fine-tuning on high-quality, manually reviewed and annotated data, typically sourced from usage logs or previously submitted user interactions. In our work with a major financial institution, we saw false positive rates from guardrails drop from 20+% to less than 2% after fine-tuning on golden datasets.

Continuous Guardrail Improvement Through Human-in-the-Loop Monitoring

After deploying a guardrail, enterprises also need to continuously audit and correct guardrail behavior to improve performance. This is important because when you initially design and implement a guardrail, you might not know all of the ways that it might be used in production or what the edge cases you may encounter in the real world, so continuous improvement allows you to adjust how the guardrail behaves once you have more user insights.

There are two primary approaches to modifying a guardrail:

  1. Adjusting the training data for fine-tuning
  2. Modifying the guardrail definition

Dynamo's guardrail monitoring platform allows users to track when guardrails flag content, then audit and correct whether the guardrail flag was correct or incorrect. This user feedback is then used to further finetune the guardrail, creating a cycle that continually improves guardrail performance over time.

Thresholding to Balance Safety and Usability

After fine-tuning and continuous improvement, enterprises can further optimize guardrail performance by adjusting the threshold, or sensitivity, of the guardrail. However, enterprises should carefully consider the tradeoff when they adjust thresholds since reducing false positive rates (incorrect refusals) will generally also lead to false negatives (when the guardrail fails to catch noncompliant content).

When implementing thresholding:

  • Determine the costs and benefits of having a lower false positive rate versus lower recall
  • Use ROC curves or similar visualizations to understand potential tradeoff
  • We commonly see customers implement thresholding by first identifying an acceptable false positive rate for your use-case. For example, if you are deploying a customer-facing chatbot into production, you may want to target a false positive rate of < 5% to avoid low customer satisfaction, then improve your guardrail until you reach a satisfactory recall to capture threats. On the other hand, if you are leveraging strong human-in-the-loop workflows in your production application (i.e. AI assisted call center agent), a higher false positive rate may be acceptable since the human can intervene when the model refuses to generate a response.

Conclusion

Despite the widespread implementation of AI guardrails, many enterprises that Dynamo AI work with struggle with high rates of false refusals that frustrate their end users. Dynamo AI's three-pronged approach - combining fine-tuning on golden datasets, continuous improvement through human-in-the-loop monitoring, and strategic thresholding - ensures that guardrails remain both effective and minimally disruptive to the user experience.

These capabilities are live on the Dynamo AI platform. If you'd like to experiment with these features in our Dynamo AI demo sandbox environment or schedule a live demo, contact us here.