May 1, 2024

Unlock Differential Privacy for >7B Parameter LLMs with DynamoEnhance

Motivation: Why differential privacy (DP)?

Recent research on large language models (LLMs) has revealed that they are often prone to memorizing their training and fine-tuning datasets. Adversaries can exploit this weakness by carefully crafting prompts that extract sensitive information from a model. For enterprises developing and deploying LLMs, this poses a major threat to data security and privacy. Differential privacy mitigates the danger by injecting calibrated statistical noise during training, which limits memorization while letting engineers manage the tradeoff between privacy and performance.

Because of this promise, federal agencies are actively investigating differential privacy as a technique for defending against adversarial attacks and managing data leakage risks in LLMs. Just last year, the National Institute of Standards and Technology (NIST), which authored the widely used NIST AI Risk Management Framework, stated that differential privacy is currently the best known method for providing robust privacy protection against known and future attacks, even in the face of multiple data releases.

“differential privacy is currently the best known method for providing robust privacy protection against known and future attacks, even in the face of multiple data releases.” -- National Institute of Standards and Technology (NIST)

Furthermore, governments and organizations like the U.S. Census Bureau have begun adopting differential privacy as part of their workflows. Developing scalable DP solutions is necessary to increase the adoption of privacy-preserving machine learning.

Differential privacy has faced critical scaling bottlenecks with LLMs

Despite its promise for safeguarding LLMs, differential privacy has been difficult to adopt. The sheer size of LLMs, with parameter counts now reaching into the trillions, poses significant hurdles for engineers. Differentially Private Stochastic Gradient Descent (DP-SGD), the classic DP learning algorithm for neural networks, must compute an individual gradient for every sample so that each one can be clipped before noise is added. Compared with standard, non-private neural network training, this slows training down because it gives up much of the batch-level parallelism that GPUs provide, and without algorithmic and hardware optimizations it also requires a large amount of GPU memory. The previous state of the art in differentially private fine-tuning struggled to accommodate models larger than roughly 1.5 billion parameters, so practitioners faced low throughput and very long training runs. Memory constraints also made training on anything other than high-end A100 GPUs (40 GB or 80 GB) difficult, which made implementations costly and arduous for enterprise engineers.
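To make the per-sample requirement concrete, here is a minimal sketch of a single DP-SGD update on a toy linear model, written in plain Python. The function names, the squared-error loss, and the hyperparameter values are illustrative assumptions, not part of DynamoEnhance or any DP library. The loop over individual examples is the step that standard training avoids by computing one gradient for the whole batch.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def per_sample_grad(weights, x, y):
    """Gradient of the squared error 0.5 * (w.x - y)^2 for one example."""
    err = sum(w * xi for w, xi in zip(weights, x)) - y
    return [err * xi for xi in x]


def dp_sgd_step(weights, batch, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each per-sample gradient, sum, add noise, average."""
    total = [0.0] * len(weights)
    for x, y in batch:  # one gradient per example, not one per batch
        g = clip(per_sample_grad(weights, x, y), max_norm)
        total = [t + gi for t, gi in zip(total, g)]
    # Gaussian noise with standard deviation noise_multiplier * max_norm per coordinate
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [w - lr * n / len(batch) for w, n in zip(weights, noisy)]


# Toy data: (features, target) pairs chosen only for illustration.
rng = random.Random(0)
batch = [([1.0, 2.0], 3.0), ([2.0, 0.5], 1.0), ([0.0, 1.0], 4.0)]
weights = dp_sgd_step([0.0, 0.0], batch, lr=0.1, max_norm=1.0,
                      noise_multiplier=1.0, rng=rng)
print(weights)
```

For an LLM, each per-sample gradient is as large as the model itself, which is why naive implementations run out of memory long before non-private training does.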

In addition, today’s popular differential privacy frameworks don’t effectively support larger LLM workloads. The Opacus library currently supports Distributed Data Parallel (DDP) training, but doesn’t support model sharding. DDP replicates the entire model on each GPU, so the model has to fit within the memory of a single GPU no matter how many GPUs are available. This makes it difficult or nearly impossible to train LLMs with billions of parameters efficiently using differential privacy across multiple GPUs. Consequently, the lack of model sharding in Opacus limits the scalability and practicality of differentially private training for large-scale deep learning models.

Apply differential privacy at scale with DynamoEnhance

Bu et al. developed an approach called DP-ZeRO that enables large-scale differentially private deep learning using the DeepSpeed library. DeepSpeed is a deep learning optimization library that uses techniques like the Zero Redundancy Optimizer (ZeRO) to improve training speed and reduce memory usage when training large models across multiple GPUs. By extending DeepSpeed to support differentially private training, the researchers demonstrated that, with the right techniques, DeepSpeed can provide the injection points needed to implement differential privacy.
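A rough back-of-the-envelope calculation shows why sharding matters. The sketch below uses the accounting from the original ZeRO work for mixed-precision Adam training: about 16 bytes of model state per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer state). The function name and the 7B-parameter, 8-GPU setup are illustrative assumptions. The estimate ignores activations, temporary buffers, and any extra memory that per-sample gradients need in DP training, and it does not describe how DP-ZeRO or DynamoEnhance partitions state internally.

```python
def model_state_gb(num_params, num_gpus, zero_stage):
    """Approximate per-GPU model-state memory in GB for mixed-precision Adam.

    zero_stage 0 corresponds to plain data parallelism (full replica per GPU);
    stages 1, 2, and 3 progressively shard optimizer state, gradients, and
    parameters across the GPUs.
    """
    fp16_params = 2 * num_params
    fp16_grads = 2 * num_params
    optimizer_states = 12 * num_params  # fp32 weights, momentum, variance
    if zero_stage >= 1:
        optimizer_states /= num_gpus
    if zero_stage >= 2:
        fp16_grads /= num_gpus
    if zero_stage >= 3:
        fp16_params /= num_gpus
    return (fp16_params + fp16_grads + optimizer_states) / 1e9


# Hypothetical setup: a 7B-parameter model on 8 GPUs.
for stage in range(4):
    print(f"ZeRO stage {stage}: {model_state_gb(7_000_000_000, 8, stage):.2f} GB per GPU")
```

Under these assumptions, a fully replicated 7B-parameter model needs more model-state memory than a single 80 GB A100 offers, while full sharding across 8 GPUs brings the per-GPU share well within reach.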

DP-ZeRO opens up exciting opportunities for Dynamo to build upon their work and integrate scalable differential privacy in DynamoEnhance. By leveraging DeepSpeed's multi-GPU model-sharding capabilities and injecting differential privacy into the distributed training process, we aim to provide our customers with enhanced data protection and privacy while still harnessing the power of large-scale models.

This is where we come in. DynamoEnhance’s MultiGPU privacy framework, built upon the DeepSpeed library, integrates differential privacy by exposing easy-to-use Trainers inspired by the transformers and TRL (Transformer Reinforcement Learning) libraries.

from dynamofl.privacy import DPTrainer, PrivacyArguments

# model, tokenizer = ...
# train_dataset, eval_dataset = ...

privacy_args = PrivacyArguments(target_epsilon=1.0)
trainer = DPTrainer(

Here we set the target epsilon value in our PrivacyArguments. Epsilon is often referred to as the “privacy budget.” The lower the epsilon value, the less of the privacy budget we spend, and the more noise is added to the gradients, which gives a stronger privacy guarantee. As the target epsilon gets bigger, we spend more of the privacy budget: less noise is added to the gradients, which typically helps model quality but weakens the guarantee.
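The inverse relationship between epsilon and noise can be illustrated with the classic Gaussian mechanism, where the noise standard deviation is sensitivity * sqrt(2 ln(1.25/delta)) / epsilon. This textbook bound only holds for epsilon between 0 and 1, and it is used here purely as an illustration. It is not how PrivacyArguments converts target_epsilon into noise; DP-SGD implementations generally rely on a privacy accountant that also factors in the sampling rate and the number of training steps. The function name and the delta value are assumptions.

```python
import math


def gaussian_sigma(epsilon, delta, sensitivity=1.0):
    """Noise standard deviation for the classic (epsilon, delta) Gaussian mechanism."""
    if not 0 < epsilon < 1:
        raise ValueError("the classic bound only applies for 0 < epsilon < 1")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


# Smaller epsilon (stricter budget) means a larger noise standard deviation.
for eps in (0.1, 0.5, 0.9):
    print(f"epsilon={eps}: sigma={gaussian_sigma(eps, delta=1e-5):.2f}")
```

Cutting epsilon by a factor of five multiplies the noise standard deviation by five in this simple setting, which is the tradeoff the target epsilon controls.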

By building on DeepSpeed and incorporating novel techniques, DynamoEnhance enables efficient and scalable training of large language models with robust privacy guarantees, while also allowing for bigger batch sizes. This gives enterprise customers an effective, easy-to-use way to safeguard sensitive data with differential privacy while still harnessing the power of LLMs.

This technology allows for MultiGPU model sharding in a way previously unsupported by existing differential privacy libraries. Our MultiGPU Differential Privacy SDK supports training with Hugging Face, mixed precision, quantized training such as BitsAndBytes, Mixture of Quantization (MoQ), LoRA fine-tuning, flash attention, accelerate, and other popular training libraries. We support popular LLMs like Llama-70B, Mixtral-8x7B, and others.

Benefits for Enterprise Customers: Trust, Safety, and Compliance

At Dynamo AI, our priority is to empower our enterprise customers with the tools and knowledge they need to harness the potential of differential privacy. We provide comprehensive documentation and QuickStart guides, allowing users to quickly and easily experiment with DP fine-tuning of LLMs, regardless of their technical expertise. By focusing on accessibility and usability, we seek to make privacy-enhancing technologies available to a broader audience, not just those with a formal background in privacy-preserving machine learning.

Contact Us

As LLMs become increasingly powerful and widely adopted, the risk of exposing sensitive information from training datasets grows. With Dynamo AI's comprehensive privacy solutions, teams can effectively measure, address, and prevent data leakage, ensuring the responsible deployment and use of LLMs while safeguarding sensitive information.

We also offer a range of AI privacy and security solutions to help you build trustworthy and responsible AI systems. To learn more about Dynamo AI and to explore our AI privacy and security offerings, please reach out to us by requesting a demo.