May 6, 2024

Integrate Explainable LLM Data Leakage Testing into your CI/CD Pipeline with DynamoEval


Generative AI brings new challenges to safeguarding the privacy of sensitive data, including the potential risk of LLMs memorizing and leaking Personally Identifiable Information (PII) or copyrighted data in training datasets. Despite generative AI’s novelty, regulators have highlighted that enterprises deploying generative AI will still need to adhere to the comprehensive data privacy and security requirements in existing laws and regulations. 

In the machine learning literature, researchers have long studied privacy vulnerabilities that arise from model training, such as data leakage in large language models (Carlini, et al.). These vulnerabilities can pose significant compliance, financial, and reputational risks. Regulators have emphasized the need for explainable red-teaming of these vulnerabilities and appropriate compensating controls to manage risks. DynamoEval’s test suite addresses regulator requirements for explainable risk assessments by going beyond simply detecting PII leakage. DynamoEval’s privacy testing suite generates reports that explain the conditions that make these data leakage incidents common. In this post, we show how these tests can be combined into a CI/CD pipeline with DynamoEnhance to rapidly integrate and test compensating controls and risk mitigation techniques like differential privacy to address the risk of data leakage. 

At Dynamo AI, we stay at the forefront of privacy research, rapidly integrating the latest techniques into DynamoEval. DynamoEval is our end-to-end tool that red-teams models for vulnerabilities using attacks like Membership Inference, PII extraction, PII Inference and Data Extraction, and provides comprehensive, automated reports, dashboards, and deep dive applications to identify and mitigate risks.

This blog will explore how DynamoEval employs explainable testing to tackle PII extraction and membership inference vulnerabilities in LLMs. In this specific walkthrough, we will showcase red-teaming LLMs by simulating a privacy attack recently described by Lukas et al. from Microsoft Research; we’ve productionized these attacks to offer enterprises the ability to monitor and enhance data privacy in their machine learning applications.

Evaluating Model Vulnerability to PII Extraction and Membership Inference

PII extraction attacks and Membership Inference attacks are two types of privacy attacks that can expose sensitive information in machine learning models. We’ll demonstrate how we can evaluate models against these attacks, and subsequently interpret the results. 

  • In the PII Extraction attack setting, an adversary attempts to extract sensitive pieces of information (e.g., names, addresses, phone numbers) that the model might have memorized during fine-tuning or training. 
  • In the Membership Inference attack setting, the adversary tries to infer whether or not an already-known data point was used for training.

With DynamoEval, we evaluate our models against these attacks by simulating an adversary with access to the model and a set of data records. DynamoEval runs multiple iterations of an attack using different splits of the data and hyperparameter settings. We provide detailed metrics such as ROC curves, AUC scores, PII extracted, precision, and recall to quantify the model's vulnerability to these attacks.

PII Extraction

The PII Extraction attack measures how much PII is at risk of extraction by attackers with varying levels of knowledge about the training dataset. The attack is conducted by prompting the model with a series of inputs and checking whether PII is present in the outputs. PII extraction attacks report three metrics: PII extracted, recall, and precision.

PII Extracted: The number of PII entries successfully extracted from the model's responses.

Recall: Recall measures how much PII is at risk of extraction, measured as the percentage of PII in the training dataset that was successfully extracted.

Precision: Precision measures an attacker’s confidence that a piece of emitted PII appears in the training dataset, and is measured as the percentage of PII emitted by the model during the attack that exists in the training dataset.
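
The bookkeeping behind these three metrics can be sketched in a few lines of Python. This is a simplified illustration, not the DynamoEval implementation: it treats the training-set PII and the PII surfaced by the attack as plain string sets.

```python
# Simplified sketch of PII-extraction metrics (not the DynamoEval implementation).
# training_pii: PII strings known to appear in the fine-tuning dataset.
# emitted_pii: PII strings detected in the model's responses during the attack.

def pii_extraction_metrics(training_pii, emitted_pii):
    training_pii, emitted_pii = set(training_pii), set(emitted_pii)
    extracted = training_pii & emitted_pii   # PII successfully extracted
    recall = len(extracted) / len(training_pii) if training_pii else 0.0
    precision = len(extracted) / len(emitted_pii) if emitted_pii else 0.0
    return len(extracted), recall, precision

extracted, recall, precision = pii_extraction_metrics(
    training_pii={"alice@example.com", "555-0102", "bob@example.com", "555-0199"},
    emitted_pii={"alice@example.com", "555-0102", "carol@example.com"},
)
# Here 2 of the 4 training PII were extracted (recall 0.5), and 2 of the 3
# emitted PII actually appear in the training data (precision ~0.67).
```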

Membership Inference

The goal of the Membership Inference attack is to determine whether specific data records can be inferred to be part of the model's training dataset. The attack is conducted by simulating an attacker with access to the model and a dataset, some of whose records were used to train the model. We simulate the attacker building a classifier that predicts whether a data record was part of the training dataset. The performance of this classifier reveals the model's susceptibility to membership inference attacks, indicating how much the model exposes about the data on which it was trained.

True Positive Rate: In this attack, the true positive rate represents the percentage of data records correctly predicted to be members of the training dataset. We look at the true positive rate at a variety of low false positive rates to determine the attacker’s success in high-confidence scenarios.

ROC-AUC: In this attack, we can also use the Receiver Operating Characteristic (ROC) curve to define vulnerability, which demonstrates the performance of the attack as a tradeoff between the True Positive Rate (TPR) and False Positive Rate (FPR) at various thresholds. We can then use the Area Under the ROC Curve (AUC) to measure the aggregate performance across all thresholds. Recent research by Carlini et al. also suggests evaluating the attack’s TPR in the low FPR regime (the 3 percentages shown at the top), in order to characterize whether the attack can confidently identify members of the training set.
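
To make the ROC mechanics concrete, here is a minimal, dependency-free sketch of how AUC and TPR-at-low-FPR can be computed from per-record attack scores. The scores and labels below are invented for illustration, not DynamoEval output.

```python
# Sketch of how ROC-AUC and TPR-at-low-FPR summarize a membership-inference
# attack (illustrative only; DynamoEval computes these internally).

def roc_auc(scores, labels):
    """AUC as the rank statistic: P(member score > non-member score)."""
    members = [s for s, y in zip(scores, labels) if y == 1]
    non_members = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((m > n) + 0.5 * (m == n) for m in members for n in non_members)
    return wins / (len(members) * len(non_members))

def tpr_at_fpr(scores, labels, max_fpr):
    """Best TPR achievable while keeping FPR <= max_fpr (high-confidence regime)."""
    members = [s for s, y in zip(scores, labels) if y == 1]
    non_members = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in sorted(set(scores), reverse=True):
        fpr = sum(n >= t for n in non_members) / len(non_members)
        tpr = sum(m >= t for m in members) / len(members)
        if fpr <= max_fpr:
            best = max(best, tpr)
    return best

scores = [0.9, 0.8, 0.75, 0.6, 0.4, 0.35, 0.2, 0.1]  # attack confidence per record
labels = [1,   1,   0,    0.6 == 0, 0, 1, 0, 0]       # placeholder, replaced below
labels = [1, 1, 0, 1, 0, 1, 0, 0]                     # 1 = training-set member
print(roc_auc(scores, labels))          # 0.8125: clearly better than random (0.5)
print(tpr_at_fpr(scores, labels, 0.25)) # 0.75: TPR in the low-FPR regime
```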

DynamoEval UI Walkthrough

In this section, we’re going to walk through the DynamoEval product step by step.

Curate a train/test dataset

Select a training and test dataset to be used for evaluating the model's privacy vulnerabilities, and specify which column contains the text data. Datasets are uploaded as CSV files. For PII Extraction and Membership Inference attacks, this is typically the dataset used to fine-tune the model.
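
As a rough illustration of this curation step, the following sketch splits a set of records into train and test portions and serializes the training split as a CSV with a "text" column. The column name and split fraction are assumptions for this example, not platform requirements.

```python
# Minimal sketch of curating a train/test split as CSV for privacy testing
# (illustrative; the "text" column name and 80/20 split are assumptions).
import csv
import io
import random

def split_rows(rows, train_frac=0.8, seed=0):
    """Deterministically shuffle and split rows into train/test portions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

rows = [{"text": f"record {i}"} for i in range(10)]
train, test = split_rows(rows, train_frac=0.8)

# Serialize the training split as CSV, with a "text" column for evaluation
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["text"])
writer.writeheader()
writer.writerows(train)
train_csv = buf.getvalue()
```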

Uploading model and dataset to Dynamo AI

Upload the dataset used for fine-tuning and the model to the platform

Upload your trained model and the dataset to the Dynamo AI platform, making sure to specify any relevant files like LoRA adapter configurations.

Choosing tests

Next, we’ll select the specific attack to run, such as PII Extraction or Membership Inference. In the below screenshot, you can see all the privacy attacks available on our platform.

Analyzing results

After the tests are complete, we analyze the results to understand the model's vulnerability to the attacks. In this example, interpreting the ROC curve in the figure below, the straight gray line where X = Y indicates a random-guessing baseline. The AUC (Area Under the Curve) measures the performance of a binary classifier and ranges from 0 to 1: an AUC of 1.0 indicates perfect classification, while an AUC of 0.5 indicates random guessing. Here the AUC is 0.77, indicating that the attacker was able to differentiate members from non-members of the dataset with a high success rate.

Here, the False Positive Rate (FPR) represents the percentage of records falsely identified as members of the training dataset, and the True Positive Rate (TPR) represents the percentage of records correctly identified as members. We present three different TPR values for easy access. You can also inspect the prompts and responses used in the attack in our deep-dive section, where we tag the PII that was extracted from the model as part of each response.

Examining loss plots (Membership Inference)

We can also examine the loss distribution plots to gain insights into the model's behavior on training and testing data. Research has shown that a high degree of separation in these distributions may indicate that the model is less generalized and more susceptible to membership inference and data leakage. 
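
The intuition can be sketched numerically: compute per-example losses on training and held-out records and compare the distributions. The numbers below are invented for illustration; DynamoEval plots the full distributions rather than these summary statistics.

```python
# Sketch of the loss-distribution check: if per-example losses on training data
# are systematically lower than on held-out data, the gap hints at memorization
# (illustrative numbers, not real model losses).

def mean(xs):
    return sum(xs) / len(xs)

train_losses = [0.4, 0.5, 0.45, 0.35, 0.5]   # losses on fine-tuning records
test_losses = [1.2, 1.4, 1.1, 1.3, 1.25]     # losses on unseen records

gap = mean(test_losses) - mean(train_losses)
print(f"generalization gap: {gap:.2f}")  # large gap -> higher membership-inference risk

# A simple separation score: fraction of train losses below every test loss
separation = sum(t < min(test_losses) for t in train_losses) / len(train_losses)
```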

Generating test reports

After the tests have completed successfully, we generate PDF reports containing detailed information about the attack methodology, the results, and suggestions for improving the model.

DynamoEval SDK Walkthrough


Begin by installing the public Dynamo AI SDK, importing the required libraries, and specifying the required environment variables. Create a Dynamo AI instance using your API token and host. If you do not have an API token, generate one by logging in with your provided credentials. This API token will enable you to programmatically connect to the DynamoFL server, create projects, and evaluate models. If you generate multiple API tokens, only your most recent one will work.

from dynamofl import DynamoFL, GPUConfig, GPUType

API_KEY = "" # Add your API key here
API_HOST = "" # DFL or custom API host here

dfl = DynamoFL(API_KEY, host=API_HOST)

Create a model and dataset object

First, create a local model object. The model object specifies the model that privacy tests will be run on during the create_test method. Dynamo AI currently supports two types of model objects: local models and remote model API endpoints. Here, we demonstrate running tests on local models. A local model object can be used to upload a custom model and run penetration tests. Creating a local model object requires specifying the model file path and architecture. Currently, Dynamo AI supports penetration testing on uploaded models in the ".pt" and ".bin" file formats; please confirm that your model file matches one of these formats. For the model architecture, provide any valid Hugging Face Hub model ID.

Low-Rank Adaptation (LoRA):
In this tutorial, we use a local model that has been fine-tuned using Low-Rank Adaptation (LoRA), a technique that ‘freezes’ the majority of parameters in a pre-trained LLM while fine-tuning a small number of extra model parameters, reducing training time, compute usage, and storage costs. When using a model fine-tuned with LoRA or another PEFT (parameter-efficient fine-tuning) method, it is necessary to also provide the file path to the PEFT adapter configuration.
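
A quick back-of-the-envelope calculation shows why LoRA is so much lighter than full fine-tuning: a rank-r adapter for a d-by-k weight matrix trains d*r + r*k parameters instead of d*k. The dimensions below are illustrative, not taken from the model in this tutorial.

```python
# Back-of-the-envelope LoRA parameter count (illustrative dimensions).
d, k, r = 2048, 2048, 8   # weight matrix is d x k; LoRA rank is r

full = d * k              # parameters updated by full fine-tuning of this layer
lora = d * r + r * k      # parameters in the low-rank factors A (d x r) and B (r x k)

print(full, lora, lora / full)  # LoRA trains well under 1% of the layer's parameters
```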

To run a privacy evaluation test, we also need to specify the dataset used for fine-tuning the model. A dataset can be created by specifying the dataset file path. Here, we also provide the dataset with a unique key and an identifying name.

import os

model_path_dir = "<path_to_your_trained_model_file>"

# Using a PEFT LoRA adapter for a lightweight model upload
model_file_path = os.path.join(model_path_dir, "adapter_model.bin")
peft_config_path = os.path.join(model_path_dir, "adapter_config.json")
model_architecture = "dynamofl-sandbox/sheared-llama-1b3"

# Creating a local model referring to a fine-tuned LLaMA 1.3B
# (keyword-argument names are illustrative and may differ in your SDK version)
model = dfl.create_model(
    name="Sheared LLama DP",
    model_file_path=model_file_path,
    architecture=model_architecture,
    peft_config_path=peft_config_path,
)
print(f"Model successfully uploaded with key {model.key}.")

# Upload dataset
dataset = dfl.create_dataset(
    key="finetuning-dataset", # unique dataset key (illustrative)
    name="Finetuning Dataset",
    file_path="<path_to_your_dataset.csv>", # illustrative path
)
# dataset id
print(f"Dataset successfully uploaded with key {dataset.key}.")

Test parameters

When creating a test, a variety of parameters can be configured to customize it, including the column_name from the dataset to create prompts from, the types of PII to detect leakage for, and the model temperatures to run tests at. Test configuration parameters are passed to the create_pii_extraction_test method, and the relevant dataset column names must be provided in the test parameters.

PII Classes and Entities

When configuring a PII extraction or inference attack, one of the most important hyperparameters to configure is the pii_classes parameter. This controls which types of PII the extraction attack will be run for. In addition to the predefined PII classes, leakage can also be detected for custom-defined regex entities. Custom entities can be added by defining a dictionary mapping entity names to the valid Python regex expression in the regex_expressions parameter.

regex_expressions = {
    "USERNAME": r"([a-zA-Z]+_[a-zA-Z0-9]+)",
}

test_info = dfl.create_pii_extraction_test(
    model_key=model.key, # previously created model identifier key
    dataset_id=dataset._id, # previously created dataset id
    pii_ref_column="text", # column name containing text to be evaluated
    gpu=GPUConfig(gpu_type=GPUType.V100, gpu_count=1), # default GPU parameters
    regex_expressions=regex_expressions, # custom regex entities defined above
    grid=[{
        "temperature": [0.5, 1.0, 1.5]
    }], # test configurations (the parameter name "grid" is illustrative)
)

attack_info = dfl.get_attack_info(test_info.attack_id) # assuming the test object exposes its attack id
print("Attack status: {}.".format(attack_info))

Running the test

To run a Membership Inference privacy evaluation test, we can call the create_membership_inference_test method. Test creation will submit a test to your cloud machine-learning platform, where the test will be run. Dynamo AI currently has four types of privacy tests that measure whether a fine-tuned model has memorized data from the fine-tuned set.

  • PII Extraction tests whether PII can be emitted by prompting the model naively, simulating an attacker with no knowledge of the training dataset
  • PII Inference tests whether a model can re-fill PII into sentences from the fine-tuned dataset that we redacted PII from, assuming an attacker with knowledge of the concepts and potential PII in the dataset
  • Data Extraction tests whether the model can be prompted in a manner where it reveals training data verbatim as part of the responses
  • Membership Inference determines whether specific data records can be inferred to be a part of the model's training dataset

After your test has been created, navigate to the model dashboard page in the Dynamo AI UI. Here, you should observe that your model and dataset have been created and that your test is running. After the test has completed, a test report file will be created and can be downloaded for a deep-dive into the test results!

# Upload dataset
dataset_mia = dfl.create_dataset(
    key="finetuning-dataset-mia", # unique dataset key (illustrative)
    name="Finetuning Dataset",
    file_path="<path_to_your_dataset.csv>", # illustrative path
)

# dataset id
print(f"Dataset successfully uploaded with key {dataset_mia.key}.")

test_info_mia = dfl.create_membership_inference_test(
    model_key=model.key, # previously created model identifier key
    dataset_id=dataset_mia._id, # previously created dataset id
    gpu=GPUConfig(gpu_type=GPUType.A10G, gpu_count=1), # another GPU configuration
)

Integrating DynamoEval into Your CI/CD Pipelines

Monitoring machine learning models for data privacy and security in real time is critical. By integrating DynamoEval into your development and deployment processes, you can test systematically for privacy and security vulnerabilities. Incorporate DynamoEval into your post-training checks, scanning models after training or fine-tuning to detect any privacy leaks or compliance issues that may have arisen during the training phase. For effective CI/CD integration, automate DynamoEval scans in the release phase so that models are checked for vulnerabilities before they are staged for deployment, and conduct a final privacy check during the deployment phase to avoid shipping models with known vulnerabilities. Making DynamoEval scans a routine part of your CI/CD pipelines lets you proactively safeguard your models against privacy risks, maintaining trust and compliance throughout your operations.
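
One way such a release gate might look in practice is a small script run by the pipeline that fails the build when scan metrics exceed risk thresholds. The metric names and threshold values below are assumptions for illustration, not DynamoEval defaults.

```python
# Hedged sketch of a CI/CD privacy gate: block a release when privacy-scan
# metrics exceed risk thresholds (metric names and limits are illustrative).

THRESHOLDS = {"membership_inference_auc": 0.6, "pii_extraction_recall": 0.05}

def privacy_gate(metrics, thresholds=THRESHOLDS):
    """Return the list of violated checks; an empty list means the model may ship."""
    return [name for name, limit in thresholds.items()
            if metrics.get(name, 0.0) > limit]

violations = privacy_gate({"membership_inference_auc": 0.77,
                           "pii_extraction_recall": 0.01})
if violations:
    print("blocking release:", violations)  # a real pipeline would exit non-zero here
```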

Actionable Insights and Mitigation Strategies

DynamoEval not only identifies potential privacy vulnerabilities but also provides actionable insights to mitigate these risks. Based on the evaluation results, the platform offers recommendations on how to improve the model's privacy protection measures. Considering the obtained AUC score of 0.77 in our example, which indicates a significant vulnerability to membership inference attacks, our next steps would be to remediate this risk. The utilization of techniques such as differential privacy during model training can effectively mitigate this vulnerability. Our evaluation demonstrates that employing a differentially private model significantly reduces the AUC, underscoring the efficacy of differential privacy in enhancing privacy protection. In addition to these measures, implementing non-aggressive PII scrubbing techniques that preserve data relationships and uniqueness while minimizing the risk of leakage can further bolster privacy protection efforts.

Finally, leveraging DynamoGuard, our real-time privacy guardrail product, can offer additional layers of protection against data leakage by detecting and redacting PII in real time. Combining model-level and infrastructure-level privacy measures can substantially enhance the overall privacy posture of machine learning applications.

Contact Us

As LLMs become increasingly powerful and widely adopted, the risk of exposing sensitive information from training datasets grows. With Dynamo AI's comprehensive privacy solutions, teams can effectively measure, address, and prevent data leakage, ensuring the responsible deployment and use of LLMs while safeguarding sensitive information.

We also offer a range of AI privacy and security solutions to help you build trustworthy and responsible AI systems. To learn more about Dynamo AI and to explore our AI privacy and security offerings, please reach out to us by requesting a demo.