Differential Privacy API
This module provides tools for training machine learning models with strong privacy guarantees using differential privacy techniques.
Main Function
- secureml.privacy.differentially_private_train(model: Any, data: DataFrame | ndarray, epsilon: float = 1.0, delta: float = 1e-05, noise_multiplier: float | None = None, max_grad_norm: float = 1.0, framework: str = 'auto', **kwargs: Any) Any
Train a model using differential privacy.
This function wraps various differential privacy implementations to provide privacy-preserving training for machine learning models.
- Args:
model: The model architecture to train (compatible with the chosen framework) data: The training data (DataFrame or numpy array) epsilon: The privacy budget (smaller values provide stronger privacy guarantees) delta: The privacy delta parameter (smaller values provide stronger privacy) noise_multiplier: Manually set the noise multiplier instead of epsilon/delta max_grad_norm: Maximum norm of gradients for clipping framework: ML framework to use (‘pytorch’, ‘tensorflow’, or ‘auto’ to detect) **kwargs: Additional parameters passed to the underlying training function
- Returns:
The trained model
- Raises:
ValueError: If the framework is not supported or cannot be detected ImportError: If the required dependencies are not installed
This is the main function for training models with differential privacy:
from secureml.privacy import differentially_private_train
# Train a PyTorch model with differential privacy
private_model = differentially_private_train(
model=my_model,
data=training_data,
epsilon=1.0,
delta=1e-5,
max_grad_norm=1.0,
framework="pytorch",
batch_size=64,
epochs=10,
learning_rate=0.001
)
Framework Support
The module supports both PyTorch and TensorFlow as backend frameworks, and can automatically detect which framework is being used:
# Auto-detect framework (default)
private_model = differentially_private_train(
model=my_model,
data=training_data,
epsilon=0.5
)
# Explicitly specify PyTorch
private_model = differentially_private_train(
model=my_model,
data=training_data,
epsilon=0.5,
framework="pytorch"
)
# Explicitly specify TensorFlow
private_model = differentially_private_train(
model=my_model,
data=training_data,
epsilon=0.5,
framework="tensorflow"
)
Implementation Details
PyTorch Implementation (Opacus)
For PyTorch models, the module uses the Opacus library to implement differential privacy:
import torch
import torch.nn as nn
import torch.optim as optim
import pandas as pd
from secureml.privacy import differentially_private_train
# Define a simple model
class SimpleModel(nn.Module):
def __init__(self, input_dim, hidden_dim, output_dim):
super().__init__()
self.layer1 = nn.Linear(input_dim, hidden_dim)
self.layer2 = nn.Linear(hidden_dim, output_dim)
self.relu = nn.ReLU()
def forward(self, x):
x = self.layer1(x)
x = self.relu(x)
x = self.layer2(x)
return x
# Create model instance
model = SimpleModel(input_dim=10, hidden_dim=32, output_dim=2)
# Create some sample data
data = pd.DataFrame(...) # Your data here
# Train with differential privacy
private_model = differentially_private_train(
model=model,
data=data,
epsilon=1.0,
delta=1e-5,
batch_size=64,
epochs=10,
learning_rate=0.001,
criterion=torch.nn.CrossEntropyLoss(),
optimizer=torch.optim.Adam,
early_stopping_patience=3,
target_column="label"
)
TensorFlow Implementation (TensorFlow Privacy)
For TensorFlow models, the module uses TensorFlow Privacy to implement differential privacy. This is run in an isolated environment to avoid dependency conflicts:
import tensorflow as tf
import pandas as pd
from secureml.privacy import differentially_private_train
from secureml.isolated_environments.tf_privacy import setup_tf_privacy_environment
# Optionally set up the TensorFlow Privacy environment in advance
setup_tf_privacy_environment()
# Define a simple model
def create_model(input_shape, num_classes):
model = tf.keras.Sequential([
tf.keras.layers.Dense(32, activation='relu', input_shape=(input_shape,)),
tf.keras.layers.Dense(num_classes, activation='softmax')
])
return model
# Create model instance
model = create_model(input_shape=10, num_classes=2)
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
# Create some sample data
data = pd.DataFrame(...) # Your data here
# Train with differential privacy
private_model = differentially_private_train(
model=model,
data=data,
epsilon=1.0,
delta=1e-5,
batch_size=64,
epochs=10,
learning_rate=0.001,
target_column="label"
)
Privacy Parameters
Understanding Privacy Budget
The epsilon parameter is the privacy budget - smaller values provide stronger privacy guarantees but may reduce model utility:
epsilon=0.1: Very strong privacy guarantees, but may significantly impact model accuracy
epsilon=1.0: Good balance between privacy and utility for many applications
epsilon=10.0: Weaker privacy guarantees, but better model utility
The delta parameter represents the probability of privacy loss exceeding epsilon:
Typically set to a very small value, usually less than 1/N where N is the dataset size
Common value: delta=1e-5
Manual Noise Multiplier
Instead of specifying epsilon and delta, you can directly set the noise multiplier:
private_model = differentially_private_train(
model=model,
data=data,
noise_multiplier=1.2, # Instead of epsilon/delta
max_grad_norm=1.0,
batch_size=64,
epochs=10
)
Integration with Federated Learning
The differential privacy module can be used in conjunction with federated learning for enhanced privacy:
from secureml.federated import train_federated, FederatedConfig
# Configure federated learning with differential privacy
config = FederatedConfig(
num_rounds=3,
min_fit_clients=2,
apply_differential_privacy=True,
epsilon=1.0,
delta=1e-5
)
# Train a model using federated learning with differential privacy
federated_model = train_federated(
model=model,
client_data_fn=get_client_data,
config=config
)
Isolated Environments
TensorFlow Privacy is run in an isolated virtual environment to avoid dependency conflicts:
- secureml.isolated_environments.tf_privacy.setup_tf_privacy_environment() None
Set up the TensorFlow Privacy environment.
This function can be called explicitly to set up the environment in advance.
- Raises:
RuntimeError: If there’s an error setting up the environment
You can set up the environment in advance:
from secureml.isolated_environments.tf_privacy import setup_tf_privacy_environment
# Set up the TensorFlow Privacy environment
setup_tf_privacy_environment()
Utility Functions
- secureml.isolated_environments.tf_privacy.is_env_valid() bool
Check if the TensorFlow Privacy virtual environment is valid and ready to use.
- Returns:
True if the virtual environment is valid, False otherwise
Check if the TensorFlow Privacy environment is properly set up:
from secureml.isolated_environments.tf_privacy import is_env_valid
if is_env_valid():
print("TensorFlow Privacy environment is ready")
else:
print("TensorFlow Privacy environment needs to be set up")
Best Practices
Start with higher epsilon: Begin with a higher epsilon value (e.g., 5.0) and gradually decrease it to find the right balance between privacy and utility.
Tune batch size: Larger batch sizes can sometimes help with differential privacy training by reducing the number of gradient updates.
Consider clipping threshold: The max_grad_norm parameter controls gradient clipping. Start with 1.0 and adjust based on your model and data.
Privacy vs. utility tradeoff: Be aware that stronger privacy guarantees (lower epsilon) generally result in lower model utility. Adjust based on your specific privacy requirements.
Dataset size matters: Differential privacy works better with larger datasets. If possible, increase your dataset size when using differential privacy.
Minimize epochs: Fewer training epochs generally result in better privacy guarantees, as each epoch consumes privacy budget.
Combined with other privacy techniques: For even stronger privacy protection, combine differential privacy with other techniques like federated learning or secure enclaves.