=================== Differential Privacy API =================== .. module:: secureml.privacy This module provides tools for training machine learning models with strong privacy guarantees using differential privacy techniques. Main Function ------------ .. autofunction:: differentially_private_train This is the main function for training models with differential privacy: .. code-block:: python from secureml.privacy import differentially_private_train # Train a PyTorch model with differential privacy private_model = differentially_private_train( model=my_model, data=training_data, epsilon=1.0, delta=1e-5, max_grad_norm=1.0, framework="pytorch", batch_size=64, epochs=10, learning_rate=0.001 ) Framework Support ---------------- The module supports both PyTorch and TensorFlow as backend frameworks, and can automatically detect which framework is being used: .. code-block:: python # Auto-detect framework (default) private_model = differentially_private_train( model=my_model, data=training_data, epsilon=0.5 ) # Explicitly specify PyTorch private_model = differentially_private_train( model=my_model, data=training_data, epsilon=0.5, framework="pytorch" ) # Explicitly specify TensorFlow private_model = differentially_private_train( model=my_model, data=training_data, epsilon=0.5, framework="tensorflow" ) Implementation Details --------------------- PyTorch Implementation (Opacus) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For PyTorch models, the module uses the `Opacus `_ library to implement differential privacy: .. code-block:: python import torch import torch.nn as nn import torch.optim as optim import pandas as pd from secureml.privacy import differentially_private_train # Define a simple model class SimpleModel(nn.Module): def __init__(self, input_dim, hidden_dim, output_dim): super().__init__() self.layer1 = nn.Linear(input_dim, hidden_dim) self.layer2 = nn.Linear(hidden_dim, output_dim) self.relu = nn.ReLU() def forward(self, x): x = self.layer1(x) x = self.relu(x) x = self.layer2(x) return x # Create model instance model = SimpleModel(input_dim=10, hidden_dim=32, output_dim=2) # Create some sample data data = pd.DataFrame(...) # Your data here # Train with differential privacy private_model = differentially_private_train( model=model, data=data, epsilon=1.0, delta=1e-5, batch_size=64, epochs=10, learning_rate=0.001, criterion=torch.nn.CrossEntropyLoss(), optimizer=torch.optim.Adam, early_stopping_patience=3, target_column="label" ) TensorFlow Implementation (TensorFlow Privacy) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For TensorFlow models, the module uses `TensorFlow Privacy `_ to implement differential privacy. This is run in an isolated environment to avoid dependency conflicts: .. code-block:: python import tensorflow as tf import pandas as pd from secureml.privacy import differentially_private_train from secureml.isolated_environments.tf_privacy import setup_tf_privacy_environment # Optionally set up the TensorFlow Privacy environment in advance setup_tf_privacy_environment() # Define a simple model def create_model(input_shape, num_classes): model = tf.keras.Sequential([ tf.keras.layers.Dense(32, activation='relu', input_shape=(input_shape,)), tf.keras.layers.Dense(num_classes, activation='softmax') ]) return model # Create model instance model = create_model(input_shape=10, num_classes=2) model.compile( optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'] ) # Create some sample data data = pd.DataFrame(...) # Your data here # Train with differential privacy private_model = differentially_private_train( model=model, data=data, epsilon=1.0, delta=1e-5, batch_size=64, epochs=10, learning_rate=0.001, target_column="label" ) Privacy Parameters ----------------- Understanding Privacy Budget ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The `epsilon` parameter is the privacy budget - smaller values provide stronger privacy guarantees but may reduce model utility: - `epsilon=0.1`: Very strong privacy guarantees, but may significantly impact model accuracy - `epsilon=1.0`: Good balance between privacy and utility for many applications - `epsilon=10.0`: Weaker privacy guarantees, but better model utility The `delta` parameter represents the probability of privacy loss exceeding epsilon: - Typically set to a very small value, usually less than 1/N where N is the dataset size - Common value: `delta=1e-5` Manual Noise Multiplier ~~~~~~~~~~~~~~~~~~~~~~~ Instead of specifying `epsilon` and `delta`, you can directly set the noise multiplier: .. code-block:: python private_model = differentially_private_train( model=model, data=data, noise_multiplier=1.2, # Instead of epsilon/delta max_grad_norm=1.0, batch_size=64, epochs=10 ) Integration with Federated Learning ---------------------------------- The differential privacy module can be used in conjunction with federated learning for enhanced privacy: .. code-block:: python from secureml.federated import train_federated, FederatedConfig # Configure federated learning with differential privacy config = FederatedConfig( num_rounds=3, min_fit_clients=2, apply_differential_privacy=True, epsilon=1.0, delta=1e-5 ) # Train a model using federated learning with differential privacy federated_model = train_federated( model=model, client_data_fn=get_client_data, config=config ) Isolated Environments -------------------- TensorFlow Privacy is run in an isolated virtual environment to avoid dependency conflicts: .. autofunction:: secureml.isolated_environments.tf_privacy.setup_tf_privacy_environment You can set up the environment in advance: .. code-block:: python from secureml.isolated_environments.tf_privacy import setup_tf_privacy_environment # Set up the TensorFlow Privacy environment setup_tf_privacy_environment() Utility Functions ---------------- .. autofunction:: secureml.isolated_environments.tf_privacy.is_env_valid Check if the TensorFlow Privacy environment is properly set up: .. code-block:: python from secureml.isolated_environments.tf_privacy import is_env_valid if is_env_valid(): print("TensorFlow Privacy environment is ready") else: print("TensorFlow Privacy environment needs to be set up") Best Practices ------------- 1. **Start with higher epsilon**: Begin with a higher epsilon value (e.g., 5.0) and gradually decrease it to find the right balance between privacy and utility. 2. **Tune batch size**: Larger batch sizes can sometimes help with differential privacy training by reducing the number of gradient updates. 3. **Consider clipping threshold**: The `max_grad_norm` parameter controls gradient clipping. Start with 1.0 and adjust based on your model and data. 4. **Privacy vs. utility tradeoff**: Be aware that stronger privacy guarantees (lower epsilon) generally result in lower model utility. Adjust based on your specific privacy requirements. 5. **Dataset size matters**: Differential privacy works better with larger datasets. If possible, increase your dataset size when using differential privacy. 6. **Minimize epochs**: Fewer training epochs generally result in better privacy guarantees, as each epoch consumes privacy budget. 7. **Combined with other privacy techniques**: For even stronger privacy protection, combine differential privacy with other techniques like federated learning or secure enclaves.