Differential Privacy Examples ============================ This section demonstrates how to use SecureML's differential privacy features to train machine learning models with formal privacy guarantees. PyTorch Model with Differential Privacy -------------------------------------- In this example, we'll train a PyTorch neural network with differential privacy using Opacus under the hood: .. code-block:: python import numpy as np import torch import torch.nn as nn from sklearn.model_selection import train_test_split from sklearn.preprocessing import StandardScaler from secureml.privacy import differentially_private_train # Create a synthetic dataset np.random.seed(42) n_samples = 1000 n_features = 10 # Generate features and binary classification target X = np.random.randn(n_samples, n_features) w = np.random.randn(n_features) y = (np.dot(X, w) + 0.1 * np.random.randn(n_samples) > 0).astype(int) # Split and scale the data X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) scaler = StandardScaler() X_train = scaler.fit_transform(X_train) X_test = scaler.transform(X_test) # Prepare data for SecureML (expects DataFrame or numpy array with target as last column) train_data = np.column_stack((X_train, y_train)) # Define a PyTorch model class BinaryClassifier(nn.Module): def __init__(self, input_dim): super(BinaryClassifier, self).__init__() self.layer1 = nn.Linear(input_dim, 32) self.layer2 = nn.Linear(32, 16) self.layer3 = nn.Linear(16, 1) self.relu = nn.ReLU() self.sigmoid = nn.Sigmoid() def forward(self, x): x = self.relu(self.layer1(x)) x = self.relu(self.layer2(x)) x = self.sigmoid(self.layer3(x)) return x # Create the model model = BinaryClassifier(X_train.shape[1]) # Train with differential privacy dp_model = differentially_private_train( model=model, data=train_data, epsilon=1.0, # Privacy budget delta=1e-5, # Privacy delta batch_size=64, epochs=5, learning_rate=0.001, validation_split=0.1, framework="pytorch", verbose=True ) # Evaluate the model dp_model.eval() with torch.no_grad(): X_test_tensor = torch.tensor(X_test, dtype=torch.float32) y_pred = dp_model(X_test_tensor).numpy().flatten() > 0.5 accuracy = np.mean(y_pred == y_test) print(f"Test accuracy with differential privacy: {accuracy:.4f}") TensorFlow Model with Differential Privacy --------------------------------------- SecureML also supports differential privacy for TensorFlow models using TensorFlow Privacy in an isolated environment: .. code-block:: python import numpy as np import tensorflow as tf from sklearn.model_selection import train_test_split from sklearn.preprocessing import StandardScaler from secureml.privacy import differentially_private_train # Create a synthetic multi-class dataset np.random.seed(42) n_samples = 1000 n_features = 10 n_classes = 3 # Generate features and multi-class classification target X = np.random.randn(n_samples, n_features) w = np.random.randn(n_features, n_classes) logits = np.dot(X, w) y = np.argmax(logits, axis=1) # Split and scale the data X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) scaler = StandardScaler() X_train = scaler.fit_transform(X_train) X_test = scaler.transform(X_test) # Prepare data for SecureML train_data = np.column_stack((X_train, y_train)) # Define a TensorFlow model tf_model = tf.keras.Sequential([ tf.keras.layers.Dense(32, activation='relu', input_shape=(n_features,)), tf.keras.layers.Dense(16, activation='relu'), tf.keras.layers.Dense(n_classes, activation='softmax') ]) # Compile the model tf_model.compile( optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'] ) # Train with differential privacy dp_tf_model = differentially_private_train( model=tf_model, data=train_data, epsilon=1.0, delta=1e-5, max_grad_norm=1.0, batch_size=64, epochs=5, framework="tensorflow", verbose=True ) # Evaluate the model test_loss, test_accuracy = dp_tf_model.evaluate(X_test, y_test, verbose=0) print(f"Test accuracy with differential privacy: {test_accuracy:.4f}") Privacy-Utility Tradeoff ---------------------- One important aspect of differential privacy is understanding the tradeoff between privacy and utility. Here's an example that evaluates model performance across different privacy budgets (epsilon values): .. code-block:: python import numpy as np import torch import torch.nn as nn import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from sklearn.preprocessing import StandardScaler from secureml.privacy import differentially_private_train # Create dataset and prepare data as in previous examples # ... # Define different privacy budgets to test epsilons = [0.1, 0.5, 1.0, 5.0, 10.0] accuracies = [] # Function to create a model with the same architecture def create_model(): return nn.Sequential( nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid() ) # Train and evaluate for each epsilon for epsilon in epsilons: model = create_model() # Train with differential privacy dp_model = differentially_private_train( model=model, data=train_data, epsilon=epsilon, delta=1e-5, batch_size=64, epochs=5, framework="pytorch", verbose=False ) # Evaluate dp_model.eval() with torch.no_grad(): X_test_tensor = torch.tensor(X_test, dtype=torch.float32) y_pred = dp_model(X_test_tensor).numpy().flatten() > 0.5 accuracy = np.mean(y_pred == y_test) accuracies.append(accuracy) # Train a non-private model for comparison non_private_model = create_model() # Train the non-private model... # Plot the privacy-utility tradeoff plt.figure(figsize=(10, 6)) plt.plot(epsilons, accuracies, 'o-', label='DP Model') plt.axhline(y=non_private_accuracy, color='r', linestyle='--', label='Non-private Model') plt.xscale('log') plt.xlabel('Privacy Budget (ε)') plt.ylabel('Test Accuracy') plt.title('Privacy-Utility Tradeoff') plt.grid(True, alpha=0.3) plt.legend() plt.savefig('privacy_utility_tradeoff.png') This will produce a graph showing how model accuracy changes as the privacy budget (epsilon) increases. Typically, as epsilon increases (less privacy), accuracy improves and approaches the non-private model's performance. Federated Learning with Differential Privacy ----------------------------------------- SecureML allows combining federated learning with differential privacy for enhanced privacy guarantees: .. code-block:: python import torch import torch.nn as nn from secureml.federated import start_federated_client # Define a model model = nn.Sequential( nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid() ) # Start a federated learning client with differential privacy start_federated_client( model=model, data=client_data, # Client's local dataset server_address="localhost:8080", apply_differential_privacy=True, epsilon=1.0, delta=1e-5, max_grad_norm=1.0 ) Advanced Options and Best Practices -------------------------------- Setting Hyperparameters ^^^^^^^^^^^^^^^^^^^^^ When training with differential privacy, several hyperparameters can significantly affect both privacy and utility: .. code-block:: python dp_model = differentially_private_train( model=model, data=train_data, epsilon=1.0, delta=1e-5, # Noise and clipping parameters noise_multiplier=None, # Auto-calculated from epsilon if None max_grad_norm=1.0, # Clipping threshold for gradients # Training parameters batch_size=64, # Larger batch sizes need less noise learning_rate=0.001, # May need adjustment compared to non-DP training epochs=10, # Validation and early stopping validation_split=0.2, early_stopping_patience=3, # Other parameters verbose=True, framework="pytorch" # or "tensorflow" ) Data Preparation Tips ^^^^^^^^^^^^^^^^^^^ For better performance with differential privacy: 1. **Normalize your data**: Normalized features perform better with gradient clipping 2. **Balance classes**: Imbalanced datasets can make private training more challenging 3. **Remove outliers**: Extreme values can have a disproportionate effect on gradients .. code-block:: python # Example of proper data preparation from sklearn.preprocessing import StandardScaler # Normalize features scaler = StandardScaler() X_train = scaler.fit_transform(X_train) X_test = scaler.transform(X_test) # Create a balanced subsample if needed from sklearn.utils import resample X_balanced, y_balanced = resample(X_train, y_train, stratify=y_train, random_state=42) Monitoring Privacy Budget ^^^^^^^^^^^^^^^^^^^^^ Both PyTorch and TensorFlow implementations allow you to monitor the privacy budget spent: .. code-block:: python # For PyTorch (Opacus) from opacus import PrivacyEngine # After training with Opacus, the privacy engine has a get_epsilon method privacy_engine = PrivacyEngine() # Training code... # Get the privacy budget spent spent_epsilon = privacy_engine.get_epsilon(delta=1e-5) print(f"Privacy budget spent (ε = {spent_epsilon:.4f})") # For TensorFlow Privacy, the spent budget is returned as part of the result # when verbose=True is set in differentially_private_train