=================== Differential Privacy =================== Differential privacy (DP) provides a mathematical framework for quantifying and limiting the privacy risk when training machine learning models. SecureML implements state-of-the-art differential privacy techniques that allow you to train models with formal privacy guarantees. Core Concepts ------------ **Epsilon (ε)**: The privacy budget that quantifies the maximum privacy loss. Lower values provide stronger privacy guarantees. **Delta (δ)**: The probability of information leakage beyond what is allowed by epsilon. This should be very small (typically less than 1/n where n is the dataset size). **Sensitivity**: The maximum influence a single data point can have on the output. **Noise Mechanisms**: Algorithms that add calibrated noise to protect privacy (Laplace, Gaussian, etc.). Basic Usage ---------- Training a Differentially Private Model ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The simplest way to train a model with differential privacy is to use the ``differentially_private_train`` function: .. code-block:: python from secureml.privacy import differentially_private_train import tensorflow as tf # or import torch for PyTorch # Create a model (TensorFlow example) model = tf.keras.Sequential([ tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)), tf.keras.layers.Dense(1, activation='sigmoid') ]) # Train the model with differential privacy dp_model = differentially_private_train( model=model, data=training_data, # DataFrame or numpy array epsilon=1.0, # Privacy budget delta=1e-5, # Probability of privacy breach max_grad_norm=1.0, # Maximum gradient norm for clipping noise_multiplier=None, # If None, calculated from epsilon and delta batch_size=64, epochs=10 ) # Make predictions with the differentially private model predictions = dp_model.predict(X_test) SecureML automatically detects whether you're using PyTorch or TensorFlow based on the model you provide. You can also explicitly specify the framework: .. code-block:: python # Specify the framework explicitly dp_model = differentially_private_train( model=model, data=training_data, epsilon=1.0, delta=1e-5, framework="tensorflow" # or "pytorch" ) Supported Frameworks ------------------ SecureML supports differential privacy for multiple ML frameworks: **PyTorch Integration with Opacus** For PyTorch models, SecureML uses the Opacus library under the hood: .. code-block:: python import torch import torch.nn as nn # Define a PyTorch model model = nn.Sequential( nn.Linear(input_size, 128), nn.ReLU(), nn.Linear(128, output_size) ) # Train with differential privacy dp_model = differentially_private_train( model=model, data=training_data, epsilon=1.0, delta=1e-5, batch_size=64, epochs=10, criterion=torch.nn.CrossEntropyLoss(), learning_rate=0.001, validation_split=0.2 ) **TensorFlow Integration with TensorFlow Privacy** For TensorFlow models, SecureML uses TensorFlow Privacy in an isolated environment: .. code-block:: python import tensorflow as tf # Create a Keras model model = tf.keras.Sequential([ tf.keras.layers.Dense(128, activation='relu', input_shape=(input_size,)), tf.keras.layers.Dense(output_size, activation='softmax') ]) model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) # Train with differential privacy dp_model = differentially_private_train( model=model, data=training_data, epsilon=1.0, delta=1e-5, batch_size=64, epochs=10, early_stopping_patience=3 ) TensorFlow Privacy and Isolated Environments -------------------------------------------- When using TensorFlow Privacy with SecureML, the library uses an isolated environment to handle dependency conflicts. This is all managed automatically for you. What Happens Behind the Scenes ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ When you specify ``framework="tensorflow"`` in the ``differentially_private_train`` function: 1. SecureML checks if a TensorFlow Privacy isolated environment exists 2. If not, it creates one automatically (there may be a delay during this first-time setup) 3. Your model and data are serialized and sent to the isolated environment 4. Training happens in the isolated environment 5. The trained model is returned to your main environment .. code-block:: python from secureml.privacy import differentially_private_train import tensorflow as tf # Create a model model = tf.keras.Sequential([ tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)), tf.keras.layers.Dense(1, activation='sigmoid') ]) model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) # Train with differential privacy using TensorFlow Privacy # This automatically uses the isolated environment private_model = differentially_private_train( model=model, data=training_data, epsilon=1.0, delta=1e-5, epochs=10, batch_size=32, framework="tensorflow" # This triggers the isolated environment ) # Use the model as normal predictions = private_model.predict(test_data) Pre-setup for Faster Execution ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To avoid delays during your first run, you can set up the TensorFlow Privacy environment in advance: .. code-block:: bash secureml environments setup-tf-privacy For more detailed information on how SecureML manages isolated environments, see the :doc:`isolated_environments` section. Advanced Training Options ----------------------- Both PyTorch and TensorFlow integrations support additional training parameters: .. code-block:: python # Common parameters for both frameworks dp_model = differentially_private_train( model=model, data=training_data, epsilon=1.0, delta=1e-5, batch_size=64, epochs=10, learning_rate=0.001, validation_split=0.2, shuffle=True, verbose=True, early_stopping_patience=5 # Stop training if validation loss doesn't improve ) Data Preparation -------------- The ``differentially_private_train`` function can handle both DataFrames and NumPy arrays: .. code-block:: python # Using a DataFrame dp_model = differentially_private_train( model=model, data=df, # DataFrame where the last column is the target by default target_column="label", # Specify a different target column if needed epsilon=1.0, delta=1e-5 ) # Using NumPy arrays dp_model = differentially_private_train( model=model, data=np.concatenate([X, y.reshape(-1, 1)], axis=1), # Concatenate features and labels epsilon=1.0, delta=1e-5 ) Monitoring Privacy Budget ----------------------- Both frameworks provide information about the actual privacy budget spent during training. This is displayed in the output if ``verbose=True``: .. code-block:: python # Train with differential privacy dp_model = differentially_private_train( model=model, data=training_data, epsilon=1.0, delta=1e-5, verbose=True # Will show privacy budget spent after training ) In PyTorch (Opacus), you can also manually query the spent privacy budget: .. code-block:: python from opacus import PrivacyEngine # After training with Opacus, the privacy engine has a get_epsilon method privacy_engine = PrivacyEngine() # Training code... # Get the privacy budget spent spent_epsilon = privacy_engine.get_epsilon(delta=1e-5) print(f"Privacy budget spent (ε = {spent_epsilon:.4f})") Integration with Federated Learning ------------------------------- SecureML supports combining differential privacy with federated learning: .. code-block:: python from secureml.federated import start_federated_client # Start a federated learning client with differential privacy start_federated_client( model=model, data=client_data, server_address="localhost:8080", apply_differential_privacy=True, epsilon=1.0, delta=1e-5, max_grad_norm=1.0 ) Best Practices ------------- 1. **Start with a higher epsilon**: Begin with a higher privacy budget (e.g., ε=10) and gradually reduce it to find the right balance 2. **Use larger batch sizes**: Larger batches reduce the amount of noise needed 3. **Pre-train on public data**: Initialize models with public data before fine-tuning with differential privacy on sensitive data 4. **Simplify models**: Simpler models often require less privacy budget 5. **Monitor training curves**: Watch for signs of excessive noise affecting convergence 6. **Manually set noise_multiplier**: If the auto-calculated noise is too high, try manually setting a lower value 7. **Tune the clipping threshold**: Find the optimal gradient clipping threshold for your specific problem Further Reading ------------- * :doc:`/api/privacy` - Complete API reference for differential privacy functions * :doc:`/examples/differential_privacy` - More examples of differential privacy techniques * `The Algorithmic Foundations of Differential Privacy `_ - Foundational paper by Dwork and Roth