Federated Learning

Federated Learning (FL) is a machine learning technique that trains models across multiple decentralized devices or servers that hold local data samples. The raw data never leaves its owner; only model updates are exchanged. SecureML provides a framework for building federated learning systems with additional privacy protections such as secure aggregation and differential privacy.

Core Concepts

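Federated Learning Types (cross-device and cross-silo describe the scale of the deployment; vertical and horizontal describe how the data is partitioned, so the two pairs can be combined):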

  • Cross-device FL: Learning across many (thousands to millions) mobile or IoT devices

  • Cross-silo FL: Learning across a small number of organizations or data silos

  • Vertical FL: Learning when different organizations have different features for the same entities

  • Horizontal FL: Learning when different organizations have the same features for different entities
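The difference between horizontal and vertical FL comes down to which axis of a dataset is split across participants. The following NumPy sketch is illustrative only and not part of the SecureML API; it partitions a toy table both ways:

```python
import numpy as np

# A toy dataset: 6 entities (rows) x 4 features (columns).
data = np.arange(24).reshape(6, 4)

# Horizontal FL: every party has the same features (columns)
# but holds different entities (rows).
horizontal_parties = np.array_split(data, 3, axis=0)

# Vertical FL: every party holds the same entities (rows)
# but a different subset of the features (columns).
vertical_parties = np.array_split(data, 2, axis=1)

for i, part in enumerate(horizontal_parties):
    print(f"horizontal party {i}: shape {part.shape}")
for i, part in enumerate(vertical_parties):
    print(f"vertical party {i}: shape {part.shape}")
```

Stacking the horizontal parts row-wise, or the vertical parts column-wise, recovers the original table, which is exactly the information no single participant holds on its own.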

Key Components:

  • Federated Clients: Devices or servers that hold local data

  • Federated Server: Central server that orchestrates the learning process

  • Aggregation Algorithms: Methods to combine model updates from multiple clients
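To make the interaction between clients, server, and aggregation concrete, here is a minimal, framework-free sketch of Federated Averaging (FedAvg), the classic aggregation algorithm, on a linear-regression task. It uses only NumPy, the function names are hypothetical, and it is not a description of SecureML's internal aggregation:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Client side: a few epochs of full-batch gradient descent for
    linear regression (mean squared error) on the client's private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2.0 / len(X) * X.T @ (X @ w - y)
        w -= lr * grad
    return w

def fedavg(client_weights, client_sizes):
    """Server side: average the client models, weighted by how many
    samples each client trained on."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Three clients with private datasets of different sizes.
clients = []
for n in (50, 100, 150):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(10):
    # Each client starts from the current global model and trains locally.
    updates = [local_update(global_w, X, y) for X, y in clients]
    # The server only sees model weights, never X or y.
    global_w = fedavg(updates, [len(X) for X, _ in clients])

print(global_w)  # close to true_w after 10 rounds
```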

Basic Usage

Training with Federated Learning

The main entry point for federated learning in SecureML is the train_federated function, which simulates a federation using the client datasets returned by client_data_fn:

from secureml.federated import train_federated, FederatedConfig
import torch.nn as nn

# Define your model (PyTorch example)
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.layers(x)

# Create a model
model = SimpleNN()

# Define a function that returns client data
def get_client_data():
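    # Return a dictionary mapping client IDs to their datasets.
    # client_1_data, client_2_data, and client_3_data are placeholders for
    # your own pandas DataFrames or NumPy arrays.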
    return {
        "client-001": client_1_data,
        "client-002": client_2_data,
        "client-003": client_3_data
    }

# Configure federated learning
config = FederatedConfig(
    num_rounds=10,
    fraction_fit=1.0,
    min_fit_clients=2,
    min_available_clients=2,
    use_secure_aggregation=True,
    apply_differential_privacy=True,
    epsilon=1.0,
    delta=1e-5,
    weight_update_strategy="ema",
    weight_mixing_rate=0.5
)

# Train the model with federated learning
trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    config=config,
    framework="pytorch",  # or "tensorflow", or "auto"
    model_save_path="federated_model.pt"
)
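In a simulation, client_1_data, client_2_data, and client_3_data are whatever local datasets you want each simulated client to hold. One way to produce them is to shard a single DataFrame. The sketch below uses a hypothetical helper, split_into_clients, and synthetic data; it assumes only what the Best Practices section states, namely that each dataset is a pandas DataFrame whose last column is the target:

```python
import numpy as np
import pandas as pd

def split_into_clients(df, num_clients, seed=42):
    """Hypothetical helper: shuffle a DataFrame and deal its rows out to
    `num_clients` simulated clients, returning {client_id: DataFrame}."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    shards = np.array_split(np.arange(len(shuffled)), num_clients)
    return {
        f"client-{i + 1:03d}": shuffled.iloc[idx].reset_index(drop=True)
        for i, idx in enumerate(shards)
    }

# Synthetic data: 10 feature columns (matching nn.Linear(10, 64) in SimpleNN)
# plus a binary target as the last column.
rng = np.random.default_rng(0)
full_data = pd.DataFrame(rng.normal(size=(300, 10)), columns=[f"x{i}" for i in range(10)])
full_data["target"] = (full_data["x0"] + full_data["x1"] > 0).astype(int)

client_datasets = split_into_clients(full_data, num_clients=3)

def get_client_data():
    return client_datasets

for client_id, df in get_client_data().items():
    print(client_id, df.shape)
```

Because the rows are shuffled before splitting, these shards are IID. Real clients usually hold non-IID data, which you could simulate by splitting on a column such as target instead.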

Setting Up a Federated Server

For real-world deployments, you can set up a federated learning server:

from secureml.federated import start_federated_server, FederatedConfig
import torch.nn as nn

# Define your model (PyTorch example)
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.layers(x)

# Create a model
model = SimpleNN()

# Configure federated learning
config = FederatedConfig(
    num_rounds=10,
    fraction_fit=0.8,
    min_fit_clients=3,
    min_available_clients=5,
    server_address="0.0.0.0:8080",
    use_secure_aggregation=True
)

# Start the federated server
start_federated_server(
    model=model,
    config=config,
    framework="pytorch"  # or "tensorflow", or "auto"
)
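The three client-count parameters interact. Their names (fraction_fit, min_fit_clients, min_available_clients) match those of the Flower framework's FedAvg strategy. Assuming SecureML gives them the same meaning, which this page does not confirm, per-round client selection works roughly like this pure-Python sketch (clients_for_round is a hypothetical name):

```python
import random

def clients_for_round(available, fraction_fit, min_fit_clients,
                      min_available_clients, seed=None):
    """Assumed Flower-style selection: wait until at least
    `min_available_clients` are connected, then sample
    max(int(fraction_fit * available), min_fit_clients) of them."""
    if len(available) < min_available_clients:
        return None  # the server would keep waiting for more clients
    num_to_sample = max(int(len(available) * fraction_fit), min_fit_clients)
    # Guard in this sketch: never ask for more clients than are connected.
    num_to_sample = min(num_to_sample, len(available))
    rng = random.Random(seed)
    return sorted(rng.sample(available, num_to_sample))

connected = [f"client-{i:03d}" for i in range(1, 6)]
# Server config above: fraction_fit=0.8, min_fit_clients=3, min_available_clients=5
print(clients_for_round(connected, 0.8, 3, 5, seed=0))      # 4 of the 5 clients
print(clients_for_round(connected[:4], 0.8, 3, 5, seed=0))  # None: only 4 connected
```

With this configuration the server waits for five clients and then trains on four of them per round; min_fit_clients only takes effect when fraction_fit of the available clients would fall below three.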

Setting Up a Federated Client

On each client device or server:

from secureml.federated import start_federated_client
import torch.nn as nn
import pandas as pd

# Define your model (must match the server's architecture)
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.layers(x)

# Create a model
model = SimpleNN()

# Load local data (pandas DataFrame or NumPy array)
local_data = pd.read_csv("client_data.csv")

# Start the federated client
start_federated_client(
    model=model,
    data=local_data,
    server_address="fl-server.example.com:8080",
    framework="pytorch",  # or "tensorflow", or "auto"
    apply_differential_privacy=True,
    epsilon=1.0,
    delta=1e-5,
    test_split=0.2,  # Optional: Use 20% of data for local evaluation
    batch_size=64,
    learning_rate=0.001
)
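The test_split=0.2 argument reserves part of each client's local data for evaluation. A minimal sketch of such a split, assuming a random row-wise hold-out (local_train_test_split is a hypothetical name, and SecureML's own splitting logic may differ):

```python
import numpy as np

def local_train_test_split(data, test_split=0.2, seed=0):
    """Shuffle a client's local rows and hold out a fraction for local evaluation."""
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(data))
    n_test = int(round(len(data) * test_split))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return data[train_idx], data[test_idx]

# 100 rows: 10 features plus the target in the last column.
local = np.arange(100 * 11, dtype=float).reshape(100, 11)
train, test = local_train_test_split(local, test_split=0.2)
print(train.shape, test.shape)  # (80, 11) (20, 11)
```

Both parts stay on the client; only the metrics computed on the held-out rows would be worth reporting back.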

Advanced Techniques

Secure Aggregation

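SecureML supports secure aggregation to protect client updates. Secure aggregation protocols are designed so that the server learns only the combined update, not any individual client's contribution: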

from secureml.federated import FederatedConfig, train_federated

# Configure federated learning with secure aggregation
config = FederatedConfig(
    num_rounds=10,
    fraction_fit=1.0,
    min_fit_clients=2,
    min_available_clients=2,
    use_secure_aggregation=True  # Enable secure aggregation
)

# Train with secure aggregation
trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    config=config
)
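The core trick behind many secure aggregation protocols is a set of random masks that cancel out when the server sums the client updates. The sketch below illustrates the pairwise-masking idea from the secure aggregation protocol of Bonawitz et al. It is a conceptual NumPy toy, not SecureML's implementation, and it omits the key agreement and dropout handling a real protocol needs:

```python
import numpy as np

def masked_updates(updates, seed=0):
    """Toy pairwise masking: every pair of clients (i, j) with i < j shares a
    random mask; client i adds it and client j subtracts it. Each masked update
    looks like noise on its own, but the masks cancel in the sum.
    (A real protocol derives masks via key agreement; this sketch uses one RNG.)"""
    rng = np.random.default_rng(seed)
    n = len(updates)
    masked = [u.astype(float) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.normal(scale=100.0, size=updates[i].shape)
            masked[i] += mask
            masked[j] -= mask
    return masked

client_updates = [np.array([0.1, -0.2]), np.array([0.3, 0.0]), np.array([-0.1, 0.4])]
masked = masked_updates(client_updates)

# The server only ever sees and sums the masked vectors.
aggregate = sum(masked)
print(aggregate)            # equals the true sum, up to floating-point error
print(sum(client_updates))
```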

Differential Privacy in Federated Learning

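Add differential privacy to client updates. Smaller epsilon values give stronger privacy guarantees, usually at some cost in accuracy, and delta is normally set well below 1/N, where N is the number of training records: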

from secureml.federated import start_federated_client

# Start a client with differential privacy
start_federated_client(
    model=model,
    data=local_data,
    server_address="fl-server.example.com:8080",
    apply_differential_privacy=True,
    epsilon=1.0,
    delta=1e-5,
    max_grad_norm=1.0,  # Clipping parameter
    noise_multiplier=1.1  # Noise level (optional)
)

# Or configure it system-wide
from secureml.federated import FederatedConfig, train_federated

config = FederatedConfig(
    num_rounds=10,
    apply_differential_privacy=True,
    epsilon=1.0,
    delta=1e-5
)

trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    config=config
)
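The max_grad_norm and noise_multiplier parameters correspond to the two steps of the Gaussian mechanism used in DP-SGD: bound a vector's L2 norm, then add Gaussian noise scaled to that bound. The NumPy sketch below shows that step on a single vector (privatize_update is a hypothetical name). It does not track the (epsilon, delta) budget, and whether SecureML applies it to per-sample gradients or to whole client updates is left as an open assumption:

```python
import numpy as np

def privatize_update(update, max_grad_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip `update` to L2 norm <= max_grad_norm, then add Gaussian noise with
    standard deviation noise_multiplier * max_grad_norm. Returns (noisy, clipped)."""
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, max_grad_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * max_grad_norm, size=update.shape)
    return clipped + noise, clipped

update = np.array([3.0, 4.0])  # L2 norm 5.0
noisy, clipped = privatize_update(update, max_grad_norm=1.0, noise_multiplier=1.1,
                                  rng=np.random.default_rng(0))
print(np.linalg.norm(clipped))  # about 1.0: scaled down to the clipping bound
print(noisy)
```

Clipping caps how much any one contribution can move the model, which is what makes the added noise sufficient to mask it; a larger noise_multiplier means stronger privacy and noisier training.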

Advanced Weight Update Strategies

SecureML offers several strategies for applying aggregated weights to a model, which can improve convergence and stability:

from secureml.federated import FederatedConfig, train_federated

# Configure federated learning with Exponential Moving Average (EMA) weight updates
ema_config = FederatedConfig(
    num_rounds=10,
    weight_update_strategy="ema",       # Use exponential moving average
    weight_mixing_rate=0.5,             # 50% mix of new weights, 50% of old weights
    warmup_rounds=2                     # Gradually increase mixing rate over first 2 rounds
)

# Train with EMA updates
trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    config=ema_config
)

# Use momentum-based weight updates
momentum_config = FederatedConfig(
    num_rounds=10,
    weight_update_strategy="momentum",  # Use momentum-based updates
    weight_mixing_rate=0.1,             # Small update step size
    weight_momentum=0.9,                # High momentum coefficient
    apply_weight_constraints=True,      # Constrain updates to prevent instability
    max_weight_change=0.3               # Maximum 30% change in any weight
)

# Train with momentum updates
trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    config=momentum_config
)

Weight Update Strategy Types

SecureML supports three different strategies for updating model weights in federated learning:

  1. Direct Updates (weight_update_strategy="direct"): The simplest strategy, where client models directly adopt the weights received from the server. This is the classic federated learning approach.

  2. Exponential Moving Average (EMA) (weight_update_strategy="ema"): A weighted average of the current (old) weights and the newly received weights, which smooths updates and can improve training stability:

    updated_weight = (1 - mixing_rate) * old_weight + mixing_rate * new_weight
    
  3. Momentum-Based Updates (weight_update_strategy="momentum"): Adds a fraction of the previous update to the current one, which can accelerate training and damp oscillations between rounds:

    momentum_update = momentum * previous_update + mixing_rate * (new_weight - old_weight)
    updated_weight = old_weight + momentum_update
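The three update rules can be transcribed directly from the formulas above. This NumPy sketch applies them to a single weight array; the function names are illustrative, and SecureML applies these rules internally with bookkeeping that may differ:

```python
import numpy as np

def direct_update(old_weight, new_weight):
    """direct: adopt the server's weights as-is."""
    return new_weight.copy()

def ema_update(old_weight, new_weight, mixing_rate):
    """ema: updated = (1 - mixing_rate) * old + mixing_rate * new."""
    return (1 - mixing_rate) * old_weight + mixing_rate * new_weight

def momentum_update(old_weight, new_weight, previous_update, mixing_rate, momentum):
    """momentum: update = momentum * previous_update + mixing_rate * (new - old);
    updated = old + update. Returns (updated_weight, update) so the update
    can be fed back in as previous_update next round."""
    update = momentum * previous_update + mixing_rate * (new_weight - old_weight)
    return old_weight + update, update

old = np.array([0.0, 1.0])
new = np.array([2.0, 3.0])

print(direct_update(old, new))                # [2. 3.]
print(ema_update(old, new, mixing_rate=0.5))  # [1. 2.]

prev = np.zeros_like(old)
w, prev = momentum_update(old, new, prev, mixing_rate=0.1, momentum=0.9)
print(w)  # [0.2 1.2]
```

In the first momentum round previous_update is zero, so the step is just mixing_rate times the difference; on later rounds the carried-over update makes repeated moves in the same direction grow.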
    

Key Configuration Parameters

  • weight_mixing_rate: Controls how much of the new weights to incorporate (0.0 to 1.0). Lower values make smaller, more conservative updates.

  • weight_momentum: For momentum strategy, determines how much previous updates influence current ones (typically 0.9 to 0.99).

  • warmup_rounds: Number of initial rounds with gradually increasing mixing rates. Useful for stabilizing early training.

  • apply_weight_constraints: When True, prevents any weight from changing too dramatically in a single update.

  • max_weight_change: Maximum relative change allowed in any weight when constraints are enabled (e.g., 0.2 = 20% maximum change).
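This page does not spell out the exact warmup schedule or how the relative change limit is computed. The sketch below assumes a linear warmup and a per-weight bound of max_weight_change × |old weight|; both are illustrative assumptions, not confirmed SecureML behavior:

```python
import numpy as np

def warmup_mixing_rate(target_rate, round_num, warmup_rounds):
    """Assumed linear warmup: rounds are 1-indexed, and the rate reaches
    target_rate at round `warmup_rounds`."""
    if warmup_rounds <= 0:
        return target_rate
    return target_rate * min(1.0, round_num / warmup_rounds)

def constrain_weights(old_weight, proposed_weight, max_weight_change):
    """Keep each weight within +/- max_weight_change * |old_weight| of its old value.
    With this purely relative rule, a weight that is exactly 0 cannot move."""
    limit = max_weight_change * np.abs(old_weight)
    return np.clip(proposed_weight, old_weight - limit, old_weight + limit)

print([warmup_mixing_rate(0.5, r, warmup_rounds=2) for r in (1, 2, 3)])  # [0.25, 0.5, 0.5]

old = np.array([1.0, -2.0, 0.5])
proposed = np.array([2.0, -1.9, 0.4])
print(constrain_weights(old, proposed, max_weight_change=0.3))  # first weight capped at 1.3
```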

Choosing a Strategy

  • Use Direct for simpler models and homogeneous data distributions.

  • Use EMA for improved stability when updates are noisy, for example because of differential privacy noise or small client datasets.

  • Use Momentum for faster convergence on complex problems and when clients have heterogeneous data distributions.
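To see why EMA helps when updates are noisy, the toy simulation below feeds the same sequence of noisy server weights (a single scalar around a true value of 1.0) through the direct and EMA rules and compares how much the resulting weight fluctuates. It is a synthetic illustration with made-up noise, not a benchmark of SecureML:

```python
import numpy as np

def ema_update(old_weight, new_weight, mixing_rate):
    return (1 - mixing_rate) * old_weight + mixing_rate * new_weight

rng = np.random.default_rng(0)
true_value = 1.0
# Simulated noisy aggregated weights, e.g. from DP noise or tiny client datasets.
server_weights = true_value + rng.normal(0.0, 1.0, size=500)

# "direct": the model simply adopts each new server value.
direct_trace = server_weights.copy()

# "ema": blend each new value into the running weight.
ema_trace = np.empty_like(server_weights)
w = 0.0
for t, new_w in enumerate(server_weights):
    w = ema_update(w, new_w, mixing_rate=0.2)
    ema_trace[t] = w

# Compare spread over the last 400 rounds, after the EMA has warmed up.
print("direct std:", direct_trace[100:].std())
print("ema std:   ", ema_trace[100:].std())
```

The trade-off is lag: a lower mixing rate smooths more, but the model also reacts more slowly when the server's weights genuinely move.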

For maximum stability, especially with differential privacy enabled, combine momentum with weight constraints:

from secureml.federated import FederatedConfig, train_federated

# Configuration for stable training with differential privacy
config = FederatedConfig(
    num_rounds=20,
    weight_update_strategy="momentum",
    weight_momentum=0.95,
    apply_weight_constraints=True,
    max_weight_change=0.25,
    apply_differential_privacy=True,
    epsilon=1.0
)

trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    config=config
)

Supported Frameworks

SecureML supports multiple frameworks for federated learning:

PyTorch Models

import torch.nn as nn
from secureml.federated import train_federated

# Define a PyTorch model
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.layers(x)

# Create and train the model
model = SimpleNN()

trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    framework="pytorch"
)

TensorFlow Models

import tensorflow as tf
from secureml.federated import train_federated

# Define a TensorFlow model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Compiling is optional; SecureML compiles the model internally if needed
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Train the model
trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    framework="tensorflow"
)
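The framework="auto" option has to infer the framework from the model object. One plausible approach, sketched here as an assumption rather than SecureML's documented behavior, is to inspect the modules that define the model's class hierarchy, which works without importing either framework (detect_framework is a hypothetical name):

```python
def detect_framework(model):
    """Guess the framework from the top-level package of every class in the
    model's method resolution order (MRO)."""
    modules = [cls.__module__ for cls in type(model).__mro__]
    top_level = {m.split(".")[0] for m in modules}
    if "torch" in top_level:
        return "pytorch"
    if top_level & {"tensorflow", "keras", "tf_keras"}:
        return "tensorflow"
    raise ValueError(
        f"Cannot detect framework for {type(model).__name__}; pass framework explicitly"
    )
```

For the SimpleNN model above this returns "pytorch", because torch.nn.Module appears in its class hierarchy; for the Keras model it should return "tensorflow". Checks like this are heuristics, which is one reason to pass framework explicitly.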

Best Practices

  1. Start with simulation: Test your federated learning setup in a simulated environment using train_federated before deploying to real clients with start_federated_server and start_federated_client.

  2. Handle heterogeneous data: Weight update strategies such as momentum or EMA can help when client data is non-IID (not independent and identically distributed across clients).

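  3. Consider communication costs: Every round sends model weights between the server and each participating client, so keep model sizes reasonable and tune local training (for example, batch size) so that each round makes useful progress.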

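  4. Apply privacy protections: Combine federated learning with differential privacy and secure aggregation. Secure aggregation hides individual client updates from the server, while differential privacy limits what the updates and the trained model reveal about any single training record.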

  5. Monitor convergence: Track convergence and model performance across rounds, since federated learning often converges more slowly and less smoothly than centralized training, especially with non-IID data.

  6. Framework detection: You can set framework="auto" to let SecureML automatically detect whether you're using PyTorch or TensorFlow, but it's best to specify the framework explicitly when possible.

  7. Data preparation: Ensure your data is properly formatted before training. SecureML expects a pandas DataFrame or NumPy array, with the target variable either specified via the target_column parameter or assumed to be the last column.
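The target-column convention in item 7 can be sketched as follows. split_features_target is a hypothetical helper, not part of SecureML, and treating target_column as a column index for NumPy arrays is an assumption, since arrays have no column names:

```python
import numpy as np
import pandas as pd

def split_features_target(data, target_column=None):
    """Separate features from the target: use `target_column` if given,
    otherwise treat the last column as the target."""
    if isinstance(data, pd.DataFrame):
        target = target_column if target_column is not None else data.columns[-1]
        X = data.drop(columns=[target]).to_numpy(dtype=np.float32)
        y = data[target].to_numpy()
        return X, y
    arr = np.asarray(data)
    # Assumption: for arrays, target_column is interpreted as a column index.
    idx = target_column if target_column is not None else arr.shape[1] - 1
    X = np.delete(arr, idx, axis=1).astype(np.float32)
    y = arr[:, idx]
    return X, y

df = pd.DataFrame({"age": [34, 51], "label": [0, 1], "income": [52.0, 61.5]})
X, y = split_features_target(df, target_column="label")
print(X.shape, y.tolist())  # (2, 2) [0, 1]

arr = np.array([[1.0, 2.0, 0.0], [3.0, 4.0, 1.0]])
X2, y2 = split_features_target(arr)  # last column is the target
print(X2.shape, y2.tolist())  # (2, 2) [0.0, 1.0]
```

If the target is not the last column and no target column is given, the model would silently train on the wrong label, so checking the column order before training is worthwhile.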

Further Reading