Federated Learning

Federated Learning (FL) is a machine learning technique that trains models across multiple decentralized devices or servers, each holding its own local data samples, without exchanging the raw data. SecureML provides a framework for building federated learning systems that combine this with secure aggregation and differential privacy.

Core Concepts

Federated Learning Types:

Cross-device FL: Learning across many (thousands to millions of) mobile or IoT devices
Cross-silo FL: Learning across a small number of organizations or data silos
Vertical FL: Learning when different organizations have different features for the same entities
Horizontal FL: Learning when different organizations have the same features for different entities
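
The difference between horizontal and vertical partitioning is easiest to see on a toy dataset. The following is a minimal sketch in plain Python; the records, the `horizontal_split` and `vertical_split` helpers, and the `id` join key are all hypothetical and are not part of SecureML.

```python
# Toy dataset: each row is one entity, each key is one feature.
records = [
    {"id": 1, "age": 34, "income": 52000, "label": 0},
    {"id": 2, "age": 51, "income": 87000, "label": 1},
    {"id": 3, "age": 27, "income": 41000, "label": 0},
    {"id": 4, "age": 45, "income": 66000, "label": 1},
]


def horizontal_split(rows, num_parties):
    """Same features everywhere; each party holds different rows (entities)."""
    return [rows[i::num_parties] for i in range(num_parties)]


def vertical_split(rows, feature_groups, key="id"):
    """Same entities everywhere; each party holds different columns (features)."""
    return [
        [{key: row[key], **{f: row[f] for f in features}} for row in rows]
        for features in feature_groups
    ]


# Horizontal: two parties, each with full rows for half of the entities.
horizontal = horizontal_split(records, num_parties=2)

# Vertical: two parties, each with some columns for all entities,
# linked through the shared "id" key.
vertical = vertical_split(records, feature_groups=[["age"], ["income", "label"]])
```

In the horizontal case every party could train the same model on its own rows. In the vertical case the parties first have to align entities on the shared key, because no single party sees a complete feature vector.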

Key Components:

Federated Clients: Devices or servers that hold local data
Federated Server: Central server that orchestrates the learning process
Aggregation Algorithms: Methods to combine model updates from multiple clients
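
As an illustration of what an aggregation algorithm does, here is a minimal sketch of federated averaging (FedAvg), a widely used method in which the server averages client weights in proportion to the number of local training examples. This page does not state which aggregation algorithm SecureML uses internally, so treat `federated_average` as a hypothetical helper that only demonstrates the idea.

```python
def federated_average(client_updates):
    """Weighted average of client weight vectors.

    client_updates: list of (weights, num_examples) pairs, where weights
    is a flat list of floats with the same length for every client.
    """
    total_examples = sum(n for _, n in client_updates)
    if total_examples == 0:
        raise ValueError("no training examples reported by clients")

    num_params = len(client_updates[0][0])
    averaged = [0.0] * num_params
    for weights, n in client_updates:
        for i, w in enumerate(weights):
            # Clients with more data contribute proportionally more.
            averaged[i] += w * n / total_examples
    return averaged


updates = [
    ([1.0, 2.0], 100),  # client with 100 examples
    ([3.0, 4.0], 300),  # client with 300 examples
]
global_weights = federated_average(updates)  # close to [2.5, 3.5]
```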

Basic Usage

Training with Federated Learning

The main way to use federated learning in SecureML is with the train_federated function:

from secureml.federated import train_federated, FederatedConfig
import torch.nn as nn

# Define your model (PyTorch example)
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.layers(x)

# Create a model
model = SimpleNN()

# Define a function that returns client data
def get_client_data():
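    # Return a dictionary mapping client IDs to their datasets.
    # client_1_data etc. are placeholders for datasets you have already
    # loaded (pandas DataFrames or NumPy arrays).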
    return {
        "client-001": client_1_data,
        "client-002": client_2_data,
        "client-003": client_3_data
    }

# Configure federated learning
config = FederatedConfig(
    num_rounds=10,
    fraction_fit=1.0,
    min_fit_clients=2,
    min_available_clients=2,
    use_secure_aggregation=True,
    apply_differential_privacy=True,
    epsilon=1.0,
    delta=1e-5,
    weight_update_strategy="ema",
    weight_mixing_rate=0.5
)

# Train the model with federated learning
trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    config=config,
    framework="pytorch",  # or "tensorflow", or "auto"
    model_save_path="federated_model.pt"
)
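
The `fraction_fit`, `min_fit_clients`, and `min_available_clients` settings control how many clients take part in each round. SecureML's exact selection logic is not documented here. The sketch below assumes the convention used by Flower's FedAvg strategy, which has parameters of the same names: wait until enough clients are available, then sample the larger of `fraction_fit` times the available clients and `min_fit_clients`. The `sample_fit_clients` helper is hypothetical.

```python
import random


def sample_fit_clients(available, fraction_fit, min_fit_clients,
                       min_available_clients, seed=None):
    """Pick the clients that train in one round (assumed semantics)."""
    if len(available) < min_available_clients:
        # Not enough clients connected; a real server would wait here.
        return []
    target = max(int(len(available) * fraction_fit), min_fit_clients)
    target = min(target, len(available))  # cannot sample more than exist
    return random.Random(seed).sample(available, target)


clients = [f"client-{i:03d}" for i in range(1, 11)]  # 10 available clients
chosen = sample_fit_clients(
    clients,
    fraction_fit=0.8,
    min_fit_clients=3,
    min_available_clients=5,
    seed=0,
)
# Under these assumptions, 8 of the 10 clients are selected for the round.
```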

Setting Up a Federated Server

For real-world deployments, you can set up a federated learning server:

from secureml.federated import start_federated_server, FederatedConfig
import torch.nn as nn

# Define your model (PyTorch example)
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.layers(x)

# Create a model
model = SimpleNN()

# Configure federated learning
config = FederatedConfig(
    num_rounds=10,
    fraction_fit=0.8,
    min_fit_clients=3,
    min_available_clients=5,
    server_address="0.0.0.0:8080",
    use_secure_aggregation=True
)

# Start the federated server
start_federated_server(
    model=model,
    config=config,
    framework="pytorch"  # or "tensorflow", or "auto"
)

Setting Up a Federated Client

On each client device or server:

from secureml.federated import start_federated_client
import torch.nn as nn
import pandas as pd

# Define your model (must match the server's architecture)
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.layers(x)

# Create a model
model = SimpleNN()

# Load local data (pandas DataFrame or NumPy array)
local_data = pd.read_csv("client_data.csv")

# Start the federated client
start_federated_client(
    model=model,
    data=local_data,
    server_address="fl-server.example.com:8080",
    framework="pytorch",  # or "tensorflow", or "auto"
    apply_differential_privacy=True,
    epsilon=1.0,
    delta=1e-5,
    test_split=0.2,  # Optional: Use 20% of data for local evaluation
    batch_size=64,
    learning_rate=0.001
)

Advanced Techniques

Secure Aggregation

SecureML supports secure aggregation to protect client updates:

from secureml.federated import FederatedConfig, train_federated

# Configure federated learning with secure aggregation
config = FederatedConfig(
    num_rounds=10,
    fraction_fit=1.0,
    min_fit_clients=2,
    min_available_clients=2,
    use_secure_aggregation=True  # Enable secure aggregation
)

# Train with secure aggregation
trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    config=config
)
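
To give some intuition for how secure aggregation can hide individual updates, the sketch below shows the pairwise-masking idea behind many secure aggregation protocols: each pair of clients shares a random mask that one adds and the other subtracts, so the masks cancel in the sum. This is a toy illustration under simplifying assumptions (integer-quantized updates, a single seeded generator standing in for pairwise-agreed secrets, no client dropouts). It is not SecureML's protocol, and `mask_updates` and `aggregate` are hypothetical names.

```python
import random

MODULUS = 2**32  # all arithmetic is done modulo this value


def mask_updates(updates, seed=0):
    """Return masked copies of integer update vectors.

    For each pair of clients (i, j) with i < j, a shared random mask is
    added to client i's vector and subtracted from client j's vector.
    """
    rng = random.Random(seed)  # stands in for pairwise-agreed secrets
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(dim):
                m = rng.randrange(MODULUS)
                masked[i][k] = (masked[i][k] + m) % MODULUS
                masked[j][k] = (masked[j][k] - m) % MODULUS
    return masked


def aggregate(vectors):
    """Element-wise sum modulo MODULUS (what the server computes)."""
    return [sum(column) % MODULUS for column in zip(*vectors)]


client_updates = [[5, 10], [7, 1], [2, 4]]  # quantized client updates
masked = mask_updates(client_updates)

# The server only sees the masked vectors, yet their sum equals the
# sum of the original updates because every mask cancels out.
total = aggregate(masked)  # [14, 15]
```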

Differential Privacy in Federated Learning

Add differential privacy to client updates:

from secureml.federated import start_federated_client

# Start a client with differential privacy
start_federated_client(
    model=model,
    data=local_data,
    server_address="fl-server.example.com:8080",
    apply_differential_privacy=True,
    epsilon=1.0,
    delta=1e-5,
    max_grad_norm=1.0,  # Clipping parameter
    noise_multiplier=1.1  # Noise level (optional)
)

# Or configure it system-wide
from secureml.federated import FederatedConfig, train_federated

config = FederatedConfig(
    num_rounds=10,
    apply_differential_privacy=True,
    epsilon=1.0,
    delta=1e-5
)

trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    config=config
)
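
The `max_grad_norm` and `noise_multiplier` parameters correspond to the two steps of the standard Gaussian mechanism used in differentially private training: clip the update to a bounded L2 norm, then add Gaussian noise scaled to that bound. The following is a minimal sketch of those two steps on a flat list of floats. How SecureML applies them internally (per sample or per update, and how `epsilon` and `delta` map to a noise level) is not specified here, so `privatize_update` is a hypothetical helper.

```python
import math
import random


def privatize_update(update, max_grad_norm, noise_multiplier, rng=None):
    """Clip an update to an L2 norm bound, then add Gaussian noise."""
    if rng is None:
        rng = random.Random()

    # Step 1: scale the vector down if its L2 norm exceeds the bound.
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_grad_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]

    # Step 2: add noise whose standard deviation is tied to the bound.
    sigma = noise_multiplier * max_grad_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]


noisy = privatize_update(
    [3.0, 4.0],            # L2 norm is 5.0, so it is clipped to norm 1.0
    max_grad_norm=1.0,
    noise_multiplier=1.1,
    rng=random.Random(42),
)
```

A larger `noise_multiplier` gives stronger privacy at the cost of noisier updates, which is one reason the smoothing strategies described later can help.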

Advanced Weight Update Strategies

SecureML provides several weight update strategies that can improve convergence and stability in federated learning:

from secureml.federated import FederatedConfig, train_federated

# Configure federated learning with Exponential Moving Average (EMA) weight updates
ema_config = FederatedConfig(
    num_rounds=10,
    weight_update_strategy="ema",       # Use exponential moving average
    weight_mixing_rate=0.5,             # 50% mix of new weights, 50% of old weights
    warmup_rounds=2                     # Gradually increase mixing rate over first 2 rounds
)

# Train with EMA updates
trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    config=ema_config
)

# Use momentum-based weight updates
momentum_config = FederatedConfig(
    num_rounds=10,
    weight_update_strategy="momentum",  # Use momentum-based updates
    weight_mixing_rate=0.1,             # Small update step size
    weight_momentum=0.9,                # High momentum coefficient
    apply_weight_constraints=True,      # Constrain updates to prevent instability
    max_weight_change=0.3               # Maximum 30% change in any weight
)

# Train with momentum updates
trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    config=momentum_config
)

Weight Update Strategy Types

SecureML supports three different strategies for updating model weights in federated learning:

Direct Updates (weight_update_strategy="direct"): The simplest strategy, where client models directly adopt the weights received from the server. This is the classic federated learning approach.
Exponential Moving Average (EMA) (weight_update_strategy="ema"): A weighted average of the old and new weights. This produces smoother updates and can improve training stability:
```
updated_weight = (1 - mixing_rate) * old_weight + mixing_rate * new_weight
```

Momentum-Based Updates (weight_update_strategy="momentum"): Adds a momentum term that carries part of the previous update forward, which can accelerate training and help it move past poor local minima:

momentum_update = momentum * previous_update + mixing_rate * (new_weight - old_weight)
updated_weight = old_weight + momentum_update
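
The EMA and momentum formulas above can be worked through on a single scalar weight. This is a minimal sketch that transcribes the two formulas directly; the function names `ema_update` and `momentum_update` are hypothetical, and a real implementation would apply the same arithmetic to every tensor in the model.

```python
def ema_update(old_weight, new_weight, mixing_rate):
    """Exponential moving average of old and new weights."""
    return (1 - mixing_rate) * old_weight + mixing_rate * new_weight


def momentum_update(old_weight, new_weight, previous_update,
                    mixing_rate, momentum):
    """Momentum step; returns the updated weight and the update itself."""
    update = momentum * previous_update + mixing_rate * (new_weight - old_weight)
    return old_weight + update, update


# EMA: halfway between the old weight 1.0 and the new weight 2.0.
ema_weight = ema_update(1.0, 2.0, mixing_rate=0.5)  # 1.5

# Momentum: two rounds in which the server keeps proposing 2.0.
w, u = 1.0, 0.0
w, u = momentum_update(w, 2.0, u, mixing_rate=0.1, momentum=0.9)  # w is about 1.1
w, u = momentum_update(w, 2.0, u, mixing_rate=0.1, momentum=0.9)  # w is about 1.28
```

The second momentum step is larger than the first (about 0.18 versus 0.1) even though the target is unchanged, which shows how the momentum term builds up speed when successive updates point in the same direction.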

Key Configuration Parameters

weight_mixing_rate: Controls how much of the new weights to incorporate (0.0 to 1.0). Lower values make smaller, more conservative updates.
weight_momentum: For momentum strategy, determines how much previous updates influence current ones (typically 0.9 to 0.99).
warmup_rounds: Number of initial rounds with gradually increasing mixing rates. Useful for stabilizing early training.
apply_weight_constraints: When True, prevents any weight from changing too dramatically in a single update.
max_weight_change: Maximum relative change allowed in any weight when constraints are enabled (e.g., 0.2 = 20% maximum change).
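
To make `warmup_rounds` and `max_weight_change` concrete, here is a minimal sketch of one plausible interpretation. It assumes a linear warmup schedule and a per-weight clamp on relative change. The actual schedule and clamping rule used by SecureML are not specified on this page, and both helper functions are hypothetical.

```python
def warmup_mixing_rate(round_index, base_rate, warmup_rounds):
    """Linearly ramp the mixing rate during warmup (assumed schedule).

    round_index is 0-based. After warmup, the base rate is used as-is.
    """
    if warmup_rounds <= 0 or round_index >= warmup_rounds:
        return base_rate
    return base_rate * (round_index + 1) / warmup_rounds


def constrain_weight(old_weight, proposed_weight, max_weight_change):
    """Limit the relative change of a single weight (assumed rule).

    Note: with this rule a weight that is exactly 0.0 cannot move, so a
    real implementation would need a small absolute floor as well.
    """
    limit = max_weight_change * abs(old_weight)
    delta = proposed_weight - old_weight
    delta = max(-limit, min(limit, delta))
    return old_weight + delta


# With weight_mixing_rate=0.5 and warmup_rounds=2, the rate ramps up
# from 0.25 in round 0 to 0.5 from round 1 onward.
rates = [warmup_mixing_rate(r, base_rate=0.5, warmup_rounds=2) for r in range(4)]

# With max_weight_change=0.3, a jump from 1.0 to 2.0 is capped near 1.3.
capped = constrain_weight(1.0, 2.0, max_weight_change=0.3)
```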

Choosing a Strategy

Use Direct for simpler models and homogeneous data distributions.
Use EMA for improved stability, for example when privacy protections on sensitive data (such as differential privacy) make client updates noisy.
Use Momentum for faster convergence on complex problems and when clients have heterogeneous data distributions.

For maximum stability, especially with differential privacy enabled, combine momentum with weight constraints:

from secureml.federated import FederatedConfig, train_federated

# Configuration for stable training with differential privacy
config = FederatedConfig(
    num_rounds=20,
    weight_update_strategy="momentum",
    weight_momentum=0.95,
    apply_weight_constraints=True,
    max_weight_change=0.25,
    apply_differential_privacy=True,
    epsilon=1.0
)

trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    config=config
)

Supported Frameworks

SecureML supports both PyTorch and TensorFlow models for federated learning:

PyTorch Models

import torch.nn as nn
from secureml.federated import train_federated

# Define a PyTorch model
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.layers(x)

# Create and train the model
model = SimpleNN()

trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    framework="pytorch"
)

TensorFlow Models

import tensorflow as tf
from secureml.federated import train_federated

# Define a TensorFlow model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Compile the model (optional; SecureML compiles it internally if needed)
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Train the model
trained_model = train_federated(
    model=model,
    client_data_fn=get_client_data,
    framework="tensorflow"
)

Best Practices

Start with simulation: Test your federated learning setup in a simulated environment using train_federated before deploying to real clients with start_federated_server and start_federated_client.
Handle heterogeneous data: Use advanced weight update strategies like momentum or EMA to handle non-IID data distributions.
Consider communication costs: Keep model sizes reasonable, and choose the number of rounds and local batch sizes with communication overhead in mind.
Apply privacy protections: Combine federated learning with differential privacy and secure aggregation for maximum privacy protection.
Monitor convergence: Carefully monitor convergence rates and model performance, as federated learning may converge differently than centralized training.
Framework detection: You can set framework="auto" to let SecureML automatically detect whether you're using PyTorch or TensorFlow, but it's best to explicitly specify the framework when possible.
Data preparation: Ensure your data is properly formatted before training. SecureML expects a pandas DataFrame or NumPy array, with the target variable either named via the target_column parameter or, by default, taken from the last column.