Isolated Environments

Overview

SecureML uses isolated virtual environments to manage dependencies whose requirements conflict with those of the main package. This lets SecureML integrate with packages such as TensorFlow Privacy, which requires specific package versions that would otherwise conflict with SecureML’s core dependencies.

Note

Currently, the primary use case for isolated environments is TensorFlow Privacy, which requires packaging ~= 22.0, while other SecureML dependencies require packaging >= 24.0. The isolated environment requires Python 3.11. If your main SecureML environment uses a different Python version (e.g., 3.12), you must set the TF_PRIVACY_PYTHON environment variable to point to a Python 3.11 executable.
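
To see why these two requirements cannot be met in one environment, recall that under PEP 440 a compatible-release specifier such as ~= 22.0 is equivalent to >= 22.0 combined with == 22.*. The following is a minimal sketch using only the standard library; the helper names are illustrative (they are not part of SecureML or of the packaging library), the candidate version list is made up for the demonstration, and only plain numeric versions are handled.

```python
def parse(version: str) -> list[int]:
    """Turn a plain numeric version such as '24.1' into [24, 1]."""
    return [int(part) for part in version.split(".")]


def padded(parts: list[int], width: int) -> tuple[int, ...]:
    """Pad with zeros so that '22' and '22.0' compare as equal."""
    return tuple(parts + [0] * (width - len(parts)))


def satisfies_compatible_release(version: str, base: str) -> bool:
    """Check `version ~= base`: at least `base`, same leading components."""
    v, b = parse(version), parse(base)
    width = max(len(v), len(b))
    prefix_len = len(b) - 1  # every component except the last must match
    v_full = padded(v, width)
    return v_full >= padded(b, width) and v_full[:prefix_len] == tuple(b[:prefix_len])


def satisfies_minimum(version: str, minimum: str) -> bool:
    """Check `version >= minimum`."""
    v, m = parse(version), parse(minimum)
    width = max(len(v), len(m))
    return padded(v, width) >= padded(m, width)


# Illustrative candidate versions, chosen only to exercise both rules.
candidates = ["21.3", "22.0", "23.2", "24.0", "24.1"]
ok_for_tf_privacy = [v for v in candidates if satisfies_compatible_release(v, "22.0")]
ok_for_secureml = [v for v in candidates if satisfies_minimum(v, "24.0")]
ok_for_both = sorted(set(ok_for_tf_privacy) & set(ok_for_secureml))

print(ok_for_tf_privacy)  # ['22.0']
print(ok_for_secureml)    # ['24.0', '24.1']
print(ok_for_both)        # []
```

No version can satisfy both rules, because ~= 22.0 caps the major version at 22 while >= 24.0 requires at least 24. That is the situation a separate environment resolves.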

Why Isolated Environments?

Machine learning libraries often have complex dependency trees with specific version requirements. In particular:

  • Version Conflicts: Libraries like TensorFlow Privacy may require versions of dependencies that conflict with other parts of SecureML

  • Dependency Bloat: Installing all possible dependencies would make the package unnecessarily large

  • User Experience: We want to provide a seamless experience without forcing users to manage complex environments manually

How Isolated Environments Work

When you use SecureML functionality that requires TensorFlow Privacy, the library handles three things for you:

  1. Automatic Management: It creates and manages a separate Python virtual environment

  2. Transparent Integration: It handles all communication between your main environment and the isolated environment

  3. Efficient Resource Usage: It creates the environment only when it is first needed

The Architecture

Main Python Environment                 Isolated Environment
┌───────────────────────────┐          ┌──────────────────────────┐
│                           │          │                          │
│  Your Application         │          │  TensorFlow Privacy      │
│  ┌──────────────────┐     │          │  Environment             │
│  │                  │     │  JSON    │                          │
│  │  SecureML        │─────┼─────────▶│  • TensorFlow            │
│  │                  │     │  IPC     │  • TensorFlow Privacy    │
│  └──────────────────┘     │          │  • Numpy, Pandas         │
│                           │          │  • SecureML              │
└───────────────────────────┘          └──────────────────────────┘

  • Communication: SecureML transfers data between environments as JSON, exchanged through temporary files (see Implementation Details)

  • Serialization: Model parameters, datasets, and results are serialized when passing between environments

  • Error Handling: Errors raised in the isolated environment are captured and reported back to the main environment
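
One common way to carry both results and errors across a process boundary is a small JSON "envelope". The sketch below illustrates that general idea and is not SecureML's actual wire format: the run_and_wrap and unwrap helpers, the IsolatedEnvironmentError class, and the ok/result/error keys are all assumptions made for this example.

```python
import json
import traceback


class IsolatedEnvironmentError(RuntimeError):
    """Raised on the main side when the isolated side reported a failure."""


def run_and_wrap(func, *args, **kwargs) -> str:
    """Run `func` (isolated side) and return a JSON envelope string."""
    try:
        envelope = {"ok": True, "result": func(*args, **kwargs)}
    except Exception as exc:  # report the failure instead of losing it
        envelope = {
            "ok": False,
            "error": {
                "type": type(exc).__name__,
                "message": str(exc),
                "traceback": traceback.format_exc(),
            },
        }
    return json.dumps(envelope)


def unwrap(payload: str):
    """Decode an envelope (main side), re-raising any reported error."""
    envelope = json.loads(payload)
    if envelope["ok"]:
        return envelope["result"]
    error = envelope["error"]
    raise IsolatedEnvironmentError(f"{error['type']}: {error['message']}")


def mean(values):
    """Toy workload standing in for a training call."""
    return sum(values) / len(values)


print(unwrap(run_and_wrap(mean, [1.0, 2.0, 3.0])))  # 2.0

try:
    unwrap(run_and_wrap(mean, []))  # fails with a ZeroDivisionError
except IsolatedEnvironmentError as err:
    print(f"reported back to the main environment: {err}")
```

Keeping the original exception type and traceback in the envelope means the caller sees what went wrong in the other process rather than a bare non-zero exit code.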

TensorFlow Privacy Integration

The most common use case for isolated environments is SecureML’s differential privacy functionality with TensorFlow.

Default Behavior

By default, when you call differentially_private_train() with framework="tensorflow", SecureML will:

  1. Check if the TensorFlow Privacy environment exists

  2. Create it if it doesn’t exist (this happens only once), using Python 3.11

  3. Send your model and data to the isolated environment

  4. Run the training in the isolated environment

  5. Return the trained model back to your main environment
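
Steps 1 and 2 above follow a "create on first use" pattern, which can be sketched as follows. This is an illustration under stated assumptions rather than SecureML's implementation: ensure_environment and fake_create are hypothetical names, and the check relies on pyvenv.cfg, the marker file that the standard venv module writes into each environment it creates. Note that venv.create builds an environment for the interpreter that is currently running, so a real implementation that needs Python 3.11 would have to invoke that interpreter instead.

```python
import tempfile
import venv
from pathlib import Path


def ensure_environment(env_dir: Path, create=venv.create) -> bool:
    """Create the environment at `env_dir` only if it is missing.

    Returns True if this call created the environment and False if an
    existing one was reused.
    """
    if (env_dir / "pyvenv.cfg").is_file():
        return False
    create(env_dir)
    return True


def fake_create(env_dir):
    """Stand-in for venv.create so that the demo runs instantly."""
    Path(env_dir).mkdir(parents=True, exist_ok=True)
    (Path(env_dir) / "pyvenv.cfg").write_text("home = placeholder\n")


with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "demo_venv"
    first = ensure_environment(demo_dir, create=fake_create)
    second = ensure_environment(demo_dir, create=fake_create)
    print(first, second)  # True False
```

Passing the creation function as a parameter keeps the check itself cheap to test and makes the one-time cost of the first call explicit.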

Example Usage

from secureml import privacy
import tensorflow as tf

# Create a model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

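# Train with differential privacy
# (training_data and test_data are assumed to be loaded earlier; they are not defined in this snippet)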
private_model = privacy.differentially_private_train(
    model=model,
    data=training_data,
    epsilon=1.0,
    delta=1e-5,
    epochs=10,
    batch_size=32,
    validation_split=0.2,
    framework="tensorflow"  # This triggers the isolated environment
)

# The model is trained with differential privacy guarantees and returned to your main environment
predictions = private_model.predict(test_data)

Managing Isolated Environments

Command Line Interface

SecureML provides CLI commands to manage isolated environments:

# Set up the TensorFlow Privacy environment in advance
secureml environments setup-tf-privacy

# Force recreation of the environment (useful for troubleshooting)
secureml environments setup-tf-privacy --force

# Check the status of isolated environments
secureml environments info

Note

If your main Python environment is not Python 3.11, set the TF_PRIVACY_PYTHON environment variable to the path of a Python 3.11 executable before running these commands. For example:

  • Linux/macOS: export TF_PRIVACY_PYTHON=/usr/bin/python3.11

  • Windows: set TF_PRIVACY_PYTHON=C:\Python311\python.exe
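
Before running the setup commands, it can help to confirm that the executable you point TF_PRIVACY_PYTHON at really is Python 3.11. The sketch below shows one way to do that with the standard library. The interpreter_version and resolve_python helpers are hypothetical names for this example and are not SecureML functions; only the TF_PRIVACY_PYTHON variable name comes from this page.

```python
import os
import subprocess
import sys


def interpreter_version(executable: str) -> tuple[int, int]:
    """Ask an interpreter for its (major, minor) version."""
    output = subprocess.run(
        [executable, "-c", "import sys; print(sys.version_info[0], sys.version_info[1])"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    major, minor = output.split()
    return int(major), int(minor)


def resolve_python(required=(3, 11), env_var="TF_PRIVACY_PYTHON") -> str:
    """Return an interpreter of the required version, or raise with guidance."""
    if tuple(sys.version_info[:2]) == tuple(required):
        return sys.executable  # the current interpreter already qualifies
    candidate = os.environ.get(env_var)
    if candidate:
        try:
            if interpreter_version(candidate) == tuple(required):
                return candidate
        except (OSError, subprocess.CalledProcessError, ValueError):
            pass  # missing or broken executable: fall through to the error
    wanted = ".".join(str(part) for part in required)
    raise RuntimeError(
        f"Python {wanted} is required; set {env_var} to a Python {wanted} executable."
    )


# The running interpreter reports its own version consistently.
print(interpreter_version(sys.executable) == tuple(sys.version_info[:2]))  # True

try:
    print("Would use:", resolve_python())
except RuntimeError as err:
    print(err)
```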

Using the API

You can also manage isolated environments programmatically:

from secureml.isolated_environments import (
    setup_tf_privacy_environment,
    is_env_valid,
    get_env_path
)

# Set up the environment
setup_tf_privacy_environment()

# Check if the environment is valid
if is_env_valid():
    print("Environment is ready for use")
else:
    print("Environment needs to be set up")

# Get the path to the environment
env_path = get_env_path()
print(f"TensorFlow Privacy environment is at: {env_path}")

Location and Structure

By default, isolated environments are created at:

  • Linux/macOS: ~/.secureml/tf_privacy_venv

  • Windows: %USERPROFILE%\.secureml\tf_privacy_venv

The environment contains:

  • Python 3.11 interpreter

  • TensorFlow (compatible version)

  • TensorFlow Privacy

  • NumPy and Pandas

  • A copy of SecureML
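
For scripts that need to inspect the environment, the default locations listed above can be assembled with pathlib. This is only a sketch: env_paths is a hypothetical helper, and get_env_path() from secureml.isolated_environments remains the supported way to obtain the path. The bin and Scripts subdirectories follow the standard virtual environment layout on POSIX systems and Windows respectively.

```python
import os
from pathlib import Path


def env_paths(home: Path, windows: bool) -> dict:
    """Build the default environment directory and its main executables."""
    env_dir = home / ".secureml" / "tf_privacy_venv"
    bin_dir = env_dir / ("Scripts" if windows else "bin")
    suffix = ".exe" if windows else ""
    return {
        "env_dir": env_dir,
        "python": bin_dir / f"python{suffix}",
        "pip": bin_dir / f"pip{suffix}",
    }


paths = env_paths(Path.home(), windows=(os.name == "nt"))
for name, path in paths.items():
    # Report whether each expected path is actually present on this machine.
    print(f"{name}: {path} (exists: {path.exists()})")
```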

Advanced Topics

Custom Environment Path

Currently, SecureML does not support customizing the environment path, but this feature is planned for future releases.

Troubleshooting

If you encounter issues with the isolated environment:

  1. Ensure Python 3.11 is used: If your main environment uses Python 3.12 or another version, set the TF_PRIVACY_PYTHON environment variable:

    # Linux/macOS
    export TF_PRIVACY_PYTHON=/usr/bin/python3.11
    secureml environments setup-tf-privacy
    
    # Windows
    set TF_PRIVACY_PYTHON=C:\Python311\python.exe
    secureml environments setup-tf-privacy
    
  2. Recreate the environment:

    secureml environments setup-tf-privacy --force
    
  3. Check for errors during setup:

    secureml environments setup-tf-privacy --verbose
    
  4. Verify installed packages:

    # Linux/macOS
    ~/.secureml/tf_privacy_venv/bin/pip list
    
    # Windows
    %USERPROFILE%\.secureml\tf_privacy_venv\Scripts\pip list
    
  5. Manual cleanup (if necessary):

    # Remove the environment directory (Linux/macOS)
    rm -rf ~/.secureml/tf_privacy_venv
    
    # Remove the environment directory (Windows)
    rmdir /s /q "%USERPROFILE%\.secureml\tf_privacy_venv"
    
    # Then recreate it
    secureml environments setup-tf-privacy
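
The package check in step 4 can also be automated by parsing the output of pip list --format=json, which prints a JSON array of objects with name and version fields. The sketch below assumes the distribution names tensorflow, tensorflow-privacy, numpy, and pandas; missing_packages is a hypothetical helper, and the sample JSON string is made up for illustration rather than taken from a real environment.

```python
import json
import re


def normalize(name: str) -> str:
    """Normalize a distribution name (runs of '-', '_', '.' become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def missing_packages(pip_list_json: str, required: list[str]) -> list[str]:
    """Return the required distributions absent from `pip list --format=json` output."""
    installed = {normalize(entry["name"]) for entry in json.loads(pip_list_json)}
    return [name for name in required if normalize(name) not in installed]


# Assumed distribution names for the packages listed on this page.
REQUIRED = ["tensorflow", "tensorflow-privacy", "numpy", "pandas"]

# Illustrative sample only; real output comes from the environment's pip.
sample_output = json.dumps(
    [
        {"name": "tensorflow", "version": "x.y.z"},
        {"name": "tensorflow_privacy", "version": "x.y.z"},
        {"name": "numpy", "version": "x.y.z"},
    ]
)

print(missing_packages(sample_output, REQUIRED))  # ['pandas']
```

To run this against the real environment, capture the output of the environment's own pip (for example with subprocess.run and capture_output=True) and pass its stdout to missing_packages.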
    

Performance Considerations

  • First-time setup: The first time you use TensorFlow Privacy functionality, there will be a delay as the environment is created and packages are installed

  • Subsequent usage: After the initial setup, the remaining overhead comes mainly from serializing and deserializing data, so it grows with the size of the model and dataset being transferred

  • Memory usage: The isolated environment runs in a separate process, which requires additional memory
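
To get a rough feel for the serialization cost on your own data, you can time a JSON round trip directly. This is a generic measurement sketch, not a SecureML utility: measure_round_trip is a hypothetical helper, the payload shape is arbitrary, and the timings depend entirely on your machine and data.

```python
import json
import time


def measure_round_trip(payload):
    """Serialize and deserialize `payload`; return (bytes, seconds, decoded)."""
    start = time.perf_counter()
    encoded = json.dumps(payload)
    decoded = json.loads(encoded)
    elapsed = time.perf_counter() - start
    return len(encoded.encode("utf-8")), elapsed, decoded


# Arbitrary synthetic payload: 10,000 rows of 10 floats each.
rows = [[float(i + j) for j in range(10)] for i in range(10_000)]
size, seconds, decoded = measure_round_trip({"features": rows})

print(f"{size / 1e6:.2f} MB of JSON, {seconds * 1000:.1f} ms round trip")
```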

Implementation Details

For developers interested in how isolated environments are implemented:

  • The run_tf_privacy_function() function manages the execution of code in the isolated environment

  • Communication happens through temporary files containing JSON-serialized data

  • A subprocess is created to run Python code in the isolated environment using a Python 3.11 interpreter

  • The result is returned through another temporary file

  • The Python 3.11 requirement is enforced by checking the current interpreter version or the TF_PRIVACY_PYTHON environment variable; if neither provides Python 3.11, an error is raised with instructions
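
The temporary-file exchange described above can be sketched with the standard library alone. This is a simplified illustration and not the code behind run_tf_privacy_function(): run_in_child and WORKER_SOURCE are hypothetical names, the workload is a trivial sum, and sys.executable stands in for the Python 3.11 interpreter of the isolated environment.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Code run by the child interpreter: read a JSON request, write a JSON result.
WORKER_SOURCE = """
import json, sys
in_path, out_path = sys.argv[1], sys.argv[2]
with open(in_path) as f:
    request = json.load(f)
result = {"total": sum(request["values"])}
with open(out_path, "w") as f:
    json.dump(result, f)
"""


def run_in_child(python_executable: str, request: dict) -> dict:
    """Send `request` to a child interpreter via temp files and read the reply."""
    with tempfile.TemporaryDirectory() as tmp:
        in_path = Path(tmp) / "input.json"
        out_path = Path(tmp) / "output.json"
        in_path.write_text(json.dumps(request))
        # check=True raises CalledProcessError if the child exits with an error.
        subprocess.run(
            [python_executable, "-c", WORKER_SOURCE, str(in_path), str(out_path)],
            check=True,
        )
        return json.loads(out_path.read_text())


print(run_in_child(sys.executable, {"values": [1, 2, 3]}))  # {'total': 6}
```

Because the two processes share only files, the child can run under a different interpreter with a different set of installed packages, which is what makes the dependency conflict irrelevant to the main environment.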

Future Plans

In future releases, we plan to:

  • Support custom environment locations

  • Add more isolated environments for other conflicting dependencies

  • Improve error reporting and logging

  • Add support for memory-mapped communication for better performance with large datasets