CLI Examples

This section demonstrates how to use the SecureML command-line interface through practical examples. You can use these examples as a starting point for your own privacy-preserving data workflows.

Basic Setup

To use the CLI, make sure you have SecureML installed:

pip install secureml

Getting help information:

# Show general help
secureml --help

# Show help for a specific command
secureml anonymization --help

# Show version information
secureml --version
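
Scripts that call the CLI can fail early with a clear message when it is not installed. The sketch below assumes only that the executable is named secureml, as in the commands above; require_cmd is a hypothetical helper and not part of SecureML.

```shell
#!/bin/bash

# require_cmd NAME: succeed if NAME is on PATH, otherwise print a hint and fail.
# Hypothetical helper for illustration; not provided by SecureML.
require_cmd() {
    if ! command -v "$1" > /dev/null 2>&1; then
        echo "Error: '$1' not found on PATH. Is it installed?" >&2
        return 1
    fi
}

# Typical use at the top of a script:
#   require_cmd secureml || exit 1
```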

Anonymization Examples

Applying k-anonymity to protect sensitive data:

# Basic k-anonymity with k=3
secureml anonymization k-anonymize patient_data.csv anonymized_data.csv \
    --quasi-id age --quasi-id zipcode \
    --sensitive diagnosis --sensitive income \
    --k 3

# Using a different output format
secureml anonymization k-anonymize patient_data.csv anonymized_data.json \
    --quasi-id age --quasi-id zipcode \
    --sensitive diagnosis \
    --k 2 \
    --format json
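
A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k rows. The sketch below is one way to spot-check a CSV result with awk. It assumes the output keeps a single header row, has no quoted fields containing commas, and that you pass the quasi-identifier columns by 1-based position. min_group_size is a hypothetical helper, not a SecureML command.

```shell
#!/bin/bash

# min_group_size FILE COL [COL...]
# Print the size of the smallest group of rows that share the same values
# in the given 1-based columns. Hypothetical helper; assumes a simple CSV.
min_group_size() {
    local file="$1"; shift
    local cols="$*"
    awk -F',' -v cols="$cols" '
        BEGIN { n = split(cols, idx, " ") }
        NR > 1 {                       # skip the header row
            key = ""
            for (i = 1; i <= n; i++) key = key SUBSEP $(idx[i])
            count[key]++
        }
        END {
            min = -1
            for (k in count) if (min < 0 || count[k] < min) min = count[k]
            if (min < 0) min = 0       # no data rows
            print min
        }
    ' "$file"
}

# Example: if age and zipcode are columns 1 and 2 of anonymized_data.csv,
# the printed value should be at least the k passed to k-anonymize:
#   min_group_size anonymized_data.csv 1 2
```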

Compliance Checking Examples

Verifying compliance with privacy regulations:

# Basic GDPR compliance check
secureml compliance check patient_data.csv \
    --regulation GDPR

# Compliance check with metadata and HTML output
secureml compliance check patient_data.csv \
    --regulation GDPR \
    --metadata metadata.json \
    --output gdpr_report.html \
    --format html

Example metadata.json file:

{
    "description": "Patient health data",
    "data_owner": "Example Hospital",
    "data_retention_period": "5 years",
    "data_encrypted": true,
    "data_storage_location": "EU",
    "consent_obtained": true,
    "consent_date": "2023-01-15"
}

Checking both dataset and model compliance:

# Comprehensive HIPAA compliance check
secureml compliance check patient_data.csv \
    --regulation HIPAA \
    --metadata metadata.json \
    --model-config model_config.json \
    --output hipaa_report.pdf \
    --format pdf

Example model_config.json file:

{
    "model_type": "RandomForestClassifier",
    "parameters": {
        "n_estimators": 100,
        "max_depth": 5
    },
    "supports_forget_request": true,
    "supports_deletion_request": true,
    "data_processing_purpose": "Medical diagnosis prediction",
    "model_storage_location": "EU"
}
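
A malformed JSON file is an easy mistake to catch before running a compliance check. The sketch below uses Python's standard json.tool module, which should be available wherever pip is. It checks syntax only and makes no assumption about which keys SecureML expects. validate_json is a hypothetical helper.

```shell
#!/bin/bash

# validate_json FILE...: fail on the first file that is not well-formed JSON.
# Hypothetical helper; only syntax is checked, not the expected keys.
validate_json() {
    local f
    for f in "$@"; do
        if ! python3 -m json.tool "$f" > /dev/null 2>&1; then
            echo "Invalid JSON: $f" >&2
            return 1
        fi
    done
}

# Typical use before a compliance check:
#   validate_json metadata.json model_config.json || exit 1
```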

Synthetic Data Generation Examples

Creating synthetic datasets based on real data:

# Basic statistical synthesis
secureml synthetic generate patient_data.csv synthetic_data.csv \
    --method statistical \
    --samples 1000

# Auto-detecting sensitive columns
secureml synthetic generate patient_data.csv synthetic_data.csv \
    --method statistical \
    --auto-detect-sensitive \
    --sensitivity-confidence 0.7 \
    --sensitivity-sample-size 200 \
    --samples 1000

# Using GAN-based synthesis with specific sensitive columns
secureml synthetic generate patient_data.csv synthetic_data.parquet \
    --method gan \
    --sensitive name --sensitive email --sensitive diagnosis \
    --epochs 300 --batch-size 32 \
    --samples 500 \
    --format parquet
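
One quick sanity check on CSV output is that no synthetic row is a verbatim copy of an original row. The sketch below counts such rows. It assumes both files are CSV with one header row and the same column order and value formatting; it is a rough check, not a privacy guarantee. count_copied_rows is a hypothetical helper.

```shell
#!/bin/bash

# count_copied_rows ORIGINAL SYNTHETIC
# Print how many data rows of SYNTHETIC also appear, character for character,
# in ORIGINAL. Hypothetical helper; header rows are skipped in both files.
count_copied_rows() {
    local original="$1" synthetic="$2"
    # -F: fixed strings, -x: whole-line match, -c: print the count only.
    # grep exits with status 1 when the count is 0, so mask that with || true.
    tail -n +2 "$synthetic" | grep -F -x -c -f <(tail -n +2 "$original") || true
}

# Example:
#   count_copied_rows patient_data.csv synthetic_data.csv
```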

Regulation Presets Examples

Working with regulation presets:

# List all available regulation presets
secureml presets list

# View the GDPR preset
secureml presets show gdpr

# Extract just the personal data identifiers field from GDPR
secureml presets show gdpr --field personal_data_identifiers

# Save the entire HIPAA preset to a file
secureml presets show hipaa --output hipaa_preset.json

Isolated Environment Examples

Managing isolated environments for components with conflicting dependencies, such as TensorFlow Privacy:

# Set up the TensorFlow Privacy environment
secureml environments setup-tf-privacy

# Check if environments are properly configured
secureml environments info

# Force recreation of an environment
secureml environments setup-tf-privacy --force

Key Management Examples

Working with encryption keys (requires HashiCorp Vault):

# Configure Vault connection
secureml keys configure-vault \
    --vault-url https://vault.example.com:8200 \
    --vault-token hvs.example_token \
    --vault-path secureml

# Test Vault connection
secureml keys configure-vault --test-connection

# Generate a new encryption key
secureml keys generate-key \
    --key-name patient_data_key \
    --length 32 \
    --encoding hex

# Retrieve a key
secureml keys get-key \
    --key-name patient_data_key \
    --encoding base64

Using environment variables keeps the Vault token out of each command's arguments:

# Set environment variables instead of passing tokens directly
export SECUREML_VAULT_URL=https://vault.example.com:8200
export SECUREML_VAULT_TOKEN=hvs.example_token

# The command now reads the Vault URL and token from the environment
secureml keys get-key --key-name patient_data_key
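
Typing an export line with a literal token still records the token in an interactive shell's history. A minimal sketch of one alternative is to read the token with bash's read builtin, with echo disabled, and export it from there. The variable name comes from the example above; load_vault_token is a hypothetical helper.

```shell
#!/bin/bash

# load_vault_token: read the token from standard input without echoing it,
# then export it under the variable name used above.
# Hypothetical helper; relies on bash's read -s.
load_vault_token() {
    local token
    # -r: keep backslashes literal; -s: do not echo typed characters
    IFS= read -r -s -p "Vault token: " token || return 1
    echo >&2   # move to a new line after the hidden input
    export SECUREML_VAULT_TOKEN="$token"
}

# Typical use in an interactive session:
#   load_vault_token
#   secureml keys get-key --key-name patient_data_key
```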

End-to-End Example Workflow

A complete workflow for processing sensitive health data:

# 1. Check compliance of the original dataset
secureml compliance check patient_data.csv \
    --regulation GDPR \
    --output compliance_original.html \
    --format html

# 2. Anonymize the dataset for safe processing
secureml anonymization k-anonymize patient_data.csv anonymized_data.csv \
    --quasi-id age --quasi-id zipcode \
    --sensitive diagnosis --sensitive income \
    --k 3

# 3. Check compliance of the anonymized dataset
secureml compliance check anonymized_data.csv \
    --regulation GDPR \
    --output compliance_anonymized.html \
    --format html

# 4. Generate synthetic data for sharing with researchers
secureml synthetic generate anonymized_data.csv synthetic_data.csv \
    --method statistical \
    --auto-detect-sensitive \
    --samples 1000

# 5. Final compliance check on the synthetic data
secureml compliance check synthetic_data.csv \
    --regulation GDPR \
    --output compliance_synthetic.html \
    --format html
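
When the workflow above runs as a script, each step should only run if the previous one succeeded. The sketch below wraps a command so that a failure is reported and can stop the chain. It assumes secureml follows the usual convention of a non-zero exit status on failure, which the examples here do not confirm. run_step is a hypothetical helper.

```shell
#!/bin/bash

# run_step LABEL COMMAND [ARGS...]
# Print the label, run the command, and report a failure on stderr.
# Hypothetical helper; returns 1 if the command exits with a non-zero status.
run_step() {
    local label="$1"; shift
    echo "==> $label"
    if ! "$@"; then
        echo "Step failed: $label" >&2
        return 1
    fi
}

# Chaining with && stops at the first failed step:
#   run_step "Check original" secureml compliance check patient_data.csv \
#       --regulation GDPR --output compliance_original.html --format html &&
#   run_step "Anonymize" secureml anonymization k-anonymize \
#       patient_data.csv anonymized_data.csv \
#       --quasi-id age --quasi-id zipcode --sensitive diagnosis --k 3
```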

Processing Multiple Files

Example shell script for batch processing:

#!/bin/bash

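# Directory containing data files
DATA_DIR="patient_data"

# Create the output directories used below
mkdir -p reports anonymized synthetic

# Process each CSV file in the directory
for file in "$DATA_DIR"/*.csv; do
    # Skip the unexpanded pattern when no CSV files are present
    [ -e "$file" ] || continue

    filename=$(basename "$file" .csv)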

    echo "Processing $filename..."

    # Check compliance
    secureml compliance check "$file" \
        --regulation GDPR \
        --output "reports/${filename}_compliance.html" \
        --format html

    # Anonymize data
    secureml anonymization k-anonymize "$file" \
        "anonymized/${filename}_anon.csv" \
        --quasi-id age --quasi-id zipcode \
        --sensitive diagnosis --sensitive income \
        --k 3

    # Generate synthetic data
    secureml synthetic generate "anonymized/${filename}_anon.csv" \
        "synthetic/${filename}_synth.csv" \
        --method statistical \
        --samples 1000

    echo "$filename completed."
done

echo "All files processed."
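
In a long batch run it can be more useful to continue past a failing file and report all failures at the end. The sketch below shows that pattern in bash. process_files and process_one are hypothetical names, and it again assumes commands signal failure with a non-zero exit status.

```shell
#!/bin/bash

# process_files CMD FILE...
# Run CMD once per file, keep going after failures, and list the failed files.
# Hypothetical helper; CMD is any command or shell function taking one file.
process_files() {
    local cmd="$1"; shift
    local failed=()
    local f
    for f in "$@"; do
        if ! "$cmd" "$f"; then
            failed+=("$f")
        fi
    done
    if [ "${#failed[@]}" -gt 0 ]; then
        printf 'Failed: %s\n' "${failed[@]}" >&2
        return 1
    fi
}

# Example: wrap the per-file commands from the script above in a function
# (process_one is hypothetical) and run it over every CSV file:
#   process_one() { secureml compliance check "$1" --regulation GDPR; }
#   process_files process_one patient_data/*.csv
```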

Performance Considerations

For large datasets, consider these performance tips:

Batch processing: Process large files in batches rather than all at once.
Sample data first: Test your commands on a small sample before processing the entire dataset.
Choose appropriate output formats: For large datasets, Parquet's compressed columnar layout is typically smaller and faster to read than CSV.
Monitor resources: Some operations, especially GAN-based synthetic data generation, can be resource-intensive.

# Take the header line plus the first 999 records for testing
head -n 1000 large_dataset.csv > sample_dataset.csv

# Test your workflow on the sample
secureml synthetic generate sample_dataset.csv synthetic_sample.csv \
    --method statistical \
    --samples 500

# If satisfied, process the full dataset with parquet output
secureml synthetic generate large_dataset.csv synthetic_full.parquet \
    --method statistical \
    --samples 10000 \
    --format parquet
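
The batch processing tip above can be sketched with standard tools: split the CSV into fixed-size pieces and repeat the header in each one, so every piece is a valid input on its own. This assumes one header row and no quoted fields that contain line breaks. split_csv is a hypothetical helper, not a SecureML command.

```shell
#!/bin/bash

# split_csv INPUT ROWS OUTDIR
# Write OUTDIR/chunk_aa.csv, chunk_ab.csv, ... with at most ROWS data rows
# each, every chunk starting with the header of INPUT. Hypothetical helper.
split_csv() {
    local input="$1" rows="$2" outdir="$3"
    local header part
    mkdir -p "$outdir"
    header=$(head -n 1 "$input")
    # Split everything after the header into pieces of at most $rows lines
    tail -n +2 "$input" | split -l "$rows" - "$outdir/body_"
    for part in "$outdir"/body_*; do
        [ -e "$part" ] || continue   # no data rows, nothing to do
        { printf '%s\n' "$header"; cat "$part"; } \
            > "$outdir/chunk_${part##*/body_}.csv"
        rm "$part"
    done
}

# Example: 100000-row chunks, each processed with a command shown earlier:
#   split_csv large_dataset.csv 100000 chunks
#   for chunk in chunks/chunk_*.csv; do
#       secureml synthetic generate "$chunk" "${chunk%.csv}_synth.csv" \
#           --method statistical --samples 1000
#   done
```

Splitting changes what each run sees: a generator fitted per chunk only learns from that chunk, so results may differ from a single run over the full file.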