CLI Examples =========== This section demonstrates how to use the SecureML command-line interface through practical examples. You can use these examples as a starting point for your own privacy-preserving data workflows. Basic Setup ---------- To use the CLI, make sure you have SecureML installed: .. code-block:: bash pip install secureml Getting help information: .. code-block:: bash # Show general help secureml --help # Show help for a specific command secureml anonymization --help # Show version information secureml --version Anonymization Examples ------------------- Applying k-anonymity to protect sensitive data: .. code-block:: bash # Basic k-anonymity with k=3 secureml anonymization k-anonymize patient_data.csv anonymized_data.csv \ --quasi-id age --quasi-id zipcode \ --sensitive diagnosis --sensitive income \ --k 3 # Using a different output format secureml anonymization k-anonymize patient_data.csv anonymized_data.json \ --quasi-id age --quasi-id zipcode \ --sensitive diagnosis \ --k 2 \ --format json Compliance Checking Examples ------------------------- Verifying compliance with privacy regulations: .. code-block:: bash # Basic GDPR compliance check secureml compliance check patient_data.csv \ --regulation GDPR # Compliance check with metadata and HTML output secureml compliance check patient_data.csv \ --regulation GDPR \ --metadata metadata.json \ --output gdpr_report.html \ --format html Example metadata.json file: .. code-block:: json { "description": "Patient health data", "data_owner": "Example Hospital", "data_retention_period": "5 years", "data_encrypted": true, "data_storage_location": "EU", "consent_obtained": true, "consent_date": "2023-01-15" } Checking both dataset and model compliance: .. code-block:: bash # Comprehensive HIPAA compliance check secureml compliance check patient_data.csv \ --regulation HIPAA \ --metadata metadata.json \ --model-config model_config.json \ --output hipaa_report.pdf \ --format pdf Example model_config.json file: .. code-block:: json { "model_type": "RandomForestClassifier", "parameters": { "n_estimators": 100, "max_depth": 5 }, "supports_forget_request": true, "supports_deletion_request": true, "data_processing_purpose": "Medical diagnosis prediction", "model_storage_location": "EU" } Synthetic Data Generation Examples ------------------------------- Creating synthetic datasets based on real data: .. code-block:: bash # Basic statistical synthesis secureml synthetic generate patient_data.csv synthetic_data.csv \ --method statistical \ --samples 1000 # Auto-detecting sensitive columns secureml synthetic generate patient_data.csv synthetic_data.csv \ --method statistical \ --auto-detect-sensitive \ --sensitivity-confidence 0.7 \ --sensitivity-sample-size 200 \ --samples 1000 # Using GAN-based synthesis with specific sensitive columns secureml synthetic generate patient_data.csv synthetic_data.parquet \ --method gan \ --sensitive name --sensitive email --sensitive diagnosis \ --epochs 300 --batch-size 32 \ --samples 500 \ --format parquet Regulation Presets Examples ------------------------ Working with regulation presets: .. code-block:: bash # List all available regulation presets secureml presets list # View the GDPR preset secureml presets show gdpr # Extract just the personal data identifiers field from GDPR secureml presets show gdpr --field personal_data_identifiers # Save the entire HIPAA preset to a file secureml presets show hipaa --output hipaa_preset.json Isolated Environment Examples -------------------------- Managing isolated environments for conflicting dependencies: .. code-block:: bash # Set up the TensorFlow Privacy environment secureml environments setup-tf-privacy # Check if environments are properly configured secureml environments info # Force recreation of an environment secureml environments setup-tf-privacy --force Key Management Examples -------------------- Working with encryption keys (requires HashiCorp Vault): .. code-block:: bash # Configure Vault connection secureml keys configure-vault \ --vault-url https://vault.example.com:8200 \ --vault-token hvs.example_token \ --vault-path secureml # Test Vault connection secureml keys configure-vault --test-connection # Generate a new encryption key secureml keys generate-key \ --key-name patient_data_key \ --length 32 \ --encoding hex # Retrieve a key secureml keys get-key \ --key-name patient_data_key \ --encoding base64 Using environment variables for safer key management: .. code-block:: bash # Set environment variables instead of passing tokens directly export SECUREML_VAULT_URL=https://vault.example.com:8200 export SECUREML_VAULT_TOKEN=hvs.example_token # The command now uses environment variables automatically secureml keys get-key --key-name patient_data_key End-to-End Example Workflow ------------------------- A complete workflow for processing sensitive health data: .. code-block:: bash # 1. Check compliance of the original dataset secureml compliance check patient_data.csv \ --regulation GDPR \ --output compliance_original.html \ --format html # 2. Anonymize the dataset for safe processing secureml anonymization k-anonymize patient_data.csv anonymized_data.csv \ --quasi-id age --quasi-id zipcode \ --sensitive diagnosis --sensitive income \ --k 3 # 3. Check compliance of the anonymized dataset secureml compliance check anonymized_data.csv \ --regulation GDPR \ --output compliance_anonymized.html \ --format html # 4. Generate synthetic data for sharing with researchers secureml synthetic generate anonymized_data.csv synthetic_data.csv \ --method statistical \ --auto-detect-sensitive \ --samples 1000 # 5. Final compliance check on the synthetic data secureml compliance check synthetic_data.csv \ --regulation GDPR \ --output compliance_synthetic.html \ --format html Processing Multiple Files ----------------------- Example shell script for batch processing: .. code-block:: bash #!/bin/bash # Directory containing data files DATA_DIR="patient_data" # Process each CSV file in the directory for file in "$DATA_DIR"/*.csv; do filename=$(basename "$file" .csv) echo "Processing $filename..." # Check compliance secureml compliance check "$file" \ --regulation GDPR \ --output "reports/${filename}_compliance.html" \ --format html # Anonymize data secureml anonymization k-anonymize "$file" \ "anonymized/${filename}_anon.csv" \ --quasi-id age --quasi-id zipcode \ --sensitive diagnosis --sensitive income \ --k 3 # Generate synthetic data secureml synthetic generate "anonymized/${filename}_anon.csv" \ "synthetic/${filename}_synth.csv" \ --method statistical \ --samples 1000 echo "$filename completed." done echo "All files processed." Performance Considerations ------------------------ For large datasets, consider these performance tips: 1. **Batch processing**: Process large files in batches rather than all at once 2. **Sample data first**: Test your commands on a small sample before processing the entire dataset 3. **Choose appropriate output formats**: For large datasets, parquet format may be more efficient 4. **Monitor resources**: Some operations (especially GAN-based synthetic data generation) can be resource-intensive .. code-block:: bash # Process only a subset of records for testing head -n 1000 large_dataset.csv > sample_dataset.csv # Test your workflow on the sample secureml synthetic generate sample_dataset.csv synthetic_sample.csv \ --method statistical \ --samples 500 # If satisfied, process the full dataset with parquet output secureml synthetic generate large_dataset.csv synthetic_full.parquet \ --method statistical \ --samples 10000 \ --format parquet