Audit Trails

Audit trails provide a chronological record of data operations and model activities. This record is critical for compliance with privacy regulations and for accountability in machine learning systems. SecureML's audit trail capabilities track privacy-relevant operations throughout the ML lifecycle.

Core Concepts

Audit Events: Discrete actions or operations captured in the audit trail, such as data access, model training, or prediction requests.

Immutability: Ensuring audit logs cannot be altered or tampered with after they are created.

Granularity: Different levels of detail in audit logs, from high-level system events to fine-grained data access patterns.

Compliance Integration: Connecting audit trails to specific compliance requirements and regulations.
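
One common way to make a log tamper-evident is to chain entries by hash, so that each entry commits to the one before it. The sketch below illustrates the immutability concept using only the Python standard library. The helpers `append_entry` and `verify_chain` are hypothetical, and nothing here describes how SecureML itself stores its logs.

```python
import hashlib
import json


def _digest(event: dict, prev_hash: str) -> str:
    """Hash the event together with the previous entry's hash."""
    payload = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain: list, event: dict) -> dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(event, prev_hash)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Return True only if every entry still matches its recorded hash and link."""
    prev_hash = "0" * 64
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_entry(chain, {"event_type": "data_access", "dataset_name": "customer_financial_data"})
append_entry(chain, {"event_type": "model_training", "model_type": "random_forest"})
chain_ok_before = verify_chain(chain)

# Editing an earlier entry after the fact breaks verification.
chain[0]["event"]["dataset_name"] = "something_else"
chain_ok_after = verify_chain(chain)
```

A chain like this only detects edits. Preventing them still depends on storage-level controls such as append-only or write-once storage.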

Basic Usage

Creating an Audit Trail

To create an audit trail for your SecureML application:

import logging

from secureml.audit import AuditTrail

# Initialize an audit trail
audit = AuditTrail(
    operation_name='credit_risk_model_training',
    log_dir='audit_logs/',  # Optional: directory for storing logs
    log_level=logging.INFO,  # Optional: logging level (default: INFO, i.e. 20)
    context={'app_version': '1.0.0'},  # Optional: context to include in all logs
    regulations=['GDPR', 'CCPA']  # Optional: regulations this operation should comply with
)

# The audit trail will be automatically initialized with a unique operation_id
# and start_time, which are included in all subsequent logs
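
To make the idea of per-operation context concrete, here is a minimal sketch of how an identifier and start time can be attached to every log record with the standard library's `logging.LoggerAdapter`. This illustrates the general mechanism only. It is not SecureML's implementation, and `ListHandler`, `operation_context`, and the logger name are assumptions made for the example.

```python
import logging
import uuid
from datetime import datetime, timezone


class ListHandler(logging.Handler):
    """Collect emitted records in memory so the example is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


# Hypothetical stand-in for the per-operation context described above.
operation_context = {
    "operation_id": str(uuid.uuid4()),
    "start_time": datetime.now(timezone.utc).isoformat(),
}

base_logger = logging.getLogger("audit_sketch")
base_logger.setLevel(logging.INFO)
base_logger.propagate = False
handler = ListHandler()
base_logger.addHandler(handler)

# LoggerAdapter attaches the same extra fields to every record it emits.
audit_logger = logging.LoggerAdapter(base_logger, operation_context)
audit_logger.info("data_access")
audit_logger.info("model_training")

for record in handler.records:
    print(record.operation_id, record.start_time, record.getMessage())
```

Because every record carries the same `operation_id`, events from one run can be grouped later even when several operations write to the same directory.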

Logging Data Access Events

Track when sensitive data is accessed:

# Log a data access event
audit.log_data_access(
    dataset_name='customer_financial_data',
    columns_accessed=['income', 'credit_score', 'loan_history'],
    num_records=5000,
    purpose='model_training',
    user='analyst_123'  # Optional: user who performed the access
)

Logging Data Transformations

Track data transformations:

# Log a data transformation event
audit.log_data_transformation(
    transformation_type='anonymization',
    input_data='raw_customer_data',
    output_data='anonymized_customer_data',
    parameters={
        'method': 'k-anonymity',
        'k': 5,
        'quasi_identifiers': ['age', 'zipcode', 'gender']
    }
)

Logging Model Operations

Track model-related activities:

# Log model training event
audit.log_model_training(
    model_type='random_forest',
    dataset_name='customer_data_anonymized',
    parameters={'n_estimators': 100, 'max_depth': 10},
    metrics={'accuracy': 0.92, 'auc': 0.88},
    privacy_parameters={'epsilon': 1.0, 'delta': 1e-5}
)

# Log model inference event
audit.log_model_inference(
    model_id='credit_risk_classifier_v1',
    input_data='customer_application_123',
    output='high_risk',
    confidence=0.85
)

Logging Compliance Checks

Track compliance verification:

# Log a compliance check
audit.log_compliance_check(
    check_type='data_minimization',
    regulation='GDPR',
    result=True,  # True = passed, False = failed
    details={
        'columns_before': 25,
        'columns_after': 10,
        'columns_removed': ['unnecessary_field_1', 'unnecessary_field_2']
    }
)
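
The `details` dictionary for a data minimization check can be derived from the column lists themselves rather than typed by hand. The following sketch assumes a hypothetical helper, `minimization_details`, and illustrative column names. Its output has the same shape as the `details` argument shown above.

```python
def minimization_details(columns_before, columns_after):
    """Summarize which columns were dropped between two schema snapshots."""
    kept = set(columns_after)
    removed = [col for col in columns_before if col not in kept]
    return {
        "columns_before": len(columns_before),
        "columns_after": len(columns_after),
        "columns_removed": removed,
    }


before = ["income", "credit_score", "loan_history", "unnecessary_field_1", "unnecessary_field_2"]
after = ["income", "credit_score", "loan_history"]
details = minimization_details(before, after)
print(details)
```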

Logging User Requests

Track GDPR/CCPA user requests:

# Log a user request (e.g., GDPR right of access)
audit.log_user_request(
    request_type='data_access_request',
    user_id='user_12345',
    details={
        'request_date': '2023-06-20',
        'data_categories': ['personal_info', 'financial_data']
    },
    status='completed'
)

Closing the Audit Trail

Properly close the audit trail when the operation is complete:

# Close the audit trail
audit.close(
    status='completed',  # Or 'error', 'cancelled', etc.
    details={
        'execution_time': 125.7,
        'output_location': 'models/credit_risk_v1.pkl'
    }
)
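
If the audited work raises an exception, an explicit `close()` call at the end of the script is never reached. One way to guard against that is a small context manager, sketched below. `closing_audit` is a hypothetical helper, and `FakeAudit` is a stand-in that exists only so the example runs without SecureML. The only assumption about the real object is that `close()` accepts `status` and `details` as shown above.

```python
import time
from contextlib import contextmanager


@contextmanager
def closing_audit(audit):
    """Close `audit` with a status that reflects how the block ended."""
    started = time.perf_counter()
    try:
        yield audit
    except Exception as exc:
        audit.close(
            status="error",
            details={"execution_time": time.perf_counter() - started, "error": repr(exc)},
        )
        raise
    else:
        audit.close(
            status="completed",
            details={"execution_time": time.perf_counter() - started},
        )


class FakeAudit:
    """Stand-in used only to make this sketch runnable without SecureML."""

    def __init__(self):
        self.closed_with = None

    def close(self, status, details=None):
        self.closed_with = (status, details)


ok_audit = FakeAudit()
with closing_audit(ok_audit):
    pass  # the audited work would go here

failed_audit = FakeAudit()
try:
    with closing_audit(failed_audit):
        raise ValueError("training failed")
except ValueError:
    pass  # the exception is re-raised after the trail is closed

print(ok_audit.closed_with[0], failed_audit.closed_with[0])
```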

Advanced Techniques

Using the Audit Function Decorator

Automatically audit function calls:

from secureml.audit import audit_function

# Create a decorated function
@audit_function(
    operation_name='data_preprocessing',
    log_dir='audit_logs',
    regulations=['GDPR']
)
def process_sensitive_data(data, anonymize=True):
    # Placeholder: replace with the real preprocessing logic
    processed_data = data
    return processed_data

# When this function is called, the audit trail will automatically:
# 1. Log the function call with parameters
# 2. Log the return value or any exceptions
# 3. Close the audit trail
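
The three steps listed above follow a familiar decorator pattern. The sketch below shows that pattern with a hypothetical `audit_calls` decorator that appends events to a plain list. It is a simplified stand-in for illustration, not the implementation of `audit_function`.

```python
import functools


def audit_calls(log):
    """Hypothetical decorator: record calls, results, and exceptions in `log`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # 1. Log the call and its parameters.
            log.append({"event_type": "function_call", "function": func.__name__,
                        "args": repr(args), "kwargs": repr(kwargs)})
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                # 2a. Log the exception, then let it propagate.
                log.append({"event_type": "function_error", "function": func.__name__,
                            "error": repr(exc)})
                raise
            else:
                # 2b. Log the result type only, to avoid copying data into the log.
                log.append({"event_type": "function_return", "function": func.__name__,
                            "result_type": type(result).__name__})
                return result
            finally:
                # 3. Always record that the trail for this call is closed.
                log.append({"event_type": "closed", "function": func.__name__})

        return wrapper

    return decorator


events = []


@audit_calls(events)
def drop_column(rows, column):
    return [{k: v for k, v in row.items() if k != column} for row in rows]


cleaned = drop_column([{"name": "Ada", "age": 36}], column="name")
event_types = [event["event_type"] for event in events]
print(event_types)
```

Note that this sketch still records the call arguments via `repr`. For functions that receive sensitive data directly, logging a summary such as the argument types or record counts would be the safer choice.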

Retrieving Audit Logs

Retrieve and analyze audit logs:

from secureml.audit import get_audit_logs

# Get logs for a specific operation
logs = get_audit_logs(
    operation_id='12345-abcde-67890',  # Optional: specific operation ID
    operation_name='credit_risk_model_training',  # Optional: operation name
    start_time='2023-01-01T00:00:00',  # Optional: filter by start time
    end_time='2023-06-30T23:59:59',  # Optional: filter by end time
    log_dir='audit_logs'  # Optional: directory containing logs
)

# Analyze the logs
for log in logs:
    print(f"Event: {log['event_type']} - Time: {log['timestamp']}")
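
Beyond printing each entry, retrieved logs can be summarized with the standard library. The sketch below runs on illustrative sample records rather than real output from `get_audit_logs`. It assumes only the `event_type` and `timestamp` keys used above, and that timestamps are ISO 8601 strings, which is an assumption about the log format.

```python
from collections import Counter
from datetime import datetime

# Illustrative records; only 'event_type' and 'timestamp' are assumed.
sample_logs = [
    {"event_type": "data_access", "timestamp": "2023-06-01T09:15:00"},
    {"event_type": "model_training", "timestamp": "2023-06-01T09:20:00"},
    {"event_type": "data_access", "timestamp": "2023-06-02T14:05:00"},
]

# Count events by type and find the time span they cover.
counts = Counter(log["event_type"] for log in sample_logs)
timestamps = [datetime.fromisoformat(log["timestamp"]) for log in sample_logs]
first_event, last_event = min(timestamps), max(timestamps)

for event_type, count in counts.most_common():
    print(f"{event_type}: {count}")
print(f"Span: {first_event.isoformat()} to {last_event.isoformat()}")
```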

Integration with Reporting

Using the ReportGenerator

Generate HTML or PDF reports from audit logs:

from secureml.reporting import ReportGenerator

# Create a report generator
generator = ReportGenerator()

# Generate an audit report
report_path = generator.generate_audit_report(
    logs=logs,  # Logs retrieved with get_audit_logs
    output_file='audit_report.pdf',
    title='Credit Risk Model Audit Report',
    logo_path='company_logo.png',  # Optional
    include_charts=True  # Optional: include visualizations
)

print(f"Audit report generated at: {report_path}")

Integration with Compliance Checking

Audit trails can be created automatically when you perform compliance checks:

from secureml.compliance import ComplianceAuditor

# Create a compliance auditor with audit integration
auditor = ComplianceAuditor(
    regulation='GDPR',
    log_dir='audit_logs'  # This enables automatic audit trail creation
)

# Audit trails for all operations are stored in the log directory.
# `df` is the dataset to audit, loaded elsewhere in your code.
dataset_report = auditor.audit_dataset(
    dataset=df,
    dataset_name='patient_records'
)

Best Practices

  1. Start early: Enable audit trails from the beginning of your project, not as an afterthought

  2. Be comprehensive: Log all privacy-relevant operations, not just the obvious ones

  3. Use proper granularity: Weigh the cost of logging too much (performance impact) against the risk of logging too little (missing important events)

  4. Secure audit logs: Implement proper access controls for log files

  5. Regular reviews: Periodically review audit logs for anomalies or compliance issues

  6. Contextual information: Include sufficient context in each log entry to understand the operation’s purpose

  7. Automation: Use the audit_function decorator for critical operations

  8. User attribution: Always include user information when logging events to ensure accountability

  9. Purpose tracking: Record the purpose for data access and processing to demonstrate compliance with purpose limitation principles

  10. Privacy by design: Implement privacy-preserving audit logs that don’t themselves become a privacy risk
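
Practices 8 and 10 can pull in opposite directions: attributing events to users puts identifiers in the log, which can itself become a privacy risk. One way to reconcile them is to log a keyed pseudonym instead of the raw identifier. The sketch below uses HMAC-SHA256 from the standard library. `pseudonymize` is a hypothetical helper, the key is a placeholder, and truncating the digest to 16 hex characters is a readability choice rather than a requirement.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Return a keyed, stable pseudonym for a user ID."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# In practice the key would come from a secrets manager, not source code.
key = b"example-key-for-illustration-only"

alias = pseudonymize("analyst_123", key)
same_alias = pseudonymize("analyst_123", key)
other_alias = pseudonymize("analyst_456", key)

# The same user always maps to the same alias, so events remain attributable.
print(alias == same_alias, alias != other_alias)
```

Whoever holds the key can recompute a user's alias and find their events, so the key needs the same protection as the identifiers it replaces.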

Further Reading

  • Audit Trail API - Complete API reference for audit trail functions

  • Audit Trail Examples - More examples of audit trail implementation

  • /compliance/audit_requirements - Audit requirements for different regulations