Audit Trails
Audit trails provide a chronological record of all data operations and model activities, which is critical for compliance with privacy regulations and for ensuring accountability in machine learning systems. SecureML offers audit trail capabilities to track all privacy-relevant operations throughout the ML lifecycle.
Core Concepts
Audit Events: Discrete actions or operations captured in the audit trail, such as data access, model training, or prediction requests.
Immutability: Ensuring audit logs cannot be altered or tampered with after they are created.
Granularity: Different levels of detail in audit logs, from high-level system events to fine-grained data access patterns.
Compliance Integration: Connecting audit trails to specific compliance requirements and regulations.
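The immutability concept can be illustrated independently of SecureML. The following is a minimal sketch of one common tamper-evidence technique, a hash chain, and is not a description of how SecureML stores its logs (this page does not say). Each entry records the SHA-256 hash of the previous entry, so modifying any earlier entry breaks verification. The names `append_entry` and `verify_chain` are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(event, prev_hash):
    """Hash the event together with the previous entry's hash."""
    payload = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain, event):
    """Append an event dict to the chain, linking it to the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "event": event,
        "prev_hash": prev_hash,
        "hash": _entry_hash(event, prev_hash),
    })


def verify_chain(chain):
    """Return True only if every entry's hash and back-link are intact."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_entry(chain, {"event_type": "data_access", "dataset": "customer_financial_data"})
append_entry(chain, {"event_type": "model_training", "model_type": "random_forest"})
print(verify_chain(chain))  # the untouched chain verifies

chain[0]["event"]["dataset"] = "something_else"  # simulate tampering
print(verify_chain(chain))  # verification now fails
```

A hash chain makes tampering detectable rather than impossible, so it complements access controls on the log files instead of replacing them.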
Basic Usage
Creating an Audit Trail
To create an audit trail for your SecureML application:
from secureml.audit import AuditTrail
# Initialize an audit trail
audit = AuditTrail(
    operation_name='credit_risk_model_training',
    log_dir='audit_logs/',  # Optional: directory for storing logs
    log_level=20,  # Optional: logging level (default: INFO)
    context={'app_version': '1.0.0'},  # Optional: context to include in all logs
    regulations=['GDPR', 'CCPA']  # Optional: regulations this operation should comply with
)
# The audit trail will be automatically initialized with a unique operation_id
# and start_time, which are included in all subsequent logs
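The `log_level=20` value above matches the numeric value of `logging.INFO` in Python's standard `logging` module, which is consistent with the inline comment. Assuming `AuditTrail` accepts standard `logging` levels (the comment implies this, but this page does not state it outright), the named constants are easier to read than bare integers. The snippet below only lists the standard values:

```python
import logging

# Numeric values of the standard logging levels.
for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"):
    print(f"{name:8s} = {getattr(logging, name)}")

# logging.INFO is 20, so it can stand in for the literal used above.
level = logging.INFO
```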
Logging Data Access Events
Track when sensitive data is accessed:
# Log a data access event
audit.log_data_access(
    dataset_name='customer_financial_data',
    columns_accessed=['income', 'credit_score', 'loan_history'],
    num_records=5000,
    purpose='model_training',
    user='analyst_123'  # Optional: user who performed the access
)
Logging Data Transformations
Track data transformations:
# Log a data transformation event
audit.log_data_transformation(
    transformation_type='anonymization',
    input_data='raw_customer_data',
    output_data='anonymized_customer_data',
    parameters={
        'method': 'k-anonymity',
        'k': 5,
        'quasi_identifiers': ['age', 'zipcode', 'gender']
    }
)
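For reference, a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The sketch below checks that property on a small made-up dataset using only the standard library. `is_k_anonymous` is a hypothetical helper, not part of SecureML, and a check like this could supply evidence for the `parameters` logged above.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Made-up, already generalized records for illustration only.
records = [
    {"age": "30-39", "zipcode": "941**", "gender": "F", "income": 72000},
    {"age": "30-39", "zipcode": "941**", "gender": "F", "income": 65000},
    {"age": "40-49", "zipcode": "100**", "gender": "M", "income": 58000},
    {"age": "40-49", "zipcode": "100**", "gender": "M", "income": 91000},
]

quasi_identifiers = ["age", "zipcode", "gender"]
print(is_k_anonymous(records, quasi_identifiers, k=2))  # True: each group has 2 records
print(is_k_anonymous(records, quasi_identifiers, k=3))  # False: no group has 3 records
```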
Logging Model Operations
Track model-related activities:
# Log model training event
audit.log_model_training(
    model_type='random_forest',
    dataset_name='customer_data_anonymized',
    parameters={'n_estimators': 100, 'max_depth': 10},
    metrics={'accuracy': 0.92, 'auc': 0.88},
    privacy_parameters={'epsilon': 1.0, 'delta': 1e-5}
)
# Log model inference event
audit.log_model_inference(
    model_id='credit_risk_classifier_v1',
    input_data='customer_application_123',
    output='high_risk',
    confidence=0.85
)
Logging Compliance Checks
Track compliance verification:
# Log a compliance check
audit.log_compliance_check(
    check_type='data_minimization',
    regulation='GDPR',
    result=True,  # True = passed, False = failed
    details={
        'columns_before': 25,
        'columns_after': 10,
        'columns_removed': ['unnecessary_field_1', 'unnecessary_field_2']
    }
)
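The `details` payload for a data minimization check can be computed from the column lists rather than typed by hand. This is a minimal sketch with made-up column names. `minimization_details` is a hypothetical helper, and its output keys simply mirror the ones used in the example above.

```python
def minimization_details(columns_before, columns_after):
    """Summarize which columns were dropped between two schema snapshots."""
    kept = set(columns_after)
    removed = [column for column in columns_before if column not in kept]
    return {
        "columns_before": len(columns_before),
        "columns_after": len(columns_after),
        "columns_removed": removed,
    }


# Made-up schemas for illustration only.
before = ["income", "credit_score", "loan_history",
          "unnecessary_field_1", "unnecessary_field_2"]
after = ["income", "credit_score", "loan_history"]

details = minimization_details(before, after)
print(details)
```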
Logging User Requests
Track GDPR/CCPA user requests:
# Log a user request (e.g., GDPR right of access)
audit.log_user_request(
    request_type='data_access_request',
    user_id='user_12345',
    details={
        'request_date': '2023-06-20',
        'data_categories': ['personal_info', 'financial_data']
    },
    status='completed'
)
Closing the Audit Trail
Properly close the audit trail when the operation is complete:
# Close the audit trail
audit.close(
    status='completed',  # Or 'error', 'cancelled', etc.
    details={
        'execution_time': 125.7,
        'output_location': 'models/credit_risk_v1.pkl'
    }
)
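If the audited operation raises before `audit.close()` is reached, the trail is never closed. One way to guard against that is a small context manager that always calls `close()`, with status 'completed' or 'error' as appropriate. This is a sketch that assumes only the `close(status=..., details=...)` signature shown above. `closing_audit` is a hypothetical helper, and `FakeAudit` is a stand-in so the example runs without SecureML installed.

```python
import time
from contextlib import contextmanager


@contextmanager
def closing_audit(audit):
    """Close `audit` with status 'completed' or 'error', whichever applies."""
    started = time.perf_counter()
    try:
        yield audit
    except Exception as exc:
        audit.close(
            status="error",
            details={"execution_time": time.perf_counter() - started,
                     "error": repr(exc)},
        )
        raise  # re-raise so the caller still sees the failure
    else:
        audit.close(
            status="completed",
            details={"execution_time": time.perf_counter() - started},
        )


class FakeAudit:
    """Stand-in for an audit trail object; records what close() received."""

    def __init__(self):
        self.closed_with = None

    def close(self, status, details):
        self.closed_with = (status, details)


ok = FakeAudit()
with closing_audit(ok):
    pass  # the audited work would go here

failed = FakeAudit()
try:
    with closing_audit(failed):
        raise ValueError("training failed")
except ValueError:
    pass

print(ok.closed_with[0], failed.closed_with[0])
```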
Advanced Techniques
Using the Audit Function Decorator
Automatically audit function calls:
from secureml.audit import audit_function
# Create a decorated function
@audit_function(
    operation_name='data_preprocessing',
    log_dir='audit_logs',
    regulations=['GDPR']
)
def process_sensitive_data(data, anonymize=True):
    # Function implementation goes here; this placeholder returns the input unchanged
    processed_data = data
    return processed_data
# When this function is called, the audit trail will automatically:
# 1. Log the function call with parameters
# 2. Log the return value or any exceptions
# 3. Close the audit trail
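To make the three steps above concrete, here is a conceptual sketch of a decorator that follows the same sequence, written with the standard library only. It is not SecureML's implementation. `audited` and the in-memory `EVENTS` list are hypothetical, and a real audit trail would write to durable storage.

```python
import functools

EVENTS = []  # in-memory sink used only for this illustration


def audited(operation_name):
    """Decorator factory that records calls, results, and exceptions."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # 1. Log the function call with its parameters.
            EVENTS.append({"operation": operation_name,
                           "event_type": "function_call",
                           "function": func.__name__,
                           "args": repr(args), "kwargs": repr(kwargs)})
            status = "completed"
            try:
                result = func(*args, **kwargs)
                # 2a. Log the return value.
                EVENTS.append({"operation": operation_name,
                               "event_type": "function_return",
                               "result": repr(result)})
                return result
            except Exception as exc:
                # 2b. Log the exception, then let it propagate.
                status = "error"
                EVENTS.append({"operation": operation_name,
                               "event_type": "function_error",
                               "error": repr(exc)})
                raise
            finally:
                # 3. Record that the operation has ended.
                EVENTS.append({"operation": operation_name,
                               "event_type": "close", "status": status})
        return wrapper
    return decorator


@audited("data_preprocessing")
def strip_names(rows, anonymize=True):
    """Toy preprocessing step: drop the 'name' field when anonymize is set."""
    if not anonymize:
        return rows
    return [{key: value for key, value in row.items() if key != "name"}
            for row in rows]


strip_names([{"name": "Ada", "age": 36}])
print([event["event_type"] for event in EVENTS])
```

Note that this sketch stores the `repr` of arguments and results, which can copy sensitive values into the log. A production decorator would typically redact or summarize them.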
Retrieving Audit Logs
Retrieve and analyze audit logs:
from secureml.audit import get_audit_logs
# Get logs for a specific operation
logs = get_audit_logs(
    operation_id='12345-abcde-67890',  # Optional: specific operation ID
    operation_name='credit_risk_model_training',  # Optional: operation name
    start_time='2023-01-01T00:00:00',  # Optional: filter by start time
    end_time='2023-06-30T23:59:59',  # Optional: filter by end time
    log_dir='audit_logs'  # Optional: directory containing logs
)
# Analyze the logs
for log in logs:
    print(f"Event: {log['event_type']} - Time: {log['timestamp']}")
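Once retrieved, the entries can be summarized with ordinary Python. The sketch below assumes each entry is a dict with `event_type` and `timestamp` keys, as the loop above suggests, and that timestamps are ISO 8601 strings (an assumption based on the `start_time` format). The sample entries are made up, and `filter_by_time` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

# Made-up entries shaped like the ones printed above.
sample_logs = [
    {"event_type": "data_access", "timestamp": "2023-03-01T09:15:00"},
    {"event_type": "model_training", "timestamp": "2023-03-01T09:20:00"},
    {"event_type": "data_access", "timestamp": "2023-07-02T11:00:00"},
]


def filter_by_time(logs, start_time, end_time):
    """Keep entries whose timestamp falls within [start_time, end_time]."""
    start = datetime.fromisoformat(start_time)
    end = datetime.fromisoformat(end_time)
    return [
        log for log in logs
        if start <= datetime.fromisoformat(log["timestamp"]) <= end
    ]


in_range = filter_by_time(sample_logs, "2023-01-01T00:00:00", "2023-06-30T23:59:59")
counts = Counter(log["event_type"] for log in in_range)

for event_type, count in counts.items():
    print(f"{event_type}: {count}")
```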
Integration with Reporting
Using the ReportGenerator
Generate HTML or PDF reports from audit logs:
from secureml.reporting import ReportGenerator
# Create a report generator
generator = ReportGenerator()
# Generate an audit report
report_path = generator.generate_audit_report(
    logs=logs,  # Logs retrieved with get_audit_logs
    output_file='audit_report.pdf',
    title='Credit Risk Model Audit Report',
    logo_path='company_logo.png',  # Optional
    include_charts=True  # Optional: include visualizations
)
print(f"Audit report generated at: {report_path}")
Integration with Compliance Checking
Audit trails can be automatically created when performing compliance checks:
from secureml.compliance import ComplianceAuditor
# Create a compliance auditor with audit integration
auditor = ComplianceAuditor(
    regulation='GDPR',
    log_dir='audit_logs'  # This enables automatic audit trail creation
)
# The audit trails for all operations will be stored in the log directory
# (df is assumed to be a dataset you have already loaded)
dataset_report = auditor.audit_dataset(
    dataset=df,
    dataset_name='patient_records'
)
Best Practices
Start early: Enable audit trails from the beginning of your project, not as an afterthought
Be comprehensive: Log all privacy-relevant operations, not just the obvious ones
Use proper granularity: Balance between logging too much (performance impact) and too little (missing important events)
Secure audit logs: Implement proper access controls for log files
Regular reviews: Periodically review audit logs for anomalies or compliance issues
Contextual information: Include sufficient context in each log entry to understand the operation’s purpose
Automation: Use the audit_function decorator for critical operations
User attribution: Always include user information when logging events to ensure accountability
Purpose tracking: Record the purpose for data access and processing to demonstrate compliance with purpose limitation principles
Privacy by design: Implement privacy-preserving audit logs that don’t themselves become a privacy risk
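One way to reconcile user attribution with privacy by design is to log a keyed hash of the user identifier instead of the raw value. The sketch below uses HMAC-SHA256 from the standard library. `pseudonymize` is a hypothetical helper, not a SecureML function, and the hard-coded key is for illustration only.

```python
import hashlib
import hmac


def pseudonymize(user_id, secret_key):
    """Return a stable keyed hash of user_id suitable for writing to a log."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Illustration only: load a real key from a secrets manager, never from source code.
SECRET_KEY = b"example-key-do-not-use"

token = pseudonymize("user_12345", SECRET_KEY)
print(len(token))  # 64 hex characters for SHA-256
print(token == pseudonymize("user_12345", SECRET_KEY))  # True: same input, same token
print(token == pseudonymize("user_67890", SECRET_KEY))  # False: different user
```

Because the same user always maps to the same token, events can still be correlated per user, while re-identification requires access to the key.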
Further Reading
Audit Trail API - Complete API reference for audit trail functions
Audit Trail Examples - More examples of audit trail implementation
/compliance/audit_requirements - Audit requirements for different regulations