Audit Trail API
This module provides tools for creating and managing audit logs for ML operations, helping to document data processing and model decisions for compliance purposes.
AuditTrail Class
- class secureml.audit.AuditTrail(operation_name: str, log_dir: str | None = None, log_level: int = 20, context: Dict[str, Any] | None = None, regulations: List[str] | None = None)
Class for managing audit trails in SecureML operations.
The AuditTrail class provides methods for logging operations on datasets and models, making it easier to track data transformations and model decisions for compliance purposes.
- __init__(operation_name: str, log_dir: str | None = None, log_level: int = 20, context: Dict[str, Any] | None = None, regulations: List[str] | None = None)
Initialize an audit trail for an operation.
- Args:
  - operation_name: Name of the operation being audited
  - log_dir: Directory to store log files (default: secureml_audit_logs)
  - log_level: Logging level to use (default: 20, i.e. logging.INFO)
  - context: Additional context information to include in all logs
  - regulations: List of regulations this audit trail is tracking compliance with
- close(status: str = 'completed', details: Dict[str, Any] | None = None) -> None
Close the audit trail.
- Args:
  - status: Final status of the operation
  - details: Additional details about the operation’s completion
- log_compliance_check(check_type: str, regulation: str, result: bool, details: Dict[str, Any]) -> None
Log a compliance check.
- Args:
  - check_type: Type of compliance check
  - regulation: Regulation being checked
  - result: Result of the check (True=passed, False=failed)
  - details: Details about the check
- log_data_access(dataset_name: str, columns_accessed: List[str], num_records: int, purpose: str, user: str | None = None) -> None
Log access to a dataset.
- Args:
  - dataset_name: Name of the dataset being accessed
  - columns_accessed: List of columns accessed
  - num_records: Number of records accessed
  - purpose: Purpose of the access
  - user: User who performed the access
- log_data_transformation(transformation_type: str, input_data: str, output_data: str, parameters: Dict[str, Any]) -> None
Log a data transformation.
- Args:
  - transformation_type: Type of transformation (e.g., anonymization, encryption)
  - input_data: Description of input data
  - output_data: Description of output data
  - parameters: Parameters used for the transformation
- log_error(error_type: str, message: str, details: Dict[str, Any] | None = None) -> None
Log an error.
- Args:
  - error_type: Type of error
  - message: Error message
  - details: Additional details about the error
- log_event(event_type: str, details: Dict[str, Any]) -> None
Log an event to the audit trail.
- Args:
  - event_type: Type of event being logged
  - details: Details about the event
- log_model_inference(model_id: str, input_data: str, output: Any, confidence: float | None = None) -> None
Log model inference.
- Args:
  - model_id: Identifier for the model
  - input_data: Description of input data
  - output: Model output
  - confidence: Confidence score for the output
- log_model_training(model_type: str, dataset_name: str, parameters: Dict[str, Any], metrics: Dict[str, Any] | None = None, privacy_parameters: Dict[str, Any] | None = None) -> None
Log model training.
- Args:
  - model_type: Type of model being trained
  - dataset_name: Name of the dataset used for training
  - parameters: Training parameters
  - metrics: Training metrics
  - privacy_parameters: Privacy parameters used (e.g., epsilon for DP)
- log_user_request(request_type: str, user_id: str, details: Dict[str, Any], status: str) -> None
Log a user request (e.g., GDPR right to access).
- Args:
  - request_type: Type of request
  - user_id: ID of the user making the request
  - details: Details about the request
  - status: Status of the request
- setup_file_logging() -> None
Set up file logging for the audit trail.
The AuditTrail class provides a comprehensive way to track operations in your machine learning pipeline. It records various events with timestamps and context information, creating an immutable record that can be used for compliance purposes.
Basic Usage Example:
from secureml.audit import AuditTrail

# Create an audit trail for a model training operation
audit = AuditTrail(
    operation_name="model_training",
    context={"model_version": "v1.0", "environment": "production"},
    regulations=["GDPR", "HIPAA"]
)

# Log events during your operation
audit.log_data_access(
    dataset_name="patient_records",
    columns_accessed=["age", "diagnosis", "treatment"],
    num_records=1000,
    purpose="training disease prediction model",
    user="data_scientist_1"
)

# Close the audit trail when done
audit.close()
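One pattern that may be useful here is to make sure close() runs even when the audited operation fails. The sketch below is not part of SecureML: `audited` is a hypothetical helper, the status string "failed" is an assumption (the documentation only says status is a string), and the helper relies solely on the `log_error` and `close` signatures documented above. `_RecordingTrail` is a stand-in so the example runs without SecureML installed; in practice you would pass a real AuditTrail instance.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Optional, Tuple


@contextmanager
def audited(trail: Any) -> Iterator[Any]:
    """Hypothetical helper: close `trail` with a status that reflects the outcome."""
    try:
        yield trail
    except Exception as exc:
        # Record the failure, close with an assumed "failed" status, then re-raise.
        trail.log_error(error_type=type(exc).__name__, message=str(exc))
        trail.close(status="failed", details={"error": str(exc)})
        raise
    else:
        trail.close(status="completed")


class _RecordingTrail:
    """Stand-in exposing the two documented methods; records calls in memory."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def log_error(self, error_type: str, message: str,
                  details: Optional[Dict[str, Any]] = None) -> None:
        self.calls.append(("log_error", {"error_type": error_type, "message": message}))

    def close(self, status: str = "completed",
              details: Optional[Dict[str, Any]] = None) -> None:
        self.calls.append(("close", {"status": status, "details": details}))


ok_trail = _RecordingTrail()
with audited(ok_trail):
    pass  # the audited work would go here

failed_trail = _RecordingTrail()
try:
    with audited(failed_trail):
        raise ValueError("bad input")
except ValueError:
    pass  # the original exception still propagates to the caller
```

With this shape, the success path closes the trail once with the default status, while the failure path logs the error first and then closes, so the log always ends with a closing entry.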
Utility Functions
Audit Function Decorator
- secureml.audit.audit_function(operation_name: str | None = None, log_dir: str | None = None, regulations: List[str] | None = None) -> Callable
Decorator for auditing function calls.
- Args:
  - operation_name: Name of the operation (defaults to function name)
  - log_dir: Directory to store audit logs
  - regulations: List of regulations this function should comply with
- Returns:
Decorated function with audit trail
The audit_function decorator provides a simple way to add auditing to any function:
from secureml.audit import audit_function

@audit_function(regulations=["GDPR"])
def train_model(data, params):
    # Placeholder for the real training step; the call itself is audited
    model = {"params": params}
    return model
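To give a sense of what a decorator like this does conceptually, here is a minimal standard-library sketch. It is not SecureML's implementation: `simple_audit` and `AUDIT_EVENTS` are hypothetical names, the event type strings are invented for the example, and events are appended to an in-memory list rather than written to log files.

```python
import functools
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional

# Hypothetical in-memory sink; a real audit trail would write to durable storage.
AUDIT_EVENTS: List[Dict[str, Any]] = []


def simple_audit(operation_name: Optional[str] = None,
                 regulations: Optional[List[str]] = None) -> Callable:
    """Illustrative decorator factory that records start, end, and errors."""

    def decorator(func: Callable) -> Callable:
        name = operation_name or func.__name__  # default to the function name

        def record(event_type: str, **details: Any) -> None:
            AUDIT_EVENTS.append({
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "operation_name": name,
                "regulations": list(regulations or []),
                "event_type": event_type,
                "details": details,
            })

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Record argument counts and names only, never the values themselves.
            record("function_start", num_args=len(args), kwarg_names=sorted(kwargs))
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                record("function_error", error_type=type(exc).__name__)
                raise
            record("function_end", result_type=type(result).__name__)
            return result

        return wrapper

    return decorator


@simple_audit(regulations=["GDPR"])
def scale(values: List[float], factor: float = 2.0) -> List[float]:
    return [v * factor for v in values]


scaled = scale([1.0, 2.0], factor=3.0)
```

The sketch deliberately records only argument counts and result types, since writing raw inputs into an audit log could itself expose personal data.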
Log Retrieval
- secureml.audit.get_audit_logs(operation_id: str | None = None, operation_name: str | None = None, start_time: str | None = None, end_time: str | None = None, log_dir: str | None = None) -> List[Dict[str, Any]]
Retrieve audit logs for analysis.
- Args:
  - operation_id: ID of the operation to retrieve logs for
  - operation_name: Name of the operation to retrieve logs for
  - start_time: Start time for logs (ISO format)
  - end_time: End time for logs (ISO format)
  - log_dir: Directory containing audit logs
- Returns:
List of audit log entries matching the criteria
This function allows you to retrieve and analyze audit logs:
from secureml.audit import get_audit_logs

# Get logs for a specific operation recorded during January 2023
logs = get_audit_logs(
    operation_name="model_training",
    start_time="2023-01-01T00:00:00",
    end_time="2023-01-31T23:59:59"
)
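The returned list of dictionaries can then be summarized with ordinary Python. The sketch below assumes each entry carries `timestamp` (ISO format) and `event_type` keys; the entry schema is not documented here, so treat those key names, the `count_events` helper, and the sample entries as hypothetical and adjust them to what your logs actually contain.

```python
from collections import Counter
from datetime import datetime
from typing import Any, Dict, List


def count_events(logs: List[Dict[str, Any]], start: str, end: str) -> Counter:
    """Count entries per event type within an inclusive ISO-format time window."""
    lo, hi = datetime.fromisoformat(start), datetime.fromisoformat(end)
    counts: Counter = Counter()
    for entry in logs:
        ts = datetime.fromisoformat(entry["timestamp"])  # assumed key name
        if lo <= ts <= hi:
            counts[entry.get("event_type", "unknown")] += 1  # assumed key name
    return counts


# Hypothetical entries standing in for the result of get_audit_logs().
sample_logs = [
    {"timestamp": "2023-01-05T10:00:00", "event_type": "data_access"},
    {"timestamp": "2023-01-06T11:30:00", "event_type": "model_training"},
    {"timestamp": "2023-01-07T09:15:00", "event_type": "data_access"},
    {"timestamp": "2023-02-01T00:00:00", "event_type": "data_access"},
]

january = count_events(sample_logs, "2023-01-01T00:00:00", "2023-01-31T23:59:59")
print(dict(january))
```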
Configuration
The audit module uses these default configuration values:
DEFAULT_LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DEFAULT_LOG_LEVEL = logging.INFO
DEFAULT_LOG_DIR = "secureml_audit_logs"
You can override these by providing custom parameters when creating an AuditTrail instance.
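To see what these defaults mean in practice, the sketch below configures a plain standard-library logger with the same format string, level, and directory name. It illustrates the `logging` format only and is not how SecureML sets up its handlers; the logger name `audit_demo`, the file name `demo.log`, and the temporary base directory are assumptions chosen for the example.

```python
import logging
import os
import tempfile

DEFAULT_LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DEFAULT_LOG_LEVEL = logging.INFO  # numeric value 20
DEFAULT_LOG_DIR = "secureml_audit_logs"

# Use a temporary base directory so the example does not write into the working directory.
log_dir = os.path.join(tempfile.mkdtemp(), DEFAULT_LOG_DIR)
os.makedirs(log_dir, exist_ok=True)
log_path = os.path.join(log_dir, "demo.log")  # hypothetical file name

logger = logging.getLogger("audit_demo")  # hypothetical logger name
logger.setLevel(DEFAULT_LOG_LEVEL)
logger.propagate = False  # keep the demo output out of the root logger

handler = logging.FileHandler(log_path, encoding="utf-8")
handler.setFormatter(logging.Formatter(DEFAULT_LOG_FORMAT))
logger.addHandler(handler)

logger.debug("dropped: DEBUG (10) is below the INFO threshold")
logger.info("kept: INFO (20) meets the threshold")

# Flush and detach the handler before reading the file back.
handler.close()
logger.removeHandler(handler)

with open(log_path, encoding="utf-8") as fh:
    lines = fh.read().splitlines()
```

Only the INFO message reaches the file, and each line has the shape "timestamp - logger name - level - message" dictated by the format string.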
Working with Regulations
The audit trail system is designed to support compliance with various regulations including:
- GDPR: General Data Protection Regulation
- HIPAA: Health Insurance Portability and Accountability Act
- CCPA: California Consumer Privacy Act
When initializing an AuditTrail, you can specify which regulations apply:
from secureml.audit import AuditTrail

audit = AuditTrail(
    operation_name="credit_scoring",
    regulations=["GDPR", "CCPA"]
)

# This will be recorded in the audit logs for compliance reporting
audit.log_compliance_check(
    check_type="data_access_permission",
    regulation="GDPR",
    result=True,
    details={"user_consent_obtained": True, "legal_basis": "legitimate_interest"}
)
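When several checks apply to one operation, it can help to drive them from a small table and log each result the same way. The sketch below is illustrative only: `run_checks` and `_CheckRecorder` are hypothetical, the check names and details are made up, and the helper depends only on the documented `log_compliance_check(check_type, regulation, result, details)` signature.

```python
from typing import Any, Callable, Dict, List, Tuple

# (check_type, regulation, callable returning (passed, details))
Check = Tuple[str, str, Callable[[], Tuple[bool, Dict[str, Any]]]]


def run_checks(trail: Any, checks: List[Check]) -> bool:
    """Log every check via trail.log_compliance_check; return True if all passed."""
    all_passed = True
    for check_type, regulation, check in checks:
        passed, details = check()
        trail.log_compliance_check(
            check_type=check_type,
            regulation=regulation,
            result=passed,
            details=details,
        )
        all_passed = all_passed and passed
    return all_passed


class _CheckRecorder:
    """Stand-in for an AuditTrail so the example runs without SecureML."""

    def __init__(self) -> None:
        self.checks: List[Dict[str, Any]] = []

    def log_compliance_check(self, check_type: str, regulation: str,
                             result: bool, details: Dict[str, Any]) -> None:
        self.checks.append({"check_type": check_type, "regulation": regulation,
                            "result": result, "details": details})


trail = _CheckRecorder()
outcome = run_checks(trail, [
    ("data_access_permission", "GDPR", lambda: (True, {"user_consent_obtained": True})),
    ("retention_period", "CCPA", lambda: (False, {"days_over_limit": 12})),
])
```

Every check is logged whether it passes or fails, so a failed check does not hide the results of the checks that follow it.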
Best Practices
- Start early: Begin auditing from the earliest stages of your ML project.
- Be comprehensive: Log all significant operations and decisions.
- Include context: Add relevant context to your audit logs.
- Use consistent naming: Maintain consistent operation names and event types.
- Automate: Use the audit_function decorator to automatically audit functions.
- Regular review: Periodically review audit logs for compliance.
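One way to keep event types consistent is to define them once and route every call through a small wrapper. This is a sketch under stated assumptions, not a SecureML feature: `EventType`, `log_typed_event`, and `_EventRecorder` are hypothetical names, the event names are invented, and the wrapper relies only on the documented `log_event(event_type, details)` signature.

```python
from enum import Enum
from typing import Any, Dict, List, Tuple


class EventType(str, Enum):
    """Hypothetical catalogue of event names shared across a project."""
    DATA_LOADED = "data_loaded"
    FEATURES_BUILT = "features_built"
    MODEL_EXPORTED = "model_exported"


def log_typed_event(trail: Any, event_type: EventType, details: Dict[str, Any]) -> None:
    """Forward to log_event(event_type, details) using a name from the catalogue."""
    if not isinstance(event_type, EventType):
        raise TypeError(f"expected EventType, got {type(event_type).__name__}")
    trail.log_event(event_type=event_type.value, details=details)


class _EventRecorder:
    """Stand-in for an AuditTrail so the example runs without SecureML."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Dict[str, Any]]] = []

    def log_event(self, event_type: str, details: Dict[str, Any]) -> None:
        self.events.append((event_type, details))


recorder = _EventRecorder()
log_typed_event(recorder, EventType.DATA_LOADED, {"rows": 1000})

try:
    # Free-text names are rejected, which catches typos such as "data-loaded".
    log_typed_event(recorder, "data-loaded", {"rows": 1000})
    rejected = False
except TypeError:
    rejected = True
```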