Audit Trail API

This module provides tools for creating and managing audit logs for ML operations, helping to document data processing and model decisions for compliance purposes.

AuditTrail Class

class secureml.audit.AuditTrail(operation_name: str, log_dir: str | None = None, log_level: int = 20, context: Dict[str, Any] | None = None, regulations: List[str] | None = None)

Class for managing audit trails in SecureML operations.

The AuditTrail class provides methods for logging operations on datasets and models, making it easier to track data transformations and model decisions for compliance purposes.

__init__(operation_name: str, log_dir: str | None = None, log_level: int = 20, context: Dict[str, Any] | None = None, regulations: List[str] | None = None)

Initialize an audit trail for an operation.

Args:
    operation_name: Name of the operation being audited
    log_dir: Directory to store log files (default: secureml_audit_logs)
    log_level: Logging level to use (default: 20, i.e. logging.INFO)
    context: Additional context information to include in all logs
    regulations: List of regulations this audit trail is tracking compliance with

close(status: str = 'completed', details: Dict[str, Any] | None = None) → None

Close the audit trail.

Args:
    status: Final status of the operation
    details: Additional details about the operation’s completion

log_compliance_check(check_type: str, regulation: str, result: bool, details: Dict[str, Any]) → None

Log a compliance check.

Args:
    check_type: Type of compliance check
    regulation: Regulation being checked
    result: Result of the check (True=passed, False=failed)
    details: Details about the check

log_data_access(dataset_name: str, columns_accessed: List[str], num_records: int, purpose: str, user: str | None = None) → None

Log access to a dataset.

Args:
    dataset_name: Name of the dataset being accessed
    columns_accessed: List of columns accessed
    num_records: Number of records accessed
    purpose: Purpose of the access
    user: User who performed the access

log_data_transformation(transformation_type: str, input_data: str, output_data: str, parameters: Dict[str, Any]) → None

Log a data transformation.

Args:
    transformation_type: Type of transformation (e.g., anonymization, encryption)
    input_data: Description of input data
    output_data: Description of output data
    parameters: Parameters used for the transformation

log_error(error_type: str, message: str, details: Dict[str, Any] | None = None) → None

Log an error.

Args:
    error_type: Type of error
    message: Error message
    details: Additional details about the error

log_event(event_type: str, details: Dict[str, Any]) → None

Log an event to the audit trail.

Args:
    event_type: Type of event being logged
    details: Details about the event

log_model_inference(model_id: str, input_data: str, output: Any, confidence: float | None = None) → None

Log model inference.

Args:
    model_id: Identifier for the model
    input_data: Description of input data
    output: Model output
    confidence: Confidence score for the output

log_model_training(model_type: str, dataset_name: str, parameters: Dict[str, Any], metrics: Dict[str, Any] | None = None, privacy_parameters: Dict[str, Any] | None = None) → None

Log model training.

Args:
    model_type: Type of model being trained
    dataset_name: Name of the dataset used for training
    parameters: Training parameters
    metrics: Training metrics
    privacy_parameters: Privacy parameters used (e.g., epsilon for DP)

log_user_request(request_type: str, user_id: str, details: Dict[str, Any], status: str) → None

Log a user request (e.g., GDPR right to access).

Args:
    request_type: Type of request
    user_id: ID of the user making the request
    details: Details about the request
    status: Status of the request

setup_file_logging() → None: Setup file logging for the audit trail.

The AuditTrail class provides a comprehensive way to track operations in your machine learning pipeline. It records various events with timestamps and context information, creating an immutable record that can be used for compliance purposes.

Basic Usage Example:

from secureml.audit import AuditTrail

# Create an audit trail for a model training operation
audit = AuditTrail(
    operation_name="model_training",
    context={"model_version": "v1.0", "environment": "production"},
    regulations=["GDPR", "HIPAA"]
)

# Log events during your operation
audit.log_data_access(
    dataset_name="patient_records",
    columns_accessed=["age", "diagnosis", "treatment"],
    num_records=1000,
    purpose="training disease prediction model",
    user="data_scientist_1"
)

# Close the audit trail when done
audit.close()
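
The example above calls close() only on the success path. The following is a minimal sketch of one way to make sure the trail is also closed when the operation fails, using the documented log_error and close signatures. The helper name run_audited and the status value "failed" are assumptions made for illustration, not part of SecureML; the audit argument can be any object that exposes those two methods.

```python
from typing import Any, Callable


def run_audited(audit: Any, step: Callable[[], Any]) -> Any:
    """Run `step` and close `audit` whether or not it succeeds.

    `run_audited` is a hypothetical helper, not a SecureML function.
    `audit` is expected to expose log_error() and close() as documented above.
    """
    try:
        result = step()
    except Exception as exc:
        # Record the failure before closing so the trail explains the final status.
        audit.log_error(
            error_type=type(exc).__name__,
            message=str(exc),
            details={"step": getattr(step, "__name__", repr(step))},
        )
        # "failed" is an assumed status value; the signature only says status is a str.
        audit.close(status="failed", details={"error": str(exc)})
        raise
    audit.close(status="completed")
    return result
```

With this pattern the exception still propagates to the caller, so auditing does not change the control flow of the surrounding code.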

Utility Functions

Audit Function Decorator

secureml.audit.audit_function(operation_name: str | None = None, log_dir: str | None = None, regulations: List[str] | None = None) → Callable

Decorator for auditing function calls.

Args:
    operation_name: Name of the operation (defaults to function name)
    log_dir: Directory to store audit logs
    regulations: List of regulations this function should comply with

Returns:
    Decorated function with audit trail

The audit_function decorator provides a simple way to add auditing to any function:

from secureml.audit import audit_function

@audit_function(regulations=["GDPR"])
def train_model(data, params):
    model = ...  # Function implementation: build and fit the model here
    return model
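
To make the decorator's behavior more concrete, here is a rough sketch of how a decorator of this kind can be built on the standard library. It is not SecureML's implementation: audit_function_sketch, the logger name, and the logged messages are all hypothetical. The only behavior it mirrors from the documentation is falling back to the function name when operation_name is omitted.

```python
import functools
import logging
from typing import Any, Callable, List, Optional

logger = logging.getLogger("audit_sketch")  # hypothetical logger name


def audit_function_sketch(
    operation_name: Optional[str] = None,
    regulations: Optional[List[str]] = None,
) -> Callable:
    """Illustrative stand-in for an auditing decorator (not the SecureML one)."""

    def decorator(func: Callable) -> Callable:
        # Fall back to the function name, as the documented decorator does.
        name = operation_name or func.__name__

        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            logger.info("start %s regulations=%s", name, regulations or [])
            try:
                result = func(*args, **kwargs)
            except Exception:
                # logger.exception records the traceback; the error is re-raised.
                logger.exception("error in %s", name)
                raise
            logger.info("end %s", name)
            return result

        return wrapper

    return decorator


@audit_function_sketch(regulations=["GDPR"])
def add(a: int, b: int) -> int:
    return a + b
```

The sketch takes arguments and returns a decorator, which is why it is applied with parentheses, just like audit_function in the example above.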

Log Retrieval

The get_audit_logs function retrieves audit logs for analysis, filtered by the criteria below.

Args:
    operation_id: ID of the operation to retrieve logs for
    operation_name: Name of the operation to retrieve logs for
    start_time: Start time for logs (ISO format)
    end_time: End time for logs (ISO format)
    log_dir: Directory containing audit logs

Returns:
    List of audit log entries matching the criteria

For example, to retrieve the logs for one operation within a time window:

from secureml.audit import get_audit_logs

# Get the logs for a specific operation within a time window
logs = get_audit_logs(
    operation_name="model_training",
    start_time="2023-01-01T00:00:00",
    end_time="2023-01-31T23:59:59"
)
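
The start_time and end_time arguments are ISO-format strings. As a rough illustration of this kind of time-window filtering, the sketch below applies the same criteria to a plain list of dictionaries. The entry layout (timestamp and operation_name keys) and the helper filter_entries are assumptions made for illustration. The actual shape of the entries returned by get_audit_logs may differ, and whether its time bounds are inclusive is not stated here.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def filter_entries(
    entries: List[Dict[str, Any]],
    operation_name: Optional[str] = None,
    start_time: Optional[str] = None,
    end_time: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Keep entries matching the name and lying within [start_time, end_time].

    Hypothetical helper: assumes each entry has "timestamp" (ISO string)
    and "operation_name" keys.
    """
    start = datetime.fromisoformat(start_time) if start_time else None
    end = datetime.fromisoformat(end_time) if end_time else None
    matched = []
    for entry in entries:
        ts = datetime.fromisoformat(entry["timestamp"])
        if operation_name is not None and entry["operation_name"] != operation_name:
            continue
        if start is not None and ts < start:
            continue
        if end is not None and ts > end:
            continue
        matched.append(entry)
    return matched


# Made-up sample entries, used only to exercise the helper.
sample = [
    {"timestamp": "2023-01-15T10:30:00", "operation_name": "model_training"},
    {"timestamp": "2023-02-01T08:00:00", "operation_name": "model_training"},
    {"timestamp": "2023-01-20T12:00:00", "operation_name": "credit_scoring"},
]
january_training = filter_entries(
    sample,
    operation_name="model_training",
    start_time="2023-01-01T00:00:00",
    end_time="2023-01-31T23:59:59",
)
```

Parsing the strings with datetime.fromisoformat instead of comparing them as text avoids surprises when timestamps differ in precision.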

Configuration

The audit module uses these default configuration values:

DEFAULT_LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DEFAULT_LOG_LEVEL = logging.INFO
DEFAULT_LOG_DIR = "secureml_audit_logs"

You can override the log directory and log level by passing log_dir and log_level when creating an AuditTrail instance; the constructor signature above has no parameter for the log format.
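
As an illustration of what these defaults mean in terms of Python's standard logging module, the sketch below formats a single record with the default format string. Only the default values come from this module. The logger name, file name, and message are made up for the example, and the code does not show how SecureML itself wires up its handlers.

```python
import logging

# Values copied from the defaults listed above.
DEFAULT_LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DEFAULT_LOG_LEVEL = logging.INFO  # numerically 20, matching log_level=20

formatter = logging.Formatter(DEFAULT_LOG_FORMAT)

# Build a record by hand so the example does not depend on handler setup.
record = logging.LogRecord(
    name="secureml.audit.example",  # hypothetical logger name
    level=DEFAULT_LOG_LEVEL,
    pathname="example.py",  # hypothetical source file
    lineno=0,
    msg="dataset %s accessed",
    args=("patient_records",),
    exc_info=None,
)
line = formatter.format(record)
print(line)
```

The printed line starts with the record's timestamp, followed by the logger name, the level name, and the message, separated by " - ".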

Working with Regulations

The audit trail system is designed to support compliance with various regulations including:

GDPR: General Data Protection Regulation
HIPAA: Health Insurance Portability and Accountability Act
CCPA: California Consumer Privacy Act

When initializing an AuditTrail, you can specify which regulations apply:

from secureml.audit import AuditTrail

audit = AuditTrail(
    operation_name="credit_scoring",
    regulations=["GDPR", "CCPA"]
)

# This will be recorded in the audit logs for compliance reporting
audit.log_compliance_check(
    check_type="data_access_permission",
    regulation="GDPR",
    result=True,
    details={"user_consent_obtained": True, "legal_basis": "legitimate_interest"}
)
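
When several checks apply to one operation, it can help to run them in one place and log each result. The following is a sketch under stated assumptions: run_checks and the layout of the check tuples are hypothetical, and audit is any object that exposes log_compliance_check with the signature documented above.

```python
from typing import Any, Callable, Dict, List, Tuple

# (check_type, regulation, callable returning (passed, details))
Check = Tuple[str, str, Callable[[], Tuple[bool, Dict[str, Any]]]]


def run_checks(audit: Any, checks: List[Check]) -> bool:
    """Run every check, log each result, and return True only if all passed.

    Hypothetical helper, not part of SecureML.
    """
    all_passed = True
    for check_type, regulation, check in checks:
        passed, details = check()
        # Log failures as well as passes so the trail shows every check that ran.
        audit.log_compliance_check(
            check_type=check_type,
            regulation=regulation,
            result=passed,
            details=details,
        )
        all_passed = all_passed and passed
    return all_passed
```

The helper keeps running after a failed check so that the audit trail contains the outcome of every check, not just the first failure.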

Best Practices

Start early: Begin auditing from the earliest stages of your ML project
Be comprehensive: Log all significant operations and decisions
Include context: Add relevant context to your audit logs
Use consistent naming: Maintain consistent operation names and event types
Automate: Use the audit_function decorator to automatically audit functions
Regular review: Periodically review audit logs for compliance