Audit Trail API

This module provides tools for creating and managing audit logs for ML operations, helping to document data processing and model decisions for compliance purposes.

AuditTrail Class

class secureml.audit.AuditTrail(operation_name: str, log_dir: str | None = None, log_level: int = 20, context: Dict[str, Any] | None = None, regulations: List[str] | None = None)

Class for managing audit trails in SecureML operations.

The AuditTrail class provides methods for logging operations on datasets and models, making it easier to track data transformations and model decisions for compliance purposes.

__init__(operation_name: str, log_dir: str | None = None, log_level: int = 20, context: Dict[str, Any] | None = None, regulations: List[str] | None = None)

Initialize an audit trail for an operation.

Args:

operation_name: Name of the operation being audited
log_dir: Directory to store log files (default: secureml_audit_logs)
log_level: Logging level to use
context: Additional context information to include in all logs
regulations: List of regulations this audit trail is tracking compliance with
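The default log_level of 20 is the numeric value of logging.INFO in Python's standard logging module. The short sketch below uses only the standard library to show how level names map to numbers, which may help when choosing a value to pass.

```python
import logging

# The standard logging levels are plain integers; 20 corresponds to INFO.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}

for name, value in LEVELS.items():
    print(f"{name:<8} {value}")

# logging.getLevelName maps a numeric level back to its name.
default_level_name = logging.getLevelName(20)
```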

close(status: str = 'completed', details: Dict[str, Any] | None = None) -> None

Close the audit trail.

Args:

status: Final status of the operation
details: Additional details about the operation’s completion

log_compliance_check(check_type: str, regulation: str, result: bool, details: Dict[str, Any]) -> None

Log a compliance check.

Args:

check_type: Type of compliance check
regulation: Regulation being checked
result: Result of the check (True=passed, False=failed)
details: Details about the check

log_data_access(dataset_name: str, columns_accessed: List[str], num_records: int, purpose: str, user: str | None = None) -> None

Log access to a dataset.

Args:

dataset_name: Name of the dataset being accessed
columns_accessed: List of columns accessed
num_records: Number of records accessed
purpose: Purpose of the access
user: User who performed the access

log_data_transformation(transformation_type: str, input_data: str, output_data: str, parameters: Dict[str, Any]) -> None

Log a data transformation.

Args:

transformation_type: Type of transformation (e.g., anonymization, encryption)
input_data: Description of input data
output_data: Description of output data
parameters: Parameters used for the transformation

log_error(error_type: str, message: str, details: Dict[str, Any] | None = None) -> None

Log an error.

Args:

error_type: Type of error
message: Error message
details: Additional details about the error

log_event(event_type: str, details: Dict[str, Any]) -> None

Log an event to the audit trail.

Args:

event_type: Type of event being logged
details: Details about the event

log_model_inference(model_id: str, input_data: str, output: Any, confidence: float | None = None) -> None

Log model inference.

Args:

model_id: Identifier for the model
input_data: Description of input data
output: Model output
confidence: Confidence score for the output
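Since input_data is a string description rather than the data itself, one option is to pass a short summary instead of raw values. The sketch below is an assumption about how such a description could be built: describe_input is a hypothetical helper, not part of SecureML, and the fingerprint format is arbitrary.

```python
import hashlib
import json
from typing import Any, Dict


def describe_input(features: Dict[str, Any]) -> str:
    """Hypothetical helper: summarize an input without including raw values."""
    # Serialize with sorted keys so equal inputs always hash identically.
    payload = json.dumps(features, sort_keys=True, default=str).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:12]
    return f"{len(features)} features, sha256={digest}"


description = describe_input({"age": 54, "diagnosis": "I10"})
print(description)
```

The resulting string could then be passed as the input_data argument of log_model_inference or log_data_transformation.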

log_model_training(model_type: str, dataset_name: str, parameters: Dict[str, Any], metrics: Dict[str, Any] | None = None, privacy_parameters: Dict[str, Any] | None = None) -> None

Log model training.

Args:

model_type: Type of model being trained
dataset_name: Name of the dataset used for training
parameters: Training parameters
metrics: Training metrics
privacy_parameters: Privacy parameters used (e.g., epsilon for DP)

log_user_request(request_type: str, user_id: str, details: Dict[str, Any], status: str) -> None

Log a user request (e.g., GDPR right to access).

Args:

request_type: Type of request
user_id: ID of the user making the request
details: Details about the request
status: Status of the request

setup_file_logging() -> None

Set up file logging for the audit trail.

The AuditTrail class tracks operations in your machine learning pipeline. It records events with timestamps and context information, producing a record that can be used for compliance reporting.

Basic Usage Example:

from secureml.audit import AuditTrail

# Create an audit trail for a model training operation
audit = AuditTrail(
    operation_name="model_training",
    context={"model_version": "v1.0", "environment": "production"},
    regulations=["GDPR", "HIPAA"]
)

# Log events during your operation
audit.log_data_access(
    dataset_name="patient_records",
    columns_accessed=["age", "diagnosis", "treatment"],
    num_records=1000,
    purpose="training disease prediction model",
    user="data_scientist_1"
)

# Close the audit trail when done
audit.close()

Utility Functions

Audit Function Decorator

secureml.audit.audit_function(operation_name: str | None = None, log_dir: str | None = None, regulations: List[str] | None = None) -> Callable

Decorator for auditing function calls.

Args:

operation_name: Name of the operation (defaults to function name)
log_dir: Directory to store audit logs
regulations: List of regulations this function should comply with

Returns:

Decorated function with audit trail

The audit_function decorator provides a simple way to add auditing to any function:

from secureml.audit import audit_function

@audit_function(regulations=["GDPR"])
def train_model(data, params):
    # Placeholder implementation: replace with real training logic
    model = {"params": params, "n_samples": len(data)}
    return model
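To illustrate the general pattern behind an auditing decorator, including the fallback to the function name when no operation name is given, here is a minimal standalone sketch. It is not SecureML's implementation: audit_sketch and the in-memory RECORDS list are hypothetical, and the record fields are made up for the example.

```python
import functools
from typing import Any, Callable, Dict, List, Optional

# In-memory record list used only for this illustration.
RECORDS: List[Dict[str, Any]] = []


def audit_sketch(operation_name: Optional[str] = None) -> Callable:
    """Illustrative decorator factory; not SecureML's audit_function."""

    def decorator(func: Callable) -> Callable:
        # Fall back to the function name when no operation name is given.
        name = operation_name or func.__name__

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            RECORDS.append({"operation": name, "event": "start"})
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                # Record the failure, then let the exception propagate.
                RECORDS.append({"operation": name, "event": "error",
                                "error_type": type(exc).__name__})
                raise
            RECORDS.append({"operation": name, "event": "end"})
            return result

        return wrapper

    return decorator


@audit_sketch()
def train_model(data, params):
    return {"n_samples": len(data), "params": params}


model = train_model([1, 2, 3], {"lr": 0.1})
```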

Log Retrieval

secureml.audit.get_audit_logs(operation_id: str | None = None, operation_name: str | None = None, start_time: str | None = None, end_time: str | None = None, log_dir: str | None = None) -> List[Dict[str, Any]]

Retrieve audit logs for analysis.

Args:

operation_id: ID of the operation to retrieve logs for
operation_name: Name of the operation to retrieve logs for
start_time: Start time for logs (ISO format)
end_time: End time for logs (ISO format)
log_dir: Directory containing audit logs

Returns:

List of audit log entries matching the criteria

This function allows you to retrieve and analyze audit logs:

from secureml.audit import get_audit_logs

# Get logs for the "model_training" operation recorded in January 2023
logs = get_audit_logs(
    operation_name="model_training",
    start_time="2023-01-01T00:00:00",
    end_time="2023-01-31T23:59:59"
)
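The start_time and end_time arguments are ISO-format strings. Rather than typing them by hand, they can be generated with the standard datetime module. The helper below, month_window, is a hypothetical name introduced for illustration; whether get_audit_logs treats the bounds as inclusive is not stated here, so the sketch simply reproduces the style of the example above.

```python
from datetime import datetime, timedelta
from typing import Tuple


def month_window(year: int, month: int) -> Tuple[str, str]:
    """Return ISO-format start and end timestamps covering one calendar month."""
    start = datetime(year, month, 1)
    # First day of the following month, rolling over the year in December.
    if month == 12:
        next_month = datetime(year + 1, 1, 1)
    else:
        next_month = datetime(year, month + 1, 1)
    end = next_month - timedelta(seconds=1)
    return start.isoformat(), end.isoformat()


start_time, end_time = month_window(2023, 1)
print(start_time, end_time)  # 2023-01-01T00:00:00 2023-01-31T23:59:59
```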

Configuration

The audit module uses these default configuration values:

import logging

DEFAULT_LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DEFAULT_LOG_LEVEL = logging.INFO
DEFAULT_LOG_DIR = "secureml_audit_logs"

You can override the log level and log directory by passing log_level and log_dir when creating an AuditTrail instance.
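To see what a line produced with the default format looks like, the format string can be applied with the standard logging.Formatter. This sketch uses only the standard library; the logger name and message are invented for the example and do not reflect what SecureML actually writes.

```python
import logging

DEFAULT_LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

formatter = logging.Formatter(DEFAULT_LOG_FORMAT)

# Build a record by hand; the logger name and message are made up.
record = logging.LogRecord(
    name="secureml.audit.demo",
    level=logging.INFO,
    pathname="demo.py",
    lineno=0,
    msg="data_access dataset=%s",
    args=("patient_records",),
    exc_info=None,
)

# The formatted line starts with a timestamp, followed by name, level, message.
line = formatter.format(record)
print(line)
```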

Working with Regulations

The audit trail system is designed to support compliance with various regulations including:

  • GDPR: General Data Protection Regulation

  • HIPAA: Health Insurance Portability and Accountability Act

  • CCPA: California Consumer Privacy Act

When initializing an AuditTrail, you can specify which regulations apply:

from secureml.audit import AuditTrail

audit = AuditTrail(
    operation_name="credit_scoring",
    regulations=["GDPR", "CCPA"]
)

# Record the outcome of a compliance check for later reporting
audit.log_compliance_check(
    check_type="data_access_permission",
    regulation="GDPR",
    result=True,
    details={"user_consent_obtained": True, "legal_basis": "legitimate_interest"}
)

# Close the audit trail when done
audit.close()
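In practice the result of a check is usually computed rather than hard-coded. The sketch below shows one way to evaluate a simple consent check and shape its outcome as keyword arguments matching the documented log_compliance_check signature. build_consent_check is a hypothetical helper and the field names are illustrative assumptions.

```python
from typing import Any, Dict, List


def build_consent_check(regulation: str, record: Dict[str, Any],
                        required_fields: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: evaluate a check and return keyword arguments
    for log_compliance_check(check_type, regulation, result, details)."""
    # A field counts as missing when it is absent or has a falsy value.
    missing = [field for field in required_fields if not record.get(field)]
    return {
        "check_type": "data_access_permission",
        "regulation": regulation,
        "result": not missing,  # True only when nothing is missing
        "details": {
            "missing_fields": missing,
            "checked_fields": required_fields,
        },
    }


check = build_consent_check(
    "GDPR",
    {"user_consent_obtained": True, "legal_basis": ""},
    ["user_consent_obtained", "legal_basis"],
)
print(check["result"], check["details"]["missing_fields"])
```

The returned dictionary could then be unpacked into the call, for example audit.log_compliance_check(**check).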

Best Practices

  1. Start early: Begin auditing from the earliest stages of your ML project

  2. Be comprehensive: Log all significant operations and decisions

  3. Include context: Add relevant context to your audit logs

  4. Use consistent naming: Maintain consistent operation names and event types

  5. Automate: Use the audit_function decorator to automatically audit functions

  6. Regular review: Periodically review audit logs for compliance
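For the consistent-naming practice, one approach is to define event types once and reuse them instead of typing free-form strings at each call site. The sketch below assumes a small Enum for this purpose; AuditEvent and its values are hypothetical examples, not constants provided by SecureML.

```python
from enum import Enum


class AuditEvent(str, Enum):
    """Hypothetical event-type names; example values, not SecureML constants."""
    DATA_LOADED = "data_loaded"
    FEATURES_SELECTED = "features_selected"
    MODEL_EXPORTED = "model_exported"


def event_type(event: AuditEvent) -> str:
    """Return the plain string to pass as an event_type argument."""
    return event.value


names = [event_type(e) for e in AuditEvent]
print(names)
```

A call such as audit.log_event(event_type=AuditEvent.DATA_LOADED.value, details={...}) then always uses the same spelling.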