=============
Audit Trails
=============

Audit trails provide a chronological record of all data operations and model activities, which is critical for compliance with privacy regulations and for ensuring accountability in machine learning systems. SecureML offers audit trail capabilities to track all privacy-relevant operations throughout the ML lifecycle.

Core Concepts
-------------

**Audit Events**: Discrete actions or operations captured in the audit trail, such as data access, model training, or prediction requests.

**Immutability**: Ensuring audit logs cannot be altered or tampered with after they are created.

**Granularity**: Different levels of detail in audit logs, from high-level system events to fine-grained data access patterns.

**Compliance Integration**: Connecting audit trails to specific compliance requirements and regulations.
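
The immutability property is often approximated by making logs tamper-evident: each entry stores a hash of the previous one, so a later modification breaks the chain. The following is a minimal sketch using only the Python standard library. The ``append_entry`` and ``verify_chain`` helpers are hypothetical and are not part of the SecureML API.

.. code-block:: python

    import hashlib
    import json

    GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


    def _digest(event, prev_hash):
        """Hash an event together with the hash of the preceding entry."""
        payload = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


    def append_entry(chain, event):
        """Append an event dict to the chain, linking it to the previous entry."""
        prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
        chain.append({"event": event, "prev_hash": prev_hash, "hash": _digest(event, prev_hash)})


    def verify_chain(chain):
        """Return True only if every entry matches its hash and links to its predecessor."""
        prev_hash = GENESIS_HASH
        for entry in chain:
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(entry["event"], prev_hash):
                return False
            prev_hash = entry["hash"]
        return True


    chain = []
    append_entry(chain, {"event_type": "data_access", "dataset": "customer_financial_data"})
    append_entry(chain, {"event_type": "model_training", "model_type": "random_forest"})
    print(verify_chain(chain))  # True

    chain[0]["event"]["dataset"] = "something_else"  # simulate tampering
    print(verify_chain(chain))  # False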

Basic Usage
-----------

Creating an Audit Trail
^^^^^^^^^^^^^^^^^^^^^^^

To create an audit trail for your SecureML application:

.. code-block:: python

    from secureml.audit import AuditTrail
    
    # Initialize an audit trail
    audit = AuditTrail(
        operation_name='credit_risk_model_training',
        log_dir='audit_logs/',  # Optional: directory for storing logs
        log_level=20,  # Optional: logging level (default: INFO)
        context={'app_version': '1.0.0'},  # Optional: context to include in all logs
        regulations=['GDPR', 'CCPA']  # Optional: regulations this operation should comply with
    )
    
    # The audit trail will be automatically initialized with a unique operation_id
    # and start_time, which are included in all subsequent logs
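
The ``log_level`` value of 20 in the example matches ``logging.INFO`` in Python's standard ``logging`` module. Assuming ``AuditTrail`` follows the standard numeric levels (the comment above suggests this, but it is not confirmed here), the named constants can be used instead of raw integers:

.. code-block:: python

    import logging

    # Standard numeric levels defined by Python's logging module
    for name in ("DEBUG", "INFO", "WARNING", "ERROR"):
        print(name, getattr(logging, name))

    log_level = logging.INFO  # equivalent to the literal 20 used above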

Logging Data Access Events
^^^^^^^^^^^^^^^^^^^^^^^^^^

Track when sensitive data is accessed:

.. code-block:: python

    # Log a data access event
    audit.log_data_access(
        dataset_name='customer_financial_data',
        columns_accessed=['income', 'credit_score', 'loan_history'],
        num_records=5000,
        purpose='model_training',
        user='analyst_123'  # Optional: user who performed the access
    )

Logging Data Transformations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Track data transformations:

.. code-block:: python

    # Log a data transformation event
    audit.log_data_transformation(
        transformation_type='anonymization',
        input_data='raw_customer_data',
        output_data='anonymized_customer_data',
        parameters={
            'method': 'k-anonymity',
            'k': 5,
            'quasi_identifiers': ['age', 'zipcode', 'gender']
        }
    )

Logging Model Operations
^^^^^^^^^^^^^^^^^^^^^^^^

Track model-related activities:

.. code-block:: python

    # Log model training event
    audit.log_model_training(
        model_type='random_forest',
        dataset_name='customer_data_anonymized',
        parameters={'n_estimators': 100, 'max_depth': 10},
        metrics={'accuracy': 0.92, 'auc': 0.88},
        privacy_parameters={'epsilon': 1.0, 'delta': 1e-5}
    )
    
    # Log model inference event
    audit.log_model_inference(
        model_id='credit_risk_classifier_v1',
        input_data='customer_application_123',
        output='high_risk',
        confidence=0.85
    )

Logging Compliance Checks
^^^^^^^^^^^^^^^^^^^^^^^^^

Track compliance verification:

.. code-block:: python

    # Log a compliance check
    audit.log_compliance_check(
        check_type='data_minimization',
        regulation='GDPR',
        result=True,  # True = passed, False = failed
        details={
            'columns_before': 25,
            'columns_after': 10,
            'columns_removed': ['unnecessary_field_1', 'unnecessary_field_2']
        }
    )

Logging User Requests
^^^^^^^^^^^^^^^^^^^^^

Track GDPR/CCPA user requests:

.. code-block:: python

    # Log a user request (e.g., GDPR right to access)
    audit.log_user_request(
        request_type='data_access_request',
        user_id='user_12345',
        details={
            'request_date': '2023-06-20',
            'data_categories': ['personal_info', 'financial_data']
        },
        status='completed'
    )

Closing the Audit Trail
^^^^^^^^^^^^^^^^^^^^^^^

Properly close the audit trail when the operation is complete:

.. code-block:: python

    # Close the audit trail
    audit.close(
        status='completed',  # Or 'error', 'cancelled', etc.
        details={
            'execution_time': 125.7,
            'output_location': 'models/credit_risk_v1.pkl'
        }
    )
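
Because an unclosed trail leaves the operation without a final status, it can help to guarantee the ``close`` call with a context manager. The sketch below assumes only the ``close(status=..., details=...)`` signature shown above. The ``audited`` helper is hypothetical, and ``_DemoTrail`` is a stand-in so the snippet runs without SecureML installed.

.. code-block:: python

    import time
    from contextlib import contextmanager


    @contextmanager
    def audited(trail):
        """Yield the trail and always close it, recording success or failure."""
        start = time.perf_counter()
        try:
            yield trail
        except Exception as exc:
            trail.close(
                status="error",
                details={"error": repr(exc), "execution_time": time.perf_counter() - start},
            )
            raise  # re-raise so callers still see the failure
        else:
            trail.close(
                status="completed",
                details={"execution_time": time.perf_counter() - start},
            )


    class _DemoTrail:
        """Stand-in for an AuditTrail; it only records how close() was called."""

        def __init__(self):
            self.closed_with = None

        def close(self, status, details=None):
            self.closed_with = (status, details)


    demo = _DemoTrail()
    with audited(demo):
        pass  # run the audited operation here
    print(demo.closed_with[0])  # completed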

Advanced Techniques
-------------------

Using the Audit Function Decorator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Automatically audit function calls:

.. code-block:: python

    from secureml.audit import audit_function
    
    # Create a decorated function
    @audit_function(
        operation_name='data_preprocessing',
        log_dir='audit_logs',
        regulations=['GDPR']
    )
    def process_sensitive_data(data, anonymize=True):
        # Function implementation goes here; this placeholder
        # returns the input unchanged so the example runs
        processed_data = data
        return processed_data
    
    # When this function is called, the audit trail will automatically:
    # 1. Log the function call with parameters
    # 2. Log the return value or any exceptions
    # 3. Close the audit trail
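
To make the three steps above concrete, here is a simplified sketch of how a decorator of this kind can be written. It is not SecureML's implementation: ``simple_audit`` is a hypothetical name, and events are appended to an in-memory list rather than written to ``log_dir``.

.. code-block:: python

    import functools
    from datetime import datetime, timezone

    AUDIT_EVENTS = []  # in-memory stand-in for a real audit log


    def simple_audit(operation_name):
        """Record the call, the outcome, and a closing event for each invocation."""

        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                def record(event_type, **fields):
                    AUDIT_EVENTS.append({
                        "operation_name": operation_name,
                        "event_type": event_type,
                        "timestamp": datetime.now(timezone.utc).isoformat(),
                        **fields,
                    })

                # 1. Log the function call with its parameters
                record("function_call", function=func.__name__,
                       args=repr(args), kwargs=repr(kwargs))
                status = "error"
                try:
                    result = func(*args, **kwargs)
                    # 2a. Log the return value (only its type, to avoid copying data)
                    record("function_return", result_type=type(result).__name__)
                    status = "completed"
                    return result
                except Exception as exc:
                    # 2b. Log the exception, then let it propagate
                    record("function_error", error=repr(exc))
                    raise
                finally:
                    # 3. Close the trail with the final status
                    record("close", status=status)

            return wrapper

        return decorator


    @simple_audit("data_preprocessing")
    def drop_identifiers(rows):
        return [{k: v for k, v in row.items() if k != "name"} for row in rows]


    drop_identifiers([{"name": "Alice", "age": 30}])
    print([event["event_type"] for event in AUDIT_EVENTS])
    # ['function_call', 'function_return', 'close']

Note that logging the ``repr`` of arguments can copy sensitive values into the log; a production decorator would likely redact or summarize them.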

Retrieving Audit Logs
^^^^^^^^^^^^^^^^^^^^^

Retrieve and analyze audit logs:

.. code-block:: python

    from secureml.audit import get_audit_logs
    
    # Get logs for a specific operation
    logs = get_audit_logs(
        operation_id='12345-abcde-67890',  # Optional: specific operation ID
        operation_name='credit_risk_model_training',  # Optional: operation name
        start_time='2023-01-01T00:00:00',  # Optional: filter by start time
        end_time='2023-06-30T23:59:59',  # Optional: filter by end time
        log_dir='audit_logs'  # Optional: directory containing logs
    )
    
    # Analyze the logs
    for log in logs:
        print(f"Event: {log['event_type']} - Time: {log['timestamp']}")
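
Beyond printing, the retrieved logs can be summarized with the standard library. The sketch below assumes, as in the loop above, that each log is a dictionary with ``event_type`` and ``timestamp`` keys, and additionally assumes ISO 8601 timestamps. The sample records are invented for illustration.

.. code-block:: python

    from collections import Counter
    from datetime import datetime

    # Invented sample records standing in for the result of get_audit_logs()
    sample_logs = [
        {"event_type": "data_access", "timestamp": "2023-03-01T09:15:00"},
        {"event_type": "model_training", "timestamp": "2023-03-01T09:20:00"},
        {"event_type": "data_access", "timestamp": "2023-03-02T14:05:00"},
    ]

    # Count how often each event type occurs
    events_by_type = Counter(log["event_type"] for log in sample_logs)

    # Find the time span covered by the logs
    timestamps = [datetime.fromisoformat(log["timestamp"]) for log in sample_logs]
    first_event, last_event = min(timestamps), max(timestamps)

    for event_type, count in events_by_type.most_common():
        print(f"{event_type}: {count}")
    print(f"Span: {first_event.isoformat()} to {last_event.isoformat()}")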

Integration with Reporting
--------------------------

Using the ReportGenerator
^^^^^^^^^^^^^^^^^^^^^^^^^

Generate HTML or PDF reports from audit logs:

.. code-block:: python

    from secureml.reporting import ReportGenerator
    
    # Create a report generator
    generator = ReportGenerator()
    
    # Generate an audit report
    report_path = generator.generate_audit_report(
        logs=logs,  # Logs retrieved with get_audit_logs
        output_file='audit_report.pdf',
        title='Credit Risk Model Audit Report',
        logo_path='company_logo.png',  # Optional
        include_charts=True  # Optional: include visualizations
    )
    
    print(f"Audit report generated at: {report_path}")

Integration with Compliance Checking
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Audit trails can be automatically created when performing compliance checks:

.. code-block:: python

    from secureml.compliance import ComplianceAuditor
    
    # Create a compliance auditor with audit integration
    auditor = ComplianceAuditor(
        regulation='GDPR',
        log_dir='audit_logs'  # This enables automatic audit trail creation
    )
    
    # The audit trails for all operations will be stored in the log directory
    dataset_report = auditor.audit_dataset(
        dataset=df,  # df: the dataset to audit, loaded beforehand
        dataset_name='patient_records'
    )

Best Practices
--------------

1. **Start early**: Enable audit trails from the beginning of your project, not as an afterthought

2. **Be comprehensive**: Log all privacy-relevant operations, not just the obvious ones

3. **Use proper granularity**: Strike a balance between logging too much (which hurts performance) and too little (which misses important events)

4. **Secure audit logs**: Implement proper access controls for log files

5. **Regular reviews**: Periodically review audit logs for anomalies or compliance issues

6. **Contextual information**: Include sufficient context in each log entry to understand the operation's purpose

7. **Automation**: Use the ``audit_function`` decorator for critical operations

8. **User attribution**: Always include user information when logging events to ensure accountability

9. **Purpose tracking**: Record the purpose for data access and processing to demonstrate compliance with purpose limitation principles

10. **Privacy by design**: Implement privacy-preserving audit logs that don't themselves become a privacy risk
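
For the last point, one common approach is to pseudonymize identifiers before they reach the log, for example with a keyed hash. This is a minimal sketch using the standard library. The ``pseudonymize`` helper and the hard-coded key are illustrative assumptions; a real key would come from a secrets manager.

.. code-block:: python

    import hashlib
    import hmac

    # Illustrative only: load the key from a secrets manager in practice
    PSEUDONYM_KEY = b"replace-with-a-secret-key"


    def pseudonymize(identifier, key=PSEUDONYM_KEY):
        """Return a stable keyed hash so the raw identifier never enters the log."""
        return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


    # The same user always maps to the same token, so events remain linkable
    token = pseudonymize("user_12345")
    print(token == pseudonymize("user_12345"))  # True
    print(token == pseudonymize("user_67890"))  # False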

Further Reading
---------------

* :doc:`/api/audit` - Complete API reference for audit trail functions
* :doc:`/examples/audit` - More examples of audit trail implementation
* :doc:`/compliance/audit_requirements` - Audit requirements for different regulations