=================
Compliance API
=================

.. module:: secureml.compliance

This module provides tools to verify that datasets and models comply with privacy regulations like GDPR, CCPA, and HIPAA.

Main Functions
-------------

.. autofunction:: check_compliance

This is the main function for checking compliance with privacy regulations:

.. code-block:: python

    from secureml.compliance import check_compliance
    
    # Check a dataset for GDPR compliance
    report = check_compliance(
        data=my_dataframe,
        regulation="GDPR"
    )
    
    # Check if any issues were found
    if report.has_issues():
        print(report)

Compliance Reports
-----------------

.. autoclass:: ComplianceReport
   :members:
   :special-members: __init__

The `ComplianceReport` class contains the results of a compliance check and provides methods for accessing and displaying those results:

.. code-block:: python

    # Access report summary
    summary = report.summary()
    
    # Get detailed information
    if report.has_issues():
        for issue in report.issues:
            print(f"Issue: {issue['issue']}")
            print(f"Severity: {issue['severity']}")
            print(f"Recommendation: {issue['recommendation']}")

Compliance Auditor
------------------

.. autoclass:: ComplianceAuditor
   :members:
   :special-members: __init__

The `ComplianceAuditor` class provides a higher-level interface for conducting compliance audits of ML pipelines, generating comprehensive audit trails, and producing detailed reports:

.. code-block:: python

    from secureml.compliance import ComplianceAuditor
    
    # Create an auditor for GDPR compliance
    auditor = ComplianceAuditor(regulation="GDPR")
    
    # Audit a dataset
    dataset_report = auditor.audit_dataset(
        dataset=my_dataframe,
        dataset_name="customer_data"
    )
    
    # Audit a model
    model_report = auditor.audit_model(
        model_config=model_params,
        model_name="credit_scoring_model"
    )
    
    # Audit an entire ML pipeline
    pipeline_report = auditor.audit_pipeline(
        dataset=my_dataframe,
        dataset_name="customer_data",
        model=my_model,
        model_name="credit_scoring_model",
        preprocessing_steps=preprocessing_config
    )
    
    # Generate a PDF report
    auditor.generate_pdf(
        pipeline_report,
        output_file="compliance_report.pdf",
        title="ML Pipeline Compliance Audit"
    )

Data Identification Functions
----------------------------

.. autofunction:: identify_personal_data

This function identifies personal data in a dataset:

.. code-block:: python

    from secureml.compliance import identify_personal_data
    
    # Identify personal data in a dataframe
    personal_data_info = identify_personal_data(
        data=my_dataframe,
        max_samples=200  # Analyze up to 200 samples for text content
    )
    
    # Check which columns contain personal data
    personal_columns = personal_data_info["columns"]
    
    # Check what personal data was found in text content
    content_findings = personal_data_info["content_findings"]

.. autofunction:: identify_phi

This function identifies Protected Health Information (PHI) in a dataset:

.. code-block:: python

    from secureml.compliance import identify_phi
    
    # Identify PHI in a healthcare dataset
    phi_info = identify_phi(
        data=healthcare_data,
        max_samples=100
    )
    
    # Check which columns contain PHI
    phi_columns = phi_info["columns"]

NLP Utilities
------------

.. autofunction:: get_nlp_model

This function loads and caches a SpaCy NLP model for text analysis:

.. code-block:: python

    from secureml.compliance import get_nlp_model
    
    # Get the default SpaCy model
    nlp = get_nlp_model()
    
    # Analyze text for entities
    doc = nlp("Patient John Doe was diagnosed with hypertension.")
    entities = [(ent.text, ent.label_) for ent in doc.ents]

Working with Regulation Presets
------------------------------

The compliance module uses regulation-specific presets that define rules and checks for each regulation. These presets are loaded from the `secureml.presets` module:

.. code-block:: python

    from secureml.presets import list_available_presets, load_preset, get_preset_field
    
    # List available regulations
    regulations = list_available_presets()  # Returns ['gdpr', 'ccpa', 'hipaa', ...]
    
    # Load GDPR preset
    gdpr_preset = load_preset("gdpr")
    
    # Get specific field from a preset
    personal_data_identifiers = get_preset_field("gdpr", "personal_data_identifiers")

Supported Regulations
--------------------

The module currently supports compliance checks for:

1. **GDPR** (General Data Protection Regulation)
   - Checks for personal data and special categories
   - Verifies data minimization
   - Checks for consent metadata
   - Verifies right-to-be-forgotten support

2. **CCPA** (California Consumer Privacy Act)
   - Checks for personal information disclosure
   - Verifies opt-out options for data sharing
   - Checks deletion request support

3. **HIPAA** (Health Insurance Portability and Accountability Act)
   - Identifies Protected Health Information (PHI)
   - Checks for proper de-identification
   - Verifies data encryption

Best Practices
-------------

1. **Regular audits**: Run compliance checks regularly, especially before training models
2. **Document remediation**: Document how compliance issues were addressed
3. **Multi-regulation**: Check against all regulations applicable to your jurisdiction
4. **Full pipeline**: Audit the entire ML pipeline, not just individual components
5. **Update checks**: Keep regulation presets updated as laws and interpretations change