Report Generation Examples

This section demonstrates how to create compliance and audit reports with SecureML. Reports are critical for documenting privacy and compliance measures for regulators, auditors, and stakeholders.

Basic Compliance Reports

The simplest way to generate a compliance report is from an existing ComplianceReport object:

from secureml.reporting import ReportGenerator
from secureml.compliance import ComplianceReport

# Create a compliance report
report = ComplianceReport("GDPR")

# Add passed checks
report.add_passed_check("Data minimization principle")
report.add_passed_check("Explicit consent obtained")

# Add warnings
report.add_warning(
    component="Data Storage",
    warning="Data retention period not specified",
    recommendation="Define explicit data retention periods"
)

# Add issues
report.add_issue(
    component="Sensitive Data",
    issue="Email addresses not encrypted",
    severity="medium",
    recommendation="Apply encryption to email fields"
)

# Create a report generator
generator = ReportGenerator()

# Generate HTML report
output_file = "compliance_report.html"
report_path = generator.generate_compliance_report(
    report=report,
    output_file=output_file,
    include_charts=True
)

print(f"HTML report generated and saved to: {report_path}")

The generated HTML report includes:

A summary of the compliance status
Charts showing issues by severity
Lists of passed checks, warnings, and issues
Recommendations for addressing each issue
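
To make the "issues by severity" chart more concrete, here is a minimal sketch of the tally such a chart could be drawn from. It uses only the standard library and plain dictionaries; the record layout and the count_by_severity helper are assumptions for illustration, not SecureML's internal representation.

```python
from collections import Counter

# Hypothetical issue records; a real ComplianceReport may store these differently.
issues = [
    {"component": "Sensitive Data", "issue": "Email addresses not encrypted", "severity": "medium"},
    {"component": "Data Storage", "issue": "No retention policy", "severity": "high"},
    {"component": "Logging", "issue": "Verbose logs include user IDs", "severity": "medium"},
]

def count_by_severity(issues):
    """Return issue counts keyed by severity, always in high/medium/low order."""
    counts = Counter(item["severity"] for item in issues)
    return {level: counts.get(level, 0) for level in ("high", "medium", "low")}

severity_counts = count_by_severity(issues)
print(severity_counts)
```

Keeping all three severity levels in the result, even when a count is zero, gives a chart the same axes from one report to the next.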

Reports From Compliance Checks

You can generate reports directly from compliance checks on datasets:

import pandas as pd
from secureml.compliance import check_compliance
from secureml.reporting import ReportGenerator

# Sample data with sensitive information
data = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe'],
    'age': [34, 29],
    'email': ['john.smith@example.com', 'jane.doe@example.com'],
    'phone': ['555-123-4567', '555-234-5678'],
    'ssn': ['123-45-6789', '234-56-7890'],
    'medical_condition': ['Diabetes', 'None'],
    'income': [65000, 72000]
})

# Check compliance with GDPR
report = check_compliance(
    data=data,
    regulation="GDPR",
    max_samples=10
)

# Create a report generator
generator = ReportGenerator()

# Generate HTML report
output_file = "dataset_compliance_report.html"
report_path = generator.generate_compliance_report(
    report=report,
    output_file=output_file,
    include_charts=True
)

This workflow is particularly useful for:

- Documenting dataset compliance before ML model training
- Regular compliance audits of data processing systems
- Demonstrating compliance to privacy officers and regulators
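
One way to act on a compliance check before model training is to stop the pipeline when serious issues are present. The sketch below shows that idea with plain dictionaries. The blocking_issues helper, the SEVERITY_RANK table, and the record layout are hypothetical and would need adapting to whatever the actual report object exposes.

```python
# Hypothetical ranking of the severity labels used in the examples above.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}

def blocking_issues(issues, threshold="high"):
    """Return the issues whose severity is at or above the given threshold."""
    minimum = SEVERITY_RANK[threshold]
    return [
        item for item in issues
        if SEVERITY_RANK.get(item.get("severity"), 0) >= minimum
    ]

# Illustrative records only; not output from check_compliance.
found = [
    {"issue": "Email addresses not encrypted", "severity": "medium"},
    {"issue": "SSN column stored in plain text", "severity": "high"},
    {"issue": "Column description missing", "severity": "low"},
]

blockers = blocking_issues(found, threshold="medium")
if blockers:
    print(f"Training blocked by {len(blockers)} issue(s)")
```

Choosing the threshold is a policy decision: a strict pipeline might block on "medium", while an exploratory one might block only on "high".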

Audit Reports

You can generate reports from audit logs to track data processing operations:

from secureml.reporting import ReportGenerator
from secureml.audit import get_audit_logs

# Retrieve audit logs
logs = get_audit_logs(
    operation_name="model_training",
    start_time="2023-01-01T00:00:00",
    end_time="2023-01-31T23:59:59"
)

# Create a report generator
generator = ReportGenerator()

# Generate HTML report
output_file = "audit_report.html"
report_path = generator.generate_audit_report(
    logs=logs,
    output_file=output_file,
    title="ML Model Training Audit Report",
    include_charts=True,
    additional_context={
        "regulations": ["GDPR", "HIPAA"],
        "data_owner": "Research Department",
        "report_purpose": "Regulatory compliance verification"
    }
)
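
The start_time and end_time arguments above are ISO 8601 strings. This section does not say whether get_audit_logs treats the bounds as inclusive, so the sketch below only illustrates how such a window can be evaluated with the standard library. The in_window helper and the record layout are assumptions, not part of SecureML.

```python
from datetime import datetime

def in_window(timestamp, start_time, end_time):
    """Return True if an ISO 8601 timestamp lies within [start_time, end_time]."""
    moment = datetime.fromisoformat(timestamp)
    return datetime.fromisoformat(start_time) <= moment <= datetime.fromisoformat(end_time)

# Hypothetical log records with a "timestamp" field.
records = [
    {"event": "data_access", "timestamp": "2023-01-15T09:30:00"},
    {"event": "model_training", "timestamp": "2023-02-02T14:00:00"},
]

january = [
    entry for entry in records
    if in_window(entry["timestamp"], "2023-01-01T00:00:00", "2023-01-31T23:59:59")
]
print(january)
```

Parsing the strings into datetime objects avoids relying on lexical string comparison, which only works when every timestamp uses exactly the same format.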

For a complete audit trail workflow, you can create an audit trail, log events, and then generate a report:

from secureml.audit import AuditTrail, get_audit_logs
from secureml.reporting import ReportGenerator

# Create an audit trail
audit = AuditTrail(
    operation_name="model_training",
    regulations=["GDPR", "HIPAA"]
)

# Log events
audit.log_data_access(
    dataset_name="patient_data",
    columns_accessed=["age", "gender", "blood_pressure"],
    num_records=1000,
    purpose="Training disease prediction model"
)

audit.log_model_training(
    model_type="RandomForest",
    dataset_name="patient_data_anonymized",
    parameters={"n_estimators": 100, "max_depth": 10},
    metrics={"accuracy": 0.85, "auc": 0.91},
    privacy_parameters={"anonymization": "k_anonymity_5"}
)

# Close the audit trail
audit.close()

# Retrieve the audit logs
logs = get_audit_logs(operation_name="model_training")

# Generate a report
generator = ReportGenerator()
generator.generate_audit_report(
    logs=logs,
    output_file="model_training_audit.html",
    title="Model Training Audit Report",
    include_charts=True
)

Customizing Reports

You can customize reports with logos and custom CSS:

from secureml.reporting import ReportGenerator

# Define custom CSS
custom_css = """
body {
    font-family: Arial, sans-serif;
    line-height: 1.6;
    color: #333;
    max-width: 1200px;
    margin: 0 auto;
    padding: 20px;
    background-color: #f9f9f9;
}

.report-header {
    background-color: #3498db;
    color: white;
    padding: 20px;
    border-radius: 5px;
    margin-bottom: 30px;
}

.high {
    color: #e74c3c;
    font-weight: bold;
}

.medium {
    color: #f39c12;
    font-weight: bold;
}

.low {
    color: #3498db;
    font-weight: bold;
}
"""

# Create a report generator with custom CSS
generator = ReportGenerator(custom_css=custom_css)

# Generate report with a logo
# (compliance_report is a ComplianceReport created as in the earlier examples)
report_path = generator.generate_compliance_report(
    report=compliance_report,
    output_file="custom_report.html",
    logo_path="company_logo.png",
    include_charts=True,
    additional_context={
        "organization": "Example Corporation",
        "department": "Data Science Team",
        "project": "Customer Behavior Analysis"
    }
)

Combined Reports

You can create combined reports that include both compliance and audit information:

import os
from datetime import datetime
from secureml.reporting import ReportGenerator

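# compliance_report and audit_logs are assumed to exist already
# (see the compliance check and get_audit_logs examples above)

# First, create separate reports
generator = ReportGenerator()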

# Generate compliance report to a temporary file
compliance_file = "temp_compliance.html"
generator.generate_compliance_report(
    report=compliance_report,
    output_file=compliance_file,
    include_charts=True
)

# Generate audit report to a temporary file
audit_file = "temp_audit.html"
generator.generate_audit_report(
    logs=audit_logs,
    output_file=audit_file,
    title="ML Operation Audit Trail",
    include_charts=True
)

# Read the contents of both files
with open(compliance_file, 'r') as f:
    compliance_content = f.read()

with open(audit_file, 'r') as f:
    audit_content = f.read()

# Extract the main content from each
# (this simple split assumes a bare <body> tag with no attributes)
compliance_body = compliance_content.split('<body>')[1].split('</body>')[0]
audit_body = audit_content.split('<body>')[1].split('</body>')[0]

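# Create a combined HTML file
# Note: generator._get_css() below is a private method (leading underscore),
# so it may change between releases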
combined_html = f"""<!DOCTYPE html>
<html>
<head>
    <title>Combined Compliance and Audit Report</title>
    <style>
        body {{
            font-family: Arial, sans-serif;
            line-height: 1.6;
            color: #333;
            max-width: 1200px;
            margin: 0 auto;
            padding: 20px;
        }}

        .report-section {{
            margin-bottom: 30px;
            border: 1px solid #ddd;
            padding: 20px;
            border-radius: 5px;
        }}

        /* Preserve styling from original reports */
        {generator._get_css()}
    </style>
</head>
<body>
    <h1>Combined Compliance and Audit Report</h1>
    <p>Generated on {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}</p>

    <div class="report-section">
        <h2>Compliance Report</h2>
        {compliance_body}
    </div>

    <div class="report-section">
        <h2>Audit Report</h2>
        {audit_body}
    </div>
</body>
</html>
"""

# Write the combined file
combined_file = "combined_report.html"
with open(combined_file, 'w') as f:
    f.write(combined_html)

# Clean up temporary files
os.remove(compliance_file)
os.remove(audit_file)
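
The split on '<body>' in the example above fails if the generated HTML uses a body tag with attributes or different letter case. A slightly more tolerant extraction can be sketched with a regular expression. The extract_body helper is hypothetical, and a full HTML parser would be the safer choice for arbitrary input.

```python
import re

# Matches <body ...> ... </body>, ignoring case and spanning multiple lines.
_BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)

def extract_body(html):
    """Return the inner HTML of the first <body> element, or the input unchanged if none is found."""
    match = _BODY_RE.search(html)
    return match.group(1) if match else html

sample = '<html><BODY class="report">\n<h1>Audit</h1>\n</BODY></html>'
print(extract_body(sample).strip())
```

Returning the input unchanged when no body element is found keeps the combined report usable even if one of the inputs is only an HTML fragment.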

Scheduled Reports

For regular reporting, you can set up automated report generation:

from datetime import datetime, timedelta
import os
from secureml.audit import get_audit_logs
from secureml.reporting import ReportGenerator

def generate_weekly_report(output_dir="reports"):
    """Generate a weekly report from logs."""
    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)

    # Calculate date range for the previous week
    end_date = datetime.now()
    start_date = end_date - timedelta(days=7)

    # Format for log retrieval
    start_time = start_date.isoformat()
    end_time = end_date.isoformat()

    # Retrieve logs
    logs = get_audit_logs(
        start_time=start_time,
        end_time=end_time
    )

    if not logs:
        print(f"No logs found for period {start_date} to {end_date}")
        return None

    # Generate report
    generator = ReportGenerator()
    output_file = os.path.join(
        output_dir,
        f"weekly_report_{start_date.strftime('%Y%m%d')}_{end_date.strftime('%Y%m%d')}.html"
    )

    report_path = generator.generate_audit_report(
        logs=logs,
        output_file=output_file,
        title=f"Weekly Audit Report: {start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}",
        include_charts=True,
        additional_context={
            "report_type": "Weekly",
            "generated_by": "Automated System"
        }
    )

    return report_path

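# This function can be called by a scheduler (cron, Windows Task Scheduler, Airflow, etc.)
# Example cron entry (Linux/Mac) for running every Sunday at midnight:
# 0 0 * * 0 python /path/to/generate_weekly_report.py
#
# For the cron entry to produce a report, the script must call the function when run directly:
if __name__ == "__main__":
    generate_weekly_report()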
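
The function above reports on a rolling seven-day window ending at the moment it runs, so the covered period shifts if the job starts late. If reports should instead cover fixed calendar weeks, the bounds can be computed as in this sketch. The previous_week_bounds helper is an illustration, and the Monday-to-Sunday convention is an assumption.

```python
from datetime import datetime, timedelta

def previous_week_bounds(now):
    """Return (start, end) for the Monday-to-Sunday week before the one containing `now`."""
    # Midnight at the start of the current week's Monday.
    this_monday = (now - timedelta(days=now.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    start = this_monday - timedelta(days=7)
    end = this_monday - timedelta(seconds=1)
    return start, end

# Wednesday, 18 January 2023
start, end = previous_week_bounds(datetime(2023, 1, 18, 15, 30))
print(start.isoformat(), end.isoformat())
# 2023-01-09T00:00:00 2023-01-15T23:59:59
```

The resulting values can be passed through isoformat() in the same way as start_time and end_time in the function above.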

Complete Example

Here’s a complete example that generates a comprehensive report for a privacy-preserving ML pipeline:

import pandas as pd
import os
from datetime import datetime

from secureml.reporting import ReportGenerator
from secureml.compliance import check_compliance, ComplianceReport
from secureml.audit import AuditTrail, get_audit_logs

# Create output directory
report_dir = "privacy_reports"
os.makedirs(report_dir, exist_ok=True)

# 1. Create an audit trail for the entire process
audit = AuditTrail(
    operation_name="ml_pipeline_execution",
    regulations=["GDPR"],
    context={"project": "Customer Churn Prediction"}
)

# 2. Load and check the dataset
try:
    # Log the data access
    audit.log_data_access(
        dataset_name="customer_data",
        columns_accessed=["id", "age", "account_balance", "transaction_history", "email"],
        num_records=10000,
        purpose="Churn prediction model training",
        user="data_scientist_1"
    )

    # Simulate loading data
    data = pd.DataFrame({
        'id': range(1, 5),
        'age': [34, 29, 42, 35],
        'email': ['john@example.com', 'jane@example.com', 'robert@example.com', 'emily@example.com'],
        'account_balance': [5000, 12000, 3000, 8000],
        'churn_risk': [0.2, 0.1, 0.7, 0.3]
    })

    # Check compliance
    audit.log_event(
        "compliance_checking",
        {"dataset": "customer_data", "regulation": "GDPR"}
    )

    compliance_report = check_compliance(
        data=data,
        regulation="GDPR",
        max_samples=100
    )

    audit.log_compliance_check(
        check_type="dataset_compliance",
        regulation="GDPR",
        result=not compliance_report.has_issues(),
        details={
            "issues_count": len(compliance_report.issues),
            "warnings_count": len(compliance_report.warnings)
        }
    )

    # 3. Apply anonymization (simulated)
    audit.log_data_transformation(
        transformation_type="anonymization",
        input_data="customer_data",
        output_data="customer_data_anonymized",
        parameters={"method": "k_anonymity", "k": 5}
    )

    # 4. Train model (simulated)
    audit.log_model_training(
        model_type="RandomForest",
        dataset_name="customer_data_anonymized",
        parameters={"n_estimators": 100, "max_depth": 10},
        metrics={"accuracy": 0.85, "auc": 0.91, "f1": 0.87},
        privacy_parameters={"anonymization": "k_anonymity_5"}
    )

    # 5. Close the audit trail
    audit.close("completed")

    # 6. Generate reports
    generator = ReportGenerator()

    # Compliance report
    compliance_file = f"{report_dir}/compliance_report.html"
    generator.generate_compliance_report(
        report=compliance_report,
        output_file=compliance_file,
        include_charts=True
    )

    # Audit report
    logs = get_audit_logs(operation_name="ml_pipeline_execution")
    audit_file = f"{report_dir}/audit_report.html"
    generator.generate_audit_report(
        logs=logs,
        output_file=audit_file,
        title="ML Pipeline Execution Audit",
        include_charts=True
    )

    print(f"Report generation completed. Reports saved to {report_dir}")

except Exception as e:
    # Log the error
    audit.log_error(
        error_type=type(e).__name__,
        message=str(e)
    )
    audit.close("error")
    raise
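
The try/except pattern above can be factored into a reusable context manager so that every pipeline closes its audit trail consistently. The sketch below assumes only the log_error(error_type=..., message=...) and close(status) calls used in the example. The audited wrapper and the FakeAuditTrail stand-in are hypothetical and exist only so the snippet runs without SecureML installed.

```python
from contextlib import contextmanager

@contextmanager
def audited(audit):
    """Close the trail as "completed" on success; log the error and close as "error" on failure."""
    try:
        yield audit
    except Exception as exc:
        audit.log_error(error_type=type(exc).__name__, message=str(exc))
        audit.close("error")
        raise
    else:
        audit.close("completed")

class FakeAuditTrail:
    """Stand-in used only to demonstrate the wrapper; not part of SecureML."""
    def __init__(self):
        self.events = []

    def log_error(self, error_type, message):
        self.events.append(("error", error_type, message))

    def close(self, status):
        self.events.append(("close", status))

trail = FakeAuditTrail()
try:
    with audited(trail):
        raise ValueError("bad input")
except ValueError:
    pass
print(trail.events)
```

Because the trail is closed in the else branch, a failure during the wrapped steps is always recorded before the trail is closed, and the original exception still propagates to the caller.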

Best Practices

Be consistent with reporting: Generate reports at regular intervals and after significant ML operations.
Include context: Add metadata like project name, department, and purpose to make reports more meaningful.
Customize reports for different audiences:
- Technical teams need detailed error messages and code references
- Management needs high-level summaries and risk assessments
- Regulators need compliance status and evidence of controls
Store reports securely: Reports often contain sensitive information about vulnerabilities.
Automate report generation: Set up scheduled tasks for regular reporting.
Include visual elements: Charts and graphs make reports more understandable.
Provide actionable recommendations: Every issue should have a clear recommendation.
Establish a reporting workflow: Define who receives reports and how issues are addressed.
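
As one possible reading of "store reports securely", the sketch below restricts a generated report file to its owner and records a SHA-256 digest that can later show whether the file was altered. The seal_report helper is hypothetical. It uses only the standard library and does not replace encryption or access control on the storage system itself.

```python
import hashlib
import os
import stat
import tempfile

def seal_report(path):
    """Restrict a report file to owner read/write and return its SHA-256 hex digest."""
    # On Windows, os.chmod only controls the read-only flag.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large reports are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration with a throwaway file standing in for a generated report.
with tempfile.TemporaryDirectory() as tmp:
    report_file = os.path.join(tmp, "compliance_report.html")
    with open(report_file, "w", encoding="utf-8") as f:
        f.write("<html><body>demo</body></html>")
    checksum = seal_report(report_file)

print(checksum)
```

Storing the digest separately from the report, for example in the audit log, makes later tampering easier to detect.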