HAI Data flow

Data Flow View

This document describes the primary data flows within the hai-agents application. The system is designed around a central "Asset" entity, and the frontend communicates with the backend through a tRPC-based API.

High-Level Flow

The application's core purpose is to ingest, analyze, and report on digital "Assets" against a wide range of criteria.


graph TD;
    subgraph "User / Client"
        A["Frontend (Web UI)"];
    end

    subgraph "Backend API (tRPC)"
        B["/trpc/{proxy+} endpoint"];
        C["asset.ts Router"];
        D["Other Routers (User, Admin, etc.)"];
    end

    subgraph "Core Logic & Processing"
        E["Asset Reconciliation & Evaluation Engine"];
        F["External AI/ML Services (Azure, Google, etc.)"];
        G["Event Bus"];
    end

    subgraph "Data Stores"
        H["Database (Postgres)"];
        I["S3 Buckets (CriteriaResult, etc.)"];
    end

    A --> B;
    B --> C;
    B --> D;

    C -- "Triggers (e.g., populate, reconcile)" --> E;
    C -- "CRUD Operations" --> H;
    C -- "Reads Results" --> I;

    E -- "Sends jobs" --> G;
    E -- "Calls for analysis" --> F;
    E -- "Writes results" --> I;
    E -- "Updates status" --> H;

    G -- "Triggers async processing" --> E;
    F -- "Returns analysis" --> E;
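The diagram routes every call through one /trpc/{proxy+} endpoint and then into asset.ts or the other routers. The {proxy+} syntax suggests an Amazon API Gateway greedy path parameter that captures procedure paths such as asset.populate. tRPC normally performs this lookup itself once routers are merged, so the hand-rolled dispatcher below only illustrates the fan-out. It assumes one level of nesting (router.procedure), no request batching, and a hypothetical router registry; `dispatch` and `own` are illustrative names, not part of the codebase.

```typescript
// Illustrative only: how one "/trpc/{proxy+}" endpoint could fan out to routers.
// Assumes paths of the form "<router>.<procedure>", no batching, no transformer.

type Procedure = (input: unknown) => unknown;
type Router = Record<string, Procedure>;

// Hypothetical registry; procedure bodies are placeholders, not hai-agents code.
const routers: Record<string, Router> = {
  asset: {
    populate: (input) => ({ op: "asset.populate", input }),
    reconcileMany: (input) => ({ op: "asset.reconcileMany", input }),
  },
  user: {
    me: () => ({ op: "user.me" }),
  },
};

// Own-property lookup, so names like "toString" never resolve to prototype methods.
function own<T>(obj: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : undefined;
}

// `proxyPath` is the part of the URL captured by {proxy+}, e.g. "asset.populate".
function dispatch(proxyPath: string, input: unknown): unknown {
  const dot = proxyPath.indexOf(".");
  if (dot <= 0) {
    throw new Error(`Malformed procedure path: ${proxyPath}`);
  }
  const routerName = proxyPath.slice(0, dot);
  const procedureName = proxyPath.slice(dot + 1);

  const router = own(routers, routerName);
  if (router === undefined) {
    throw new Error(`Unknown router: ${routerName}`);
  }
  const procedure = own(router, procedureName);
  if (procedure === undefined) {
    throw new Error(`Unknown procedure: ${proxyPath}`);
  }
  return procedure(input);
}

// Example: dispatch("asset.populate", { assetId: "a1" }) is handled by the asset router.
```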

Detailed Data Flow for Asset Evaluation

Initiation: A user interacts with the Frontend to create an Asset or trigger an evaluation. The Frontend calls a tRPC mutation in the asset.ts router through the main API endpoint (/trpc/{proxy+}).
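To make the Initiation step concrete, the following sketch shows roughly what reaches that endpoint, following tRPC's HTTP RPC convention for a single, non-batched call with no data transformer: a mutation is sent as a POST with the JSON input as the body, and a query as a GET with the JSON input URL-encoded in an `input` query parameter. `BASE_URL`, `buildMutation`, and `buildQuery` are hypothetical; the real frontend would presumably use the tRPC client instead, and enabling batching or a transformer such as superjson changes the payload shape.

```typescript
// Sketch of the request a tRPC client sends for a single, non-batched call
// with no data transformer. BASE_URL is a hypothetical placeholder.
const BASE_URL = "https://api.example.com/trpc";

interface HttpRequest {
  method: "GET" | "POST";
  url: string;
  body?: string;
}

// Mutations (e.g. asset.populate): POST, JSON-encoded input in the body.
function buildMutation(path: string, input?: unknown): HttpRequest {
  const request: HttpRequest = { method: "POST", url: `${BASE_URL}/${path}` };
  if (input !== undefined) {
    request.body = JSON.stringify(input);
  }
  return request;
}

// Queries (e.g. asset.getToxicityResults): GET, with the JSON-encoded input
// URL-encoded into the `input` query parameter.
function buildQuery(path: string, input?: unknown): HttpRequest {
  const url = `${BASE_URL}/${path}`;
  if (input === undefined) {
    return { method: "GET", url };
  }
  return { method: "GET", url: `${url}?input=${encodeURIComponent(JSON.stringify(input))}` };
}

// buildQuery("asset.getToxicityResults", { assetId: "a1" }).url ===
//   "https://api.example.com/trpc/asset.getToxicityResults?input=%7B%22assetId%22%3A%22a1%22%7D"
```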

Triggering: Procedures like populate or reconcileMany in the asset.ts router kick off the main business logic. This likely involves creating records in the Database and sending a message to the Event Bus to start an asynchronous process.
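The Triggering step can be sketched as follows, assuming `populate` inserts an asset row in a pending state and then publishes one evaluation request to the bus. The record shape, the status values, the `AssetEvaluationRequested` event name, and the in-memory stand-ins for Postgres and the Event Bus are all hypothetical.

```typescript
// Sketch of a `populate`-style mutation body. The record shape, status values,
// event name, and in-memory stand-ins are hypothetical, not hai-agents code.

type AssetStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";

interface AssetRecord {
  id: string;
  name: string;
  criteria: string[];
  status: AssetStatus;
}

interface BusEvent {
  detailType: string;
  detail: { assetId: string; criteria: string[] };
}

// Stand-in for the Postgres asset table.
class InMemoryDb {
  private readonly assets = new Map<string, AssetRecord>();
  private nextId = 1;

  insertAsset(name: string, criteria: string[]): AssetRecord {
    const record: AssetRecord = {
      id: `asset-${this.nextId++}`,
      name,
      criteria,
      status: "PENDING",
    };
    this.assets.set(record.id, record);
    return record;
  }

  getAsset(id: string): AssetRecord | undefined {
    return this.assets.get(id);
  }
}

// Stand-in for the Event Bus; it just records what was published.
class InMemoryBus {
  readonly published: BusEvent[] = [];

  publish(event: BusEvent): void {
    this.published.push(event);
  }
}

// Persist the asset, queue the evaluation, and return immediately;
// the actual analysis runs asynchronously in the engine.
function populate(
  db: InMemoryDb,
  bus: InMemoryBus,
  input: { name: string; criteria: string[] },
): { assetId: string; status: AssetStatus } {
  if (input.criteria.length === 0) {
    throw new Error("At least one criterion is required");
  }
  const asset = db.insertAsset(input.name, input.criteria);
  bus.publish({
    detailType: "AssetEvaluationRequested",
    detail: { assetId: asset.id, criteria: asset.criteria },
  });
  return { assetId: asset.id, status: asset.status };
}
```

Returning as soon as the job is queued keeps the tRPC request short while evaluation runs in the background. A production version would also have to handle an insert that succeeds followed by a publish that fails, for example with a transactional outbox.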

Asynchronous Processing: The Asset Reconciliation & Evaluation Engine (likely composed of several Lambda functions) picks up the message from the Event Bus and starts processing.
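A minimal sketch of one engine function consuming those messages, assuming the Event Bus is Amazon EventBridge (the document only says "Event Bus"), so that the Lambda receives the standard EventBridge envelope with `detail-type` and `detail` fields. The `AssetEvaluationRequested` detail-type, the detail shape, and the status helper are hypothetical. Because EventBridge can deliver the same event more than once, the handler skips event IDs it has already seen.

```typescript
// Sketch of an engine Lambda consuming evaluation requests from the Event Bus.
// Assumes the bus is Amazon EventBridge; the detail-type and detail fields are hypothetical.

interface EventBridgeEvent<TDetail> {
  version: string;
  id: string;
  "detail-type": string;
  source: string;
  account: string;
  time: string;
  region: string;
  resources: string[];
  detail: TDetail;
}

interface EvaluationRequested {
  assetId: string;
  criteria: string[];
}

// In-memory stand-in for the status column in Postgres.
const statusLog: Array<{ assetId: string; status: string }> = [];
function setStatus(assetId: string, status: string): void {
  statusLog.push({ assetId, status });
}

// Event IDs already handled. This only survives within one warm Lambda container.
const seenEventIds = new Set<string>();

export async function handler(
  event: EventBridgeEvent<EvaluationRequested>,
): Promise<{ processed: number }> {
  // Other consumers may share the bus; ignore unrelated event types.
  if (event["detail-type"] !== "AssetEvaluationRequested") {
    return { processed: 0 };
  }
  // EventBridge can deliver the same event more than once, so skip repeats.
  if (seenEventIds.has(event.id)) {
    return { processed: 0 };
  }
  seenEventIds.add(event.id);

  const { assetId, criteria } = event.detail;
  setStatus(assetId, "PROCESSING");
  // Per-criterion analysis (the next step) would be started here.
  return { processed: criteria.length };
}
```

The in-memory `seenEventIds` set only protects a single warm container; durable deduplication would need to live in the database, for example as a conditional status update.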

External Analysis: The engine makes calls to various External AI/ML Services (e.g., Azure OpenAI, Google Generative AI) to perform specific analyses based on the defined Criteria (e.g., toxicity detection, red teaming, hallucination checks).
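The per-criterion fan-out could look roughly like this. The analyzers are hypothetical, deterministic stubs standing in for calls to services such as Azure OpenAI or Google Generative AI, and are synchronous only for brevity. The design choice being illustrated is isolation: each criterion is evaluated independently, so an outage at one provider produces a failed result for that criterion instead of discarding the others.

```typescript
// Sketch of fanning one asset out to per-criterion analyzers.
// The analyzers are hypothetical deterministic stubs standing in for calls to
// external services (Azure OpenAI, Google Generative AI, ...); real calls would be async.

interface CriterionResult {
  criterion: string;
  ok: boolean;
  score?: number; // assumed convention: 0 (clean) to 1 (worst)
  error?: string;
}

type Analyzer = (content: string) => number;

const BLOCKLIST = new Set(["idiot", "stupid"]);

const analyzers: Record<string, Analyzer> = {
  // Stub, not a real toxicity model: share of words found in a tiny blocklist.
  toxicity: (content) => {
    const words = content.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
    if (words.length === 0) return 0;
    return words.filter((w) => BLOCKLIST.has(w)).length / words.length;
  },
  // Stub that simulates an upstream outage.
  hallucination: () => {
    throw new Error("upstream service unavailable");
  },
};

// Each criterion is evaluated independently, so one failing provider
// does not discard the results of the others.
function evaluateCriteria(content: string, criteria: string[]): CriterionResult[] {
  return criteria.map((criterion): CriterionResult => {
    const analyzer = Object.prototype.hasOwnProperty.call(analyzers, criterion)
      ? analyzers[criterion]
      : undefined;
    if (analyzer === undefined) {
      return { criterion, ok: false, error: "no analyzer registered" };
    }
    try {
      return { criterion, ok: true, score: analyzer(content) };
    } catch (err) {
      return { criterion, ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  });
}
```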

Storing Results: The results from the external services are processed and stored in the appropriate S3 Buckets (e.g., CriteriaResultBucket, ClusteringResultBucket). The status of the evaluation is updated in the main Database.
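The storage step might be sketched as below, with Maps standing in for the CriteriaResultBucket and the asset table. The object key layout (one JSON object per asset and criterion), the status values, and the rule that any failed criterion marks the whole evaluation FAILED are assumptions for illustration. The ordering matters: result objects are written before the status is updated, so a reader that sees COMPLETED can expect the objects to exist.

```typescript
// Sketch of "store results, then update status". The key layout and status
// values are hypothetical; the Maps are in-memory stand-ins for the
// CriteriaResultBucket (S3) and the asset table (Postgres).

interface CriterionResult {
  criterion: string;
  ok: boolean;
  score?: number;
  error?: string;
}

type EvaluationStatus = "PROCESSING" | "COMPLETED" | "FAILED";

const criteriaResultBucket = new Map<string, string>(); // object key -> JSON body
const assetStatus = new Map<string, EvaluationStatus>();

// Hypothetical key scheme: one object per (asset, criterion) pair.
function resultKey(assetId: string, criterion: string): string {
  return `criteria-results/${encodeURIComponent(assetId)}/${encodeURIComponent(criterion)}.json`;
}

function storeResults(assetId: string, results: CriterionResult[]): EvaluationStatus {
  // 1. Write every result object first...
  for (const result of results) {
    criteriaResultBucket.set(resultKey(assetId, result.criterion), JSON.stringify(result));
  }
  // 2. ...then record the final status, so a reader that sees COMPLETED
  //    can expect the result objects to exist already.
  const status: EvaluationStatus =
    results.length > 0 && results.every((r) => r.ok) ? "COMPLETED" : "FAILED";
  assetStatus.set(assetId, status);
  return status;
}
```

Amazon S3 provides strong read-after-write consistency, so on real S3 a successful PUT followed by the status update is enough for the retrieval path to find the object.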

Retrieval: The user can then view the results on the Frontend. The Frontend calls tRPC query procedures (e.g., getToxicityResults, getRedTeamingResults), which read the processed data directly from the S3 buckets or the Database.
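A read procedure in the spirit of getToxicityResults could check the evaluation status in the database first and only then read the result object from the bucket. The response shape, key layout, and in-memory stores are hypothetical; getRedTeamingResults would presumably follow the same pattern with a different key.

```typescript
// Sketch of a read path like getToxicityResults: consult the status in the
// database first, then read the result object from the bucket.
// Stores, key layout, and response shape are hypothetical.

type EvaluationStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";

interface ToxicityResult {
  criterion: "toxicity";
  ok: boolean;
  score?: number;
  error?: string;
}

type ToxicityResponse =
  | { state: "not_found" }
  | { state: "in_progress"; status: EvaluationStatus }
  | { state: "ready"; result: ToxicityResult }
  | { state: "no_result"; status: EvaluationStatus };

// In-memory stand-ins for the asset table (Postgres) and CriteriaResultBucket (S3).
const statusTable = new Map<string, EvaluationStatus>();
const bucket = new Map<string, string>();

function getToxicityResults(assetId: string): ToxicityResponse {
  const status = statusTable.get(assetId);
  if (status === undefined) {
    return { state: "not_found" };
  }
  // Cheap status check first: avoids a bucket read while the engine is still running.
  if (status === "PENDING" || status === "PROCESSING") {
    return { state: "in_progress", status };
  }
  const body = bucket.get(`criteria-results/${encodeURIComponent(assetId)}/toxicity.json`);
  if (body === undefined) {
    // Terminal status but no stored object, e.g. the run failed before writing results.
    return { state: "no_result", status };
  }
  return { state: "ready", result: JSON.parse(body) as ToxicityResult };
}
```

Returning an explicit in_progress state lets the frontend poll the query until the evaluation reaches a terminal status.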
