
Intent Model

This document defines the future canonical model for transforming voice, text, and UI interaction into structured runtime intent.

It is the bridge between raw interaction and orchestrated system behavior.


At the orchestration layer:

  • voice input
  • text input
  • UI action

should all normalize into the same conceptual contract.

Recommended equivalence:

  • raw modality-specific input -> InteractionEvent
  • interpreted objective -> StructuredIntent

This allows the runtime system to reason consistently regardless of whether a user typed, spoke, or clicked.


InteractionEvent

The normalized envelope for any inbound interaction.

Recommended fields:

  • interactionId
  • actorId
  • orgId
  • workspaceId
  • sourceType
    • voice
    • text
    • ui
  • channelType
  • rawInput
  • normalizedText
  • uiAction
  • timestamp
  • sessionId
  • conversationId
  • contextSnapshot

Notes:

  • normalizedText should exist for voice and text flows
  • UI actions may also carry semantic labels that can be normalized into text-like intent hints
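
The field list above can be sketched as a TypeScript contract. This is a minimal sketch: the field names follow the list, but the types and which fields are optional are assumptions.

```typescript
// Hypothetical shape for the normalized interaction envelope.
// Field names follow the list above; types and optionality are assumed.
type SourceType = "voice" | "text" | "ui";

interface InteractionEvent {
  interactionId: string;
  actorId: string;
  orgId: string;
  workspaceId: string;
  sourceType: SourceType;
  channelType: string;
  rawInput: string;                        // transcript, typed text, or serialized UI payload
  normalizedText?: string;                 // present for voice and text flows
  uiAction?: string;                       // semantic label for UI-originated events
  timestamp: string;                       // ISO 8601
  sessionId: string;
  conversationId?: string;
  contextSnapshot?: Record<string, unknown>;
}
```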

StructuredIntent

The actionable representation of what the user is trying to achieve.

Recommended fields:

  • intentId
  • goal
  • actionType
  • subject
  • entities
  • constraints
  • requestedOutcome
  • confidence
  • ambiguityLevel
  • riskClass
  • approvalClass
  • recommendedWorkflow
  • requiredCapabilities
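
As with the interaction envelope, this can be sketched as a TypeScript contract. The enum values mirror the ambiguity bands and risk classes defined later in this document; everything else (types, optionality, the example `actionType` strings) is an assumption.

```typescript
// Hypothetical shape for the interpreted intent.
// Enum values follow the bands and classes defined in this document.
type AmbiguityLevel = "none" | "resolvable" | "blocking";
type RiskClass =
  | "informational"
  | "recommendation"
  | "draft_only"
  | "state_change"
  | "sensitive_state_change"
  | "compliance_sensitive";

interface StructuredIntent {
  intentId: string;
  goal: string;                      // natural-language statement of the objective
  actionType: string;                // e.g. "change_workspace_setting" (assumed vocabulary)
  subject?: string;
  entities: Record<string, string>;
  constraints: string[];
  requestedOutcome?: string;
  confidence: number;                // 0..1, mapped onto high/medium/low bands
  ambiguityLevel: AmbiguityLevel;
  riskClass: RiskClass;
  approvalClass?: string;
  recommendedWorkflow?: string;
  requiredCapabilities: string[];
}
```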

The pipeline from raw interaction to structured intent:

```mermaid
flowchart TD
interactionEvent[InteractionEvent]
normalization[Normalization]
intentInference[IntentInference]
ambiguityCheck[AmbiguityCheck]
policyCheck[PolicyPrecheck]
structuredIntent[StructuredIntent]
interactionEvent --> normalization
normalization --> intentInference
intentInference --> ambiguityCheck
ambiguityCheck --> policyCheck
policyCheck --> structuredIntent
```

Normalization

Convert modality-specific input into a consistent structure.

Examples:

  • voice -> transcript + metadata
  • text -> cleaned text + metadata
  • UI click -> semantic action + local context + optional generated intent hint
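
The three mappings above can be sketched as one normalization function over a discriminated union. The input shapes and the generated UI intent hint are assumptions for illustration.

```typescript
// Sketch of modality-specific normalization. The raw input shapes
// and the UI intent-hint wording are assumptions.
type RawVoice = { kind: "voice"; transcript: string; durationMs: number };
type RawText = { kind: "text"; body: string };
type RawUi = { kind: "ui"; action: string; elementId: string };
type RawInteraction = RawVoice | RawText | RawUi;

function normalize(
  input: RawInteraction
): { normalizedText: string; meta: Record<string, unknown> } {
  switch (input.kind) {
    case "voice":
      // voice -> transcript + metadata
      return {
        normalizedText: input.transcript.trim(),
        meta: { durationMs: input.durationMs },
      };
    case "text":
      // text -> cleaned text + metadata
      return {
        normalizedText: input.body.trim().replace(/\s+/g, " "),
        meta: {},
      };
    case "ui":
      // UI click -> semantic action + local context + generated intent hint
      return {
        normalizedText: `user invoked ${input.action}`,
        meta: { elementId: input.elementId },
      };
  }
}
```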

Intent Inference

Infer:

  • what the user wants
  • which entities are involved
  • whether the system likely can fulfill it

Ambiguity Check

Determine whether the runtime can proceed safely or must clarify.

Common ambiguity sources:

  • missing entity
  • missing target workspace
  • conflicting instructions
  • low confidence
  • policy-sensitive action
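
One way to fold these sources into the ambiguity bands defined later in this document is sketched below. The signal names, the 0.5 confidence threshold, and which sources are treated as blocking versus resolvable are all assumptions.

```typescript
// Sketch of an ambiguity check over the sources listed above.
// Signal names and thresholds are assumptions.
interface AmbiguitySignals {
  missingEntity: boolean;
  missingWorkspace: boolean;
  conflictingInstructions: boolean;
  confidence: number; // 0..1
  policySensitive: boolean;
}

function ambiguityBand(
  s: AmbiguitySignals
): "none" | "resolvable" | "blocking" {
  // Conflicting instructions or a missing target workspace are treated
  // as blocking: autonomous execution cannot proceed.
  if (s.conflictingInstructions || s.missingWorkspace) return "blocking";
  // A missing entity, low confidence, or a policy-sensitive action can
  // usually be resolved with a clarifying question.
  if (s.missingEntity || s.confidence < 0.5 || s.policySensitive)
    return "resolvable";
  return "none";
}
```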

Policy Precheck

Before orchestration proceeds, intent must be classified for:

  • access scope
  • sensitivity
  • approval requirements
  • tool eligibility
  • tenant boundary risk
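
A precheck covering the tenant-boundary, capability, and approval dimensions above could look like the sketch below. The `PolicyDecision` shape, the `targetOrgId` field, and tying approval to the two most sensitive risk classes are assumptions.

```typescript
// Sketch of a policy precheck. The decision shape, the targetOrgId
// field, and the approval rule are assumptions.
interface PolicyDecision {
  allowed: boolean;
  requiresApproval: boolean;
  reasons: string[];
}

function policyPrecheck(
  intent: {
    riskClass: string;
    orgId: string;
    targetOrgId: string;
    requiredCapabilities: string[];
  },
  grantedCapabilities: string[]
): PolicyDecision {
  const reasons: string[] = [];
  // Tenant boundary risk: the intent must stay inside the caller's org.
  if (intent.targetOrgId !== intent.orgId) {
    reasons.push("crosses tenant boundary");
  }
  // Tool eligibility: every required capability must be granted.
  for (const cap of intent.requiredCapabilities) {
    if (!grantedCapabilities.includes(cap)) {
      reasons.push(`missing capability: ${cap}`);
    }
  }
  // Approval requirements scale with sensitivity.
  const requiresApproval =
    intent.riskClass === "sensitive_state_change" ||
    intent.riskClass === "compliance_sensitive";
  return { allowed: reasons.length === 0, requiresApproval, reasons };
}
```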

Confidence and Ambiguity

Confidence represents how strongly the system believes it understands the user's goal.

Suggested bands:

  • high
  • medium
  • low

Ambiguity represents how much clarification is still required before safe execution.

Suggested bands:

  • none
  • resolvable
  • blocking

Rules:

  • low confidence does not always require a stop, but it should increase validation pressure
  • blocking ambiguity should prevent autonomous execution
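
The two rules above can be sketched as a small gating function. The gate names (`proceed`, `proceed_with_validation`, `clarify`) are assumptions; the band names come from this document.

```typescript
// Sketch of the gating rules above: blocking ambiguity always stops
// autonomous execution; low confidence raises validation pressure
// rather than stopping outright. Gate names are assumptions.
type Gate = "proceed" | "proceed_with_validation" | "clarify";

function gate(
  confidence: "high" | "medium" | "low",
  ambiguity: "none" | "resolvable" | "blocking"
): Gate {
  if (ambiguity === "blocking") return "clarify";
  if (confidence === "low" || ambiguity === "resolvable") {
    return "proceed_with_validation";
  }
  return "proceed";
}
```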

Risk Classification

Every structured intent should carry an execution sensitivity classification.

Suggested classes:

  • informational
  • recommendation
  • draft_only
  • state_change
  • sensitive_state_change
  • compliance_sensitive

Examples:

  • asking a question -> informational
  • suggesting the next setup step -> recommendation
  • drafting a prompt revision -> draft_only
  • changing a workspace setting -> state_change
  • deleting a channel or modifying permissions -> sensitive_state_change
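
The example mappings above can be expressed as a simple lookup. The `actionType` strings are an assumed vocabulary; defaulting unknown actions to the most restrictive class is a design choice, so unrecognized intents fail safe toward review rather than autonomous execution.

```typescript
// Sketch mapping example action types to risk classes, following the
// examples above. The actionType strings are assumptions.
const riskByAction: Record<string, string> = {
  ask_question: "informational",
  suggest_setup_step: "recommendation",
  draft_prompt_revision: "draft_only",
  change_workspace_setting: "state_change",
  delete_channel: "sensitive_state_change",
  modify_permissions: "sensitive_state_change",
};

function classifyRisk(actionType: string): string {
  // Unknown actions default to the most restrictive mapped class.
  return riskByAction[actionType] ?? "sensitive_state_change";
}
```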

Principles

  1. Intent models must remain workspace-aware and tenant-safe.
  2. UI interactions should not bypass the intent and policy model for sensitive actions.
  3. Voice inputs must preserve modality-specific metadata, even when normalized to text.
  4. The intent model should support clarification before execution.
  5. Intent contracts must remain stable enough to support analytics, recommendations, and workflow replay.

The MVP-safe path is:

  • start with text and UI interactions
  • model them as InteractionEvent
  • generate StructuredIntent for a small number of high-value workflows
  • add voice later as another input mode mapped into the same contract

Voice should be an extension of the intent model, not the first dependency of the architecture.