
Intent Model

This document defines the future canonical model for transforming voice, text, and UI interaction into structured runtime intent.

It is the bridge between raw interaction and orchestrated system behavior.


At the orchestration layer:

  • voice input
  • text input
  • UI action

should all normalize into the same conceptual contract.

Recommended equivalence:

  • raw modality-specific input -> InteractionEvent
  • interpreted objective -> StructuredIntent

This allows the runtime system to reason consistently regardless of whether a user typed, spoke, or clicked.


InteractionEvent

The normalized envelope for any inbound interaction.

Recommended fields:

  • interactionId
  • actorId
  • orgId
  • workspaceId
  • sourceType
    • voice
    • text
    • ui
  • channelType
  • rawInput
  • normalizedText
  • uiAction
  • timestamp
  • sessionId
  • conversationId
  • contextSnapshot

Notes:

  • normalizedText should exist for voice and text flows
  • UI actions may also carry semantic labels that can be normalized into text-like intent hints
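
The field list above can be sketched as a TypeScript contract. This is a minimal sketch: the field names follow the list, but the types and which fields are optional are assumptions.

```typescript
// Hypothetical shape for the normalized interaction envelope.
// Field names follow the list above; types and optionality are assumed.
type SourceType = "voice" | "text" | "ui";

interface InteractionEvent {
  interactionId: string;
  actorId: string;
  orgId: string;
  workspaceId: string;
  sourceType: SourceType;
  channelType: string;
  rawInput: string;                        // transcript, typed text, or serialized UI payload
  normalizedText?: string;                 // present for voice and text flows
  uiAction?: string;                       // semantic label for UI-originated events
  timestamp: string;                       // ISO 8601
  sessionId: string;
  conversationId?: string;
  contextSnapshot?: Record<string, unknown>;
}
```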

StructuredIntent

The actionable representation of what the user is trying to achieve.

Recommended fields:

  • intentId
  • goal
  • actionType
  • subject
  • entities
  • constraints
  • requestedOutcome
  • confidence
  • ambiguityLevel
  • riskClass
  • approvalClass
  • recommendedWorkflow
  • requiredCapabilities
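
As with the interaction envelope, this can be sketched as a TypeScript contract. The enum values mirror the ambiguity bands and risk classes defined later in this document; everything else (types, optionality, the example `actionType` strings) is an assumption.

```typescript
// Hypothetical shape for the interpreted intent.
// Enum values follow the bands and classes defined in this document.
type AmbiguityLevel = "none" | "resolvable" | "blocking";
type RiskClass =
  | "informational"
  | "recommendation"
  | "draft_only"
  | "state_change"
  | "sensitive_state_change"
  | "compliance_sensitive";

interface StructuredIntent {
  intentId: string;
  goal: string;                      // natural-language statement of the objective
  actionType: string;                // e.g. "change_workspace_setting" (assumed vocabulary)
  subject?: string;
  entities: Record<string, string>;
  constraints: string[];
  requestedOutcome?: string;
  confidence: number;                // 0..1, mapped onto high/medium/low bands
  ambiguityLevel: AmbiguityLevel;
  riskClass: RiskClass;
  approvalClass?: string;
  recommendedWorkflow?: string;
  requiredCapabilities: string[];
}
```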

The pipeline from raw interaction to structured intent:

```mermaid
flowchart TD
interactionEvent[InteractionEvent]
normalization[Normalization]
intentInference[IntentInference]
ambiguityCheck[AmbiguityCheck]
policyCheck[PolicyPrecheck]
structuredIntent[StructuredIntent]
interactionEvent --> normalization
normalization --> intentInference
intentInference --> ambiguityCheck
ambiguityCheck --> policyCheck
policyCheck --> structuredIntent
```

Normalization

Convert modality-specific input into a consistent structure.

Examples:

  • voice -> transcript + metadata
  • text -> cleaned text + metadata
  • UI click -> semantic action + local context + optional generated intent hint
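
The three mappings above can be sketched as one normalization function over a discriminated union. The input shapes and the generated UI intent hint are assumptions for illustration.

```typescript
// Sketch of modality-specific normalization. The raw input shapes
// and the UI intent-hint wording are assumptions.
type RawVoice = { kind: "voice"; transcript: string; durationMs: number };
type RawText = { kind: "text"; body: string };
type RawUi = { kind: "ui"; action: string; elementId: string };
type RawInteraction = RawVoice | RawText | RawUi;

function normalize(
  input: RawInteraction
): { normalizedText: string; meta: Record<string, unknown> } {
  switch (input.kind) {
    case "voice":
      // voice -> transcript + metadata
      return {
        normalizedText: input.transcript.trim(),
        meta: { durationMs: input.durationMs },
      };
    case "text":
      // text -> cleaned text + metadata
      return {
        normalizedText: input.body.trim().replace(/\s+/g, " "),
        meta: {},
      };
    case "ui":
      // UI click -> semantic action + local context + generated intent hint
      return {
        normalizedText: `user invoked ${input.action}`,
        meta: { elementId: input.elementId },
      };
  }
}
```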

Intent Inference

Infer:

  • what the user wants
  • which entities are involved
  • whether the system likely can fulfill it

Ambiguity Check

Determine whether the runtime can proceed safely or must clarify.

Common ambiguity sources:

  • missing entity
  • missing target workspace
  • conflicting instructions
  • low confidence
  • policy-sensitive action
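
One way to fold these sources into the ambiguity bands defined later in this document is sketched below. The signal names, the 0.5 confidence threshold, and which sources are treated as blocking versus resolvable are all assumptions.

```typescript
// Sketch of an ambiguity check over the sources listed above.
// Signal names and thresholds are assumptions.
interface AmbiguitySignals {
  missingEntity: boolean;
  missingWorkspace: boolean;
  conflictingInstructions: boolean;
  confidence: number; // 0..1
  policySensitive: boolean;
}

function ambiguityBand(
  s: AmbiguitySignals
): "none" | "resolvable" | "blocking" {
  // Conflicting instructions or a missing target workspace are treated
  // as blocking: autonomous execution cannot proceed.
  if (s.conflictingInstructions || s.missingWorkspace) return "blocking";
  // A missing entity, low confidence, or a policy-sensitive action can
  // usually be resolved with a clarifying question.
  if (s.missingEntity || s.confidence < 0.5 || s.policySensitive)
    return "resolvable";
  return "none";
}
```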

Policy Precheck

Before orchestration proceeds, intent must be classified for:

  • access scope
  • sensitivity
  • approval requirements
  • tool eligibility
  • tenant boundary risk
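
A precheck covering the tenant-boundary, capability, and approval dimensions above could look like the sketch below. The `PolicyDecision` shape, the `targetOrgId` field, and tying approval to the two most sensitive risk classes are assumptions.

```typescript
// Sketch of a policy precheck. The decision shape, the targetOrgId
// field, and the approval rule are assumptions.
interface PolicyDecision {
  allowed: boolean;
  requiresApproval: boolean;
  reasons: string[];
}

function policyPrecheck(
  intent: {
    riskClass: string;
    orgId: string;
    targetOrgId: string;
    requiredCapabilities: string[];
  },
  grantedCapabilities: string[]
): PolicyDecision {
  const reasons: string[] = [];
  // Tenant boundary risk: the intent must stay inside the caller's org.
  if (intent.targetOrgId !== intent.orgId) {
    reasons.push("crosses tenant boundary");
  }
  // Tool eligibility: every required capability must be granted.
  for (const cap of intent.requiredCapabilities) {
    if (!grantedCapabilities.includes(cap)) {
      reasons.push(`missing capability: ${cap}`);
    }
  }
  // Approval requirements scale with sensitivity.
  const requiresApproval =
    intent.riskClass === "sensitive_state_change" ||
    intent.riskClass === "compliance_sensitive";
  return { allowed: reasons.length === 0, requiresApproval, reasons };
}
```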

Confidence and Ambiguity

Confidence represents how strongly the system believes it understands the user's goal.

Suggested bands:

  • high
  • medium
  • low

Ambiguity represents how much clarification is still required before safe execution.

Suggested bands:

  • none
  • resolvable
  • blocking

Rules:

  • low confidence does not always require a stop, but it should increase validation pressure
  • blocking ambiguity should prevent autonomous execution
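
The two rules above can be sketched as a small gating function. The gate names (`proceed`, `proceed_with_validation`, `clarify`) are assumptions; the band names come from this document.

```typescript
// Sketch of the gating rules above: blocking ambiguity always stops
// autonomous execution; low confidence raises validation pressure
// rather than stopping outright. Gate names are assumptions.
type Gate = "proceed" | "proceed_with_validation" | "clarify";

function gate(
  confidence: "high" | "medium" | "low",
  ambiguity: "none" | "resolvable" | "blocking"
): Gate {
  if (ambiguity === "blocking") return "clarify";
  if (confidence === "low" || ambiguity === "resolvable") {
    return "proceed_with_validation";
  }
  return "proceed";
}
```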

Risk Classification

Every structured intent should carry an execution sensitivity classification.

Suggested classes:

  • informational
  • recommendation
  • draft_only
  • state_change
  • sensitive_state_change
  • compliance_sensitive

Examples:

  • asking a question -> informational
  • suggesting the next setup step -> recommendation
  • drafting a prompt revision -> draft_only
  • changing a workspace setting -> state_change
  • deleting a channel or modifying permissions -> sensitive_state_change
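
The example mappings above can be expressed as a simple lookup. The `actionType` strings are an assumed vocabulary; defaulting unknown actions to the most restrictive class is a design choice, so unrecognized intents fail safe toward review rather than autonomous execution.

```typescript
// Sketch mapping example action types to risk classes, following the
// examples above. The actionType strings are assumptions.
const riskByAction: Record<string, string> = {
  ask_question: "informational",
  suggest_setup_step: "recommendation",
  draft_prompt_revision: "draft_only",
  change_workspace_setting: "state_change",
  delete_channel: "sensitive_state_change",
  modify_permissions: "sensitive_state_change",
};

function classifyRisk(actionType: string): string {
  // Unknown actions default to the most restrictive mapped class.
  return riskByAction[actionType] ?? "sensitive_state_change";
}
```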

Principles

  1. Intent models must remain workspace-aware and tenant-safe.
  2. UI interactions should not bypass the intent and policy model for sensitive actions.
  3. Voice inputs must preserve modality-specific metadata, even when normalized to text.
  4. The intent model should support clarification before execution.
  5. Intent contracts must remain stable enough to support analytics, recommendations, and workflow replay.

The MVP-safe path is:

  • start with text and UI interactions
  • model them as InteractionEvent
  • generate StructuredIntent for a small number of high-value workflows
  • add voice later as another input mode mapped into the same contract

Voice should be an extension of the intent model, not the first dependency of the architecture.