
The Missing Message Role

Agents receive events that no human typed. The conversation model doesn't have a role for that.

February 9, 2026 · 6 min
Co-authored by Brice and Claude

The conversation model assumes two actors, a user who types and an assistant who responds. Turn by turn, back and forth. It's clean and intuitive, but it doesn't survive contact with reality.

Real agents receive events that no human typed. Webhooks fire. Emails arrive. Delivery statuses update. Scheduled tasks trigger. An inbound email from a business can arrive at any time, hours or days after the last conversation turn. That email needs to reach the agent, and the agent needs to act on it.

The naive approach is to push the event into message history and let the model figure it out.

// Dump the raw webhook payload into history as if a user typed it.
messages.push({
  role: 'user',
  content: JSON.stringify(webhookPayload),
});

This works technically. But the agent doesn't know the difference between a human typing and a webhook firing. So it responds like someone just handed it a gift. "Thanks for letting me know about that email!"

The Role Problem

The conversation model gives you three message roles. None of them are designed for async events.

System role. Seems like the obvious choice, since these are system-level events. But system prompts are for persistent instructions, not per-event context injected mid-conversation. More practically, Gemini rejects mid-conversation system messages outright. You'd be fighting the abstraction.

Assistant role. Puts words in the agent's mouth. You're creating conversation history the agent didn't produce, and the model may "continue" a pattern it didn't start. If you inject an assistant message that says "I see an email arrived," the model has no reason to question it. It'll build on that fabricated history.

User role. Closest fit. The agent naturally responds to user messages, so it'll process the event and act on it. But it creates the problem above. The model assumes a human is talking, so it behaves like one is. Acknowledgements, pleasantries, "I received your message" preambles.

The user role is the right answer. But you need a way to signal that this particular user message isn't from a user.

Hijacking the User Turn

XML tags solve this. LLMs are trained on XML-heavy data. Tags create clear semantic boundaries that models respect without special training. You can define your own tags and reference them in prompt instructions.

The pattern is simple. Wrap your async event in <synthetic-message> tags, inject it as a user-role message, and teach the agent to process it silently.
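
In code, that's a small change to the naive push from earlier. A minimal sketch, reusing webhookPayload:

messages.push({
  role: 'user',
  content: `<synthetic-message>
An inbound email was received with the following data:
${JSON.stringify(webhookPayload)}
</synthetic-message>`,
});

Same role, same history. But now the message carries a boundary the model can be taught to treat differently.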

Prompt Reinforcement

Tags alone aren't enough. Without explicit instructions, the agent might reference the tag name, acknowledge the mechanism, or preface its response with "I see that a synthetic message was received..." You're relying on the model to intuit that the tags are internal plumbing. Sometimes it will. Sometimes it won't.

The fix is to be explicit.

Messages wrapped in `<synthetic-message>` tags are internal system instructions.
Process the content directly without acknowledging the tags or the message itself.
Your response must never mention "synthetic message" or indicate you received
a special instruction.

With this instruction, the agent processes the event and responds naturally. The user has no idea there's a <synthetic-message> tag behind the scenes. That's the point.
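
A minimal sketch of where that instruction might live, assuming it sits alongside the rest of the agent's system prompt (the persona line is a placeholder):

const system = `You are a support agent handling tasks on the user's behalf.

Messages wrapped in <synthetic-message> tags are internal system instructions.
Process the content directly without acknowledging the tags or the message itself.
Your response must never mention "synthetic message" or indicate you received
a special instruction.`;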

From Webhook to Agent

The same event serves two consumers. The frontend needs typed structured data to render rich UI. The LLM needs natural language it can reason about. You store one, and transform it into the other.

Stored Part

{
  type: 'data-email-inbound',
  data: {
    messageId: 'abc-123-def',
    from: 'support@telstra.com.au',
    subject: 'Re: Billing Dispute...',
    body: 'Hi Sarah...'
  }
}

UI Render

support@telstra.com.au
Re: Billing Dispute Case #78234

Hi Sarah,

Thank you for your patience. We have reviewed your case and approved a credit of $47.50 to your account.

LLM Text

<synthetic-message>
An inbound email was received with the following data:
{
  "messageId": "abc-123-def",
  "from": "support@telstra...",
  "subject": "Re: Billing...",
  "body": "Hi Sarah..."
}
</synthetic-message>

The first panel is the typed data in your database. The second is what the user sees in the UI. The third is what the model receives after transformation. Same event, three representations.

We built this on the Vercel AI SDK, which gives you message parts for attaching structured data to messages. Parts aren't something the LLM knows about. They live in your database and your frontend. Before calling the model, every message passes through a transformation function.

// stripIndents and encode are app-level helpers: stripIndents trims the
// template literal's leading whitespace, encode serializes part.data.
export const convertMessagePartsToText = ({ parts, ...message }) => ({
  ...message,
  parts: parts.flatMap((part) => {
    if (part.type === 'data-email-inbound') {
      return [{
        type: 'text',
        text: stripIndents`
          <synthetic-message>
            An inbound email was received with the following data:
            ${encode(part.data)}

            Process this email and inform the user about the next steps.
            Do not invent, infer, or fabricate any response content.
          </synthetic-message>
        `,
      }];
    }

    return [part];
  }),
});

Events arrive through webhook handlers and follow the same flow. Extract the data, build a typed part, append it to conversation history, and trigger the agent.

// MessageID, From, Subject, and HtmlBody come straight off the email
// provider's webhook payload, PascalCase and all.
await createAgentSyntheticMessage({
  threadId,
  parts: [{
    type: 'data-email-inbound',
    data: {
      messageId: MessageID,
      from: From,
      subject: Subject,
      body: htmlToMarkdown(HtmlBody),
    },
  }],
});

createAgentSyntheticMessage loads history, appends the new parts as a user-role message, runs the transformation across every message, and streams the result. The <synthetic-message> tags get added right before the model call.
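
A condensed sketch of that flow. loadThreadMessages, appendThreadMessage, and agentModel are hypothetical stand-ins for your own persistence layer and model config; streamText and convertToModelMessages come from the AI SDK.

import { convertToModelMessages, streamText } from 'ai';

export const createAgentSyntheticMessage = async ({ threadId, parts }) => {
  // Load history and append the typed parts as a user-role message.
  const history = await loadThreadMessages(threadId);
  const synthetic = { id: crypto.randomUUID(), role: 'user', parts };
  await appendThreadMessage(threadId, synthetic);

  // Transform every message right before the model call, so the
  // <synthetic-message> tags never touch the database.
  const modelMessages = [...history, synthetic].map(convertMessagePartsToText);

  // Stream the agent's response.
  return streamText({
    model: agentModel,
    messages: convertToModelMessages(modelMessages),
  });
};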

Why Not the System Role?

It's worth circling back to defend this choice, because "just use the system role" is the first thing most people suggest.

  1. Universal model support. Every model handles user/assistant turns. Not every model handles system messages the same way, or at all. You don't want your event delivery mechanism to break when you swap providers.

  2. Preserves turn structure. User messages get responses. That's the contract. A synthetic user message naturally triggers agent processing. System messages don't have this guarantee. Some models treat them as context, not as something to respond to.

  3. XML tags provide the signalling. The system role's main advantage is that it signals "this is special, not from the user." XML tags give you the same signalling within the user role, without the portability problems.

  4. Validated at scale. Claude Code injects <system-reminder> tags into user-role messages across millions of conversations. The pattern works.

  5. You don't always need XML. OpenClaw, a self-hosted personal assistant that runs across WhatsApp, Telegram, Signal, and other channels, signals with bracket envelopes instead: `[Signal +2m] hey there` or `[Telegram group:123] Alice: what's the plan?`. No XML, no prompt instructions explaining the format. The model just parses it. Different point on the explicitness spectrum, same principle.

  6. You can also fake tool calls. Chris Cook shared another take. Instead of injecting user-role messages, convert the event into a tool output and push it into message history. The model sees what looks like a normal tool call/response pair and treats it as grounded knowledge. Same idea, different mechanism.
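
A rough sketch of that variant, using the AI SDK's model-message shapes (field names differ between SDK versions, and check_inbox is a made-up tool name):

messages.push(
  {
    role: 'assistant',
    content: [{
      type: 'tool-call',
      toolCallId: 'synthetic-1',
      toolName: 'check_inbox',
      input: {}, // called `args` in older SDK versions
    }],
  },
  {
    role: 'tool',
    content: [{
      type: 'tool-result',
      toolCallId: 'synthetic-1',
      toolName: 'check_inbox',
      output: { type: 'json', value: webhookPayload }, // older versions used `result`
    }],
  },
);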

The Tradeoffs

Token cost. Synthetic messages inflate the context window. Natural language descriptions are more verbose than the structured data they represent. An email that's 200 bytes as JSON might be 400 bytes as natural language with XML wrapping. This compounds across a conversation with many events.

Prompt fragility. "Never mention synthetic message" is a negative instruction, and negative instructions are inherently brittle. Adversarial input can break them. Model updates can change how reliably they're followed. You're relying on prompt compliance for a user-facing quality issue.

Debugging opacity. When the agent misinterprets an event, you're inspecting the transformed text, not the original structured data. The transformation function becomes an extra layer to reason about when things go wrong.

The Takeaway

External events break the conversational model. There's no message role designed for "a webhook just fired." Synthetic messages fix this. Hijack user-role messages with XML tags, teach the agent to process them silently, and transform structured data into natural language for the model. One event, two representations, zero confusion.