TL;DR: MCP servers can return embedded ui:// resources with ChatKit widget payloads. Your Rails controller extracts these from tool result metadata and streams them as thread.item.done events. The widget JSON is hydrated server-side before hitting the browser.

If you’re trying to get OpenAI’s ChatKit UI working with MCP tools in a Ruby on Rails backend, including streaming rich widgets like weather cards over SSE, this post walks through the full architecture and example code.


Quick Background

ChatKit is OpenAI’s UI SDK for building chat interfaces: threads, streaming, file uploads, widgets. Check out the Widget Gallery to see what’s possible.

*ChatKit Widget Gallery showing various widget types including calendars, weather, playlists, and forms*

MCP (Model Context Protocol) lets tools and backends communicate with LLMs in a consistent way, including returning rich content like UI payloads.

In this setup: ChatKit runs in the browser, Rails is the backend, and MCP servers provide tool results (like weather data) plus optional widget JSON.

This is part of what I’ve been exploring around AI-native app design: how control transfers between user and agent, and how the interface adapts to where you are in that flow. Widgets are one piece of that puzzle. When an agent fetches data, the user should see rich UI, not a wall of text.

What You’ll Learn

By the end, you’ll know how to:

  • Connect OpenAI ChatKit to a Rails backend via SSE streaming
  • Return MCP tool results with embedded ui:// widget resources
  • Extract and hydrate widget JSON in a Rails controller
  • Stream thread.item.done events so ChatKit renders rich widgets instead of plain text

Prerequisites

To follow along, you'll want:

  • A Rails app (the streaming code uses ActionController::Live)
  • ChatKit.js set up in your frontend
  • At least one MCP server exposing tools (the examples use a weather server)

The Problem

User asks “What’s the weather in Toronto?” I want them to see a formatted card with temperature, conditions, and an icon. Not just: “Current weather in Toronto: 5°C, Cloudy”

The pieces exist, but they don’t naturally connect yet. ChatKit renders widgets (Card, Text, Markdown components). MCP servers return rich content via embedded resources. LLMs call tools and pass results back.

The gap: MCP returns ui:// URIs with specific MIME types. ChatKit expects a particular JSON structure streamed via SSE. My Rails controller translates between them.
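
For orientation, here is roughly the event shape ChatKit expects for a widget item. The field names come from the controller code later in this post; the values are illustrative:

```json
{
  "type": "thread.item.done",
  "item": {
    "id": "123",
    "type": "widget",
    "thread_id": "42",
    "created_at": "2025-01-15T12:00:00Z",
    "copy_text": "Toronto,CA: 5°C, Cloudy",
    "widget": {
      "type": "Card",
      "children": [{ "type": "Text", "value": "Toronto,CA" }]
    }
  }
}
```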

Architecture: Server-Side vs Client-Side Hydration

Here’s the end-to-end flow when a user asks about the weather:

  1. User sends message via ChatKit in the browser
  2. ChatKit POSTs to Rails controller, opens SSE connection
  3. Rails creates the user message, streams acknowledgment
  4. Rails calls the LLM, which returns a tool_call for weather
  5. Rails invokes the MCP weather server
  6. MCP server returns text + embedded ui:// widget resource
  7. Rails stores the tool result in message metadata
  8. LLM returns final response
  9. Rails extracts the widget from metadata, transforms to ChatKit format
  10. Rails streams thread.item.done with the widget JSON
  11. ChatKit renders the Card component in the browser

```mermaid
sequenceDiagram
    participant Browser
    participant CK as ChatKit.js
    participant CC as ChatkitController
    participant Task
    participant LLM
    participant MCP as MCP Server
    Browser->>CK: User sends message
    CK->>CC: POST /chatkit (SSE)
    CC->>Task: Create user message
    CC-->>CK: SSE: thread.item.done (user)
    CC->>Task: process_llm_response
    Task->>LLM: Chat completion request
    LLM-->>Task: tool_call: get_current_weather
    Task->>MCP: Execute tool
    MCP-->>Task: Result with ui:// resource
    Task->>Task: Store in message.metadata
    LLM-->>Task: Final response
    CC->>CC: Extract widget from metadata
    CC-->>CK: SSE: thread.item.done (widget)
    CK->>Browser: Render Card component
```

The key decision: widget hydration happens server-side. Tools return widget structures, Rails assembles the JSON, SSE delivers it. This matches ChatKit’s Python SDK approach. The difference is extracting from MCP’s embedded resource format rather than using stream_widget.

Why server-side? OpenAI’s Apps SDK takes a different approach: the client (ChatGPT) handles fetching and caching resources. The MCP server returns URIs, and the client renders.

The tradeoffs:

|             | Server-side (ChatKit)      | Client-side (Apps SDK)       |
| ----------- | -------------------------- | ---------------------------- |
| Control     | Full control over rendering | Less control, client decides |
| Payload     | Fully hydrated JSON        | Just URIs, lighter payloads  |
| Caching     | Server responsibility      | Client handles caching       |
| Integration | Works with any client      | ChatGPT-specific             |
| Flexibility | Use any LLM provider       | OpenAI ecosystem             |

For a Rails app where I want the same widgets to render in a web UI I control, server-side hydration makes sense. If I were building primarily for ChatGPT, the Apps SDK approach might be cleaner.

With the architecture defined, let’s look at how each part maps to concrete Rails and MCP code.

Implementation in Rails

MCP Server: Returning Widget Resources

MCP tool results can include embedded resources with arbitrary MIME types. I bundle widget payloads alongside plain text so tools stay usable both for LLM context and for non-widget clients:

```ruby
# app/mcp_servers/weather_mcp_server/server.rb
def get_current_weather(location:, unit: "celsius")
  weather_data = openweather_client.current_weather(location: location, unit: unit)

  text = "Current weather in #{location}: " \
         "#{weather_data[:temperature]}#{weather_data[:unit_symbol]}, " \
         "#{weather_data[:condition]}"

  WidgetTemplateService.hydrate_for_tool_result(
    template: :weatherCurrent,
    data: weather_widget_data(weather_data: weather_data),
    text: text,
    tool_name: "weather"
  )
end
```

This returns MCP-compliant content:

```ruby
{
  content: [
    { type: "text", text: "Current weather in Toronto,CA: 5°C, Cloudy" },
    {
      type: "resource",
      resource: {
        uri: "ui://widgets/weather/#{uuid}",
        mimeType: "application/vnd.ui.widget+json",
        text: widget_payload.to_json
      }
    }
  ],
  isError: false
}
```

Conventions:

  • ui:// URI scheme identifies UI resources
  • application/vnd.ui.widget+json MIME type tells the controller this is a ChatKit widget
  • Plain text comes first so non-widget clients and LLM context still work
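
For reference, the resource's text field carries the hydrated payload as a JSON string. Decoded, it looks roughly like this; the widget and copy_text keys are what the controller reads, and the card structure itself comes from your template:

```json
{
  "widget": {
    "type": "Card",
    "children": [
      { "type": "Text", "value": "Toronto,CA", "weight": "semibold" },
      { "type": "Text", "value": "5°C", "size": "4xl" }
    ]
  },
  "copy_text": "Toronto,CA: 5°C, Cloudy"
}
```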

Rails Controller: Extracting Widgets

ChatKit uses a single POST endpoint. The interesting part is detecting widget resources in tool results and transforming them:

```ruby
# app/controllers/chatkit_controller.rb
class ChatkitController < ApplicationController
  include ActionController::Live

  UI_WIDGET_MIME_TYPE = "application/vnd.ui.widget+json"
  UI_RESOURCE_URI_PREFIX = "ui://"

  private

  # Map a Task message to one or more ChatKit thread items
  def chatkit_items_for_message(task, message)
    case message.role
    when "user"
      [chatkit_user_item(task, message)]
    when "assistant"
      [chatkit_assistant_item(task, message)]
    when "tool_result"
      widget_item = embedded_widget_item(task, message)
      widget_item ? [widget_item] : []
    else
      []
    end
  end

  # Build a ChatKit widget item from an MCP embedded resource
  def embedded_widget_item(task, message)
    widget_resource = extract_widget_resource(message)
    return unless widget_resource

    widget_payload = parse_widget_payload(widget_resource)
    return unless widget_payload

    {
      id: message.id.to_s,
      type: "widget",
      thread_id: task.id.to_s,
      created_at: message.created_at.iso8601,
      copy_text: widget_payload[:copy_text],
      widget: widget_payload[:widget]
    }
  end

  # Find the first ui:// resource with the widget MIME type
  def extract_widget_resource(message)
    mcp_content = message.metadata&.dig("mcp_content")
    return unless mcp_content.is_a?(Array)

    mcp_content.find do |item|
      next unless item["type"] == "resource"

      resource = item["resource"]
      next unless resource.is_a?(Hash) # guard against malformed entries

      resource["uri"]&.start_with?(UI_RESOURCE_URI_PREFIX) &&
        resource["mimeType"] == UI_WIDGET_MIME_TYPE
    end
  end

  def parse_widget_payload(resource_item)
    content = resource_item.dig("resource", "text")
    return unless content.present?

    payload = JSON.parse(content)
    { widget: payload["widget"], copy_text: payload["copy_text"] }
  rescue JSON::ParserError
    nil
  end
end
```

This extraction is generic: any MCP server that returns resources with the right URI prefix and MIME type automatically gets its widgets rendered. A calendar tool, a task manager, a stock ticker: same pattern, different widget templates.
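
As a sketch of that reuse: a hypothetical calendar tool could return the same content shape through the same service, with nothing new on the controller side. The calendarEvent template and calendar_client here are assumptions for illustration:

```ruby
# app/mcp_servers/calendar_mcp_server/server.rb (hypothetical)
def get_next_event(calendar_id:)
  event = calendar_client.next_event(calendar_id: calendar_id)

  WidgetTemplateService.hydrate_for_tool_result(
    template: :calendarEvent, # assumed template name
    data: { title: event[:title], time: event[:starts_at] },
    text: "Next event: #{event[:title]} at #{event[:starts_at]}",
    tool_name: "calendar"
  )
end
```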

SSE Streaming

When streaming responses, tool results with widgets become thread.item.done events. This is simplified for clarity; production code should handle timeouts and connection errors:

```ruby
# app/controllers/chatkit_controller.rb
def stream_user_message(req)
  task = task_for(req.dig("params", "thread_id"))
  prepare_stream_headers

  user_message = task.messages.create!(content: extract_content(req), role: "user")
  write_event(stream_options_event)
  thread_item_done_events(task, user_message).each { |e| write_event(e) }

  seen_ids = task.messages.pluck(:id)
  task.process_llm_response # May invoke MCP tools

  task.messages.where.not(id: seen_ids).order(:created_at).each do |message|
    thread_item_done_events(task, message).each { |e| write_event(e) }
  end
ensure
  response.stream.close
end

def thread_item_done_events(task, message)
  chatkit_items_for_message(task, message).map do |item|
    { type: "thread.item.done", item: item }
  end
end

def write_event(payload)
  response.stream.write("data: #{payload.to_json}\n\n")
end
```

Widget Templating

For multiple widget types, I separate structure from data:

```mermaid
flowchart LR
    Tool[Tool Handler] -->|data| Service[WidgetTemplateService]
    Template[.widget file] -->|structure| Service
    Service -->|hydrated JSON| Result[MCP Tool Result]
```

Templates live in config/ui_widget_templates/*.widget:

```json
{
  "template": {
    "type": "Card",
    "style": { "background": "{{ background }}" },
    "children": [
      { "type": "Text", "value": "{{ location }}", "weight": "semibold" },
      { "type": "Text", "value": "{{ temperature }}", "size": "4xl" }
    ]
  },
  "copy_text": "{{ location }}: {{ temperature }}"
}
```

The service handles hydration with {{ variable }} substitution.
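
The service itself isn't shown in this post, but a minimal version of the substitution step could look like the sketch below. This is an assumed implementation; the real service also wraps the result into the MCP content array via hydrate_for_tool_result:

```ruby
# Minimal {{ variable }} hydration sketch (assumed implementation)
class WidgetTemplateService
  TEMPLATE_DIR = Rails.root.join("config/ui_widget_templates")

  # Read the template file, substitute values, parse back to a Hash.
  # Assumes values are plain strings that are safe to inline into JSON.
  def self.hydrate(template:, data:)
    raw = File.read(TEMPLATE_DIR.join("#{template}.widget"))
    hydrated = raw.gsub(/\{\{\s*(\w+)\s*\}\}/) do
      data.fetch(Regexp.last_match(1).to_sym, "").to_s
    end
    JSON.parse(hydrated)
  end
end
```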

For simpler cases, BaseMCPServer includes helpers:

```ruby
# Build inline widgets without templates
def tool_result_with_widget(text:, widget:, copy_text: nil, tool_name: nil)
  # Returns MCP-compliant content array with embedded ui:// resource
end

def ui_card(children:)
  { type: "Card", children: children }
end

def ui_text(value, **options)
  { type: "Text", value: value }.merge(options)
end
```
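
The body of tool_result_with_widget is elided above. A possible implementation, matching the content shape shown earlier, might look like this (an assumed sketch, not the actual source):

```ruby
# Sketch of tool_result_with_widget (assumed implementation)
def tool_result_with_widget(text:, widget:, copy_text: nil, tool_name: nil)
  {
    content: [
      # Plain text first, so non-widget clients and LLM context still work
      { type: "text", text: text },
      {
        type: "resource",
        resource: {
          uri: "ui://widgets/#{tool_name || 'tool'}/#{SecureRandom.uuid}",
          mimeType: "application/vnd.ui.widget+json",
          text: { widget: widget, copy_text: copy_text || text }.to_json
        }
      }
    ],
    isError: false
  }
end
```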

Troubleshooting

Multiple Tool Calls Cause Infinite Loops

LLMs often return multiple tool calls in parallel. If you only process the first one, the LLM keeps retrying and you get infinite loops:

```ruby
def handle_response(response)
  if response.tool_calls.present?
    # Process ALL tool calls, not just [0]
    response.tool_calls.each { |tc| invoke_tool(tc) }
    call_llm_api
  else
    create_assistant_message(response.content)
  end
end
```

Widgets Not Rendering?

Check these common issues:

  • Wrong MIME type: Must be exactly application/vnd.ui.widget+json
  • Missing URI prefix: The uri field must start with ui://
  • Malformed JSON: Use JSON.parse with rescue to handle parse errors gracefully
  • Widget not in mcp_content: Verify the tool result is stored in message.metadata["mcp_content"]
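
A quick way to verify that last point from a Rails console (assuming the Task has_many :messages setup used throughout this post):

```ruby
# Inspect what the most recent tool_result message actually stored
message = Task.last.messages.where(role: "tool_result").last
Array(message.metadata["mcp_content"]).each do |item|
  puts({
    type: item["type"],
    uri: item.dig("resource", "uri"),
    mimeType: item.dig("resource", "mimeType")
  }.inspect)
end
```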

Why Self-Host?

OpenAI offers hosted ChatKit via Agent Builder. I went self-hosted because:

  • Different LLM providers (Anthropic, Gemini, not just OpenAI)
  • Existing Rails models and authentication
  • Full control over conversation data
  • Flexible MCP tool integration

The cost: implementing the protocol yourself. The Python SDK handles thread persistence, SSE streaming, and widget rendering. In Rails, you build from the protocol docs.

Summary

```mermaid
flowchart TB
    subgraph Browser
        CK[ChatKit.js]
    end
    subgraph Rails
        CC[ChatkitController]
        Task[Task Model]
    end
    subgraph MCP
        Server[MCP Server]
        Templates[Widget Templates]
    end
    CK <-->|SSE| CC
    CC --> Task
    Task -->|tool_call| Server
    Server --> Templates
    Server -->|ui:// resource| Task
    CC -->|extract & stream| CK
```

The code is mostly plumbing. The real work is understanding the formats each layer expects and drawing clean boundaries between them.

If you’re building this:

  1. Get basic ChatKit messaging working first
  2. Add tool calling, return plain text results
  3. Once that works, add widget resources to tool results
  4. Build extraction logic in your controller

Don’t write it all by hand. Describe the architecture, write specs, let AI agents implement. That’s how this got built.

Next Steps

  • Add more widget templates (calendar events, task lists, search results)
  • Experiment with interactive widgets (buttons that trigger server actions)
  • Try different LLM providers behind the same ChatKit frontend

FAQ

Can I use this approach with non-OpenAI LLMs?

Yes. The key is that your Rails backend speaks the ChatKit protocol and your MCP servers return widget resources. The LLM provider itself can be OpenAI, Anthropic, Gemini, or any other.

Do I have to use MCP to get widgets in ChatKit?

No. You can generate widget JSON directly in Rails. MCP just gives you a consistent way to structure tool results across services.
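
As a sketch, assuming the ui_card/ui_text helpers are extracted somewhere the controller can reach, you could stream a widget item without any MCP round trip:

```ruby
# Building and streaming a widget directly from Rails (no MCP involved)
widget = ui_card(children: [
  ui_text("Toronto,CA", weight: "semibold"),
  ui_text("5°C", size: "4xl")
])

write_event(
  type: "thread.item.done",
  item: {
    id: SecureRandom.uuid,
    type: "widget",
    thread_id: task.id.to_s,
    created_at: Time.current.iso8601,
    copy_text: "Toronto,CA: 5°C",
    widget: widget
  }
)
```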

Is SSE required or can I use WebSockets?

ChatKit expects SSE-style streaming. You could adapt these ideas for WebSockets, but the example here is built around SSE.

Further Reading