<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Docs - Agenta Blog</title>
        <link>https://agenta.ai/docs/changelog</link>
        <description>Docs - Agenta Blog</description>
        <lastBuildDate>Wed, 01 Apr 2026 10:13:06 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[Changelog]]></title>
            <link>https://agenta.ai/docs/changelog/main</link>
            <guid>https://agenta.ai/docs/changelog/main</guid>
            <pubDate>Wed, 01 Apr 2026 10:13:06 GMT</pubDate>
            <description><![CDATA[Webhooks and GitHub Automations for Prompt Deployments]]></description>
            <content:encoded><![CDATA[<section class="changelog"><h3 class="anchor anchorTargetStickyNavbar_vHny" id="webhooks-and-github-automations-for-prompt-deployments"><a class="" href="https://agenta.ai/docs/changelog/deployment-webhooks-and-github-automations">Webhooks and GitHub Automations for Prompt Deployments</a><a href="https://agenta.ai/docs/changelog/main#webhooks-and-github-automations-for-prompt-deployments" class="hash-link" aria-label="Direct link to webhooks-and-github-automations-for-prompt-deployments" title="Direct link to webhooks-and-github-automations-for-prompt-deployments" translate="no">​</a></h3><p><em>11 March 2026</em></p><p><strong>v0.94.0</strong></p><p>You can now trigger automations when a prompt deployment happens in Agenta. Send the event to any HTTPS endpoint, or call GitHub directly with <code>repository_dispatch</code> or <code>workflow_dispatch</code>.</p><p>This makes it easier to connect prompt deployments to CI, repository sync jobs, and pull request workflows. 
If your GitHub workflow needs the latest prompt content, fetch it from Agenta during the run and commit the result back to your repo.</p><p>Learn more: <a class="" href="https://agenta.ai/docs/prompt-engineering/integrating-prompts/webhooks">Webhooks</a> | <a class="" href="https://agenta.ai/docs/prompt-engineering/integrating-prompts/github">GitHub Automations</a></p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="tool-integrations-in-the-playground"><a class="" href="https://agenta.ai/docs/changelog/tool-integrations">Tool Integrations in the Playground</a><a href="https://agenta.ai/docs/changelog/main#tool-integrations-in-the-playground" class="hash-link" aria-label="Direct link to tool-integrations-in-the-playground" title="Direct link to tool-integrations-in-the-playground" translate="no">​</a></h3><p><em>27 February 2026</em></p><p><strong>v0.87.0</strong></p><div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="420" src="https://www.youtube.com/embed/nEbwJhdTQds" title="Tool Integrations in the Playground" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div><p>You can now connect 150+ external tools to your prompts directly from the playground. Browse integrations like Gmail, Slack, Notion, Google Sheets, and GitHub. Authenticate with OAuth, attach tool actions to your prompt, and execute tool calls with one click. 
Use Google Sheets or Notion as data sources for RAG, send emails from your prompt, or automate developer workflows.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="ai-powered-prompt-refinement-in-the-playground"><a class="" href="https://agenta.ai/docs/changelog/refine-ai">AI-Powered Prompt Refinement in the Playground</a><a href="https://agenta.ai/docs/changelog/main#ai-powered-prompt-refinement-in-the-playground" class="hash-link" aria-label="Direct link to ai-powered-prompt-refinement-in-the-playground" title="Direct link to ai-powered-prompt-refinement-in-the-playground" translate="no">​</a></h3><p><em>25 February 2026</em></p><p><strong>v0.84.0</strong></p><div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="420" src="https://www.youtube.com/embed/V2hHC8hZEeE" title="AI-Powered Prompt Refinement" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div><p>You can now refine prompts with AI directly in the playground. Click the wand icon on any prompt, describe what you want to improve in plain English, and get back a refined version with a summary of changes. Each refinement builds on the last, so you can iterate. 
Toggle diff view to see exactly what changed, edit the result before applying, or use the quick "Optimize using best practices" shortcut.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="enterprise-compliance-features"><a class="" href="https://agenta.ai/docs/changelog/enterprise-compliance-features">Enterprise Compliance Features</a><a href="https://agenta.ai/docs/changelog/main#enterprise-compliance-features" class="hash-link" aria-label="Direct link to enterprise-compliance-features" title="Direct link to enterprise-compliance-features" translate="no">​</a></h3><p><em>17 February 2026</em></p><p><strong>v0.83.0</strong></p><p>Agenta has new enterprise features. You can now create separate organizations for different teams or clients, each with its own billing, projects, and roles. We added SSO with any OIDC provider (Okta, Azure AD, Auth0, OneLogin, Google Workspace). You can enforce SSO-only login for an organization and disable password login. Domain verification lets you claim your company domain so new users with matching emails join automatically. We also launched a US region for customers who need their data to stay in the United States.</p><p>SSO and domain verification are on Business and Enterprise plans. 
The US region is on all plans.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="folders-for-prompt-organization"><a class="" href="https://agenta.ai/docs/changelog/prompt-folders">Folders for Prompt Organization</a><a href="https://agenta.ai/docs/changelog/main#folders-for-prompt-organization" class="hash-link" aria-label="Direct link to folders-for-prompt-organization" title="Direct link to folders-for-prompt-organization" translate="no">​</a></h3><p><em>4 February 2026</em></p><p><strong>v0.82.0</strong></p><div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="420" src="https://www.youtube.com/embed/2oy6ymnOq7I" title="Folders for Prompt Organization" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div><p>Prompts multiply fast when you're building agents or managing multiple use cases. Finding the right one becomes guesswork.</p><p>You can now create folders and subfolders to organize prompts. Drag prompts between folders, create nested hierarchies, and search across everything. Folder URLs are shareable.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="onboarding-widget-and-guided-walkthroughs"><a class="" href="https://agenta.ai/docs/changelog/onboarding-widget-walkthroughs">Onboarding Widget and Guided Walkthroughs</a><a href="https://agenta.ai/docs/changelog/main#onboarding-widget-and-guided-walkthroughs" class="hash-link" aria-label="Direct link to onboarding-widget-and-guided-walkthroughs" title="Direct link to onboarding-widget-and-guided-walkthroughs" translate="no">​</a></h3><p><em>29 January 2026</em></p><p><strong>v0.81.1</strong></p><p>New users now get an onboarding widget with guided walkthroughs. The widget appears in the sidebar and walks you through key features like the playground, evaluations, and observability. 
Each tour highlights UI elements as you go, so you learn by doing. Track your progress and revisit walkthroughs anytime.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="navigation-links-from-traces-to-appenvironmentvariant"><a class="" href="https://agenta.ai/docs/changelog/trace-navigation-links">Navigation Links from Traces to App/Environment/Variant</a><a href="https://agenta.ai/docs/changelog/main#navigation-links-from-traces-to-appenvironmentvariant" class="hash-link" aria-label="Direct link to navigation-links-from-traces-to-appenvironmentvariant" title="Direct link to navigation-links-from-traces-to-appenvironmentvariant" translate="no">​</a></h3><p><em>28 January 2026</em></p><p><strong>v0.81.0</strong></p><p>You can now click directly from any trace to the application, variant, or environment that generated it. Links appear in both the trace table and drawer view. This makes debugging faster since you can jump straight to the configuration that produced a specific output.</p><p>To enable navigation links, store references in your traces using the Python SDK (<code>ag.tracing.store_refs()</code>) or OpenTelemetry span attributes. See the <a class="" href="https://agenta.ai/docs/observability/trace-with-python-sdk/reference-prompt-versions">reference prompt versions guide</a> for details.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="test-set-versioning-and-new-test-set-ui"><a class="" href="https://agenta.ai/docs/changelog/testset-versioning">Test Set Versioning and New Test Set UI</a><a href="https://agenta.ai/docs/changelog/main#test-set-versioning-and-new-test-set-ui" class="hash-link" aria-label="Direct link to test-set-versioning-and-new-test-set-ui" title="Direct link to test-set-versioning-and-new-test-set-ui" translate="no">​</a></h3><p><em>20 January 2026</em></p><p><strong>v0.74.0</strong></p><p>Test sets now have versioning. Every edit, upload, or programmatic update creates a new version. 
Evaluations link to specific versions, so you can compare results knowing they used the same test data.</p><p>The test set UI is completely rebuilt. It handles hundreds of thousands of rows without slowing down. Editing is much easier, especially for chat messages. You can view and edit complex JSON directly, toggle between raw and formatted views, and choose whether columns store strings or JSON.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="playground-ux-improvements"><a class="" href="https://agenta.ai/docs/changelog/playground-ux-improvements-jan-2026">Playground UX Improvements</a><a href="https://agenta.ai/docs/changelog/main#playground-ux-improvements" class="hash-link" aria-label="Direct link to playground-ux-improvements" title="Direct link to playground-ux-improvements" translate="no">​</a></h3><p><em>13 January 2026</em></p><p><strong>v0.73.0</strong></p><p>Three quality-of-life improvements to the Playground: You can now see provider costs per million tokens directly in the model selection dropdown. You can run evaluations directly from the Playground without navigating to the evaluation menu. And you can collapse test cases to navigate large test sets more easily.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="chat-sessions-in-observability"><a class="" href="https://agenta.ai/docs/changelog/chat-sessions-observability">Chat Sessions in Observability</a><a href="https://agenta.ai/docs/changelog/main#chat-sessions-in-observability" class="hash-link" aria-label="Direct link to chat-sessions-in-observability" title="Direct link to chat-sessions-in-observability" translate="no">​</a></h3><p><em>9 January 2026</em></p><p><strong>v0.73.0</strong></p><p>You can now track multi-turn conversations with chat sessions. 
All traces with the same session ID are automatically grouped together, letting you analyze complete conversations instead of individual requests.</p><p>The new session browser shows key metrics like total cost, latency, and token usage per conversation. Open any session to see all traces with their parent-child relationships. This makes debugging chatbots and AI assistants much easier. Add session tracking with one line of code using either our Python SDK or OpenTelemetry.</p><p><strong>Minor improvements:</strong></p><ul>
<li class="">Added time filtering to the analytics dashboard. You can now view metrics for the last 6 hours, 24 hours, 7 days, or 30 days.</li>
<li class="">Added the ability to batch delete multiple traces at once. Select traces using checkboxes and delete them in a single operation.</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="json-multi-field-match-evaluator"><a class="" href="https://agenta.ai/docs/changelog/json-multi-field-match">JSON Multi-Field Match Evaluator</a><a href="https://agenta.ai/docs/changelog/main#json-multi-field-match-evaluator" class="hash-link" aria-label="Direct link to json-multi-field-match-evaluator" title="Direct link to json-multi-field-match-evaluator" translate="no">​</a></h3><p><em>31 December 2025</em></p><p><strong>v0.73.0</strong></p><p>The new JSON Multi-Field Match evaluator validates multiple fields between JSON objects. Configure any number of field paths using dot notation, JSON Path, or JSON Pointer formats. Each field gets its own score (0 or 1), and an aggregate score shows the percentage of matching fields. This evaluator is ideal for entity extraction tasks like validating extracted names, emails, and addresses. The UI automatically detects fields from your test data for quick setup. This replaces the old JSON Field Match evaluator, which only supported single fields.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="pdf-support-in-the-playground"><a class="" href="https://agenta.ai/docs/changelog/pdf-support-in-playground">PDF Support in the Playground</a><a href="https://agenta.ai/docs/changelog/main#pdf-support-in-the-playground" class="hash-link" aria-label="Direct link to pdf-support-in-the-playground" title="Direct link to pdf-support-in-the-playground" translate="no">​</a></h3><p><em>17 December 2025</em></p><p><strong>v0.69.0</strong></p><p>The Playground now supports PDF attachments for chat applications. You can attach PDFs by uploading files, providing URLs, or using file IDs from provider APIs. This works with vision-capable models and extends to evaluations and observability. 
You can now build and test document processing applications like invoice analysis or contract review.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="agenta-documentation-mcp-server"><a class="" href="https://agenta.ai/docs/changelog/mcp-server">Agenta Documentation MCP Server</a><a href="https://agenta.ai/docs/changelog/main#agenta-documentation-mcp-server" class="hash-link" aria-label="Direct link to agenta-documentation-mcp-server" title="Direct link to agenta-documentation-mcp-server" translate="no">​</a></h3><p><em>14 December 2025</em></p><p><strong>v0.68.3</strong></p><p>AI coding agents like Cursor, Claude Code, and VS Code Copilot can now access Agenta documentation directly through the Agenta MCP server. Connect your IDE to get instant answers about Agenta features, APIs, and code examples without leaving your editor. The server supports multiple clients and requires no authentication.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="provider-built-in-tools-in-the-playground"><a class="" href="https://agenta.ai/docs/changelog/provider-built-in-tools">Provider Built-in Tools in the Playground</a><a href="https://agenta.ai/docs/changelog/main#provider-built-in-tools-in-the-playground" class="hash-link" aria-label="Direct link to provider-built-in-tools-in-the-playground" title="Direct link to provider-built-in-tools-in-the-playground" translate="no">​</a></h3><p><em>11 December 2025</em></p><p><strong>v0.66.0</strong></p><div style="text-align:center;margin:20px auto;max-width:50%;width:50%"><div 
style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAALCAYAAABGbhwYAAAACXBIWXMAAAsTAAALEwEAmpwYAAABCUlEQVR4nH2Pz0rDQBDG993aR/DgTQSPPXiKuQjtwbthL+JJxIsUL176LtU0hFyWLmGSbcz+md2MbKxSKHXgm2HgN9/wMa31RFu7MM4l3vuUiNJGqfRidpOeXc6S86vrxe3d/YQhIqefwv2kXht6fF7Sw9MrvrytaPm+4gwAMh8Gsg71Hj6UjodCiIzJuuYOA9U14DAMv6bkvY8av2w2G86UUjw6RpCOC2Mry5KzGoCHgaj76jGEcBoEgDGM2u3QGHsa3G4ld+hJCPk/2Kk+i0vX99qhR+/DUeqiKDJmjB1fa2PRWEcHwf8cpZSc5Xk+/cjz+Xr9mVRVlSql0qZpRgFA0rbtXAgx/QYcbY8I+vOtKQAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="646"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/tools-dropdown-playground.4df4bcb.600.png" srcset="/docs/assets/ideal-img/tools-dropdown-playground.4df4bcb.600.png 600w" width="600" height="646"></noscript></div></div><p>You can now use provider built-in tools in the Playground. Add web search, code execution, file search, and Bash scripting tools directly to your prompts. Supported providers include OpenAI, Anthropic, and Gemini. Tools are saved with your prompt configuration and automatically used when you invoke prompts through the LLM gateway.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="projects-within-organizations"><a class="" href="https://agenta.ai/docs/changelog/projects-within-organizations">Projects within Organizations</a><a href="https://agenta.ai/docs/changelog/main#projects-within-organizations" class="hash-link" aria-label="Direct link to projects-within-organizations" title="Direct link to projects-within-organizations" translate="no">​</a></h3><p><em>4 December 2025</em></p><p><strong>v0.65.0</strong></p><p>You can now create projects within an organization. This lets you divide your work between different AI products. 
Each project scopes its prompts, traces, and evaluations. Create a new project or navigate between projects directly from the sidebar.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="reasoning-effort-support-in-the-playground"><a class="" href="https://agenta.ai/docs/changelog/reasoning-effort-support">Reasoning Effort Support in the Playground</a><a href="https://agenta.ai/docs/changelog/main#reasoning-effort-support-in-the-playground" class="hash-link" aria-label="Direct link to reasoning-effort-support-in-the-playground" title="Direct link to reasoning-effort-support-in-the-playground" translate="no">​</a></h3><p><em>18 November 2025</em></p><p><strong>v0.62.5</strong></p><p>You can now configure reasoning effort for models that support this parameter, such as OpenAI's o1 series and Google's Gemini 2.5 Pro. The reasoning effort setting is part of your prompt template, making it available when you fetch prompts via the SDK or invoke them through Agenta as an LLM gateway.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="jinja2-template-support-in-the-playground"><a class="" href="https://agenta.ai/docs/changelog/jinja2-template-support">Jinja2 Template Support in the Playground</a><a href="https://agenta.ai/docs/changelog/main#jinja2-template-support-in-the-playground" class="hash-link" aria-label="Direct link to jinja2-template-support-in-the-playground" title="Direct link to jinja2-template-support-in-the-playground" translate="no">​</a></h3><p><em>17 November 2025</em></p><p><strong>v0.62.3</strong></p><p>You can now use Jinja2 templates in your prompts. 
Jinja2 is available in both the Playground and in prompt management.</p><p>Learn more in our <a href="https://agenta.ai/blog/launch-week-2-day-5-jinja2-prompt-templates" target="_blank" rel="noopener noreferrer" class="">blog post</a> or check the <a class="" href="https://agenta.ai/docs/prompt-engineering/playground/using-playground#switching-template-formats">documentation</a>.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="agenta-core-is-now-open-source"><a class="" href="https://agenta.ai/docs/changelog/open-sourcing-agenta">Agenta Core is Now Open Source</a><a href="https://agenta.ai/docs/changelog/main#agenta-core-is-now-open-source" class="hash-link" aria-label="Direct link to agenta-core-is-now-open-source" title="Direct link to agenta-core-is-now-open-source" translate="no">​</a></h3><p><em>13 November 2025</em></p><p>We're open sourcing the core of Agenta under the MIT license. All functional features are now available to the community. This includes the evaluation system, prompt playground and management, observability, and all core workflows.</p><p>Development moves back to the public repository. We're building in public again. Only enterprise collaboration features like RBAC, SSO, and audit logs remain under a separate license.</p><p>Get started with the <a class="" href="https://agenta.ai/docs/self-host/quick-start">self-hosting guide</a>. View the code and contribute on <a href="https://github.com/Agenta-AI/agenta" target="_blank" rel="noopener noreferrer" class="">GitHub</a>. 
Read why we made this decision at <a href="https://agenta.ai/blog/commercial-open-source-is-hard-our-journey" target="_blank" rel="noopener noreferrer" class="">agenta.ai/blog/commercial-open-source-is-hard-our-journey</a>.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="evaluation-sdk"><a class="" href="https://agenta.ai/docs/changelog/evaluation-sdk">Evaluation SDK</a><a href="https://agenta.ai/docs/changelog/main#evaluation-sdk" class="hash-link" aria-label="Direct link to evaluation-sdk" title="Direct link to evaluation-sdk" translate="no">​</a></h3><p><em>12 November 2025</em></p><p><strong>v0.62.0</strong></p><div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="500" src="https://www.youtube.com/embed/1sZASEjvoOA" title="Evaluation SDK - Demonstration" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div><p>You can now run programmatic evaluations of complex AI agents and workflows directly from code. The Evaluation SDK gives you full control over test data and evaluation logic. It works with agents built using any framework.</p><p>The SDK lets you create test sets in code or fetch them from Agenta. You can use built-in evaluators like LLM-as-a-Judge, semantic similarity, or regex matching. You can also write custom Python evaluators. The SDK evaluates end-to-end workflows or specific spans in execution traces. 
Evaluations run on your own infrastructure; results display in the Agenta dashboard.</p><p>Check out the <a class="" href="https://agenta.ai/docs/evaluation/evaluation-from-sdk/quick-start">Evaluation SDK documentation</a> to get started.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="online-evaluation"><a class="" href="https://agenta.ai/docs/changelog/online-evaluation">Online Evaluation</a><a href="https://agenta.ai/docs/changelog/main#online-evaluation" class="hash-link" aria-label="Direct link to online-evaluation" title="Direct link to online-evaluation" translate="no">​</a></h3><p><em>11 November 2025</em></p><p><strong>v0.62.0</strong></p><div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="500" src="https://www.youtube.com/embed/rP3CiRmu3io" title="Online Evaluation - Demonstration" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div><p>You can now automatically evaluate every request to your LLM application in production. Online Evaluation helps you catch hallucinations and off-brand responses as they happen. You no longer need to discover problems through user complaints.</p><p>You can configure evaluators like LLM-as-a-Judge with custom prompts. Set sampling rates to control costs. Create evaluations with filters for specific spans in your traces. All evaluated requests appear in one dashboard. You can filter traces by evaluation scores to understand issues. You can also add problematic cases to test sets for continuous improvement.</p><p>Setting up online evaluation takes just a couple of minutes. 
It provides immediate visibility into production quality.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="customize-llm-as-a-judge-output-schemas"><a class="" href="https://agenta.ai/docs/changelog/customize-llm-as-a-judge-output-schemas">Customize LLM-as-a-Judge Output Schemas</a><a href="https://agenta.ai/docs/changelog/main#customize-llm-as-a-judge-output-schemas" class="hash-link" aria-label="Direct link to customize-llm-as-a-judge-output-schemas" title="Direct link to customize-llm-as-a-judge-output-schemas" translate="no">​</a></h3><p><em>10 November 2025</em></p><p><strong>v0.62.0</strong></p><div style="display:flex;justify-content:center;gap:24px;margin:20px 0"><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAd0lEQVR4nHWMzQqDMBCE8/5vIvgiQk5SRPDgtdTaYhLzs9nJlngKQge+hRlmRx3Gdfvnu3gfRhF5tDDzaK1bQoidsqfXKUNiIpRS5CZcB9DKGDcAqIZyZmZGC9UiMwblQ9Tt99/F52vrt/e+WndORHm+UbM1Jep/toW/mA5oik8AAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="311"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/changelog-llm-as-a-judge-response-1.1736d22.600.png" srcset="/docs/assets/ideal-img/changelog-llm-as-a-judge-response-1.1736d22.600.png 600w,/docs/assets/ideal-img/changelog-llm-as-a-judge-response-1.17952d1.1100.png 1100w,/docs/assets/ideal-img/changelog-llm-as-a-judge-response-1.7465f50.1260.png 1260w" width="600" height="311"></noscript></div><div 
style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAkUlEQVR4nG2MwQrCMBBE8/8/4kG8+A2CiOfiqcaT1LZikjbJ7mY3Ky0iHhwYeMwMY5zz20c/Xvvh2SDiRVV/3cSYbxlgY5j5rKrKRURqXfCrWiuvHfPRZICTiuprTPTykTMQp4wMSAvTMhSRgymfx+BJYlrz/48uhD0R3d2Q2r7zNkyzneZoAdByKW0ppUPE3RuDf78h/0tTSAAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="314"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/changelog-llm-as-a-judge-response-2.9813fa7.600.png" srcset="/docs/assets/ideal-img/changelog-llm-as-a-judge-response-2.9813fa7.600.png 600w,/docs/assets/ideal-img/changelog-llm-as-a-judge-response-2.9ac1e36.1100.png 1100w,/docs/assets/ideal-img/changelog-llm-as-a-judge-response-2.fefbb14.1256.png 1256w" width="600" height="314"></noscript></div></div><p>The LLM-as-a-Judge evaluator now supports custom output schemas. Create multiple feedback outputs per evaluator with any structure you need.</p><p>You can configure output types (binary, multiclass), include reasoning to improve prediction quality, or provide a raw JSON schema with any structure you define. 
Use these custom schemas in your evaluations to capture exactly the feedback you need.</p><p>Learn more in the <a class="" href="https://agenta.ai/docs/evaluation/configure-evaluators/llm-as-a-judge">LLM-as-a-Judge documentation</a>.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="documentation-overhaul"><a class="" href="https://agenta.ai/docs/changelog/documentation-architecture-overhaul">Documentation Overhaul</a><a href="https://agenta.ai/docs/changelog/main#documentation-overhaul" class="hash-link" aria-label="Direct link to documentation-overhaul" title="Direct link to documentation-overhaul" translate="no">​</a></h3><p><em>3 November 2025</em></p><p><strong>v0.59.10</strong></p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAHCAYAAAAxrNxjAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA0UlEQVR4nE2OS27DMAxEdf8rFd20B0hW8b6AI8u2aEkk9YkpsXC66QAzq4eHMYj8lZB8SjjHmGwp1ZbSbK3N5txm5nbkXD8MIk1ErB6O7j1orU3/pV8jIndDRA9EVoAg27YPgDBKKaO1NmqtcoHnKTfjPUzruqm1tocQFRE1hPCuB+gioud53k2M8bE6q8vyFEw4kGhwLiOXcpnfRhG5mdLatO5R5wX64nY9DlAIpDFlZaK/j73fDZf6/fOMx+bRErFjZkdEjig5xGSRWsz59fkLOsoJ2KbYbl4AAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="444"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/agenta_askai.b35e8d5.600.png" srcset="/docs/assets/ideal-img/agenta_askai.b35e8d5.600.png 600w,/docs/assets/ideal-img/agenta_askai.f1aa52d.1100.png 1100w,/docs/assets/ideal-img/agenta_askai.f8443e0.1600.png 1600w" width="600" height="444"></noscript></div><p>We've completely rewritten and restructured our documentation with a new architecture. This is one of the largest updates we've made, involving a near-complete rewrite of existing content.</p><p>Key improvements include:</p><ul>
<li class=""><strong><a href="https://diataxis.fr/" target="_blank" rel="noopener noreferrer" class="">Diataxis Framework</a></strong>: Organized content into Tutorials, How-to Guides, Reference, and Explanation sections for better discoverability</li>
<li class=""><strong><a class="" href="https://agenta.ai/docs/observability/overview">Expanded Observability Docs</a></strong>: Added missing documentation for tracing, annotations, and observability features</li>
<li class=""><strong><a class="" href="https://agenta.ai/docs/observability/quick-start-opentelemetry">JavaScript/TypeScript Support</a></strong>: Added code examples and documentation for JavaScript developers alongside Python</li>
<li class=""><strong>Ask AI Feature</strong>: Ask the documentation questions directly and get instant answers</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="vertex-ai-provider-support"><a class="" href="https://agenta.ai/docs/changelog/vertex-ai-provider-support">Vertex AI Provider Support</a><a href="https://agenta.ai/docs/changelog/main#vertex-ai-provider-support" class="hash-link" aria-label="Direct link to vertex-ai-provider-support" title="Direct link to vertex-ai-provider-support" translate="no">​</a></h3><p><em>24 October 2025</em></p><p><strong>v0.59.6</strong></p><p>We've added support for Google Cloud's Vertex AI platform. You can now use Gemini models and other Vertex AI partner models in the playground, configure them in the Model Hub, and access them through the Gateway using InVoke endpoints.</p><p>Check out the documentation for <a class="" href="https://agenta.ai/docs/prompt-engineering/playground/custom-providers#configuring-vertex-ai">configuring Vertex AI models</a>.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="filtering-traces-by-annotation"><a class="" href="https://agenta.ai/docs/changelog/filtering-traces-by-annotation">Filtering Traces by Annotation</a><a href="https://agenta.ai/docs/changelog/main#filtering-traces-by-annotation" class="hash-link" aria-label="Direct link to filtering-traces-by-annotation" title="Direct link to filtering-traces-by-annotation" translate="no">​</a></h3><p><em>14 October 2025</em></p><p><strong>v0.58.0</strong></p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAECAYAAAC3OK7NAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAf0lEQVR4nB2LsQrCQBBE7///xUawsBJsbFJaqCAJgliEpLjsLWFvZ0fuBmYeDDMpb9sJiInkk+Sr2YFOko+I+JjZIWXRO0k6AHewutOqt6oJLVR1SGbWh9UdEf3A6qAjaLX2oe37kJZ1PYvIV1XfpZRRRMacc2fr5nn5Xa634x9A6ZhlbNPefAAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="250"></svg><noscript><img 
style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/changelog-annotation-filter.95ae4d3.600.png" srcset="/docs/assets/ideal-img/changelog-annotation-filter.95ae4d3.600.png 600w,/docs/assets/ideal-img/changelog-annotation-filter.702dea4.1100.png 1100w,/docs/assets/ideal-img/changelog-annotation-filter.f9bbd17.1462.png 1462w" width="600" height="250"></noscript></div><p>You can now filter and search traces based on their annotations. This helps you find traces with low scores or bad feedback quickly.</p><p>We rebuilt the filtering system in observability with a simpler dropdown and more options. You can now filter by span status, input keys, app or environment references, and any key within your span.</p><p>The new annotation filtering lets you find:</p><ul>
<li class="">Spans evaluated by a specific evaluator</li>
<li class="">Spans with user feedback like <code>success=True</code></li>
</ul><p>This enables powerful workflows: <a class="" href="https://agenta.ai/docs/tutorials/cookbooks/capture-user-feedback">capture user feedback</a> from your app, filter to find traces with bad feedback, add them to test sets, and improve your prompts based on real user data.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="new-evaluation-results-dashboard"><a class="" href="https://agenta.ai/docs/changelog/new-evaluation-results-dashboard">New Evaluation Results Dashboard</a><a href="https://agenta.ai/docs/changelog/main#new-evaluation-results-dashboard" class="hash-link" aria-label="Direct link to new-evaluation-results-dashboard" title="Direct link to new-evaluation-results-dashboard" translate="no">​</a></h3><p><em>26 September 2025</em></p><p><strong>v0.54.0</strong></p><p>We've completely redesigned the evaluation results dashboard. You can analyse your evaluation results more easily and understand performance across different metrics.</p><p>Here's what's new:</p><ul>
<li class=""><strong>Metrics plots</strong>: We've added plots for all the evaluator metrics. You can now see the distribution of the results and easily spot outliers.</li>
<li class=""><strong>Side-by-side comparison</strong>: You can now compare multiple evaluations simultaneously, both their plots and their individual outputs.</li>
<li class=""><strong>Improved test cases view</strong>: The results are now displayed in a tabular format that works for both small and large datasets.</li>
<li class=""><strong>Focused detail view</strong>: A new focused drawer lets you examine individual data points in more detail. It's especially helpful when individual data points are large.</li>
<li class=""><strong>Configuration view</strong>: See exactly which configurations were used in each evaluation.</li>
<li class=""><strong>Evaluation Run naming and descriptions</strong>: Add names and descriptions to your evaluation runs to organize things better.</li>
</ul><div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="500" src="https://www.youtube.com/embed/HxY6lZ9HIyw" title="New Evaluation Results Dashboard - Demonstration" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="deep-url-support-for-sharable-links"><a class="" href="https://agenta.ai/docs/changelog/deep-url-support-for-sharable-links">Deep URL Support for Sharable Links</a><a href="https://agenta.ai/docs/changelog/main#deep-url-support-for-sharable-links" class="hash-link" aria-label="Direct link to deep-url-support-for-sharable-links" title="Direct link to deep-url-support-for-sharable-links" translate="no">​</a></h3><p><em>24 September 2025</em></p><p><strong>v0.53.0</strong></p><p>URLs across Agenta now include workspace context, making them fully shareable between team members. Previously, URLs would always point to the default workspace, causing issues when refreshing pages or sharing links.</p><p>Now you can deep link to almost anything in the platform - prompts, evaluations, and more - in any workspace. 
Share links directly with team members and they'll see exactly what you intended, regardless of their default workspace settings.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="major-speed-improvements-and-bug-fixes"><a class="" href="https://agenta.ai/docs/changelog/speed-improvements-in-the-playground">Major Speed Improvements and Bug Fixes</a><a href="https://agenta.ai/docs/changelog/main#major-speed-improvements-and-bug-fixes" class="hash-link" aria-label="Direct link to major-speed-improvements-and-bug-fixes" title="Direct link to major-speed-improvements-and-bug-fixes" translate="no">​</a></h3><p><em>19 September 2025</em></p><p><strong>v0.52.5</strong></p><p>We rewrote most of Agenta's frontend. You'll see much faster speeds when you create prompts or use the playground.</p><p>We also made many improvements and fixed bugs:</p><p><strong>Improvements:</strong></p><ul>
<li class=""><a class="" href="https://agenta.ai/docs/evaluation/configure-evaluators/llm-as-a-judge">LLM-as-a-judge</a> now uses double curly braces <code>{{}}</code> instead of single curly braces <code>{</code> and <code>}</code>. This matches how normal prompts work. Old LLM-as-a-judge prompts with single curly braces still work. We updated the LLM-as-a-judge playground to make editing prompts easier.</li>
</ul><p><strong>Self-hosting:</strong></p><ul>
<li class="">You can now use <a class="" href="https://agenta.ai/docs/self-host/configuration#redis-caching">an external Redis instance</a> for caching by setting it as an environment variable</li>
</ul><p><strong>Bug fixes:</strong></p><ul>
<li class="">Fixed the <a class="" href="https://agenta.ai/docs/custom-workflows/quick-start">custom workflow quick start tutorial</a> and examples</li>
<li class="">Fixed SDK compatibility issues with Python 3.9</li>
<li class="">Fixed default filtering in observability dashboard</li>
<li class="">Fixed error handling in the evaluator playground</li>
<li class="">Fixed the Tracing SDK to allow instrumenting streaming responses and overriding OTEL environment variables</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="multiple-metrics-in-human-evaluation"><a class="" href="https://agenta.ai/docs/changelog/multiple-metrics-in-human-evaluation">Multiple Metrics in Human Evaluation</a><a href="https://agenta.ai/docs/changelog/main#multiple-metrics-in-human-evaluation" class="hash-link" aria-label="Direct link to multiple-metrics-in-human-evaluation" title="Direct link to multiple-metrics-in-human-evaluation" translate="no">​</a></h3><p><em>9 September 2025</em></p><p><strong>v0.51.0</strong></p><p>We rebuilt the human evaluation workflow from scratch. Now you can set multiple evaluators and metrics and use them to score the outputs.</p><p>This lets you evaluate the same output on different metrics like <strong>relevance</strong> or <strong>completeness</strong>. You can also create binary, numerical scores, or even use strings for <strong>comments</strong> or <strong>expected answer</strong>.</p><p>Watch the video below and read the <a class="" href="https://agenta.ai/docs/changelog/multiple-metrics-in-human-evaluation">post</a> for more details. 
Or check out the <a class="" href="https://agenta.ai/docs/evaluation/human-evaluation/quick-start">docs</a> to learn how to use the new human evaluation workflow.</p><div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="500" src="https://www.youtube.com/embed/zpoAbQlsfcw" title="Major Playground Improvements - Demonstration" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="dspy-integration"><a class="" href="https://agenta.ai/docs/integrations/frameworks/dspy/observability">DSPy Integration</a><a href="https://agenta.ai/docs/changelog/main#dspy-integration" class="hash-link" aria-label="Direct link to dspy-integration" title="Direct link to dspy-integration" translate="no">​</a></h3><p><em>29 August 2025</em></p><p>We've added DSPy integration to Agenta. 
You can now trace and debug your DSPy applications with Agenta.</p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA3UlEQVR4nAXB20rDMBgA4Lz/g+xiN84rUVGRFSZ4wLHpwK5tshya5tjkTw+2Ffw+pF38O2O+lJd6Nm03NhYGaUMSJgC1MRDpfEWVQ8bBwqSZqTRTY+MgddvXNgLDFMj+K5LDqS3PF4cIb5b9dz5L0461DoNQvhMmRPr6Dvhqk/DmGoqPo0M/JZsfs7cJcz02JnZCe2DCRJy9AF6tUrVexyLbORTSMEE/DcpBL0xIXDmgXAV8OIXqeRuqm9s2v3vwSHv4bWzspQkdVz4x5QNhqi1z4ovPoy+2O5ffP9l/QgHW4i8Q8RcAAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="338"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/agenta_dspy.1aa7c6a.600.png" srcset="/docs/assets/ideal-img/agenta_dspy.1aa7c6a.600.png 600w,/docs/assets/ideal-img/agenta_dspy.1537cb2.1100.png 1100w,/docs/assets/ideal-img/agenta_dspy.76d7abd.1600.png 1600w" width="600" height="338"></noscript></div><p><a class="" href="https://agenta.ai/docs/integrations/frameworks/dspy/observability"><strong>View the full DSPy integration →</strong></a></p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="open-sourcing-our-product-roadmap"><a class="" href="https://agenta.ai/docs/roadmap">Open-sourcing our Product Roadmap</a><a href="https://agenta.ai/docs/changelog/main#open-sourcing-our-product-roadmap" class="hash-link" aria-label="Direct link to open-sourcing-our-product-roadmap" title="Direct link to open-sourcing-our-product-roadmap" translate="no">​</a></h3><p><em>12 August 2025</em></p><p>We've made our product roadmap completely transparent and community-driven.</p><p>You can now see exactly what we're building, what's shipped, and what's coming next. 
Plus vote on features that matter most to you.</p><p><strong>Why we're doing this:</strong> We believe open-source startups succeed when they create the most value possible, and the best way to do that is by building with our community, not in isolation. Up until now, we've been secretive with our roadmap, but we're losing something important: your feedback and the ability to let you shape our direction. Today we're open-sourcing our roadmap because we want to build a community of owners, not just passive users.</p><p><a class="" href="https://agenta.ai/docs/roadmap"><strong>View the full roadmap →</strong></a></p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="major-playground-improvements-and-enhancements"><a class="" href="https://agenta.ai/docs/changelog/major-playground-improvements-and-enhancements">Major Playground Improvements and Enhancements</a><a href="https://agenta.ai/docs/changelog/main#major-playground-improvements-and-enhancements" class="hash-link" aria-label="Direct link to major-playground-improvements-and-enhancements" title="Direct link to major-playground-improvements-and-enhancements" translate="no">​</a></h3><p><em>7 August 2025</em>
<strong>v0.50.5</strong></p><p>We've made significant improvements to the playground. Key features include:</p><ul>
<li class="">Improved error handling in the JSON editor for structured output</li>
<li class="">Preserved JSON field order</li>
<li class="">Visual diff when committing changes</li>
<li class="">Markdown and text view toggle</li>
<li class="">Collapsible interface elements</li>
<li class="">Collapsible test cases for large sets</li>
</ul><div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="500" src="https://www.youtube.com/embed/zaiMuWLwC5s" title="Major Playground Improvements - Demonstration" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="support-for-images-in-the-playground"><a class="" href="https://agenta.ai/docs/changelog/support-for-images-in-playground">Support for Images in the Playground</a><a href="https://agenta.ai/docs/changelog/main#support-for-images-in-the-playground" class="hash-link" aria-label="Direct link to support-for-images-in-the-playground" title="Direct link to support-for-images-in-the-playground" translate="no">​</a></h3><p><em>29 July 2025</em>
<strong>v0.50.0</strong></p><p>Agenta now supports images in the playground, test sets, and evaluations. Click above for more details.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="llamaindex-integration"><a class="" href="https://agenta.ai/docs/changelog/llamaindex-integration">LlamaIndex Integration</a><a href="https://agenta.ai/docs/changelog/main#llamaindex-integration" class="hash-link" aria-label="Direct link to llamaindex-integration" title="Direct link to llamaindex-integration" translate="no">​</a></h3><p><em>17 June 2025</em>
<strong>v0.48.4</strong></p><p>We're excited to announce observability support for LlamaIndex applications.</p><p>If you're using LlamaIndex, you can now see detailed traces in Agenta to debug your application.</p><p>The integration is auto-instrumentation - just add one line of code and you'll start seeing all your LlamaIndex operations traced.</p><p>This helps when you need to understand what's happening inside your RAG pipeline, track performance bottlenecks, or debug issues in production.</p><p>We've put together a Jupyter notebook and tutorial to get you started. Links are in the comments.</p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA2ElEQVR4nAHNADL/AOfq7v3U2N782d3i++ru8vjn6/H36e3z9Nnd4fPGys7z0M/Y8uDd6PAA6+7y/ert8frr7vP57PD19unt8vXk6O7zVVJT/DEkKP8AAAD/rZyw9QDX2+D8u8LJ+8XL0vm+xMz43OHn9eLl6vNANDj9g5G0/yNVZv+Qfpf5AOrt8vri5+z44OXr99rf5fbi5+3z6uTv80M+Q/wRKjP/JjFB/4CVs/wA7fD0+u3w9ffq7fP26e3x9Obr8PHo4O7y3MLe9bqoyvmWr9L8gs/w/FWioDpy0Q10AAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="312"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/agenta_llamaindex.ac9b768.600.png" srcset="/docs/assets/ideal-img/agenta_llamaindex.ac9b768.600.png 600w,/docs/assets/ideal-img/agenta_llamaindex.7b46158.1100.png 1100w,/docs/assets/ideal-img/agenta_llamaindex.78623ac.1600.png 1600w" width="600" height="312"></noscript></div><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="annotate-your-llm-response-preview"><a class="" href="https://agenta.ai/docs/changelog/annotate-your-llm-response-preview">Annotate Your LLM Response (preview)</a><a href="https://agenta.ai/docs/changelog/main#annotate-your-llm-response-preview" class="hash-link" aria-label="Direct link to 
annotate-your-llm-response-preview" title="Direct link to annotate-your-llm-response-preview" translate="no">​</a></h3><p><em>15 May 2025</em>
<strong>v0.45.0</strong></p><p>One of our most requested features was the ability to capture user feedback and annotations (e.g. scores) on LLM responses traced in Agenta.</p><p>Today we're previewing one of a family of features around this topic.</p><p>As of today, you can use the annotation API to add annotations to LLM responses traced in Agenta.</p><p>This is useful to:</p><ul>
<li class="">Collect user feedback on LLM responses</li>
<li class="">Run custom evaluation workflows</li>
<li class="">Measure application performance in real time</li>
</ul><p>Check out the how to <a class="" href="https://agenta.ai/docs/observability/trace-with-python-sdk/annotate-traces">annotate traces from API</a> for more details. Or try our new tutorial (available as <a href="https://github.com/Agenta-AI/agenta/blob/main/examples/jupyter/capture_user_feedback.ipynb" target="_blank" rel="noopener noreferrer" class="">jupyter notebook</a>) <a class="" href="https://agenta.ai/docs/tutorials/cookbooks/capture-user-feedback">here</a>.</p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAzUlEQVR4nE3F3UqEQABA4Xnh3qCLiGIrJEhzd53UcRxto3WEYLW6ioIewKTb6hn213aJE3bVgY8j9vZPOHQ8DhyfY1dy6secDdXfz6UhvLYMghuE40u0immbhvVqyWoxZ7mY023WfH1+8PryzN3jE8KVCp0o3tp3dj+w6bZ03zv6mrbF8y4YxRkiiBJyoymt5b6uqWYz6qrioa4pplPCK0mkM0SYZKRZjkpSYqWJ+/+nDZHSCGvGWBNQ5r0xpRlhzZAivaRIPW4Tl4k84hfKb7cNIWZsNQAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="345"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/evaluation-screenshot-jupyter.6d17b64.600.png" srcset="/docs/assets/ideal-img/evaluation-screenshot-jupyter.6d17b64.600.png 600w,/docs/assets/ideal-img/evaluation-screenshot-jupyter.cc4a5a3.1100.png 1100w,/docs/assets/ideal-img/evaluation-screenshot-jupyter.d607296.1600.png 1600w" width="600" height="345"></noscript></div><p>Other stuff:</p><ul>
<li class="">We cut the migration process from about an hour to a couple of minutes.</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="tool-support-in-the-playground"><a class="" href="https://agenta.ai/docs/changelog/tool-support-in-the-playground">Tool Support in the Playground</a><a href="https://agenta.ai/docs/changelog/main#tool-support-in-the-playground" class="hash-link" aria-label="Direct link to tool-support-in-the-playground" title="Direct link to tool-support-in-the-playground" translate="no">​</a></h3><p><em>10 May 2025</em>
<strong>v0.43.1</strong></p><div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px"><iframe width="1030" height="500" src="https://www.youtube.com/embed/SGqHOJf7tb8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div><p>We released tool usage in the Agenta playground - a key feature for anyone building agents with LLMs.</p><p>Agents need tools to access external data, perform calculations, or call APIs.</p><p>Now you can:</p><ul>
<li class="">Define tools directly in the playground using JSON schema</li>
<li class="">Test how your prompt generates tool calls in real time</li>
<li class="">Preview how your agent handles tool responses</li>
<li class="">Verify tool call correctness with custom evaluators</li>
</ul><p>The tool schema is saved with your prompt configuration, making integration easy when you fetch configs through the API.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="documentation-overhaul-new-models-and-platform-improvements"><a class="" href="https://agenta.ai/docs/changelog/documentation-overhaul-new-models-and-platform-improvements">Documentation Overhaul, New Models, and Platform Improvements</a><a href="https://agenta.ai/docs/changelog/main#documentation-overhaul-new-models-and-platform-improvements" class="hash-link" aria-label="Direct link to documentation-overhaul-new-models-and-platform-improvements" title="Direct link to documentation-overhaul-new-models-and-platform-improvements" translate="no">​</a></h3><p><em>2 May 2025</em></p><p><strong>v0.43.0</strong></p><p>We've made significant improvements across Agenta with a major documentation overhaul, new model support, self-hosting enhancements, and UI improvements.</p><p><strong>Revamped Prompt Engineering Documentation</strong>:</p><p>We've completely rewritten our prompt management and prompt engineering documentation.</p><p>Start exploring the new documentation in our updated <a class="" href="https://agenta.ai/docs/prompt-engineering/quick-start">Quick Start Guide</a>.</p><p><strong>New Model Support</strong>:</p><p>Our platform now supports several new LLM models:</p><ul>
<li class="">Google's Gemini 2.5 Pro and Flash</li>
<li class="">Alibaba Cloud's Qwen 3</li>
<li class="">OpenAI's GPT-4.1</li>
</ul><p>These models are available in both the playground and through the API.</p><p><strong>Playground Enhancements</strong>:</p><p>We've added a draft state to the playground, providing a better editing experience. Changes are now clearly marked as drafts until committed.</p><p><strong>Self-Hosting Improvements</strong>:</p><p>We've significantly simplified the self-hosting experience by changing how environment variables are handled in the frontend:</p><ul>
<li class="">No more rebuilding images to change ports or domains</li>
<li class="">Dynamic configuration through environment variables at runtime</li>
</ul><p>Check out our updated <a class="" href="https://agenta.ai/docs/self-host/quick-start">self-hosting documentation</a> for details.</p><p><strong>Bug Fixes and Optimizations</strong>:</p><ul>
<li class="">Fixed OpenTelemetry integration edge cases</li>
<li class="">Resolved edge cases in the API that affected certain workflow configurations</li>
<li class="">Improved UI responsiveness and fixed minor visual inconsistencies</li>
<li class="">Added chat support in cloud</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="we-are-soc-2-type-2-certified"><a class="" href="https://agenta.ai/docs/changelog/we-are-soc-2-type-2-certified">We are SOC 2 Type 2 Certified</a><a href="https://agenta.ai/docs/changelog/main#we-are-soc-2-type-2-certified" class="hash-link" aria-label="Direct link to we-are-soc-2-type-2-certified" title="Direct link to we-are-soc-2-type-2-certified" translate="no">​</a></h3><p><em>18 April 2025</em>
<strong>v0.42.1</strong></p><p>We are SOC 2 Type 2 Certified. This means that our platform is audited and certified by an independent third party to meet the highest standards of security and compliance.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="structured-output-support-in-the-playground"><a class="" href="https://agenta.ai/docs/changelog/structured-output-support-in-the-playground">Structured Output Support in the Playground</a><a href="https://agenta.ai/docs/changelog/main#structured-output-support-in-the-playground" class="hash-link" aria-label="Direct link to structured-output-support-in-the-playground" title="Direct link to structured-output-support-in-the-playground" translate="no">​</a></h3><p><em>15 April 2025</em></p><p><strong>v0.42.0</strong></p><div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px"><iframe width="1030" height="500" src="https://www.youtube.com/embed/08r4g5mO9lw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div><p>We now support structured outputs in the playground. You can define the expected output format and validate the output against it.</p><p>With Agenta's playground, implementing structured outputs is straightforward:</p><ul>
<li class="">
<p>Open any prompt</p>
</li>
<li class="">
<p>Switch the Response format dropdown from text to JSON mode or JSON Schema</p>
</li>
<li class="">
<p>Paste or write your schema (Agenta supports the full JSON Schema specification)</p>
</li>
<li class="">
<p>Run the prompt - the response panel will show the response beautified</p>
</li>
<li class="">
<p>Commit the changes - the schema will be saved with your prompt, so when your SDK fetches the prompt, it will include the schema information</p>
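<p>As a rough illustration of the steps above, the sketch below shows what a schema pasted into the Response format field might look like, and how an application could sanity-check a model response against it. The schema fields (<code>sentiment</code>, <code>confidence</code>) and the helper <code>check_response</code> are hypothetical and not part of Agenta. The check covers only required keys, unexpected keys, and enum values, so a full validator such as the <code>jsonschema</code> package would be the better choice in real code.</p>
<pre>
```python
import json

# Hypothetical schema for illustration only; the field names are
# assumptions and are not taken from the Agenta documentation.
SENTIMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number"},
    },
    "required": ["sentiment", "confidence"],
    "additionalProperties": False,
}


def check_response(raw: str, schema: dict) -> list:
    """Return a list of problems found in a JSON response (empty if none).

    Deliberately small: it checks required keys, unexpected keys, and
    enum membership. It is not a full JSON Schema validator.
    """
    data = json.loads(raw)
    problems = []
    properties = schema.get("properties", {})
    # Every key listed under "required" must be present.
    for key in schema.get("required", []):
        if key not in data:
            problems.append("missing required key: " + key)
    # With additionalProperties set to False, unknown keys are rejected.
    if schema.get("additionalProperties") is False:
        for key in data:
            if key not in properties:
                problems.append("unexpected key: " + key)
    # Values of enum-constrained properties must be one of the listed options.
    for key, rules in properties.items():
        if key in data and "enum" in rules and data[key] not in rules["enum"]:
            problems.append("invalid value for " + key)
    return problems


if __name__ == "__main__":
    good = '{"sentiment": "positive", "confidence": 0.92}'
    bad = '{"sentiment": "great"}'
    print(check_response(good, SENTIMENT_SCHEMA))
    print(check_response(bad, SENTIMENT_SCHEMA))
```
</pre>
<p>Serializing the dictionary with <code>json.dumps(SENTIMENT_SCHEMA, indent=2)</code> should produce a JSON document of the kind you would paste into the schema editor.</p>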
</li>
</ul><p>Check out the blog post for more detail <a href="https://agenta.ai/blog/structured-outputs-playground" target="_blank" rel="noopener noreferrer" class="">https://agenta.ai/blog/structured-outputs-playground</a></p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="new-feature-prompt-and-deployment-registry"><a class="" href="https://agenta.ai/docs/changelog/new-feature-prompt-and-deployment-registry">New Feature: Prompt and Deployment Registry</a><a href="https://agenta.ai/docs/changelog/main#new-feature-prompt-and-deployment-registry" class="hash-link" aria-label="Direct link to new-feature-prompt-and-deployment-registry" title="Direct link to new-feature-prompt-and-deployment-registry" translate="no">​</a></h3><p><em>7 April 2025</em></p><p><strong>v0.38.0</strong></p><div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px"><iframe width="1030" height="500" src="https://www.youtube.com/embed/ZwpHuXp2WiI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div><p>We've introduced the Prompt and Deployment Registry, giving you a centralized place to manage all variants and versions of your prompts and deployments.</p><p><strong>Key capabilities:</strong></p><ul>
<li class="">View all variants and revisions in a single table</li>
<li class="">Access all commits made to a variant</li>
<li class="">Use older versions of variants directly in the playground</li>
</ul><p>Learn more in our <a href="https://agenta.ai/blog/introducing-prompt-registry" target="_blank" rel="noopener noreferrer" class="">blog post</a>.</p><p><strong>Bug Fixes</strong></p><ul>
<li class="">Fixed minor UI issues with dots in sidebar menu</li>
<li class="">Fixed minor playground UI issues</li>
<li class="">Fixed playground reset default model name</li>
<li class="">Fixed project_id issue on testset detail page</li>
<li class="">Fixed breaking issues with old variants encountered during QA</li>
<li class="">Fixed variant naming logic</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="improvements-to-the-playground-and-custom-workflows"><a class="" href="https://agenta.ai/docs/changelog/improvements-to-the-playground-and-custom-workflows">Improvements to the Playground and Custom Workflows</a><a href="https://agenta.ai/docs/changelog/main#improvements-to-the-playground-and-custom-workflows" class="hash-link" aria-label="Direct link to improvements-to-the-playground-and-custom-workflows" title="Direct link to improvements-to-the-playground-and-custom-workflows" translate="no">​</a></h3><p><em>19 March 2025</em>
<strong>v0.36.0</strong></p><p>We've made several improvements to the playground, including:</p><ul>
<li class="">Improved scrolling behavior</li>
<li class="">Increased discoverability of variant creation and comparison</li>
<li class="">Implemented stop functionality in the playground</li>
</ul><p>Custom workflows now work with sub-routes. This means you can have multiple routes in one file and create multiple custom workflows from the same file.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="opentelemetry-compliance-and-custom-workflows-from-api"><a class="" href="https://agenta.ai/docs/changelog/opentelemetry-compliance-and-custom-workflows-from-api">OpenTelemetry Compliance and Custom workflows from API</a><a href="https://agenta.ai/docs/changelog/main#opentelemetry-compliance-and-custom-workflows-from-api" class="hash-link" aria-label="Direct link to opentelemetry-compliance-and-custom-workflows-from-api" title="Direct link to opentelemetry-compliance-and-custom-workflows-from-api" translate="no">​</a></h3><p><em>11 March 2025</em></p><p><strong>v0.35.0</strong></p><p>We've introduced major improvements to Agenta, focusing on OpenTelemetry compliance and simplified custom workflow debugging.</p><p><strong>OpenTelemetry (OTel) Support</strong>:</p><p>Agenta is now fully OpenTelemetry-compliant. This means you can integrate Agenta with thousands of OTel-compatible services using existing SDKs. To integrate your application with Agenta, configure an OTel exporter pointing to your Agenta endpoint; no additional setup is required.</p><p>We've enhanced distributed tracing to make complex distributed agent systems easier to debug. All HTTP interactions between agents, whether running within Agenta's SDK or externally, are automatically traced, making troubleshooting and monitoring easier.</p><p>Detailed instructions and examples are available in our <a class="" href="https://agenta.ai/docs/observability/trace-with-opentelemetry/distributed-tracing">distributed tracing documentation</a>.</p><p><strong>Improved Custom Workflows</strong>:</p><p>Based on your feedback, we've streamlined debugging and running custom workflows:</p><ul>
<li class="">
<p><strong>Run workflows from your environments</strong>: You no longer need the Agenta CLI to manage custom workflows. Setting up custom workflows now involves simply adding the Agenta SDK to your code, creating an endpoint, and connecting it to Agenta via the web UI. You can check how it's done in the <a class="" href="https://agenta.ai/docs/custom-workflows/overview">quick start guide</a>.</p>
</li>
<li class="">
<p><strong>Custom Workflows in the new playground</strong>: Custom workflows are now fully compatible with the new playground. You can now nest configurations, run side-by-side comparisons, and debug your agents and complex workflows very easily.</p>
</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="new-playground"><a class="" href="https://agenta.ai/docs/changelog/new-playground">New Playground</a><a href="https://agenta.ai/docs/changelog/main#new-playground" class="hash-link" aria-label="Direct link to new-playground" title="Direct link to new-playground" translate="no">​</a></h3><p><em>4 February 2025</em></p><p><strong>v0.33.0</strong></p><br><p>We've rebuilt our playground from scratch to make prompt engineering faster and more intuitive. The old playground took 20 seconds to create a prompt - now it's instant.</p><p>Key improvements:</p><ul>
<li class="">Create prompts with multiple messages using our new template system</li>
<li class="">Format variables easily with curly bracket syntax and a built-in validator</li>
<li class="">Switch between chat and completion prompts in one interface</li>
<li class="">Load test sets directly in the playground to iterate faster</li>
<li class="">Save successful outputs as test cases with one click</li>
<li class="">Compare different prompts side-by-side</li>
<li class="">Deploy changes straight to production</li>
</ul><p>Developers can now create prompts programmatically through our API.</p><p>You can explore these features in our updated playground documentation.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="quality-of-life-improvements"><a class="" href="https://agenta.ai/docs/changelog/quality-of-life-improvements">Quality of life improvements</a><a href="https://agenta.ai/docs/changelog/main#quality-of-life-improvements" class="hash-link" aria-label="Direct link to quality-of-life-improvements" title="Direct link to quality-of-life-improvements" translate="no">​</a></h3><p><em>27 January 2025</em></p><p><strong>v0.32.0</strong></p><img src="https://agenta.ai/docs/assets/images/changelog_sidebar-3054697e4a17df8054176b97d362990c.gif" style="display:block;margin:20px;text-align:center" alt="New collapsible side menu" loading="lazy"><p>Today's small release brings quality-of-life improvements while we prepare a much larger release for the coming days:</p><ul>
<li class="">Added a collapsible side menu for better space management</li>
<li class="">Enhanced frontend performance and responsiveness</li>
<li class="">Implemented a confirmation modal when deleting test sets</li>
<li class="">Improved permission handling across the platform</li>
<li class="">Improved frontend test coverage</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="agenta-is-soc-2-type-1-certified"><a class="" href="https://agenta.ai/docs/changelog/agenta-is-soc-2-type-1-certified">Agenta is SOC 2 Type 1 Certified</a><a href="https://agenta.ai/docs/changelog/main#agenta-is-soc-2-type-1-certified" class="hash-link" aria-label="Direct link to agenta-is-soc-2-type-1-certified" title="Direct link to agenta-is-soc-2-type-1-certified" translate="no">​</a></h3><p><em>15 January 2025</em></p><p><strong>v0.31.0</strong></p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAuElEQVR4nD3Ju23DMABFUS6TVTKGN8kOWSETuEvltC5SCKkkB5JFfSxS/Egy5Q+U6hq0AxcHeHhX7CpNlMmH3T/Z9hRN//xE3hiK1iLVcFd2nko5NqkiyTW1chQHi6iUR3aOn6xkm2Tsy5bPzPDynrBa5xyUpzEjovcz2gXKtudXdhgz8l1PvH6kvH1VDD6g/YwYw4XIjSfMEFB2wrqJWg9Yd2SaH12crn9E82V5CueFc9zX5d5ivwHLztruFfKakwAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="338"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/soc2_type1.8810f85.600.png" srcset="/docs/assets/ideal-img/soc2_type1.8810f85.600.png 600w,/docs/assets/ideal-img/soc2_type1.81fd20c.1100.png 1100w,/docs/assets/ideal-img/soc2_type1.3b92d31.1600.png 1600w" width="600" height="338"></noscript></div><p>We've achieved SOC 2 Type 1 certification, validating our security controls for protecting sensitive LLM development data. This certification covers our entire platform, including prompt management, evaluation frameworks, and observability tools.</p><p>Key security features and improvements:</p><ul>
<li class="">Data encryption in transit and at rest</li>
<li class="">Enhanced access control and authentication</li>
<li class="">Comprehensive security monitoring</li>
<li class="">Regular third-party security assessments</li>
<li class="">Backup and disaster recovery protocols</li>
</ul><p>This certification represents a significant milestone for teams using Agenta in production environments. Whether you're using our open-source platform or cloud offering, you can now build LLM applications with enterprise-grade security confidence.</p><p>We've also updated our <a href="https://trustcenter.agenta.ai/" target="_blank" rel="noopener noreferrer" class="">trust center</a> with detailed information about our security practices and compliance standards. For teams interested in learning more about our security controls or requesting our SOC 2 report, please contact <a href="mailto:team@agenta.ai" target="_blank" rel="noopener noreferrer" class="">team@agenta.ai</a>.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="new-onboarding-flow"><a class="" href="https://agenta.ai/docs/changelog/new-onboarding-flow">New Onboarding Flow</a><a href="https://agenta.ai/docs/changelog/main#new-onboarding-flow" class="hash-link" aria-label="Direct link to new-onboarding-flow" title="Direct link to new-onboarding-flow" translate="no">​</a></h3><p><em>4 January 2025</em></p><p><strong>v0.30.0</strong></p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAACCAYAAABhYU3QAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAPElEQVR4nGM4cebC/137j/x/+erNfxD49+8fmEYHDJNmLfofl1X2/9Dx02CB33/+/P/x4+f/X79/oygEAMX8SlUDDFtkAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="111"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/changelog_onboarding1.b85b274.600.png" srcset="/docs/assets/ideal-img/changelog_onboarding1.b85b274.600.png 600w,/docs/assets/ideal-img/changelog_onboarding1.49e48e3.1100.png 1100w,/docs/assets/ideal-img/changelog_onboarding1.864a51d.1600.png 1600w" width="600" height="111"></noscript></div><div 
style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAX0lEQVR4nIWOSw6CQBAF5/7XNBo0Cj0gA/2bMixZqC+pXaXyikjl/ngidcbMMbMTqkpmUtoetOXFZbhxHZ2eTkSc6L1Ttj2QujLKwqbJt5X32phkRu0o/RCPX+7+V/wAM3TEWds4U5QAAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="277"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/changelog_onboarding2.178956c.600.png" srcset="/docs/assets/ideal-img/changelog_onboarding2.178956c.600.png 600w,/docs/assets/ideal-img/changelog_onboarding2.6ba813b.1100.png 1100w,/docs/assets/ideal-img/changelog_onboarding2.b722602.1600.png 1600w" width="600" height="277"></noscript></div><p>We've redesigned our platform's onboarding to make getting started simpler and more intuitive. Key improvements include:</p><ul>
<li class="">Streamlined tracing setup process</li>
<li class="">Added a demo RAG playground project showcasing custom workflows</li>
<li class="">Enhanced frontend performance</li>
<li class="">Fixed scroll behavior in trace view</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="add-spans-to-test-sets"><a class="" href="https://agenta.ai/docs/changelog/add-spans-to-test-sets">Add Spans to Test Sets</a><a href="https://agenta.ai/docs/changelog/main#add-spans-to-test-sets" class="hash-link" aria-label="Direct link to add-spans-to-test-sets" title="Direct link to add-spans-to-test-sets" translate="no">​</a></h3><p><em>11 December 2024</em></p><p><strong>v0.29.0</strong></p><br><p>This release introduces the ability to add spans to test sets, making it easier to bootstrap your evaluation data from production. The new feature lets you:</p><ul>
<li class="">Add individual or batch spans to test sets</li>
<li class="">Create custom mappings between spans and test sets</li>
<li class="">Preview test set changes before committing them</li>
</ul><p>Additional improvements:</p><ul>
<li class="">Fixed CSV test set upload issues</li>
<li class="">Prevented viewing of incomplete evaluations</li>
<li class="">Added mobile compatibility warning</li>
<li class="">Added support for custom ports in self-hosted installations</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="viewing-traces-in-the-playground-and-authentication-for-deployed-applications"><a class="" href="https://agenta.ai/docs/changelog/viewing-traces-in-the-playground-and-authentication-for-deployed-applications">Viewing Traces in the Playground and Authentication for Deployed Applications</a><a href="https://agenta.ai/docs/changelog/main#viewing-traces-in-the-playground-and-authentication-for-deployed-applications" class="hash-link" aria-label="Direct link to viewing-traces-in-the-playground-and-authentication-for-deployed-applications" title="Direct link to viewing-traces-in-the-playground-and-authentication-for-deployed-applications" translate="no">​</a></h3><p><em>29 November 2024</em></p><p><strong>v0.28.0</strong></p><h4 class="anchor anchorTargetStickyNavbar_vHny" id="viewing-traces-in-the-playground">Viewing traces in the playground:<a href="https://agenta.ai/docs/changelog/main#viewing-traces-in-the-playground" class="hash-link" aria-label="Direct link to Viewing traces in the playground:" title="Direct link to Viewing traces in the playground:" translate="no">​</a></h4><p>You can now see traces directly in the playground. For simple applications, this means you can view the prompts sent to LLMs. For custom workflows, you get an overview of intermediate steps and outputs. This makes it easier to understand what’s happening under the hood and debug your applications.</p><h4 class="anchor anchorTargetStickyNavbar_vHny" id="authentication-improvements">Authentication improvements:<a href="https://agenta.ai/docs/changelog/main#authentication-improvements" class="hash-link" aria-label="Direct link to Authentication improvements:" title="Direct link to Authentication improvements:" translate="no">​</a></h4><p>We’ve strengthened authentication for deployed applications. As you know, Agenta lets you either fetch the app’s config or call it with Agenta acting as a proxy. 
Now, we’ve added authentication to the second method. The APIs we create are now protected and can be called using an API key. You can find code snippets for calling the application on the overview page.</p><h4 class="anchor anchorTargetStickyNavbar_vHny" id="documentation-improvements">Documentation improvements:<a href="https://agenta.ai/docs/changelog/main#documentation-improvements" class="hash-link" aria-label="Direct link to Documentation improvements:" title="Direct link to Documentation improvements:" translate="no">​</a></h4><p>We’ve added new cookbooks and updated existing documentation:</p><ul>
<li class="">New <a class="" href="https://agenta.ai/docs/tutorials/cookbooks/observability_langchain">cookbook for observability with LangChain</a></li>
<li class="">Updated the <a class="" href="https://agenta.ai/docs/custom-workflows/overview">custom workflows documentation</a> and added a <a class="" href="https://agenta.ai/docs/reference/sdk/custom-workflow">reference</a></li>
<li class="">Updated the <a class="" href="https://agenta.ai/docs/reference/sdk/observability">reference for the observability SDK</a> and <a class="" href="https://agenta.ai/docs/reference/sdk/configuration-management">for the prompt management SDK</a></li>
</ul><h4 class="anchor anchorTargetStickyNavbar_vHny" id="bug-fixes">Bug fixes:<a href="https://agenta.ai/docs/changelog/main#bug-fixes" class="hash-link" aria-label="Direct link to Bug fixes:" title="Direct link to Bug fixes:" translate="no">​</a></h4><ul>
<li class="">Fixed an issue with the observability SDK not being compatible with LiteLLM.</li>
<li class="">Fixed an issue where cost and token usage were not correctly computed for all calls.</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="observability-and-prompt-management"><a class="" href="https://agenta.ai/docs/changelog/observability-and-prompt-management">Observability and Prompt Management</a><a href="https://agenta.ai/docs/changelog/main#observability-and-prompt-management" class="hash-link" aria-label="Direct link to observability-and-prompt-management" title="Direct link to observability-and-prompt-management" translate="no">​</a></h3><p><em>6 November 2024</em></p><p><strong>v0.27.0</strong></p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA3klEQVR4nC3MwUoCURjF8fucPUWto0UEgUSQWJOQkUlv0KJNmwIpo4nCiHQsGHNmctRy08wd773fP5w6i7P4cThqZW1LVjf35Pzal/bzu1w9DuT2dSgPQSR+P5KXYcrFfZCr9YqHd9QiHMUY6yiswTqDOIexhmU+0rlWGzt1qocnJONJiT9ZTpbl6GJBrovSwmSm1Xb1mFq9yec4LdEagzULxFnc/2OUTrWq7DfZ9U4Zp9O/oXM4AQFElg1f33Otzho1Wo0DBk9tZmGXJPCJex2Sfoe4d0P6dseoe6l/Aa4U1K5ypd1LAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="333"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/observability.837514c.600.png" srcset="/docs/assets/ideal-img/observability.837514c.600.png 600w,/docs/assets/ideal-img/observability.90bf4a1.1100.png 1100w,/docs/assets/ideal-img/observability.ca91ed4.1600.png 1600w" width="600" height="333"></noscript></div><p>This release is one of our biggest yet—one changelog hardly does it justice.</p><p><strong>First up: Observability</strong></p><p>We’ve had observability in beta for a while, but now it’s been completely rewritten,
with a brand-new UI and fully <strong>open-source code</strong>.</p><p>The new <a class="" href="https://agenta.ai/docs/observability/overview">Observability SDK</a> is compatible with <a href="https://opentelemetry.io/" target="_blank" rel="noopener noreferrer" class="">OpenTelemetry (Otel)</a> and <a href="https://opentelemetry.io/docs/specs/semconv/gen-ai/" target="_blank" rel="noopener noreferrer" class="">gen-ai semantic conventions</a>. This means you get a lot of integrations right out of the box, like <a class="" href="https://agenta.ai/docs/integrations/frameworks/langchain/observability">LangChain</a>, <a class="" href="https://agenta.ai/docs/integrations/llm-providers/openai/observability">OpenAI</a>, and more.</p><p>We’ll publish a full blog post soon, but here’s a quick look at what the new observability offers:</p><ul>
<li class="">
<p>A redesigned UI that lets you visualize nested traces, making it easier to understand what’s happening behind the scenes.</p>
</li>
<li class="">
<p>The web UI lets you filter traces by name, cost, and other attributes—you can even search through them easily.</p>
</li>
<li class="">
<p>The SDK is Otel-compatible, and we’ve already tested integrations for <a class="" href="https://agenta.ai/docs/integrations/llm-providers/openai/observability">OpenAI</a>, <a class="" href="https://agenta.ai/docs/integrations/frameworks/langchain/observability">LangChain</a>, <a class="" href="https://agenta.ai/docs/integrations/llm-providers/litellm/observability">LiteLLM</a>, and <a class="" href="https://agenta.ai/docs/integrations/libraries/instructor/observability">Instructor</a>, with guides available for each. In most cases, adding a few lines of code will have you seeing traces directly in Agenta.</p>
</li>
</ul><p><strong>Next: Prompt Management</strong></p><p>We’ve completely rewritten the <a class="" href="https://agenta.ai/docs/prompt-engineering/managing-prompts-programatically/setup">prompt management SDK</a>, giving you full CRUD capabilities for prompts and configurations. This includes creating, updating, reading history, deploying new versions, and deleting old ones. You can find a first tutorial for this <a class="" href="https://agenta.ai/docs/tutorials/sdk/manage-prompts-with-SDK">here</a>.</p><p><strong>And finally: LLM-as-a-Judge Overhaul</strong></p><p>We've made significant upgrades to the <a class="" href="https://agenta.ai/docs/evaluation/configure-evaluators/llm-as-a-judge">LLM-as-a-Judge evaluator</a>. It now supports prompts with multiple messages and has access to all variables in a test case. You can also switch models (currently supporting OpenAI and Anthropic). These changes make the evaluator much more flexible, and we're seeing better results with it.</p><img src="https://agenta.ai/docs/assets/images/llm-as-a-judge-ff10fa972d72f1c8647c216e1b11ab11.gif" style="display:block;margin:5px auto;width:50%;text-align:center" alt="Configuring the LLM-as-a-Judge evaluator" loading="lazy"><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="new-application-management-view-and-various-improvements"><a class="" href="https://agenta.ai/docs/changelog/new-application-management-view-and-various-improvements">New Application Management View and Various Improvements</a><a href="https://agenta.ai/docs/changelog/main#new-application-management-view-and-various-improvements" class="hash-link" aria-label="Direct link to new-application-management-view-and-various-improvements" title="Direct link to new-application-management-view-and-various-improvements" translate="no">​</a></h3><p><em>22 October 2024</em></p><p><strong>v0.26.0</strong></p><p>We updated the <strong>Application Management View</strong> to improve the UI. 
Many users struggled to find their applications when they had a large number, so we've improved the view and added a search bar for quick filtering.
Additionally, we are moving towards a new project structure for the application. We moved test sets and evaluators outside of the application scope. So now, you can use the same test set and evaluators in multiple applications.</p><p><strong>Bug Fixes</strong></p><ul>
<li class="">Added an export button in the evaluation view to export results from the main view.</li>
<li class="">Eliminated Pydantic warnings in the CLI.</li>
<li class="">Improved error messages when <code>fetch_config</code> is called with wrong arguments.</li>
<li class="">Enhanced the custom code evaluation sandbox and removed the limitation that results need to be between 0 and 1.</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="evaluator-testing-playground-and-a-new-evaluation-view"><a class="" href="https://agenta.ai/docs/changelog/evaluator-testing-playground-and-a-new-evaluation-view">Evaluator Testing Playground and a New Evaluation View</a><a href="https://agenta.ai/docs/changelog/main#evaluator-testing-playground-and-a-new-evaluation-view" class="hash-link" aria-label="Direct link to evaluator-testing-playground-and-a-new-evaluation-view" title="Direct link to evaluator-testing-playground-and-a-new-evaluation-view" translate="no">​</a></h3><p><em>22 September 2024</em></p><p><strong>v0.25.0</strong></p><br><p>Many users faced challenges configuring evaluators in the web UI. Some
evaluators, such as <code>LLM as a judge</code>, <code>custom code</code>, or RAG evaluators, can be
tricky to set up correctly on the first try. Until now, users needed to set up,
run an evaluation, check the errors, then do it again.</p><p>To address this, we've introduced a new evaluator test/debug playground. This feature allows you to test the evaluator live on real data, helping you test the configuration before committing to it and using it for evaluations.</p><p>Additionally, we have improved and redesigned the evaluation view. Both automatic and human evaluations are now within the same view but in different tabs. We're moving towards unifying all evaluator results and consolidating them in one view, allowing you to quickly get an overview of what's working.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="ui-redesign-and-configuration-management-and-overview-view"><a class="" href="https://agenta.ai/docs/changelog/ui-redesign-and-configuration-management-and-overview-view">UI Redesign and Configuration Management and Overview View</a><a href="https://agenta.ai/docs/changelog/main#ui-redesign-and-configuration-management-and-overview-view" class="hash-link" aria-label="Direct link to ui-redesign-and-configuration-management-and-overview-view" title="Direct link to ui-redesign-and-configuration-management-and-overview-view" translate="no">​</a></h3><p><em>22 August 2024</em></p><p><strong>v0.24.0</strong></p><div><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAwUlEQVR4nB2NwU7DQAxE9///gx/gwoFjD4gQ0SKKSAOJAknbJG0Fkeh67ex6UDySNZrRk8fd3K00e6t19fKhedlqVnzrU9nppjpqvuu0aM+4X9fevTcHLIoxmquqHaBIKVnXjr/kdl8LmECBEZjBLOaeCFfvDdyfJnL78WLh7+ohzJjnGSIC5oAQgn0eLhO5889kYJAIHwQkM5Zlicm6qEB3HMjV2wecmi3G5hV9tUFfPWOs1+g/cxzKRwxVjiK7pX8y/OHKQRqURQAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" 
src="/docs/assets/ideal-img/new_ui.cc6c056.600.png" srcset="/docs/assets/ideal-img/new_ui.cc6c056.600.png 600w,/docs/assets/ideal-img/new_ui.52b97c9.1100.png 1100w,/docs/assets/ideal-img/new_ui.b51094c.1600.png 1600w" width="600" height="334"></noscript></div></div><p>We've completely redesigned the platform's UI. Additionally, we have introduced a new overview view for your applications. This is part of a series of upcoming improvements slated for the next few weeks.</p><p>The new overview view offers:</p><ul>
<li class="">A dashboard displaying key metrics of your application</li>
<li class="">A table with all the variants of your applications</li>
<li class="">A summary of your application's most recent evaluations</li>
</ul><p>We've also added a new <strong>JSON Diff evaluator</strong>. This evaluator compares two JSON objects and provides a similarity score.</p><p>Lastly, we've updated the UI of our documentation.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="new-alpha-version-of-the-sdk-for-creating-custom-applications"><a class="" href="https://agenta.ai/docs/changelog/new-alpha-version-of-the-sdk-for-creating-custom-applications">New Alpha Version of the SDK for Creating Custom Applications</a><a href="https://agenta.ai/docs/changelog/main#new-alpha-version-of-the-sdk-for-creating-custom-applications" class="hash-link" aria-label="Direct link to new-alpha-version-of-the-sdk-for-creating-custom-applications" title="Direct link to new-alpha-version-of-the-sdk-for-creating-custom-applications" translate="no">​</a></h3><p><em>20 August 2024</em></p><p><strong>v0.23.0</strong></p><p>We've released a new version of the SDK for creating custom applications. This Pydantic-based SDK significantly simplifies the process of building custom applications. It's fully backward compatible, so your existing code will continue to work seamlessly. 
We'll soon be rolling out comprehensive documentation and examples for the new SDK.</p><p>In the meantime, here's a quick example of how to use it:</p><div class="language-python codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-python codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> agenta </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> ag</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> agenta </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Agenta</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> pydantic </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> BaseModel</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> Field</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> openai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> OpenAI</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">ag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">init</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># OpenAI client used below; reads OPENAI_API_KEY from the environment</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">client </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OpenAI</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Define the configuration of the application (that will be shown in the playground )</span><span class="token plain"></span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">MyConfig</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">BaseModel</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">    temperature</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">float</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Field</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">default</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">0.2</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">    prompt_template</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> 
Field</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">default</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"What is the capital of {country}?"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Creates an endpoint for the entrypoint of the application</span><span class="token plain"></span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain"></span><span class="token decorator annotation punctuation" style="color:#393A34">@ag</span><span class="token decorator annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">route</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"/"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> config_schema</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">MyConfig</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">generate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">country</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span 
class="token builtin">str</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Fetch the config from the request</span><span class="token plain"></span><br></span><span class="token-line theme-code-block-highlighted-line" style="color:#393A34"><span class="token plain">    config</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> MyConfig </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">ConfigManager</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get_from_route</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">schema</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">MyConfig</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    prompt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> config</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">prompt_template</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">format</span><span class="token punctuation" style="color:#393A34">(</span><span class="token 
plain">country</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">country</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    chat_completion </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chat</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">completions</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-3.5-turbo"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> prompt</span><span class="token punctuation" style="color:#393A34">}</span><span 
class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        temperature</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">config</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">temperature</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> chat_completion</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span></code></pre></div></div><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="ragas-evaluators-and-traces-in-the-playground"><a class="" href="https://agenta.ai/docs/changelog/ragas-evaluators-and-traces-in-the-playground">RAGAS Evaluators and Traces in the Playground</a><a href="https://agenta.ai/docs/changelog/main#ragas-evaluators-and-traces-in-the-playground" class="hash-link" aria-label="Direct link to 
ragas-evaluators-and-traces-in-the-playground" title="Direct link to ragas-evaluators-and-traces-in-the-playground" translate="no">​</a></h3><p><em>12 August 2024</em></p><p><strong>v0.22.0</strong></p><p>We're excited to announce two major features this week:</p><ol>
<li class="">
<p>We've integrated <a href="https://docs.ragas.io/" target="_blank" rel="noopener noreferrer" class="">RAGAS evaluators</a> into agenta. Two new evaluators have been added: <strong>RAG Faithfulness</strong> (measuring how consistent the LLM output is with the context) and <strong>Context Relevancy</strong> (assessing how relevant the retrieved context is to the question). Both evaluators use intermediate outputs within the trace to calculate the final score.</p>
<p><a class="" href="https://agenta.ai/docs/evaluation/configure-evaluators/rag-evaluators">Check out the tutorial</a> to learn how to use RAG evaluators.</p>
</li>
</ol> <div><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAHCAYAAAAxrNxjAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAxElEQVR4nF2MuQrCQBRF53f9Bht/wEZsxEILI4i4QEwMBIxLNsGgYtLZqJWV1Ugms1zJ2BgPHC68d7mk1mhj5oYYmGsYtg/DCTDxDrDjFGZ4UZvzFfWuSUln7KBECKnzFyGkKnO2SijpTl19fOcMXMiKecF1cbE9UTJyfF1kBYdUqmIhhC664ZkSy4sAJZEzBillRS6+i7v9kZJgn+jF8vEPK7h6viiiKKYkWfZxS2PcUx+PLMAjK3OnzWJLDXstBPMm/QBvGP55r3GHXAAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="432"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/rag_faithfulness.2098391.600.png" srcset="/docs/assets/ideal-img/rag_faithfulness.2098391.600.png 600w,/docs/assets/ideal-img/rag_faithfulness.cf25a5f.1100.png 1100w,/docs/assets/ideal-img/rag_faithfulness.3411ae2.1475.png 1475w" width="600" height="432"></noscript></div></div><ol start="2">
<li class="">
<p>You can now <strong>view traces directly in the playground</strong>. This feature enables you to debug your application while configuring it—for example, by examining the prompts sent to the LLM or reviewing intermediate outputs.</p>
<div><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAj0lEQVR4nFWISwrCMBgGc1PPoCBabaFNm5r8jyZWcCviHbPMJ6EbXQzMjNntWzS9w9l6NGPAZSJcHaN1gm6WMvgFB3vP5tTPSMpYVH6ozVDhskbFLI9sGhvqADODRcD8RxnsgGM3ZmO9gIgQiHHzHqF62JpESvXJSzYpKpg8UhR83i9EJSwSEDdKUsJzpfwFYC2B2vfDOq8AAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="315"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/trace_in_playground.0ea3594.600.png" srcset="/docs/assets/ideal-img/trace_in_playground.0ea3594.600.png 600w,/docs/assets/ideal-img/trace_in_playground.7e4c9c9.1100.png 1100w,/docs/assets/ideal-img/trace_in_playground.2e04150.1600.png 1600w" width="600" height="315"></noscript></div></div>
</li>
</ol><div class="theme-admonition theme-admonition-note admonition_Z3eq alert alert--secondary"><div class="admonitionHeading_dNDH"><span class="admonitionIcon_YCMn"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_qAg3"><p>Both features are available exclusively in the cloud and enterprise versions of agenta.</p></div></div><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="migration-from-mongodb-to-postgres"><a class="" href="https://agenta.ai/docs/changelog/migration-from-mongodb-to-postgres">Migration from MongoDB to Postgres</a><a href="https://agenta.ai/docs/changelog/main#migration-from-mongodb-to-postgres" class="hash-link" aria-label="Direct link to migration-from-mongodb-to-postgres" title="Direct link to migration-from-mongodb-to-postgres" translate="no">​</a></h3><p><em>9 July 2024</em></p><p><strong>v0.19.0</strong></p><p>We have migrated the Agenta database from MongoDB to Postgres. 
As a result, the <strong>platform is much faster</strong> (up to 10x in some use cases).</p><p>However, if you are self-hosting agenta, note that this is a breaking change that requires you to manually migrate your data from MongoDB to Postgres.</p><p>If you are using the cloud version of Agenta, there is nothing you need to do (other than enjoy the new performance improvements).</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="more-reliable-evaluations"><a class="" href="https://agenta.ai/docs/changelog/more-reliable-evaluations">More Reliable Evaluations</a><a href="https://agenta.ai/docs/changelog/main#more-reliable-evaluations" class="hash-link" aria-label="Direct link to more-reliable-evaluations" title="Direct link to more-reliable-evaluations" translate="no">​</a></h3><p><em>5 July 2024</em></p><p><strong>v0.18.0</strong></p><div><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAxklEQVR4nB3NW0oEMRAF0KzTFQgi4gttFNJ5VKVSydjOOCrOYLftlzs0X7mCZwPHnJwN/eLB99O7sZ8/cr+0uV+N2m9c7dejdtoecKvHXzM4xvNTBRFBEiExQUVQNIMo4vP4gcP608zgEooqfIggZpRSEIngQ4DkjBgCLGkzQQo2VeG9R2LGplZQjPDeQbL8D1KnZqRMyFphXUCgBNEKFxjWRXBS+JgQODfz/Z4x7xPmF8b6JlhfGV/7iGXnsOxGzFuLZbpvf2V9mus4dpOGAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/export_evaluation.0d349e8.600.png" srcset="/docs/assets/ideal-img/export_evaluation.0d349e8.600.png 600w,/docs/assets/ideal-img/export_evaluation.5ef8c54.1100.png 1100w,/docs/assets/ideal-img/export_evaluation.614b4d9.1600.png 1600w" width="600" height="334"></noscript></div></div><p>We have worked extensively on improving the <strong>reliability of evaluations</strong>. 
Specifically:</p><ul>
<li class="">We improved the status for evaluations and added a new <code>Queued</code> status</li>
<li class="">We improved the error handling in evaluations. Now we show the exact error message that caused the evaluation to fail.</li>
<li class="">We fixed issues that caused evaluations to run infinitely</li>
<li class="">We fixed issues in the calculation of scores in human evaluations.</li>
<li class="">We fixed small UI issues with large output in human evaluations.</li>
<li class="">We have added a new export button in the evaluation view to export the results as a CSV file.</li>
</ul><p>In <strong>observability</strong>:</p><ul>
<li class="">We have added a <strong>new integration with <a href="https://litellm.ai/" target="_blank" rel="noopener noreferrer" class="">Litellm</a></strong> to automatically trace all LLM calls done through it.</li>
<li class="">Now we automatically propagate cost and token usage from spans to traces.</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="evaluators-can-access-all-columns"><a class="" href="https://agenta.ai/docs/changelog/evaluators-can-access-all-columns">Evaluators can access all columns</a><a href="https://agenta.ai/docs/changelog/main#evaluators-can-access-all-columns" class="hash-link" aria-label="Direct link to evaluators-can-access-all-columns" title="Direct link to evaluators-can-access-all-columns" translate="no">​</a></h3><p><em>4 June 2024</em></p><p><strong>v0.17.0</strong></p><div><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA5UlEQVR4nB2G3UrCYABAvxfqFfoxFkUUpmVrn4WbG9sy6M+M/NlM69YgqDAs6NZeomJbb7SLHN8Jd+BwjljQdArFKitlk8Kuxeqejaa7rBs+G/JYrVVsluVFKpaKkub5Gf0gIOh26HXa9MMeg9uQ++FA1U2LTd1JxeK2ZDR6IEp++fr+IU4SojgmipP5q8bJKVuGlwptp8p0+smcv9ksr1Iqb5Zl6rLZoiSdVJSkw9VNl8n7B8/jCS+vb7mPT2Pa4Z3SD22MmpuKsNXANSW+JfHNAzxTx6vt49YqOEdl5Vk6w+t6+g8gnZeOCXJMhgAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="363"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/configure_expected_answer.a3de6e9.600.png" srcset="/docs/assets/ideal-img/configure_expected_answer.a3de6e9.600.png 600w,/docs/assets/ideal-img/configure_expected_answer.59fb37a.1100.png 1100w,/docs/assets/ideal-img/configure_expected_answer.215e237.1600.png 1600w" width="600" height="363"></noscript></div></div><p>Evaluators now can access all columns in the test set. Previously, you were limited to using only the <code>correct_answer</code> column for the ground truth / reference answer in evaluation.
Now you can configure your evaluator to use any column in the test set as the ground truth. To do that, open the collapsible <code>Advanced Settings</code> when configuring the evaluator, and set <code>Expected Answer Column</code> to the name of the column containing the reference answer you want to use.</p><p>In addition to this:</p><ul>
<li class="">We've upgraded the SDK to pydantic v2.</li>
<li class="">We have improved by 10x the speed for the get config endpoint</li>
<li class="">We have add documentation for observability</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="new-llm-provider-welcome-gemini"><a class="" href="https://agenta.ai/docs/changelog/new-llm-provider-welcome-gemini">New LLM Provider: Welcome Gemini!</a><a href="https://agenta.ai/docs/changelog/main#new-llm-provider-welcome-gemini" class="hash-link" aria-label="Direct link to new-llm-provider-welcome-gemini" title="Direct link to new-llm-provider-welcome-gemini" translate="no">​</a></h3><p><em>25 May 2024</em></p><p><strong>v0.14.14</strong></p><p>We are excited to announce the addition of Google's Gemini to our list of supported LLM providers, bringing the total number to 12.</p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAABYlAAAWJQFJUiTwAAAAeElEQVR4nEWOWw5EIQhD3f9W1SgiPtFOMJNcfkjTQ4tjZowxYLNVQbUgcwF3QV8DMgf6HHA5ZxAR7r3QoxBpEKlorUFVX4BtV2tFjBG9d5xzwFwhIk+vtT7Q0kIITxhImeC9f8d77w808/wrzChESCm9dyzVoDknfuwVwqWWQWniAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="306"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/gemini_screenshot.f1eb486.600.png" srcset="/docs/assets/ideal-img/gemini_screenshot.f1eb486.600.png 600w,/docs/assets/ideal-img/gemini_screenshot.24d8e88.1100.png 1100w,/docs/assets/ideal-img/gemini_screenshot.1c0ab9c.1250.png 1250w" width="600" height="306"></noscript></div><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="playground-improvements"><a class="" href="https://agenta.ai/docs/changelog/playground-improvements">Playground Improvements</a><a href="https://agenta.ai/docs/changelog/main#playground-improvements" class="hash-link" aria-label="Direct link to playground-improvements" title="Direct link to playground-improvements" translate="no">​</a></h3><p><em>24 May 
2024</em></p><p><strong>v0.14.1-13</strong></p><ul>
<li class="">We've improved the workflow for adding outputs to a dataset in the playground. In the past, you had to select the name of the test set each time. Now, the last used test set is selected by default..<!-- -->
<img src="https://agenta.ai/docs/assets/images/default-selected-testset_video-0213133eb72c4acd932d8e5440cf0891.gif" style="display:block;margin:20px auto;text-align:center">
</li>
<li class="">We have significantly improved the debugging experience when creating applications from code. Now, if an application fails, you can view the logs to understand the reason behind the failure.</li>
<li class="">We moved the copy message button in the playground to the output text area.</li>
<li class="">We now hide the cost and usage in the playground when they aren't specified</li>
<li class="">We've made improvements to error messages in the playground</li>
</ul><p><strong>Bug Fixes</strong></p><ul>
<li class="">Fixed the order of the arguments when running a custom code evaluator</li>
<li class="">Fixed the timestamp in the Testset view (previous stamps was droppping the trailing 0)</li>
<li class="">Fixed the creation of application from code in the self-hosted version when using Windows</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="prompt-and-configuration-registry"><a class="" href="https://agenta.ai/docs/changelog/prompt-and-configuration-registry">Prompt and Configuration Registry</a><a href="https://agenta.ai/docs/changelog/main#prompt-and-configuration-registry" class="hash-link" aria-label="Direct link to prompt-and-configuration-registry" title="Direct link to prompt-and-configuration-registry" translate="no">​</a></h3><p><em>1 May 2024</em></p><p><strong>v0.14.0</strong></p><p>We've introduced a feature that allows you to use Agenta as a prompt registry or management system. In the deployment view, we now provide an endpoint to directly fetch the latest version of your prompt. Here is how it looks like:</p><div class="language-text codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-text codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">from agenta import Agenta</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">agenta = Agenta()</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">config = agenta.get_config(base_id="xxxxx", environment="production", cache_timeout=200) # Fetches the configuration with caching</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span></code></pre></div></div><p>You can find additional documentation <a class="" href="https://agenta.ai/docs/prompt-engineering/integrating-prompts/integrating-with-agenta">here</a>.</p><p><strong>Improvements</strong></p><ul>
<li class="">Previously, publishing a variant from the playground to an environment was a manual process., from now on we are publishing by default to the production environment.<!-- -->
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAHCAYAAAAxrNxjAAAACXBIWXMAABYlAAAWJQFJUiTwAAAAmElEQVR4nGWP2woCMQxE+///5weID+rDLt3ecm1H2kURDRwSwjCZhG3bcHve0YjAzBAR9N7xW6EJ4fK4IrcCU12iMca/sBJBmJfgjbsvzAyqChJBYFEcOa+l+URX9+5gYZRWsdeEQMyIZUeRA6QVbARWghh/6GNMR0YqEbklEBHcHKOPM2sf5+x9OgpiPJBSWh/Pk99537wAhxESLxaPeloAAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="430"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/publish_to_production_by_default_screenshot.e5e949a.600.png" srcset="/docs/assets/ideal-img/publish_to_production_by_default_screenshot.e5e949a.600.png 600w,/docs/assets/ideal-img/publish_to_production_by_default_screenshot.c3effe1.1100.png 1100w,/docs/assets/ideal-img/publish_to_production_by_default_screenshot.eff8fcb.1600.png 1600w" width="600" height="430"></noscript></div>
</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="miscellaneous-improvements"><a class="" href="https://agenta.ai/docs/changelog/miscellaneous-improvements">Miscellaneous Improvements</a><a href="https://agenta.ai/docs/changelog/main#miscellaneous-improvements" class="hash-link" aria-label="Direct link to miscellaneous-improvements" title="Direct link to miscellaneous-improvements" translate="no">​</a></h3><p><em>28 April 2024</em></p><p><strong>v0.13.8</strong></p><ul>
<li class="">The total cost of an evaluation is now displayed in the evaluation table. This allows you to understand how much evaluations are costing you and track your expenses.<!-- -->
<div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAABCAYAAADn9T9+AAAACXBIWXMAABYlAAAWJQFJUiTwAAAAJklEQVR4nB3DQQ4AIAzDMP7/3rUZpyBhyaes4Uornb8gYBIncXd9G+4m1BoGQgAAAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="72"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/total_cost_screenshot.718a1e2.600.png" srcset="/docs/assets/ideal-img/total_cost_screenshot.718a1e2.600.png 600w,/docs/assets/ideal-img/total_cost_screenshot.d916b37.1100.png 1100w,/docs/assets/ideal-img/total_cost_screenshot.2e90ed2.1600.png 1600w" width="600" height="72"></noscript></div>
</li>
</ul><p><strong>Bug Fixes</strong></p><ul>
<li class="">Fixed sidebar focus in automatic evaluation results view</li>
<li class="">Fix the incorrect URLs shown when running agenta variant serve</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="evaluation-speed-increase-and-numerous-quality-of-life-improvements"><a class="" href="https://agenta.ai/docs/changelog/evaluation-speed-increase-and-numerous-quality-of-life-improvements">Evaluation Speed Increase and Numerous Quality of Life Improvements</a><a href="https://agenta.ai/docs/changelog/main#evaluation-speed-increase-and-numerous-quality-of-life-improvements" class="hash-link" aria-label="Direct link to evaluation-speed-increase-and-numerous-quality-of-life-improvements" title="Direct link to evaluation-speed-increase-and-numerous-quality-of-life-improvements" translate="no">​</a></h3><p><em>23rd April 2024</em></p><p><strong>v0.13.1-5</strong></p><ul>
<li class="">We've improved the speed of evaluations by 3x through the use of asynchronous batching of calls.</li>
<li class="">We've added Groq as a new provider along with Llama3 to our playground.</li>
</ul><p><strong>Bug Fixes</strong></p><ul>
<li class="">Resolved a rendering UI bug in Testset view.</li>
<li class="">Fixed incorrect URLs displayed when running the 'agenta variant serve' command.</li>
<li class="">Corrected timestamps in the configuration.</li>
<li class="">Resolved errors when using the chat template with empty input.</li>
<li class="">Fixed latency format in evaluation view.</li>
<li class="">Added a spinner to the Human Evaluation results table.</li>
<li class="">Resolved an issue where the gitignore was being overwritten when running 'agenta init'.</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="observability-beta"><a class="" href="https://agenta.ai/docs/changelog/observability-beta">Observability (beta)</a><a href="https://agenta.ai/docs/changelog/main#observability-beta" class="hash-link" aria-label="Direct link to observability-beta" title="Direct link to observability-beta" translate="no">​</a></h3><p><em>14th April 2024</em></p><p><strong>v0.13.0</strong></p><p>You can now monitor your application usage in production. We've added a new observability feature (currently in beta), which allows you to:</p><ul>
<li class="">Monitor cost, latency, and the number of calls to your applications in real-time.</li>
<li class="">View the logs of your LLM calls, including inputs, outputs, and used configurations. You can also add any interesting logs to your test set.</li>
<li class="">Trace your more complex LLM applications to understand the logic within and debug it.</li>
</ul><p>As of now, all new applications created will include observability by default. We are working towards a GA version in the next weeks, which will be scalable and better integrated with your applications. We will also be adding tutorials and documentation about it.</p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAwklEQVR4nCWNTWsCQRBE5wcH8kPUJSdBCIR4SHJUhAjrXoxX40HjYU+rFzV+IIJknJmeeaHXhqK7q6Ce6c/LNK++sbs3bPWMWz7hFi38osll2qU7XvFRTMT4/RdWIIgQE0QgRJAEJEFH1gXmUA6xmsZQK4rn5i1R9HY44LocYP7WBU4bNAwBCb7eSQn+VhPczxCzr2Z1fYxqKTrhknL1uaNdOcI8vs7SOM85FRmbXovtZ8buvcH2pcmxl3HO2/x2HuQfpLrbuMNh5uIAAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/observability_beta_light.ac76a44.600.png" srcset="/docs/assets/ideal-img/observability_beta_light.ac76a44.600.png 600w,/docs/assets/ideal-img/observability_beta_light.14ab0ec.1100.png 1100w,/docs/assets/ideal-img/observability_beta_light.932f794.1600.png 1600w" width="600" height="334"></noscript></div><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA3ElEQVR4nCWKvUvDQBxAb7RtPpocd/dLGjIUsUJnW1CKWjAQxMFBB83k1MGAIKYodnBy6qJV8H8tfWId3vB4T90vvzb1y4zv94LV6yGfD0M+6gGresByNmF83XDXLNaqOt+l5Wm6kcULzZYwFvzIEUaanbZPdSyoqtwn1ELiDGINIoakJ7g/F0NLO26OUtRt0SeIhSxxiLX/s7U4a0jF4sWGapSgzsZ7tP2QQDs6Xb0NvhU6kSaINKodc3mQoPpXb5tyOmVxkdOc5syLnPko5XHoeJ70eDrJ+Cmz9S91J11/V04pywAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="334"></svg><noscript><img 
style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/observability_beta_dark.4b870fd.600.png" srcset="/docs/assets/ideal-img/observability_beta_dark.4b870fd.600.png 600w,/docs/assets/ideal-img/observability_beta_dark.2e99476.1100.png 1100w,/docs/assets/ideal-img/observability_beta_dark.622fb91.1600.png 1600w" width="600" height="334"></noscript></div><p>Find examples of LLM apps created from code with observability <a href="https://github.com/Agenta-AI/agenta/tree/main/examples/app_with_observability" _target="_blank">here</a>.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="compare-latency-and-costs"><a class="" href="https://agenta.ai/docs/changelog/compare-latency-and-costs">Compare latency and costs</a><a href="https://agenta.ai/docs/changelog/main#compare-latency-and-costs" class="hash-link" aria-label="Direct link to compare-latency-and-costs" title="Direct link to compare-latency-and-costs" translate="no">​</a></h3><p><em>1st April 2024</em></p><p><strong>v0.12.6</strong></p><p>You can now compare the latency and cost of different variants in the evaluation view.</p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAw0lEQVR4nAXBYUvCQACA4fvP/Yk+BTUQ6qiViJR+kBYjkCQF9XY7b942hUAwSIhChULFD7q9Po/ot73SdmvYro99v8W8VtDhFTr0sKGk2nqjXrsshO08kGQzknHGyKbM5gu+vpd8Ln5IXI52GYPWBUKFEmVStI6IoyGr5S/F8cBuu8FohTKOuOEhRu17TD7HuCnjyZT13z+HsmS33+PyCVH6QRxIxLlslL2XJ6LghmFTYgIf2/RR9TvU4zX6ucqgclacAA50pN9IzjMnAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="297"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/compare_latency_and_cost_light.f35fe2a.600.png" 
srcset="/docs/assets/ideal-img/compare_latency_and_cost_light.f35fe2a.600.png 600w,/docs/assets/ideal-img/compare_latency_and_cost_light.e18b485.1100.png 1100w,/docs/assets/ideal-img/compare_latency_and_cost_light.c3c7a64.1600.png 1600w" width="600" height="297"></noscript></div><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAt0lEQVR4nBXDPU/CUBiA0TuRaODKez/BNmljtbWlUSMDOhASDQMDmCg60E78/78AD/EkRx27j/Phe0W/X9H/LOm+3um2C7rdG/1uyXr7y99+c1JPD5bB9ZiR/i+Im2BjgvFTtBhGxrGoHer1MaJNxIWAdY68KKialvuqJk4i2gbmuaCeS89Ae4biGY4tMc3Jyoa0KLlxgSsJvNw5VFrPz0UaaDNDkwizzNJODbU3zG6FKjF8VnK6AAGqR1fy9jfnAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="297"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/compare_latency_and_cost_dark.7e7e47e.600.png" srcset="/docs/assets/ideal-img/compare_latency_and_cost_dark.7e7e47e.600.png 600w,/docs/assets/ideal-img/compare_latency_and_cost_dark.c8b3e47.1100.png 1100w,/docs/assets/ideal-img/compare_latency_and_cost_dark.c461550.1600.png 1600w" width="600" height="297"></noscript></div><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="minor-improvements"><a class="" href="https://agenta.ai/docs/changelog/minor-improvements">Minor improvements</a><a href="https://agenta.ai/docs/changelog/main#minor-improvements" class="hash-link" aria-label="Direct link to minor-improvements" title="Direct link to minor-improvements" translate="no">​</a></h3><p><em>31st March 2024</em></p><p><strong>v0.12.5</strong></p><p><strong>Toggle variants in comparison view</strong></p><p>You can now toggle the visibility of variants in the comparison view, allowing you to compare a multitude of variants side-by-side at the same time.</p><div 
style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAAAsTAAALEwEAmpwYAAAArUlEQVR4nB3J22rCMACA4bw/PsbeQGS2VRmrIIKnXUwcG1TRCw9E0qRJStLmH9vFd/UJeRomf13Q3kvC7Z1wmRDOOfFcEKs3tt8P9utxL1Q1ReqEMS3GQUjQhISLCRs6fIL4M0K4akobwdQSrRVaKZxt8K7BW4ON4HdDhD/OCIBUBm1bZG2JfU/3p4v/Fw4F4qXYpMfXCvU55rnOkR8Zepkjy4x6lWEPc+rXQf8Lcb+4rgdySYYAAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="297"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/toggle_variants_visibility_light.d8eb5fb.600.png" srcset="/docs/assets/ideal-img/toggle_variants_visibility_light.d8eb5fb.600.png 600w,/docs/assets/ideal-img/toggle_variants_visibility_light.13d9048.1100.png 1100w,/docs/assets/ideal-img/toggle_variants_visibility_light.6138263.1600.png 1600w" width="600" height="297"></noscript></div><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAwElEQVR4nB3D20rDMACA4byBiLqDNaHLuqRJuq4V1oF4wAuH01kEnYiDrWxe+gheeeVLb7/gB5/4/X7aL+eB7SKnqS3r+5jVVP5v7gbkFzO+tvVOLB8Dx1FK3HN0ZcZhW9FShtZZQlsmnMiE1U0X0cwDUd+TFzlhNGJYFKQ+YL3H+UBkhmyuNWL94DiSFldW9LOStKzoaEtHG0614UA53q80YvL8uZ9WgY9bw2KS8HY54CVT1Fbxeh5TjzU/s3j3B347UHU4uxO/AAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="297"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/toggle_variants_visibility_dark.4bba4f6.600.png" srcset="/docs/assets/ideal-img/toggle_variants_visibility_dark.4bba4f6.600.png 600w,/docs/assets/ideal-img/toggle_variants_visibility_dark.5c83fe9.1100.png 
1100w,/docs/assets/ideal-img/toggle_variants_visibility_dark.0211bc6.1600.png 1600w" width="600" height="297"></noscript></div><p><strong>Improvements</strong></p><ul>
<li class="">You can now add a datapoint from the playground to the test set even if there is a column mismatch</li>
</ul><p><strong>Bug fixes</strong></p><ul>
<li class="">Resolved issue with "Start Evaluation" button in Testset view</li>
<li class="">Fixed bug in CLI causing variant not to serve</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="new-evaluators"><a class="" href="https://agenta.ai/docs/changelog/new-evaluators">New evaluators</a><a href="https://agenta.ai/docs/changelog/main#new-evaluators" class="hash-link" aria-label="Direct link to new-evaluators" title="Direct link to new-evaluators" translate="no">​</a></h3><p><em>25th March 2024</em></p><p><strong>v0.12.4</strong></p><p>We have added some more evaluators, a new string matching and a Levenshtein distance evaluation.</p><p><strong>Improvements</strong></p><ul>
<li class="">Updated documentation for human evaluation</li>
<li class="">Made improvements to Human evaluation card view</li>
<li class="">Added dialog to indicate testset being saved in UI</li>
</ul><p><strong>Bug fixes</strong></p><ul>
<li class="">Fixed issue with viewing the full output value during evaluation</li>
<li class="">Enhanced error boundary logic to unblock user interface</li>
<li class="">Improved logic to save and retrieve multiple LLM provider keys</li>
<li class="">Fixed Modal instances to support dark mode</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="minor-improvements-1"><a class="" href="https://agenta.ai/docs/changelog/minor-improvements-2">Minor improvements</a><a href="https://agenta.ai/docs/changelog/main#minor-improvements-1" class="hash-link" aria-label="Direct link to minor-improvements-1" title="Direct link to minor-improvements-1" translate="no">​</a></h3><p><em>11th March 2024</em></p><p><strong>v0.12.3</strong></p><ul>
<li class="">Improved the logic of the Webhook evaluator</li>
<li class="">Made the inputs in the Human evaluation view non-editable</li>
<li class="">Added an option to save a test set in the Single model evaluation view</li>
<li class="">Included the evaluator name in the "Configure your evaluator" modal</li>
</ul><p><strong>Bug fixes</strong></p><ul>
<li class="">Fixed column resize in comparison view</li>
<li class="">Resolved a bug affecting the evaluation output in the CSV file</li>
<li class="">Corrected the path to the Evaluators view when navigating from Evaluations</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="highlight-ouput-difference-when-comparing-evaluations"><a class="" href="https://agenta.ai/docs/changelog/highlight-ouput-difference-when-comparing-evaluations">Highlight ouput difference when comparing evaluations</a><a href="https://agenta.ai/docs/changelog/main#highlight-ouput-difference-when-comparing-evaluations" class="hash-link" aria-label="Direct link to highlight-ouput-difference-when-comparing-evaluations" title="Direct link to highlight-ouput-difference-when-comparing-evaluations" translate="no">​</a></h3><p><em>4th March 2024</em></p><p><strong>v0.12.2</strong></p><p>We have improved the evaluation comparison view to show the difference to the expected output.</p><p><strong>Improvements</strong></p><ul>
<li class="">Improved the error messages when invoking LLM applications</li>
<li class="">Improved "Add new evaluation" modal</li>
<li class="">Updated the side menu to show "Configure evaluator" and "Run evaluator" under the Evaluations section</li>
<li class="">Changed cursor to pointer when hovering over evaluation results</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="deployment-versioning-and-rbac"><a class="" href="https://agenta.ai/docs/changelog/deployment-versioning-and-rbac">Deployment Versioning and RBAC</a><a href="https://agenta.ai/docs/changelog/main#deployment-versioning-and-rbac" class="hash-link" aria-label="Direct link to deployment-versioning-and-rbac" title="Direct link to deployment-versioning-and-rbac" translate="no">​</a></h3><p><em>14th February 2024</em></p><p><strong>v0.12.0</strong></p><p><strong>Deployment versioning</strong></p><p>You now have access to a history of prompts deployed to our three environments. This feature allows you to roll back to previous versions if needed.</p><p><strong>Role-Based Access Control</strong></p><p>You can now invite team members and assign them fine-grained roles in agenta.</p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAICAYAAADA+m62AAAACXBIWXMAABYlAAAWJQFJUiTwAAAA2ElEQVR4nE1Py2qEQBCcP/fswRezIKs3iUFPm0vID3jYqPMFgVxyCSgqPtFRK3SHDSkoeqanqrpHWJaFKIoQxzEzSZI/pmnKPcdxIKSUqKoKdV2jaRqc58k8jgOEeZ4RBAGEaZos7LqOhW3b8rnve4zjyDUMQwjDMPhxXVd2L8sCrTWnEujOQs/z2EXY951TKPEhnqYJvu9D2LbNo8lJe23bxvy/I/1DuK6LYRi4+RhHddca6vsT6usD4fX6m5hlGZRSKIoCZVkiz3O83+94envB8+sNFynxAx5yFzOaQJI2AAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="489"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/rbac_light.44dba6e.600.png" srcset="/docs/assets/ideal-img/rbac_light.44dba6e.600.png 600w,/docs/assets/ideal-img/rbac_light.c6b5283.996.png 996w" width="600" height="489"></noscript></div><div 
style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAICAYAAADA+m62AAAACXBIWXMAABYlAAAWJQFJUiTwAAAAzklEQVR4nFWQTWuEMBRFsxU1Rhl8SUw1ZvyAQYlg7bLLdtv//2dueRkYmMXJg7zD5SYiz3MopdA0Deq6RlVVbyilUBQFRJZlmOcZ13XhOA547zGO4wvvPaSUEJy0LAvO80wiE2PEtm3Y9x3TND3FruvQ9z1CCAlOcc7BGANrLXhfliUEH3yhtU5Lltd1TYKxBu7Dpf6CC3MKCyyykBKtATU3WNJoiSD4RcMwJJkn1+B5DwGffz84fr+hZAXB39O2LYjoDU2E+1eEjw/IosQ/J6lqBP/hiJQAAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="489"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/rbac_dark.7bcd35c.600.png" srcset="/docs/assets/ideal-img/rbac_dark.7bcd35c.600.png 600w,/docs/assets/ideal-img/rbac_dark.3ef5ea5.996.png 996w" width="600" height="489"></noscript></div><p><strong>Improvements</strong></p><ul>
<li class="">We now prevent the deletion of test sets that are used in evaluations</li>
</ul><p><strong>Bug fixes</strong></p><ul>
<li class="">
<p>Fixed a bug in custom code evaluation aggregation. Until now, the aggregated results for custom code evaluations were not computed correctly.</p>
</li>
<li class="">
<p>Fixed bug with Evaluation results not being exported correctly</p>
</li>
<li class="">
<p>Updated documentation for vision gpt explain images</p>
</li>
<li class="">
<p>Improved Frontend test for Evaluations</p>
</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="minor-fixes"><a class="" href="https://agenta.ai/docs/changelog/minor-fixes">Minor fixes</a><a href="https://agenta.ai/docs/changelog/main#minor-fixes" class="hash-link" aria-label="Direct link to minor-fixes" title="Direct link to minor-fixes" translate="no">​</a></h3><p><em>4th February 2024</em></p><p><strong>v0.10.2</strong></p><ul>
<li class="">Addressed issue when invoking LLM app with missing LLM provider key</li>
<li class="">Updated LLM providers in Backend enum</li>
<li class="">Fixed bug in variant environment deployment</li>
<li class="">Fixed the sorting in evaluation tables</li>
<li class="">Made use of server timezone instead of UTC</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="prompt-versioning"><a class="" href="https://agenta.ai/docs/changelog/prompt-versioning">Prompt Versioning</a><a href="https://agenta.ai/docs/changelog/main#prompt-versioning" class="hash-link" aria-label="Direct link to prompt-versioning" title="Direct link to prompt-versioning" translate="no">​</a></h3><p><em>31st January 2024</em></p><p><strong>v0.10.0</strong></p><p>We've introduced the feature to version prompts, allowing you to track changes made by the team and revert to previous versions. To view the change history of the configuration, click on the sign in the playground to access all previous versions.</p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA40lEQVR4nD2NPUvDUAAA3+9zdBDq7OzHGkQDbZq8vHy2JFIqKnWwRPADoeAkDm0NaNGG1g6Cg4u8gpODSOmJFjy47eDEyd3TvF/00C9H6CJB5xLdt5j2KkzzFu3uG2c33Zm4ahkE0R5xLSWuN6glTerpPn6UMn6e8Mv36zXiOFhnt+zgShtXOihX4imXqlVh+DjgC3i/zxCnySaWDPHVIgg8j9D3kbbNcPCwCPMM0VAGjuvh+NG/MowpS0UxGv+tP0YdxNJ2e141TTrWKudGiQuzxOXWCtnaMpODHT5vD9HNjdkP8COxBgQC21UAAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/prompt_versioning_light.27de63d.600.png" srcset="/docs/assets/ideal-img/prompt_versioning_light.27de63d.600.png 600w,/docs/assets/ideal-img/prompt_versioning_light.ba71775.1100.png 1100w,/docs/assets/ideal-img/prompt_versioning_light.d0b45ec.1600.png 1600w" width="600" height="334"></noscript></div><div 
style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA4klEQVR4nCXFzUvCYADA4be2trZ378uy0GnMOSJbH9AhCqImKAl9HAI9d1Fx4KERFX1A14h56dzfKv4iOzw8Ivv6ng/zAdOPNsXTMcVklyJrMh1v8zlKOek/MHh8m4lep4YQKziOwrQ8jD+2YtmSlIMalWrI8LKB6KUhtvRZ8xVaeyglF6R0iKOQcOeA8XWC6LeqmLam5Gt8rdHKW/Cky1ZUp5Hsk13tIdpHMUuGhbnqYTre/67CcCXl2ialqMldN0HUb9/nnbNznrsB96cBeatCfrjBpLnOaxrychHzc1Of/QL3qlsbPQWnHgAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/prompt_versioning_dark.9c1ae48.600.png" srcset="/docs/assets/ideal-img/prompt_versioning_dark.9c1ae48.600.png 600w,/docs/assets/ideal-img/prompt_versioning_dark.389b893.1100.png 1100w,/docs/assets/ideal-img/prompt_versioning_dark.6960d44.1600.png 1600w" width="600" height="334"></noscript></div><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="new-json-evaluator"><a class="" href="https://agenta.ai/docs/changelog/new-json-evaluator">New JSON Evaluator</a><a href="https://agenta.ai/docs/changelog/main#new-json-evaluator" class="hash-link" aria-label="Direct link to new-json-evaluator" title="Direct link to new-json-evaluator" translate="no">​</a></h3><p><em>30th January 2024</em></p><p><strong>v0.9.1</strong>
We have added a new evaluator to match JSON fields and added the possibility to use columns in the test set other than the <code>correct_answer</code> column as the ground truth.</p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAICAYAAADA+m62AAAACXBIWXMAAAsTAAALEwEAmpwYAAABKklEQVR4nC3MyUoCAQCA4XmOCIrIByg6GToSkdsszmI5jskMElEQUmkJuQwaUjZmYlmXLpWX6hYEnYLooYKoP5DuH58Q2z/91SsdzMoJh94OfitHp25RPtgmttlALDSYW/d+BNvNk7MyOPkcR5UK1WqNWr1BubRH3rbI2Vnc9VUEx1ommlDQDQPf9+n3L7i5uabZbJGUJKJJFVcPITjmEnFJwzTTdLtdzns9hsMrPM9DkWWSikZBXURw0xHisoZumLTbbc7Ga38MVeUfyiGEjJEgKSu4rsPz0yOXgwGj0QPHzSaaqqBqBoVU+FdYVjKkUxLZrM3L6xvvH5/c3t2zUSyS0k0kcw1HEhFmgunvWdFGXYmwG55kKzjFTmiK4vw0JTHAXjjA/sLE1x8tyq8L3upbDAAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="462"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/new_json_evaluator_light.7562379.600.png" srcset="/docs/assets/ideal-img/new_json_evaluator_light.7562379.600.png 600w,/docs/assets/ideal-img/new_json_evaluator_light.c5729bd.1100.png 1100w,/docs/assets/ideal-img/new_json_evaluator_light.f6cbb70.1600.png 1600w" width="600" height="462"></noscript></div><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAICAYAAADA+m62AAAACXBIWXMAAAsTAAALEwEAmpwYAAABD0lEQVR4nCXO20vCUACA8RObm9vO2X3TeQ8vXaZWE3uIiF5KCHrUCJQMCnuop/75IPKL8v3HxyeK1ef2ZL6hWGx4eL7n4/WK9/UF88UdBzcrGtdL6rO3H9Hpd9GEwDYNfFdhOxLHkUjbwjZL6JrGYa+JKPIUvaxQrkeW1ciqFeI4wg9CpFKULJfpIECc9iJ0UyGVR5IkpGlKGIb4vo90dnDSVohiEKKZu+IfjOKIOIoI/AApFSXbZdJyEYNOFc2w/t9GwyHn0ymtZgPPdZGOjW5anLXdrVBZF0PbwzAt0lqDJKvjJxWcNEEzygjdIq9IRDycfXvHt1xORqzHksfcYzX2We17POUByyOPl776+gVbZmuYPtGangAAAABJRU5ErkJggg==&quot;)"><svg 
style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="462"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/new_json_evaluator_dark.209a965.600.png" srcset="/docs/assets/ideal-img/new_json_evaluator_dark.209a965.600.png 600w,/docs/assets/ideal-img/new_json_evaluator_dark.52cfa05.1100.png 1100w,/docs/assets/ideal-img/new_json_evaluator_dark.f63fc9a.1600.png 1600w" width="600" height="462"></noscript></div><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="improved-error-handling-in-evaluation"><a class="" href="https://agenta.ai/docs/changelog/improved-error-handling-in-evaluation">Improved error handling in evaluation</a><a href="https://agenta.ai/docs/changelog/main#improved-error-handling-in-evaluation" class="hash-link" aria-label="Direct link to improved-error-handling-in-evaluation" title="Direct link to improved-error-handling-in-evaluation" translate="no">​</a></h3><p><em>29th January 2024</em></p><p><strong>v0.9.0</strong></p><p>We have improved error handling in evaluation to return more information about the exact source of the error in the evaluation view.</p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA7UlEQVR4nAXBTUvCYADA8X2yDn2BYkT3ahAIHjpU0KsdZhkVYQW9ku0gbI97tmdua1sqoQXRJ0nIEWUdSvz3+2lrx9Xx2aWJd1dAXszhHOg4+1PIQx17z2Bx+5ziVnmkNWomjggQboBUMY/dF7rPr6gwxVNNgjBG3eygiWsTL4gQjo3yffJ8wHD4SZomuMLBDWLEySaarJVRYYLyXKKwSb//Rj54p5Vl+LJBM06RpyU0+7ZK1mqjooTkocPH1zc/v390ek+o6J6w3UNeVdAmC0fj0uoy9XUda2mGYNcg3jCw5nWs4jT1lVmshYnRP81PsCZo0ZZVAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" 
src="/docs/assets/ideal-img/improved_error_handling_light.a6828bc.600.png" srcset="/docs/assets/ideal-img/improved_error_handling_light.a6828bc.600.png 600w,/docs/assets/ideal-img/improved_error_handling_light.526f69f.1100.png 1100w,/docs/assets/ideal-img/improved_error_handling_light.679c2af.1600.png 1600w" width="600" height="334"></noscript></div><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA0ElEQVR4nB2IXUsCQQAA96U89z52b/f2ehA05U7PvZLzKuglUigyrBApUvr//0Kc0IeBmRGLzfb4snpk/1WyW/X5Xebsni37heXnach1+8r98u0g7nxOIBUyUgQyIUwsYWIIQkU3jOnKiLZIEM3YEeoM6xzGOYpqSlFVmCwjtQaZOtpRipiVlk6kSfSJlLGvmfj67InSdGJF01eIcpBzISMu49PUDKe3jPyMQJlzC6nxgxRh5uvj/MazaQzryvD90GNbXfHe03xODB/e8lenh38IUlKQ0TkhgQAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="334"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/improved_error_handling_dark.d1e3f91.600.png" srcset="/docs/assets/ideal-img/improved_error_handling_dark.d1e3f91.600.png 600w,/docs/assets/ideal-img/improved_error_handling_dark.121521b.1100.png 1100w,/docs/assets/ideal-img/improved_error_handling_dark.3811502.1600.png 1600w" width="600" height="334"></noscript></div><p><strong>Improvements</strong>:</p><ul>
<li class="">Added the option in A/B testing human evaluation to mark both variants as correct</li>
<li class="">Improved loading state in Human Evaluation</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="bring-your-own-api-key"><a class="" href="https://agenta.ai/docs/changelog/bring-your-own-api-key">Bring your own API key</a><a href="https://agenta.ai/docs/changelog/main#bring-your-own-api-key" class="hash-link" aria-label="Direct link to bring-your-own-api-key" title="Direct link to bring-your-own-api-key" translate="no">​</a></h3><p><em>25th January 2024</em></p><p><strong>v0.8.3</strong></p><p>Until now, we required users to use our OpenAI API key when using the cloud version. Starting now, you can use your own API key for any new application you create.</p><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="improved-human-evaluation-workflow"><a class="" href="https://agenta.ai/docs/changelog/improved-human-evaluation-workflow">Improved human evaluation workflow</a><a href="https://agenta.ai/docs/changelog/main#improved-human-evaluation-workflow" class="hash-link" aria-label="Direct link to improved-human-evaluation-workflow" title="Direct link to improved-human-evaluation-workflow" translate="no">​</a></h3><p><em>24th January 2024</em></p><p><strong>v0.8.2</strong></p><p><strong>Faster human evaluation workflow</strong></p><p>We have updated the human evaluation table view to add annotation and correct answer columns.</p><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAHCAYAAAAxrNxjAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA5klEQVR4nC2OMUsDQRSE90da+w80kspSURRBC1vFIHbaBQVBDqyCqCAGo0KaHEQCyQUEL7d7u/v2fZLTgQ+GYWDG7D7M9Pjlm6f8Eaan+OEh/mOH9HZAlt2x2c1pXw2SmU6eSQnQRFIaYgLRZSQsFb56mGJ0i/svgqKAJEVVEUlEwH52MVWeUSfwtUNEcHVNCIEYPNFbrCjV6zXmZ3SvVsDbBSk4Ym0bxDl8OaMs51T9GzX9Yd78iPo3W3nBS2p8CAsCSv2eYVaPMtm+6DG43KM4bzHubDA5aTHZX6M4W2feaTPeWgm/tz3+d7w82vcAAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="408"></svg><noscript><img 
style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/improved_human_eval_workflow_light.6bead5c.600.png" srcset="/docs/assets/ideal-img/improved_human_eval_workflow_light.6bead5c.600.png 600w,/docs/assets/ideal-img/improved_human_eval_workflow_light.3ae92fb.1100.png 1100w,/docs/assets/ideal-img/improved_human_eval_workflow_light.075592d.1600.png 1600w" width="600" height="408"></noscript></div><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAHCAYAAAAxrNxjAAAACXBIWXMAAAsTAAALEwEAmpwYAAABCElEQVR4nB3HvUsCYRzA8dsKze7Ou99zb8/pGUphRVAUCEIUJQUqZIRCZEPQ2pS09rYFDWKQ0eRQs0M09JdF+Q36bB+jPfya7N2/cTe85XN0xLhfY/xY5eNhm17vkuXzJ7auRr9G57CClTGRrIVpOpiWi5X1MLOCJw4z6VmOd4oYJ60yGSdEKUHERVwXpXxEBFE+KVPobvoYZ80yKkqIdUQQBEQ6Joyi/ye5CD+f0K1ojM5uaWKKJh+H6EChA4849NCex2JJszA/x2k1NzGWVjew01OYjpCyFRLlsVTItC2EuRA71jRWfIxi6/qnXL/gpr3Oc0MzOCgw2M/RXwt4bRZ4qSe814LvPzJ/cIKSlrkAAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="408"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/improved_human_eval_workflow_dark.b8b5325.600.png" srcset="/docs/assets/ideal-img/improved_human_eval_workflow_dark.b8b5325.600.png 600w,/docs/assets/ideal-img/improved_human_eval_workflow_dark.f840005.1100.png 1100w,/docs/assets/ideal-img/improved_human_eval_workflow_dark.7ee650e.1600.png 1600w" width="600" height="408"></noscript></div><p><strong>Improvements</strong>:</p><ul>
<li class="">Simplified the database migration process</li>
<li class="">Fixed environment variable injection to enable cloud users to use their own keys</li>
<li class="">Disabled import from endpoint in cloud for security reasons</li>
<li class="">Improved query lookup speed for evaluation scenarios</li>
<li class="">Improved error handling in playground</li>
</ul><p><strong>Bug fixes</strong>:</p><ul>
<li class="">Resolved failing Backend tests</li>
<li class="">Fixed a bug in rate limit configuration validation</li>
<li class="">Fixed issue with all aggregated results</li>
<li class="">Resolved issue with live results in A/B testing evaluation not updating</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="revamping-evaluation"><a class="" href="https://agenta.ai/docs/changelog/revamping-evaluation">Revamping evaluation</a><a href="https://agenta.ai/docs/changelog/main#revamping-evaluation" class="hash-link" aria-label="Direct link to revamping-evaluation" title="Direct link to revamping-evaluation" translate="no">​</a></h3><p><em>22nd January 2024</em></p><p><strong>v0.8.0</strong></p><p>We've spent the past month re-engineering our evaluation workflow. Here's what's new:</p><p><strong>Running Evaluations</strong></p><ol>
<li class="">Simultaneous Evaluations: You can now run multiple evaluations for different app variants and evaluators concurrently.</li>
</ol><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAJCAYAAAALpr0TAAAACXBIWXMAABYlAAAWJQFJUiTwAAAA6ElEQVR4nFWN70rEMBDE8/7gK/hdUatcq9Ve767v4MeKHNc2TZs0JilS+m8kC4VzYZhd9scMi+MYRVEgz3MopUhSSlLbttBa4zkIwA7HI6ZpgrUWwzBgHEdss64r+VPwApamKYHGGHIPe/fQsiwEPgY7n5ih7x1E04DXAs45Sr0GX6MILEkzSlFSwhmJ3lmCPbRVRx5835/QiBqf5wtKVaMVJYyxBCzLFfixP8FZg6rVqJSD1j+Y5/lfdRiGYMkho2N7bnV+hFtR/wKhTwyjN3Rdh7IsUXEOzjnt5+8v3CYFbnYN7u4f8Acfi00iacZMgAAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="565"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/eval_1.cce55e7.600.png" srcset="/docs/assets/ideal-img/eval_1.cce55e7.600.png 600w,/docs/assets/ideal-img/eval_1.1415869.1100.png 1100w" width="600" height="565"></noscript></div><ol start="2">
<li class="">Rate Limit Parameters: Specify these during evaluations and reattempts to ensure reliable results without exceeding OpenAI rate limits.</li>
</ol><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAGCAYAAAD68A/GAAAACXBIWXMAABYlAAAWJQFJUiTwAAAAnUlEQVR4nFWOQQrCMBBFe/+NN3AjughupIs2C2P3UrFdaG6gIAiaJsZkki8J2NKBz/CHx2OKXSXweBOGQUEpBa0N/hNjzJtzjqKsD3h+gEB+BIgI3vs5uBcCgRystfkYQsjm1D2FCRRC4OscjDHZkGwJTDH6NYFN04ym+W8R13tEewOqmqNgjEFKia7r0Pf9mFN7xLI8Y7G9YLXe4AeNR92m6UM/5wAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="361"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/eval_2.b64ca6c.600.png" srcset="/docs/assets/ideal-img/eval_2.b64ca6c.600.png 600w,/docs/assets/ideal-img/eval_2.559f14b.1100.png 1100w" width="600" height="361"></noscript></div><ol start="3">
<li class="">Reusable Evaluators: Configure evaluators such as similarity match, regex match, or AI critique and use them across multiple evaluations.</li>
</ol><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAICAYAAADA+m62AAAACXBIWXMAABYlAAAWJQFJUiTwAAAAwElEQVR4nF2PW2rDMBREtf9FdA3dQqEbKP5LwVi1nMbYlqgsK8V6nCDl0dCB4V7EXOZINE1D13WM48iyLMzz/PA0TRhjaNsWMQyKGCM5Z/4r3960NggpJc65Go4pERJ1DyGQUroFNUIphbUWYzQ/68ZxXnGb5+x9PSgq9aLvezZ/Zo+JlCIxhkflfdag/FI4v/O751pbvMd8dXhi/DicsL585srzrMLoAnxPBnH4bLF2rYyl4s+a4Xji5c3y+i65ACW6M4pykrLUAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="488"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/eval_3.df39657.600.png" srcset="/docs/assets/ideal-img/eval_3.df39657.600.png 600w,/docs/assets/ideal-img/eval_3.191fb36.1100.png 1100w,/docs/assets/ideal-img/eval_3.7b3b11f.1252.png 1252w" width="600" height="488"></noscript></div><p><strong>Evaluation Reports</strong></p><ol>
<li class="">Dashboard Improvements: We've upgraded our dashboard interface to better display evaluation results. You can now filter and sort results by evaluator, test set, and outcomes.</li>
</ol><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAECAYAAAC3OK7NAAAACXBIWXMAABYlAAAWJQFJUiTwAAAAZklEQVR4nCWMQQ4EIQzD+P9jdwWlUGiLR8wcrESykjLnRHUgInT5U/ugqVFlcF1tyq8OipqjlqgFJ5y1A1v+cg8uay1KHUGbiczknMTdyfy6SEd6x8woy5Md5+XKc25+3NF2JyJ4ANk7nFFMk+jIAAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="217"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/eval_4.d2fa77b.600.png" srcset="/docs/assets/ideal-img/eval_4.d2fa77b.600.png 600w,/docs/assets/ideal-img/eval_4.d65306f.1100.png 1100w,/docs/assets/ideal-img/eval_4.56bb2f0.1600.png 1600w" width="600" height="217"></noscript></div><ol start="2">
<li class="">Comparative Analysis: Select multiple evaluation runs and view the results of various LLM applications side-by-side.</li>
</ol><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAADCAYAAACqPZ51AAAACXBIWXMAABYlAAAWJQFJUiTwAAAAXElEQVR4nB3FWQrDMAxAQd//moHSkIK1xI4lGV4h8zPNfXIN5fwczKsje1BPkKeyKvgO4Ymi+e24GrGSmYtuQqxAf52I4F6TzKSpGaKKmZNRiChVmy7yXlXsvfkDwRhz8Itct1sAAAAASUVORK5CYII=&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="178"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/eval_5.fc40fa9.600.png" srcset="/docs/assets/ideal-img/eval_5.fc40fa9.600.png 600w,/docs/assets/ideal-img/eval_5.d373dae.1100.png 1100w,/docs/assets/ideal-img/eval_5.a5b8e15.1600.png 1600w" width="600" height="178"></noscript></div><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="adding-cost-and-token-usage-to-the-playground"><a class="" href="https://agenta.ai/docs/changelog/adding-cost-and-token-usage-to-the-playground">Adding Cost and Token Usage to the Playground</a><a href="https://agenta.ai/docs/changelog/main#adding-cost-and-token-usage-to-the-playground" class="hash-link" aria-label="Direct link to adding-cost-and-token-usage-to-the-playground" title="Direct link to adding-cost-and-token-usage-to-the-playground" translate="no">​</a></h3><p><em>12th January 2024</em></p><p><strong>v0.7.1</strong></p><div class="theme-admonition theme-admonition-caution admonition_Z3eq alert alert--warning"><div class="admonitionHeading_dNDH"><span class="admonitionIcon_YCMn"><svg viewBox="0 0 16 16"><path fill-rule="evenodd" d="M8.893 1.5c-.183-.31-.52-.5-.887-.5s-.703.19-.886.5L.138 13.499a.98.98 0 0 0 0 1.001c.193.31.53.501.886.501h13.964c.367 0 .704-.19.877-.5a1.03 1.03 0 0 0 .01-1.002L8.893 1.5zm.133 11.497H6.987v-2.003h2.039v2.003zm0-3.004H6.987V5.987h2.039v4.006z"></path></svg></span>caution</div><div class="admonitionContent_qAg3"><p>This change requires you to 
pull the latest version of the Agenta platform if you're using the self-hosted version.</p></div></div><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFCAYAAAB8ZH1oAAAACXBIWXMAABYlAAAWJQFJUiTwAAAAk0lEQVR4nB3MSw6DIAAAUe5/sSa9QRPTTSMfLT9BVECnqft5I47jIKUEVycUeNvAkj1b2eCCj+/IxSJqrSgpUUoxTxP12G8opURrTfCW3hqinx3rHN57ckq43EhrwTlHjJElF8LaEf/a2i/GmPs4zpmUC9OkMUYzSs0cdoTPncdQ8blx9gacvFTlOayUUtj3jRgCP6TNvvbcjBy3AAAAAElFTkSuQmCC&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="323"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/screenshot_cost_and_token_usage.961ab81.600.png" srcset="/docs/assets/ideal-img/screenshot_cost_and_token_usage.961ab81.600.png 600w,/docs/assets/ideal-img/screenshot_cost_and_token_usage.8eeec59.1100.png 1100w,/docs/assets/ideal-img/screenshot_cost_and_token_usage.b5c57c3.1600.png 1600w" width="600" height="323"></noscript></div><p>We've added a feature that lets you compare an LLM app's latency, cost, and token usage, all in one place.</p><h3 class="anchor anchorTargetStickyNavbar_vHny" id="changes-to-the-sdk"><a class="" href="https://agenta.ai/docs/changelog/changes-to-the-sdk">Changes to the SDK</a><a href="https://agenta.ai/docs/changelog/main#changes-to-the-sdk" class="hash-link" aria-label="Direct link to changes-to-the-sdk" title="Direct link to changes-to-the-sdk" translate="no">​</a></h3><p>This feature required changes to the SDK. The LLM application API now returns a JSON object instead of a string. 
The JSON includes the output message, usage details, and cost:</p><div class="language-text codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-text codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">"message": string,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">"usage": {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">"prompt_tokens": int,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">"completion_tokens": int,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">"total_tokens": int</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">},</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">"cost": float</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span></code></pre></div></div><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="improving-side-by-side-comparison-in-the-playground"><a class="" href="https://agenta.ai/docs/changelog/improving-side-by-side-comparison-in-the-playground">Improving Side-by-side Comparison in the Playground</a><a href="https://agenta.ai/docs/changelog/main#improving-side-by-side-comparison-in-the-playground" class="hash-link" aria-label="Direct link 
to improving-side-by-side-comparison-in-the-playground" title="Direct link to improving-side-by-side-comparison-in-the-playground" translate="no">​</a></h3><p><em>19th December 2023</em></p><p><strong>v0.6.6</strong></p><ul>
<li class="">Enhanced the side-by-side comparison in the playground for better user experience</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="resolved-batch-logic-issue-in-evaluation"><a class="" href="https://agenta.ai/docs/changelog/resolved-batch-logic-issue-in-evaluation">Resolved Batch Logic Issue in Evaluation</a><a href="https://agenta.ai/docs/changelog/main#resolved-batch-logic-issue-in-evaluation" class="hash-link" aria-label="Direct link to resolved-batch-logic-issue-in-evaluation" title="Direct link to resolved-batch-logic-issue-in-evaluation" translate="no">​</a></h3><p><em>18th December 2023</em></p><p><strong>v0.6.5</strong></p><ul>
<li class="">Resolved an issue with batch logic in evaluation (users can now run extensive evaluations)</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="comprehensive-updates-and-bug-fixes"><a class="" href="https://agenta.ai/docs/changelog/comprehensive-updates-and-bug-fixes">Comprehensive Updates and Bug Fixes</a><a href="https://agenta.ai/docs/changelog/main#comprehensive-updates-and-bug-fixes" class="hash-link" aria-label="Direct link to comprehensive-updates-and-bug-fixes" title="Direct link to comprehensive-updates-and-bug-fixes" translate="no">​</a></h3><p><em>12th December 2023</em></p><p><strong>v0.6.4</strong></p><ul>
<li class="">Incorporated all chat turns to the chat set</li>
<li class="">Rectified self-hosting documentation</li>
<li class="">Introduced asynchronous support for applications</li>
<li class="">Added 'register_default' alias</li>
<li class="">Fixed a bug in the side-by-side feature</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="integrated-file-input-and-ui-enhancements"><a class="" href="https://agenta.ai/docs/changelog/integrated-file-input-and-ui-enhancements">Integrated File Input and UI Enhancements</a><a href="https://agenta.ai/docs/changelog/main#integrated-file-input-and-ui-enhancements" class="hash-link" aria-label="Direct link to integrated-file-input-and-ui-enhancements" title="Direct link to integrated-file-input-and-ui-enhancements" translate="no">​</a></h3><p><em>12th December 2023</em></p><p><strong>v0.6.3</strong></p><ul>
<li class="">Integrated file input feature in the SDK</li>
<li class="">Provided an example that includes images</li>
<li class="">Upgraded the human evaluation view to present larger inputs</li>
<li class="">Fixed issues related to data overwriting in the cloud</li>
<li class="">Implemented UI enhancements to the side bar</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="minor-adjustments-for-better-performance"><a class="" href="https://agenta.ai/docs/changelog/minor-adjustments-for-better-performance">Minor Adjustments for Better Performance</a><a href="https://agenta.ai/docs/changelog/main#minor-adjustments-for-better-performance" class="hash-link" aria-label="Direct link to minor-adjustments-for-better-performance" title="Direct link to minor-adjustments-for-better-performance" translate="no">​</a></h3><p><em>7th December 2023</em></p><p><strong>v0.6.2</strong></p><ul>
<li class="">Made minor adjustments</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="bug-fix-for-application-saving"><a class="" href="https://agenta.ai/docs/changelog/bug-fix-for-application-saving">Bug Fix for Application Saving</a><a href="https://agenta.ai/docs/changelog/main#bug-fix-for-application-saving" class="hash-link" aria-label="Direct link to bug-fix-for-application-saving" title="Direct link to bug-fix-for-application-saving" translate="no">​</a></h3><p><em>7th December 2023</em></p><p><strong>v0.6.1</strong></p><ul>
<li class="">Resolved a bug related to saving the application</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="introduction-of-chat-based-applications"><a class="" href="https://agenta.ai/docs/changelog/introduction-of-chat-based-applications">Introduction of Chat-based Applications</a><a href="https://agenta.ai/docs/changelog/main#introduction-of-chat-based-applications" class="hash-link" aria-label="Direct link to introduction-of-chat-based-applications" title="Direct link to introduction-of-chat-based-applications" translate="no">​</a></h3><p><em>1st December 2023</em></p><p><strong>v0.6.0</strong></p><ul>
<li class="">Introduced chat-based applications</li>
<li class="">Fixed a bug in the 'export csv' feature in auto evaluation</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="multiple-ui-and-csv-reader-fixes"><a class="" href="https://agenta.ai/docs/changelog/multiple-ui-and-csv-reader-fixes">Multiple UI and CSV Reader Fixes</a><a href="https://agenta.ai/docs/changelog/main#multiple-ui-and-csv-reader-fixes" class="hash-link" aria-label="Direct link to multiple-ui-and-csv-reader-fixes" title="Direct link to multiple-ui-and-csv-reader-fixes" translate="no">​</a></h3><p><em>1st December 2023</em></p><p><strong>v0.5.8</strong></p><ul>
<li class="">Fixed a bug impacting the CSV reader</li>
<li class="">Addressed an issue of variant overwriting</li>
<li class="">Made tabs draggable for better UI navigation</li>
<li class="">Implemented support for multiple LLM keys in the UI</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="enhanced-self-hosting-and-mistral-model-tutorial"><a class="" href="https://agenta.ai/docs/changelog/enhanced-self-hosting-and-mistral-model-tutorial">Enhanced Self-hosting and Mistral Model Tutorial</a><a href="https://agenta.ai/docs/changelog/main#enhanced-self-hosting-and-mistral-model-tutorial" class="hash-link" aria-label="Direct link to enhanced-self-hosting-and-mistral-model-tutorial" title="Direct link to enhanced-self-hosting-and-mistral-model-tutorial" translate="no">​</a></h3><p><em>17th November 2023</em></p><p><strong>v0.5.7</strong></p><ul>
<li class="">Enhanced and simplified the self-hosting feature</li>
<li class="">Added a tutorial for the Mistral model</li>
<li class="">Resolved a race condition issue in deployment</li>
<li class="">Fixed an issue with saving in the playground</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="sentry-integration-and-user-communication-improvements"><a class="" href="https://agenta.ai/docs/changelog/sentry-integration-and-user-communication-improvements">Sentry Integration and User Communication Improvements</a><a href="https://agenta.ai/docs/changelog/main#sentry-integration-and-user-communication-improvements" class="hash-link" aria-label="Direct link to sentry-integration-and-user-communication-improvements" title="Direct link to sentry-integration-and-user-communication-improvements" translate="no">​</a></h3><p><em>12th November 2023</em></p><p><strong>v0.5.6</strong></p><ul>
<li class="">Enhanced bug tracking with Sentry integration in the cloud</li>
<li class="">Integrated Intercom for better user communication in the cloud</li>
<li class="">Upgraded to the latest version of OpenAI</li>
<li class="">Cleaned up files after serving in the CLI</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="cypress-tests-and-ui-improvements"><a class="" href="https://agenta.ai/docs/changelog/cypress-tests-and-ui-improvements">Cypress Tests and UI Improvements</a><a href="https://agenta.ai/docs/changelog/main#cypress-tests-and-ui-improvements" class="hash-link" aria-label="Direct link to cypress-tests-and-ui-improvements" title="Direct link to cypress-tests-and-ui-improvements" translate="no">​</a></h3><p><em>2nd November 2023</em></p><p><strong>v0.5.5</strong></p><ul>
<li class="">Conducted extensive Cypress tests for improved application stability</li>
<li class="">Added a collapsible sidebar for better navigation</li>
<li class="">Improved error handling mechanisms</li>
<li class="">Added documentation for the evaluation feature</li>
</ul><hr><h3 class="anchor anchorTargetStickyNavbar_vHny" id="launch-of-sdk-version-2-and-cloud-hosted-version"><a class="" href="https://agenta.ai/docs/changelog/launch-of-sdk-version-2-and-cloud-hosted-version">Launch of SDK Version 2 and Cloud-hosted Version</a><a href="https://agenta.ai/docs/changelog/main#launch-of-sdk-version-2-and-cloud-hosted-version" class="hash-link" aria-label="Direct link to launch-of-sdk-version-2-and-cloud-hosted-version" title="Direct link to launch-of-sdk-version-2-and-cloud-hosted-version" translate="no">​</a></h3><p><em>23rd October 2023</em></p><p><strong>v0.5.0</strong></p><ul>
<li class="">Launched SDK version 2</li>
<li class="">Launched the cloud-hosted version</li>
<li class="">Completed a comprehensive refactoring of the application</li>
</ul></section>
]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Webhooks and GitHub Automations for Prompt Deployments]]></title>
            <link>https://agenta.ai/docs/changelog/deployment-webhooks-and-github-automations</link>
            <guid>https://agenta.ai/docs/changelog/deployment-webhooks-and-github-automations</guid>
            <pubDate>Wed, 11 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Trigger webhooks and GitHub Actions when you deploy a prompt. Use repository dispatch, workflow dispatch, or a custom HTTPS endpoint.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_vHny" id="the-problem">The Problem<a href="https://agenta.ai/docs/changelog/deployment-webhooks-and-github-automations#the-problem" class="hash-link" aria-label="Direct link to The Problem" title="Direct link to The Problem" translate="no">​</a></h2>
<p>Deploying a prompt often needs follow-up work outside Agenta. You may want to sync prompt files into a repository, trigger CI, open a pull request, or notify an internal platform. Before this change, that required custom glue code.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="the-solution">The Solution<a href="https://agenta.ai/docs/changelog/deployment-webhooks-and-github-automations#the-solution" class="hash-link" aria-label="Direct link to The Solution" title="Direct link to The Solution" translate="no">​</a></h2>
<p>You can now trigger webhooks and GitHub automations directly from Agenta when a deployment event happens. Point Agenta at your own HTTPS endpoint, or send the event straight to GitHub with <code>repository_dispatch</code> or <code>workflow_dispatch</code>.</p>
<p>This gives you a simple way to connect prompt deployments to the rest of your delivery flow. You can keep prompt changes, infrastructure checks, and repository updates in sync without building a separate integration service.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="what-you-can-do">What You Can Do<a href="https://agenta.ai/docs/changelog/deployment-webhooks-and-github-automations#what-you-can-do" class="hash-link" aria-label="Direct link to What You Can Do" title="Direct link to What You Can Do" translate="no">​</a></h2>
<ul>
<li class="">Send deployment events to any HTTPS endpoint</li>
<li class="">Authenticate deliveries with HMAC signatures or a bearer token</li>
<li class="">Trigger GitHub <code>repository_dispatch</code> with a structured event payload</li>
<li class="">Trigger GitHub <code>workflow_dispatch</code> for one known workflow on a branch</li>
<li class="">Fetch the latest prompt in GitHub Actions and open a pull request automatically</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="how-it-works">How It Works<a href="https://agenta.ai/docs/changelog/deployment-webhooks-and-github-automations#how-it-works" class="hash-link" aria-label="Direct link to How It Works" title="Direct link to How It Works" translate="no">​</a></h2>
<p>Create an automation in your project settings and subscribe it to deployment events. When a prompt revision is committed to an environment, Agenta sends an HTTP <code>POST</code> to your target.</p>
<p>For generic webhooks, Agenta sends the event payload plus delivery headers such as <code>X-Agenta-Event-Type</code>, <code>X-Agenta-Delivery-Id</code>, and <code>Idempotency-Key</code>. In signature mode, Agenta also signs the raw request body with HMAC-SHA256.</p>
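<p>On the receiving side, signature mode and the <code>Idempotency-Key</code> header could be handled roughly as in the sketch below. It is a minimal illustration, not Agenta's implementation: the hex encoding of the digest, the way the signature reaches your handler, and the <code>DeliveryLog</code> helper are all assumptions, so check the Webhooks guide for the actual header name and format.</p>

```python
import hashlib
import hmac


def sign_body(secret: bytes, raw_body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body (hex encoding is an assumption)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, raw_body: bytes, received: str) -> bool:
    """Recompute the digest over the unparsed bytes and compare in constant time."""
    expected = sign_body(secret, raw_body)
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


class DeliveryLog:
    """Hypothetical in-memory record of Idempotency-Key values already processed."""

    def __init__(self) -> None:
        self._seen = set()

    def first_time(self, idempotency_key: str) -> bool:
        """Return True once per key and False for any repeated delivery."""
        if idempotency_key in self._seen:
            return False
        self._seen.add(idempotency_key)
        return True
```

<p>Verify against the exact bytes you received, before any JSON parsing or re-serialization, because even a whitespace change produces a different digest. A real service would keep the seen keys in a shared store rather than in process memory.</p>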
<p>For GitHub automations, Agenta calls the GitHub API directly. <code>repository_dispatch</code> sends a richer JSON payload that includes event metadata and references. <code>workflow_dispatch</code> sends a smaller set of string inputs for a specific workflow file and branch.</p>
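<p>To make the difference between the two dispatch types concrete, the sketch below builds the equivalent GitHub REST requests by hand. The endpoints and body shapes are GitHub's documented ones. The repository name, event type, payload fields, and input names are hypothetical placeholders, since this post does not list the exact fields Agenta sends.</p>

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def _post(url: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def repository_dispatch_request(owner, repo, token, event_type, client_payload):
    """repository_dispatch: an event type plus a free-form JSON payload."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/dispatches"
    return _post(url, token, {"event_type": event_type, "client_payload": client_payload})


def workflow_dispatch_request(owner, repo, token, workflow_file, ref, inputs):
    """workflow_dispatch: one workflow file on one ref, with string inputs."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    string_inputs = {key: str(value) for key, value in inputs.items()}
    return _post(url, token, {"ref": ref, "inputs": string_inputs})
```

<p>Passing either request to <code>urllib.request.urlopen</code> would send it. In a workflow, a <code>repository_dispatch</code> payload is read from <code>github.event.client_payload</code>, while <code>workflow_dispatch</code> values arrive as <code>inputs</code>.</p>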
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="typical-use-cases">Typical Use Cases<a href="https://agenta.ai/docs/changelog/deployment-webhooks-and-github-automations#typical-use-cases" class="hash-link" aria-label="Direct link to Typical Use Cases" title="Direct link to Typical Use Cases" translate="no">​</a></h2>
<ul>
<li class="">Sync the latest deployed prompt into a repository</li>
<li class="">Open a pull request for prompt changes after each deployment</li>
<li class="">Trigger validation or approval workflows in GitHub Actions</li>
<li class="">Notify internal tools when a production prompt changes</li>
<li class="">Mirror deployment metadata into another system for auditing</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="getting-started">Getting Started<a href="https://agenta.ai/docs/changelog/deployment-webhooks-and-github-automations#getting-started" class="hash-link" aria-label="Direct link to Getting Started" title="Direct link to Getting Started" translate="no">​</a></h2>
<p>Start with these guides:</p>
<ul>
<li class=""><a class="" href="https://agenta.ai/docs/prompt-engineering/integrating-prompts/webhooks">Webhooks</a></li>
<li class=""><a class="" href="https://agenta.ai/docs/prompt-engineering/integrating-prompts/github">GitHub</a></li>
<li class=""><a class="" href="https://agenta.ai/docs/prompt-engineering/integrating-prompts/fetch-prompt-programatically">Fetch Prompts via SDK/API</a></li>
</ul>]]></content:encoded>
            <category>v0.94.0</category>
        </item>
        <item>
            <title><![CDATA[Tool Integrations in the Playground]]></title>
            <link>https://agenta.ai/docs/changelog/tool-integrations</link>
            <guid>https://agenta.ai/docs/changelog/tool-integrations</guid>
            <pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Connect 150+ tools to your prompts directly from the playground. Gmail, Slack, Notion, Google Sheets, GitHub, and more.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_vHny" id="the-problem">The Problem<a href="https://agenta.ai/docs/changelog/tool-integrations#the-problem" class="hash-link" aria-label="Direct link to The Problem" title="Direct link to The Problem" translate="no">​</a></h2>
<p>Building agentic applications that interact with external services means writing integration code: OAuth flows, API wrappers, error handling. You do all of that before you can even test whether your prompt works with the tool.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="the-solution">The Solution<a href="https://agenta.ai/docs/changelog/tool-integrations#the-solution" class="hash-link" aria-label="Direct link to The Solution" title="Direct link to The Solution" translate="no">​</a></h2>
<div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="500" src="https://www.youtube.com/embed/nEbwJhdTQds" title="Tool Integrations in the Playground" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div>
<p>You can now connect external tools to your prompts and call them from the playground. Browse a catalog of 150+ integrations, authenticate with OAuth or an API key, and attach tool actions to your prompt config. When the LLM generates a tool call, you execute it with one click and send the result back to the chat.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="what-you-can-do">What You Can Do<a href="https://agenta.ai/docs/changelog/tool-integrations#what-you-can-do" class="hash-link" aria-label="Direct link to What You Can Do" title="Direct link to What You Can Do" translate="no">​</a></h2>
<ul>
<li class=""><strong>Use Google Sheets or Notion as data sources.</strong> Connect a spreadsheet or Notion database and let the LLM query it. Build RAG applications directly from the playground without writing any code.</li>
<li class=""><strong>Send emails and messages.</strong> Attach Gmail or Slack and your prompt can draft and send emails, post to channels, or create threads.</li>
<li class=""><strong>Automate developer workflows.</strong> Connect GitHub to create issues, Jira to update tickets, or any of the 150+ available integrations.</li>
<li class=""><strong>Manage connections in Settings.</strong> A dedicated Tools page shows all your connected integrations and their status, and lets you add or revoke connections.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="how-it-works">How It Works<a href="https://agenta.ai/docs/changelog/tool-integrations#how-it-works" class="hash-link" aria-label="Direct link to How It Works" title="Direct link to How It Works" translate="no">​</a></h2>
<ol>
<li class="">Go to <strong>Settings &gt; Tools</strong> and connect an integration (e.g., Gmail). You authenticate via OAuth or paste an API key.</li>
<li class="">In the Playground, open the tools panel in your prompt config. Your connected integrations appear there.</li>
<li class="">Select the actions you want available to the LLM (e.g., "Send Email", "List Emails").</li>
<li class="">Run your prompt. When the LLM generates a tool call, click "Execute" to run it and return the result to the conversation.</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="getting-started">Getting Started<a href="https://agenta.ai/docs/changelog/tool-integrations#getting-started" class="hash-link" aria-label="Direct link to Getting Started" title="Direct link to Getting Started" translate="no">​</a></h2>
<p>Tool integrations are available on Agenta Cloud. Connect your first integration in <strong>Settings &gt; Tools</strong>.</p>]]></content:encoded>
            <category>v0.87.0</category>
        </item>
        <item>
            <title><![CDATA[AI-Powered Prompt Refinement in the Playground]]></title>
            <link>https://agenta.ai/docs/changelog/refine-ai</link>
            <guid>https://agenta.ai/docs/changelog/refine-ai</guid>
            <pubDate>Wed, 25 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Refine your prompts with AI directly in the playground. Describe what you want to improve and get a refined version with an explanation of the changes.]]></description>
            <content:encoded><![CDATA[
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="the-problem">The Problem<a href="https://agenta.ai/docs/changelog/refine-ai#the-problem" class="hash-link" aria-label="Direct link to The Problem" title="Direct link to The Problem" translate="no">​</a></h2>
<p>Prompt engineering is iterative. You write a prompt, test it, notice it's too vague or missing constraints, and rewrite it. Each cycle requires you to figure out <em>how</em> to improve it, not just <em>what</em> to improve. That's the hard part.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="the-solution">The Solution<a href="https://agenta.ai/docs/changelog/refine-ai#the-solution" class="hash-link" aria-label="Direct link to The Solution" title="Direct link to The Solution" translate="no">​</a></h2>
<p>You can now refine prompts with AI directly in the playground. Click the wand icon on any prompt section, describe what you want to change in plain English, and get back a refined version with an explanation of what changed and why.</p>
<div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="500" src="https://www.youtube.com/embed/V2hHC8hZEeE" title="AI-Powered Prompt Refinement" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="how-it-works">How It Works<a href="https://agenta.ai/docs/changelog/refine-ai#how-it-works" class="hash-link" aria-label="Direct link to How It Works" title="Direct link to How It Works" translate="no">​</a></h2>
<ol>
<li class="">Open a prompt in the playground and click the <strong>wand icon</strong> in the prompt section header.</li>
<li class="">A two-panel modal opens. On the left, type what you want to improve (e.g., "add output format instructions" or "make it more concise"). On the right, see the refined prompt.</li>
<li class="">Each refinement builds on the previous result, so you can iterate. Ask for one change, review it, then ask for another.</li>
<li class="">Toggle the <strong>Diff view</strong> to see exactly what changed compared to the original.</li>
<li class="">Edit the refined prompt directly if you want to adjust anything before applying.</li>
<li class="">Click <strong>Use refined prompt</strong> to apply the changes to your playground session.</li>
</ol>
<p>You can also use the built-in quick action "Optimize the prompt using best practices" for a one-click improvement.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="what-you-can-do">What You Can Do<a href="https://agenta.ai/docs/changelog/refine-ai#what-you-can-do" class="hash-link" aria-label="Direct link to What You Can Do" title="Direct link to What You Can Do" translate="no">​</a></h2>
<ul>
<li class=""><strong>Describe improvements in plain English</strong>: "Add a constraint that the output must be valid JSON" or "Make the tone more professional."</li>
<li class=""><strong>Iterate</strong>: Each refinement uses the latest version, so you can make incremental changes across multiple rounds.</li>
<li class=""><strong>Review diffs</strong>: Toggle the diff view to see a side-by-side comparison of the original and refined prompts.</li>
<li class=""><strong>Edit before applying</strong>: The refined prompt is fully editable. Adjust anything before committing.</li>
<li class=""><strong>Quick optimize</strong>: Use the built-in "Optimize the prompt using best practices" shortcut for an instant improvement.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="getting-started">Getting Started<a href="https://agenta.ai/docs/changelog/refine-ai#getting-started" class="hash-link" aria-label="Direct link to Getting Started" title="Direct link to Getting Started" translate="no">​</a></h2>
<p>Open any prompt in the playground. Look for the wand icon in the prompt section header. If you don't see it, the feature may not be enabled for your organization yet.</p>]]></content:encoded>
            <category>v0.84.0</category>
        </item>
        <item>
            <title><![CDATA[Enterprise Compliance Features]]></title>
            <link>https://agenta.ai/docs/changelog/enterprise-compliance-features</link>
            <guid>https://agenta.ai/docs/changelog/enterprise-compliance-features</guid>
            <pubDate>Tue, 17 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Multi-organization support, SSO with any OIDC provider, domain verification, and a new US region.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_vHny" id="overview">Overview<a href="https://agenta.ai/docs/changelog/enterprise-compliance-features#overview" class="hash-link" aria-label="Direct link to Overview" title="Direct link to Overview" translate="no">​</a></h2>
<p>Agenta has new enterprise features: multi-organization support, SSO, domain verification, and a US region.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="multi-organization-support">Multi-Organization Support<a href="https://agenta.ai/docs/changelog/enterprise-compliance-features#multi-organization-support" class="hash-link" aria-label="Direct link to Multi-Organization Support" title="Direct link to Multi-Organization Support" translate="no">​</a></h2>
<p>You can now create separate organizations for different teams or clients. Each organization has its own billing, projects, and roles. One account can belong to multiple organizations, and you switch between them without signing out.</p>
<p>Inside each organization, you create workspaces and projects. Workspaces group related projects together. Projects scope your prompts, evaluations, and traces.</p>
<p>Roles are per-organization. You can be an owner in one org and a viewer in another. Available roles include owner, workspace admin, editor, viewer, evaluator, and deployment manager.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="sso">SSO<a href="https://agenta.ai/docs/changelog/enterprise-compliance-features#sso" class="hash-link" aria-label="Direct link to SSO" title="Direct link to SSO" translate="no">​</a></h2>
<p>Connect your identity provider to Agenta. We support any OIDC-compliant provider: Okta, Azure AD, Auth0, OneLogin, Google Workspace, and others.</p>
<p>You configure SSO per-organization through the API. Add your provider's issuer URL, client ID, and client secret. Agenta discovers the rest from the OIDC discovery endpoint. You can test the connection before enabling it.</p>
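<p>For context, OIDC discovery generally works as sketched below: the client appends a well-known path to the issuer URL and reads the endpoints from the JSON document it finds there. This illustrates the standard mechanism and is not Agenta's code. The function names and the choice of fields to check are assumptions made for the example.</p>

```python
def discovery_url(issuer: str) -> str:
    """Build the OpenID Connect discovery URL for an issuer."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


# Metadata an authorization-code client needs from the discovery document.
NEEDED_FIELDS = ("issuer", "authorization_endpoint", "token_endpoint", "jwks_uri")


def check_discovery_document(issuer: str, document: dict) -> list:
    """Return a list of problems; an empty list means the document looks usable."""
    problems = [f"missing {name}" for name in NEEDED_FIELDS if name not in document]
    if "issuer" in document and document["issuer"] != issuer:
        problems.append("issuer mismatch")
    return problems
```

<p>A check like this is a quick way to sanity-test an issuer URL yourself before you add the provider to an organization.</p>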
<p>If your security policy requires it, you can enforce SSO-only for an organization and disable password login entirely.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="domain-verification">Domain Verification<a href="https://agenta.ai/docs/changelog/enterprise-compliance-features#domain-verification" class="hash-link" aria-label="Direct link to Domain Verification" title="Direct link to Domain Verification" translate="no">​</a></h2>
<p>Verify your company domain via a DNS TXT record. Once verified, anyone who signs up with a matching email address joins your organization automatically. No need to send invitations one by one.</p>
<p>Domain verification also works with SSO enforcement. Users with a verified domain email are automatically routed to your SSO provider during login.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="us-region">US Region<a href="https://agenta.ai/docs/changelog/enterprise-compliance-features#us-region" class="hash-link" aria-label="Direct link to US Region" title="Direct link to US Region" translate="no">​</a></h2>
<p>Agenta Cloud now has a US-based region. If your data needs to stay in the United States, you can run your projects there. The US region has the same features as the EU region.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="availability">Availability<a href="https://agenta.ai/docs/changelog/enterprise-compliance-features#availability" class="hash-link" aria-label="Direct link to Availability" title="Direct link to Availability" translate="no">​</a></h2>
<p>SSO and domain verification are available on Business and Enterprise plans. The US region is available on all plans.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="getting-started">Getting Started<a href="https://agenta.ai/docs/changelog/enterprise-compliance-features#getting-started" class="hash-link" aria-label="Direct link to Getting Started" title="Direct link to Getting Started" translate="no">​</a></h2>
<ul>
<li class=""><a class="" href="https://agenta.ai/docs/administration/access-control/organizations">Organizations documentation</a></li>
<li class=""><a class="" href="https://agenta.ai/docs/administration/access-control/sso">SSO documentation</a></li>
</ul>]]></content:encoded>
            <category>v0.83.0</category>
        </item>
        <item>
            <title><![CDATA[Folders for Prompt Organization]]></title>
            <link>https://agenta.ai/docs/changelog/prompt-folders</link>
            <guid>https://agenta.ai/docs/changelog/prompt-folders</guid>
            <pubDate>Wed, 04 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Organize prompts with folders and subfolders. Create, move, and search prompts within a familiar file-system structure.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_vHny" id="the-problem">The Problem<a href="https://agenta.ai/docs/changelog/prompt-folders#the-problem" class="hash-link" aria-label="Direct link to The Problem" title="Direct link to The Problem" translate="no">​</a></h2>
<p>When you're building agents or managing multiple use cases, prompts multiply fast. Finding the right one turns into scrolling through a flat list.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="the-solution">The Solution<a href="https://agenta.ai/docs/changelog/prompt-folders#the-solution" class="hash-link" aria-label="Direct link to The Solution" title="Direct link to The Solution" translate="no">​</a></h2>
<p>You can now create folders and subfolders to organize your prompts. It works like any file system you've used before.</p>
<p>Create a folder. Drag prompts into it. Create subfolders. Move things around as your structure evolves.</p>
<div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="500" src="https://www.youtube.com/embed/2oy6ymnOq7I" title="Folders for Prompt Organization" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="what-you-can-do">What You Can Do<a href="https://agenta.ai/docs/changelog/prompt-folders#what-you-can-do" class="hash-link" aria-label="Direct link to What You Can Do" title="Direct link to What You Can Do" translate="no">​</a></h2>
<ul>
<li class=""><strong>Create folders</strong> and nest them as deep as you need</li>
<li class=""><strong>Move prompts and folders</strong> via drag-and-drop or the actions menu</li>
<li class=""><strong>Search</strong> across all folders from the root level</li>
<li class=""><strong>Share folder URLs</strong> with teammates (folder location persists in the URL)</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="getting-started">Getting Started<a href="https://agenta.ai/docs/changelog/prompt-folders#getting-started" class="hash-link" aria-label="Direct link to Getting Started" title="Direct link to Getting Started" translate="no">​</a></h2>
<p>Go to the Prompts page. Click "Create new" and select "New folder". Name it, and start organizing.</p>]]></content:encoded>
            <category>v0.82.0</category>
        </item>
        <item>
            <title><![CDATA[Onboarding Widget and Guided Walkthroughs]]></title>
            <link>https://agenta.ai/docs/changelog/onboarding-widget-walkthroughs</link>
            <guid>https://agenta.ai/docs/changelog/onboarding-widget-walkthroughs</guid>
            <pubDate>Thu, 29 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[New onboarding widget with guided walkthroughs to help you get started with Agenta's key features.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_vHny" id="overview">Overview<a href="https://agenta.ai/docs/changelog/onboarding-widget-walkthroughs#overview" class="hash-link" aria-label="Direct link to Overview" title="Direct link to Overview" translate="no">​</a></h2>
<p>Getting started with a new platform can be overwhelming. You now have an onboarding widget that guides you through Agenta's key features step by step. The widget appears in the sidebar and tracks your progress as you explore the platform.</p>
<p>Each walkthrough highlights the relevant UI elements and explains what they do. You learn by doing, not by reading documentation.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="key-capabilities">Key Capabilities<a href="https://agenta.ai/docs/changelog/onboarding-widget-walkthroughs#key-capabilities" class="hash-link" aria-label="Direct link to Key Capabilities" title="Direct link to Key Capabilities" translate="no">​</a></h2>
<ul>
<li class=""><strong>Progress Tracking</strong>: The widget shows which features you've explored and what's left to discover</li>
<li class=""><strong>Interactive Walkthroughs</strong>: Step-by-step tours that highlight UI elements as you go</li>
<li class=""><strong>Contextual Guidance</strong>: Each walkthrough focuses on a specific workflow, from creating prompts to running evaluations</li>
<li class=""><strong>Dismissible</strong>: Once you're comfortable, dismiss the widget and it stays out of your way</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="available-walkthroughs">Available Walkthroughs<a href="https://agenta.ai/docs/changelog/onboarding-widget-walkthroughs#available-walkthroughs" class="hash-link" aria-label="Direct link to Available Walkthroughs" title="Direct link to Available Walkthroughs" translate="no">​</a></h2>
<p>The onboarding widget includes guided tours for:</p>
<ul>
<li class=""><strong>Playground Basics</strong>: Create and test prompts, compare variants, and iterate on your configurations</li>
<li class=""><strong>Running Evaluations</strong>: Set up test sets, configure evaluators, and analyze results</li>
<li class=""><strong>Observability</strong>: Trace your LLM calls, add annotations, and debug production issues</li>
<li class=""><strong>Deployment</strong>: Deploy prompts to environments and fetch configurations in your code</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="how-it-works">How It Works<a href="https://agenta.ai/docs/changelog/onboarding-widget-walkthroughs#how-it-works" class="hash-link" aria-label="Direct link to How It Works" title="Direct link to How It Works" translate="no">​</a></h2>
<p>The onboarding widget appears automatically for new users. Each walkthrough takes you through the actual UI, highlighting buttons, menus, and features as you go. Complete a walkthrough to mark it done, or skip ahead if you already know the basics.</p>
<p>You can reopen the widget anytime from the sidebar to revisit walkthroughs or check your progress.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="getting-started">Getting Started<a href="https://agenta.ai/docs/changelog/onboarding-widget-walkthroughs#getting-started" class="hash-link" aria-label="Direct link to Getting Started" title="Direct link to Getting Started" translate="no">​</a></h2>
<p>The widget appears automatically when you sign up or create a new project. If you've dismissed it and want it back, look for the help icon in the sidebar.</p>
<p>For detailed documentation on each feature, check out:</p>
<ul>
<li class=""><a class="" href="https://agenta.ai/docs/prompt-engineering/quick-start">Playground Quick Start</a></li>
<li class=""><a class="" href="https://agenta.ai/docs/evaluation/evaluation-from-ui/quick-start">Evaluation Quick Start</a></li>
<li class=""><a class="" href="https://agenta.ai/docs/observability/quickstart-python">Observability Quick Start</a></li>
</ul>]]></content:encoded>
            <category>v0.81.1</category>
        </item>
        <item>
            <title><![CDATA[Navigation Links from Traces to App/Environment/Variant]]></title>
            <link>https://agenta.ai/docs/changelog/trace-navigation-links</link>
            <guid>https://agenta.ai/docs/changelog/trace-navigation-links</guid>
            <pubDate>Wed, 28 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Clickable links in observability traces to navigate directly to the application, variant, and environment that generated each trace.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_vHny" id="overview">Overview<a href="https://agenta.ai/docs/changelog/trace-navigation-links#overview" class="hash-link" aria-label="Direct link to Overview" title="Direct link to Overview" translate="no">​</a></h2>
<p>You can now click directly from any trace to the application, variant, version, or environment that generated it. This speeds up debugging: instead of manually searching for the configuration that produced a specific output, you can jump to it in one click.</p>
<p>This is especially useful when you're investigating production issues. You see a problematic trace, click through to the exact prompt version, and start iterating immediately.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="key-capabilities">Key Capabilities<a href="https://agenta.ai/docs/changelog/trace-navigation-links#key-capabilities" class="hash-link" aria-label="Direct link to Key Capabilities" title="Direct link to Key Capabilities" translate="no">​</a></h2>
<ul>
<li class=""><strong>Clickable Application Links</strong>: Jump from a trace to the application that generated it</li>
<li class=""><strong>Variant Navigation</strong>: Go directly to the specific variant and version used</li>
<li class=""><strong>Environment Context</strong>: See and navigate to the environment (production, staging, development) where the trace originated</li>
<li class=""><strong>Drawer Integration</strong>: Links appear in both the trace table and the detailed drawer view</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="how-to-add-references-to-your-traces">How to Add References to Your Traces<a href="https://agenta.ai/docs/changelog/trace-navigation-links#how-to-add-references-to-your-traces" class="hash-link" aria-label="Direct link to How to Add References to Your Traces" title="Direct link to How to Add References to Your Traces" translate="no">​</a></h2>
<p>For the navigation links to appear, you need to store references in your traces. Here's how to do it with the Python SDK and OpenTelemetry.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="using-the-python-sdk">Using the Python SDK<a href="https://agenta.ai/docs/changelog/trace-navigation-links#using-the-python-sdk" class="hash-link" aria-label="Direct link to Using the Python SDK" title="Direct link to Using the Python SDK" translate="no">​</a></h3>
<div class="language-python codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-python codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> agenta </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> ag</span><br></span><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> openai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> OpenAI</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">init</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">client </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OpenAI</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Assumes OPENAI_API_KEY is set in the environment</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Store references to link traces to your configuration</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">tracing</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">store_refs</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" 
style="color:#e3116c">"application.slug"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"my-chatbot"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"variant.slug"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"gpt-4-optimized"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"variant.version"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"3"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"environment.slug"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"production"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Your LLM calls are now linked to this configuration</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chat</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">completions</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-4"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello!"</span><span class="token punctuation" 
style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="using-opentelemetry">Using OpenTelemetry<a href="https://agenta.ai/docs/changelog/trace-navigation-links#using-opentelemetry" class="hash-link" aria-label="Direct link to Using OpenTelemetry" title="Direct link to Using OpenTelemetry" translate="no">​</a></h3>
<p>If you're using OpenTelemetry for instrumentation, add the references as span attributes, prefixing each key with <code>ag.refs.</code>:</p>
<div class="language-javascript codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-javascript codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token keyword module" style="color:#00009f">import</span><span class="token plain"> </span><span class="token imports punctuation" style="color:#393A34">{</span><span class="token imports"> trace </span><span class="token imports punctuation" style="color:#393A34">}</span><span class="token plain"> </span><span class="token keyword module" style="color:#00009f">from</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'@opentelemetry/api'</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> tracer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> trace</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">getTracer</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'my-app'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token 
plain"> span </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> tracer</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">startSpan</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'chat-interaction'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">// Add references to link the trace</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">span</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">setAttribute</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'ag.refs.application.slug'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'my-chatbot'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">span</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">setAttribute</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" 
style="color:#e3116c">'ag.refs.variant.slug'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'gpt-4-optimized'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">span</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">setAttribute</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'ag.refs.variant.version'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'3'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">span</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">setAttribute</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'ag.refs.environment.slug'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'production'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain"></span><span class="token comment" style="color:#999988;font-style:italic">// Your code here</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">span</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">end</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><br></span></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="available-reference-keys">Available Reference Keys<a href="https://agenta.ai/docs/changelog/trace-navigation-links#available-reference-keys" class="hash-link" aria-label="Direct link to Available Reference Keys" title="Direct link to Available Reference Keys" translate="no">​</a></h2>
<p>You can use any combination of these reference keys:</p>
<table><thead><tr><th>Category</th><th>Keys</th></tr></thead><tbody><tr><td>Application</td><td><code>application.slug</code>, <code>application.id</code></td></tr><tr><td>Variant</td><td><code>variant.slug</code>, <code>variant.id</code>, <code>variant.version</code></td></tr><tr><td>Environment</td><td><code>environment.slug</code>, <code>environment.id</code>, <code>environment.version</code></td></tr></tbody></table>
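<p>As a rough illustration of how these keys relate to the OpenTelemetry attributes shown earlier, the sketch below prefixes each key with <code>ag.refs.</code> and rejects keys that are not in the table. The helper name <code>to_span_attributes</code> is hypothetical and not part of the Agenta SDK, and converting values to strings is an assumption based on the examples above.</p>

```python
# Reference keys taken from the table above.
REFERENCE_KEYS = {
    "application.slug", "application.id",
    "variant.slug", "variant.id", "variant.version",
    "environment.slug", "environment.id", "environment.version",
}


def to_span_attributes(refs: dict) -> dict:
    """Hypothetical helper: map reference keys to 'ag.refs.*' attribute names."""
    unknown = sorted(set(refs) - REFERENCE_KEYS)
    if unknown:
        raise ValueError(f"Unknown reference keys: {unknown}")
    # Values are stored as strings, mirroring the examples above (assumption).
    return {f"ag.refs.{key}": str(value) for key, value in refs.items()}


attributes = to_span_attributes({"application.slug": "my-chatbot", "variant.version": 3})
print(attributes)
```

<p>Each resulting pair can then be passed to the span's attribute setter of whichever OpenTelemetry SDK you use.</p>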
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="use-cases">Use Cases<a href="https://agenta.ai/docs/changelog/trace-navigation-links#use-cases" class="hash-link" aria-label="Direct link to Use Cases" title="Direct link to Use Cases" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="debug-production-issues">Debug Production Issues<a href="https://agenta.ai/docs/changelog/trace-navigation-links#debug-production-issues" class="hash-link" aria-label="Direct link to Debug Production Issues" title="Direct link to Debug Production Issues" translate="no">​</a></h3>
<p>When a user reports a problem, find the trace and click through to see exactly which prompt version was used. No more guessing or searching through commit history.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="compare-configurations">Compare Configurations<a href="https://agenta.ai/docs/changelog/trace-navigation-links#compare-configurations" class="hash-link" aria-label="Direct link to Compare Configurations" title="Direct link to Compare Configurations" translate="no">​</a></h3>
<p>Filter traces by variant, then click through to compare how different configurations perform. Jump between variants to understand what changed.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="track-deployments">Track Deployments<a href="https://agenta.ai/docs/changelog/trace-navigation-links#track-deployments" class="hash-link" aria-label="Direct link to Track Deployments" title="Direct link to Track Deployments" translate="no">​</a></h3>
<p>See which environment generated each trace. Verify that production is running the expected version by clicking through from any trace.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="getting-started">Getting Started<a href="https://agenta.ai/docs/changelog/trace-navigation-links#getting-started" class="hash-link" aria-label="Direct link to Getting Started" title="Direct link to Getting Started" translate="no">​</a></h2>
<p>Learn more about storing references in our documentation:</p>
<ul>
<li class=""><a class="" href="https://agenta.ai/docs/observability/trace-with-python-sdk/reference-prompt-versions">Reference Prompt Versions (Python SDK)</a></li>
<li class=""><a class="" href="https://agenta.ai/docs/observability/trace-with-opentelemetry/semantic-conventions">Semantic Conventions (OpenTelemetry)</a></li>
<li class=""><a class="" href="https://agenta.ai/docs/observability/overview">Observability Overview</a></li>
</ul>]]></content:encoded>
            <category>v0.81.0</category>
        </item>
        <item>
            <title><![CDATA[Test Set Versioning and New Test Set UI]]></title>
            <link>https://agenta.ai/docs/changelog/testset-versioning</link>
            <guid>https://agenta.ai/docs/changelog/testset-versioning</guid>
            <pubDate>Tue, 20 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Track test set changes with versioning and link evaluations to specific versions. Plus a completely rebuilt test set UI that scales to hundreds of thousands of rows.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_vHny" id="overview">Overview<a href="https://agenta.ai/docs/changelog/testset-versioning#overview" class="hash-link" aria-label="Direct link to Overview" title="Direct link to Overview" translate="no">​</a></h2>
<p>When you compare evaluation results from last week to today, how do you know the test data didn't change? You don't. Until now.</p>
<p>Test set versioning tracks every change to your test sets. Each edit, upload, or programmatic update creates a new version. Evaluations link to specific versions, so you can trust your comparisons.</p>
<p>We also rebuilt the test set UI from scratch. It handles hundreds of thousands of rows without slowing down. Editing is faster, especially for chat messages and complex JSON data.</p>
<div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="500" src="https://www.youtube.com/embed/hh1OHhzak6Q" title="Test Set Versioning Demo" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="test-set-versioning">Test Set Versioning<a href="https://agenta.ai/docs/changelog/testset-versioning#test-set-versioning" class="hash-link" aria-label="Direct link to Test Set Versioning" title="Direct link to Test Set Versioning" translate="no">​</a></h2>
<p>Every change to a test set creates a new version. You can see the version history, compare versions, and revert to previous versions.</p>
<p><strong>What gets versioned:</strong></p>
<ul>
<li class="">Adding, editing, or deleting test cases</li>
<li class="">Uploading new data (CSV, JSON)</li>
<li class="">Programmatic updates via SDK or API</li>
<li class="">Column changes</li>
</ul>
<p><strong>Evaluation linking:</strong>
When you run an evaluation, it links to the specific test set version used. This means:</p>
<ul>
<li class="">You can compare evaluations knowing they used the same test data</li>
<li class="">If someone updates the test set, your historical evaluations still reference the original version</li>
<li class="">You can filter evaluations by test set version</li>
</ul>
<p><strong>Programmatic versioning:</strong>
Upload test sets via the SDK or API. The system detects changes and creates new versions automatically.</p>
<div class="language-python codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-python codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> agenta </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> ag</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Upload a test set - creates a new version if content changed</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">testset </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">testsets</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">upload</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"my-test-set"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    data</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">test_cases</span><span class="token 
punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Your test case data</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># The testset object includes version information</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Version: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">testset</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">version</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
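<p>This changelog does not describe how change detection works internally. One common approach is to compare a hash of the content, and the sketch below illustrates that idea only. The helpers <code>content_fingerprint</code> and <code>needs_new_version</code> are hypothetical and are not part of the Agenta SDK.</p>

```python
import hashlib
import json


def content_fingerprint(test_cases: list) -> str:
    """Return a SHA-256 hash of the test cases with dictionary key order normalized."""
    canonical = json.dumps(test_cases, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_new_version(previous: list, current: list) -> bool:
    """Report a change only when the serialized content differs."""
    return content_fingerprint(previous) != content_fingerprint(current)


v1 = [{"input": "Hello", "expected": "Hi there"}]
reordered = [{"expected": "Hi there", "input": "Hello"}]  # same data, different key order
v2 = v1 + [{"input": "Bye", "expected": "Goodbye"}]

print(needs_new_version(v1, reordered))  # False
print(needs_new_version(v1, v2))         # True
```

<p>Normalizing key order before hashing means that re-uploading identical data does not count as a change in this sketch.</p>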
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="new-test-set-ui">New Test Set UI<a href="https://agenta.ai/docs/changelog/testset-versioning#new-test-set-ui" class="hash-link" aria-label="Direct link to New Test Set UI" title="Direct link to New Test Set UI" translate="no">​</a></h2>
<p>The test set view has been rebuilt from scratch. It uses virtualized rendering, so it stays fast with large datasets.</p>
<p><strong>What's new:</strong></p>
<ul>
<li class=""><strong>Scale</strong>: Handle 100,000+ rows without performance issues</li>
<li class=""><strong>JSON support</strong>: View and edit complex JSON directly. Toggle between raw JSON and formatted views</li>
<li class=""><strong>String or JSON columns</strong>: Choose how each column stores data. Use JSON for structured data like chat messages</li>
</ul>
<p><strong>Chat message editing:</strong>
Test cases with chat messages (like <code>[{"role": "user", "content": "..."}]</code>) now have a dedicated editor. Add, remove, or reorder messages. Edit content with proper formatting.</p>
<p><strong>Upload options:</strong></p>
<ul>
<li class="">Upload CSV or JSON files</li>
<li class="">Create test sets in the UI</li>
<li class="">Create programmatically via SDK</li>
<li class="">Add spans from observability to test sets</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="traceability">Traceability<a href="https://agenta.ai/docs/changelog/testset-versioning#traceability" class="hash-link" aria-label="Direct link to Traceability" title="Direct link to Traceability" translate="no">​</a></h2>
<p>Everything connects. When you view a trace in observability:</p>
<ul>
<li class="">See which test case it came from</li>
<li class="">See which test set version</li>
<li class="">Filter traces by test case or test set</li>
</ul>
<p>When you view an evaluation:</p>
<ul>
<li class="">See the exact test set version used</li>
<li class="">Compare only evaluations that used the same version</li>
<li class="">Navigate to the test set to see the data</li>
</ul>
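<p>To make "compare only evaluations that used the same version" concrete, here is a minimal sketch that groups evaluation records by test set and version. The records and field names are hypothetical, invented for illustration, and do not reflect Agenta's actual data model.</p>

```python
from collections import defaultdict

# Hypothetical evaluation records; field names are illustrative only.
evaluations = [
    {"id": "eval-1", "testset": "my-test-set", "testset_version": 1},
    {"id": "eval-2", "testset": "my-test-set", "testset_version": 2},
    {"id": "eval-3", "testset": "my-test-set", "testset_version": 2},
]


def group_by_testset_version(records: list) -> dict:
    """Group evaluation ids by (test set, version) so only like is compared with like."""
    groups = defaultdict(list)
    for record in records:
        groups[(record["testset"], record["testset_version"])].append(record["id"])
    return dict(groups)


comparable = group_by_testset_version(evaluations)
print(comparable[("my-test-set", 2)])  # ['eval-2', 'eval-3']
```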
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="getting-started">Getting Started<a href="https://agenta.ai/docs/changelog/testset-versioning#getting-started" class="hash-link" aria-label="Direct link to Getting Started" title="Direct link to Getting Started" translate="no">​</a></h2>
<p>Test set versioning is automatic. Any change creates a new version.</p>
<p>To use versioned test sets in evaluations:</p>
<ol>
<li class="">Create or upload a test set</li>
<li class="">Make your edits (each save creates a version)</li>
<li class="">Run an evaluation (it links to the current version)</li>
<li class="">Later, compare evaluations knowing they used the same test data</li>
</ol>
<p>For programmatic access, check the <a class="" href="https://agenta.ai/docs/evaluation/evaluation-from-sdk/managing-testsets">test sets documentation</a>.</p>]]></content:encoded>
            <category>v0.74.0</category>
        </item>
        <item>
            <title><![CDATA[Playground UX Improvements]]></title>
            <link>https://agenta.ai/docs/changelog/playground-ux-improvements-jan-2026</link>
            <guid>https://agenta.ai/docs/changelog/playground-ux-improvements-jan-2026</guid>
            <pubDate>Tue, 13 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[See provider costs upfront, run evaluations directly from the Playground, and collapse test cases for easier navigation.]]></description>
            <content:encoded><![CDATA[<div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="500" src="https://www.youtube.com/embed/JqVj-gsnSgk" title="Playground UX Improvements" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="overview">Overview<a href="https://agenta.ai/docs/changelog/playground-ux-improvements-jan-2026#overview" class="hash-link" aria-label="Direct link to Overview" title="Direct link to Overview" translate="no">​</a></h2>
<p>This release brings three quality-of-life improvements to the Playground that make testing and iterating on prompts faster and more convenient.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="whats-new">What's New<a href="https://agenta.ai/docs/changelog/playground-ux-improvements-jan-2026#whats-new" class="hash-link" aria-label="Direct link to What's New" title="Direct link to What's New" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="provider-cost-display">Provider Cost Display<a href="https://agenta.ai/docs/changelog/playground-ux-improvements-jan-2026#provider-cost-display" class="hash-link" aria-label="Direct link to Provider Cost Display" title="Direct link to Provider Cost Display" translate="no">​</a></h3>
<p>You can now see the cost per million tokens directly in the provider selection dropdown. This helps you make informed decisions about which model to use based on both capability and cost. No more switching to external pricing pages to compare costs.</p>
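<p>A per-million-token price translates into a per-call estimate with simple arithmetic. The sketch below assumes separate input and output prices, and the prices and token counts are made up for illustration; they are not taken from any provider or from Agenta.</p>

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate the cost of one call, in the same currency as the prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical prices: 2.50 per million input tokens, 10.00 per million output tokens.
cost = estimate_cost(1_200, 300, 2.50, 10.00)
print(f"{cost:.4f}")  # 0.0060
```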
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="run-evaluations-from-the-playground">Run Evaluations from the Playground<a href="https://agenta.ai/docs/changelog/playground-ux-improvements-jan-2026#run-evaluations-from-the-playground" class="hash-link" aria-label="Direct link to Run Evaluations from the Playground" title="Direct link to Run Evaluations from the Playground" translate="no">​</a></h3>
<p>You can now trigger evaluations directly from the Playground without navigating to the evaluation menu. When you're testing a prompt and want to run a full evaluation, click the evaluate button to start an evaluation run with your current configuration. This keeps you in flow when iterating on prompts.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="collapsible-test-cases">Collapsible Test Cases<a href="https://agenta.ai/docs/changelog/playground-ux-improvements-jan-2026#collapsible-test-cases" class="hash-link" aria-label="Direct link to Collapsible Test Cases" title="Direct link to Collapsible Test Cases" translate="no">​</a></h3>
<p>Test cases in the Playground can now be collapsed. This is especially useful when working with large test sets or test cases with long inputs and outputs. Collapse completed test cases to focus on what you're working on. You can still see a preview of each test case to maintain context while navigating through your data.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="getting-started">Getting Started<a href="https://agenta.ai/docs/changelog/playground-ux-improvements-jan-2026#getting-started" class="hash-link" aria-label="Direct link to Getting Started" title="Direct link to Getting Started" translate="no">​</a></h2>
<p>These features are available now in the Playground. To try them:</p>
<ul>
<li class="">Open the Playground and check the provider dropdown to see costs</li>
<li class="">Click the evaluate button to run evaluations directly</li>
<li class="">Use the collapse controls on test cases to manage your view</li>
</ul>]]></content:encoded>
            <category>v0.73.0</category>
        </item>
        <item>
            <title><![CDATA[Chat Sessions in Observability]]></title>
            <link>https://agenta.ai/docs/changelog/chat-sessions-observability</link>
            <guid>https://agenta.ai/docs/changelog/chat-sessions-observability</guid>
            <pubDate>Fri, 09 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Track and analyze multi-turn conversations with session grouping, cost analytics, and conversation flow visualization.]]></description>
            <content:encoded><![CDATA[<div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="500" src="https://www.youtube.com/embed/gOcLTuaIwXc" title="Chat Sessions in Observability - Demo" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="overview">Overview<a href="https://agenta.ai/docs/changelog/chat-sessions-observability#overview" class="hash-link" aria-label="Direct link to Overview" title="Direct link to Overview" translate="no">​</a></h2>
<p>Chat sessions bring conversation-level observability to Agenta. You can now group related traces from multi-turn conversations together, making it easy to analyze complete user interactions rather than individual requests.</p>
<p>This is particularly useful for debugging chatbots, AI assistants, and any other application with multi-turn conversations. You get visibility into the entire conversation flow, including costs, latency, and intermediate steps.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="key-capabilities">Key Capabilities<a href="https://agenta.ai/docs/changelog/chat-sessions-observability#key-capabilities" class="hash-link" aria-label="Direct link to Key Capabilities" title="Direct link to Key Capabilities" translate="no">​</a></h2>
<ul>
<li class=""><strong>Automatic Grouping</strong>: All traces with the same <code>ag.session.id</code> attribute are automatically grouped together</li>
<li class=""><strong>Session Analytics</strong>: Track total cost, latency, and token usage per conversation</li>
<li class=""><strong>Session Browser</strong>: Dedicated UI showing all sessions with first input, last output, and key metrics</li>
<li class=""><strong>Session Drawer</strong>: Detailed view of all traces within a session with parent-child relationships</li>
<li class=""><strong>Real-time Monitoring</strong>: Auto-refresh mode for monitoring active conversations</li>
</ul>
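<p>Conceptually, grouping and session analytics amount to bucketing traces by their <code>ag.session.id</code> and summing per-trace metrics. The sketch below illustrates that idea only. The trace dictionaries and the <code>cost</code>, <code>latency_ms</code>, and <code>tokens</code> keys are hypothetical and do not reflect Agenta's actual trace schema or implementation.</p>

```python
from collections import defaultdict

# Hypothetical trace records; the metric keys are illustrative only.
traces = [
    {"ag.session.id": "conversation_123", "cost": 0.002, "latency_ms": 410, "tokens": 120},
    {"ag.session.id": "conversation_123", "cost": 0.003, "latency_ms": 530, "tokens": 180},
    {"ag.session.id": "conversation_456", "cost": 0.001, "latency_ms": 290, "tokens": 60},
]


def summarize_sessions(traces):
    """Group traces by session ID and total their cost, latency, and tokens."""
    sessions = defaultdict(
        lambda: {"traces": 0, "cost": 0.0, "latency_ms": 0, "tokens": 0}
    )
    for trace in traces:
        session = sessions[trace["ag.session.id"]]
        session["traces"] += 1
        session["cost"] += trace["cost"]
        session["latency_ms"] += trace["latency_ms"]
        session["tokens"] += trace["tokens"]
    return dict(sessions)


summary = summarize_sessions(traces)
```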
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="how-to-use-sessions">How to Use Sessions<a href="https://agenta.ai/docs/changelog/chat-sessions-observability#how-to-use-sessions" class="hash-link" aria-label="Direct link to How to Use Sessions" title="Direct link to How to Use Sessions" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="using-the-python-sdk">Using the Python SDK<a href="https://agenta.ai/docs/changelog/chat-sessions-observability#using-the-python-sdk" class="hash-link" aria-label="Direct link to Using the Python SDK" title="Direct link to Using the Python SDK" translate="no">​</a></h3>
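<p>Add session tracking to your application with a single call to <code>ag.tracing.store_session</code>. The example assumes <code>client</code> is an OpenAI client that you have already created and set up for tracing:</p>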
<div class="language-python codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-python codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> agenta </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> ag</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Initialize Agenta</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">init</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Store the session ID for all subsequent traces</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">tracing</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">store_session</span><span class="token punctuation" 
style="color:#393A34">(</span><span class="token plain">session_id</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"conversation_123"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Your LLM calls are automatically tracked with this session</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chat</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">completions</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-4"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello!"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="using-the-chat-run-endpoint">Using the Chat Run Endpoint<a href="https://agenta.ai/docs/changelog/chat-sessions-observability#using-the-chat-run-endpoint" class="hash-link" aria-label="Direct link to Using the Chat Run Endpoint" title="Direct link to Using the Chat Run Endpoint" translate="no">​</a></h3>
<p>You can also instrument sessions when calling Agenta-managed prompts via the <code>/chat/run</code> endpoint:</p>
<div class="language-python codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-python codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> agenta </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> ag</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Initialize the Agenta client</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">agenta </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">Agenta</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your_api_key"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Call the chat endpoint with session tracking</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> agenta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    base_id</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your_base_id"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    environment</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"production"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    inputs</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"chat_history"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token 
punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"What is the weather like?"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Add session metadata to group related conversations</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    metadata</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"ag.session.id"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user_456_conv_789"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Follow-up in the same session</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">follow_up </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> agenta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    base_id</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your_base_id"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    environment</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"production"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    inputs</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"chat_history"</span><span class="token 
punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"What is the weather like?"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"assistant"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> response</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"message"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" 
style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"What about tomorrow?"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    metadata</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"ag.session.id"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user_456_conv_789"</span><span class="token plain">  </span><span class="token comment" 
style="color:#999988;font-style:italic"># Same session ID</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="using-opentelemetry">Using OpenTelemetry<a href="https://agenta.ai/docs/changelog/chat-sessions-observability#using-opentelemetry" class="hash-link" aria-label="Direct link to Using OpenTelemetry" title="Direct link to Using OpenTelemetry" translate="no">​</a></h3>
<p>If you're using OpenTelemetry for instrumentation, set <code>ag.session.id</code> as a span attribute:</p>
<div class="language-javascript codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-javascript codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token keyword module" style="color:#00009f">import</span><span class="token plain"> </span><span class="token imports punctuation" style="color:#393A34">{</span><span class="token imports"> trace </span><span class="token imports punctuation" style="color:#393A34">}</span><span class="token plain"> </span><span class="token keyword module" style="color:#00009f">from</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'@opentelemetry/api'</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> tracer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> trace</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">getTracer</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'my-app'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token 
plain"> span </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> tracer</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">startSpan</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'chat-interaction'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">// Add session ID as a span attribute</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">span</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">setAttribute</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'ag.session.id'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'conversation_123'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">// Your code here</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">span</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">end</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><br></span></code></pre></div></div>
<p>The UI automatically detects session IDs and groups traces together. You can use any format for session IDs: UUIDs, composite IDs like <code>user_123_session_456</code>, or custom formats.</p>
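<p>Whichever format you choose, the important part is to reuse the same ID for every turn of a conversation. A minimal sketch of two ways to build IDs, where the helper names are hypothetical and not part of the Agenta SDK:</p>

```python
import uuid


def new_session_id():
    """Generate a random UUID-based session ID."""
    return str(uuid.uuid4())


def composite_session_id(user_id, conversation_id):
    """Build a readable composite ID such as user_123_session_456."""
    return f"user_{user_id}_session_{conversation_id}"


# Create the ID once per conversation and reuse it for every turn.
random_id = new_session_id()
readable_id = composite_session_id(123, 456)
```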
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="use-cases">Use Cases<a href="https://agenta.ai/docs/changelog/chat-sessions-observability#use-cases" class="hash-link" aria-label="Direct link to Use Cases" title="Direct link to Use Cases" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="debug-chatbots">Debug Chatbots<a href="https://agenta.ai/docs/changelog/chat-sessions-observability#debug-chatbots" class="hash-link" aria-label="Direct link to Debug Chatbots" title="Direct link to Debug Chatbots" translate="no">​</a></h3>
<p>See the complete conversation flow when users report issues. Instead of viewing isolated requests, you can analyze the entire conversation context and understand why a particular response was generated.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="monitor-multi-turn-agents">Monitor Multi-turn Agents<a href="https://agenta.ai/docs/changelog/chat-sessions-observability#monitor-multi-turn-agents" class="hash-link" aria-label="Direct link to Monitor Multi-turn Agents" title="Direct link to Monitor Multi-turn Agents" translate="no">​</a></h3>
<p>Track how your agent handles follow-up questions and maintains context across turns. See which turns are expensive, identify where latency spikes occur, and understand conversation patterns.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="analyze-conversation-costs">Analyze Conversation Costs<a href="https://agenta.ai/docs/changelog/chat-sessions-observability#analyze-conversation-costs" class="hash-link" aria-label="Direct link to Analyze Conversation Costs" title="Direct link to Analyze Conversation Costs" translate="no">​</a></h3>
<p>Understand which conversations are expensive and why. Session-level cost tracking helps you identify optimization opportunities and set appropriate pricing for your application.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="optimize-performance">Optimize Performance<a href="https://agenta.ai/docs/changelog/chat-sessions-observability#optimize-performance" class="hash-link" aria-label="Direct link to Optimize Performance" title="Direct link to Optimize Performance" translate="no">​</a></h3>
<p>Identify latency issues across entire conversations, not just single requests. See which conversational patterns lead to performance problems and optimize accordingly.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="getting-started">Getting Started<a href="https://agenta.ai/docs/changelog/chat-sessions-observability#getting-started" class="hash-link" aria-label="Direct link to Getting Started" title="Direct link to Getting Started" translate="no">​</a></h2>
<p>Learn more in our documentation:</p>
<ul>
<li class=""><a class="" href="https://agenta.ai/docs/observability/trace-with-python-sdk/track-chat-sessions">Track Chat Sessions (Python SDK)</a></li>
<li class=""><a class="" href="https://agenta.ai/docs/observability/trace-with-opentelemetry/session-tracking">Session Tracking (OpenTelemetry)</a></li>
<li class=""><a class="" href="https://agenta.ai/docs/observability/overview">Observability Overview</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="whats-next">What's Next<a href="https://agenta.ai/docs/changelog/chat-sessions-observability#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next" translate="no">​</a></h2>
<p>We're continuing to enhance session tracking with upcoming features like session-level annotations, session comparisons, and automated session analysis.</p>]]></content:encoded>
            <category>v0.73.0</category>
        </item>
        <item>
            <title><![CDATA[JSON Multi-Field Match Evaluator]]></title>
            <link>https://agenta.ai/docs/changelog/json-multi-field-match</link>
            <guid>https://agenta.ai/docs/changelog/json-multi-field-match</guid>
            <pubDate>Wed, 31 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Compare multiple fields between JSON objects with the new JSON Multi-Field Match evaluator. Ideal for entity extraction validation with per-field scoring and support for nested paths.]]></description>
            <content:encoded><![CDATA[<p>The JSON Multi-Field Match evaluator lets you validate multiple fields in JSON outputs simultaneously. This makes it ideal for entity extraction tasks where you need to check if your model correctly extracted name, email, address, and other structured fields.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="what-is-json-multi-field-match">What is JSON Multi-Field Match?<a href="https://agenta.ai/docs/changelog/json-multi-field-match#what-is-json-multi-field-match" class="hash-link" aria-label="Direct link to What is JSON Multi-Field Match?" title="Direct link to What is JSON Multi-Field Match?" translate="no">​</a></h2>
<p>This evaluator compares specific fields between your model's JSON output and the expected JSON values from your test set. Unlike the old JSON Field Match evaluator (which only checked one field), this evaluator handles any number of fields at once.</p>
<p>For each field you configure, the evaluator produces a separate score (either 1 for a match or 0 for no match). It also calculates an aggregate score showing the percentage of fields that matched correctly.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="key-features">Key Features<a href="https://agenta.ai/docs/changelog/json-multi-field-match#key-features" class="hash-link" aria-label="Direct link to Key Features" title="Direct link to Key Features" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="multiple-field-comparison">Multiple Field Comparison<a href="https://agenta.ai/docs/changelog/json-multi-field-match#multiple-field-comparison" class="hash-link" aria-label="Direct link to Multiple Field Comparison" title="Direct link to Multiple Field Comparison" translate="no">​</a></h3>
<p>Configure as many fields as you need to validate. The evaluator checks each field independently and reports results for all of them.</p>
<p>If you're extracting user information, you might configure fields like <code>name</code>, <code>email</code>, <code>phone</code>, and <code>address.city</code>. Each field gets its own score, so you can see exactly which extractions succeeded and which failed.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="three-path-format-options">Three Path Format Options<a href="https://agenta.ai/docs/changelog/json-multi-field-match#three-path-format-options" class="hash-link" aria-label="Direct link to Three Path Format Options" title="Direct link to Three Path Format Options" translate="no">​</a></h3>
<p>The evaluator supports three different ways to specify field paths:</p>
<p><strong>Dot notation</strong> (recommended for most cases):</p>
<ul>
<li class="">Simple fields: <code>name</code>, <code>email</code></li>
<li class="">Nested fields: <code>user.address.city</code></li>
<li class="">Array indices: <code>items.0.name</code></li>
</ul>
<p><strong>JSON Path</strong> (standard JSON Path syntax):</p>
<ul>
<li class="">Simple fields: <code>$.name</code>, <code>$.email</code></li>
<li class="">Nested fields: <code>$.user.address.city</code></li>
<li class="">Array indices: <code>$.items[0].name</code></li>
</ul>
<p><strong>JSON Pointer</strong> (RFC 6901):</p>
<ul>
<li class="">Simple fields: <code>/name</code>, <code>/email</code></li>
<li class="">Nested fields: <code>/user/address/city</code></li>
<li class="">Array indices: <code>/items/0/name</code></li>
</ul>
<p>All three formats address the same fields, so equivalent paths resolve to the same value. Use whichever matches your existing tooling or personal preference.</p>
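<p>As a rough illustration of how the three notations can point at the same value, here is a minimal Python sketch. The helper names <code>to_segments</code> and <code>resolve</code> are hypothetical and are not part of Agenta. The JSON Path branch handles only the simple forms listed above (no wildcards or filters), and this is not the evaluator's actual implementation.</p>

```python
import re

def to_segments(path):
    """Split a field path into keys and indices (hypothetical helper)."""
    if path.startswith("/"):
        # JSON Pointer (RFC 6901): "~1" decodes to "/" and "~0" to "~"
        return [p.replace("~1", "/").replace("~0", "~") for p in path[1:].split("/")]
    if path.startswith("$"):
        # Simple JSON Path subset only: $.a.b and $.items[0].name
        return re.findall(r"[^.\[\]]+", path[1:])
    # Dot notation: a.b.0.c
    return path.split(".")

def resolve(data, path):
    """Walk the segments through nested dicts and lists."""
    current = data
    for segment in to_segments(path):
        if isinstance(current, list):
            current = current[int(segment)]  # list positions are numeric
        else:
            current = current[segment]
    return current

record = {"user": {"address": {"city": "Paris"}}, "items": [{"name": "pen"}]}

# The same element addressed in each of the three notations
for path in ("items.0.name", "$.items[0].name", "/items/0/name"):
    print(path, resolve(record, path))
```

<p>Each of the three paths in the loop resolves to the same element of <code>record</code>.</p>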
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="nested-field-and-array-support">Nested Field and Array Support<a href="https://agenta.ai/docs/changelog/json-multi-field-match#nested-field-and-array-support" class="hash-link" aria-label="Direct link to Nested Field and Array Support" title="Direct link to Nested Field and Array Support" translate="no">​</a></h3>
<p>Access deeply nested fields and array elements without restrictions. The evaluator handles any level of nesting.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="per-field-scoring">Per-Field Scoring<a href="https://agenta.ai/docs/changelog/json-multi-field-match#per-field-scoring" class="hash-link" aria-label="Direct link to Per-Field Scoring" title="Direct link to Per-Field Scoring" translate="no">​</a></h3>
<p>See individual scores for each configured field in the evaluation results. This granular view helps you identify which specific extractions are working well and which need improvement.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="aggregate-score">Aggregate Score<a href="https://agenta.ai/docs/changelog/json-multi-field-match#aggregate-score" class="hash-link" aria-label="Direct link to Aggregate Score" title="Direct link to Aggregate Score" translate="no">​</a></h3>
<p>The aggregate score is the fraction of configured fields that match. If you configure five fields and three match, the aggregate score is 0.6 (or 60%).</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="example">Example<a href="https://agenta.ai/docs/changelog/json-multi-field-match#example" class="hash-link" aria-label="Direct link to Example" title="Direct link to Example" translate="no">​</a></h2>
<p>Suppose you're building an entity extraction model that pulls contact information from text. Your ground truth looks like this:</p>
<div class="language-json codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-json codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"name"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"John Doe"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"email"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"john@example.com"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"phone"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"555-1234"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"address"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" 
style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"city"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"New York"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"zip"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"10001"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
<p>Your model produces this output:</p>
<div class="language-json codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-json codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"name"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"John Doe"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"email"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"jane@example.com"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"phone"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"555-1234"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"address"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" 
style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"city"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"New York"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"zip"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"10002"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
<p>You configure these fields: <code>["name", "email", "phone", "address.city", "address.zip"]</code></p>
<p>The evaluator returns:</p>
<table><thead><tr><th>Field</th><th>Score</th></tr></thead><tbody><tr><td><code>name</code></td><td>1.0</td></tr><tr><td><code>email</code></td><td>0.0</td></tr><tr><td><code>phone</code></td><td>1.0</td></tr><tr><td><code>address.city</code></td><td>1.0</td></tr><tr><td><code>address.zip</code></td><td>0.0</td></tr><tr><td><code>aggregate_score</code></td><td>0.6</td></tr></tbody></table>
<p>You can see immediately that the model got the email and zip code wrong but correctly extracted the name, phone, and city.</p>
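<p>The scores in the table above can be reproduced with a short Python sketch. It assumes exact equality per field and treats a missing field as a mismatch. Both are assumptions, since the evaluator's real comparison rules are not described here, and the helpers <code>get_field</code> and <code>score_fields</code> are hypothetical names.</p>

```python
_MISSING = object()  # sentinel so a missing field never counts as a match

def get_field(data, dotted_path):
    """Follow a dot-notation path through nested dicts."""
    current = data
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current

def score_fields(expected, output, fields):
    """Return a 1.0/0.0 score per field plus the fraction that matched."""
    scores = {}
    for field in fields:
        want = get_field(expected, field)
        got = get_field(output, field)
        matched = want is not _MISSING and want == got
        scores[field] = 1.0 if matched else 0.0
    scores["aggregate_score"] = sum(scores.values()) / len(fields)
    return scores

expected = {"name": "John Doe", "email": "john@example.com", "phone": "555-1234",
            "address": {"city": "New York", "zip": "10001"}}
output = {"name": "John Doe", "email": "jane@example.com", "phone": "555-1234",
          "address": {"city": "New York", "zip": "10002"}}
fields = ["name", "email", "phone", "address.city", "address.zip"]

result = score_fields(expected, output, fields)
for field, score in result.items():
    print(field, score)
```

<p>Three of the five fields match, so the sketch reports an aggregate of 0.6, in line with the table.</p>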
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="auto-detection-in-the-ui">Auto-Detection in the UI<a href="https://agenta.ai/docs/changelog/json-multi-field-match#auto-detection-in-the-ui" class="hash-link" aria-label="Direct link to Auto-Detection in the UI" title="Direct link to Auto-Detection in the UI" translate="no">​</a></h2>
<p>When you configure the evaluator in the web interface, Agenta automatically detects available fields from your test set data. Click to add or remove fields using a tag-based interface. This makes setup fast and reduces configuration errors.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="migration-from-json-field-match">Migration from JSON Field Match<a href="https://agenta.ai/docs/changelog/json-multi-field-match#migration-from-json-field-match" class="hash-link" aria-label="Direct link to Migration from JSON Field Match" title="Direct link to Migration from JSON Field Match" translate="no">​</a></h2>
<p>The old JSON Field Match evaluator only supported checking a single field. If you're using it, consider migrating to JSON Multi-Field Match to gain:</p>
<ul>
<li class="">Support for multiple fields in one evaluator</li>
<li class="">Per-field scoring for detailed analysis</li>
<li class="">Aggregate scoring for overall performance tracking</li>
<li class="">Nested field and array support</li>
</ul>
<p>Existing JSON Field Match configurations continue to work. For new evaluations, we recommend using JSON Multi-Field Match.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="next-steps">Next Steps<a href="https://agenta.ai/docs/changelog/json-multi-field-match#next-steps" class="hash-link" aria-label="Direct link to Next Steps" title="Direct link to Next Steps" translate="no">​</a></h2>
<p>Learn more about configuring and using the JSON Multi-Field Match evaluator in the <a class="" href="https://agenta.ai/docs/evaluation/configure-evaluators/classification-entity-extraction#json-multi-field-match">Classification and Entity Extraction Evaluators</a> documentation.</p>]]></content:encoded>
            <category>v0.73.0</category>
        </item>
        <item>
            <title><![CDATA[PDF Support in the Playground]]></title>
            <link>https://agenta.ai/docs/changelog/pdf-support-in-playground</link>
            <guid>https://agenta.ai/docs/changelog/pdf-support-in-playground</guid>
            <pubDate>Wed, 17 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Attach PDFs to chat messages in the Playground. Upload files, provide URLs, or use file IDs from provider APIs. Works across evaluations and observability.]]></description>
            <content:encoded><![CDATA[<p>The Playground now supports PDF attachments for chat applications. You can include PDF documents in your prompts to build applications that analyze documents, answer questions about content, or extract information from files.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="what-is-pdf-support">What is PDF Support?<a href="https://agenta.ai/docs/changelog/pdf-support-in-playground#what-is-pdf-support" class="hash-link" aria-label="Direct link to What is PDF Support?" title="Direct link to What is PDF Support?" translate="no">​</a></h2>
<p>PDF support lets you attach PDF documents to chat messages when testing prompts in the Playground. The feature works with vision-capable models from OpenAI, Google (Gemini), and Anthropic (Claude). These models can read and understand PDF content to answer questions or perform analysis.</p>
<p>This is useful when you're building applications that need to work with documents. Examples include invoice processing, contract analysis, document Q&amp;A, or content extraction.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="supported-providers">Supported Providers<a href="https://agenta.ai/docs/changelog/pdf-support-in-playground#supported-providers" class="hash-link" aria-label="Direct link to Supported Providers" title="Direct link to Supported Providers" translate="no">​</a></h2>
<p>PDF support works with vision-capable models that accept document inputs, including models from OpenAI, Google (Gemini), and Anthropic (Claude).</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="how-to-attach-pdfs">How to Attach PDFs<a href="https://agenta.ai/docs/changelog/pdf-support-in-playground#how-to-attach-pdfs" class="hash-link" aria-label="Direct link to How to Attach PDFs" title="Direct link to How to Attach PDFs" translate="no">​</a></h2>
<p>To attach a PDF to a chat message, click "Add attachment" in the message input. You'll see three options:</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="upload-a-file">Upload a File<a href="https://agenta.ai/docs/changelog/pdf-support-in-playground#upload-a-file" class="hash-link" aria-label="Direct link to Upload a File" title="Direct link to Upload a File" translate="no">​</a></h3>
<p>Select a PDF from your computer. The file is converted to base64 and sent with your prompt.</p>
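<p>For readers who want to see what the base64 step amounts to, here is a minimal Python sketch. The data URL wrapper and the helper name <code>pdf_to_data_url</code> are assumptions made for illustration. The exact payload the Playground sends is not described here.</p>

```python
import base64

def pdf_to_data_url(pdf_bytes):
    """Encode raw PDF bytes as a base64 data URL (assumed wrapper format)."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return "data:application/pdf;base64," + encoded

# Stand-in bytes; in practice you would read them from a file opened in "rb" mode
sample = b"%PDF-1.4 minimal example bytes"
data_url = pdf_to_data_url(sample)

# Decoding the part after the comma recovers the original bytes
payload = data_url.split(",", 1)[1]
print(base64.b64decode(payload) == sample)
```

<p>Base64 output is roughly a third larger than the original file, which is worth keeping in mind for large documents.</p>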
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="provide-a-url">Provide a URL<a href="https://agenta.ai/docs/changelog/pdf-support-in-playground#provide-a-url" class="hash-link" aria-label="Direct link to Provide a URL" title="Direct link to Provide a URL" translate="no">​</a></h3>
<p>Paste the URL of a publicly accessible PDF. The model fetches the PDF from the URL.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="use-a-file-id">Use a File ID<a href="https://agenta.ai/docs/changelog/pdf-support-in-playground#use-a-file-id" class="hash-link" aria-label="Direct link to Use a File ID" title="Direct link to Use a File ID" translate="no">​</a></h3>
<p>If you've uploaded a file through a provider's API (like the Gemini Files API), you can use the file ID instead. The model retrieves the file from the provider's storage.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="using-pdfs-in-evaluations">Using PDFs in Evaluations<a href="https://agenta.ai/docs/changelog/pdf-support-in-playground#using-pdfs-in-evaluations" class="hash-link" aria-label="Direct link to Using PDFs in Evaluations" title="Direct link to Using PDFs in Evaluations" translate="no">​</a></h2>
<p>PDF attachments work in both automatic and human evaluations. You can include PDFs in your test sets and run evaluations across multiple documents.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="pdfs-in-observability-and-tracing">PDFs in Observability and Tracing<a href="https://agenta.ai/docs/changelog/pdf-support-in-playground#pdfs-in-observability-and-tracing" class="hash-link" aria-label="Direct link to PDFs in Observability and Tracing" title="Direct link to PDFs in Observability and Tracing" translate="no">​</a></h2>
<p>When you trace requests that include PDFs, you can see the PDF attachment information in the trace data.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="example-use-cases">Example Use Cases<a href="https://agenta.ai/docs/changelog/pdf-support-in-playground#example-use-cases" class="hash-link" aria-label="Direct link to Example Use Cases" title="Direct link to Example Use Cases" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="invoice-processing">Invoice Processing<a href="https://agenta.ai/docs/changelog/pdf-support-in-playground#invoice-processing" class="hash-link" aria-label="Direct link to Invoice Processing" title="Direct link to Invoice Processing" translate="no">​</a></h3>
<p>Create a prompt that extracts key information from invoices:</p>
<div class="language-text codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-text codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token plain">Extract the following information from this invoice:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- Invoice number</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- Date</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- Total amount</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- Vendor name</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- Line items</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Return the information as structured JSON.</span><br></span></code></pre></div></div>
<p>Attach sample invoices as PDFs. Test the prompt with different invoice formats to ensure reliable extraction across vendors.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="contract-analysis">Contract Analysis<a href="https://agenta.ai/docs/changelog/pdf-support-in-playground#contract-analysis" class="hash-link" aria-label="Direct link to Contract Analysis" title="Direct link to Contract Analysis" translate="no">​</a></h3>
<p>Build a prompt that analyzes legal contracts:</p>
<div class="language-text codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-text codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token plain">Review the attached contract and identify:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- Key obligations for each party</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- Important dates and deadlines</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- Termination clauses</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- Liability limitations</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Provide a summary in plain language.</span><br></span></code></pre></div></div>
<p>Attach contract PDFs and verify that the model identifies critical terms consistently.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="document-qa">Document Q&amp;A<a href="https://agenta.ai/docs/changelog/pdf-support-in-playground#document-qa" class="hash-link" aria-label="Direct link to Document Q&amp;A" title="Direct link to Document Q&amp;A" translate="no">​</a></h3>
<p>Create an assistant that answers questions about documents:</p>
<div class="language-text codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-text codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token plain">You are a document assistant. Answer the user's question based on the</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">attached PDF. Be specific and cite page numbers when possible.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Question: {{question}}</span><br></span></code></pre></div></div>
<p>Attach various document types (reports, manuals, research papers) and test question-answering accuracy across different content.</p>
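<p>The <code>{{question}}</code> placeholder in the prompt above is filled in at run time. A minimal sketch of that substitution follows, assuming simple double-curly-brace variables. The <code>render</code> helper is hypothetical, and Agenta's own template handling may differ.</p>

```python
import re

TEMPLATE = (
    "You are a document assistant. Answer the user's question based on the\n"
    "attached PDF. Be specific and cite page numbers when possible.\n"
    "\n"
    "Question: {{question}}"
)

def render(template, variables):
    """Replace each {{name}} placeholder with its value (hypothetical helper)."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            # Fail loudly rather than send a prompt with an unfilled slot
            raise KeyError("missing template variable: " + name)
        return str(variables[name])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

prompt = render(TEMPLATE, {"question": "What is the warranty period?"})
print(prompt)
```

<p>Raising an error on a missing variable is a design choice in this sketch. It surfaces test set rows that lack a <code>question</code> value before any model call is made.</p>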
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="next-steps">Next Steps<a href="https://agenta.ai/docs/changelog/pdf-support-in-playground#next-steps" class="hash-link" aria-label="Direct link to Next Steps" title="Direct link to Next Steps" translate="no">​</a></h2>
<p>Learn more about <a class="" href="https://agenta.ai/docs/prompt-engineering/playground/using-playground">using the Playground</a> to develop and test prompts with PDF attachments.</p>]]></content:encoded>
            <category>v0.69.0</category>
        </item>
        <item>
            <title><![CDATA[Agenta Documentation MCP Server]]></title>
            <link>https://agenta.ai/docs/changelog/mcp-server</link>
            <guid>https://agenta.ai/docs/changelog/mcp-server</guid>
            <pubDate>Sun, 14 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Connect AI coding agents to Agenta documentation using the Agenta MCP server.]]></description>
            <content:encoded><![CDATA[<p>AI coding agents like Cursor, Claude Code, VS Code Copilot, and Windsurf can now access Agenta documentation directly through the Agenta MCP server.</p>
<p>The MCP server implements the Model Context Protocol, allowing AI assistants to search and retrieve Agenta documentation on demand. Instead of searching the docs manually, you can ask your AI agent questions about Agenta features, APIs, and code examples.</p>
<p><strong><a class="" href="https://agenta.ai/docs/misc/mcp-server">Read the full setup guide →</a></strong></p>]]></content:encoded>
            <category>v0.68.3</category>
        </item>
        <item>
            <title><![CDATA[Projects within Organizations]]></title>
            <link>https://agenta.ai/docs/changelog/projects-within-organizations</link>
            <guid>https://agenta.ai/docs/changelog/projects-within-organizations</guid>
            <pubDate>Thu, 04 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Create projects within organizations to divide your work between different AI products. Each project scopes its prompts, traces, and evaluations independently.]]></description>
            <content:encoded><![CDATA[<p>You can now create projects within an organization. This feature helps you organize your work when you're building multiple AI products or managing different teams working on separate initiatives.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="what-are-projects">What Are Projects?<a href="https://agenta.ai/docs/changelog/projects-within-organizations#what-are-projects" class="hash-link" aria-label="Direct link to What Are Projects?" title="Direct link to What Are Projects?" translate="no">​</a></h2>
<p>Projects provide a way to isolate and organize your AI work within an organization. Each project maintains its own scope for:</p>
<ul>
<li class=""><strong>Prompts</strong>: All prompt templates and variants stay within the project</li>
<li class=""><strong>Traces</strong>: Observability data is scoped to the project that generated it</li>
<li class=""><strong>Evaluations</strong>: Test sets, evaluators, and evaluation results belong to specific projects</li>
</ul>
<p>This scoping prevents clutter and makes it easy to focus on one product at a time.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="creating-and-managing-projects">Creating and Managing Projects<a href="https://agenta.ai/docs/changelog/projects-within-organizations#creating-and-managing-projects" class="hash-link" aria-label="Direct link to Creating and Managing Projects" title="Direct link to Creating and Managing Projects" translate="no">​</a></h2>
<p>You can create a new project directly from the sidebar in the Agenta interface. Once created, you can switch between projects using the sidebar navigation.</p>
<p>Team members can work in different projects at the same time. The interface remembers your last active project, making it easy to pick up where you left off.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="when-to-use-projects">When to Use Projects<a href="https://agenta.ai/docs/changelog/projects-within-organizations#when-to-use-projects" class="hash-link" aria-label="Direct link to When to Use Projects" title="Direct link to When to Use Projects" translate="no">​</a></h2>
<p>Projects work well when you need to:</p>
<ul>
<li class="">Build multiple AI products for different use cases</li>
<li class="">Separate development work for different teams or departments</li>
<li class="">Keep client work isolated from internal tools</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="next-steps">Next Steps<a href="https://agenta.ai/docs/changelog/projects-within-organizations#next-steps" class="hash-link" aria-label="Direct link to Next Steps" title="Direct link to Next Steps" translate="no">​</a></h2>
<p>If you're managing complex AI initiatives across multiple products, projects give you the structure to keep everything organized. You can create your first project from the sidebar and start organizing your prompts and evaluations.</p>
<p>For questions about projects or organizational structure, check the <a class="" href="https://agenta.ai/docs/faq">FAQ</a> or reach out through our <a class="" href="https://agenta.ai/docs/misc/getting_support">support channels</a>.</p>]]></content:encoded>
            <category>v0.65.0</category>
        </item>
        <item>
            <title><![CDATA[Provider Built-in Tools in the Playground]]></title>
            <link>https://agenta.ai/docs/changelog/provider-built-in-tools</link>
            <guid>https://agenta.ai/docs/changelog/provider-built-in-tools</guid>
            <pubDate>Thu, 20 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Use provider built-in tools like web search, code execution, and file search directly in the Playground. Configure tools with your prompts and automatically invoke them through the LLM gateway.]]></description>
            <content:encoded><![CDATA[<p>The Playground now supports provider built-in tools. You can use web search, code execution, file search, and other native provider tools directly when developing prompts.</p>
<div style="text-align:center;margin:20px auto;max-width:50%;width:50%"><div style="background-size:cover;background-repeat:no-repeat;position:relative;background-image:url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAALCAYAAABGbhwYAAAACXBIWXMAAAsTAAALEwEAmpwYAAABCUlEQVR4nH2Pz0rDQBDG993aR/DgTQSPPXiKuQjtwbthL+JJxIsUL176LtU0hFyWLmGSbcz+md2MbKxSKHXgm2HgN9/wMa31RFu7MM4l3vuUiNJGqfRidpOeXc6S86vrxe3d/YQhIqefwv2kXht6fF7Sw9MrvrytaPm+4gwAMh8Gsg71Hj6UjodCiIzJuuYOA9U14DAMv6bkvY8av2w2G86UUjw6RpCOC2Mry5KzGoCHgaj76jGEcBoEgDGM2u3QGHsa3G4ld+hJCPk/2Kk+i0vX99qhR+/DUeqiKDJmjB1fa2PRWEcHwf8cpZSc5Xk+/cjz+Xr9mVRVlSql0qZpRgFA0rbtXAgx/QYcbY8I+vOtKQAAAABJRU5ErkJggg==&quot;)"><svg style="width:100%;height:auto;max-width:100%;margin-bottom:-4px" width="600" height="646"></svg><noscript><img style="width:100%;height:auto;max-width:100%;margin-bottom:-4px;position:absolute;top:0;left:0" src="/docs/assets/ideal-img/tools-dropdown-playground.4df4bcb.600.png" srcset="/docs/assets/ideal-img/tools-dropdown-playground.4df4bcb.600.png 600w" width="600" height="646"></noscript></div></div>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="what-are-provider-built-in-tools">What Are Provider Built-in Tools?<a href="https://agenta.ai/docs/changelog/provider-built-in-tools#what-are-provider-built-in-tools" class="hash-link" aria-label="Direct link to What Are Provider Built-in Tools?" title="Direct link to What Are Provider Built-in Tools?" translate="no">​</a></h2>
<p>Provider built-in tools are capabilities that LLM providers offer natively. Unlike custom tools that you define with JSON schemas, these tools are managed by the provider. When the model needs them, the provider handles execution and returns results automatically.</p>
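<p>For contrast, this is roughly what a custom tool definition looks like. The sketch uses the widely used OpenAI-style function-calling layout. The tool name <code>get_order_status</code> and its parameter are hypothetical, and whether the Playground expects exactly this shape is an assumption.</p>

```python
import json

# A custom tool: you write the JSON Schema and run the function yourself.
custom_tool = {
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical tool name
        "description": "Look up the status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Order identifier"},
            },
            "required": ["order_id"],
        },
    },
}

print(json.dumps(custom_tool, indent=2))
```

<p>A built-in tool needs none of this schema work. You enable it, and the provider decides how to call it and returns the result.</p>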
<p>Common built-in tools include:</p>
<ul>
<li class="">Web search: Fetch current information from the internet</li>
<li class="">Code execution: Run Python or JavaScript code</li>
<li class="">File search: Search through uploaded documents</li>
<li class="">Bash scripting: Execute shell commands (Anthropic)</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="supported-providers-and-tools">Supported Providers and Tools<a href="https://agenta.ai/docs/changelog/provider-built-in-tools#supported-providers-and-tools" class="hash-link" aria-label="Direct link to Supported Providers and Tools" title="Direct link to Supported Providers and Tools" translate="no">​</a></h2>
<p>Different providers offer different built-in tools:</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="openai">OpenAI<a href="https://agenta.ai/docs/changelog/provider-built-in-tools#openai" class="hash-link" aria-label="Direct link to OpenAI" title="Direct link to OpenAI" translate="no">​</a></h3>
<ul>
<li class=""><strong>Web Search</strong>: Access current information from the web</li>
<li class=""><strong>File Search</strong>: Search through files you upload to OpenAI</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="anthropic">Anthropic<a href="https://agenta.ai/docs/changelog/provider-built-in-tools#anthropic" class="hash-link" aria-label="Direct link to Anthropic" title="Direct link to Anthropic" translate="no">​</a></h3>
<ul>
<li class=""><strong>Web Search</strong>: Retrieve information from the internet</li>
<li class=""><strong>Bash Scripting</strong>: Execute bash commands in a sandboxed environment</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="gemini">Gemini<a href="https://agenta.ai/docs/changelog/provider-built-in-tools#gemini" class="hash-link" aria-label="Direct link to Gemini" title="Direct link to Gemini" translate="no">​</a></h3>
<ul>
<li class=""><strong>Web Search</strong>: Search the web for current information</li>
<li class=""><strong>Code Execution</strong>: Run Python code to perform calculations and data analysis</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="how-to-use-built-in-tools">How to Use Built-in Tools<a href="https://agenta.ai/docs/changelog/provider-built-in-tools#how-to-use-built-in-tools" class="hash-link" aria-label="Direct link to How to Use Built-in Tools" title="Direct link to How to Use Built-in Tools" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="adding-tools-in-the-playground">Adding Tools in the Playground<a href="https://agenta.ai/docs/changelog/provider-built-in-tools#adding-tools-in-the-playground" class="hash-link" aria-label="Direct link to Adding Tools in the Playground" title="Direct link to Adding Tools in the Playground" translate="no">​</a></h3>
<ol>
<li class="">Open your prompt in the Playground</li>
<li class="">Click the "Add Tool" button in the configuration panel</li>
<li class="">Choose the tools you want to enable for your prompt</li>
<li class="">Test your prompt; the model will automatically use tools when needed</li>
</ol>
<p>Tools are saved as part of your prompt configuration. When you commit changes, the tool configuration is stored with the variant, so every version of the prompt keeps its own set of tools.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="invoking-with-tools-via-llm-gateway">Invoking with Tools via LLM Gateway<a href="https://agenta.ai/docs/changelog/provider-built-in-tools#invoking-with-tools-via-llm-gateway" class="hash-link" aria-label="Direct link to Invoking with Tools via LLM Gateway" title="Direct link to Invoking with Tools via LLM Gateway" translate="no">​</a></h3>
<p>When you invoke prompts through Agenta as an LLM gateway, the tools are automatically included in the request. The provider handles tool execution during the call.</p>
<p>Your application receives the final response after all tool calls complete. You don't need to handle tool execution yourself.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="tool-definitions-in-the-registry">Tool Definitions in the Registry<a href="https://agenta.ai/docs/changelog/provider-built-in-tools#tool-definitions-in-the-registry" class="hash-link" aria-label="Direct link to Tool Definitions in the Registry" title="Direct link to Tool Definitions in the Registry" translate="no">​</a></h3>
<p>Tool definitions follow the LiteLLM format. You can view the exact tool schemas in the Prompt Registry. This helps you understand what parameters each tool accepts and how the provider will use it.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="example-use-cases">Example Use Cases<a href="https://agenta.ai/docs/changelog/provider-built-in-tools#example-use-cases" class="hash-link" aria-label="Direct link to Example Use Cases" title="Direct link to Example Use Cases" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="research-assistant-with-web-search">Research Assistant with Web Search<a href="https://agenta.ai/docs/changelog/provider-built-in-tools#research-assistant-with-web-search" class="hash-link" aria-label="Direct link to Research Assistant with Web Search" title="Direct link to Research Assistant with Web Search" translate="no">​</a></h3>
<p>Create a prompt that answers questions using current information:</p>
<div class="language-text codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-text codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token plain">You are a research assistant. Answer the user's question with accurate,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">current information. Use web search when you need recent data.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Question: {{question}}</span><br></span></code></pre></div></div>
<p>Enable web search in the tool configuration. When users ask about current events or recent data, the model automatically searches the web for information.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="data-analysis-with-code-execution">Data Analysis with Code Execution<a href="https://agenta.ai/docs/changelog/provider-built-in-tools#data-analysis-with-code-execution" class="hash-link" aria-label="Direct link to Data Analysis with Code Execution" title="Direct link to Data Analysis with Code Execution" translate="no">​</a></h3>
<p>Build a data analysis prompt that performs calculations:</p>
<div class="language-text codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-text codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token plain">Analyze the following data and provide insights:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">{{data}}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Calculate statistics and create visualizations as needed.</span><br></span></code></pre></div></div>
<p>Enable code execution for Gemini. The model can run Python code to calculate statistics, process data, and generate visualizations.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="document-qa-with-file-search">Document Q&amp;A with File Search<a href="https://agenta.ai/docs/changelog/provider-built-in-tools#document-qa-with-file-search" class="hash-link" aria-label="Direct link to Document Q&amp;A with File Search" title="Direct link to Document Q&amp;A with File Search" translate="no">​</a></h3>
<p>Create a prompt that answers questions about uploaded documents:</p>
<div class="language-text codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-text codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token plain">Answer the user's question based on the uploaded documentation.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Be specific and cite relevant sections.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Question: {{question}}</span><br></span></code></pre></div></div>
<p>Enable file search for OpenAI. The model searches through your uploaded files to find relevant information.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="next-steps">Next Steps<a href="https://agenta.ai/docs/changelog/provider-built-in-tools#next-steps" class="hash-link" aria-label="Direct link to Next Steps" title="Direct link to Next Steps" translate="no">​</a></h2>
<p>Learn more about <a class="" href="https://agenta.ai/docs/prompt-engineering/playground/using-playground">using the Playground</a> to develop and test prompts with provider built-in tools.</p>]]></content:encoded>
            <category>v0.66.0</category>
        </item>
        <item>
            <title><![CDATA[Reasoning Effort Support in the Playground]]></title>
            <link>https://agenta.ai/docs/changelog/reasoning-effort-support</link>
            <guid>https://agenta.ai/docs/changelog/reasoning-effort-support</guid>
            <pubDate>Tue, 18 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[You can now configure reasoning effort for models that support this parameter, such as OpenAI's o1 series and Google's Gemini 2.5 Pro.]]></description>
            <content:encoded><![CDATA[<p>You can now configure reasoning effort for models that support this parameter, such as OpenAI's o1 series and Google's Gemini 2.5 Pro.</p>
<p>Reasoning effort controls how much reasoning the model performs before it generates a response. This is particularly useful for complex tasks where you need to balance response quality against latency and cost.</p>
<p>The reasoning effort parameter is part of your prompt template configuration. When you fetch prompts via the SDK or invoke them through Agenta as an LLM gateway, the reasoning effort setting is included in the configuration and applied to your requests automatically.</p>
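<p>As a rough sketch of what this could look like on the client side, the snippet below copies a reasoning effort setting from a fetched configuration into the arguments of a chat-completion call. The placement of <code>reasoning_effort</code> under <code>llm_config</code>, the allowed values, and the helper <code>build_request_kwargs</code> are assumptions made for illustration, not the confirmed Agenta schema.</p>

```python
# Hypothetical shape of a fetched prompt configuration. Key names are assumptions.
config = {
    "prompt": {
        "messages": [{"role": "user", "content": "Check this proof step by step."}],
        "llm_config": {"model": "o1", "reasoning_effort": "high"},
    }
}

# Values commonly accepted by OpenAI-style APIs (an assumption; providers may differ).
ALLOWED_EFFORTS = {"low", "medium", "high"}


def build_request_kwargs(config: dict) -> dict:
    """Copy model settings from the configuration into chat-completion keyword arguments."""
    prompt = config["prompt"]
    llm_config = prompt["llm_config"]
    kwargs = {"model": llm_config["model"], "messages": prompt["messages"]}

    # Only forward the setting when the configuration actually defines it.
    effort = llm_config.get("reasoning_effort")
    if effort is not None:
        if effort not in ALLOWED_EFFORTS:
            raise ValueError(f"unsupported reasoning_effort: {effort!r}")
        kwargs["reasoning_effort"] = effort
    return kwargs
```

<p>When you invoke the prompt through Agenta as an LLM gateway, this step happens for you. The sketch only matters if you fetch the configuration and call the provider yourself.</p>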
<p>This gives you fine-grained control over model behavior directly from the playground, making it easier to optimize for your specific use case.</p>]]></content:encoded>
            <category>v0.62.5</category>
        </item>
        <item>
            <title><![CDATA[Jinja2 Template Support in the Playground]]></title>
            <link>https://agenta.ai/docs/changelog/jinja2-template-support</link>
            <guid>https://agenta.ai/docs/changelog/jinja2-template-support</guid>
            <pubDate>Mon, 17 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[We're excited to announce a powerful update to the Agenta playground. You can now use Jinja2 templating in your prompts.]]></description>
            <content:encoded><![CDATA[<p>We're excited to announce a powerful update to the Agenta playground. You can now use Jinja2 templating in your prompts.</p>
<p>This means you can add sophisticated logic directly into your prompt templates. Use conditional statements, apply filters to variables, and transform data on the fly.</p>
<p>Learn more in our <a href="https://agenta.ai/blog/launch-week-2-day-5-jinja2-prompt-templates" target="_blank" rel="noopener noreferrer" class="">blog post</a> or check the <a class="" href="https://agenta.ai/docs/prompt-engineering/playground/using-playground#switching-template-formats">documentation</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="example">Example<a href="https://agenta.ai/docs/changelog/jinja2-template-support#example" class="hash-link" aria-label="Direct link to Example" title="Direct link to Example" translate="no">​</a></h2>
<p>Here's a prompt template that uses Jinja2 to adapt based on user expertise level:</p>
<div class="language-text codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-text codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token plain">You are {% if expertise_level == "beginner" %}a friendly teacher who explains concepts in simple terms{% else %}a technical expert providing detailed analysis{% endif %}.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Explain {{ topic }} {% if include_examples %}with practical examples{% endif %}.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">{% if False %} {{expertise_level}} {{include_examples}} {% endif %}</span><br></span></code></pre></div></div>
<p>Note: The <code>{% if False %}</code> block never renders, so it adds nothing to the final prompt. It only references the variables so that the playground makes them available as inputs.</p>
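<p>To see how the template above behaves, you can render it locally with the <code>jinja2</code> Python package. This is a minimal sketch that assumes <code>jinja2</code> is installed. The variable values are made up for illustration.</p>

```python
from jinja2 import Template

# The first two lines of the template from the example above.
template_source = (
    'You are {% if expertise_level == "beginner" %}a friendly teacher who explains '
    "concepts in simple terms{% else %}a technical expert providing detailed analysis"
    "{% endif %}.\n\n"
    "Explain {{ topic }} {% if include_examples %}with practical examples{% endif %}."
)
template = Template(template_source)

# Same template, two audiences: only the variable values change.
beginner = template.render(
    expertise_level="beginner", topic="recursion", include_examples=True
)
expert = template.render(
    expertise_level="expert", topic="recursion", include_examples=False
)

# A block guarded by a condition that is never true renders as an empty string.
hidden = Template(
    "{% if False %} {{expertise_level}} {{include_examples}} {% endif %}"
).render()

print(beginner)
print(expert)
```

<p>The beginner rendering uses the teacher persona and asks for practical examples. The expert rendering switches the persona and drops the examples clause.</p>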
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="using-jinja2-prompts">Using Jinja2 Prompts<a href="https://agenta.ai/docs/changelog/jinja2-template-support#using-jinja2-prompts" class="hash-link" aria-label="Direct link to Using Jinja2 Prompts" title="Direct link to Using Jinja2 Prompts" translate="no">​</a></h2>
<p>When you fetch a Jinja2 prompt via the SDK, you get the template format included in the configuration:</p>
<div class="language-json codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-json codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"prompt"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"messages"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"role"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"content"</span><span class="token operator" 
style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"You are {% if expertise_level == \"beginner\" %}a friendly teacher...{% endif %}"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"llm_config"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"model"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"gpt-4"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"temperature"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0.7</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" 
style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"template_format"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"jinja2"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
<p>The <code>template_format</code> field tells Agenta how to process your variables. This works both when invoking prompts through Agenta as an LLM gateway and when fetching prompts programmatically via the SDK.</p>
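<p>If you fetch the configuration and render the prompt yourself, you need to branch on <code>template_format</code>. The sketch below shows one way to do that. The helper <code>render_prompt</code> is hypothetical, and the format name <code>curly</code> for plain <code>{{variable}}</code> substitution is an assumption. Only <code>jinja2</code> appears in the configuration above.</p>

```python
import re


def render_prompt(content: str, template_format: str, variables: dict) -> str:
    """Render one message's content according to the configured template format."""
    if template_format == "jinja2":
        # Imported lazily so that jinja2 is only required for Jinja2 prompts.
        from jinja2 import Template

        return Template(content).render(**variables)

    if template_format == "curly":  # assumed name for plain {{variable}} substitution
        # Replace each {{name}} placeholder with the matching variable value.
        return re.sub(
            r"\{\{\s*(\w+)\s*\}\}",
            lambda match: str(variables[match.group(1)]),
            content,
        )

    raise ValueError(f"unsupported template_format: {template_format!r}")


rendered = render_prompt("Question: {{question}}", "curly", {"question": "Why?"})
```

<p>Raising an error for an unknown format is a deliberate choice here: sending an unrendered template to the model fails silently, which is harder to debug.</p>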
]]></content:encoded>
            <category>v0.62.3</category>
        </item>
        <item>
            <title><![CDATA[Agenta Core is Now Open Source]]></title>
            <link>https://agenta.ai/docs/changelog/open-sourcing-agenta</link>
            <guid>https://agenta.ai/docs/changelog/open-sourcing-agenta</guid>
            <pubDate>Thu, 13 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Agenta's core product is now open source under the MIT license. All functional features including evaluation, prompt management, and observability are available to the community.]]></description>
            <content:encoded><![CDATA[<p>We're open sourcing the core of Agenta under the MIT license. All functional features are now available to the community.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="whats-open-source">What's Open Source<a href="https://agenta.ai/docs/changelog/open-sourcing-agenta#whats-open-source" class="hash-link" aria-label="Direct link to What's Open Source" title="Direct link to What's Open Source" translate="no">​</a></h2>
<p>Every feature you need to build, test, and deploy LLM applications is now open source. This includes the evaluation system, prompt playground and management, observability, and all core workflows.</p>
<p>You can run evaluations using LLM-as-a-Judge, custom code evaluators, or any built-in evaluator. Create and manage test sets. Evaluate end-to-end workflows or specific spans in traces.</p>
<p>Experiment with prompts in the playground. Version and commit changes. Deploy to environments. Fetch configurations programmatically.</p>
<p>Trace your LLM applications with OpenTelemetry support. View detailed execution traces. Monitor costs and performance. Filter and search traces.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="building-in-public-again">Building in Public Again<a href="https://agenta.ai/docs/changelog/open-sourcing-agenta#building-in-public-again" class="hash-link" aria-label="Direct link to Building in Public Again" title="Direct link to Building in Public Again" translate="no">​</a></h2>
<p>We've moved development back to the public repository. You can see what we're building, contribute features, and shape the product direction.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="what-remains-under-commercial-license">What Remains Under Commercial License<a href="https://agenta.ai/docs/changelog/open-sourcing-agenta#what-remains-under-commercial-license" class="hash-link" aria-label="Direct link to What Remains Under Commercial License" title="Direct link to What Remains Under Commercial License" translate="no">​</a></h2>
<p>Only enterprise collaboration features stay under a separate license. This includes role-based access control (RBAC), single sign-on (SSO), and audit logs. These features support teams with specific compliance and security requirements.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="get-started">Get Started<a href="https://agenta.ai/docs/changelog/open-sourcing-agenta#get-started" class="hash-link" aria-label="Direct link to Get Started" title="Direct link to Get Started" translate="no">​</a></h2>
<p>Follow the <a class="" href="https://agenta.ai/docs/self-host/quick-start">self-hosting quick start guide</a> to deploy Agenta on your infrastructure. View the source code and contribute on <a href="https://github.com/Agenta-AI/agenta" target="_blank" rel="noopener noreferrer" class="">GitHub</a>. Read why we made this decision at <a href="https://agenta.ai/blog/commercial-open-source-is-hard-our-journey" target="_blank" rel="noopener noreferrer" class="">agenta.ai/blog/commercial-open-source-is-hard-our-journey</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="what-this-means-for-you">What This Means for You<a href="https://agenta.ai/docs/changelog/open-sourcing-agenta#what-this-means-for-you" class="hash-link" aria-label="Direct link to What This Means for You" title="Direct link to What This Means for You" translate="no">​</a></h2>
<p>You can run Agenta on your infrastructure with full access to evaluation, prompting, and observability features. You can modify the code to fit your needs. You can contribute back to the project.</p>
<p>The MIT license gives you freedom to use, modify, and distribute Agenta. We believe open source creates better products through community collaboration.</p>]]></content:encoded>
            <category>announcement</category>
        </item>
        <item>
            <title><![CDATA[Evaluation SDK]]></title>
            <link>https://agenta.ai/docs/changelog/evaluation-sdk</link>
            <guid>https://agenta.ai/docs/changelog/evaluation-sdk</guid>
            <pubDate>Wed, 12 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Run programmatic evaluations of complex AI agents and workflows from code. Evaluate agents built with any framework with full control over test data and evaluation logic. View results in the Agenta dashboard with traces and comparison views.]]></description>
            <content:encoded><![CDATA[<p>The Evaluation SDK lets you run evaluations programmatically from code. You get full control over test data and evaluation logic. You can evaluate agents built with any framework and view results in the Agenta dashboard.</p>
<div style="display:flex;justify-content:center;margin-top:20px;margin-bottom:20px;flex-direction:column;align-items:center"><iframe width="100%" height="500" src="https://www.youtube.com/embed/1sZASEjvoOA" title="Evaluation SDK - Demonstration" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="why-programmatic-evaluation">Why Programmatic Evaluation?<a href="https://agenta.ai/docs/changelog/evaluation-sdk#why-programmatic-evaluation" class="hash-link" aria-label="Direct link to Why Programmatic Evaluation?" title="Direct link to Why Programmatic Evaluation?" translate="no">​</a></h2>
<p>Complex AI agents need evaluation that goes beyond UI-based testing. The Evaluation SDK provides code-level control over test data and evaluation logic. You can test agents built with any framework, run evaluations in your CI/CD pipeline, and debug complex workflows with full trace visibility.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="key-capabilities">Key Capabilities<a href="https://agenta.ai/docs/changelog/evaluation-sdk#key-capabilities" class="hash-link" aria-label="Direct link to Key Capabilities" title="Direct link to Key Capabilities" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="test-data-management">Test Data Management<a href="https://agenta.ai/docs/changelog/evaluation-sdk#test-data-management" class="hash-link" aria-label="Direct link to Test Data Management" title="Direct link to Test Data Management" translate="no">​</a></h3>
<p>Create test sets directly in your code or fetch existing ones from Agenta. Test sets can include ground truth data for reference-based evaluation or work without it for evaluators that only need the output.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="built-in-evaluators">Built-in Evaluators<a href="https://agenta.ai/docs/changelog/evaluation-sdk#built-in-evaluators" class="hash-link" aria-label="Direct link to Built-in Evaluators" title="Direct link to Built-in Evaluators" translate="no">​</a></h3>
<p>The SDK includes LLM-as-a-Judge, semantic similarity, and regex matching evaluators. You can also write custom Python evaluators for your specific requirements.</p>
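<p>The difference between reference-based and reference-free evaluation can be sketched in plain Python, without the SDK. The test case fields and the function names below are hypothetical. The returned dictionary mirrors the <code>score</code> and <code>success</code> keys used by the evaluator in the Getting Started example.</p>

```python
import re

# Two hypothetical test cases: the first has ground truth, the second does not.
test_cases = [
    {"question": "What is 2 + 2?", "expected": "4"},
    {"question": "Give me a support email address."},
]


def exact_match(outputs: str, expected: str) -> dict:
    """Reference-based check: compare the output against the ground truth."""
    matched = outputs.strip() == expected.strip()
    return {"score": 1.0 if matched else 0.0, "success": matched}


EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def contains_email(outputs: str) -> dict:
    """Reference-free check: only the output is inspected, using a regex."""
    matched = EMAIL_PATTERN.search(outputs) is not None
    return {"score": 1.0 if matched else 0.0, "success": matched}


def evaluate_case(case: dict, outputs: str) -> dict:
    """Use the reference-based check when ground truth exists, else the regex check."""
    if "expected" in case:
        return exact_match(outputs, case["expected"])
    return contains_email(outputs)
```

<p>To use checks like these with the SDK, you would wrap them as evaluators. That wiring is left out of this sketch.</p>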
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="reusable-configurations">Reusable Configurations<a href="https://agenta.ai/docs/changelog/evaluation-sdk#reusable-configurations" class="hash-link" aria-label="Direct link to Reusable Configurations" title="Direct link to Reusable Configurations" translate="no">​</a></h3>
<p>Save evaluator configurations in Agenta to reuse them across runs. Configure an evaluator once, then reference it in multiple evaluations.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="span-level-evaluation">Span-Level Evaluation<a href="https://agenta.ai/docs/changelog/evaluation-sdk#span-level-evaluation" class="hash-link" aria-label="Direct link to Span-Level Evaluation" title="Direct link to Span-Level Evaluation" translate="no">​</a></h3>
<p>Evaluate your agent end to end or test specific spans in the execution trace. Test individual components like retrieval steps or tool calls separately.</p>
<h3 class="anchor anchorTargetStickyNavbar_vHny" id="run-on-your-infrastructure">Run on Your Infrastructure<a href="https://agenta.ai/docs/changelog/evaluation-sdk#run-on-your-infrastructure" class="hash-link" aria-label="Direct link to Run on Your Infrastructure" title="Direct link to Run on Your Infrastructure" translate="no">​</a></h3>
<p>Evaluations run on your infrastructure. Results appear in the Agenta dashboard with full traces and comparison views.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="getting-started">Getting Started<a href="https://agenta.ai/docs/changelog/evaluation-sdk#getting-started" class="hash-link" aria-label="Direct link to Getting Started" title="Direct link to Getting Started" translate="no">​</a></h2>
<p>Install the SDK:</p>
<div class="language-bash codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-bash codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token plain">pip install agenta</span><br></span></code></pre></div></div>
<p>Here's a minimal example evaluating a simple agent:</p>
<div class="language-python codeBlockContainer_PvTT theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_cFN5"><pre tabindex="0" class="prism-code language-python codeBlock_HP0T thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_W7pG"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> agenta </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> ag</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> agenta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">sdk</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">evaluations </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> aevaluate</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Initialize</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">init</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" 
style="color:#999988;font-style:italic"># Define your application</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token decorator annotation punctuation" style="color:#393A34">@ag</span><span class="token decorator annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">application</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">slug</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"my_agent"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">my_agent</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">question</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Your agent logic here</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> answer</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Define an evaluator</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token decorator annotation punctuation" style="color:#393A34">@ag</span><span class="token decorator annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">evaluator</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">slug</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"correctness_check"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">correctness_check</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">expected</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> outputs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"score"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1.0</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> outputs </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> expected </span><span class="token keyword" style="color:#00009f">else</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0.0</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"success"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> outputs </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> expected</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" 
style="color:#999988;font-style:italic"># Create test data</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">testset </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> ag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">testsets</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">acreate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Agent Tests"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    data</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"question"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"What is 2+2?"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"expected"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"4"</span><span class="token 
punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"question"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"What is the capital of France?"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"expected"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Paris"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Run evaluation</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span 
class="token keyword" style="color:#00009f">await</span><span class="token plain"> aevaluate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Agent Correctness Test"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    testsets</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">testset</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    applications</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">my_agent</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    evaluators</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">correctness_check</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"View results: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">result</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'dashboard_url'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="dashboard-integration">Dashboard Integration<a href="https://agenta.ai/docs/changelog/evaluation-sdk#dashboard-integration" class="hash-link" aria-label="Direct link to Dashboard Integration" title="Direct link to Dashboard Integration" translate="no">​</a></h2>
<p>Every evaluation run gets a shareable dashboard link, which the example above prints from <code>result['dashboard_url']</code>. The dashboard shows full execution traces, side-by-side comparisons across versions, aggregated metrics, and the details of each individual test case.</p>
<h2 class="anchor anchorTargetStickyNavbar_vHny" id="next-steps">Next Steps<a href="https://agenta.ai/docs/changelog/evaluation-sdk#next-steps" class="hash-link" aria-label="Direct link to Next Steps" title="Direct link to Next Steps" translate="no">​</a></h2>
<p>Check out the <a class="" href="https://agenta.ai/docs/evaluation/evaluation-from-sdk/quick-start">Quick Start Guide</a> to build your first evaluation.</p>]]></content:encoded>
            <category>v0.62.0</category>
        </item>
    </channel>
</rss>