Tech · 20 min read · 3,923 words

Mosaic AI Agent Framework: The Complete Guide for Building Production AI Agents on Databricks in 2026

Mosaic AI Agent Framework is Databricks' built-in platform for building, evaluating, and deploying enterprise AI agents. This research-backed guide covers every component from Vector Search and MLflow tracing to Agent Bricks, Unity Catalog governance, and MCP integration, with real enterprise examples.

Krunal Kanojiya

May 03, 2026
#mosaic-ai #databricks #ai-agents #agent-framework #rag #mlflow #unity-catalog #llm #agent-bricks #enterprise-ai

Most enterprise AI projects do not fail because the model picked the wrong answer.

They fail because the agent has no governed access to the right data, no evaluation loop before it ships, and no deployment path that the security team will actually approve. Teams spend months stitching together LangChain, Pinecone, a custom API layer, and MLflow, and end up with something that works in a notebook and falls apart the first time a user asks a question it was never tested on.

Mosaic AI Agent Framework is Databricks' answer to that problem. It is not a library you install on top of your existing setup. It is a complete production platform that sits inside Databricks, connecting your enterprise data directly to large language models through a governed, auditable stack.

This guide covers every major component of the framework, how they connect to each other, what changed with Agent Bricks in 2025 and 2026, and how real enterprises have used it to get from prototype to production.

What Mosaic AI Agent Framework actually is

The confusion about Mosaic AI usually starts here. Developers who have used LangChain or LlamaIndex expect an agent framework to be a Python library that handles orchestration, memory, and tool calls. Mosaic AI does those things, but that is only one layer of what it provides.

Mosaic AI Agent Framework is a suite of tools inside Databricks for building, evaluating, and deploying AI agents. It includes:

  • A data layer, through Delta Lake, Unity Catalog, and Vector Search, that gives agents governed access to your enterprise data.
  • A development layer, through MLflow and the Agent Framework SDK, that handles experiment tracking, tracing, and evaluation.
  • A deployment layer, through Model Serving, that runs agents as auto-scaling REST endpoints.
  • A governance layer, through Unity Catalog and AI Gateway, that enforces access controls, rate limits, and audit trails across every model, tool, and API the agent interacts with.

The framework is also compatible with the open-source tools you already use. You can write agent logic with LangChain, LangGraph, or LlamaIndex and still deploy through Mosaic AI. You are not forced to use Databricks' SDK for everything. The platform layer wraps whatever you build.

That compatibility is important. It means you can migrate a prototype built on open-source tools into a production-governed deployment without rewriting the agent logic from scratch.

The component stack, layer by layer

Delta Lake and Unity Catalog: the data foundation

Every Mosaic AI agent connects back to your data. The quality of what the agent retrieves is a direct function of how clean, complete, and well-organized that data is. Delta Lake provides the storage foundation: ACID transactions, schema enforcement, time travel, and change data capture on top of the open Parquet format in your cloud storage.

Unity Catalog sits above Delta Lake and provides centralized governance. Every table, model, tool, prompt, and dataset is registered in Unity Catalog with fine-grained access controls. When an agent calls a tool that queries a table, it inherits the user's permissions from Unity Catalog. The agent can only access what the user who invoked it is authorized to access. This is how enterprise data governance extends into the AI layer without building a separate permissions system.

For ML engineers and data engineers setting up the data layer, the key preparation steps are: structure your source data in Delta Lake tables with clean metadata, ensure column descriptions are meaningful (the LLM uses them for tool selection), and set up Unity Catalog permissions that match your production access policies before building any agent logic.
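As a rough illustration of that preparation, the sketch below adds column and table comments and an access grant from a notebook. The catalog, schema, table, and group names are placeholders, not a prescribed layout; `spark` is the session already available in a Databricks notebook.

```python
# Minimal data-layer preparation sketch. All object names below are placeholders.

# Meaningful column descriptions: the LLM reads these when choosing tools and tables.
spark.sql("""
  ALTER TABLE main.support.tickets
  ALTER COLUMN resolution_notes
  COMMENT 'Free-text notes written by the support engineer when closing the ticket'
""")

spark.sql("""
  COMMENT ON TABLE main.support.tickets IS
  'One row per customer support ticket, updated hourly from the ticketing system'
""")

# Unity Catalog permissions that match production access policy. Agents invoked
# by a user inherit these permissions, so set them before writing agent logic.
spark.sql("GRANT SELECT ON TABLE main.support.tickets TO `support-agent-users`")
```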

Mosaic AI Vector Search

RAG is still the most common production AI pattern for enterprise use cases. A retrieval-augmented generation system retrieves relevant context from your data before the LLM generates a response. The quality of retrieval directly determines the quality of the answer.

Mosaic AI Vector Search is a high-performance vector database with real-time syncing from Delta tables. When your source data updates, the vector index stays current automatically. You do not write sync jobs. You define the sync relationship and the platform manages it.
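Here is a minimal sketch of defining that sync relationship with the databricks-vectorsearch client. Endpoint, index, table, and column names are placeholders, and the embedding endpoint can be any Foundation Model API or custom embedding model.

```python
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

# One-time setup: an endpoint that hosts one or more indexes.
client.create_endpoint(name="docs-endpoint", endpoint_type="STANDARD")

# A Delta Sync index: the platform keeps it in step with the source Delta table.
index = client.create_delta_sync_index(
    endpoint_name="docs-endpoint",
    index_name="main.support.docs_index",        # registered in Unity Catalog
    source_table_name="main.support.docs",       # the Delta table to sync from
    pipeline_type="CONTINUOUS",                  # or "TRIGGERED" for scheduled syncs
    primary_key="doc_id",
    embedding_source_column="chunk_text",        # the platform computes the embeddings
    embedding_model_endpoint_name="databricks-gte-large-en",
)

# Later, from agent code: retrieve context for a question.
results = index.similarity_search(
    query_text="How do I rotate service credentials?",
    columns=["doc_id", "chunk_text"],
    num_results=5,
)
```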

The practical implication is that agents always query fresh data. A traditional vector database setup requires scheduled re-indexing, which means the agent's knowledge can be hours or days stale. Real-time sync removes that lag for most operational use cases.

Vector Search is also exposed as a managed MCP server, which means external agents and tools can query your vector indexes through the MCP standard without custom integration work.

MLflow: tracing, evaluation, and the deployment gate

MLflow is the part of the stack that most enterprise teams underinvest in until they have a production incident.

MLflow Tracing records every step of an agent's reasoning chain: which retrieval was called, what context was returned, which tool was invoked, what the tool returned, and how the model reasoned from context to output. Every intermediate step is logged with inputs, outputs, and metadata.

This matters most when something goes wrong. When an agent gives a wrong answer in production, the trace shows you exactly where the failure happened. Was the retrieval returning the wrong documents? Did a tool call return an unexpected format? Did the model ignore the retrieved context and hallucinate instead? Without tracing, you answer that question by guessing. With tracing, you answer it in minutes.
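A minimal sketch of what that instrumentation looks like, using MLflow's tracing decorator. The retrieval stub here is hypothetical and stands in for a real Vector Search call.

```python
import mlflow

@mlflow.trace(span_type="RETRIEVER")
def retrieve_context(question: str) -> list[str]:
    # Hypothetical stub -- in practice this queries a Vector Search index.
    return ["Credentials are rotated every 90 days via the admin console."]

@mlflow.trace(span_type="AGENT")
def answer(question: str) -> str:
    context = retrieve_context(question)
    # In a real agent, the LLM call happens here with `context` in the prompt.
    return f"Based on {len(context)} retrieved passage(s): rotation happens every 90 days."

# Each call produces a trace with nested spans: inputs, outputs, latency, metadata.
answer("How often are service credentials rotated?")

# Frameworks with autologging support (LangChain, LlamaIndex, OpenAI) can instead
# be traced with a single line, e.g. mlflow.langchain.autolog().
```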

MLflow 3.0, released at Data and AI Summit 2025, extended observability to agents running outside Databricks. If your agent is deployed on AWS, GCP, or on-premise infrastructure, you can still connect it to MLflow for tracing and evaluation. This matters for teams running hybrid environments or incrementally migrating to Databricks.

The other critical function of MLflow in this stack is evaluation as a deployment gate. Before a new version of your agent ships to production, MLflow Deployment Jobs run your evaluation suite and require human approval if quality thresholds are not met. The agent only reaches production when it passes your defined quality criteria. This is the difference between deploying with confidence and deploying with hope.

Agent Evaluation and the CLEARS framework

Building a test set before you deploy is the single most important thing you can do to catch production failures early. Databricks recommends a minimum of 100 question and answer pairs that represent the full range of questions the agent will handle in production. More is better. A financial services team that ran 200 evaluation queries before their audit assistant went live caught 14 answer-quality issues that manual review had missed entirely.

Agent Evaluation uses AI-as-a-Judge scoring to measure output quality. You define evaluation criteria, the LLM scores each agent response against those criteria, and MLflow logs the aggregate results alongside written rationale for each judgment call.
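A minimal sketch of running that evaluation, assuming the agent has already been logged to MLflow. The run ID is a placeholder, and the two records stand in for a full suite of 100 or more.

```python
import mlflow
import pandas as pd

# Two evaluation records as a stand-in; production suites should have 100+.
eval_df = pd.DataFrame([
    {
        "request": "What is the refund window for annual plans?",
        "expected_response": "Annual plans can be refunded within 30 days of purchase.",
    },
    {
        "request": "Who approved invoice INV-2219?",
        "expected_response": "I don't have access to approval records for individual invoices.",
    },
])

# model_type="databricks-agent" runs the built-in AI-as-a-judge scorers
# (correctness, groundedness, relevance, safety) and logs per-row rationales.
results = mlflow.evaluate(
    model="runs:/<run_id>/agent",    # placeholder URI for the logged agent
    data=eval_df,
    model_type="databricks-agent",
)

print(results.metrics)  # aggregate scores logged alongside the MLflow run
```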

The CLEARS framework, introduced with Agent Bricks in 2026, provides a standardized six-dimension quality model: Correctness, Latency, Execution, Adherence, Relevance, and Safety. CLEARS gives teams a common vocabulary for defining what "good enough for production" means, which is harder than it sounds when different stakeholders have different definitions of quality.

Agent Evaluation also includes a review application: a browser-based UI that lets non-technical stakeholders review agent responses, submit feedback, and flag quality issues. The feedback is stored as Delta tables in Unity Catalog and feeds directly into the evaluation dataset for the next training iteration. This human feedback loop is how the agent gets better over time in a structured, auditable way.

Model Serving: from evaluation to endpoint

Once your agent passes evaluation, Mosaic AI Model Serving deploys it as an auto-scaling REST endpoint. The deployment handles token streaming for responsive user interfaces, request and response logging for monitoring, and automatic scaling under load. You do not manage the serving infrastructure.

The same Model Serving layer handles classical ML models, fine-tuned LLMs, and agent chains through a single unified deployment workflow. If your application uses a mix of a predictive model for risk scoring, a fine-tuned LLM for text generation, and an agent for question answering, all three deploy through the same system with the same governance and monitoring.
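A minimal sketch of that final step with the databricks-agents helper; the Unity Catalog model name and run ID are placeholders.

```python
import mlflow
from databricks import agents

# Register the evaluated agent version in Unity Catalog.
mlflow.set_registry_uri("databricks-uc")
registered = mlflow.register_model(
    model_uri="runs:/<run_id>/agent",          # placeholder run ID
    name="main.support.ticket_assistant",      # Unity Catalog model name (placeholder)
)

# Deploy as an auto-scaling Model Serving endpoint with request/response logging.
deployment = agents.deploy(
    model_name="main.support.ticket_assistant",
    model_version=registered.version,
)

print(deployment.query_endpoint)  # the REST endpoint your application calls
```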

AI Gateway: the control plane for all AI access

AI Gateway provides a unified entry point for all AI services in your organization. It governs access to foundation models from OpenAI, Anthropic, Google, and Meta Llama through a single layer. Rate limits, usage logging, fallback routing, PII detection, safety guardrails, and prompt injection protection all run at the gateway level.

The practical result is that your organization does not need a separate governance policy for every model provider you use. AI Gateway enforces a consistent policy regardless of which underlying model the agent calls. When a model provider has an outage, the gateway automatically routes to a fallback provider without any changes to agent code.
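From the agent's side this is invisible: application code targets one endpoint, and the gateway applies policy behind it. A minimal sketch using the OpenAI-compatible client against a Databricks serving endpoint; the workspace host and endpoint name are placeholders.

```python
import os
from openai import OpenAI

# The agent calls one governed endpoint; rate limits, guardrails, usage logging,
# and fallback routing are enforced by AI Gateway, not in this code.
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<workspace-hostname>/serving-endpoints",  # placeholder host
)

response = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",   # placeholder: any gateway-governed endpoint
    messages=[{"role": "user", "content": "Summarize yesterday's open incidents."}],
)
print(response.choices[0].message.content)
```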

AI Gateway also governs MCP-connected tools, which is a meaningful expansion of scope. Every external API your agent can call, every Databricks service it can query, and every third-party integration it uses is governed and auditable through the same control plane.

Agent Bricks: the 2025 layer that changes the build experience

Agent Bricks is the most significant product addition to the Mosaic AI stack in the last year. It sits on top of the framework described above and changes the experience of building agents from code-first to task-first.

What Agent Bricks does differently

With traditional agent development, you define your tools, write retrieval logic, configure the LLM, tune prompts, build your evaluation set, run evaluation, iterate on failures, and eventually deploy. Each step requires engineering work. The total time from idea to production-grade quality is typically weeks to months.

Agent Bricks compresses that cycle. You describe the agent's task in plain language, connect your enterprise data, and the platform generates domain-specific synthetic training data and task-aware benchmarks automatically. It then uses those benchmarks to optimize the agent for quality and cost, selecting the right model and configuration for your specific task without manual trial and error.

Databricks CEO Ali Ghodsi described Agent Bricks as "a new way of building and deploying AI agents that can reason on your data." The target outcome is getting from idea to production-grade quality with confidence in the quality and cost tradeoffs, rather than guessing.

The managed agents inside Agent Bricks

Agent Bricks ships four managed agent types for the most common enterprise use cases.

Supervisor Agent orchestrates multiple agents and tools into a single workflow. You define the task and connect your systems. The Supervisor coordinates execution across models, Genie spaces, Vector Search indexes, and external tools. BASF Coatings used a Supervisor Agent architecture for their Marketmind sales intelligence system, which serves structured data through Genie and unstructured data through Vector Search across more than 1,000 sales representatives worldwide.

Knowledge Assistant automatically ingests enterprise documents and makes them accessible to any agent. It handles retrieval that incorporates system context, metadata, and user constraints, turning document collections into queryable knowledge without requiring custom pipeline code.

Document Intelligence extracts and structures data from unstructured documents like contracts, invoices, and reports. It converts PDFs into queryable knowledge stored in Unity Catalog, without custom parsing pipelines for each document type.

Custom Agents on Apps lets teams build and deploy agent applications using any model or framework, with full lifecycle support on serverless compute and native integration with Lakebase for memory and conversation history.

Genie Agent Mode

Genie is Databricks' conversational interface for querying structured data using natural language. Agent Mode in Genie Spaces extends single-turn question answering into multi-step reasoning, allowing agents to plan, explore, and answer complex business questions that require multiple data queries and reasoning steps.

When a business user asks "What drove the increase in churn last quarter compared to Q3 of last year, and which customer segments were most affected?", a single SQL query cannot answer that. Agent Mode in Genie can break the question down, run multiple queries, synthesize the results, and return a structured answer, all governed by Unity Catalog permissions.

Genie Conversation APIs make Genie accessible inside custom agent workflows, so developers can embed structured data access into larger multi-agent systems without replicating the governance layer Genie already enforces.
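A rough sketch of that embedding, assuming the databricks-sdk Genie client; the space ID is a placeholder, and the exact response shape may differ by release.

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # host and credentials come from the environment or a config profile

# Ask a Genie Space a question from inside an agent workflow. The space ID is a
# placeholder; access follows the calling user's Unity Catalog permissions.
message = w.genie.start_conversation_and_wait(
    space_id="<genie-space-id>",
    content="Which customer segments drove the increase in churn last quarter?",
)

# Attachments carry the generated SQL and query results for the agent to synthesize.
for attachment in message.attachments or []:
    print(attachment)
```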

MCP integration: how Mosaic AI connects to the world

The Model Context Protocol is the standard for connecting AI agents to external tools and data sources. Databricks launched managed MCP servers at Data and AI Summit 2025 and has since expanded coverage significantly.

The current managed MCP servers inside Databricks give agents governed access to:

  • Unity Catalog Functions for custom Python or SQL execution.
  • Mosaic AI Vector Search for unstructured data retrieval.
  • Genie Spaces for conversational structured data access.
  • DBSQL for running SQL queries directly against Unity Catalog tables.
  • The Online Feature Store for sub-25 millisecond feature lookups at high query volume.

All of these are built with enterprise security from the start. Managed MCP servers automatically respect a user's Unity Catalog permissions. The agent cannot access data the user is not authorized to see, regardless of how it is asked. Every tool call is logged in Unity Catalog for audit.
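Here is a rough sketch of connecting to one of those managed servers with the open-source MCP Python client. The URL pattern follows the managed MCP server documentation but should be treated as an assumption, as should the tool argument name; authentication in practice goes through OAuth or a workspace token.

```python
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

WORKSPACE = "https://<workspace-hostname>"                        # placeholder
MCP_URL = f"{WORKSPACE}/api/2.0/mcp/vector-search/main/support"   # assumed URL pattern
TOKEN = "<databricks-token>"                                      # placeholder credential

async def query_managed_mcp():
    async with streamablehttp_client(
        MCP_URL, headers={"Authorization": f"Bearer {TOKEN}"}
    ) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()   # only the tools the caller may use
            # Assumed argument name for illustration; real schemas come from list_tools().
            return await session.call_tool(
                tools.tools[0].name, {"query": "credential rotation policy"}
            )

result = asyncio.run(query_managed_mcp())
print(result)
```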

External services connect through Managed OAuth MCP Connectors. GitHub, Atlassian, and Glean are supported currently. Credentials are managed centrally in Unity Catalog. The agent connects to external services without ever seeing the underlying secrets.

The practical result is that your governance policy extends naturally from internal data to external integrations. You define permissions once in Unity Catalog, and those permissions apply whether the agent is querying your Delta tables, calling a Vector Search index, or making an API call to an external service.

The framework compatibility story

One of the less obvious design decisions in Mosaic AI Agent Framework is how deliberately it preserves compatibility with open-source tools.

You can build agent logic with LangChain, LangGraph, or LlamaIndex and log it to MLflow using standard APIs. The log_model and mlflow.evaluate APIs work with any agent implementation, not just agents written using Databricks' SDK. Once logged to MLflow, the agent can be registered in Unity Catalog and deployed to Model Serving like any other Mosaic AI agent.

This matters for teams that already have significant LangChain codebases and cannot rewrite everything. You can migrate incrementally: keep the agent logic in LangChain, add MLflow tracing, connect to Vector Search for retrieval, register in Unity Catalog, and deploy through Model Serving. Each step adds value without requiring a full rewrite.

The Databricks documentation describes this as "use any framework, deploy through the platform." The evaluation tooling, governance layer, and deployment infrastructure work regardless of how the agent logic was written.
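As a minimal sketch of that path, here is a LangChain chain logged and registered through standard MLflow APIs; the prompt, model endpoint, and Unity Catalog names are placeholders.

```python
import mlflow
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from databricks_langchain import ChatDatabricks

# Agent logic written with LangChain -- LangGraph or LlamaIndex would work the same way.
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the provided context."),
    ("user", "{question}"),
])
chain = (
    prompt
    | ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct")  # placeholder endpoint
    | StrOutputParser()
)

# Standard MLflow APIs: log the agent, then register it in Unity Catalog for deployment.
with mlflow.start_run():
    model_info = mlflow.langchain.log_model(chain, artifact_path="agent")

mlflow.set_registry_uri("databricks-uc")
mlflow.register_model(model_info.model_uri, name="main.support.langchain_agent")
```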

Real enterprise deployments

The clearest evidence that a platform works is what real companies have built with it, not what the vendor claims it can do.

Corning's materials science research team built an AI research assistant using Mosaic AI Agent Framework that indexes hundreds of thousands of documents including US Patent Office data. Their researchers use it to find and build on prior work. The key requirement was high accuracy on scientific and patent literature, which required careful evaluation before deployment. Corning used the Agent Evaluation framework to measure retrieval quality against their specific document types before going live.

Block built an agent system using Mosaic AI to automate operations for their sellers, including generating customized menus. The automation uses agent tools to retrieve seller-specific data and generate personalized outputs at scale.

Intercontinental Exchange (ICE) built an agent system that uses their unique financial data to provide accurate answers to customer questions. Financial data governance was the primary constraint. Unity Catalog's row-level security and audit trail addressed the compliance requirements that would have blocked deployment in a less governed environment.

Burberry's analytics team used Mosaic AI to rapidly experiment with augmented LLMs while keeping private data within their control. The MLflow and Model Serving integration let their ML engineering team move from proof of concept to production with minimal additional complexity.

Ford Direct built a unified chatbot for Ford and Lincoln dealerships that integrates proprietary data and documentation using RAG. Dealers use it to assess performance, inventory, trends, and customer engagement metrics through natural language.

BASF Coatings' Marketmind project used a Supervisor Agent architecture to give more than 1,000 sales representatives personalized notifications and suggested actions based on real-time market events. The system combines structured data from Genie and unstructured data from Vector Search indexes. After a five to six week proof of concept and a one-month pilot with 25 key users, they rolled out to North America. Sales representatives now receive context-aware recommendations without searching through scattered notes and folders.

Mosaic AI vs LangChain: where each one wins

This is the practical question most developers ask when evaluating the framework.

LangChain is a flexible open-source library for building agent logic. It is excellent for prototyping because it has a large community, many pre-built integrations, and works with any infrastructure. If you need to experiment quickly with a new agent pattern or test a new model, LangChain is the fastest path.

The limitations of LangChain show up at production scale. Evaluation tooling is fragmented and requires assembling separate tools. Governance requires building a custom permissions layer. Deployment means setting up and managing your own serving infrastructure. Monitoring requires integrating another observability tool. Each of these is solvable, but each adds engineering work and organizational approval overhead.

Mosaic AI Agent Framework wins when your data is already in Databricks, you need governance your compliance team will accept, and you want a production-grade deployment without building the operations layer yourself. The evaluation tooling is built in. The governance layer is built in. The deployment infrastructure is managed. The monitoring is automatic.

The two approaches are also compatible. Many teams start with LangChain and migrate to Mosaic AI once the proof of concept is validated. That migration is manageable because the Agent Framework accepts LangChain-built agents through standard MLflow APIs.

A pattern that works well for teams on tight timelines: build and iterate agent logic with LangChain in a Databricks notebook, add MLflow tracing from the start, connect to Mosaic AI Vector Search for retrieval, and deploy through Model Serving when quality is validated. You keep the LangChain flexibility during development and get the platform governance at deployment time.

Building your first agent: the sequence that works

The most common mistake teams make is jumping into agent code before their data is ready. The data layer determines everything downstream.

Step 1: Get your source data right. Structure your source documents in Delta Lake tables with clean, complete metadata. Write meaningful descriptions for every column. The LLM uses column descriptions to select the right tool when deciding how to answer a query. Poor descriptions mean the agent picks the wrong data source.

Step 2: Create your Vector Search index. Connect it to your Delta table with real-time sync enabled. Choose your embedding model from Databricks Foundation Model APIs or bring your own. Let the sync run and verify that the index returns relevant results for representative queries before you build any agent logic.

Step 3: Register your tools in Unity Catalog. Write Unity Catalog functions for deterministic lookups (order status, account details, inventory levels). Write Vector Search queries as Unity Catalog functions that wrap the retrieval logic. Every tool the agent calls should be registered in Unity Catalog, not hardcoded in the agent script.
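A sketch of that registration for a deterministic lookup; the function, table, and column names are placeholders, and `spark` is the notebook session.

```python
# Register a deterministic lookup as a Unity Catalog function (placeholder names).
# Once registered, it is governed, audited, and callable as an agent tool.
spark.sql("""
CREATE OR REPLACE FUNCTION main.support.lookup_order_status(order_id STRING)
RETURNS STRING
COMMENT 'Returns the current fulfillment status for a given order ID'
RETURN (
  SELECT o.status
  FROM main.sales.orders AS o
  WHERE o.order_id = lookup_order_status.order_id
  LIMIT 1
)
""")

# Expose the function to an agent as a tool (databricks-langchain toolkit shown here).
from databricks_langchain import UCFunctionToolkit
tools = UCFunctionToolkit(function_names=["main.support.lookup_order_status"]).tools
```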

Step 4: Build your agent logic. Use the Mosaic AI Agent Framework SDK, LangChain, LangGraph, or LlamaIndex. Use the AI Playground in Databricks to prototype and test tool selection behavior before writing code. Log the agent to MLflow with log_model.

Step 5: Build your evaluation set. Create at least 100 question and answer pairs that cover the full range of queries your agent will handle. Include edge cases and questions where the right answer is "I don't know" or "I need more information." Run Agent Evaluation and review the traces for every failure. Iterate on retrieval logic, prompts, and tool definitions until quality scores hit your target.

Step 6: Register in Unity Catalog and deploy. Register the evaluated agent version in Unity Catalog. Deploy to a Model Serving endpoint with auto-scaling. Enable monitoring for latency, error rates, and quality drift. Connect your application via the REST API.

The temptation to skip step 5 is strong. The demo worked. The stakeholder is excited. Shipping feels more important than testing. Every well-documented enterprise deployment of Mosaic AI agents ran serious evaluation before going live. The ones that did not are the projects you never hear about because they quietly got shut down after production failures.

What changed in 2025 and 2026

The framework has shipped significant new capabilities in the last 12 months. Here is a summary of what is available now that was not there a year ago.

Agent Bricks launched at Data and AI Summit 2025, introducing task-first agent building with automatic synthetic data generation and benchmark optimization. This is generally available now.

MLflow 3.0 launched at the same Summit with a redesigned GenAI-native architecture, cross-platform observability for agents outside Databricks, prompt versioning, and Deployment Jobs for quality-gated releases.

Managed MCP servers for Unity Catalog Functions, Genie, Vector Search, and DBSQL launched with enterprise governance built in.

Serverless GPUs became available in beta, removing the need to provision and manage GPU infrastructure for fine-tuning and inference workloads.

The CLEARS evaluation framework standardized how teams measure agent quality across Correctness, Latency, Execution, Adherence, Relevance, and Safety.

Agent Mode in Genie Spaces expanded single-turn data queries into multi-step reasoning and planning.

Lakebase integration gave agents persistent memory, conversation history, and long-running workflow state stored in the lakehouse.

Conclusion

Mosaic AI Agent Framework is the most complete enterprise agent platform available in 2026 for teams whose data lives in Databricks. It is not the fastest path from zero to a working prototype. LangChain is faster for that. It is the most direct path from a validated prototype to a production deployment that security, compliance, and operations will approve and support.

The framework's strength is the integration density. Every component, from Vector Search to MLflow tracing to Unity Catalog governance to AI Gateway to Model Serving, is designed to work together. You do not assemble a production-grade evaluation and governance system from separate tools. You build the agent, and the platform provides the rest.

The pattern that works consistently across the enterprise deployments documented here is the same: clean data in Delta Lake first, Vector Search for retrieval, Unity Catalog tools for deterministic lookups, MLflow tracing from day one, serious evaluation before production, and governance enforced through Unity Catalog and AI Gateway. Teams that follow this sequence ship reliable agents. Teams that skip steps two through five ship demos that become production incidents.

If your data is already in Databricks and you are building AI agents that real employees or customers will depend on, Mosaic AI Agent Framework is the platform to build on.

Reference links

  • Databricks Mosaic AI product page
  • Databricks: Build and deploy production-quality AI agent systems blog
  • Databricks: Announcing Mosaic AI Agent Framework and Agent Evaluation
  • Databricks: Agent Bricks launch press release
  • Databricks: Agent Bricks governed enterprise agent platform blog
  • Databricks: MLflow 3.0 announcement
  • Databricks: Mosaic AI Data and AI Summit 2025 announcements
  • Databricks: Managed MCP servers with Unity Catalog blog
  • Databricks: MCP and Agent Bricks accelerate AI development blog
  • Databricks: Agent governance capabilities blog
  • Databricks: Supervisor agent architecture blog with BASF Coatings case study
  • Databricks: Build an autonomous AI assistant tutorial
  • Databricks MLflow documentation
  • Databricks Mosaic AI GenAI capabilities documentation
  • Lucent Innovation: What Is the Mosaic AI Agent Framework on Databricks
