May 14th, 2026

Considering NL2SQL? Should your database really be the prompt? How can SQL MCP Server help?


You’ve probably experienced both of these, perhaps at the same time. First, that desire to let an agent get at your data. It’s driven by simplification and better experiences for the user and for you: fewer screens, fewer queries, fewer reports, and less code overall.

Second, and perhaps more importantly, that unrelenting reluctance and reticence against all of it. The voice in your head that makes you seriously uneasy. Your reputation is at stake, yes, but to your enterprise, this might be existential.

It has a name: NL2SQL.

NL2SQL, or Natural Language to SQL, is where an AI agent turns a natural language prompt into a SQL query. Ask a question, run a query, get an answer. Simple.

But models can’t do this from a prompt alone; they need context. This means you include your schema, relationships, table names, column names, and maybe sample rows. From there, the model attempts to infer the query from your database.
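For illustration, here’s a minimal sketch of the kind of context that typically rides along with the prompt. The table, columns, and question are hypothetical:

-- Hypothetical schema context pasted into the prompt
CREATE TABLE dbo.Orders (
    OrderId INT PRIMARY KEY,
    CustId  INT NOT NULL,       -- reference to dbo.Cust
    OrdDt   DATETIME2 NOT NULL, -- order date
    StatCd  CHAR(2) NOT NULL    -- status code: 'OP', 'CL', 'CX'
);

-- Optional sample rows
-- OrderId | CustId | OrdDt      | StatCd
-- 1001    | 42     | 2025-11-03 | OP

-- User question: "How many open orders did we take last quarter?"
-- The model must guess that StatCd = 'OP' means open and that OrdDt is the order date.

Anything not spelled out in that context has to be inferred.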

But is NL2SQL a good idea?

Perspectives

In this article, we’ll walk through most of the real-world considerations every enterprise will need to weigh when implementing an agentic solution against production data. There’s always more to consider, but this is a solid baseline for a team starting to lean into this new interaction model.

Schema completeness

Schemas are a reflection of your company’s culture. Here’s what I mean. Through evolution or simple business change, a database schema reveals your business model (at least as it stood when the schema was created), the various applications built on top of it, the systems integrated through mergers or upgrades, and the drift of best practices and team changes over time.

However, most schemas are not designed to explain. Cryptic table names and inconsistent column names are not wrong, just characteristics. Nor is it a problem that technically valid relationships can be semantically invalid, or can require additional, unpredictable predicates to be accurate. It’s also okay for column values to be sensitive or bespoke. Schemas store data and protect integrity; they don’t communicate business rules.

For this reason, your prompt could be asking a model to infer things that are almost impossible to infer. And model behavior compounds the problem: models are not going to give up on your request. Typical model behavior is to try, query, and respond with a “best effort” approach to problem solving. Consider the risk of making business decisions from that.

About ontologies

There is renewed interest in data ontology. Much like previous efforts around data dictionaries, ontologies attempt to explain the data and the relationships between the data. This herculean task has an important goal in mind: explaining the data to models in a way that makes inference more successful. Time will tell if system support, enterprise adoption, and model behavior prove this renewed interest in the data graph worthwhile.

Ontologies describe the logical data, not how to query it.

What I mean is that ontologies, like data dictionaries, help explain the data without necessarily explaining the intricacies of JOIN semantics and filter requirements. For example, an ontology tells me a Salesman is in a Region, but it doesn’t tell me to filter the linking table based on the most recent row-version first to ensure imports from outside systems are properly reflected.
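Here’s a sketch of that gap, using hypothetical Salesman, Region, and SalesmanRegion tables, where RowVersion is an incrementing version number stamped by the import. The first query is the join the ontology implies; the second is the join the business actually needs:

-- What the ontology implies: a Salesman is in a Region
SELECT s.SalesmanName, r.RegionName
FROM dbo.Salesman AS s
JOIN dbo.SalesmanRegion AS sr ON sr.SalesmanId = s.SalesmanId
JOIN dbo.Region AS r ON r.RegionId = sr.RegionId;

-- What the business needs: only the most recent row-version,
-- so imports from outside systems are properly reflected
SELECT s.SalesmanName, r.RegionName
FROM dbo.Salesman AS s
JOIN dbo.SalesmanRegion AS sr
  ON sr.SalesmanId = s.SalesmanId
 AND sr.RowVersion = (SELECT MAX(sr2.RowVersion)
                      FROM dbo.SalesmanRegion AS sr2
                      WHERE sr2.SalesmanId = sr.SalesmanId)
JOIN dbo.Region AS r ON r.RegionId = sr.RegionId;

Both queries are valid. Only one of them is right, and nothing in the ontology tells the model which.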

Ontologies have real value. But they aren’t the solution to NL2SQL.

Never repeat yourself?

When you add 1 and 1, you get 2 in every circumstance. When you test if 1 is equal to 1, you get True in every circumstance. These are deterministic behaviors, and they make SQL queries powerful tools to excavate data and answer questions.

But model behavior isn’t deterministic. It doesn’t return the same result every time. When you ask a model a question, it will generate a query. But when you ask the same model the same question about the same database, the next query can be different.
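As an illustration, assume a hypothetical Orders and Region schema and the question, “What were first-quarter sales by region?” Either of these could plausibly come back:

-- Variant A: revenue by order date
SELECT r.RegionName, SUM(o.TotalAmount) AS Sales
FROM dbo.Orders AS o
JOIN dbo.Region AS r ON r.RegionId = o.RegionId
WHERE o.OrderDate >= '2026-01-01' AND o.OrderDate < '2026-04-01'
GROUP BY r.RegionName;

-- Variant B: revenue by ship date, excluding cancellations
SELECT r.RegionName, SUM(o.TotalAmount) AS Sales
FROM dbo.Orders AS o
JOIN dbo.Region AS r ON r.RegionId = o.RegionId
WHERE o.ShipDate >= '2026-01-01' AND o.ShipDate < '2026-04-01'
  AND o.Status <> 'Cancelled'
GROUP BY r.RegionName;

Both are syntactically valid. They answer different business questions.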

In fact, we should expect this behavior. Considering that a model is an inference engine calculating probabilities at every turn, variability in similarity ranking alone can drive output down vastly different paths. This is indeterminism, and we want to solve it.

Solving indeterminism

The solution is simple: review every query.

It’s okay to allow models to generate a query. All we need to do is intercede before execution with some type of review process. After the user approves the query, the pipeline continues and the results can be returned.

But is that realistic? This assumes users are experts in both the data model and the query language. And should we allow the user to edit the query? How can we defend systems against this new form of SQL injection?

Perhaps worse, user review introduces a fresh vector for errors as users overwhelmed by query complexity and cognitive overload mistakenly approve invalid queries. A litany of biases emerges as the business starts to rely on those errors instead of rooting them out.

And is that practical? Agentic solutions should reduce cognitive load with less intervention and more agentic autonomy. Asking a user to review every query undermines the value of agentic systems and the investments being made into them.

Reviewing queries is an unrealistic and impractical solution to indeterminism.

Speaking of SQL injection

In app development, SQL injection is a classic bug. It happens when untrusted input is allowed to change the meaning of a query. Developers solved this with parameterization, validation, least privilege, and patterns that keep user input separate from executable SQL.
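As a refresher, here’s a minimal sketch of that classic fix in T-SQL; the table and input are hypothetical:

-- Untrusted input concatenated into SQL: the classic injection bug
-- "SELECT * FROM dbo.Users WHERE UserName = '" + userInput + "'"

-- The classic fix: input is bound as a parameter and can never
-- change the meaning of the statement
DECLARE @userInput NVARCHAR(100) = N'Robert''; DROP TABLE dbo.Users; --';

EXEC sp_executesql
    N'SELECT UserId, UserName FROM dbo.Users WHERE UserName = @name',
    N'@name NVARCHAR(100)',
    @name = @userInput;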

Models are not just filling in parameters. Models are creating the SQL query. That means the prompt, schema context, conversation history, and retrieved data can all influence what is executed. Moreover, so can the trained behavior of the model.

NL2SQL is the SQL injection of the agentic age.

Listen, we aren’t talking about big headline problems: some agent deletes the database, or some agent drops a table. We can solve most of that with thoughtful security policies. Instead, we are concerned about the subtle errors that allow queries to return conclusions that are wrong but presented with confidence.

This can put any business at risk: asking questions and getting incorrect results. I think dropping a table would be better; at least then we could see the problem. Incorrect results don’t send us looking for database backups. In fact, they don’t trigger any action. They are inaccurate values labeled as accurate ones.

Business decisions from invalid results can have significant consequences.

Perf problems

There are operational concerns, too. NL2SQL doesn’t just risk wrong answers; it can generate expensive queries. A syntactically valid query can scan too much data, join through the wrong path, ignore indexes, or create load patterns the system was never designed to handle.

Consider the covering index. This common practice aligns the scope of an index with the predicates in a recurring query. For example, if a system always filters by date, region, and owner, a covering index can intentionally include date, region, and owner in its scope. The result? Extremely fast queries.
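Here’s a sketch of the idea with hypothetical names; the INCLUDE clause lets the index alone satisfy the recurring query, so the engine never touches the base table:

-- Recurring query shape: always filters by date, region, and owner,
-- returning order id and amount:
--   SELECT OrderId, TotalAmount
--   FROM dbo.Orders
--   WHERE OrderDate >= @start AND OrderDate < @end
--     AND RegionId = @region AND OwnerId = @owner;

-- A covering index aligned with exactly that shape
CREATE NONCLUSTERED INDEX IX_Orders_Date_Region_Owner
    ON dbo.Orders (OrderDate, RegionId, OwnerId)
    INCLUDE (OrderId, TotalAmount);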

But how can this work when every query is different?

Remember, a production database is typically a shared system. A bad query is not just a bad answer. It can become a reliability issue across the whole database. One filter too few, one join too many, or one misplaced aggregate can impact database compute, memory, or even system reliability.

Irrelevant inferences

There is also the real problem of leakage through inference. Even when the model cannot directly query the data, restricted column names, schema names, table names, error messages, and partial results can reveal the underlying structure, context, and backend system.

The issue isn’t, “Can the agent read the column?” It’s, “What can the agent infer from the shape of our system?” Table and column names, relationships, and conventions can reveal more than intended. Based on these, for example, the model can infer your organization uses Dynamics, Salesforce, Workday, or SAP.

Why does it matter? Because this type of inference can change model behavior, shape its assumptions, or disclose unexpected details you never meant to share. It’s a key driver behind systems like Data API builder (DAB) that use abstractions: intermediary schemas that can occlude details unrelated to the immediate business scenario.

Trouble testing

Testing is harder, too. A fixed API contract can be tested. A generated query is harder to certify because the query is created at runtime. You can test examples, but you can’t exhaustively test every query the model might generate.

Custody concerns

And finally, there is ownership. When a generated query is wrong, who owns the failure? The model? The developer? The DBA? The user who invoked it? When production systems need accountable boundaries, runtime-generated SQL makes those harder to see.

The right place for NL2SQL

Don’t read this wrong: NL2SQL isn’t useless. In fact, it can be incredibly valuable outside agentic systems that operate against production data to make business decisions.

The best use case is the developer’s inner loop. This loop describes the developer’s cycle of writing, testing, observing the result, and making the next change. For a database, the developer loop is when the schema is created, tables are altered, relationships are mapped, queries are tested, and assumptions are validated. NL2SQL can greatly accelerate the work.

Development isn’t production

Remember, during the developer loop, the developer is still in control. There is no production data, there is no production system, and the decisions being made are procedural, architectural, and structural. It’s development.

There’s another use case: information discovery. The goal of information discovery is exploration, not operation. It’s production data, or at least a subset of it, where the user wants to find patterns, compare loosely related data, or ask broad questions that don’t fit a predefined CRUD operation.

Information discovery is the most open-ended scenario for an agent, like letting a hunting dog out into a field to sniff out birds. The difference is that what the agent finds is just an indicator, a pointer to places in your data that warrant further investigation. Information discovery is the Shangri-La of agentic data work: pounding against the data to find something new.

Even in information discovery, however, it can be critical to give guidance to the model beyond the schema itself. You want reliable and valid information to be discovered, something you can drive with operational constraints inside specific guardrails. This distinction matters. NL2SQL may help you explore data, even discover new insights. However, this should not be conflated with safe architecture for agents operating against business systems and production data stores.

Agentic success

Introducing SQL MCP Server, an agentic interface built for safe production data interactions. With guardrails, contracts, and semantic enhancements, SQL MCP Server gives enterprises a safe approach to expose production data to agents.

Feature 1: abstraction

SQL MCP Server doesn’t require you to expose the database as-is. It exposes entities that you configure: abstractions of tables, views, and stored procedures on a curated agentic surface, with names, permissions, fields, and operations controlled by configuration.

That abstraction is important. The database remains optimized for storage, performance, and even the application, while agents get a surface designed for safe understanding.

Example configuration setup:

dab add SeriesActors \
  --source dbo.SeriesActors \
  --rest.path "/cast" 

Feature 2: description

Agents need more than names. A table name might be accurate and still not explain what the data means. SQL MCP Server supports semantic descriptions for the server itself, configured entities, columns, and parameters so models have guidance beyond the raw schema. These contribute to agentic understanding of intent, not just structure.

Example configuration setup:

dab add SeriesActors \
  --source dbo.SeriesActors \
  --rest.path "/cast" \
  --description "Actors by Star Trek series. Use this instead of joining Actor, Character, and Series tables."

Feature 3: authorization

SQL MCP Server is role aware. The agent doesn’t discover every table, field, or operation. It only sees what the current role is allowed to use. This is enabled without requiring any security or structural changes to your database.

This matters because production data access is never just a query problem. It’s a permissions problem. A safe agentic interface must enforce who can read, create, update, delete, or execute before the model ever gets a chance to act.

Example configuration setup:

dab add SeriesActors \
  --source dbo.SeriesActors \
  --permissions "authenticated:create,read" \
  --rest.path "/cast" \
  --description "Actors by Star Trek series. Use this instead of joining Actor, Character, and Series tables."

Feature 4: API contract

SQL MCP Server gives agents a contract instead of a blank query window. The agent can discover what it’s allowed to do, which entities are available, which fields exist, and which operations are permitted.

That contract is stable, testable, and enforceable.

Example MCP request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "describe_entities",
    "arguments": {}
  }
}

Feature 5: query builder

SQL MCP Server doesn’t ask the model to invent SQL. The model supplies intent through a constrained tool call, and the engine builds the query deterministically.

That means the generated SQL follows the configured contract every time. The agent can still answer useful questions, but the database is no longer the prompt.

Example inbound request:

{
  "entity": "SeriesActors",
  "select": "Actor,Series",
  "filter": "Series eq 'Star Trek: Voyager'",
  "orderby": [ "Actor asc" ],
  "first": 5
}

Example backend query:

SELECT TOP (5) [Actor], [Series]
FROM [dbo].[SeriesActors]
WHERE [Series] = @param0
ORDER BY [Actor] ASC;

Feature 6: observability

Production systems need logs, traces, health checks, and operational visibility. If agents are acting against data, teams need to know what they called, when they called it, and whether it succeeded.

Observability helps teams inspect agent behavior, troubleshoot failures, and build confidence that production data access is operating as intended.

Where does this leave us?

NL2SQL is tempting because it makes a hard problem look simple. But production data is not a blank canvas, and a database schema is not a safe prompt. The closer agents get to business systems, the more the architecture matters.

That doesn’t mean NL2SQL is bad. It means we need to be precise about where it belongs. Use it in the developer loop. Use it for exploration. Use it where the cost of being wrong is manageable and the human remains in control.

But when agents operate against production data, especially when they read, create, update, delete, or execute, they need more than generated SQL. They need abstraction, authorization, contracts, descriptions, deterministic query generation, and observability.

That’s the point of SQL MCP Server. It gives agents a way to work with data without making the database the prompt.

Author

Jerry Nixon
Principal Program Manager

SQL Server Developer Experience Program Manager for Data API builder.
