Back to Insights
Technical

Legacy Retrofitting vs AI-Native Architecture: When to Patch and When to Rebuild

13 min read
RETROFITLLMtext-to-SQL!opaque rows, no meaningAI-NATIVELLMGraphRAGentities, relationships, meaningPatch the schema, or give the model a map of the business?
A highly technical comparison of two paths to enterprise AI: bolting an LLM onto your existing relational databases, or building a semantic knowledge graph layer designed for machines to reason over. We explain exactly why text-to-SQL and naive RAG fail in production, how a knowledge graph and GraphRAG change the game, and a system-by-system framework for deciding when to patch and when to rebuild.

Key Takeaways

  • Most enterprise AI failures are not model failures, they are data architecture failures. An LLM is only as capable as the structure of the data it can actually reach and reason over.
  • A relational schema is optimised to record transactions efficiently, not to express meaning. It encodes how to store a sale, not what a high-value at-risk customer is, so the model has nothing meaningful to read.
  • Text-to-SQL demos beautifully and breaks at the join: ambiguous column names, undocumented foreign keys, and business logic that lives in application code rather than the database produce confidently wrong answers.
  • RAG over your documents inherits the same flaw. Vector similarity retrieves passages that sound related but has no model of how entities actually connect, so multi-hop questions fall apart.
  • A semantic knowledge graph layer sits above your systems of record and makes entities, relationships and an ontology explicit, turning opaque rows into a queryable map of your business.
  • GraphRAG, retrieval that traverses a knowledge graph instead of pulling loose text chunks, is what closes the gap, because it returns connected facts rather than similar-sounding ones.
  • The decision is rarely all-or-nothing. A semantic layer wrapped around legacy systems, the strangler fig pattern, usually beats both a naive retrofit and a high-risk big-bang rebuild.
Every enterprise AI project starts with the same instruction from the board: add AI. The fastest-looking route is always the same too. Bolt a large language model onto the systems you already have, point it at your database, and let it answer questions in plain English. The demo works. The pilot impresses the steering committee. Then it meets real data at real scale and quietly falls apart. Understanding precisely why is the difference between an AI initiative that compounds and one that becomes an expensive cautionary tale.
This is the central architecture decision of the next five years, and it is not really about models. It is about the substrate underneath them. The question is whether you retrofit AI onto a legacy data estate built for a pre-AI world, or whether you build, or migrate toward, an architecture designed for machines to reason over. Patch or rebuild. Get this wrong and no amount of prompt engineering or model upgrades will save the project.
This article is a technical comparison, written for engineering leaders and the executives who fund them. We will look at exactly why layering an LLM on top of an old relational database fails, what a semantic knowledge graph layer does differently, and how to decide, system by system, when patching is the right call and when it is throwing good money after bad.

AI Strategy Is Now a Data Architecture Decision

The frontier model is rapidly becoming a commodity. Anyone can call a state-of-the-art LLM through an API for a few pence per request. That means the model is not your competitive advantage and, increasingly, not your problem. The differentiator is what the model can see and reason over: the structure, accessibility and meaning of your data.
An LLM with no reliable access to your business data is a very articulate graduate who has read the entire public internet but none of your files, contracts, or systems. It sounds confident and knows nothing specific to you. The entire engineering challenge of enterprise AI is closing that gap, and the gap is an architecture problem, not a model problem. This is the difference we have written about before between an AI-enhanced bolt-on and a genuinely AI-native build.

What Legacy Retrofitting Actually Looks Like

Legacy retrofitting means keeping your existing relational databases, ERPs and CRMs exactly as they are, and adding an LLM access layer on top. Nothing underneath changes. The appeal is obvious: it is cheap, it is fast, it requires no migration, and it puts no system of record at risk. For a CTO under pressure to ship something this quarter, it is the path of least resistance.
In practice, retrofitting shows up as one of two dominant patterns, often both at once.
  • Text-to-SQL. The LLM is handed your database schema and asked to translate natural-language questions into SQL queries, run them, and summarise the result. "How much did we sell in the North region last quarter" becomes a generated SELECT statement.
  • RAG over documents. You export your documents, wiki pages and knowledge base, embed them into a vector database, retrieve the chunks most similar to a question, and stuff them into the model's prompt as context. RAG stands for retrieval-augmented generation, the standard technique introduced in the 2020 RAG paper by Lewis et al.
Both patterns share a hidden assumption: that the meaning your AI needs is already present in the data, waiting to be read. That assumption is where everything breaks.

Why Layering an LLM on a Legacy Database Fails

The schema encodes transactions, not meaning

A relational schema was normalised decades of best practice ago to do one thing well: record events efficiently and avoid duplicating data. It has tables like orders, line_items, customers and addresses, linked by keys. It tells a machine how to store a transaction. It says almost nothing about what a high-value at-risk customer is, because that concept lives in an analyst's head and a dashboard somewhere, not in any column. The LLM, meanwhile, sees a table named t_cust_mst and a flag where status_cd equals 7, and has no idea that this means a churned enterprise account.

Text-to-SQL breaks at the join

A model can write SELECT star FROM customers in its sleep. It struggles the moment a real business question requires joining six tables through foreign keys nobody documented, where the same concept, say revenue, is computed three different ways in three different systems, and where half the business logic lives in application code or stored procedures rather than in the schema at all. The model cannot see logic it was never shown.
The result is the most dangerous failure mode in all of enterprise AI: a query that runs successfully and returns a confidently wrong number. An outright error gets caught. A plausible but incorrect figure gets pasted into a board deck. The retrofit does not fail loudly; it fails silently, which is far worse.

The meaning lives in tribal knowledge

In most legacy estates, the real semantics are undocumented. Which status code means churned. Why two customer tables exist and which one is authoritative. Which of the four amount fields is the one finance actually trusts. Why a particular join silently drops refunded orders. None of that is written in the database for an LLM to read, because it was never written down anywhere. Retrofitting assumes the meaning is in the data. In a legacy system, it usually is not.
A diagram of the retrofit stack: an LLM connected via text-to-SQL to a pile of flat relational tables, with red markers at the join, the hidden business logic, and the undocumented schema
The retrofit stack fails at three predictable points: the join, the hidden logic, and the undocumented meaning.

RAG over documents inherits the same flaw

The common response is to reach for RAG over your unstructured content. This genuinely helps with policy documents and knowledge bases, but it does not solve the underlying problem, because vector similarity only finds passages that sound related to the question. It has no model of how facts actually connect.
Ask which suppliers are exposed if a particular factory closes, and a similarity search dutifully returns documents that mention factories and suppliers. It cannot traverse the actual factory-to-supplier-to-contract relationships, because it never represented them in the first place. It retrieves things that are textually similar, not things that are causally or structurally connected. We unpack this limitation in depth in our guide to advanced RAG, long context windows and GraphRAG.

The Semantic Knowledge Graph Layer Changes the Game

From rows to relationships

A knowledge graph represents data as entities, the nodes, and relationships, the edges, each with explicit and typed meaning. Instead of a customers table and an orders table joined by an opaque key, you have a Customer node connected to an Order node by a PLACED relationship, that Order connected to a Product by CONTAINS, and that Product connected to a Supplier by SUPPLIED_BY. The connections are first-class, named, and directly queryable. The structure itself carries meaning that a flat table never could.

The ontology is the missing meaning

On top of the graph sits an ontology: a formal, machine-readable definition of the entity types, relationship types and rules in your domain. This is where high-value at-risk customer finally becomes something a machine can reason about, defined explicitly in terms of relationships and properties rather than buried in a BI tool or an analyst's saved query. The ontology is your institutional knowledge, the tribal knowledge from the previous section, made explicit. It is the single most valuable artefact in an AI-native architecture, and it is a business exercise as much as a technical one.

GraphRAG: retrieval that follows connections

GraphRAG is RAG where the retrieval step traverses a knowledge graph instead of pulling loose text chunks. Microsoft Research and others have shown it substantially outperforms naive vector RAG on questions that require connecting multiple facts. Instead of returning a bag of similar-sounding paragraphs, it retrieves a connected subgraph: the relevant entity, its neighbours, and the relationships between them. The LLM then reasons over structured, relevant, connected context.
This is the shift that changes everything. The LLM stops being a query-writer guessing at an opaque schema and becomes a reasoning engine operating over an explicit map of your business. The graph handles the connections and the meaning. The model handles language and inference. Each part does what it is genuinely good at, instead of asking the model to reverse-engineer thirty years of undocumented database decisions on the fly.
A comparison of vector RAG returning three disconnected similar-looking text chunks versus GraphRAG returning a connected subgraph of entities and typed relationships
Vector RAG retrieves things that sound similar. GraphRAG retrieves things that are actually connected.

Legacy Retrofitting vs AI-Native Architecture: The Comparison

Laid side by side, the two approaches differ on almost every dimension that matters in production.
DimensionLegacy Retrofit (LLM on relational DB)AI-Native (semantic graph layer)
Data modelFlat tables, opaque keysTyped entities and relationships
Where meaning livesIn people's heads and BI toolsIn an explicit, machine-readable ontology
Retrieval methodGenerated SQL or vector similarityGraphRAG over a connected subgraph
Multi-hop questionsFails or returns wrong answersHandled natively by traversal
Typical failure modeSilent, confidently wrongExplicit gaps in the model, easier to spot
Cost to startLowModerate (the ontology is real work)
Cost to scale and trustHigh and risingAmortises across use cases
Best forShallow, single-table questionsConnected, high-stakes reasoning

When to Patch and When to Rebuild

None of this means a knowledge graph is always the answer. Architecture is about tradeoffs, and a semantic layer is real work that you should not undertake for a problem that does not need it. The honest answer is that it depends on the question you are trying to answer, and it varies system by system.

Patch (retrofit) when

  • The questions are shallow and mostly single-table, such as "what was revenue last month".
  • The data is already clean, well-documented, and limited in scope.
  • You need a proof of value this quarter and your tolerance for the occasional wrong answer is high.
  • The system is genuinely simple, and the cost of a mistake is low.

Rebuild, or add a semantic layer, when

  • Questions require connecting data across multiple systems, the multi-hop case where retrofits collapse.
  • The meaning is undocumented and lives in the heads of a few key people.
  • Wrong answers are expensive, as in finance, healthcare, compliance, or supply chain.
  • You are building many AI use cases on the same data, so a semantic layer amortises across all of them instead of being rebuilt each time.

The strangler fig: the usual right answer

In reality you rarely rip out a system of record, and you should be suspicious of anyone who suggests a big-bang rebuild. The pragmatic path is the strangler fig pattern, Martin Fowler's term for incrementally wrapping a legacy system with a new one until the old one can be quietly retired. You wrap your existing databases with a semantic graph layer that syncs from them and grows over time, taking on more of the meaning while the legacy systems keep running underneath as sources of truth. You neither freeze on a broken retrofit nor bet the company on a multi-year migration.

A Practical Migration Path

If the semantic layer is the destination, here is the route that consistently works in practice. Note that the first step is not technical.
  1. Map the questions, not the tables. Start from the high-value questions the business actually needs answered, especially the multi-hop ones the current stack cannot handle. The questions define the scope; the schema does not.
  2. Model the entities and write the ontology. Define the entities, relationships and rules those questions require. This is the core work, and it is where tribal knowledge becomes an explicit, durable asset.
  3. Build the graph layer over your systems of record. The graph syncs from the legacy databases rather than replacing them, so nothing of record is put at risk.
  4. Put GraphRAG between the LLM and the graph. Retrieval traverses the graph and returns connected subgraphs, giving the model structured context instead of guessed SQL.
  5. Expand use case by use case. Let the semantic layer strangle the old access patterns gradually, proving value at each step rather than asking for a blank cheque up front.
This is exactly the sequencing we use when building AI agents over an existing stack, and it sits at the heart of any credible roadmap for transforming a legacy business rather than just decorating it with a chatbot.

Conclusion: The Model Is Not Your Moat

The uncomfortable truth for most AI programmes is that the model is neither your competitive advantage nor your real problem. Your data architecture is both. A naive retrofit does not fail because the LLM is weak; it fails because the model is being asked to reason over a structure that never encoded meaning in the first place. You can swap in a smarter model every six months and still get confidently wrong answers, because the gap is structural.
A semantic knowledge graph layer gives the model an explicit map of your business, and GraphRAG lets it follow the connections that actually matter. The decision is not ideological and not all-or-nothing. Patch the systems that are shallow and well understood. Add a semantic layer where the meaning is connected, undocumented, and expensive to get wrong. If you are about to point an LLM at thirty years of legacy schema and hope, that is precisely the moment to stop and decide, deliberately, whether you are patching or rebuilding.

Frequently Asked Questions

Can I just connect an LLM like ChatGPT to my SQL database?
You can, and it will look impressive in a demo. In production it fails on real questions, because the model has to write SQL against a schema it does not understand, joining tables through undocumented keys where the same concept is calculated differently in different systems. The queries often run and return numbers that are quietly wrong, which is more dangerous than an outright error. Connecting the LLM is the easy 20 percent; making it reason correctly over your data is the hard 80 percent.
What is a semantic knowledge graph in plain terms?
It is a way of storing data as entities and the named relationships between them, rather than as isolated rows in tables. Instead of a customers table and an orders table joined by an opaque key, you have a Customer node connected to an Order by a PLACED relationship, and that Order connected to a Product by CONTAINS. On top sits an ontology, a formal definition of your entity and relationship types, which is where institutional meaning finally becomes machine-readable.
What is GraphRAG and how is it different from normal RAG?
Standard RAG embeds your documents into a vector database and retrieves the chunks most similar to the question. GraphRAG instead retrieves from a knowledge graph, pulling back a connected subgraph: the relevant entity, its neighbours, and the relationships between them. The difference matters most on multi-hop questions that require connecting several facts, where vector similarity returns related-sounding text but cannot traverse the actual connections.
Do I have to rip out and rebuild my database to go AI-native?
Usually not. The pragmatic path is the strangler fig pattern: you wrap your existing systems of record with a semantic graph layer that syncs from them and grows over time, rather than replacing them in one risky migration. The legacy databases keep running underneath as sources of truth while the graph layer takes on more of the meaning and querying. You rebuild the access and reasoning layer, not necessarily the storage.
Why does text-to-SQL work in a demo but fail in production?
Demos use clean, small, well-named schemas and simple single-table questions. Production estates have cryptic column names, multiple tables for the same concept, foreign keys nobody documented, and critical business logic buried in application code or stored procedures. The model cannot see that hidden meaning, so as soon as a question needs a real multi-table join it generates plausible SQL that returns the wrong result.
Is a knowledge graph the same as a vector database?
No, and they solve different problems. A vector database stores numerical embeddings and finds items by similarity, which is good for fuzzy text search. A knowledge graph stores explicit, typed relationships and is good for traversing connections and answering structured, multi-hop questions. Mature AI-native architectures often use both: vectors for similarity, the graph for meaning and connections, with GraphRAG combining them.
How long does it take to add a semantic layer to a legacy system?
A focused first use case, modelling the entities and relationships behind a handful of high-value questions and wiring GraphRAG over them, is typically a matter of weeks rather than months. The work is less about technology and more about making tribal knowledge explicit in the ontology. Because the strangler fig approach is incremental, you get value from the first use case and expand the graph one domain at a time rather than waiting for a multi-year rebuild.