Before building your own systems, it’s important to understand how AI works and its limits.
Module 1 covers:
How AI Works
A basic overview of how AI works, as well as in what capacities AI works well.
How AI Falters
An explanation of how the mechanisms discussed in (1.1) can and do cause AI to falter, and a brief discussion of what specific ways it does falter.
1.1 How AI Works and in What Capacities AI Works Well
How AI Predicts
You already understand the basic mechanism behind AI language models. You use a version of it every day in text message autocomplete.
When you type a text message, your phone suggests the next word. It does this by looking at what you have typed so far and predicting what is most likely to come next based on patterns it has learned. If you type “I’ll meet you at,” your phone might suggest “the,” “noon,” or “home.” But the phone doesn’t know your plans. Instead, it is recognizing patterns.
Large language models (LLMs) like ChatGPT and Claude do a similar thing at a vastly larger scale. They use tokens, “basic building blocks of text,” that include words, parts of words, and punctuation. LLMs predict the next “token” in a sequence, one token at a time, based on everything that came before it. The entire output—whether it is a paragraph, a memorandum, or a contract clause—is generated this way. Each token is chosen as the statistically most likely continuation given the preceding text.
The difference between your phone keyboard and ChatGPT is the amount of text the system was trained on and the underlying model’s sophistication. As Apple reports, your phone predicts the next word from “your past conversations, writing style, and even websites you visit in Safari.” More advanced language models are trained on billions of documents—books, articles, websites, legal opinions, contracts, legislation—and build a statistical representation of how language works across all these domains.
For example, Meta states Llama 3 was “pretrained on over 15T tokens” from publicly available sources, and that the data included over thirty languages and four times more code than Llama 2. The result is a system that produces remarkably fluent text.
Here are some of the datasets that models like Llama use:
Using “Word Math” to Guide Language
Understanding word embeddings helps explain why AI can produce fluent text. Language models represent each word as a long list of numbers, called a vector. These numbers position the word in a high-dimensional “meaning space” where words with similar meanings end up close together.
Think of it like GPS coordinates. Washington, D.C., and New York City have similar latitude and longitude values because they are geographically close. London’s values are very different as it is far away. Word vectors work the same way, but in hundreds of dimensions instead of two. “Contract” and “agreement” sit close together in this space. “Contract” and “elephant” are far apart. A good way to understand this is by playing the game Semantle, which asks you to guess a word, giving you hints as to how similar your guesses are to the actual word in the vector space. The game is built using technology similar to LLMs.

Researchers discovered that you can do arithmetic in this space. Take the vector for “king,” subtract “man,” add “woman,” and the result points to “queen.” This is how AI handles synonyms, analogies, and related concepts—it has learned where words sit relative to each other in meaning space.
Legal language is full of near-synonyms and relational concepts. “Breach,” “violation,” and “noncompliance” cluster together. “Plaintiff” and “defendant” sit at a consistent distance from each other, similar to how “king” and “queen” do. The model internalizes these relationships from millions of sources. This is why it can generate text that sounds like it was written by a lawyer.
Tracking Meaning Through “Attention”
The architecture that powers modern language models is called a transformer. Before transformers, AI processed text one word at a time, left to right, and quickly lost track of what came earlier in the document. Transformers solved this with a mechanism called “attention”: the ability to focus on the most relevant parts of a text, regardless of their position.
Consider the sentence: “John wants his bank to cash the check.” The word “bank” could mean a financial institution or a riverbank. The word “his” could refer to anyone. Attention allows the model to look at the surrounding words—“cash” and “check”—to determine “bank” means financial institution. It can connect “his” back to “John”, even if other words intervene.
Attention Mechanism
Click any underlined word to see what the model attends to
- cash41%
- check36%
- John8%
- his5%
Legal documents are often long, and the meaning often depends on provisions that appear on earlier pages. Transformer technology can identify that a “notwithstanding” clause on page 12 modifies the general rule on page 3. This is a major reason legal AI tools are now practical.
The Training Stack
LLMs are built in three stages to help shape their behavior.
Pre-training
Learning the patterns of language
Different AI labs make different choices at each training stage. This is why ChatGPT, Claude, and Gemini feel different from one another even though they share similar architectures.
1.2 How and in What Situations AI Falters
Pattern Prediction Isn’t ‘Understanding’
One important concept: AI does not “think,” nor does it “understand.” It does not know things the way you know things.
When you read a contract, you understand what it means. For example, you may reason about the parties’ intentions, identify ambiguities, and apply legal principles to reach conclusions. When an AI processes that same contract, it constructs a statistical model based on large amounts of text and predicts what words are likely to follow other words in that context.
When given a contract with a “liquidated damages” clause, an AI model does not know that the clause “caps” liability. It has learned a statistical association between those two concepts. The system has no goal beyond generating text that resembles what a human would write. Each word is selected based on probability, regardless of truth.
While useful, this mechanism also makes AI unreliable. Because the system predicts what is likely to come next, it has no internal check for truth.
The Confidence Gap
AI also tends to present its outputs with certainty, meaning that a correct legal analysis and a fabricated one look identical. The system generates confident-sounding text because that is what appears in its training data. Whether the substance is right or wrong is not factored into the prediction.
This is a structural feature of how the technology works—the model is designed to produce text that looks like it was written by a human. When asked to provide evidence for a false claim, the model, rather than checking itself, generates more plausible-sounding text to support the original claim. You can imagine how errors in an AI chatbot’s output can compound when it gives you incorrect information, and you continue to query it. The chatbot keeps confidently building on its initial mistake.
Right, Wrong, and Made Up: How does AI Hallucinate?
The main issue with AI in legal practice is hallucinations. One study, for example, found several ways that AI hallucinates when it comes to finding case law:
Output Types
The researchers manually coded every AI response on two dimensions. They first judged whether the substance of the answer was factually right; then, if the answer was right, whether the citation the tool gave actually supported the claim. This process produced six distinct labels.
Outcome Types
The six output labels combine into three overall outcomes. The study used these outcomes to judge each response. They are what the researchers report when comparing how often each tool gives useful, trustworthy answers versus hallucinated or incomplete ones.
The Jagged Frontier
Another limitation is that AI capabilities do not perform uniformly across tasks. A model might draft a sophisticated contract provision flawlessly, and then fail on a straightforward factual question. The boundary between what AI can and cannot do is irregular and unpredictable. Researchers call this the “jagged technological frontier.”
This means you cannot generalize success from one task to another. Just because AI handled your last research query well does not mean it will handle the next one the same way. Each task requires its own verification.
Context Limits
AI also has certain limitations in accessing information.
AI does not have access to information you have not given it. Nor does it know your client’s facts, your firm’s prior work product, or the judge’s preferences, unless you provide that context.
AI cannot access documents behind paywalls, internal databases, or recent filings unless the tool has been specifically configured to retrieve them.
The model’s training data has a cutoff date. Anything that happened after that date does not exist in the model’s knowledge.
Most AI tools can now augment their context by searching the web (web search) or by pulling external documents into the model’s context (retrieval-augmented generation (RAG)).
These features help, but also introduce their own limitations. A retrieval system can miss relevant documents, return outdated ones, or surface results the model misinterprets. Module 5 covers how this technology works in more detail.