
An online professional reference on the use of generative artificial intelligence in legal practice.

Module 01

How AI Works

Models, products, and what each actually does.

Before building your own systems, it’s important to understand how AI works and where its limits lie.

Module 1 covers:

1.1 How AI Works

A basic overview of how AI works and of the tasks it handles well.

1.2 How AI Falters

An explanation of how the mechanisms discussed in 1.1 can and do cause AI to falter, and a brief discussion of the specific ways it falters.

1.1 How AI Works and in What Capacities AI Works Well

How AI Predicts

You already understand the basic mechanism behind AI language models. You use a version of it every day in text message autocomplete.

When you type a text message, your phone suggests the next word. It does this by looking at what you have typed so far and predicting what is most likely to come next based on patterns it has learned. If you type “I’ll meet you at,” your phone might suggest “the,” “noon,” or “home.” But the phone doesn’t know your plans. Instead, it is recognizing patterns.

Large language models (LLMs) like ChatGPT and Claude do a similar thing at a vastly larger scale. They work with tokens, the “basic building blocks of text,” which include words, parts of words, and punctuation. An LLM predicts the next token in a sequence, one token at a time, based on everything that came before it. The entire output, whether it is a paragraph, a memorandum, or a contract clause, is generated this way. Each token is chosen from the statistically likely continuations of the preceding text.
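The autocomplete idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the tiny corpus is invented, and the program counts whole words, whereas a real LLM uses a neural network over tokens and far more context than the single preceding word.

```python
from collections import Counter, defaultdict

# Toy "training data", invented for illustration.
corpus = (
    "i will meet you at noon . "
    "i will meet you at home . "
    "i will meet you at the office . "
    "i will call you at noon ."
).split()

# Count which word follows each word in the corpus.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return (candidate, probability) pairs for the word after `word`,
    most frequent first. Unknown words yield an empty list."""
    counts = following[word]
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common()]

suggestions = predict_next("at")
print(suggestions)
```

The program has no idea where you are actually going. It suggests “noon” after “at” only because that pairing appeared most often in the text it counted.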


The difference between your phone keyboard and ChatGPT is the amount of text the system was trained on and the underlying model’s sophistication. As Apple reports, your phone predicts the next word from “your past conversations, writing style, and even websites you visit in Safari.” More advanced language models are trained on billions of documents—books, articles, websites, legal opinions, contracts, legislation—and build a statistical representation of how language works across all these domains.

For example, Meta states Llama 3 was “pretrained on over 15T tokens” from publicly available sources, and that the data included over thirty languages and four times more code than Llama 2. The result is a system that produces remarkably fluent text.

Here are some of the kinds of sources that models like Llama are trained on:

Wikipedia: Articles on people, places, events, and concepts across many languages.
Project Gutenberg: Public-domain books, including classic literature and non-fiction.
arXiv: Open-access scientific and technical research papers.
Online news sites: Publicly available news and feature articles.
Technical documentation: Official docs and guides for programming languages, libraries, and platforms.
Open-source code repositories: Public source code, READMEs, and project documentation.
Stack Overflow: Technical questions paired with accepted answers.
General public websites: Public web pages, tutorials, and posts on a wide range of topics.

Using “Word Math” to Guide Language

Understanding word embeddings helps explain why AI can produce fluent text. Language models represent each word as a long list of numbers, called a vector. These numbers position the word in a high-dimensional “meaning space” where words with similar meanings end up close together.

Think of it like GPS coordinates. Washington, D.C., and New York City have similar latitude and longitude values because they are geographically close. London’s values are very different because it is far away. Word vectors work the same way, but in hundreds of dimensions instead of two. “Contract” and “agreement” sit close together in this space. “Contract” and “elephant” are far apart. A good way to build intuition is the game Semantle, which asks you to guess a secret word and tells you how close each guess is to that word in the vector space. The game is built on word embeddings, the same kind of representation that LLMs use.

[Figure: Semantle game example showing word-similarity guessing]
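The idea of words sitting “close together” can be made concrete with cosine similarity, a common way to compare two vectors. The sketch below uses three hand-made, three-dimensional vectors that are invented for illustration. Real embeddings are learned from text and have hundreds of dimensions or more.

```python
import math

# Hand-made vectors, invented for illustration only.
vectors = {
    "contract":  [0.90, 0.80, 0.10],
    "agreement": [0.85, 0.75, 0.20],
    "elephant":  [0.10, 0.20, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 means they point
    in almost the same direction, near 0 means they are unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

close = cosine_similarity(vectors["contract"], vectors["agreement"])
far = cosine_similarity(vectors["contract"], vectors["elephant"])
print(round(close, 3), round(far, 3))
```

With these made-up numbers, “contract” scores much higher against “agreement” than against “elephant,” which is the kind of signal Semantle reports back for each guess.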

Researchers discovered that you can do arithmetic in this space. Take the vector for “king,” subtract “man,” add “woman,” and the result points to “queen.” This is how AI handles synonyms, analogies, and related concepts—it has learned where words sit relative to each other in meaning space.
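The “king minus man plus woman” arithmetic can be tried on a toy scale. The two-dimensional vectors below are invented for illustration, with the first number loosely standing for “royalty” and the second for “gender.” Learned embeddings are not this tidy, and the result only approximately lands on “queen.”

```python
# Hand-made 2-dimensional vectors, invented for illustration only.
vectors = {
    "king":   [0.9, 0.9],
    "queen":  [0.9, 0.1],
    "man":    [0.1, 0.9],
    "woman":  [0.1, 0.1],
    "prince": [0.7, 0.9],
    "girl":   [0.0, 0.1],
}

def nearest(target, exclude):
    """Return the vocabulary word whose vector is closest to `target`
    (by squared Euclidean distance), skipping the words in `exclude`."""
    def distance(word):
        return sum((a - b) ** 2 for a, b in zip(vectors[word], target))
    candidates = [w for w in vectors if w not in exclude]
    return min(candidates, key=distance)

# king - man + woman, computed coordinate by coordinate.
target = [
    k - m + w
    for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])
]
result = nearest(target, exclude={"king", "man", "woman"})
print(result)
```

The input words are excluded from the search, which is the usual convention when running this kind of analogy test.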

Legal language is full of near-synonyms and relational concepts. “Breach,” “violation,” and “noncompliance” cluster together. “Plaintiff” and “defendant” sit at a consistent distance from each other, similar to how “king” and “queen” do. The model internalizes these relationships from millions of sources. This is why it can generate text that sounds like it was written by a lawyer.

Tracking Meaning Through “Attention”

The architecture that powers modern language models is called a transformer. Before transformers, language models generally processed text one word at a time, left to right, and tended to lose track of what came earlier in the document. Transformers solved this with a mechanism called “attention”: the ability to focus on the most relevant parts of a text, regardless of their position.

Consider the sentence: “John wants his bank to cash the check.” The word “bank” could mean a financial institution or a riverbank. The word “his” could refer to anyone. Attention allows the model to look at the surrounding words, “cash” and “check,” to determine that “bank” means a financial institution. It can also connect “his” back to “John,” even if other words intervene.

Attention mechanism example: which sense of “bank”?

cash: 41%
check: 36%
John: 8%
his: 5%

Attention lets the transformer connect tokens across the sentence to resolve meaning.
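Attention weights like these come from turning raw relevance scores into percentages that sum to 100%, usually with a function called softmax. The sketch below shows only that step. The scores are hypothetical numbers invented for illustration and are not meant to reproduce the percentages above. A real transformer computes its scores from learned vectors for each token.

```python
import math

# Hypothetical relevance scores between "bank" and the other words in
# "John wants his bank to cash the check." Invented for illustration.
scores = {
    "John": 0.5, "wants": 0.2, "his": 0.1, "to": -0.5,
    "cash": 2.1, "the": -0.5, "check": 2.0,
}

def softmax(values):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(values)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

weights = dict(zip(scores, softmax(list(scores.values()))))

# Print the words from most attended to least attended.
for word, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{word:>6}: {weight:.0%}")
```

Because softmax exaggerates differences between scores, the two financial words end up with most of the weight, and filler words such as “the” and “to” receive very little.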

Legal documents are often long, and their meaning frequently depends on provisions that appear pages apart. A transformer-based model can identify that a “notwithstanding” clause on page 12 modifies the general rule on page 3. This is a major reason legal AI tools are now practical.

The Training Stack

LLMs are built in three stages to help shape their behavior.

Stage 1: Pre-training

Learning the patterns of language

Pre-training is where the model learns language. It reads an enormous amount of text and learns the statistical relationships among words, sentences, and ideas—all by predicting the next word trillions of times, across billions of documents. This is where the model's general knowledge comes from. A model pre-trained on legal opinions, medical literature, and software documentation will generate text in all three domains.
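What “learning by predicting the next word” means can be sketched with the score commonly used for this kind of training, cross-entropy loss. The module does not name a specific loss, so treat this as an assumption for illustration. The probabilities below are also invented.

```python
import math

# Hypothetical model probabilities for the word that follows
# "the parties hereby". Invented for illustration.
predicted = {"agree": 0.70, "waive": 0.20, "elephant": 0.10}

def next_token_loss(probabilities, actual_next_word):
    """Cross-entropy loss for one prediction: small when the model gave
    the actual next word a high probability, large when it gave it a
    low probability."""
    return -math.log(probabilities[actual_next_word])

loss_good = next_token_loss(predicted, "agree")     # model expected this word
loss_bad = next_token_loss(predicted, "elephant")   # model did not expect this word
print(round(loss_good, 3), round(loss_bad, 3))
```

During pre-training, the model’s internal numbers are adjusted a little after each prediction so that this loss shrinks over time. Note that the loss rewards matching the training text, not stating something true.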

Different AI labs make different choices at each training stage. This is why ChatGPT, Claude, and Gemini feel different from one another even though they share similar architectures.

1.2 How and in What Situations AI Falters

Pattern Prediction Isn’t ‘Understanding’

One important concept: AI does not “think,” nor does it “understand.” It does not know things the way you know things.

When you read a contract, you understand what it means. For example, you may reason about the parties’ intentions, identify ambiguities, and apply legal principles to reach conclusions. When an AI processes that same contract, it applies a statistical model built from large amounts of text to predict which words are likely to follow other words in that context.

When given a contract with a “liquidated damages” clause, an AI model does not know that the clause “caps” liability. It has learned a statistical association between those two concepts. The system has no goal beyond generating text that resembles what a human would write. Each word is selected based on probability, regardless of truth.

While useful, this mechanism also makes AI unreliable. Because the system predicts what is likely to come next, it has no internal check for truth.

The Confidence Gap

AI also tends to present its outputs with certainty, meaning that a correct legal analysis and a fabricated one look identical. The system generates confident-sounding text because that is what appears in its training data. Whether the substance is right or wrong is not factored into the prediction.

This is a structural feature of how the technology works—the model is designed to produce text that looks like it was written by a human. When asked to provide evidence for a false claim, the model, rather than checking itself, generates more plausible-sounding text to support the original claim. You can imagine how errors in an AI chatbot’s output can compound when it gives you incorrect information, and you continue to query it. The chatbot keeps confidently building on its initial mistake.

Right, Wrong, and Made Up: How Does AI Hallucinate?

The main issue with AI in legal practice is hallucination. One study, for example, identified several ways that AI hallucinates when asked to find case law:

Output Types

The researchers manually coded every AI response on two dimensions. They first judged whether the substance of the answer was factually right; then, if the answer was right, whether the citation the tool gave actually supported the claim. This process produced six distinct labels.

1. Correct: Factually right and actually answers the question asked
2. Incorrect: Contains factually wrong information
3. Refusal: The tool declines to answer or gives a reply that does not address the question
4. Grounded: The cited source is real AND actually supports the claim
5. Misgrounded: The cited source is real, but the response misinterprets it
6. Ungrounded: No citation is provided to back up the claim

Outcome Types

The six output labels combine into three overall outcomes. The study used these outcomes to judge each response. They are what the researchers report when comparing how often each tool gives useful, trustworthy answers versus hallucinated or incomplete ones.

1. Accurate: The answer is factually right AND backed by a real source that actually supports it
2. Hallucination: The answer is factually wrong, OR the citation does not support the claim
3. Incomplete: The response fails to give a usable, verifiable answer. The tool either refuses or provides no citation

G1. Fabricated case (no such decision exists). Weight: 15
G2. Real case attributed to wrong holding, circuit, or jurisdiction. Weight: 5
G3. Quote or holding attributed to the wrong case. Weight: 2
G4. Pin-cite error or minor typo. Weight: 1
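The way the six output labels combine into the three outcomes can be written as a short decision function. This is a sketch of the rules as described in this module, not the researchers’ own code. The function name and the lowercase label strings are hypothetical.

```python
from typing import Optional

def classify_outcome(correctness: str, grounding: Optional[str] = None) -> str:
    """Combine output labels into an overall outcome.

    correctness: "correct", "incorrect", or "refusal"
    grounding:   "grounded", "misgrounded", or "ungrounded"
                 (only consulted when the answer is correct)
    """
    if correctness == "refusal":
        return "incomplete"          # the tool gave no usable answer
    if correctness == "incorrect":
        return "hallucination"       # factually wrong
    if correctness == "correct":
        if grounding == "grounded":
            return "accurate"        # right, and the source supports it
        if grounding == "misgrounded":
            return "hallucination"   # right, but the citation does not support it
        if grounding == "ungrounded":
            return "incomplete"      # right, but nothing to verify it against
    raise ValueError(f"unexpected labels: {correctness!r}, {grounding!r}")

examples = [
    classify_outcome("correct", "grounded"),
    classify_outcome("correct", "misgrounded"),
    classify_outcome("correct", "ungrounded"),
    classify_outcome("incorrect"),
    classify_outcome("refusal"),
]
print(examples)
```

Notice that a factually correct answer still counts as a hallucination when its citation does not support it, and only one of the five paths ends in “accurate.”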

The Jagged Frontier

Another limitation is that AI does not perform uniformly across tasks. A model might draft a sophisticated contract provision flawlessly, and then fail on a straightforward factual question. The boundary between what AI can and cannot do is irregular and unpredictable. Researchers call this the “jagged technological frontier.”

This means you cannot generalize success from one task to another. Just because AI handled your last research query well does not mean it will handle the next one the same way. Each task requires its own verification.

Context Limits

AI is also limited in what information it can access.

AI does not have access to information you have not given it. Nor does it know your client’s facts, your firm’s prior work product, or the judge’s preferences, unless you provide that context.

AI cannot access documents behind paywalls, internal databases, or recent filings unless the tool has been specifically configured to retrieve them.

The model’s training data has a cutoff date. Anything that happened after that date does not exist in the model’s knowledge.

Many AI tools can now augment their context by searching the web (web search) or by pulling external documents into the model’s context, a technique called retrieval-augmented generation (RAG).

These features help, but they also introduce their own limitations. A retrieval system can miss relevant documents, return outdated ones, or surface results the model misinterprets. Module 5 covers how this technology works in more detail.
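The basic shape of retrieval-augmented generation can be sketched without any AI at all: find the most relevant document, then paste it into the prompt. Everything below is hypothetical and invented for illustration, including the file names, the documents, and the prompt wording. Simple word overlap stands in for the more sophisticated search that real tools use.

```python
import re

# Hypothetical mini document store, invented for illustration.
documents = {
    "lease.txt": "The tenant shall pay rent on the first day of each month.",
    "nda.txt": "The recipient shall keep confidential information secret for five years.",
    "memo.txt": "Liquidated damages clauses are enforceable if they are a reasonable estimate of loss.",
}

def words(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, top_k=1):
    """Rank documents by how many words they share with the question."""
    q = words(question)
    ranked = sorted(
        documents,
        key=lambda name: len(q & words(documents[name])),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question):
    """Paste the retrieved text into the prompt that would be sent to a model."""
    context = "\n".join(documents[name] for name in retrieve(question))
    return f"Use only this context:\n{context}\n\nQuestion: {question}"

question = "Are liquidated damages clauses enforceable?"
prompt = build_prompt(question)
print(prompt)
```

The model only ever sees what the retrieval step hands it. If the question is phrased in words that the right document does not share, or if the store holds an outdated version, the answer is built on the wrong material.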