The Prompt That Sounds Sophisticated
When a senior lawyer types "You are a brilliant, cynical litigator who wants to impress the managing partner," the prompt can look more sophisticated than it is. A character has been assigned, a tone has sharpened, and the answer may even sound more intelligent because it sounds more forceful. Usually, however, the model's basic behaviour does not change much.
The model may still soften criticism, hedge conclusions, reassure before delivering bad news, and describe problems at a level of generality that avoids identifying the actual point of failure. The system has adopted a voice without changing its working method.
Many professionals, including many lawyers, approach generative AI as though it were a person who needs the right frame of mind. They prompt it the way they might brief a junior associate or outside consultant: set the scene, explain the stakes, invoke status, apply pressure, ask for brutal honesty, and try to create a mood that will produce better work.
That instinct is understandable. With human beings, it can work. With a large language model, it often does not.
Why Lawyers Reach for Persona Prompts
Lawyers are trained to think in roles: opposing counsel, trial judge, securities regulator, tax auditor, hostile bidder, disappointed investor. One of the profession's ordinary habits is to test a document by shifting standpoint and asking how it looks from the other side. That is sound practice.
It is therefore natural to assume the same method should work with AI. If you want a more rigorous review of a factum, tell the system to act like aggressive opposing counsel. If you want cleaner drafting, tell it to act like a demanding partner. If you want to identify flaws in a board memo, tell it to approach the task like a skeptical regulator or forensic reviewer.
There is some value in role definition, but only when the role is expressed as a function. "Opposing counsel whose job is to defeat this document" can be useful because it tells the system what function to perform. The trouble starts when role turns into theatre. Calling the system brilliant, cynical, highly motivated, or eager to impress does not tell it what to do in a way that materially changes the answer. It supplies atmosphere rather than method.
That matters in legal work because the profession rightly values polished language, measured tone, and apparent completeness. Those are valuable features in human legal writing. They are also some of the qualities that make weak AI output look deceptively acceptable.
A Large Language Model Is Not a Junior Associate
A junior associate has career concerns, reputational incentives, situational awareness, and real consequences for poor work. A junior lawyer may respond to pressure, pride, competition, or a desire not to disappoint. If a partner says "I need you to be ruthless with this draft," the associate understands that instruction against a background of hierarchy, judgment, and professional accountability.
A large language model has none of that. It has no career, no professional anxieties, and no reputational stake in the quality of its answer. It has no ambition to be right.
What it does have are defaults, shaped to produce answers that sound helpful, cooperative, balanced, and safe. In many contexts that is useful. In adversarial review, risk analysis, due diligence, litigation testing, and contract attack, those defaults can get in the way.
This is why persona prompts so often disappoint sophisticated users. The answer sounds different, but the system's core tendencies remain active unless they are displaced by precise instructions. The result is often a response that looks tougher on the surface while preserving the same habits underneath: hedging, softening, preferring broad observations to pointed commitments, and avoiding any direct statement along the lines of "this clause is your real problem, and this wording is why."
For lawyers, that failure mode is worse than an answer that is plainly weak. Obvious weakness is easier to catch. Polished ambiguity often passes internal review because it sounds responsible.
What Actually Changes AI Behaviour
Better output usually comes from disciplined instruction-writing, not prompt theatre. The same habits that improve directions to a junior lawyer, expert, investigator, or service provider also improve prompts to a model: define the function, specify the task, impose constraints, require an answer in a form you can use.
In practice, three things do most of the work.
1. Structural constraints on the output
Tell the model not only what you want but how the answer must be organized. If the most dangerous issue must come first, say so. If the findings must be ranked in descending order of severity, require that sequence. If each point must be tied to exact contractual language, make quotation mandatory. If the answer should begin with a commitment rather than a summary, state that expressly. Structure is behavioural control.
Lawyers already know this in analogous settings. A request for "comments on this agreement" invites drift. A request for "the three most serious termination risks, ranked by severity, each tied to the exact wording that creates the issue" is much harder to answer vaguely. That instruction forces prioritization and pushes the system toward decisions rather than impressions.
This matters because many AI answers default to a warm-up paragraph, a balanced overview, and an evenly weighted list of issues, all of which can sound thoughtful while delaying the point the lawyer actually needed.
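For lawyers who run document reviews through a scripted workflow rather than a chat window, the same structural constraints can be assembled programmatically. The sketch below is illustrative only: the function name, the wording of each constraint, and the required opening sentence are assumptions chosen for the example, not a prescribed standard.

```python
def build_attack_prompt(document_text: str) -> str:
    """Assemble a review prompt whose structure forces prioritization.

    Illustrative sketch: the constraint wording and the required opening
    sentence are examples, not a fixed standard.
    """
    constraints = [
        "Identify the single most dangerous vulnerability first.",
        "Rank any further vulnerabilities in descending order of severity.",
        "Tie every point to the exact contractual wording, quoted verbatim.",
        "Begin your answer with: 'The most dangerous vulnerability is ...'",
    ]
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(constraints, start=1))
    return (
        "Your sole function is to find the weaknesses in the document below.\n"
        "Follow this structure exactly:\n"
        f"{numbered}\n\n"
        f"Document under review:\n{document_text}"
    )
```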
2. Explicit suppression of default tendencies
Large language models tend to be agreeable, balanced, and helpful. They often avoid sustained criticism unless the user makes criticism procedurally unavoidable. In legal work, that means naming the behaviours you do not want and prohibiting them directly.
For example:
- Do not soften criticism with praise.
- Do not offer drafting fixes unless asked.
- Do not describe a defect without identifying the exact language at issue.
- Do not use balancing phrases that retreat from a conclusion once stated.
- Do not spread attention evenly across minor and major issues.
Those are instructions a model can act on. "Be brutal," "be honest," and "do not pull your punches" may affect tone but rarely change the model's decision pattern in a reliable way. A behavioural prohibition is more useful than an emotional appeal.
Weak AI output in legal work often traces to unaddressed defaults, not to the wrong choice of persona.
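Where the same reviewer reuses prompts across many documents, prohibitions of this kind can live in one place and be appended to every review instruction. A minimal sketch, with the phrasing of each rule offered as an example rather than an authoritative list:

```python
# Illustrative standing prohibitions; the wording is an example, not a canonical set.
DEFAULT_PROHIBITIONS = [
    "Do not soften criticism with praise.",
    "Do not offer drafting fixes unless asked.",
    "Do not describe a defect without identifying the exact language at issue.",
    "Do not retreat from a conclusion once it is stated.",
    "Do not spread attention evenly across minor and major issues.",
]


def with_prohibitions(prompt: str) -> str:
    """Append the standing prohibitions to any review prompt."""
    return prompt + "\n\nStanding rules:\n" + "\n".join(DEFAULT_PROHIBITIONS)
```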
3. A clear operational definition of success
"Give me your best analysis" is not a standard. "Your answer fails unless it identifies the single most dangerous vulnerability and quotes the wording that creates it" is a standard. When success is defined by observable features rather than flattering adjectives, output becomes more consistent.
A great deal of prompt language is evaluative rather than operational. Words like "brilliant, thorough, incisive, strategic, aggressive" express aspiration. None of them tells the model what a successful answer must contain.
Lawyers already work with operational definitions elsewhere: a due diligence checklist defines what must be reviewed, a closing agenda defines what must be delivered, a filing requirement defines what must be included. Prompting works better when it follows the same discipline.
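Because the standard is expressed in observable features, a draft answer can also be screened mechanically before anyone relies on it. The checks below are a rough sketch: the required opening sentence, and the assumption that quoted wording appears in double quotation marks, are illustrative choices rather than fixed rules.

```python
import re


def answer_meets_standard(answer: str) -> list[str]:
    """Return the observable ways an answer fails the standard.

    An empty list means the answer satisfies this (illustrative) definition
    of success; each string describes one failure.
    """
    failures = []
    if not answer.strip().startswith("The most dangerous vulnerability is"):
        failures.append("Does not open with the required commitment.")
    if not re.search(r'"[^"]{10,}"', answer):  # assumes quoted wording uses double quotes
        failures.append("Does not quote the specific language at issue.")
    if not re.search(r"^\s*2[.)]\s", answer, flags=re.MULTILINE):
        failures.append("Does not rank additional vulnerabilities.")
    return failures
```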
Two Versions of the Same Request
The difference is clearer when the task stays the same.
Version one: "You are a brilliant and skeptical opposing counsel. Review this brief as if you were trying to defeat it. Be direct and do not pull your punches."
Version two: "For this conversation, your sole function is to defeat this document. Do not improve it. Every response must follow this structure. First, identify the single most dangerous vulnerability and quote the specific language at issue. Second, rank up to four additional vulnerabilities in descending order of severity, each tied to specific language. Third, if anything is genuinely resistant to attack, say so in one sentence only. Do not offer fixes. Do not soften criticism with compliments. Do not describe any problem at a level of generality that avoids the exact language at issue. Every response begins: 'The most dangerous vulnerability is ...'"
Version one often produces a more engaged-sounding answer. Version two is more likely to produce a substantively different analysis: it commits earlier, ranks rather than lists, and ties each point to specific wording. It leaves less room for the model to slide into commentary that sounds adversarial without actually performing adversarial review.
The better prompt works because it leaves the model less room to avoid the task.
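For readers who work through a scripted interface rather than a chat window, version two translates directly into a standing system message. The sketch below assumes an OpenAI-style Python client; the client choice and the model name are placeholders rather than recommendations, and any chat-style interface would work the same way.

```python
from openai import OpenAI  # assumption: the OpenAI Python client; other chat clients work similarly

VERSION_TWO = (
    "For this conversation, your sole function is to defeat this document. Do not improve it. "
    "Every response must follow this structure. First, identify the single most dangerous "
    "vulnerability and quote the specific language at issue. Second, rank up to four additional "
    "vulnerabilities in descending order of severity, each tied to specific language. Third, if "
    "anything is genuinely resistant to attack, say so in one sentence only. Do not offer fixes. "
    "Do not soften criticism with compliments. Do not describe any problem at a level of "
    "generality that avoids the exact language at issue. Every response begins: "
    "'The most dangerous vulnerability is ...'"
)


def pressure_test(document_text: str, model: str = "gpt-4o") -> str:  # model name is a placeholder
    client = OpenAI()  # reads the API key from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": VERSION_TWO},
            {"role": "user", "content": document_text},
        ],
    )
    return response.choices[0].message.content
```

Keeping the instruction in one named constant also makes it easier to tighten the prohibitions over time without retyping them for every document.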
Why Version One Fails
The problem with persona
Words such as "brilliant," "cynical," "skeptical," and "highly motivated" are character descriptions. They describe a person, or the appearance of one. They do not constrain the answer in a way that materially changes the work. The model may echo the mood, but it has not been given a procedure.
That matters because sharper language can be mistaken for sharper analysis. A model can sound severe while still failing to identify the main vulnerability.
The limits of motivational framing
"Impress the managing partner" is recognizably human framing. It assumes status anxiety, ambition, and reputational concern. A model has none of those features. The phrase may make the prompt feel vivid, but it does not create an executable instruction. It gives the system an atmosphere to imitate rather than a method to follow.
Many professionals overestimate what this kind of framing can do because it would matter to a human. For a model, operational constraints matter more than motivation.
Stronger framing still needs specifics
Some users go further with instructions such as "if you agree with me, you are failing your job." That is directionally better because it signals that confirmation is unwelcome. It remains incomplete unless it defines what failure looks like in practice: which phrases count as hedging, which habits count as softening, what level of specificity is required, and what order the analysis must follow.
A useful prompt defines what rigorous work looks like; it does not merely announce that rigour is expected.
Functional role works better than fictional personality
Role assignment still has a place. "Opposing counsel whose job is to defeat this document" is useful because it defines a function. "Brilliant, cynical litigator" is less useful because it defines a persona. Functional role shapes what the system does; persona mostly affects how it sounds.
Why This Matters in Legal Practice
Contract review
A lawyer reviewing a limitation of liability clause does not need the model to sound commercially sophisticated. The lawyer needs to know whether the clause excludes consequential damages but leaves third-party claims inadequately addressed, whether the carve-outs swallow the protection, and which sentence would matter if the dispute turned ugly. A lawyer reviewing termination language does not need a balanced list of drafting observations. The lawyer needs to know which termination trigger is vague enough to invite conflict and which notice provision is likely to fail when the relationship breaks down.
If the prompt invites general commentary, the model often produces observations that look sensible and balanced but do not distinguish between what is awkward and what is dangerous. That is a prioritization failure, and in legal work it can matter more than a drafting error.
Litigation and advocacy
The same problem appears in briefs, factums, affidavits, and demand letters. A lawyer using AI to pressure-test a position does not need the system to sound fierce. The lawyer needs to know which argument is most vulnerable to attack, which paragraph is most likely to draw judicial skepticism, and which piece of language opposing counsel will use first.
Balanced analysis and adversarial review are different tasks. If the prompt does not force attack, the model tends toward neutral memo-writing. Polished memo-writing is not pressure-testing.
Due diligence and internal review
In diligence work, false balance is expensive. Missing signatures, broken consents, irregular issuances, deficient disclosure, problematic data practices, and unclear rights may all appear in the same set of materials. The lawyer's task is not merely to notice that these issues exist but to sort them: which defect is housekeeping, which changes price, which requires a closing condition, and which should stop the transaction altogether.
AI defaults to lists, completeness, and even-handedness. That can produce the illusion of rigour while burying the fact that one issue is existential and five others are administrative.
Regulatory and compliance analysis
Lawyers working in regulated areas are particularly exposed to this failure mode because compliance writing already tends toward balance, qualification, and caution. Those habits are often appropriate in final advice. At the first stage of issue screening, they can obscure more than they reveal. In a risk review, the more useful question is often not "what is a careful summary of the considerations?" but "what is the first thing a regulator would notice, and why?" The prompt should reflect that difference. Otherwise the model will often produce the language of prudence without performing the work of triage.
The More Dangerous Failure Mode
Many discussions of legal AI focus on hallucinations. Fabricated authorities, invented facts, and inaccurate citations are real risks. In day-to-day legal work, however, they are not always the most common failure.
A subtler problem is polished under-analysis. The model produces something measured, plausible, and professionally toned. It appears responsible. It may contain no glaring invention and may include several correct observations. What it has not done is make the hard call the lawyer actually needed. It has described the terrain without identifying the cliff.
That failure mode is dangerous because it does not announce itself. It arrives in the language of moderation and judgment. It can be circulated internally without obvious embarrassment. It can reassure a busy user that a document has been pressure-tested when the analysis never became genuinely adversarial.
The question is whether the lawyer has asked the system for something that can actually assist the legal work, rather than something that merely sounds as though it might.
What Lawyers Can Use Immediately
A lawyer who wants stronger AI output can improve results by doing five things.
First, define function rather than personality. Use role language that identifies the job to be done: opposing counsel, hostile reviewer, quality-control examiner, regulatory critic, red-flag reviewer. Avoid spending prompt space on invented psychology.
Second, require ranking. Do not ask for "issues." Ask for the most serious issue first, then the next three in descending order of severity. Ranking forces prioritization and resists the laundry-list effect.
Third, require quotation or pinpoint reference. When the task concerns a document, require the model to quote the clause, sentence, or wording at issue. This sharply reduces generic commentary.
Fourth, prohibit the evasions you already know to expect: no praise before criticism, no drafting fixes unless requested, no vague references to "potential ambiguity" without identifying the text, no equal weighting of major and minor concerns.
Fifth, define failure. If the answer does not identify the key vulnerability, rank the issues, tie them to specific wording, or follow the required structure, the response has failed. That level of clarity changes output.
These are ordinary legal habits applied to a tool that will not push back on a poorly framed instruction.
The Practical Consequence
An AI system that defaults to agreeable analysis often gives the user a polished reflection of what the user already believes, softened by enough caution to sound rigorous. That can produce false confidence without announcing itself as such.
For lawyers, the concern is not only that the model may be wrong but that it may sound responsible while avoiding the commitment the task required. A lawyer reviewing contract exposure, litigation risk, diligence findings, or regulatory vulnerability needs analysis that commits to specific findings, ranks them, and ties them to the text.
The difference between version one and version two is the difference between actual pressure-testing and the appearance of it.
Conclusion
A large language model does not need a backstory. It needs instructions it cannot easily evade. In legal prompting, that means specifying the task, structuring the required output, suppressing the defaults that produce comfortable rather than useful analysis, and defining success in terms the answer must satisfy. Lawyers already apply that discipline when instructing junior counsel, experts, and investigators. The same discipline improves results here.

