Where Client Information Actually Leaks When Lawyers Use Generative AI

by Alixe Cormick | Apr 20, 2026

A partner redacts client names from a merger agreement, pastes the document into an AI tool to draft disclosure schedules, and assumes confidentiality has been preserved. It has not. The redaction may be technically effective, but the information has still been disclosed to a third-party system. This example reflects ordinary, well-intentioned practice rather than carelessness.

Discussions about privacy and generative AI often focus on a single question: Is the model training on my data? Most confidentiality exposure occurs well before any training and continues after the model generates a response.

Generative AI systems operate less like sealed tools and more like information supply chains. Data enters through user interfaces, may be processed by multiple intermediaries, may be stored in ways not visible to the user, and can cross jurisdictional boundaries without obvious warning. Understanding that supply chain is essential to understanding professional risk.

Redaction Does Not Eliminate Disclosure

The moment a lawyer enters text into an AI tool or uploads a document, information leaves the firm's controlled environment. This includes not only visible text, but also metadata, formatting patterns, and contextual details that may identify parties or transactions even when names are removed.
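The metadata point is easy to demonstrate concretely. A .docx file is an ordinary zip archive, and its `docProps/core.xml` entry carries author and revision details that survive redaction of the visible text. The sketch below builds a minimal stand-in for such an archive (illustrative only, not a valid Word file) to show how a party name can travel with the document even after the body is redacted:

```python
# Sketch: a .docx is a zip archive; "docProps/core.xml" holds metadata
# (author, last editor) that redacting the visible text does not remove.
# The file contents here are simplified stand-ins, not real Word XML.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    # The visible body has been redacted...
    z.writestr("word/document.xml",
               "<w:t>[REDACTED] agrees to acquire [REDACTED]</w:t>")
    # ...but the document properties still name the client.
    z.writestr("docProps/core.xml",
               "<dc:creator>A. Partner</dc:creator>"
               "<cp:lastModifiedBy>Client X Legal</cp:lastModifiedBy>")

with zipfile.ZipFile(buf) as z:
    metadata = z.read("docProps/core.xml").decode()

# "Client X Legal" is still present in metadata despite the redaction.
```

Uploading the file to an AI tool transmits the whole archive, not just the words a careful redactor reviewed on screen.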

Modern models can sometimes infer identity from transaction structure, timelines, or industry-specific facts. A mid-sized acquisition in a specialized industry during a defined period may be identifiable, even without explicit identifiers.

From a professional responsibility perspective, the relevant fact is the disclosure itself: confidential information has been transmitted to a third-party system, regardless of whether identification is ultimately possible.

The Application Layer Sees Everything

Most lawyers interact with AI through an interface built by an intermediary provider. Legal research platforms, drafting assistants, and document analysis tools often sit between the user and the underlying model provider.

These intermediary applications may log prompts for troubleshooting, truncate long documents to reduce costs, route content through multiple services, or temporarily store data for processing. Much of this activity is not visible to the end user.

Even where the model provider offers limited retention, the intermediary provider may be governed by separate contractual terms. The relevant data-handling obligations may therefore arise from multiple agreements rather than a single set of terms.

This is a contractual risk as much as a technical one.
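The mechanics described above can be sketched in a few lines. The class below is a hypothetical intermediary layer (the names `PromptRouter` and `_forward_to_model` are illustrative, not any real product's API): it truncates long input to control cost and writes the prompt to an operational log before anything reaches the model provider, all invisibly to the user.

```python
# Hypothetical sketch of an intermediary "drafting assistant" layer.
# All names are illustrative; no real vendor API is depicted.
import datetime

class PromptRouter:
    """Sits between the user and the model provider, as many legal-tech tools do."""

    def __init__(self):
        # Retained for "troubleshooting" -- not visible to the end user.
        self.operational_log = []

    def submit(self, prompt: str, max_chars: int = 2000) -> str:
        truncated = prompt[:max_chars]  # cost-driven truncation
        # The prompt is copied into the intermediary's log *before*
        # it ever reaches the underlying model provider.
        self.operational_log.append({
            "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "prompt": truncated,
        })
        return self._forward_to_model(truncated)

    def _forward_to_model(self, text: str) -> str:
        # Placeholder for the call to the underlying model provider.
        return f"[model response to {len(text)} chars]"

router = PromptRouter()
router.submit("Client X acquisition term sheet ...")
# The intermediary now holds its own copy of the client material,
# governed by its own contractual terms, not the model provider's.
```

Even if the model provider promises zero retention, the copy in `operational_log` is governed by whatever the intermediary's terms say, which is precisely why the risk is contractual as much as technical.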

Encryption Does Not Eliminate Processing Exposure

A common misconception is that encryption solves confidentiality concerns. It protects data in transit and at rest. It does not eliminate exposure during active processing.

To generate a response, the system must process the submitted content in usable form within working memory. During this stage, the information is accessible to the system performing the processing and subject to the controls governing that environment.

Encryption reduces interception risk but does not eliminate confidentiality concerns associated with third-party processing.
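A toy example makes the distinction between transit and processing concrete. The XOR "cipher" below is deliberately not real cryptography; it only illustrates the structural point that the provider must decrypt the content into working memory before any inference can happen.

```python
# Toy illustration (NOT real cryptography): data can be protected in
# transit and at rest, yet must exist as plaintext during processing.
KEY = 0x5A

def toy_encrypt(text: str) -> bytes:
    return bytes(b ^ KEY for b in text.encode())

def toy_decrypt(blob: bytes) -> str:
    return bytes(b ^ KEY for b in blob).decode()

def process_request(encrypted_prompt: bytes) -> str:
    # To do anything useful, the provider decrypts first. At this point
    # the plaintext exists in the provider's working memory, governed by
    # the provider's controls rather than the firm's.
    plaintext = toy_decrypt(encrypted_prompt)
    return plaintext.upper()  # stand-in for model inference

blob = toy_encrypt("confidential merger terms")
assert b"confidential" not in blob   # unreadable in transit
result = process_request(blob)       # but processed in the clear
```

Confidential-computing techniques can narrow this window, but for mainstream hosted AI services the plaintext-during-processing stage is unavoidable.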

Logs and Persistent Representations

Prompts and responses are commonly retained in operational logs for troubleshooting, security monitoring, or service improvement. These logs may be retained longer than the user session, and may be stored in different jurisdictions.

Some systems also convert text into numerical representations ("embeddings") for search and retrieval purposes. These representations preserve semantic relationships rather than exact wording and are generally not reversible into the original text. However, they may still constitute retained information derived from the client matter.
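A minimal sketch shows both halves of that point. Real systems use learned dense vectors; the hashed bag-of-words below is only a toy standing in for the idea: the vector cannot be decoded back into the sentence, yet it is still data derived from the client matter.

```python
# Toy "embedding" sketch. Production systems use learned dense vectors;
# this hashed word-count vector only illustrates the shape of the problem.
import hashlib

DIM = 16  # arbitrary small dimension for the illustration

def toy_embed(text: str) -> list[int]:
    vec = [0] * DIM
    for word in text.lower().split():
        # Each word deterministically bumps one bucket; the exact
        # wording and word order are discarded in the process.
        h = int(hashlib.sha256(word.encode()).hexdigest(), 16)
        vec[h % DIM] += 1
    return vec

v = toy_embed("acquisition of the target by the purchaser")
# v is a 16-number summary: the original sentence is not recoverable
# from it, but it remains information derived from the client matter
# and may persist in the provider's storage systems.
```

This is why "we don't store your text" and "we store nothing derived from your text" are different promises, and contracts should be read for which one is actually being made.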

From a legal perspective, these practices complicate deletion requests and data-governance obligations under privacy statutes.

Output Leakage Is Uncommon but Relevant

Reports of models generating content traceable to prior user inputs are uncommon, particularly in enterprise environments with appropriate controls. Nevertheless, professional obligations require lawyers to consider low-probability but high-impact risks.*

The relevant question is whether reasonable safeguards have been implemented.

* Lawler, Tara; Gary, Elizabeth Marie; and McCarthy, Bansri Mehta (September 18, 2025), “Generative AI and the Challenge of Preserving Privilege in Discovery”, Reuters. https://www.reuters.com/legal/legalindustry/generative-ai-challenge-preserving-privilege-discovery--pracin-2025-09-18/

Human Review May Occur

Many AI services include human review as part of quality assurance, security monitoring, or model improvement. Content flagged as anomalous or unclear may be reviewed by employees or contractors.

This review typically occurs outside the user's visibility and may occur in different jurisdictions. Reviewers are performing an operational function and do not assess privilege or confidentiality.

Training Is Usually Not the Immediate Risk

Whether user content is incorporated into model training or fine-tuning receives significant attention, but it is typically not the most immediate source of exposure.

Once incorporated into model parameters, information cannot practically be removed. That remains a serious concern. In practice, however, logging, human review, and cross-border processing occur far more frequently than model training.

Jurisdiction Matters

A Canadian lawyer's prompt processed through infrastructure located in another country may be subject to foreign legal access regimes, including legislation such as the Clarifying Lawful Overseas Use of Data Act ("CLOUD Act").

This can occur even where encryption and enterprise controls are in place. Cross-border processing may expose information to legal regimes that do not fully recognize solicitor-client privilege or professional confidentiality.

A lawful transfer is not the same as a protected one. Processing in another jurisdiction may mean that different rules govern what protections and rights apply.

Where data is processed can matter as much as how it is processed.

Downstream Disclosure

When AI-generated work product is filed with a court, delivered to a client, or shared with opposing counsel, any upstream confidentiality exposure is compounded.

If the output incorporates distinctive facts or structural elements derived from confidential material, those elements may enter public or adversarial hands. Once disclosure occurs at this stage, remediation may be difficult or impossible.

Client Consent

Most retainer agreements do not address the use of generative AI systems.

Even where firm policies permit such use, professional obligations may require client knowledge or consent before confidential information is disclosed to third-party service providers.

Disclosure does not require a human recipient. Submitting client information to a system operated by a third party may constitute disclosure for professional responsibility purposes, particularly where the provider may store or review information.

Enterprise and Regulatory Compliance Claims

Terms such as "enterprise grade" and "HIPAA compliant" describe control frameworks, rather than guarantees of confidentiality or privilege.

These frameworks govern how data is handled, not whether it is processed, copied, retained, or reviewed.

Even in enterprise deployments, certain realities remain:

  • Information must be processed in usable form during inference.
  • Prompts and outputs may be logged for operational purposes.
  • Derived representations may persist in storage systems.
  • Cross-border infrastructure may create legal exposure.
  • Human review may occur in limited circumstances.

There is a conceptual mismatch. Technology providers define privacy primarily as protection against unauthorized access. Lawyers define confidentiality as control over disclosure, which includes processing a provider is contractually authorized to perform but that professional obligations do not permit.

Enterprise controls reduce risk. They do not eliminate the underlying information handling chain.

Effective Risk Controls

Managing these risks requires layered controls rather than reliance on single assurances:

  • Contractual provisions addressing retention, review, and jurisdiction.
  • Tools designed for professional use with appropriate contractual protections.
  • Private or controlled deployments where feasible.
  • Limiting use to non-confidential work where appropriate.
  • Client disclosure or consent where required.

The question is whether the information handling chain is sufficiently transparent and controlled to meet professional obligations.

Practical Questions

Before using generative AI on client matters, lawyers should be able to answer three questions:

  • Where does the information go, including intermediaries and storage systems?
  • Who may access it, including personnel, contractors, and authorities?
  • How could confidentiality safeguards be demonstrated if challenged by a regulator or court?


Disclaimer

This article is provided for general informational purposes only. It is not legal advice and does not create a solicitor-client relationship.

Laws, regulatory guidance, law society expectations, and technology practices change. Readers are responsible for verifying current requirements and for assessing whether any tool or workflow is appropriate for their own circumstances and professional obligations.

Any output generated with the assistance of artificial intelligence should be independently reviewed and verified by a qualified lawyer before it is relied upon.