
Don’t let the VDR’s AI tank your deal

VDR vendors are racing to add AI features. Most are getting it wrong.

A mid-market M&A team uploads 3,000 documents to their virtual data room (VDR). They activate the vendor's flagship AI redaction tool: "bulk-redact thousands of files in seconds," the marketing promised. Minutes later, the system reports completion. Yet buried on page 37 of an employment contract, a social security number remains plainly visible. The AI found punctuation marks to redact instead.

 

This is the reality in legacy VDRs today. It even appears in the marketing demo of the world’s largest VDR provider... Power users know it well.

 

VDR providers are racing to bolt generative AI onto their platforms, pitching "due diligence agents" and "intelligent redaction" to M&A teams who need reliability above all else.

 

The technology isn't ready. The implementations are naive. And the consequences fall on deal teams who discover, sometimes after buyers have already accessed files, that their AI-powered features failed.

 

The pitch sounds compelling: let algorithms handle repetitive work, answer buyer questions instantly, flag sensitive information automatically. The reality resembles autonomous driving circa 2010: impressive demos on a closed circuit, frequent accidents on real roads, and no one willing to take their hands off the wheel or accept responsibility for crashes.

 

 

The demo breaks in production

Consider two examples that illustrate the pattern: document redaction and Q&A automation.

 

Automated redaction: Some platforms employ large language models to scan documents for personally identifiable information, promising to eliminate manual review. In practice, these systems struggle at nearly every step, starting with document parsing: PDFs with complex layouts confound them, scanned images defeat them entirely, and tables scramble their logic. A model might correctly identify social security numbers in some documents and miss them in others.

 

Worse: there are no safeguards. When these systems err, they do so confidently. No warning flags, no uncertainty scores. Just a green checkmark and a latent liability.

 

The Q&A implementations present different risks. Some vendors now offer chatbots that answer buyer questions by querying uploaded documents through retrieval-augmented generation. The appeal is obvious: accelerate diligence, reduce email volleys, free up deal team time.

 

But imagine the scenario: a strategic buyer asks about warranty claims history. The chatbot, retrieving incomplete document chunks through a context window that can't span the full file set, responds with a figure. The figure is wrong: off by 40%, drawn from a superseded version, or hallucinated entirely. The buyer relies on it. The deal reprices or, worse, litigation follows post-close.

 

No M&A professional would let a junior analyst answer buyer questions without review. Yet they're being asked to trust a system that cannot explain its reasoning, cannot flag its uncertainty, and cannot be held accountable.

 

The control problem

The fundamental issue isn't that AI performs poorly; it's that autonomous AI removes human oversight at precisely the moments when oversight matters most.

 

Sell-side teams don't need software that makes decisions. They need tools that preserve their ability to make decisions while reducing mechanical friction.

 

Not a system that bulk-redacts files, but one that identifies every instance of a pattern and lets humans verify before applying changes. Not a sloppy chatbot inside the data room, but infrastructure that keeps deal teams in control and lets them use their preferred AI platforms while maintaining custody of their documents.

 

What smart controls should actually look like

The alternative approach doesn't try to make AI autonomous. It makes AI context-aware.

 

Entropia, a VDR built by former Google engineers, is implementing this philosophy. Rather than adding AI that makes autonomous decisions, the platform focuses on three principles: grounding AI suggestions in user behaviour, scaling individual decisions across bounded contexts, and replaying those decisions as circumstances change.

 

Such a context-aware system observes what you're doing, learns from the patterns in your specific environment, and scales your decisions without making new ones.

Critically, every suggestion requires human validation before execution.

 

Ground suggestions in what users are actually doing

When a deal team renames a file, the system doesn't just accept the change. It analyzes the document type, examines how similar files have been named, and suggests standardized nomenclature for the next document. The AI isn't deciding on a naming convention; it's inferring the convention the team is already using and offering to apply it consistently.

 

The context matters. An agreement uploaded to the "Contracts/Suppliers" folder gets different naming suggestions than the same agreement uploaded to "Contracts/Customers." The AI reads the environment, not just the file.
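
 

To make the idea concrete, here is a minimal sketch of such folder-aware naming inference, written in Python. The helper names and the prefix heuristic are illustrative assumptions, not Entropia's actual implementation.

import re
from collections import Counter

# Minimal sketch: infer the naming convention a folder already uses and
# suggest a standardized name for a newly uploaded file.

def infer_prefix(sibling_names: list[str]) -> str | None:
    """Return the most common 'Prefix - ' pattern among sibling files, if any."""
    prefixes = [m.group(1) for name in sibling_names
                if (m := re.match(r"^([A-Z][\w&]*(?: [\w&]+)*) - ", name))]
    if not prefixes:
        return None
    prefix, count = Counter(prefixes).most_common(1)[0]
    # Only suggest a convention the team actually uses consistently.
    return prefix if count >= max(2, len(sibling_names) // 2) else None

def suggest_name(upload_name: str, sibling_names: list[str]) -> str | None:
    prefix = infer_prefix(sibling_names)
    if prefix is None or upload_name.startswith(prefix):
        return None  # nothing to suggest; the human's chosen name stands
    return f"{prefix} - {upload_name}"

# The same agreement gets a different suggestion depending on folder context.
suppliers = ["Supplier - Acme master agreement.pdf", "Supplier - Beta framework.pdf"]
print(suggest_name("Gamma agreement.pdf", suppliers))
# -> "Supplier - Gamma agreement.pdf", offered as a suggestion pending review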

 

Scale individual actions across bounded contexts

An analyst redacts a social security number in an employment agreement. That single action (hiding this specific nine-digit string) becomes a bounded context the system can replicate. The AI searches every document in the data room for that exact string and flags each instance. The analyst reviews the results, then applies the redaction across all confirmed matches.

 

One decision, executed fifty times. The human made the judgment call about what to redact. The AI handled the pattern matching and execution.
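
 

A minimal sketch of that replicate-then-review loop, in Python. The in-memory document store, the Match record, and the confirmation step are illustrative stand-ins, not the platform's actual code.

from dataclasses import dataclass

@dataclass
class Match:
    doc_id: str
    offset: int
    text: str
    confirmed: bool = False

def find_matches(documents: dict[str, str], target: str) -> list[Match]:
    """Flag every occurrence of the exact string the analyst chose to redact."""
    matches = []
    for doc_id, text in documents.items():
        start = text.find(target)
        while start != -1:
            matches.append(Match(doc_id, start, target))
            start = text.find(target, start + 1)
    return matches

def apply_confirmed(documents: dict[str, str], matches: list[Match]) -> dict[str, str]:
    """Apply redactions only where a human reviewer confirmed the match."""
    redacted = dict(documents)
    for m in matches:
        if m.confirmed:
            redacted[m.doc_id] = redacted[m.doc_id].replace(m.text, "█" * len(m.text))
    return redacted

# One decision ("hide this nine-digit string"), executed across the room:
docs = {"employment_agreement.pdf": "... SSN 123-45-6789 ...",
        "benefits_summary.pdf": "... beneficiary SSN 123-45-6789 ..."}
flagged = find_matches(docs, "123-45-6789")
for m in flagged:            # the analyst reviews each flagged instance
    m.confirmed = True
print(apply_confirmed(docs, flagged))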

 

This extends to recurring tasks. A team uploads financial statements monthly throughout a deal. They redact certain fields in the first upload (specific salary bands, particular contract values). When new statements arrive, the system identifies analogous fields based on position, formatting, and previous redaction patterns, then flags them for review. The team confirms or adjusts, then applies.
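
 

For the recurring-upload case, a hedged sketch of how analogous fields could be flagged, assuming a simple label-similarity and row-position heuristic; the field extraction and thresholds are illustrative assumptions.

from difflib import SequenceMatcher

def looks_analogous(prev_label: str, new_label: str, prev_pos: int, new_pos: int,
                    label_threshold: float = 0.8, max_drift: int = 3) -> bool:
    """A field is a candidate if its label resembles a previously redacted one
    and it sits in roughly the same position in the statement."""
    label_sim = SequenceMatcher(None, prev_label.lower(), new_label.lower()).ratio()
    return label_sim >= label_threshold and abs(prev_pos - new_pos) <= max_drift

# Fields the team redacted in earlier uploads: (label, row position)
previous_redactions = [("CEO base salary", 12), ("Supplier X contract value", 27)]

# Fields extracted from the newly uploaded statement
new_fields = [("CEO base salary", 13), ("Headcount", 14), ("Supplier X contract value", 27)]

flagged = [new for new in new_fields
           for prev in previous_redactions
           if looks_analogous(prev[0], new[0], prev[1], new[1])]
print(flagged)  # flagged for human review, never redacted automatically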

 

Replay decisions when circumstances change

Documents don't arrive all at once. A data room grows throughout diligence as teams locate additional materials or buyers request specific files. Traditional approaches require reviewing each new upload from scratch: checking for sensitive information, verifying naming conventions, confirming version control.

 

Context-aware systems treat previous decisions as reusable templates. If ten contracts have already been reviewed and redacted following a specific pattern, the eleventh contract gets automatically scanned for similar elements. Not automatically redacted, but automatically flagged for the same review process.

 

Let users work with the data room from their preferred AI platforms while staying in control

M&A teams already have powerful AI tools they trust. Some use general-purpose platforms (Google Gemini, Mistral, ChatGPT, Claude) for analysis or drafting. Others rely on specialized platforms like Harvey, Hebbia, or Legora for diligence.

 

These platforms outperform the generic chatbots VDR providers bolt onto their products. They have hundreds (or thousands) of engineers refining models and orchestrating them within sophisticated products. VDR providers, by contrast, are just adding chatbots as naïve interface features, with poor back-end orchestration or infrastructure, and often with minimal quality guardrails.

 

The relevant question becomes whether VDR providers will accommodate the tools teams have chosen, or force them to use inferior alternatives embedded in the platform.

 

Today, the only alternatives are to use the inferior AI features inside the data room (a poor option that often requires extra payment to the VDR provider), or to take all documents out of the data room and upload them to other AI platforms (which creates copies outside the VDR's security perimeter).

 

There is a third option. Entropia provides a secure connection between the VDR and these AI platforms via a protocol called MCP (“Model Context Protocol”)*.

 

An MCP server enables users to work with whichever AI platform they prefer, but those platforms query the data room through controlled APIs rather than receiving file copies. A lawyer using Claude to analyze sale agreements doesn't download fifty contracts. Claude queries the VDR for relevant clauses, receives text snippets within defined parameters, and generates analysis. The documents never leave secure storage. Every query gets logged inside the VDR, providing insights to the seller. Access permissions remain consistent across human and AI users.
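
 

For illustration, here is a minimal sketch of such a data-room MCP server, built with the MCP Python SDK's FastMCP helper. The permission table, clause search, and audit log are hypothetical stand-ins for the VDR's internal services, not Entropia's actual implementation.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("dataroom")

# Stand-in permission table: a real server would reuse the VDR's own access model.
PERMISSIONS = {"lawyer-42": {"spa_draft_v3.pdf", "supplier_contracts.pdf"}}

def audit_log(user_id: str, doc_id: str, query: str) -> None:
    # Stand-in: the real implementation records the query inside the VDR.
    print(f"[audit] {user_id} queried {doc_id}: {query!r}")

def search_clauses(doc_id: str, query: str) -> str:
    # Stand-in: return size-capped text snippets matching the query, not the file.
    return f"Top clauses from {doc_id} matching {query!r} (snippets only)."

@mcp.tool()
def query_clauses(user_id: str, doc_id: str, query: str) -> str:
    """Return relevant clause snippets from one document; never the full file."""
    if doc_id not in PERMISSIONS.get(user_id, set()):
        return "Access denied: document is outside this user's permissions."
    audit_log(user_id, doc_id, query)      # every AI query is visible to the seller
    return search_clauses(doc_id, query)   # text snippets within defined parameters

if __name__ == "__main__":
    mcp.run()  # the AI platform connects here instead of receiving file copies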

 

What happens next?

The AI gold rush in enterprise software follows a predictable pattern: vendors add language models because competitors are doing it and investors expect it, not because customers need them. Features proliferate faster than quality control. Early adopters become beta testers with real stakes.

 

VDRs are particularly unsuited to this dynamic. M&A deals involve sensitive information, tight timelines, and legal exposure. They reward reliability over innovation theater. A chatbot that's right, say, 75% of the time sounds impressive until you consider that the 25% of errors could tank a deal or trigger litigation.

 

Perhaps the more interesting question is whether M&A teams will demand better, or simply adapt to the new risks. The evidence suggests they're choosing a third path: using AI tools they already trust, built by teams that got them right. And those tools don’t come from VDR providers.

 

Control, it turns out, means knowing when to build and when to connect.

 


* MCP Servers Explained
An MCP server acts as a secure intermediary between AI platforms and sensitive data. Rather than uploading files to ChatGPT or Claude directly, which creates copies outside your control, the MCP server exposes predefined APIs that let these tools query your data room for specific context. Users choose their preferred AI platform. Those platforms access only what the user has permission to see, and their actions get logged in the data room. Think of it as a read-only interface for language models, with granular access controls and full audit trails. Teams leverage powerful AI analysis without surrendering document custody.