
Context-adaptive document redaction

Legacy VDRs charge extra for AI-redaction that doesn't work. Entropia has a better idea.

 


 

A sell-side analyst uploads 847 employment contracts to a data room. Each contains social security numbers, home addresses, salary details. The VDR promises to redact them all automatically. One click, the marketing materials said. Forty-five seconds later, a confirmation appears: "1,247 redactions complete."

 

The analyst opens a random file. The CEO's social security number sits there, unmasked. Three more spot checks reveal similar gaps. The automated system missed variations in formatting, struggled with scanned documents, failed to recognize abbreviated names.

 

Eight hours of manual review follow. Better than risking a GDPR violation or tanking a deal because confidential client names leaked to a competitor.

 

"I'd rather have my intern spend eight hours on redaction than five minutes with an AI solution and take the risk," says Victor, an M&A associate at a major European investment bank. The bank's reputation sits on the line with every document shared. No algorithm gets to make that call unsupervised.

 

What redaction actually means in M&A

Document redaction in mergers and acquisitions isn't about hiding embarrassing details. It serves three precise functions that carry legal and financial consequences when done poorly.

 

  • First, regulatory compliance. European data protection rules require removing personally identifiable information before sharing employee records with third parties: social security numbers, home addresses, phone numbers, bank details. Missing a single instance can trigger a regulatory investigation. Under the GDPR, administrative fines can reach €20 million or 4% of global annual turnover, whichever is higher.

  • Second, competitive protection. A target company's client list, pricing structures, supplier contracts, and strategic partnerships constitute valuable intelligence. Buyers conducting due diligence need to verify commercial relationships exist without learning specific terms that could advantage them in negotiations or, worse, leak to competitors if a deal collapses.

  • Third, staged disclosure. M&A transactions progress through phases. Early-stage buyers see summary financials and anonymized contracts. Later, as negotiations advance, additional details unlock. The redaction system needs to support this gradual revelation, with precise control over what each party sees and when.

The consequences of failure aren't abstract. A leaked client list can trigger contract renegotiations. Exposed pricing data undermines competitive positioning. Personal information breaches generate regulatory scrutiny that delays or kills transactions entirely.

 

And then there's the technical requirement: true redaction must destroy the underlying data bytes, not merely overlay black boxes. Some transaction advisory firms openly discuss tools that can "un-redact" documents where the original text remains embedded in the file structure. If buyers or their advisors possess such tools, cosmetic redaction becomes disclosure.
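The difference between an overlay and true redaction can be illustrated with a toy model. The sketch below is not a PDF implementation: the `Page` class, its `overlays` field, and both redaction functions are hypothetical names invented for this example, and a real file format has more places for text to hide (content streams, metadata, embedded revisions). It only shows why a black box drawn on top of text leaves that text recoverable.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy stand-in for a page: a text layer plus drawn overlay boxes."""
    text: str
    # (start, end) character offsets covered by a black box
    overlays: list[tuple[int, int]] = field(default_factory=list)


def cosmetic_redact(page: Page, secret: str) -> None:
    """Draw a box over the first occurrence; the text layer is untouched."""
    start = page.text.find(secret)
    if secret and start != -1:
        page.overlays.append((start, start + len(secret)))


def destructive_redact(page: Page, secret: str, mask: str = "█") -> None:
    """Overwrite every occurrence in the text layer, then record the box."""
    if not secret:
        return
    start = page.text.find(secret)
    while start != -1:
        end = start + len(secret)
        page.text = page.text[:start] + mask * len(secret) + page.text[end:]
        page.overlays.append((start, end))
        start = page.text.find(secret, end)


def extract_text(page: Page) -> str:
    """What a text-extraction tool sees: overlays are simply ignored."""
    return page.text


ssn = "123-45-6789"  # made-up value for illustration
a = Page(f"SSN: {ssn}")
b = Page(f"SSN: {ssn}")
cosmetic_redact(a, ssn)
destructive_redact(b, ssn)
print(ssn in extract_text(a))  # the overlay did not remove the text
print(ssn in extract_text(b))  # the text layer itself was overwritten
```

In this model, an "un-redaction" tool is nothing more than `extract_text`: anything that reads the text layer directly bypasses the box.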

 

The AI-redaction theater

Legacy VDR providers spotted an opportunity. Artificial intelligence was the solution to every enterprise software problem, apparently including document redaction. Marketing materials promised dramatic results: "Redact 1,000 files in one click." "80% time savings." "AI-powered pattern recognition."

 

Datasite, one of the established players, promotes its automated redaction feature prominently. But the demo showcased on its website fails spectacularly. Even when it functions, the system might identify obvious patterns like social security numbers in structured documents, but complexity defeats it quickly. PDFs with unusual layouts confuse the parser. Scanned images remain completely opaque. Tables scramble its logic.

 

The system might correctly identify confidential information in some employment agreements while missing others where formatting differs slightly. No uncertainty scores appear. No confidence intervals. Just green checkmarks and buried liabilities.

 

What legacy vendors actually sell is autonomous AI. The system makes decisions about what to redact based on pattern matching and training data. It executes those decisions without meaningful human oversight. The human role reduces to clicking "approve" on bulk operations affecting hundreds or thousands of documents simultaneously.

 

This autonomy becomes the vulnerability. M&A teams don't need software that makes decisions. They need tools that preserve their ability to make decisions while eliminating mechanical friction.

 

The pricing model reveals the vendors' actual priorities. Legacy vendors charge up to an additional €2,500 per data room for AI-assisted redaction. Some also count redacted pages as separate files, effectively charging twice for the same content. A team uploads a fifty-page contract, redacts ten pages, and discovers it is being billed for sixty pages of storage. This has a side effect: users redact outside the VDR to avoid the inflated charges, undermining the entire purpose of a centralized, auditable system.

 

What users actually need

The conversation about AI-powered redaction typically focuses on speed and automation. That misses what M&A professionals actually want from their tools.

 

  • Control sits at the top of the hierarchy. Deal teams need visibility into every redaction decision. Not trust, verification. The system can suggest, flag, and accelerate, but humans must validate before anything executes. This isn't inefficiency, it's risk management in an environment where a single mistake can cost millions.

 

  • Completeness ranks second. The fear isn't just making errors, it's missing items entirely. An analyst redacting Pierre-Louis Corteel's information needs the system to identify every variation: P.L. Corteel, Corteel PL, Pierre L. Corteel. Across all documents in the data room, not just the one currently open. Manual review might catch variations in the same file. It won't catch them across 847 employment contracts spread through different folders.

  • Security comes third, though in practice it underlies everything. True redaction must destroy data at the byte level. Visual overlays aren't sufficient. The underlying text must disappear from the file structure entirely, preventing any attempt at recovery.

  • And pricing transparency matters more than vendors acknowledge. Teams need predictable costs that don't penalize thoroughness. If redacting documents becomes expensive, the economic incentive shifts toward doing less redaction, which directly contradicts security objectives.

 

Context-aware redaction: an innovative approach

Entropia takes a different approach to redaction. Rather than attempting to make AI autonomous, the platform makes it context-aware.

 

The system observes what users do, learns patterns from specific environments, and scales individual decisions across bounded contexts. Critically, every suggestion requires human validation before execution.

 

The workflow begins with manual redaction, then amplifies it. An analyst redacts a social security number in an employment contract. That single action creates a bounded context the system replicates. The platform searches every document in the data room for that exact pattern and flags each instance. The analyst reviews the results, confirms the matches, and applies the redaction.

 

One judgment call, executed fifty times. The human decided what to redact. The software handled pattern matching and execution.
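The flag-then-confirm loop described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not Entropia's implementation: the data room is modeled as a plain dictionary of document name to text, and `Suggestion`, `flag_occurrences`, and `apply_confirmed` are hypothetical names. The point is the separation of duties: the search step only produces suggestions, and nothing is changed until a human passes a confirmed list to the apply step.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    doc_name: str
    start: int
    end: int
    matched_text: str


def flag_occurrences(data_room: dict[str, str], redacted_value: str) -> list[Suggestion]:
    """Find every exact occurrence of a value the analyst already redacted."""
    if not redacted_value:
        raise ValueError("redacted_value must not be empty")
    pattern = re.compile(re.escape(redacted_value))
    return [
        Suggestion(name, m.start(), m.end(), m.group())
        for name, text in data_room.items()
        for m in pattern.finditer(text)
    ]


def apply_confirmed(data_room: dict[str, str], confirmed: list[Suggestion],
                    mask: str = "█") -> dict[str, str]:
    """Redact only what a human confirmed; returns a new mapping."""
    result = dict(data_room)
    for s in confirmed:
        text = result[s.doc_name]
        # The mask preserves length, so the other offsets stay valid.
        result[s.doc_name] = text[:s.start] + mask * (s.end - s.start) + text[s.end:]
    return result


# Made-up documents and values for illustration.
room = {
    "contract_a.txt": "Employee SSN 123-45-6789, salary on file.",
    "contract_b.txt": "Ref 123-45-6789 appears twice: 123-45-6789.",
    "contract_c.txt": "No identifiers here.",
}
suggestions = flag_occurrences(room, "123-45-6789")
for s in suggestions:
    print(f"{s.doc_name}: offsets {s.start}-{s.end}")

confirmed = list(suggestions)  # stands in for the analyst's review step
redacted_room = apply_confirmed(room, confirmed)
```

Returning a new mapping rather than editing in place keeps the original untouched until the reviewer is satisfied, which is one simple way to make the review step auditable.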

 

This extends to more complex scenarios. A team uploads financial statements monthly throughout a deal. They redact specific salary bands and contract values in the first batch. When new statements arrive weeks later, the system identifies analogous fields based on position, formatting, and previous patterns. It flags them for review. The team confirms or adjusts, then applies.

 

The system also handles format variations. Redact "Pierre-Louis Corteel" once, and the platform flags "P.L. Corteel," "Corteel PL," and similar variations across all documents. Not automatically redacted, automatically flagged. The distinction matters. Users maintain control while the system prevents oversights that would occur during pure manual review.
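As a rough illustration of variant flagging, the sketch below generates a handful of common written forms of a name and searches for them across documents. It is a minimal, rule-based example under stated assumptions: `name_variants` and `flag_name` are hypothetical helpers, the variant rules cover only the forms mentioned in this article plus a few obvious siblings, and a production system would likely need fuzzier matching. As in the text, matches are flagged for review, not redacted.

```python
import re


def name_variants(full_name: str) -> set[str]:
    """Generate common written forms of 'Given-Names Surname'."""
    *given, surname = full_name.split()
    parts = [p for g in given for p in g.split("-")]
    if not parts:
        return {full_name}
    initials = [p[0] for p in parts]
    given_forms = {
        " ".join(given),                                          # Pierre-Louis
        " ".join(parts),                                          # Pierre Louis
        "".join(f"{i}." for i in initials),                       # P.L.
        "".join(initials),                                        # PL
        " ".join([parts[0]] + [f"{i}." for i in initials[1:]]),   # Pierre L.
    }
    variants = set()
    for g in given_forms:
        variants.update({f"{g} {surname}", f"{surname} {g}", f"{surname}, {g}"})
    return variants


def flag_name(data_room: dict[str, str], full_name: str) -> list[tuple[str, str]]:
    """Return (document, matched text) pairs for human review."""
    # Longest variants first so the alternation prefers the fullest form.
    variants = sorted(name_variants(full_name), key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in variants)
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)
    return [
        (doc, m.group())
        for doc, text in data_room.items()
        for m in pattern.finditer(text)
    ]


# Made-up documents for illustration.
room = {
    "spa.txt": "Signed by Pierre-Louis Corteel on behalf of the seller.",
    "board_minutes.txt": "Present: P.L. Corteel (chair). Apologies: none.",
    "payroll.txt": "Corteel PL, grade 7; reviewed by Pierre L. Corteel.",
    "nda.txt": "No named individuals.",
}
flags = flag_name(room, "Pierre-Louis Corteel")
for doc, text in flags:
    print(f"{doc}: {text!r}")
```

A rule-based generator like this is easy to audit because every flagged string can be traced back to a specific rule, at the cost of missing forms nobody anticipated.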

 

Scanned documents and image-based PDFs are usually excluded from automation. In Entropia they are fully searchable, and the same redaction workflows apply regardless of how a document originated. This removes a major blind spot where legacy systems fail.

 

The search functionality integrates directly with redaction. Search for an item in a document, find ten instances, redact them all in one click. Then the system alerts you to other occurrences in different files. The workflow becomes: search, review results in current document, redact confirmed instances, then address flagged items elsewhere in the data room.

 

Pricing follows a different logic entirely. Assisted redaction isn't an add-on feature; it's included in the platform. Redacted documents don't trigger a per-page upcharge. No artificial incentives to redact outside the system. No penalties for thorough document protection.

 

The pattern recognition problem

Perhaps the more interesting question is whether M&A teams will demand better tools, or simply adapt to elevated risks. Early evidence suggests they're rejecting automation theater in favor of control.

 

When a transaction advisory firm openly discusses tools that can "un-redact" documents, it confirms what security-conscious teams already suspected. Cosmetic redaction creates the appearance of protection without the substance. And when autonomous AI systems promise comprehensive redaction but deliver spotty results, the risk calculus becomes simple. Eight hours of manual review beats five minutes with an unreliable algorithm when millions of euros and institutional reputation are at stake.

 

The vendors marketing "one-click redaction for 1,000 files" aren't addressing what M&A professionals actually need. They're selling the idea of productivity gains to budget holders who don't sit in the trenches reviewing documents. The people doing the actual work understand the difference between speed and thoroughness.

 

Context-aware systems offer a middle path. The AI doesn't make decisions, it scales decisions humans already made. The human expertise remains central. The software eliminates repetitive mechanical tasks while preserving oversight at critical junctures. This matters because document volumes in M&A continue growing. A mid-sized transaction easily involves thousands of files. Pure manual review becomes genuinely impractical at scale. But autonomous systems that make unsupervised decisions about sensitive data aren't the answer either.

 

The solution involves AI, but not the AI that vendors have been marketing. Not systems that replace human judgment with algorithmic confidence. Instead, tools that extend human judgment across larger datasets while maintaining visibility and control at every step. Whether the VDR industry moves in this direction depends on whether buying decisions come from deal teams who understand the requirements, or from procurement departments optimizing for feature checklists and cost reduction.

 

The evidence from conversations with M&A professionals suggests the former is winning. When an M&A associate says "I'd rather have my intern spend eight hours on redaction than five minutes with an AI solution and take the risk," that's not technophobia. That's someone who understands exactly where liability sits when documents leak or regulators investigate.

 

Control, it turns out, means knowing when to automate and when to verify. The best redaction system isn't the one that promises to do everything automatically. It's the one that helps humans do their job better without removing them from the process.

 

-> Read more about our context-aware data room in this article