<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Trust on cekrem.github.io</title><link>https://cekrem.github.io/tags/trust/</link><description>Recent content in Trust on cekrem.github.io</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 24 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://cekrem.github.io/tags/trust/index.xml" rel="self" type="application/rss+xml"/><item><title>LLMs Corrupt Your Documents (and the Theory Dies Twice)</title><link>https://cekrem.github.io/posts/llms-corrupt-your-documents/</link><pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate><guid>https://cekrem.github.io/posts/llms-corrupt-your-documents/</guid><description>&lt;p&gt;This week a friend sent me a paper with a title that made me laugh out loud: &lt;a href="https://arxiv.org/html/2604.15597v1" class="external-link" target="_blank" rel="noopener"&gt;&amp;ldquo;LLMs Corrupt Your Documents When You Delegate.&amp;rdquo;&lt;/a&gt; The authors are Philippe Laban, Tobias Schnabel, and Jennifer Neville of Microsoft Research. Not &amp;ldquo;LLMs &lt;em&gt;might&lt;/em&gt; corrupt&amp;rdquo; or &amp;ldquo;LLMs &lt;em&gt;occasionally&lt;/em&gt; introduce errors.&amp;rdquo; Just the blunt statement of fact.&lt;/p&gt;
&lt;p&gt;I appreciated that, and longtime readers of my blog can probably guess that I&amp;rsquo;m not very surprised.&lt;/p&gt;
&lt;h2 id="the-numbers"&gt;
 The numbers
 &lt;a class="heading-link" href="#the-numbers"&gt;
 &lt;i class="fa-solid fa-link" aria-hidden="true" title="Link to heading"&gt;&lt;/i&gt;
 &lt;span class="sr-only"&gt;Link to heading&lt;/span&gt;
 &lt;/a&gt;
&lt;/h2&gt;
&lt;p&gt;The researchers built something called the DELEGATE-52 benchmark. Fifty-two documents across different domains, handed to nineteen different models (including &amp;ldquo;frontier&amp;rdquo; ones like Gemini 3.1 Pro, Claude 4.6 Opus, and GPT-5.4). Each model gets a document and a series of editing instructions. Twenty interactions. Just twenty. And by the end?&lt;/p&gt;</description></item></channel></rss>