Free Code Diff Tool: Compare & Spot AI Errors Fast

It was a Tuesday afternoon in November. I had a tax calculation engine running on PostgreSQL, and one of the aggregation queries was getting slow — a few hundred milliseconds on a table that was growing every week. Nothing critical, but the kind of thing that nags at you. I asked an AI assistant to look at the query and suggest optimizations.

I copied the suggestion, skimmed it quickly, saw familiar SQL keywords, and ran it. The query executed in 12 milliseconds. I felt good about that. Then I checked the table.

About 4,200 rows — a week of tax records for real users — were gone.

The AI had restructured the query in a way that removed the WHERE user_id = :id filter entirely. Then it added what looked like a cleanup step that truncated stale entries. Except there were no stale entries. There was just live data. Gone.

I had backups, so we recovered. But it took four hours, one very uncomfortable call with my client, and a week of second-guessing every single AI suggestion I'd ever accepted without reading carefully. That week is when I started building the diff comparator.

The Actual Problem: AI Does Not Tell You What It Changed

In a chat window, the major AI assistants (ChatGPT, Claude, Gemini, Copilot) behave the same way when you ask them to improve something: they hand back a complete rewritten version. There are no tracked changes. There is no red-line markup. There is no summary of what was removed. You get the final result and you are expected to work out the delta yourself.

For short inputs that is manageable. If you paste three lines of JavaScript and get three lines back, you can read them. But in real work, inputs are rarely three lines.

When I sent the AI that SQL query, it was 47 lines long. The returned version was 41 lines. The six missing lines included the WHERE clause that filtered by user ID and a LIMIT clause that kept the aggregation bounded. I did not notice because I was looking at whether the JOIN structure was correct, not whether the filter conditions were still there.

This is not a criticism of AI tools. They are genuinely useful. The problem is a workflow problem: I was treating AI output like a small edit when it was actually a full rewrite. I had no process for verifying what changed, and that gap cost me four hours and a lot of stress.

⚠️ A Note on Why This Happens Technically

Language models do not "edit" your code or text in the traditional sense. They predict the most likely next token given everything they have seen, including your input and their training. When you ask for an optimization, the model generates a new sequence from scratch. It is not diffing anything internally, and nothing in the process checks that each line of your original survives into the output. Every character in the output is freshly generated. That is why deletions happen silently: there is no deletion operation. The missing content simply never gets predicted.

Who Actually Runs Into This Problem

After the incident I started asking colleagues and people in developer forums if they had similar experiences. The pattern was much broader than I expected.

Developers Reviewing AI-Generated Code

The most common version of the problem in developer circles: you ask AI to refactor a function, it returns a cleaner-looking version, and somewhere in the cleanup it drops an edge case check or a null guard. The code works for 99% of inputs and breaks silently for the 1% your tests did not cover. I have heard this story from backend engineers, mobile developers, and frontend developers working on form validation.

One engineer told me he had AI refactor an authentication middleware. The AI reorganized the code cleanly and removed what looked like a redundant early return. It was not redundant. It was the check that prevented unauthenticated requests from reaching an admin route. The bug made it to staging before anyone caught it.

Writers and Editors Using AI to Polish Drafts

I talked to a technical writer who uses AI to smooth out the language in documentation. She described asking the model to simplify a section explaining an API deprecation. The simplified version was easier to read, but it had dropped the date the old endpoint would stop working. Developers reading the simplified docs missed the deadline because the deadline was no longer in the text.

This is a subtler kind of deletion than a missing WHERE clause, but the effect is the same: something important was in the original that is not in the AI version, and nobody noticed because nobody compared them.

Legal and Business Documents

A freelance contract negotiator I know uses AI to translate dense legal language into plain English for clients. She had a clause about payment terms — specifically a late payment penalty — that the AI softened into vague language about "timely payment expectations." The client read the simplified version. The original contract had the penalty clause. When payment was late, there was a dispute about whether the client had understood the terms.

In legal contexts, the delta between two versions of a document is sometimes the entire argument. Not having a way to see that delta is a liability.

Students and Academics

This one is straightforward: you paste your essay into an AI and ask it to fix grammar. It fixes grammar and also quietly removes a paragraph you spent an hour on, or changes a citation you had verified, or rephrases your thesis in a way that shifts the argument. You submit the AI version. Your argument is not quite what you intended.

How I Approached Building the Tool

My first instinct was to use an existing diff library and wrap a UI around it. But I wanted to understand the algorithm well enough to know exactly what I was showing users, because the choice of diff algorithm genuinely affects what you see.

Why Myers Diff

There are several approaches to computing differences between two sequences. The simplest is a naive line comparison: iterate both texts simultaneously and flag lines that do not match. This works but produces poor results when lines are inserted or deleted in the middle of a file, because everything after the insertion shifts and shows as changed.
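
A short sketch makes the shifting problem concrete. The function name naiveLineDiff and the sample query are made up for illustration; this is not code from the tool.

```javascript
// Naive positional comparison: line i of A is compared only with line i of B.
// Hypothetical helper for illustration.
function naiveLineDiff(textA, textB) {
  const a = textA.split("\n");
  const b = textB.split("\n");
  const length = Math.max(a.length, b.length);
  const mismatches = [];
  for (let i = 0; i < length; i++) {
    if (a[i] !== b[i]) {
      mismatches.push(i + 1); // 1-based line number
    }
  }
  return mismatches;
}

const original = ["SELECT *", "FROM taxes", "WHERE user_id = :id", "LIMIT 100"].join("\n");
// One comment line inserted after the first line; nothing else changed.
const revised = ["SELECT *", "-- reviewed", "FROM taxes", "WHERE user_id = :id", "LIMIT 100"].join("\n");

console.log(naiveLineDiff(original, revised)); // logs [2, 3, 4, 5]
```

A single inserted line causes every later line to be reported as changed, which is exactly the noise an alignment-based diff avoids.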

The Myers algorithm, published by Eugene Myers in 1986 and the default diff algorithm in Git, solves this by finding the shortest edit script: the minimum number of insertions and deletions needed to transform text A into text B. This means that when a few lines are inserted or deleted in the middle of a file, Myers realigns everything after that point and marks only the inserted or deleted lines, not the rest of the file. It does not track moves: a function relocated from the top of a file to the bottom shows up as one deleted block and one added block, with the code in between left unchanged. For code specifically, this is the difference between a useful diff and an unusable one.

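The implementation I wrote uses a longest common subsequence (LCS) approach. It is a different algorithm from Myers, but it solves the same problem: the lines that fall outside a longest common subsequence are exactly a shortest script of insertions and deletions. It costs O(m×n) time and memory, where m and n are the line counts of the two texts, while Myers is faster on inputs that are mostly alike. In exchange it is slightly easier to implement correctly in JavaScript without a dedicated library. The LCS table is an (m+1) × (n+1) matrix of integers that you fill in with a nested loop, then backtrack through to reconstruct the edit sequence. For the character-level diff (the highlighting inside a changed line) I run the same algorithm on the individual characters of the two line versions.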
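
The table-and-backtrack idea can be sketched as follows. This is a minimal illustration under assumptions, not the tool's source: the name lcsDiff, the operation labels, and the tie-breaking rule are all hypothetical.

```javascript
// Line diff via a longest-common-subsequence table.
// Illustrative sketch; names and output shape are assumptions.
function lcsDiff(a, b) {
  const m = a.length;
  const n = b.length;
  // table[i][j] holds the LCS length of the first i lines of a and the first j lines of b.
  const table = Array.from({ length: m + 1 }, () => new Int32Array(n + 1));
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      table[i][j] = a[i - 1] === b[j - 1]
        ? table[i - 1][j - 1] + 1
        : Math.max(table[i - 1][j], table[i][j - 1]);
    }
  }
  // Backtrack from the bottom-right corner to recover the edit sequence.
  const ops = [];
  let i = m;
  let j = n;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && a[i - 1] === b[j - 1]) {
      ops.push({ op: "keep", value: a[i - 1] });
      i--;
      j--;
    } else if (j > 0 && (i === 0 || table[i][j - 1] >= table[i - 1][j])) {
      ops.push({ op: "add", value: b[j - 1] });
      j--;
    } else {
      ops.push({ op: "remove", value: a[i - 1] });
      i--;
    }
  }
  return ops.reverse(); // operations were collected back to front
}

const before = ["SELECT *", "FROM taxes", "WHERE user_id = :id", "LIMIT 100"];
const after = ["SELECT *", "FROM taxes", "LIMIT 100"];

for (const { op, value } of lcsDiff(before, after)) {
  const mark = op === "keep" ? " " : op === "add" ? "+" : "-";
  console.log(`${mark} ${value}`);
}
```

In this example the only line marked with a minus sign is the WHERE clause, and the LIMIT line after it is still recognized as unchanged.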

The Character-Level Layer

Line-level diffing tells you which lines changed. But in practice, many changes are small: a variable renamed, a number adjusted, a word swapped. Without character-level highlighting inside a changed line, you have to read the entire line twice to find the change. With it, the exact characters that were added or removed are marked directly.

This was the feature I wished I had had when reviewing that SQL query. The WHERE clause was on a line that had also been reformatted slightly, so even if I had done a manual line comparison, I might have seen "this line changed" without immediately spotting that the filter condition was the thing that changed.
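
As a rough sketch of the character-level layer, the same LCS technique can be applied to the characters of one line pair and the single-character steps merged into segments. The function name charDiff and the segment shape are assumptions made for this example.

```javascript
// Character-level diff of two versions of one line, using an LCS table.
// Illustrative sketch; `charDiff` and its output shape are assumptions.
function charDiff(oldLine, newLine) {
  const a = Array.from(oldLine);
  const b = Array.from(newLine);
  const m = a.length;
  const n = b.length;
  const table = Array.from({ length: m + 1 }, () => new Int32Array(n + 1));
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      table[i][j] = a[i - 1] === b[j - 1]
        ? table[i - 1][j - 1] + 1
        : Math.max(table[i - 1][j], table[i][j - 1]);
    }
  }
  // Backtrack, collecting single-character steps in reverse order.
  const steps = [];
  let i = m;
  let j = n;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && a[i - 1] === b[j - 1]) {
      steps.push(["keep", a[--i]]);
      j--;
    } else if (j > 0 && (i === 0 || table[i][j - 1] >= table[i - 1][j])) {
      steps.push(["add", b[--j]]);
    } else {
      steps.push(["remove", a[--i]]);
    }
  }
  // Merge consecutive steps of the same kind into readable segments.
  const segments = [];
  for (const [op, ch] of steps.reverse()) {
    const last = segments[segments.length - 1];
    if (last && last.op === op) {
      last.text += ch;
    } else {
      segments.push({ op, text: ch });
    }
  }
  return segments;
}

console.log(charDiff("total = price * 0.15", "total = price * 0.18"));
```

A renderer would then wrap the "remove" segments in red and the "add" segments in green, leaving the "keep" segments unstyled.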

Syntax Highlighting for Code

When you switch the tool to Code mode, the output applies basic syntax highlighting: keywords in one color, strings in another, comments in italics, numeric literals distinguished from identifiers. This is done with a lightweight token-based regex pass rather than a full parser, which means it works across languages without needing language-specific configurations. It is not as precise as a full AST-based highlighter, but for a diff review — where you are scanning quickly for what changed — the visual differentiation between types of tokens is what matters.
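
To show what a token-based regex pass can look like, here is a minimal sketch. The keyword list, the comment styles it recognizes, and the type names are placeholders chosen for the example, not the tool's actual rules.

```javascript
// A single-pass regex tokenizer for highlighting one line of code.
// Illustrative only: the keyword list and token types are assumptions.
const KEYWORDS = new Set([
  "select", "from", "where", "limit", "function", "return", "const", "let", "if", "else",
]);

// Alternatives are tried left to right at each position:
// comment, string, number, identifier, then any other single character or whitespace run.
const TOKEN_PATTERN =
  /(\/\/.*|--\s.*)|("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')|(\b\d+(?:\.\d+)?\b)|([A-Za-z_]\w*)|(\s+|.)/g;

function tokenize(line) {
  const tokens = [];
  for (const match of line.matchAll(TOKEN_PATTERN)) {
    const [text, comment, string, number, word] = match;
    let type = "plain";
    if (comment !== undefined) type = "comment";
    else if (string !== undefined) type = "string";
    else if (number !== undefined) type = "number";
    else if (word !== undefined && KEYWORDS.has(word.toLowerCase())) type = "keyword";
    tokens.push({ type, text });
  }
  return tokens;
}

const highlighted = tokenize("SELECT name FROM t WHERE id = 42 -- filter")
  .filter((token) => token.type !== "plain");
console.log(highlighted);
```

Because strings are matched before identifiers and comments before strings, a comment marker inside a quoted string stays part of the string. The trade-off is the one described above: a pattern like this will misclassify some constructs that a real parser would get right.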

What the Similarity Score Actually Means

The tool shows a similarity percentage between the two texts. This comes up in questions fairly often, so it is worth explaining precisely.

The score is calculated as the number of unchanged lines divided by the total line count of the longer document, expressed as a percentage. So if you have a 100-line original and a 100-line AI version with 80 identical lines, 15 modified lines, and 5 deleted lines replaced by 5 new ones, the similarity score is 80%.

This is a line-level measure, not a semantic one. Two documents that say the same thing in different words will score low. Two documents that use the same sentence structure but swap individual terms will score high. It is a structural similarity metric, most useful for catching unexpected large-scale changes: if you submitted a 200-line function to AI for minor cleanup and get back a 60% similarity score, something significant changed and you should read carefully before accepting it.
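
The calculation can be sketched in a few lines. This version counts unchanged lines as the length of the longest common subsequence and rounds to a whole percent; the function name similarityPercent and the rounding are assumptions for the example.

```javascript
// Similarity as described above: unchanged lines / line count of the longer text.
// `similarityPercent` is a hypothetical name; rounding is an assumption.
function similarityPercent(textA, textB) {
  const a = textA.split("\n");
  const b = textB.split("\n");
  // LCS length with two rolling rows: the number of lines that survive unchanged.
  let previous = new Int32Array(b.length + 1);
  for (let i = 1; i <= a.length; i++) {
    const current = new Int32Array(b.length + 1);
    for (let j = 1; j <= b.length; j++) {
      current[j] = a[i - 1] === b[j - 1]
        ? previous[j - 1] + 1
        : Math.max(previous[j], current[j - 1]);
    }
    previous = current;
  }
  const unchanged = previous[b.length];
  const longer = Math.max(a.length, b.length);
  return Math.round((unchanged / longer) * 100);
}

// 100 lines each; the first 80 are identical, the last 20 differ.
const originalLines = Array.from({ length: 100 }, (_, k) => `line ${k}`);
const revisedLines = originalLines.map((line, k) => (k < 80 ? line : `changed ${k}`));

console.log(similarityPercent(originalLines.join("\n"), revisedLines.join("\n"))); // logs 80
```

Only the final row of the table is needed for the score, so this sketch keeps two rows in memory instead of the full matrix.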

Practical Workflow: How I Use This Now

I want to be specific here because "compare before you update" is easy to say and easy to forget under time pressure. Here is the actual process I follow, which takes about 60 seconds per AI interaction:

1. Write or identify the thing I want to improve. A SQL query, a function, a paragraph in documentation, an email draft. Whatever it is, this is Version A.
2. Submit to AI and copy the response. This is Version B.
3. Open the diff tool. Paste A on the left, B on the right. Click Compare.
4. Read only the red lines first. Red means removed. Before I evaluate whether the AI's additions are good, I make sure I understand what it took out. Deletions are where silent mistakes hide.
5. Check the similarity score. For a minor grammar pass I expect 85%+. For a structural refactor I might expect 60–70%. If I asked for light editing and get 40% similarity, I read everything again.
6. Accept or reject. Sometimes the AI change is exactly right. Sometimes it removed something I need. Sometimes I take the AI version as a starting point and manually restore specific parts from the original.

This process has caught problems I would have missed: a removed input validation check in a Node.js route, a deleted "not" that reversed the meaning of a sentence in a user-facing error message, a stripped-out comment that explained why a particular magic number was what it was.

When This Tool Helps Beyond AI Review

The database incident was the origin of the tool, but once it existed, I started using it for other things.

Comparing Two Drafts from Different Team Members

We review each other's documentation. When two people write a version of the same section independently, pasting both into the diff tool immediately shows where their approaches diverge — which sentences are shared, which are different, which facts appear in one version but not the other. It is faster than reading both documents sequentially and trying to hold them in working memory.

Verifying Translations

We translate some technical content. When I compare a translation against its reviewed revision, the diff shows every term that changed during review, which makes it much easier to understand what the reviewer was correcting and to apply those corrections consistently to other documents.

Checking Configuration Files Before Deployment

Environment configs are dangerous for the same reason SQL queries are: a small silent change can have large consequences. I diff the config from staging against the config for production before every deploy. Keys that differ show up immediately.

Academic and Writing Review

I write technical posts and documentation. When I have two versions of a section — my draft and an edited version — the diff shows me exactly what the editor changed. This is more useful than reading the edited version alone, because I can see the editor's judgment at the level of individual word choices.

Frequently Asked Questions

How is this different from running git diff?

Git diff is built around files in a repository and shows changes across commits, branches, or the working tree. It can also compare two arbitrary files with git diff --no-index, but you still need both texts saved to disk and a command line. This tool works on any two pieces of text you paste directly: no repository, no commit history, no command line required. If you are reviewing an AI suggestion before it ever touches your codebase, or comparing two drafts of a document that exist only in your clipboard, git diff is not the right tool. This is.

Can I compare minified JavaScript or CSS?

You can, but the result will be one long line versus another long line — not very readable. For minified code, the character-level diff will show you what changed within that single line, but the visual output is dense. I would recommend formatting or pretty-printing the code before diffing if readability matters.

What happens with files larger than 200,000 characters?

The limit exists because LCS computation is O(m×n) in both time and memory, where m and n are the number of lines in each input (or the number of characters, for the diff inside a changed line). At 200,000 characters, the computation is fast on a modern browser. Above that, the matrix would be large enough to cause slow performance or tab crashes on some machines. For the vast majority of real-world use cases (source files, documents, query strings) 200,000 characters is more than enough. That is roughly 4,000 to 5,000 lines of typical code.
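
A back-of-the-envelope calculation shows how quickly the table grows. This sketch assumes a full (m+1) × (n+1) table with 4 bytes per cell, as an Int32Array would use; the tool's real storage may differ, and the function name is made up for the example.

```javascript
// Rough memory estimate for a full LCS table.
// Assumes 4 bytes per cell (Int32Array); this is an assumption, not a measurement.
function lcsTableBytes(m, n, bytesPerCell = 4) {
  return (m + 1) * (n + 1) * bytesPerCell;
}

const toMiB = (bytes) => (bytes / (1024 * 1024)).toFixed(1);

console.log(toMiB(lcsTableBytes(500, 500)));   // two 500-line files: about 1.0 MiB
console.log(toMiB(lcsTableBytes(5000, 5000))); // two 5,000-line files: about 95.4 MiB
```

A tenfold increase in line count means a hundredfold increase in table size, which is why a hard input limit is a reasonable safeguard for code that runs in a browser tab.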

How accurate is the character-level diff for non-English text?

The diff algorithm operates on Unicode characters, not bytes, so it handles Arabic, Urdu, Chinese, Korean, and other scripts correctly. Right-to-left text renders correctly in modern browsers. The diff result itself — which characters were added or removed — is accurate regardless of language. The visual display inherits whatever RTL/LTR behavior the browser applies to the text.
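
One detail worth illustrating: JavaScript strings are sequences of UTF-16 code units, so how a line is split into characters matters for text outside the Basic Multilingual Plane, such as emoji. The sketch below shows the difference; the helper name toCodePoints is an assumption, and the tool's own splitting code may look different.

```javascript
// Splitting text into comparable units.
// String iteration (used by Array.from) walks code points, while split("")
// walks UTF-16 code units and cuts surrogate pairs in half.
function toCodePoints(text) {
  return Array.from(text);
}

const sample = "سلام 😀"; // four Arabic letters, a space, one emoji

console.log(sample.split("").length);     // logs 7 (the emoji counts as two units)
console.log(toCodePoints(sample).length); // logs 6
```

Splitting by code point keeps an emoji or a rare CJK character from being reported as two separate changes. Combined sequences, such as a base letter followed by a combining mark, are still several code points each.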

Is there any logging or analytics on what I paste?

No. The comparison runs entirely in your browser's JavaScript engine. The text you paste never leaves your machine. There are no API calls, no server-side logging, no telemetry on input content. After the page loads, you can disable your network connection and the tool continues to work exactly the same way.

Does the similarity score account for whitespace differences?

By default, yes — a line with a trailing space is treated as different from the same line without one. This matters for code where indentation is significant (Python, for example). If you are comparing documents where whitespace differences are noise, you can normalize your inputs before pasting by running them through a text editor's "trim trailing whitespace" function.
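
If your editor has no such function, the same normalization is a short snippet. The helper below is an illustration, not a feature of the tool; it also converts Windows line endings, which is an extra assumption about what counts as noise.

```javascript
// Strip trailing spaces and tabs from every line before comparing.
// `trimTrailingWhitespace` is an illustrative helper, not part of the tool.
function trimTrailingWhitespace(text) {
  return text
    .replace(/\r\n/g, "\n")    // normalize Windows line endings
    .replace(/[ \t]+$/gm, ""); // drop spaces and tabs at the end of each line
}

const messy = "def total(x):  \n    return x * 2\t\n";
console.log(JSON.stringify(trimTrailingWhitespace(messy)));
```

Leading whitespace is left alone on purpose, so indentation that carries meaning (as in Python) still shows up in the diff.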

A Final Note on Trusting AI Tools

I want to be clear that I still use AI assistants every day. The database incident did not make me afraid of them. What it did was make me more precise about how I use them.

The mistake was not asking AI for help. The mistake was treating the output as a reviewed and verified change rather than as a draft that needed the same scrutiny I would give any other code before running it in production. A diff tool is just a way of making that scrutiny faster and more reliable.

If you use AI for anything where the content matters — code that runs in production, documents that go to clients, essays that get submitted, contracts that get signed — build in a comparison step. It takes less than a minute. It has saved me from at least a dozen mistakes since November.

Try the Text & Code Diff Comparator

Paste your original in Version A, paste the AI output in Version B, and see every line that was added, removed, or changed — with character-level highlighting inside modified lines.

No account needed. Runs in your browser. Your text stays on your machine.