Tamper detection in document archives: how hash chains work
You cannot verify that a document has not been tampered with just by looking at it. A modified PDF looks identical to the original. A substituted file has the same filename. A silently edited audit log entry leaves no visible trace.
Tamper detection requires a mechanism that makes modification produce a detectable artifact, ideally one that is verifiable by someone who was not present at the original creation. Hash chains are one of the core mechanisms that achieve this.
What a cryptographic hash does
A cryptographic hash function takes an input of arbitrary length and produces a fixed-length output (the digest) with two critical properties:
- Determinism: the same input always produces the same output
- Avalanche effect: any change to the input, even a single bit, produces a completely different output
SHA-256 produces a 256-bit (32-byte) digest. For any practical input, it is computationally infeasible to:
- Find two different inputs that produce the same output (collision resistance)
- Find any input that produces a given output (preimage resistance)
- Given one input, find a different input that produces the same output (second preimage resistance)
These properties mean a SHA-256 hash is a reliable fingerprint of a document. If the hash matches, the document is, for all practical purposes, identical to what was hashed. If the hash does not match, something changed.
using System.Security.Cryptography;

var hash = SHA256.HashData(File.ReadAllBytes("invoice.pdf"));
Console.WriteLine(Convert.ToHexString(hash));
// Prints 64 hex chars, e.g. A3F8C2...1B7D (always the same for this file, different if any byte changes)
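To make the two properties concrete, here is a small sketch that checks one published SHA-256 test vector and counts how many output bits change when a single character of the input changes. The names HashDemo, Sha256Hex and DifferingBits are illustrative and not part of any library.

```csharp
using System;
using System.Numerics;
using System.Security.Cryptography;
using System.Text;

public static class HashDemo
{
    // Hex-encoded SHA-256 of a UTF-8 string (uppercase, 64 characters).
    public static string Sha256Hex(string text) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)));

    // Number of bit positions in which two equal-length digests differ.
    public static int DifferingBits(byte[] a, byte[] b)
    {
        var count = 0;
        for (var i = 0; i < a.Length; i++)
            count += BitOperations.PopCount((uint)(a[i] ^ b[i]));
        return count;
    }

    public static void Run()
    {
        // Determinism: "abc" always hashes to the standard test vector.
        Console.WriteLine(Sha256Hex("abc"));
        // BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD

        // Avalanche: one changed character alters a large share of the 256 bits.
        var h1 = SHA256.HashData(Encoding.UTF8.GetBytes("invoice total: 100.00"));
        var h2 = SHA256.HashData(Encoding.UTF8.GetBytes("invoice total: 900.00"));
        Console.WriteLine($"{DifferingBits(h1, h2)} of 256 bits differ");
    }
}
```

For a well-behaved hash function the second number should land somewhere near half of the 256 bits, which is why a partial match between two digests tells you nothing about how similar the inputs were.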
Why a single hash is not enough
Hashing a document gives you a fingerprint at one point in time. But who stores that fingerprint, and what prevents someone from replacing both the document and the stored hash?
If you store the hash in the same system as the document, and you control that system, a determined attacker (or an internal actor covering tracks) can replace both. The hash still matches the new document. The tampering is undetectable.
This is why tamper detection requires an external anchor: a hash stored somewhere the custodian cannot unilaterally modify. An RFC 3161 timestamp is the standard mechanism: it embeds the document hash inside a token signed by a third-party Time Stamping Authority (TSA). The custodian cannot forge or alter the token without the TSA’s private key.
What a hash chain adds
A hash chain extends this to a sequence of events. Instead of hashing only the document, you hash each audit entry and include the previous entry’s hash in the current one.
Hash(Entry N) = SHA256( EventData(N) + Hash(Entry N-1) )
This creates a chain where each entry commits to all previous entries. If you change Entry 3 in a chain of 10 entries, Hash(Entry 3) changes. Because Entry 4 includes Hash(Entry 3), Hash(Entry 4) changes. And so on through to Entry 10.
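You cannot silently modify a past entry. Unless every later hash is recomputed too, the chain will not validate, and a wholesale recomputation is exposed by any later hash that was recorded outside the custodian’s control.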
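The propagation described above can be seen in a short sketch that chains ten plain strings and then edits the third one. ChainDemo, BuildHashes and FirstDifference are illustrative names, and the "event N" strings stand in for real audit data.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class ChainDemo
{
    // Returns one hash per event; each hash covers the event text
    // plus the previous hash (64 zeros for the first entry).
    public static List<string> BuildHashes(IReadOnlyList<string> events)
    {
        var hashes = new List<string>(events.Count);
        var previous = new string('0', 64);
        foreach (var e in events)
        {
            previous = Convert.ToHexString(
                SHA256.HashData(Encoding.UTF8.GetBytes(e + "|" + previous)));
            hashes.Add(previous);
        }
        return hashes;
    }

    // Zero-based index of the first position where the lists differ, or -1.
    public static int FirstDifference(IReadOnlyList<string> a, IReadOnlyList<string> b)
    {
        for (var i = 0; i < Math.Min(a.Count, b.Count); i++)
            if (a[i] != b[i]) return i;
        return -1;
    }

    public static void Run()
    {
        var events = new List<string>();
        for (var n = 1; n <= 10; n++) events.Add($"event {n}");
        var original = BuildHashes(events);

        events[2] = "event 3 (edited)";   // tamper with Entry 3
        var tampered = BuildHashes(events);

        // Entries 1 and 2 keep their hashes; Entry 3 and everything after change.
        Console.WriteLine(FirstDifference(original, tampered)); // 2 (zero-based)
        Console.WriteLine(original[9] == tampered[9]);          // False
    }
}
```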
Here is a minimal implementation:
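using System.Security.Cryptography;
using System.Text;

public sealed record AuditEntry(
    string EventType,
    string Actor,
    DateTimeOffset Timestamp,
    string DocumentHash,
    string PreviousEntryHash)
{
    // Hash over every field, including the previous entry's hash.
    // Assumes field values never contain '|'; otherwise escape or length-prefix them.
    public string ComputeHash()
    {
        var raw = $"{EventType}|{Actor}|{Timestamp:O}|{DocumentHash}|{PreviousEntryHash}";
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(raw)));
    }
}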
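
// The methods below are assumed to live in a static helper class.
// Appends an entry and returns its hash (the new head of the chain).
public static string AppendEntry(IList<AuditEntry> chain, string eventType,
    string actor, byte[] documentBytes)
{
    // The first entry links to 64 zeros because there is no previous hash.
    var previousHash = chain.Count == 0
        ? new string('0', 64)
        : chain[^1].ComputeHash();
    var docHash = Convert.ToHexString(SHA256.HashData(documentBytes));
    var entry = new AuditEntry(eventType, actor, DateTimeOffset.UtcNow, docHash, previousHash);
    chain.Add(entry);
    return entry.ComputeHash();
}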
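
// Verifies every link. Pass the separately stored head hash to also cover the last entry.
public static bool VerifyChain(IList<AuditEntry> chain, string? expectedHeadHash = null)
{
    var expectedPrevious = new string('0', 64);
    foreach (var entry in chain)
    {
        if (entry.PreviousEntryHash != expectedPrevious) return false;
        expectedPrevious = entry.ComputeHash();
    }
    // No later entry commits to the final one, so compare it with the stored head hash if given.
    return expectedHeadHash is null || expectedPrevious == expectedHeadHash;
}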
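Verification is O(n) and requires no external calls. Anyone with the chain can run VerifyChain and confirm that every entry still links to its predecessor. Because no later entry commits to the last one, the final entry is only protected if its hash (the value AppendEntry returns) is recorded or anchored outside the chain and compared during verification.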
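One limit of a link-only check is easiest to see in code: the contents of the last entry are never examined, because nothing after it commits to them. The sketch below uses a simplified stand-in record (Link, not the AuditEntry above) and illustrative helper names to show that comparing the recomputed head hash with a separately recorded one closes that gap.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Simplified stand-in for an audit entry: a payload plus the previous hash.
public sealed record Link(string Data, string PreviousHash)
{
    public string Hash() => Convert.ToHexString(
        SHA256.HashData(Encoding.UTF8.GetBytes($"{Data}|{PreviousHash}")));
}

public static class HeadHashDemo
{
    private static readonly string Genesis = new string('0', 64);

    public static List<Link> Build(params string[] data)
    {
        var chain = new List<Link>();
        foreach (var d in data)
            chain.Add(new Link(d, chain.Count == 0 ? Genesis : chain[^1].Hash()));
        return chain;
    }

    // Checks the links only. Returns the recomputed head hash, or null if a link is broken.
    public static string? VerifyLinks(IReadOnlyList<Link> chain)
    {
        var expected = Genesis;
        foreach (var link in chain)
        {
            if (link.PreviousHash != expected) return null;
            expected = link.Hash();
        }
        return expected;
    }

    public static void Run()
    {
        var chain = Build("created", "converted", "archived");
        var recordedHead = chain[^1].Hash();   // stored or timestamped elsewhere

        chain[^1] = chain[^1] with { Data = "deleted" };   // tamper with the LAST entry

        var head = VerifyLinks(chain);
        Console.WriteLine(head != null);          // True: every link still matches
        Console.WriteLine(head == recordedHead);  // False: the head comparison catches it
    }
}
```

Tampering with any earlier entry breaks a link and makes VerifyLinks return null, so the head comparison is only needed for the newest entry.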
Anchoring the chain with a timestamp
A verified hash chain proves internal consistency: no entry was modified after the chain was built. But it does not prove when the chain was built.
If you build a fraudulent chain today and claim it represents events from three years ago, the chain itself will verify correctly. The hash chain does not have a clock.
This is where RFC 3161 timestamps anchor the chain to real time. At key points in the chain (at minimum: when the chain is finalized; ideally at each significant event), request a timestamp from a qualified TSA. The TSA embeds the current entry’s hash in a signed token with a trusted timestamp.
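Now the chain has an external anchor. The token proves that the entry hash, and therefore every entry before it, existed no later than the time the TSA recorded. A chain fabricated today can only obtain tokens dated today, so it cannot be passed off as three years old. The time comes from the TSA’s clock, not the custodian’s.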
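// After appending an entry, request a timestamp for the entry hash.
// RequestTimestampAsync and tsaUrl stand for your own RFC 3161 client and TSA endpoint.
var entryHash = AppendEntry(chain, "created", "api:tenant-1", documentBytes);
var entryHashBytes = Convert.FromHexString(entryHash);
var timestampToken = await RequestTimestampAsync(entryHashBytes, tsaUrl);
// Store the token alongside the chain entry, keyed by the entry hash.
// AuditEntry has no token field, and the token must stay outside the hashed data.
// timestampTokens is an illustrative Dictionary<string, byte[]> kept next to the chain.
timestampTokens[entryHash] = timestampToken;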
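RequestTimestampAsync is not defined in the snippet above. One way to sketch it, assuming the System.Security.Cryptography.Pkcs NuGet package is referenced and the TSA accepts RFC 3161 requests over HTTP POST, is shown below. The class name TimestampClient and the BuildRequest helper are illustrative. Real TSAs may also require authentication or a specific policy OID, which this sketch leaves out.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Security.Cryptography;
using System.Security.Cryptography.Pkcs;
using System.Threading.Tasks;

public static class TimestampClient
{
    private static readonly HttpClient Http = new HttpClient();

    // Builds an RFC 3161 request for an already computed SHA-256 digest.
    public static Rfc3161TimestampRequest BuildRequest(byte[] sha256Digest) =>
        Rfc3161TimestampRequest.CreateFromHash(
            sha256Digest,
            HashAlgorithmName.SHA256,
            requestSignerCertificates: true);

    // Sends the request to the TSA and returns the encoded token for storage.
    public static async Task<byte[]> RequestTimestampAsync(byte[] sha256Digest, string tsaUrl)
    {
        var request = BuildRequest(sha256Digest);

        using var content = new ByteArrayContent(request.Encode());
        content.Headers.ContentType = new MediaTypeHeaderValue("application/timestamp-query");

        using var response = await Http.PostAsync(tsaUrl, content);
        response.EnsureSuccessStatusCode();
        var responseBytes = await response.Content.ReadAsByteArrayAsync();

        // ProcessResponse throws if the TSA rejected the request or the
        // response does not correspond to this request.
        var token = request.ProcessResponse(responseBytes, out _);
        return token.AsSignedCms().Encode();
    }
}
```

Asking for the signer certificates keeps the TSA’s certificate inside the token, which helps later offline verification.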
An auditor verifying the chain can:
- Verify the hash chain is internally consistent (no modifications)
- Verify each timestamp token against the TSA’s public certificate (no backdating)
- Verify the document hash in each entry against the current document (no substitution)
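The timestamp check in the list above could look roughly like this, again assuming the System.Security.Cryptography.Pkcs package. VerifyToken is an illustrative name. The sketch checks the token’s signature and that it covers the given entry hash, and leaves the decision about which TSA certificates to trust to the auditor’s own policy.

```csharp
using System;
using System.Security.Cryptography;
using System.Security.Cryptography.Pkcs;
using System.Security.Cryptography.X509Certificates;

public static class TimestampVerifier
{
    // Returns the TSA-asserted time if the token decodes, is correctly signed,
    // and covers exactly the given entry hash; otherwise returns null.
    public static DateTimeOffset? VerifyToken(byte[] tokenBytes, string entryHashHex)
    {
        if (!Rfc3161TimestampToken.TryDecode(tokenBytes, out var token, out _) || token is null)
            return null;

        byte[] entryHash;
        try { entryHash = Convert.FromHexString(entryHashHex); }
        catch (FormatException) { return null; }

        // Checks the signature and that the hash inside the token equals entryHash.
        // It does not by itself establish that the signer chains to a TSA you trust.
        if (!token.VerifySignatureForHash(entryHash, HashAlgorithmName.SHA256,
                out X509Certificate2? signer) || signer is null)
            return null;

        // Trust decision goes here (assumption: auditor-specific policy), for example
        // building an X509Chain for `signer` against a pinned set of TSA root certificates.

        return token.TokenInfo.Timestamp;
    }
}
```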
Hash chains vs. other integrity mechanisms
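Append-only logs (e.g., database tables with no DELETE): Prevent deletion via access controls, but are only as trustworthy as the access control system. A database administrator can bypass them. A hash chain does not depend on access policy: anyone holding the chain can recompute it, and once its head hash is anchored externally, even an administrator cannot rewrite history without detection.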
Digital signatures on individual documents: A signature proves a document was signed by a specific key at signing time. It does not record what happened to the document afterwards. Chain of custody requires recording post-signing events (transmission, conversion, archiving), not just the signing moment.
Blockchain: Uses hash chains as a core primitive, but adds distributed consensus to prevent any single party from rewriting history. For document archives, a local hash chain anchored by external RFC 3161 timestamps provides comparable tamper evidence without the complexity of a distributed network, as long as verifiers accept the TSA as a trusted third party. The distinction matters: you do not need consensus between strangers to prove a document was not modified; you need an external anchor you do not control.
WORM storage (Write Once Read Many): Prevents modification at the storage layer. The medium, or the storage system’s retention lock, does not allow written data to be overwritten. This is complementary to hash chains, not a replacement. WORM proves the stored bits have not changed. A hash chain proves what sequence of events produced those bits. Both together provide stronger guarantees than either alone.
What verification looks like in practice
A complete verification workflow for a document in a hash-chained archive:
1. Retrieve the document and the audit trail from the archive
2. Compute SHA-256 of the document; confirm it matches the documentHash in the latest chain entry
3. Walk the chain backwards, recomputing each entryHash and confirming it matches the previousEntryHash in the next entry
4. For each entry with a timestamp token, verify the token against the TSA’s certificate and confirm the hash in the token matches the computed entryHash
5. Confirm the document hash in step 2 matches the document hash in the earliest entry that covers the document’s final form
If all five steps pass, the document is verified as unmodified since the earliest entry that recorded its final form, the recorded audit entries are intact and in their original order, and the timestamps provide externally verifiable time anchors.
This verification requires: the document, the audit trail, the timestamp tokens, and the TSA’s public certificates. No network access. No dependency on the custodian’s systems being operational. The evidence is self-contained.
Tamper detection as infrastructure
Many organizations skip hash chains because they look like an advanced feature. They are not. SHA-256 is available in every modern runtime. The chain construction is a few dozen lines of code. The benefit is that you can prove document integrity to a skeptical third party without asking them to trust your organization.
For regulated industries (finance, healthcare, insurance, government), that proof is not optional. For any organization that wants to present digital documents as evidence in a dispute, it is the baseline.
SealDoc implements hash-chained audit trails for every document in its pipeline. The chain is exported in the Legal Evidence Pack and is verifiable offline. For organizations that need to build their own evidence infrastructure, the SealDoc API exposes the audit trail and timestamp tokens as structured data you can incorporate into your own verification workflows.