Hash chains and tamper detection: how one changed byte invalidates an entire archive

When someone says a document archive is “tamper-evident,” the claim is specific: any modification to any record produces a detectable inconsistency. Not tamper-proof — that is a different and stronger claim. Tamper-evident means that if the archive has been modified, you can find the modification.

Hash chains are the standard construction for achieving this property. They are not new — the concept predates blockchain by decades — but they are widely misunderstood, which leads to both overconfidence and underuse.

This article explains how hash chains work, what they guarantee, what they do not guarantee, and why the specific implementation choices matter.

What a cryptographic hash is

A cryptographic hash function takes an input of arbitrary length and produces a fixed-length output, called the hash or digest. SHA-384 produces a 384-bit (48-byte) output, typically represented as 96 hex characters.

Three properties make hash functions useful for tamper detection:

Determinism. The same input always produces the same output. If you hash the same document twice, you get the same hash.

Avalanche effect. A small change in the input produces a completely different output. Changing one byte in a 10,000-byte document produces a hash that bears no resemblance to the original. There is no way to predict what the new hash will look like.

One-way. Given a hash, it is computationally infeasible to reconstruct the input. You cannot work backwards from the hash to the document.

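The property that makes hash chains binding is a close relative of the third: collision resistance. It is computationally infeasible to find two different inputs that produce the same hash, so once you publish or record a hash, you have committed to the content that produced it. One-wayness means that the commitment does not, by itself, reveal the content.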
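The properties above can be observed directly with Python's standard hashlib module. This is a minimal sketch: the 10,000-byte input is an arbitrary stand-in for a document, and the helper name sha384_hex is chosen here for illustration.

```python
import hashlib

def sha384_hex(data: bytes) -> str:
    """Return the SHA-384 digest of data as a hex string."""
    return hashlib.sha384(data).hexdigest()

# A 10,000-byte stand-in document (the content is arbitrary).
original = bytes(10_000)

# Copy it and flip a single bit in one byte.
tampered = bytearray(original)
tampered[5_000] ^= 0x01
tampered = bytes(tampered)

h1 = sha384_hex(original)
h2 = sha384_hex(tampered)

# Count how many of the 384 output bits differ between the two digests.
differing_bits = bin(int(h1, 16) ^ int(h2, 16)).count("1")

print(len(h1))                     # 96 hex characters = 48 bytes
print(h1 == sha384_hex(original))  # True: same input, same digest
print(h1 == h2)                    # False: one bit of input changed
print(differing_bits)              # expect roughly half of the 384 bits
```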

The basic construction

A hash chain links a sequence of events so that each event depends on all previous events.

In SealDoc’s implementation, an evidence event contains:

The event type (e.g. vault.ingested, anchor.rfc3161)
A sequence number
A timestamp
Details specific to the event (as canonical JSON)
The hash of the previous event
The hash of this event, computed over all the above fields

The hash of each event includes the hash of the previous event. This means that to reproduce the hash of event N, you need the exact content of event N and the exact hash of event N-1. The hash of event N-1 depended on event N-2, and so on back to the first event.

The first event uses a fixed genesis value as its “previous hash” — a well-known constant that everyone can verify.

This creates a chain where:

Hash(Event 1) = H("vault.ingested" | seq:1 | ts:... | details:... | genesis)
Hash(Event 2) = H("anchor.rfc3161" | seq:2 | ts:... | details:... | Hash(Event 1))
Hash(Event 3) = H("vault.storage-lock.applied" | seq:3 | ts:... | details:... | Hash(Event 2))
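As a rough sketch, the construction could look like the following in Python. The field names, the all-zero genesis constant, the timestamps, and the choice to hash one canonical JSON object (rather than a "|"-joined string) are assumptions made for illustration, not SealDoc's actual wire format.

```python
import hashlib
import json

# Hypothetical genesis constant; any fixed, published value would do.
GENESIS = "0" * 96

def event_hash(event_type, seq, ts, details, prev_hash):
    """Hash one event over all of its fields plus the previous event's hash."""
    # Canonical JSON: sorted keys and fixed separators, so the same
    # logical content always serializes to the same bytes.
    payload = json.dumps(
        {"eventType": event_type, "sequenceNo": seq, "timestamp": ts,
         "details": details, "previousHash": prev_hash},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha384(payload.encode("utf-8")).hexdigest()

def append_event(chain, event_type, ts, details):
    """Append an event whose hash commits to the current head of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    seq = len(chain) + 1
    event = {
        "eventType": event_type, "sequenceNo": seq, "timestamp": ts,
        "details": details, "previousHash": prev_hash,
        "hash": event_hash(event_type, seq, ts, details, prev_hash),
    }
    chain.append(event)
    return event

chain = []
append_event(chain, "vault.ingested", "2024-01-01T00:00:00Z", {"size": 1024})
append_event(chain, "anchor.rfc3161", "2024-01-01T00:00:05Z", {"tsa": "example"})
append_event(chain, "vault.storage-lock.applied", "2024-01-01T00:00:09Z", {"mode": "COMPLIANCE"})

for ev in chain:
    print(ev["sequenceNo"], ev["previousHash"][:12], ev["hash"][:12])
```

Serializing the whole event as one JSON object avoids the ambiguity that a plain delimiter can introduce when a field value itself contains the delimiter.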

What happens when you tamper with it

Suppose an adversary wants to modify the details of Event 1. They change a field in the event, then recompute the hash of Event 1. The new hash is different from the original.

But the hash of Event 2 was computed using the original hash of Event 1. The adversary now needs to recompute the hash of Event 2 using the new hash of Event 1. That produces a different hash for Event 2.

Now Event 3 has the wrong previous hash. And so on for every subsequent event.

To make the modification undetectable, the adversary needs to recompute the hashes of every event from the tampered point forward. In a local, self-contained system, this is possible. The chain alone does not prevent this.
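The three stages of the attack can be replayed on a toy chain. This sketch uses simplified events (a body plus a previous hash) and a hypothetical genesis constant; the helper names are illustrative only.

```python
import hashlib
import json

GENESIS = "0" * 96  # hypothetical genesis constant

def digest(fields: dict) -> str:
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha384(canonical.encode("utf-8")).hexdigest()

def seal(bodies):
    """Build a chain, hashing each body together with the previous hash."""
    chain, prev = [], GENESIS
    for body in bodies:
        h = digest({"body": body, "previousHash": prev})
        chain.append({"body": body, "previousHash": prev, "hash": h})
        prev = h
    return chain

def first_broken(chain):
    """Return the 1-based position of the first inconsistent event, or None."""
    prev = GENESIS
    for i, ev in enumerate(chain, start=1):
        if ev["previousHash"] != prev:
            return i
        if digest({"body": ev["body"], "previousHash": ev["previousHash"]}) != ev["hash"]:
            return i
        prev = ev["hash"]
    return None

chain = seal([{"n": 1, "amount": 100}, {"n": 2}, {"n": 3}])
original_head = chain[-1]["hash"]

# Stage A: edit Event 1 and leave everything else alone.
chain[0]["body"]["amount"] = 999
stage_a = first_broken(chain)
print(stage_a)  # 1: the stored hash no longer matches the content

# Stage B: recompute Event 1's hash, but nothing downstream.
chain[0]["hash"] = digest({"body": chain[0]["body"], "previousHash": GENESIS})
stage_b = first_broken(chain)
print(stage_b)  # 2: Event 2 still points at the old hash of Event 1

# Stage C: recompute every hash from the tampered event forward.
resealed = seal([ev["body"] for ev in chain])
print(first_broken(resealed))                 # None: internally consistent
print(resealed[-1]["hash"] == original_head)  # False: the head hash has changed
```

The last line is the hook for external anchoring: a fully rewritten chain verifies internally, but its hashes no longer match anything recorded outside the system.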

This is why anchoring matters.

Anchoring: making the chain externally verifiable

A hash chain that only exists in your own database proves integrity within your own system. An adversary who controls your database can rewrite the whole chain. The chain needs to be anchored to something outside your control.

The standard mechanism is an RFC 3161 timestamp. The flow works like this:

At the point of document creation, compute the hash of the evidence event.
Submit that hash to an independent Time Stamping Authority (TSA).
The TSA signs a response containing your hash, the TSA’s timestamp, and the TSA’s identity. This signed response is the timestamp token.
Store the timestamp token alongside the hash chain.

The timestamp token is signed by the TSA’s private key. The TSA’s certificate chain is publicly verifiable. Anyone with the document, the original hash, and the timestamp token can verify independently that the document existed in that state at that time, without trusting you at all.
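The role the TSA plays can be illustrated with a deliberately simplified model. The ToyTSA class below is hypothetical and is not RFC 3161: a real TSA returns a token signed with its private key that anyone can verify through its certificate chain, whereas this sketch uses an HMAC secret purely to stay within the standard library. What it does preserve is the essential point that the TSA, not the client, decides which time goes into the token.

```python
import hashlib
import hmac
import json

class ToyTSA:
    """Stand-in for a Time Stamping Authority (illustration only, not RFC 3161)."""

    def __init__(self, secret: bytes, clock):
        self._secret = secret
        self._clock = clock  # callable returning the TSA's current time

    def _mac(self, message_hash: str, gen_time: str) -> str:
        msg = json.dumps([message_hash, gen_time]).encode("utf-8")
        return hmac.new(self._secret, msg, hashlib.sha384).hexdigest()

    def stamp(self, message_hash: str) -> dict:
        # The time comes from the TSA's own clock, never from the requester.
        gen_time = self._clock()
        return {"hash": message_hash, "genTime": gen_time,
                "mac": self._mac(message_hash, gen_time)}

    def verify(self, token: dict, message_hash: str) -> bool:
        expected = self._mac(token["hash"], token["genTime"])
        return token["hash"] == message_hash and hmac.compare_digest(expected, token["mac"])

now = {"t": "2024-01-01T00:00:00Z"}  # simulated TSA clock
tsa = ToyTSA(b"tsa-only-secret", lambda: now["t"])

original_hash = hashlib.sha384(b"event 1, original").hexdigest()
token = tsa.stamp(original_hash)

# Later, an adversary rewrites the event and recomputes its hash.
now["t"] = "2025-06-01T00:00:00Z"
forged_hash = hashlib.sha384(b"event 1, modified").hexdigest()

print(tsa.verify(token, original_hash))    # True
print(tsa.verify(token, forged_hash))      # False: the old token covers the old hash
fresh = tsa.stamp(forged_hash)
print(fresh["genTime"])                    # 2025-06-01T00:00:00Z: a new token carries the later time

# Editing the time inside a fresh token invalidates its authentication tag.
backdated = dict(fresh, genTime=token["genTime"])
print(tsa.verify(backdated, forged_hash))  # False
```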

Now the adversary’s problem is harder. To modify Event 1 undetectably, they need to:

Modify the event and recompute all downstream hashes.
Produce a new RFC 3161 timestamp from a trusted TSA for the new hash.

The second step is impossible retroactively. The TSA’s timestamp has a time built into it. A timestamp claiming the modified document existed at the original time would need to be issued by the TSA at that original time, which the adversary cannot arrange after the fact.

This is what “tamper-evident” means in practice: not that modification is prevented at the physical level, but that an undetected modification would require either forging the independent TSA’s signature, which is computationally infeasible without its private key, or obtaining a backdated timestamp, which an honestly operated TSA will not issue.

Why SHA-384 specifically

SHA-256 is the most commonly used hash algorithm for this class of application. SealDoc uses SHA-384 instead, which produces a longer digest (384 bits versus 256 bits).

The practical difference is margin. SHA-256 is currently considered secure, but cryptographic recommendations evolve. SHA-384 belongs to the same SHA-2 family: it is a truncated variant of SHA-512, which uses 64-bit words and a larger internal state than SHA-256. For archives with long retention windows (7 years for EU VAT invoices, up to 20 years for certain public sector records), the additional margin matters.

The longer digest also reduces the (already negligible) probability of hash collisions, where two different documents produce the same hash. For compliance archives, this is a meaningful property: two different invoices should never produce the same identifier.
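To put "already negligible" into numbers, the standard birthday-bound approximation for an ideal hash function says that among n random inputs the chance of any accidental collision is about n^2 / 2^(b+1) for a b-bit digest. The sketch below evaluates that bound for a hypothetical archive of one trillion documents; the archive size is an assumption chosen only for illustration.

```python
import math

def log2_collision_bound(num_documents: int, digest_bits: int) -> float:
    """log2 of the birthday-bound approximation n^2 / 2^(bits + 1)."""
    return 2 * math.log2(num_documents) - (digest_bits + 1)

n = 10**12  # hypothetical archive size: one trillion documents

for bits in (256, 384):
    print(bits, round(log2_collision_bound(n, bits)))
# 256 -177  (about 2^-177)
# 384 -305  (about 2^-305)
```

Both figures are far beyond any practical concern for accidental collisions; the extra 128 bits are margin against future cryptanalysis rather than against chance.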

Verifying the chain

Verification works backwards from the latest event. Given a set of events in sequence order:

For each event, recompute the hash using the stored fields: event type, sequence number, timestamp, details, and previous hash.
Compare the recomputed hash to the stored hash. If they match, the event content has not been changed since the hash was computed.
Check that each event’s previousHash matches the stored hash of the preceding event. If they match, the chain is unbroken.
For any event that carries an RFC 3161 timestamp anchor, verify the timestamp token against the TSA’s certificate chain.

SealDoc’s GET /api/vault/{recordId}/verify endpoint performs these checks and returns a per-event breakdown:

{
  "chainValid": true,
  "eventCount": 3,
  "events": [
    {
      "sequenceNo": 1,
      "eventType": "vault.ingested",
      "hashVerified": true,
      "chainLinkVerified": true,
      "signaturePresent": true,
      "signatureVerified": true
    }
  ]
}

hashVerified means the recomputed hash matches the stored hash. chainLinkVerified means the previous hash pointer is consistent. signatureVerified means the ECDSA signature on the event hash is valid against the known signing key.

A chain where every event returns hashVerified: true and chainLinkVerified: true is internally consistent: no stored event has been changed in isolation. Ruling out a wholesale rewrite of the chain is the job of the anchor timestamps, which also provide the external, TSA-verifiable proof of when each anchored event occurred.
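The hash and chain-link checks can be sketched as a small verifier that returns a report shaped like the response above. This is an illustration under assumptions, not SealDoc's implementation: the stored field names (timestamp, details, previousHash, hash), the canonical JSON encoding, and the genesis constant are hypothetical, and signature and timestamp-token verification are omitted because they need a cryptography library.

```python
import hashlib
import json

GENESIS = "0" * 96  # hypothetical genesis constant
HASHED_FIELDS = ("eventType", "sequenceNo", "timestamp", "details", "previousHash")

def compute_hash(ev: dict) -> str:
    """Recompute an event's hash from its stored fields."""
    fields = {k: ev[k] for k in HASHED_FIELDS}
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha384(canonical.encode("utf-8")).hexdigest()

def verify_chain(events: list) -> dict:
    """Check every event's hash and its link to the preceding event."""
    report, expected_prev = [], GENESIS
    for ev in sorted(events, key=lambda e: e["sequenceNo"]):
        report.append({
            "sequenceNo": ev["sequenceNo"],
            "eventType": ev["eventType"],
            "hashVerified": compute_hash(ev) == ev["hash"],
            "chainLinkVerified": ev["previousHash"] == expected_prev,
        })
        expected_prev = ev["hash"]
    valid = all(r["hashVerified"] and r["chainLinkVerified"] for r in report)
    return {"chainValid": valid, "eventCount": len(report), "events": report}

def make_event(seq, event_type, details, prev):
    """Test helper: build a correctly hashed event."""
    ev = {"eventType": event_type, "sequenceNo": seq,
          "timestamp": "2024-01-01T00:00:00Z", "details": details,
          "previousHash": prev}
    ev["hash"] = compute_hash(ev)
    return ev

e1 = make_event(1, "vault.ingested", {"size": 1024}, GENESIS)
e2 = make_event(2, "anchor.rfc3161", {}, e1["hash"])
clean = verify_chain([e1, e2])
print(clean["chainValid"])  # True

e1["details"]["size"] = 2048  # tamper with stored content
result = verify_chain([e1, e2])
print(result["chainValid"])                      # False
print(result["events"][0]["hashVerified"])       # False: content no longer matches hash
print(result["events"][1]["chainLinkVerified"])  # True: the stored pointer is unchanged
```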

What the chain does not guarantee

Tamper-evidence is a specific and bounded property. It is worth being precise about what it does and does not cover.

It does not prevent deletion, and it only partly detects it. If Event 2 in a five-event chain is removed, the chain breaks, because Event 3’s previous hash no longer matches its predecessor. But if Events 4 and 5 are dropped from the end, the remaining three events still verify, and the truncation is visible only if the verifier knows there should be five events, not three. For this reason, retention enforcement operates at the storage layer separately from the hash chain. MinIO Object Lock COMPLIANCE mode is designed to prevent deletion via the S3 API before the retention period expires.
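The difference between removing an event from the middle of a chain and cutting events off the end is easy to demonstrate. The sketch below uses a deliberately minimal event layout and a hypothetical genesis constant.

```python
import hashlib

GENESIS = "0" * 96  # hypothetical genesis constant

def link_hash(seq, payload, prev):
    return hashlib.sha384(f"{seq}|{payload}|{prev}".encode("utf-8")).hexdigest()

def build(payloads):
    chain, prev = [], GENESIS
    for seq, payload in enumerate(payloads, start=1):
        h = link_hash(seq, payload, prev)
        chain.append({"seq": seq, "payload": payload, "prev": prev, "hash": h})
        prev = h
    return chain

def is_consistent(chain):
    """Check hashes and links only; knows nothing about the expected length."""
    prev = GENESIS
    for ev in chain:
        if ev["prev"] != prev or link_hash(ev["seq"], ev["payload"], ev["prev"]) != ev["hash"]:
            return False
        prev = ev["hash"]
    return True

chain = build(["a", "b", "c", "d", "e"])
without_middle = chain[:1] + chain[2:]  # Event 2 removed
truncated = chain[:3]                   # Events 4 and 5 removed

print(is_consistent(without_middle))  # False: Event 3 points at a missing hash
print(is_consistent(truncated))       # True: a shorter chain is still a valid chain

# Truncation shows up only against outside knowledge, such as a head hash
# (or an event count) recorded somewhere the adversary cannot rewrite.
known_head = chain[-1]["hash"]
print(truncated[-1]["hash"] == known_head)  # False
```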

It does not prevent the source system from lying. If the document was corrupted before it entered the pipeline, the hash chain commits to that corrupted version. The chain proves integrity from the point of ingestion forward, not from the point of creation. This is why validation — PDF/A conformance checking, Factur-X schema validation — happens before the hash is committed.

The chain is only as strong as its anchors. A chain with no external RFC 3161 anchors is a self-sealing system. It proves internal consistency but not temporal binding. For any serious compliance use case, timestamps anchored to an independent TSA are necessary.

The practical upshot for engineers

If you are building a document pipeline and you need tamper-evidence, the implementation is straightforward:

At ingestion, compute a hash of the document using SHA-256 or SHA-384.
Submit the hash to an RFC 3161 TSA and store the returned timestamp token.
For every subsequent event, link the event hash to the previous event hash.
Store the chain in append-only storage (no updates or deletes on the events table).
Expose a verify endpoint that recomputes and checks the chain on demand.
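For the append-only storage step, one lightweight option is to have the database itself reject updates and deletes. The sketch below uses SQLite triggers through Python's sqlite3 module; the table layout and the placeholder hash values are hypothetical. Triggers like these guard against accidental or application-level writes, not against an administrator who can drop them, which is one more reason external anchoring is still needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    sequence_no   INTEGER PRIMARY KEY,
    event_type    TEXT NOT NULL,
    previous_hash TEXT NOT NULL,
    hash          TEXT NOT NULL UNIQUE
);
-- Reject any attempt to change or remove an existing row.
CREATE TRIGGER events_no_update BEFORE UPDATE ON events
BEGIN
    SELECT RAISE(ABORT, 'events is append-only');
END;
CREATE TRIGGER events_no_delete BEFORE DELETE ON events
BEGIN
    SELECT RAISE(ABORT, 'events is append-only');
END;
""")

# Inserts are still allowed (placeholder hash values for illustration).
conn.execute("INSERT INTO events VALUES (1, 'vault.ingested', 'genesis', 'abc123')")

def is_rejected(sql: str) -> bool:
    """Return True if the database refuses to run the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.DatabaseError:
        return True

update_rejected = is_rejected("UPDATE events SET event_type = 'x' WHERE sequence_no = 1")
delete_rejected = is_rejected("DELETE FROM events WHERE sequence_no = 1")
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

print(update_rejected)  # True
print(delete_rejected)  # True
print(row_count)        # 1
```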

The conceptual complexity is low. The implementation complexity comes from operational concerns: key management, TSA reliability, chain migration if you change the hash algorithm, and handling the case where TSA submission fails mid-transaction.

SealDoc handles these concerns as infrastructure, so pipelines that route documents through the API get the chain without implementing it. But if you are building it yourself, the above is the minimum viable construction.

The SealDoc Evidence Vault produces a hash-chained event log for every document, with RFC 3161 timestamp anchors from a trusted Time Stamping Authority. The chain can be checked via GET /api/vault/{recordId}/verify, and the timestamp tokens can be verified independently against the TSA’s certificate chain, without trusting the SealDoc infrastructure.