Here's a question that sounds simple: "How big was my Docker image three months ago?"
If you were logging image sizes in CI, you might have a number. But which layer caused the 200MB increase between February and March? What Dockerfile change was responsible? When exactly did someone add that bloated dev dependency? Your CI logs have point-in-time snapshots, not a causal story.
And if you weren't capturing sizes all along, you can't recover them—not from Git history, not from anywhere—unless you rebuild the image from each historical point. When you do, you might get a different answer than you would have gotten three months ago.
This is the fundamental weirdness at the heart of Docker image archaeology, and it's what made building Docker Time Machine technically interesting. The tool walks through your Git history, checks out each commit, builds the Docker image from that historical state, and records metrics—size, layer count, build time. Simple in concept. Philosophically treacherous in practice.
Consider a Dockerfile from six months ago:
```dockerfile
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y nginx
```

What's the image size? Depends when you build it. ubuntu:22.04 today has different security patches than six months ago. The nginx package has been updated. The apt repository indices have changed. Build this Dockerfile today and you'll get a different image than you would have gotten in the past.
The tool makes a pragmatic choice: it accepts this irreproducibility. When it checks out a historical commit and builds the image, it's not recreating "what the image was"—it's creating "what the image would be if you built that Dockerfile today." For tracking Dockerfile-induced bloat (adding dependencies, changing build patterns), this is actually what you want. For forensic reconstruction, it's fundamentally insufficient.
The implementation leverages Docker's layer cache:
```go
opts := build.ImageBuildOptions{
	NoCache:    false, // Reuse cached layers when possible
	PullParent: false, // Don't pull newer base images mid-analysis
}
```

This might seem problematic—if you're reusing cached layers from previous commits, are you really measuring each historical state independently?
Here's the key insight: caching doesn't affect size measurements. A layer is 50MB whether Docker executed the RUN command fresh or pulled it from cache. The content is identical either way—that's the whole point of content-addressable storage.
Caching actually improves consistency. Consider two commits with identical RUN apk add nginx instructions. Without caching, both execute fresh, hitting the package repository twice. If a package was updated between builds (even seconds apart), you'd get different layer sizes for identical Dockerfile instructions. With caching, the second build reuses the first's layer—guaranteed identical, as it should be.
The only metric affected is build time, which is already disclaimed as "indicative only."
Docker layers have content-addressable identifiers—SHA256 hashes of their contents. Change one byte, get a different hash. This creates a problem for any tool trying to track image evolution: how do you identify "the same layer" across commits?
You can't use the hash. Two commits with identical RUN apt-get install nginx instructions will produce different layer hashes if any upstream layer changed, if the apt repositories served different package versions, or if the build happened on a different day (some packages embed timestamps).
The solution I landed on identifies layers by their intent, not their content:
```go
type LayerComparison struct {
	LayerCommand string             `json:"layer_command"`
	SizeByCommit map[string]float64 `json:"size_by_commit"`
}
```

A layer is "the same" if it came from the same Dockerfile instruction. This is a semantic identity rather than a structural one. The layer that installs nginx in commit A and the layer that installs nginx in commit B are "the same layer" for comparison purposes, even though they contain entirely different bits.
This breaks down in edge cases. Rename a variable in a RUN command and it becomes a "different layer." Copy the exact same instruction to a different line and it's "different." The identity is purely textual.
The normalization logic tries to smooth over some of Docker's internal formatting:
```go
func truncateLayerCommand(cmd string) string {
	cmd = strings.TrimPrefix(cmd, "/bin/sh -c ")
	cmd = strings.TrimPrefix(cmd, "#(nop) ")
	cmd = strings.TrimSpace(cmd)
	// ...
}
```

The #(nop) prefix indicates metadata-only layers—LABEL or ENV instructions that don't create filesystem changes. Stripping these prefixes allows matching RUN apt-get install nginx across commits even when Docker's internal representation differs.
But it's fundamentally heuristic. There's no ground truth for "what layer corresponds to what" when layer content diverges.
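Filling in the elision, here is a runnable sketch of that normalizer, under the assumption that the two prefix strips and the trim are the whole story (the `// ...` in the excerpt may hide more):

```go
package main

import (
	"fmt"
	"strings"
)

// truncateLayerCommand normalizes Docker's internal representation of a
// layer's creating instruction so identical Dockerfile lines match
// across commits. Sketch only: assumes the rewrites shown in the
// excerpt are the complete logic.
func truncateLayerCommand(cmd string) string {
	cmd = strings.TrimPrefix(cmd, "/bin/sh -c ")
	cmd = strings.TrimPrefix(cmd, "#(nop) ")
	return strings.TrimSpace(cmd)
}

func main() {
	// A RUN instruction as recorded in image history:
	fmt.Println(truncateLayerCommand("/bin/sh -c apt-get update && apt-get install -y nginx"))
	// A metadata-only instruction carries the #(nop) marker:
	fmt.Println(truncateLayerCommand("/bin/sh -c #(nop)  ENV APP_ENV=prod"))
}
```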
"Analyze the last 20 commits" sounds like it means "commits from the last few weeks." It doesn't. Git's commit graph is a directed acyclic graph, and traversal follows parent pointers, not timestamps.
```go
commitIter, err := tm.repo.Log(&git.LogOptions{
	From: ref.Hash(),
	All:  false,
})
```

Consider a rebase. You take commits from January, rebase them onto March's HEAD, and force-push. The rebased commits have new hashes and new committer timestamps, but the author date—what the tool displays—still says January.
Run the analysis requesting 20 commits. You'll traverse in parent-pointer order, which after the rebase is linearized. But the displayed dates might jump: March, March, March, January, January, February, January. The "20 most recent commits by ancestry" can span arbitrary calendar time.
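The rebase effect can be reproduced without a real repository. This toy model (a hypothetical `commit` struct, not go-git's) keeps only the two fields that matter: the parent pointer and the author date.

```go
package main

import (
	"fmt"
	"time"
)

// commit models just enough of Git: history is a chain of parent
// pointers, and the author date travels with a commit through a rebase.
type commit struct {
	msg    string
	author time.Time
	parent *commit
}

// walk visits commits in parent-pointer order from HEAD, the order a
// log traversal uses.
func walk(head *commit) []*commit {
	var out []*commit
	for c := head; c != nil; c = c.parent {
		out = append(out, c)
	}
	return out
}

// rebasedHistory builds the scenario from the text: January commits
// rebased onto March's HEAD, keeping their January author dates.
func rebasedHistory() *commit {
	date := func(m time.Month, d int) time.Time {
		return time.Date(2024, m, d, 0, 0, 0, 0, time.UTC)
	}
	march := &commit{msg: "march work", author: date(time.March, 1)}
	janFix := &commit{msg: "jan fix", author: date(time.January, 5), parent: march}
	return &commit{msg: "jan feature", author: date(time.January, 10), parent: janFix}
}

func main() {
	for _, c := range walk(rebasedHistory()) {
		fmt.Println(c.author.Format("2006-01-02"), c.msg)
	}
	// Ancestry order puts both January-authored commits before the
	// March one: parent pointers, not timestamps, drive the walk.
}
```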
Date filtering operates on top of this traversal:
```go
if !sinceTime.IsZero() && c.Author.When.Before(sinceTime) {
	return nil // Skip commits before the since date
}
```

This filters the parent-chain walk; it doesn't change traversal to be chronological. You're getting "commits reachable from HEAD that were authored after date X," not "all commits authored after date X." The distinction matters for repositories with complex merge histories.
The scariest part of the implementation is working-directory mutation. To build a historical image, you have to actually check out that historical state:
```go
err = worktree.Checkout(&git.CheckoutOptions{
	Hash:  commit.Hash,
	Force: true,
})
```

That Force: true is load-bearing and terrifying. It means "overwrite any local changes." If the tool crashes mid-analysis, the user's working directory is now at some random historical commit. Their in-progress work might be... somewhere.
The code attempts to restore state on completion:
```go
// Restore original branch
if originalRef.Name().IsBranch() {
	checkoutErr = worktree.Checkout(&git.CheckoutOptions{
		Branch: originalRef.Name(),
		Force:  true,
	})
} else {
	checkoutErr = worktree.Checkout(&git.CheckoutOptions{
		Hash:  originalRef.Hash(),
		Force: true,
	})
}
```

The branch-vs-hash distinction matters. If you were on main, you want to return to main (tracking upstream), not to the commit main happened to point at when you started. If you were in detached HEAD state, you want to return to that exact commit.
But what if the process is killed? What if the Docker daemon hangs and the user hits Ctrl-C? There's no transaction rollback. The working directory stays wherever it was.
A more robust implementation might use git worktree to create an isolated checkout, leaving the user's working directory untouched. But that requires complex cleanup logic—orphaned worktrees accumulate and consume disk space.
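Short of worktree isolation, one cheap improvement is to guarantee the restore runs on every exit path the process survives. A hypothetical wrapper (not part of the tool) sketches the idea; it covers normal return and panics, but nothing in userspace can run cleanup after SIGKILL:

```go
package main

import "fmt"

// withRestore runs analyze and guarantees restore executes afterwards,
// on normal return and on panic alike. Hypothetical helper: it shrinks
// the failure window but cannot help with SIGKILL, OOM kills, or power
// loss; only an isolated checkout avoids that class of failure.
func withRestore(analyze func() error, restore func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("analysis panicked: %v", r)
		}
		// Stand-in for checking out the original ref again.
		if rerr := restore(); rerr != nil && err == nil {
			err = rerr
		}
	}()
	return analyze()
}

func main() {
	restored := false
	err := withRestore(
		func() error { panic("docker daemon hung") },
		func() error { restored = true; return nil },
	)
	fmt.Println("restored:", restored, "err:", err)
}
```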
Rather than invoking docker build through the shell, the tool uses the Docker Engine API directly via the official Go SDK. This has a non-obvious benefit: no shell escaping.
When you shell out to a CLI, you construct a command string. That string gets parsed by the shell, which interprets quotes, backslashes, dollar signs. If any file in the build context has a weird name, or if the Dockerfile path contains spaces, you need careful escaping. Get it wrong and you have a security vulnerability or a mysterious failure.
The API approach sidesteps this entirely:
```go
buildContext, err := b.createBuildContext(contextPath)
// buildContext is an io.Reader containing a tar stream
resp, err := b.client.ImageBuild(ctx, buildContext, opts)
```

The build context is a tar stream created programmatically with proper header encoding. The API receives structured data, not a shell command. No escaping needed.
The tar creation itself provides isolation:
```go
err := filepath.Walk(contextPath, func(path string, info os.FileInfo, err error) error {
	if strings.HasPrefix(relPath, ".git") || strings.HasPrefix(relPath, ".dtm-cache") {
		if info.IsDir() {
			return filepath.SkipDir
		}
		return nil
	}
	// ... add to tar ...
})
return bytes.NewReader(buf.Bytes()), nil
```

The entire build context is serialized to an in-memory buffer before being sent to the daemon. For large repositories this could be memory-intensive, but it provides a snapshot guarantee: the daemon receives the filesystem state at a moment in time, immune to concurrent modifications from other processes or from the tool's own commit-hopping.
When analyzing 20 commits, some will fail to build. Maybe the Dockerfile had a syntax error at that point in history. Maybe a required file didn't exist yet. How do you calculate meaningful size deltas?
The naive approach compares each commit to its immediate predecessor. But if commit #10 failed, what's the delta for commit #11? Comparing to a failed build is meaningless.
```go
// Calculate size difference from previous successful build
if i > 0 && result.Error == "" {
	for j := i - 1; j >= 0; j-- {
		if tm.results[j].Error == "" {
			result.SizeDiff = result.ImageSize - tm.results[j].ImageSize
			break
		}
	}
}
```

This backwards scan finds the most recent successful build for comparison. Commit #11 gets compared to commit #9, skipping the failed #10.
The semantics are intentional: you want to know "how did the image change between working states?" A failed build doesn't represent a working state, so it shouldn't anchor comparisons. If three consecutive commits fail, the next successful build shows its delta from the last success, potentially spanning multiple commits worth of changes.
Edge case: if the first commit fails, nothing has a baseline. Later successful commits will show absolute sizes but no deltas—the loop never finds a successful predecessor, so SizeDiff remains at its zero value.
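The whole delta pass, including that zero-value edge case, fits in a few lines. In this self-contained sketch, the `result` struct is an assumption mirroring the fields the excerpt implies:

```go
package main

import "fmt"

// result mirrors the fields implied above: a build either failed
// (Error != "") or produced an image of some size.
type result struct {
	Commit    string
	Error     string
	ImageSize float64
	SizeDiff  float64
}

// computeDiffs applies the backwards scan: each successful build is
// compared to the most recent earlier successful build. Failed builds
// never anchor a comparison, and a run with no earlier success leaves
// SizeDiff at its zero value.
func computeDiffs(results []result) {
	for i := range results {
		if i == 0 || results[i].Error != "" {
			continue
		}
		for j := i - 1; j >= 0; j-- {
			if results[j].Error == "" {
				results[i].SizeDiff = results[i].ImageSize - results[j].ImageSize
				break
			}
		}
	}
}

func main() {
	rs := []result{
		{Commit: "a1", ImageSize: 100},
		{Commit: "b2", Error: "Dockerfile syntax error"},
		{Commit: "c3", ImageSize: 150},
	}
	computeDiffs(rs)
	for _, r := range rs {
		fmt.Printf("%s size=%.0f diff=%+.0f err=%q\n", r.Commit, r.ImageSize, r.SizeDiff, r.Error)
	}
	// c3 is compared to a1 (+50), skipping the failed b2.
}
```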
After all this machinery, what does the analysis tell you?
You learn how your Dockerfile evolved—which instructions were added, removed, or modified, and approximately how those changes affected image size (modulo the irreproducibility problem). You learn which layers contribute most to total size. You can identify the commit where someone added a 500MB development dependency that shouldn't be in the production image.
You don't learn what your image actually was in production at any historical point. You don't learn whether a size change came from your Dockerfile or from upstream package updates. You don't learn anything about multi-stage build intermediate sizes (only the final image is measured).
The implementation acknowledges these limits. Build times are labeled "indicative only"—they depend on system load and cache state. Size comparisons are explicitly between rebuilds, not historical artifacts.
The interesting systems problem isn't in any individual component. Git traversal is well-understood. Docker builds are well-understood. The challenge is in coordinating two complex systems with different consistency models, different failure modes, and fundamentally different notions of identity.
The tool navigates this by making explicit choices: semantic layer identity over structural hashes, parent-chain traversal over chronological ordering, contemporary rebuilds over forensic reconstruction. Each choice has tradeoffs. The implementation tries to be honest about what container archaeology can and cannot recover from the geological strata of your Git history.