CmdName previously opened the running executable and streamed through
tens of MB searching for two 16-byte magic needles to find the Go
module info blob embedded by the linker, allocating a 64 KiB buffer on
every call. It was called at least twice during tailscaled startup
(logpolicy.LogsDir and logpolicy.Options.init on Windows) and once
more for tsweb's debug page title -- each call ~2.6 ms on a ~40 MB
tailscaled binary, and ~74 KB of allocator churn per call.
The same module info is exposed via runtime/debug.ReadBuildInfo, which
reads an already-resident string maintained by the Go runtime -- no
filesystem I/O. Use that, cached with sync.OnceValue so the lookup
happens at most once per process.
Benchmarks on Xeon 6975P-C (Linux amd64), against the version test
binary:
BenchmarkCmdName before: 540,094 ns/op 66,136 B/op 8 allocs/op
BenchmarkCmdName after: 2.5 ns/op 0 B/op 0 allocs/op
First-call (uncached) cost at tailscaled scale (71 deps in embedded
build info): 160 mallocs, ~11.5 KiB allocated, sub-microsecond. That
is smaller than the single 64 KiB scratch buffer the old code
allocated per call, and is now paid exactly once per process.
Stripped tailscaled binary size is unchanged: runtime/debug was
already imported transitively (including by sibling code in this same
package at version/version.go:121), so no new dependencies ship. The
hand-rolled byte scanner, 64 KiB scratch buffer, rsc.io/goversion-
derived magic constants, and the 32-second integration test that
built tailscaled to re-parse it are removed.
TestCmdNameNoAllocs asserts that, once primed, CmdName is 0 allocs
per call, guarding against regressions that reintroduce per-call
binary parsing. TestCmdNameFromBuildInfo verifies that CmdName is
pulling from the embedded build info rather than falling back to the
os.Executable basename.