Make TestIntegrationExitCodes and TestOpenSSHExitCodes diagnose their
own setup and tolerate transport-level noise without masking the bug
they assert.
Pre-flight in TestMain: log GOOS, GOARCH, euid, hostname,
TAILSCALED_PATH, the test user's uid/gid/home/login shell, and the
ssh -V output, then pre-create host keys. Each invariant fails fast
with a clear message instead of leaving the failure to surface as a
sub-second test crash with no log output, which is exactly what was
happening on the macos-latest GitHub runner.
Retries: both tests now retry up to 3 times on transport-level
failures (dial errors, ssh exit code 255, non-*ssh.ExitError errors)
with linear backoff. An exit-code mismatch — the actual behavior the
fix from #18256 is asserting — never retries and fails loudly. This
keeps stability without hiding regressions.
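
A minimal sketch of the retry policy described above, not the test's
actual code. runWithRetry, errDial, and the attempt callback are
hypothetical names; the sketch assumes an attempt reports either an
exit code or a transport-level error.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// sshTransportExit is the status the OpenSSH client uses for its own errors.
const sshTransportExit = 255

// errDial is a stand-in transport error used by the demo below.
var errDial = errors.New("dial: connection refused")

// runWithRetry calls attempt up to maxTries times. Transport-level failures
// (a non-nil error or exit status 255) are retried with linear backoff.
// An exit-code mismatch is returned immediately and never retried.
// It reports how many attempts were made.
func runWithRetry(maxTries int, backoff time.Duration, want int, attempt func() (int, error)) (int, error) {
	var lastErr error
	for try := 1; try <= maxTries; try++ {
		code, err := attempt()
		if err == nil && code != sshTransportExit {
			if code != want {
				return try, fmt.Errorf("exit code = %d, want %d", code, want)
			}
			return try, nil
		}
		if err == nil {
			err = errors.New("ssh exited with status 255")
		}
		lastErr = err
		if try < maxTries {
			time.Sleep(time.Duration(try) * backoff) // linear backoff
		}
	}
	return maxTries, fmt.Errorf("transport failure after %d tries: %w", maxTries, lastErr)
}

func main() {
	calls := 0
	tries, err := runWithRetry(3, 10*time.Millisecond, 42, func() (int, error) {
		calls++
		if calls < 3 {
			return 0, errDial // two transport failures, then success
		}
		return 42, nil
	})
	fmt.Println(tries, err)
}
```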
OpenSSH client args: ConnectTimeout bumped from 5s to 15s, plus
IdentityAgent=none and PreferredAuthentications=none so the auth
path is pinned across OpenSSH versions on macOS instead of letting
the bundled LibreSSL fork pick a different fallback.
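
For illustration, the client options named above could be assembled
like this. sshArgs and its host, port, and command parameters are
hypothetical; only the three -o options come from this change.

```go
package main

import (
	"fmt"
	"strings"
)

// sshArgs returns OpenSSH client arguments that pin the auth path:
// a 15s connect timeout, no agent, and no follow-up auth methods.
func sshArgs(host string, port int, command string) []string {
	return []string{
		"-o", "ConnectTimeout=15",
		"-o", "IdentityAgent=none",
		"-o", "PreferredAuthentications=none",
		"-p", fmt.Sprint(port),
		host,
		command,
	}
}

func main() {
	fmt.Println("ssh " + strings.Join(sshArgs("test-host", 2222, "exit 42"), " "))
}
```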
Per-test diagnostics: dumpIncubatorLogOnFail prints
/tmp/tailscalessh.log (where the incubator subprocess writes) into
the test output when a subtest fails. Previously the log was only
printed at end-of-binary, after all subtests, which made
attributing output to a specific failure painful.
dialTestClientForUser is a non-fatal variant of testClientForUser
that returns the dial error so the retry loop can act on it.
Verified locally on Linux: 20 consecutive runs of both tests with
the full incubator code path (tailscaled be-child ssh re-exec, drop
privileges, login shell exec) and 5 consecutive runs under -race.
No flakes; exit codes 0, 42, and 127 all propagate correctly.
Updates #18256
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Make the OpenSSH integration test deterministic on macOS while preserving the meaningful Tailscale SSH authentication path: use an empty client config, disable public-key/password/keyboard-interactive/GSSAPI follow-up methods, and rely on the initial SSH none-auth request that tailssh handles through clientAuth.
Log the OpenSSH version and capture verbose SSH output so future runner-specific failures expose the client-side reason.
Updates #18256
Change-Id: I5ea2dedd45f0294a053cee0f5a46cfa3cf2d993f
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Use the existing macOS runner account for focused exit-status coverage instead of provisioning a synthetic Directory Services user.
Also drain Go SSH client output and add a per-command timeout so stderr output cannot block the test process until the global test timeout expires.
Updates #18256
Change-Id: Ic4a0f391c56210023ece20c13d8627b0f5ad68e7
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Add coverage that exercises tailssh through the real ssh client so the client-visible exit status ordering is checked, including command-not-found behavior.
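
As a local stand-in for the statuses being checked, the sketch below
runs commands under /bin/sh rather than through ssh and reads the exit
status from *exec.ExitError. exitCode is a hypothetical helper, and the
sketch assumes a POSIX shell at /bin/sh.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// exitCode runs script under /bin/sh and returns its exit status.
// A non-nil error means the command could not be run at all.
func exitCode(script string) (int, error) {
	err := exec.Command("/bin/sh", "-c", script).Run()
	if err == nil {
		return 0, nil
	}
	var ee *exec.ExitError
	if errors.As(err, &ee) {
		return ee.ExitCode(), nil
	}
	return -1, err
}

func main() {
	for _, script := range []string{"true", "exit 42", "no-such-command-for-this-demo"} {
		code, err := exitCode(script)
		fmt.Printf("%q: code=%d err=%v\n", script, code, err)
	}
}
```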
Updates #18256
Change-Id: If2bae5b337d213390f4a9788501c1a59aea2eafb
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Add integration tests exercising the exit-status ordering fix and
related improvements:
TestIntegrationExitCodes: verifies exit code 0 (success), 42
(passthrough), and 127 (command not found) are delivered to the SSH
client through the full server stack.
TestLocalUnixForwardingHalfClose: verifies that when one direction of
a Unix socket forwarding tunnel finishes, the other direction still
completes. A service reads all input then sends a delayed response;
the client closes its write side and verifies the response arrives.
This directly tests the bicopy half-close fix where the old
cancel-on-first-direction approach would drop in-flight data.
TestIntegrationSIGHUP: verifies that child processes receive SIGHUP
(not SIGKILL) when an SSH session is terminated, matching POSIX
terminal disconnect semantics.
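
The half-close scenario can be sketched without any forwarding layer:
a service on a Unix socket reads until EOF, delays, then replies, and
the client closes only its write side. halfCloseRoundTrip is a
hypothetical helper, not the test's code, and there is no bicopy proxy
between the two ends here.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"path/filepath"
	"time"
)

// halfCloseRoundTrip starts a throwaway Unix-socket service that reads all
// input until EOF, waits briefly, and then replies. The client half-closes
// its write side and returns whatever the service sends back.
func halfCloseRoundTrip(msg string) (string, error) {
	dir, err := os.MkdirTemp("", "halfclose")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	ln, err := net.Listen("unix", filepath.Join(dir, "sock"))
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		in, _ := io.ReadAll(c)            // returns once the client half-closes
		time.Sleep(50 * time.Millisecond) // delayed response
		fmt.Fprintf(c, "got %d bytes", len(in))
	}()

	conn, err := net.Dial("unix", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := io.WriteString(conn, msg); err != nil {
		return "", err
	}
	// Close only the write direction; the read direction stays open.
	if err := conn.(*net.UnixConn).CloseWrite(); err != nil {
		return "", err
	}
	out, err := io.ReadAll(conn)
	return string(out), err
}

func main() {
	out, err := halfCloseRoundTrip("hello")
	fmt.Println(out, err)
}
```

A forwarder that tears down both directions as soon as one finishes
would close the connection before the delayed reply is written.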
Updates #18256
Change-Id: I5206f48ee6f9d68f749755fd0378388963be423c
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Add two narrower accessors alongside the existing
[LocalBackend.NetMap], with docs that distinguish their semantics:
- NetMapNoPeers: cheap (returns the cached *netmap.NetworkMap with
a possibly-stale Peers slice). For callers that only read non-Peers
fields like SelfNode, DNS, PacketFilter, capabilities.
- NetMapWithPeers: documented as returning an up-to-date Peers slice.
For callers that genuinely need to iterate Peers or call
PeerByXxx.
Mark the existing NetMap deprecated and point readers at the two new
accessors. NetMap, NetMapNoPeers, and NetMapWithPeers all currently
return the same value (b.currentNode().NetMap()): this commit is a
no-op behaviorally, just a renaming and migration of in-tree callers.
A subsequent change in the same series will switch
NetMapWithPeers to actually rebuild the Peers slice from the live
per-node-backend peers map (O(N) per call), at which point the
distinction between the two new accessors becomes load-bearing.
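
A toy model of the intended end state, using hypothetical types rather
than the real LocalBackend or netmap.NetworkMap: one accessor hands
back the cached value as-is, the other rebuilds Peers from a live map
on every call.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// networkMap is a stand-in for the real netmap type.
type networkMap struct {
	SelfName string
	Peers    []string
}

// backend is a stand-in holding a cached netmap and a live peer set.
type backend struct {
	mu     sync.Mutex
	cached *networkMap
	live   map[string]bool
}

// NetMapNoPeers is cheap: it returns the cached netmap, whose Peers
// slice may be stale.
func (b *backend) NetMapNoPeers() *networkMap {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.cached
}

// NetMapWithPeers copies the cached netmap and rebuilds Peers from the
// live map, which costs O(N) per call.
func (b *backend) NetMapWithPeers() *networkMap {
	b.mu.Lock()
	defer b.mu.Unlock()
	nm := *b.cached // shallow copy; Peers is replaced below
	nm.Peers = make([]string, 0, len(b.live))
	for p := range b.live {
		nm.Peers = append(nm.Peers, p)
	}
	sort.Strings(nm.Peers)
	return &nm
}

func main() {
	b := &backend{
		cached: &networkMap{SelfName: "self", Peers: []string{"a"}},
		live:   map[string]bool{"a": true, "b": true},
	}
	fmt.Println(len(b.NetMapNoPeers().Peers), len(b.NetMapWithPeers().Peers))
}
```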
Migrate in-tree callers to the appropriate accessor based on what
fields they read:
- NetMapNoPeers (most common): localapi handlers, peerapi accept,
GetCertPEMWithValidity, web client noise request, doctor DNS
resolver check, tsnet CertDomains/TailscaleIPs, ssh/tailssh
SSH-policy/cap reads, several LocalBackend internals
(isLocalIP, allowExitNodeDNSProxyToServeName, pauseForNetwork
nil-check, serve config).
- NetMapWithPeers: writeNetmapToDiskLocked (persist full netmap to
disk for fast restart), PeerByTailscaleIP lookup.
Tests still call the legacy NetMap; they'll see the deprecation
warning but otherwise behave identically.
Also add two pieces of plumbing the next change in this series will
need, but which are already useful on their own:
- [client/local.GetDebugResultJSON]: a generic [Client.DebugResultJSON]
that decodes directly into a target type T, avoiding the
marshal/unmarshal roundtrip callers otherwise need.
- localapi "current-netmap" debug action: returns the current
netmap (with peers) as JSON. Documented as debug-only — the
netmap.NetworkMap shape is internal and may change without notice.
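
The idea behind the generic helper, sketched with hypothetical names
(decodeInto and netmapSummary are not the real API): decode the
response bytes straight into the caller's type instead of decoding to
an untyped value and converting afterwards.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeInto unmarshals raw JSON directly into a value of type T.
func decodeInto[T any](data []byte) (T, error) {
	var v T
	err := json.Unmarshal(data, &v)
	return v, err
}

// netmapSummary is an illustrative target type, not the real netmap shape.
type netmapSummary struct {
	Name  string   `json:"name"`
	Peers []string `json:"peers"`
}

func main() {
	raw := []byte(`{"name":"node1","peers":["a","b"]}`)
	nm, err := decodeInto[netmapSummary](raw)
	fmt.Println(nm.Name, len(nm.Peers), err)
}
```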
This commit is part of a series breaking up a larger change for
review; on its own it is a no-op refactor.
Updates #12542
Change-Id: Idbb30707414f8da3149c44ca0273262708375b02
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Parallelize the SSH integration tests across OS targets and reduce
per-container overhead:
- CI: use GitHub Actions matrix strategy to run all 4 OS containers
(ubuntu:focal, ubuntu:jammy, ubuntu:noble, alpine:latest) in parallel
instead of sequentially (~4x wall-clock improvement)
- Makefile: run docker builds in parallel for local dev too
- Dockerfile: consolidate ~20 separate RUN commands into 5 (one per
test phase), eliminating Docker layer overhead. Combine test binary
invocations where no state mutation is needed between them. Fix a bug
where TestDoDropPrivileges was silently not being run (was passed as a
second positional arg to -test.run instead of using regex alternation).
- TestMain: replace tail -F + 2s sleep with synchronous log read,
eliminating 2s overhead per test binary invocation. Set debugTest once
in TestMain instead of redundantly in each test function.
- session.read(): close channel on EOF so non-shell tests return
immediately instead of waiting for the 1s silence timeout.
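
The session.read() change can be illustrated with a small sketch.
readChunks, collect, and demo are hypothetical names: the reader
goroutine closes its channel at EOF, so the consumer returns at once
instead of sitting out the silence timeout.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"time"
)

// readChunks forwards data read from r and closes the channel on EOF
// or any other read error.
func readChunks(r io.Reader) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		buf := make([]byte, 4096)
		for {
			n, err := r.Read(buf)
			if n > 0 {
				ch <- string(buf[:n])
			}
			if err != nil {
				return
			}
		}
	}()
	return ch
}

// collect gathers output until the channel is closed or nothing
// arrives for the silence duration, whichever comes first.
func collect(ch <-chan string, silence time.Duration) string {
	var sb strings.Builder
	for {
		select {
		case s, ok := <-ch:
			if !ok {
				return sb.String() // EOF: no need to wait for silence
			}
			sb.WriteString(s)
		case <-time.After(silence):
			return sb.String()
		}
	}
}

// demo reads a finite input with a 1s silence timeout and reports
// the output and how long the read took.
func demo() (string, time.Duration) {
	start := time.Now()
	out := collect(readChunks(strings.NewReader("done\n")), time.Second)
	return out, time.Since(start)
}

func main() {
	out, elapsed := demo()
	fmt.Printf("%q in %v\n", out, elapsed)
}
```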
Updates #19244
Change-Id: I2cc8588964fbce0dd7b654fb94e7ff33440b8584
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I'm not sure how this file got into the repo without gofmt.
Maybe gofmt rules changed in some Go release?
Updates #cleanup
Change-Id: Ia8bd46e29f116f7fbfca11be80c8ef48699cd9f2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Brings in a newer version of Gliderlabs SSH with added socket forwarding support.
Fixes #12409
Fixes #5295
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Commit f905871fb moved host key generation from the ipnLocalBackend
interface (GetSSH_HostKeys) to the standalone getHostKeys function,
which requires either system host keys in /etc/ssh/ or a valid
TailscaleVarRoot to generate keys into. The testBackend returned ""
for TailscaleVarRoot, and the Docker test containers only install
openssh-client (no server host keys), so getHostKeys always failed.
When getHostKeys fails, HandleSSHConn returns the error but never
closes the TCP connection, so SSH clients hang forever waiting for
the server hello.
Fix by creating a temp directory in TestMain and returning it from
testBackend.TailscaleVarRoot().
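
Roughly, the fix has this shape. The varRoot field and newTestBackend
helper are hypothetical; only testBackend and TailscaleVarRoot are
names from this change.

```go
package main

import (
	"fmt"
	"os"
)

// testBackend is a stand-in for the test's fake backend.
type testBackend struct {
	varRoot string
}

// TailscaleVarRoot returns a real directory so host keys can be
// generated into it, instead of returning "".
func (tb *testBackend) TailscaleVarRoot() string { return tb.varRoot }

// newTestBackend creates a temp directory to serve as the var root and
// returns a cleanup function that removes it.
func newTestBackend() (*testBackend, func(), error) {
	dir, err := os.MkdirTemp("", "tailssh-test-")
	if err != nil {
		return nil, nil, err
	}
	return &testBackend{varRoot: dir}, func() { os.RemoveAll(dir) }, nil
}

func main() {
	tb, cleanup, err := newTestBackend()
	if err != nil {
		fmt.Println("setup failed:", err)
		os.Exit(1)
	}
	defer cleanup()
	fmt.Println("var root:", tb.TailscaleVarRoot())
}
```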
Regression from f905871fb #18949 ("ipn/ipnlocal, feature/ssh: move SSH code
out of LocalBackend to feature").
I was apparently too impatient to wait for the test to complete
and didn't connect the dots: https://github.com/tailscale/tailscale/actions/runs/22930275950
We should make that test faster (#19244) for the patience issue, but
also fail more nicely if this happens in the future.
Updates #19244
Change-Id: If82393b8f35413b04174e6f7d09a1ee3a2125a6b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This makes tsnet apps not depend on x/crypto/ssh and locks that in with a test.
It also paves the way for tsnet apps to opt in to SSH support via a
blank feature import in the future.
Updates #12614
Change-Id: Ica85628f89c8f015413b074f5001b82b27c953a9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This file was never truly necessary and has never actually been used in
the history of Tailscale's open source releases.
A Brief History of AUTHORS files
---
The AUTHORS file was a pattern developed at Google, originally for
Chromium, then adopted by Go and a bunch of other projects. The problem
was that Chromium originally had a copyright line only recognizing
Google as the copyright holder. Because Google (and most open source
projects) do not require copyright assignment for contributions, each
contributor maintains their copyright. Some large corporate contributors
then tried to add their own name to the copyright line in the LICENSE
file or in file headers. This quickly becomes unwieldy, and puts a
tremendous burden on anyone building on top of Chromium, since the
license requires that they keep all copyright lines intact.
The compromise was to create an AUTHORS file that would list all of the
copyright holders. The LICENSE file and source file headers would then
include that list by reference, listing the copyright holder as "The
Chromium Authors".
It also became cumbersome simply to keep the file up to date with a
high rate of new contributors. Plus it's not always obvious who the
copyright holder is. Sometimes it is the individual making the
contribution, but many times it may be their employer. There is no way
for the project maintainer to know.
Eventually, Google changed their policy to no longer recommend trying to
keep the AUTHORS file up to date proactively, and instead to only add to
it when requested: https://opensource.google/docs/releasing/authors.
They are also clear that:
> Adding contributors to the AUTHORS file is entirely within the
> project's discretion and has no implications for copyright ownership.
The AUTHORS file was primarily added to appease a small number of large contributors
that insisted that they be recognized as copyright holders (which was
entirely their right to do). But it's not truly necessary, and not even
the most accurate way of identifying contributors and/or copyright
holders.
In practice, we've never added anyone to our AUTHORS file. It only lists
Tailscale, so it's not really serving any purpose. It also causes
confusion because Tailscalars put the "Tailscale Inc & AUTHORS" header
in other open source repos which don't actually have an AUTHORS file, so
it's ambiguous what that means.
Instead, we just acknowledge that the contributors to Tailscale (whoever
they are) are copyright holders for their individual contributions. We
also have the benefit of using the DCO (developercertificate.org) which
provides some additional certification of their right to make the
contribution.
The source file changes were purely mechanical with:
git ls-files | xargs sed -i -e 's/\(Tailscale Inc &\) AUTHORS/\1 contributors/g'
Updates #cleanup
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
Some clients don't request 'none' authentication. Instead, they immediately supply
a password or public key. This change allows them to do so, but ignores the supplied
credentials and authenticates using Tailscale instead.
Updates #14922
Signed-off-by: Percy Wegmann <percy@tailscale.com>
The upstream crypto package now supports sending banners at any time during
authentication, so the Tailscale fork of crypto/ssh is no longer necessary.
github.com/tailscale/golang-x-crypto is still needed for some custom ACME
autocert functionality.
tempfork/gliderlabs is still necessary because of a few other customizations,
mostly related to TTY handling.
Originally implemented in 46fd4e58a27495263336b86ee961ee28d8c332b7,
which was reverted in b60f6b849af1fae1cf343be98f7fb1714c9ea165 to
keep the change out of v1.80.
Updates #8593
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This reverts commit 46fd4e58a27495263336b86ee961ee28d8c332b7.
We don't want to include this in 1.80 yet, but can add it back post 1.80.
Updates #8593
Signed-off-by: Percy Wegmann <percy@tailscale.com>
The upstream crypto package now supports sending banners at any time during
authentication, so the Tailscale fork of crypto/ssh is no longer necessary.
github.com/tailscale/golang-x-crypto is still needed for some custom ACME
autocert functionality.
tempfork/gliderlabs is still necessary because of a few other customizations,
mostly related to TTY handling.
Updates #8593
Signed-off-by: Percy Wegmann <percy@tailscale.com>
Add logic to set environment variables that match the SSH rule's
`acceptEnv` settings in the SSH session's environment.
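
A rough sketch of the filtering step. filterEnv is a hypothetical
helper, and glob matching via path.Match is an assumption made for
illustration; the rule's actual pattern syntax may differ.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// filterEnv returns the NAME=value entries from env whose names match
// at least one pattern in acceptEnv.
func filterEnv(env, acceptEnv []string) []string {
	var out []string
	for _, kv := range env {
		name, _, ok := strings.Cut(kv, "=")
		if !ok {
			continue // skip malformed entries
		}
		for _, pat := range acceptEnv {
			if matched, _ := path.Match(pat, name); matched {
				out = append(out, kv)
				break
			}
		}
	}
	return out
}

func main() {
	env := []string{"LANG=C", "FOO_A=1", "SECRET=x"}
	fmt.Println(filterEnv(env, []string{"LANG", "FOO_*"}))
}
```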
Updates https://github.com/tailscale/corp/issues/22775
Signed-off-by: Mario Minardi <mario@tailscale.com>
Instead of changing the working directory before launching the incubator process,
this now just changes the working directory after dropping privileges, at which
point we're more likely to be able to enter the user's home directory since we're
running as the user.
For paths that use the 'login' or 'su -l' commands, those already take care of changing
the working directory to the user's home directory.
Fixes #13120
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This involved the following:
1. Pass the su command path as first of args in call to unix.Exec to make sure that busybox sees the correct program name.
Busybox is a single executable userspace that implements various core userspace commands in a single binary. You'll
see it used via symlinking, so that for example /bin/su symlinks to /bin/busybox. Busybox knows that you're trying
to execute /bin/su because argv[0] is '/bin/su'. When we called unix.Exec, we weren't including the program name for
argv[0], which caused busybox to fail with 'applet not found', meaning that it didn't know which command it was
supposed to run.
2. Tell su to whitelist the SSH_AUTH_SOCK environment variable in order to support ssh agent forwarding.
3. Run integration tests on alpine, which uses busybox.
4. Increment CurrentCapabilityVersion to allow turning on SSH V2 behavior from control.
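
The argv[0] requirement from step 1 can be sketched as follows.
execArgv and execProgram are hypothetical helpers, the "alice" user is
made up, and the standard library's syscall.Exec stands in for
unix.Exec.

```go
package main

import (
	"fmt"
	"syscall"
)

// execArgv builds an argument vector whose first element is the program
// path, which multi-call binaries such as busybox use to pick an applet.
func execArgv(path string, args ...string) []string {
	return append([]string{path}, args...)
}

// execProgram replaces the current process with path. It is not called
// in main, since it would not return on success.
func execProgram(path string, args []string, env []string) error {
	return syscall.Exec(path, execArgv(path, args...), env)
}

func main() {
	fmt.Println(execArgv("/bin/su", "-l", "alice"))
}
```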
Fixes #12849
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This allows the SSH_AUTH_SOCK environment variable to work inside of
su and agent forwarding to succeed.
Fixes #12467
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This allows pam authentication to run for ssh sessions, triggering
automation like pam_mkhomedir.
Updates #11854
Signed-off-by: Percy Wegmann <percy@tailscale.com>