Parallelize the SSH integration tests across OS targets and reduce
per-container overhead:
- CI: use GitHub Actions matrix strategy to run all 4 OS containers
(ubuntu:focal, ubuntu:jammy, ubuntu:noble, alpine:latest) in parallel
instead of sequentially (~4x wall-clock improvement)
- Makefile: run docker builds in parallel for local dev too
- Dockerfile: consolidate ~20 separate RUN commands into 5 (one per
test phase), eliminating Docker layer overhead. Combine test binary
invocations where no state mutation is needed between them. Fix a bug
where TestDoDropPrivileges was silently not being run (was passed as a
second positional arg to -test.run instead of using regex alternation).
- TestMain: replace tail -F + 2s sleep with synchronous log read,
eliminating 2s overhead per test binary invocation. Set debugTest once
in TestMain instead of redundantly in each test function.
- session.read(): close channel on EOF so non-shell tests return
immediately instead of waiting for the 1s silence timeout.
Updates #19244
Change-Id: I2cc8588964fbce0dd7b654fb94e7ff33440b8584
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Start using a common helper for tests to declare that they require root.
This is step 1. A later step will then make this helper track which tests were
skipped so a subsequent pass will run these test as root.
Updates tailscale/corp#40007
Change-Id: I4979e1def0fa3691d38c83f48c89aaa443e7f62e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Validated against a modern Debian install, fixes a typo.
Updates #cleanup
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I7b26012f54dbd2f0f9fea98722e8edc2fe97645a
When a recording upload fails mid-session, the recording goroutine
cancels the session context. This triggers two concurrent paths:
exec.CommandContext kills the process (causing cmd.Wait to return),
and killProcessOnContextDone tries to write the termination message
via exitOnce.Do. If cmd.Wait returns first, the main goroutine's
exitOnce.Do(func(){}) steals the once, and the termination message
is never written to the client.
Fix by waiting for killProcessOnContextDone to finish writing the
termination message (via <-ss.exitHandled) before claiming exitOnce,
when the context is already done.
Also fix the fallback path when launchProcess itself fails due to
context cancellation: use SSHTerminationMessage() with the correct
"\r\n\r\n" framing instead of fmt.Fprintf with the internal error
string.
Deflakes TestSSHRecordingCancelsSessionsOnUploadFailure, which was
failing consistently at a low rate due to the exitOnce race. After
this fix, flakestress passes with 8,668 runs, 0 failures.
Fixes#7707 (again. hopefully for good.)
Change-Id: I5ab911c71574db8d3f9d979fb839f273be51ecf9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Brings in a newer version of Gliderlabs SSH with added socket forwarding support.
Fixes#12409Fixes#5295
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Commit f905871fb moved host key generation from the ipnLocalBackend
interface (GetSSH_HostKeys) to the standalone getHostKeys function,
which requires either system host keys in /etc/ssh/ or a valid
TailscaleVarRoot to generate keys into. The testBackend returned ""
for TailscaleVarRoot, and the Docker test containers only install
openssh-client (no server host keys), so getHostKeys always failed.
When getHostKeys fails, HandleSSHConn returns the error but never
closes the TCP connection, so SSH clients hang forever waiting for
the server hello.
Fix by creating a temp directory in TestMain and returning it from
testBackend.TailscaleVarRoot().
Regression from f905871fb #18949 ("ipn/ipnlocal, feature/ssh: move SSH code
out of LocalBackend to feature").
I was apparently too impatient to wait for the test to complete
and didn't connect the dots: https://github.com/tailscale/tailscale/actions/runs/22930275950
We should make that test faster (#19244) for the patience issue, but
also fail more nicely if this happens in the future.
Updates #19244
Change-Id: If82393b8f35413b04174e6f7d09a1ee3a2125a6b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add a new vet analyzer that checks t.Run subtest names don't contain
characters requiring quoting when re-running via "go test -run". This
enforces the style guide rule: don't use spaces or punctuation in
subtest names.
The analyzer flags:
- Direct t.Run calls with string literal names containing spaces,
regex metacharacters, quotes, or other problematic characters
- Table-driven t.Run(tt.name, ...) calls where tt ranges over a
slice/map literal with bad name field values
Also fix all 978 existing violations across 81 test files, replacing
spaces with hyphens and shortening long sentence-like names to concise
hyphenated forms.
Updates #19242
Change-Id: Ib0ad96a111bd8e764582d1d4902fe2599454ab65
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This makes tsnet apps not depend on x/crypto/ssh and locks that in with a test.
It also paves the wave for tsnet apps to opt-in to SSH support via a
blank feature import in the future.
Updates #12614
Change-Id: Ica85628f89c8f015413b074f5001b82b27c953a9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
When a recording upload fails mid-session, killProcessOnContextDone
writes the termination message to ss.Stderr() and kills the process.
Meanwhile, run() takes the ss.ctx.Done() path and proceeds to
ss.Exit(), which tears down the SSH channel. The termination message
write races with the channel teardown, so the client sometimes never
receives it.
Fix by adding an exitHandled channel that killProcessOnContextDone
closes when done. run() now waits on this channel after ctx.Done()
fires, ensuring the termination message is fully written before
the SSH channel is torn down.
Fixes#7707
Change-Id: Ib60116c928d3af46d553a4186a72963c2c731e3e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I omitted a lot of the min/max modernizers because they didn't
result in more clear code.
Some of it's older "for x := range 123".
Also: errors.AsType, any, fmt.Appendf, etc.
Updates #18682
Change-Id: I83a451577f33877f962766a5b65ce86f7696471c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
OnPolicyChange can observe a conn in activeConns before authentication
completes. The previous `c.info == nil` guard was itself a data race
against clientAuth writing c.info, and even when c.info appeared
non-nil, c.localUser could still be nil, causing a nil pointer
dereference at c.localUser.Username.
Add an authCompleted atomic.Bool to conn, stored true after all auth
fields are written in clientAuth. OnPolicyChange checks this atomic
instead of c.info, which provides the memory barrier guaranteeing all
prior writes are visible to the concurrent reader.
Updates tailscale/corp#36268 (fixes, but we might want to cherry-pick)
Co-authored-by: Gesa Stupperich <gesa@tailscale.com>
Change-Id: I4c69843541f5f9f04add9bf431e320c65a203a39
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This file was never truly necessary and has never actually been used in
the history of Tailscale's open source releases.
A Brief History of AUTHORS files
---
The AUTHORS file was a pattern developed at Google, originally for
Chromium, then adopted by Go and a bunch of other projects. The problem
was that Chromium originally had a copyright line only recognizing
Google as the copyright holder. Because Google (and most open source
projects) do not require copyright assignemnt for contributions, each
contributor maintains their copyright. Some large corporate contributors
then tried to add their own name to the copyright line in the LICENSE
file or in file headers. This quickly becomes unwieldy, and puts a
tremendous burden on anyone building on top of Chromium, since the
license requires that they keep all copyright lines intact.
The compromise was to create an AUTHORS file that would list all of the
copyright holders. The LICENSE file and source file headers would then
include that list by reference, listing the copyright holder as "The
Chromium Authors".
This also become cumbersome to simply keep the file up to date with a
high rate of new contributors. Plus it's not always obvious who the
copyright holder is. Sometimes it is the individual making the
contribution, but many times it may be their employer. There is no way
for the proejct maintainer to know.
Eventually, Google changed their policy to no longer recommend trying to
keep the AUTHORS file up to date proactively, and instead to only add to
it when requested: https://opensource.google/docs/releasing/authors.
They are also clear that:
> Adding contributors to the AUTHORS file is entirely within the
> project's discretion and has no implications for copyright ownership.
It was primarily added to appease a small number of large contributors
that insisted that they be recognized as copyright holders (which was
entirely their right to do). But it's not truly necessary, and not even
the most accurate way of identifying contributors and/or copyright
holders.
In practice, we've never added anyone to our AUTHORS file. It only lists
Tailscale, so it's not really serving any purpose. It also causes
confusion because Tailscalars put the "Tailscale Inc & AUTHORS" header
in other open source repos which don't actually have an AUTHORS file, so
it's ambiguous what that means.
Instead, we just acknowledge that the contributors to Tailscale (whoever
they are) are copyright holders for their individual contributions. We
also have the benefit of using the DCO (developercertificate.org) which
provides some additional certification of their right to make the
contribution.
The source file changes were purely mechanical with:
git ls-files | xargs sed -i -e 's/\(Tailscale Inc &\) AUTHORS/\1 contributors/g'
Updates #cleanup
Change-Id: Ia101a4a3005adb9118051b3416f5a64a4a45987d
Signed-off-by: Will Norris <will@tailscale.com>
Send LOGIN audit messages to the kernel audit subsystem on Linux
when users successfully authenticate to Tailscale SSH. This provides
administrators with audit trail integration via auditd or journald,
recording details about both the Tailscale user (whois) and the
mapped local user account.
The implementation uses raw netlink sockets to send AUDIT_USER_LOGIN
messages to the kernel audit subsystem. It requires CAP_AUDIT_WRITE
capability, which is checked at runtime. If the capability is not
present, audit logging is silently skipped.
Audit messages are sent to the kernel (pid 0) and consumed by either
auditd (written to /var/log/audit/audit.log) or journald (available
via journalctl _TRANSPORT=audit), depending on system configuration.
Note: This may result in duplicate messages on a system where
auditd/journald audit logs are enabled and the system has and supports
`login -h`. Sadly Linux login code paths are still an inconsistent wild
west so we accept the potential duplication rather than trying to avoid
it.
Fixes#18332
Signed-off-by: James Tucker <james@tailscale.com>
Perform a path check first before attempting exec of `true`.
Try /usr/bin/true first, as that is now and increasingly so, the more
common and more portable path.
Fixes tests on macOS arm64 where exec was returning a different kind of
path error than previously checked.
Updates #16569
Signed-off-by: James Tucker <james@tailscale.com>
It has nothing to do with logtail and is confusing named like that.
Updates #cleanup
Updates #17323
Change-Id: Idd34587ba186a2416725f72ffc4c5778b0b9db4a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The Tracker was using direct callbacks to ipnlocal. This PR moves those
to be triggered via the eventbus.
Additionally, the eventbus is now closed on exit from tailscaled
explicitly, and health is now a SubSystem in tsd.
Updates #15160
Signed-off-by: Claus Lensbøl <claus@tailscale.com>
This is a follow-up to #15351, which fixed the test for Linux but not for
Darwin, which stores its "true" executable in /usr/bin instead of /bin.
Try both paths when not running on Windows.
In addition, disable CGo in the integration test build, which was causing the
linker to fail. These tests do not need CGo, and it appears we had some version
skew with the base image on the runners.
In addition, in error cases the recover step of the permissions check was
spuriously panicking and masking the "real" failure reason. Don't do that check
when a command was not produced.
Updates #15350
Change-Id: Icd91517f45c90f7554310ebf1c888cdfd109f43a
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Also add a trailing newline to error banners so that SSH client messages don't print on the same line.
Updates tailscale/corp#29138
Signed-off-by: Percy Wegmann <percy@tailscale.com>
As noted in #16048, the ./ssh/tailssh package failed to build on
Android, because GOOS=android also matches the "linux" build
tag. Exclude Android like iOS is excluded from macOS (darwin).
This now works:
$ GOOS=android go install ./ipn/ipnlocal ./ssh/tailssh
The original PR at #16048 is also fine, but this stops the problem
earlier.
Updates #16048
Change-Id: Ie4a6f6966a012e510c9cb11dd0d1fa88c48fac37
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In tailssh.go:1284, (*sshSession).startNewRecording starts a fire-and-forget goroutine that can
outlive the test that triggered its creation. Among other things, it uses ss.logf, and may call it
after the test has already returned. Since we typically use (*testing.T).Logf as the logger,
this results in a data race and causes flaky tests.
Ideally, we should fix the root cause and/or use a goroutines.Tracker to wait for the goroutine
to complete. But with the release approaching, it's too risky to make such changes now.
As a workaround, we update the tests to use tstest.WhileTestRunningLogger, which logs to t.Logf
while the test is running and disables logging once the test finishes, avoiding the race.
While there, we also fix TestSSHAuthFlow not to use log.Printf.
Updates #15568
Updates #7707 (probably related)
Signed-off-by: Nick Khyl <nickk@tailscale.com>
Commit 4b525fdda (ssh/tailssh: only chdir incubator process to user's
homedir when necessary and possible, 2024-08-16) defers changing the
working directory until the incubator process drops its privileges.
However, it didn't account for the case where there is no incubator
process, because no tailscaled was found on the PATH. In that case, it
only intended to run `tailscaled be-child` in the root directory but
accidentally ran everything there.
Fixes: #15350
Signed-off-by: Simon Law <sfllaw@sfllaw.ca>
Although, at the moment, we do not yet require an event bus to be present, as
we start to add more pieces we will want to ensure it is always available. Add
a new constructor and replace existing uses of new(tsd.System) throughout.
Update generated files for import changes.
Updates #15160
Change-Id: Ie5460985571ade87b8eac8b416948c7f49f0f64b
Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
And don't return a comma-separated string. That's kinda weird
signature-wise, and not needed by half the callers anyway. The callers
that care can do the join themselves.
Updates #cleanup
Change-Id: Ib5ad51a3c6b663d868eba14fe9dc54b2609cfb0d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Some clients don't request 'none' authentication. Instead, they immediately supply
a password or public key. This change allows them to do so, but ignores the supplied
credentials and authenticates using Tailscale instead.
Updates #14922
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This fork golang.org/x/crypto/ssh (at upstream x/crypto git rev e47973b1c1)
into tailscale.com/tempfork/sshtest/ssh so we can hack up the client in weird
ways to simulate other SSH clients seen in the wild.
Two changes were made to the files when they were copied from x/crypto:
* internal/poly1305 imports were replaced by the non-internal version;
no code changes otherwise. It didn't need the internal one.
* all decode-with-passphrase funcs were deleted, to avoid
using the internal package x/crypto/ssh/internal/bcrypt_pbkdf
Then the tests passed.
Updates #14969
Change-Id: Ibf1abebfe608c75fef4da0255314f65e54ce5077
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Shells on OpenBSD don't support the -l option. This means that when
handling SSH in-process, we can't give the user a login shell, but this
change at least allows connecting at all.
Updates #13338
Signed-off-by: Percy Wegmann <percy@tailscale.com>
Shells on FreeBSD don't support the -l option. This means that when
handling SSH in-process, we can't give the user a login shell, but this
change at least allows connecting at all.
Updates #13338
Signed-off-by: Percy Wegmann <percy@tailscale.com>
The upstream crypto package now supports sending banners at any time during
authentication, so the Tailscale fork of crypto/ssh is no longer necessary.
github.com/tailscale/golang-x-crypto is still needed for some custom ACME
autocert functionality.
tempfork/gliderlabs is still necessary because of a few other customizations,
mostly related to TTY handling.
Originally implemented in 46fd4e58a27495263336b86ee961ee28d8c332b7,
which was reverted in b60f6b849af1fae1cf343be98f7fb1714c9ea165 to
keep the change out of v1.80.
Updates #8593
Signed-off-by: Percy Wegmann <percy@tailscale.com>
This reverts commit 46fd4e58a27495263336b86ee961ee28d8c332b7.
We don't want to include this in 1.80 yet, but can add it back post 1.80.
Updates #8593
Signed-off-by: Percy Wegmann <percy@tailscale.com>
The upstream crypto package now supports sending banners at any time during
authentication, so the Tailscale fork of crypto/ssh is no longer necessary.
github.com/tailscale/golang-x-crypto is still needed for some custom ACME
autocert functionality.
tempfork/gliderlabs is still necessary because of a few other customizations,
mostly related to TTY handling.
Updates #8593
Signed-off-by: Percy Wegmann <percy@tailscale.com>
These erroneously blocked a recent PR, which I fixed by simply
re-running CI. But we might as well fix them anyway.
These are mostly `printf` to `print` and a couple of `!=` to `!Equal()`
Updates #cleanup
Signed-off-by: Will Norris <will@tailscale.com>
When we first made Tailscale SSH, we assumed people would want public
key support soon after. Turns out that hasn't been the case; people
love the Tailscale identity authentication and check mode.
In light of CVE-2024-45337, just remove all our public key code to not
distract people, and to make the code smaller. We can always get it
back from git if needed.
Updates tailscale/corp#25131
Updates golang/go#70779
Co-authored-by: Percy Wegmann <percy@tailscale.com>
Change-Id: I87a6e79c2215158766a81942227a18b247333c22
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The v2 endpoint supports HTTP/2 bidirectional streaming and acks for
received bytes. This is used to detect when a recorder disappears to
more quickly terminate the session.
Updates https://github.com/tailscale/corp/issues/24023
Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
This adds a new generic result type (motivated by golang/go#70084) to
try it out, and uses it in the new lineutil package (replacing the old
lineread package), changing that package to return iterators:
sometimes over []byte (when the input is all in memory), but sometimes
iterators over results of []byte, if errors might happen at runtime.
Updates #12912
Updates golang/go#70084
Change-Id: Iacdc1070e661b5fb163907b1e8b07ac7d51d3f83
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This allows passing through any environment variables that we set ourselves, for example DBUS_SESSION_BUS_ADDRESS.
Updates #11175
Co-authored-by: Mario Minardi <mario@tailscale.com>
Signed-off-by: Percy Wegmann <percy@tailscale.com>
Add logic to set environment variables that match the SSH rule's
`acceptEnv` settings in the SSH session's environment.
Updates https://github.com/tailscale/corp/issues/22775
Signed-off-by: Mario Minardi <mario@tailscale.com>
Add logic for parsing and matching against our planned format for
AcceptEnv values. Namely, this supports direct matches against string
values and matching where * and ? are treated as wildcard characters
which match against an arbitrary number of characters and a single
character respectively.
Actually using this logic in non-test code will come in subsequent
changes.
Updates https://github.com/tailscale/corp/issues/22775
Signed-off-by: Mario Minardi <mario@tailscale.com>
this commit changes usermetrics to be non-global, this is a building
block for correct metrics if a go process runs multiple tsnets or
in tests.
Updates #13420
Updates tailscale/corp#22075
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
And update a few callers as examples of motivation. (there are a
couple others, but these are the ones where it's prettier)
Updates #cleanup
Change-Id: Ic8c5cb7af0a59c6e790a599136b591ebe16d38eb
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>