feat: add socket auditor for forwarding logs to coder agent #124

zedkipp · 2025-12-19T00:20:13Z

Add SocketAuditor that sends audit logs to the Coder workspace agent via a Unix socket. This enables boundary audit events to be forwarded to coderd for centralized logging.

Features:

Batching: 10 logs or 5 seconds, whichever comes first
Wire format: length-prefixed protobuf (proto imported from AgentAPI) to keep boundary -> agent -> coderd forwarding simple to start

RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba?pvs=23
Corresponding PR in coder/coder: coder/coder#21345
coder/coder#21280

go.mod

audit/socket_auditor.go

Add SocketAuditor that sends audit logs to the Coder workspace agent via a Unix socket. This enables boundary audit events to be forwarded to coderd for centralized logging.

Implementation notes:

- Batching: 10 logs or 5 seconds, whichever comes first
- Wire format: tag & length prefixed protobuf. proto imported from AgentAPI to simplify boundary -> agent -> coderd forwarding to start.
- CLI and config flag to disable sending of audit logs to workspace agent as an escape hatch

dannykopping

Spotted two fairly significant shortcomings which I think need to be addressed before this can land.

I'm not going to block merge because this is not my project, but I highly recommend these be addressed before proceeding.

dannykopping · 2025-12-24T07:33:46Z

audit/socket_auditor.go

+	defaultBatchSize          = 10
+	defaultBatchTimerDuration = 5 * time.Second


Magic numbers should be documented please.
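One way the two defaults could be documented is sketched below. The constant names and values come from the diff; the comment wording is only an illustration based on the PR description ("10 logs or 5 seconds, whichever comes first") and is not the author's text.

```go
package main

import "time"

const (
	// defaultBatchSize is the number of buffered audit logs that triggers an
	// immediate flush to the agent socket.
	defaultBatchSize = 10

	// defaultBatchTimerDuration is the longest a partially filled batch waits
	// before it is flushed anyway, so a quiet workspace still delivers logs.
	defaultBatchTimerDuration = 5 * time.Second
)
```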

dannykopping · 2025-12-24T07:40:11Z

audit/socket_auditor.go

+	defaultBatchTimerDuration = 5 * time.Second
+	// DefaultAuditSocketPath is the well-known path for the boundary audit socket.
+	// The expectation is the Coder agent listens on this socket to receive audit logs.
+	DefaultAuditSocketPath = "/tmp/boundary-audit.sock"


/tmp is not guaranteed to exist.
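A minimal sketch of one possible fallback, assuming the socket location may be chosen at runtime. The helper `auditSocketPath`, the `auditSocketName` constant, and the use of `XDG_RUNTIME_DIR` are hypothetical and not part of this PR; the agent would have to resolve the same path for this to work.

```go
package main

import (
	"errors"
	"os"
	"path/filepath"
)

// auditSocketName is a hypothetical file name taken from the default path in the diff.
const auditSocketName = "boundary-audit.sock"

// auditSocketPath returns override if it is set. Otherwise it picks the first
// candidate directory that actually exists instead of assuming /tmp is present.
func auditSocketPath(override string) (string, error) {
	if override != "" {
		return override, nil
	}
	// Candidate order is an assumption: per-user runtime dir first, then the
	// platform temp dir ($TMPDIR or /tmp on Unix).
	for _, dir := range []string{os.Getenv("XDG_RUNTIME_DIR"), os.TempDir()} {
		if dir == "" {
			continue
		}
		if info, err := os.Stat(dir); err == nil && info.IsDir() {
			return filepath.Join(dir, auditSocketName), nil
		}
	}
	return "", errors.New("no usable directory for the audit socket")
}
```

Returning an error rather than a hard-coded default lets the caller log the problem and disable forwarding instead of failing on every connect attempt.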

dannykopping · 2025-12-24T07:40:48Z

audit/socket_auditor.go

+		return &flushErr{err: err, permanent: true}
+	}
+
+	if len(data) > 1<<28 {


Please use constants instead of magic numbers, and document how this relates to the wire protocol.
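A sketch of what a named limit could look like. The threshold is the same `1<<28` as in the diff; `maxMessageSize` and `checkMessageSize` are hypothetical names, and the frame header layout is not visible in this thread, so the comment tying the limit to the wire protocol is left for the author to complete.

```go
package main

import "fmt"

// maxMessageSize is the largest serialized batch that may be sent in one
// frame: 1<<28 bytes (256 MiB). How this bound maps onto the length field of
// the frame header is not shown here and should be documented next to it.
const maxMessageSize = 1 << 28

// checkMessageSize rejects payloads that exceed maxMessageSize, mirroring the
// `len(data) > 1<<28` comparison in the reviewed code.
func checkMessageSize(n int) error {
	if n > maxMessageSize {
		return fmt.Errorf("message of %d bytes exceeds limit of %d bytes", n, maxMessageSize)
	}
	return nil
}
```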

dannykopping · 2025-12-24T07:43:19Z

audit/socket_auditor.go

+		}
+
+		if err := flush(conn, batch); err != nil {
+			s.logger.Warn("failed to flush audit logs", "error", err)


Nit:

Suggested change

s.logger.Warn("failed to flush audit logs", "error", err)

s.logger.Warn("failed to flush audit logs", slog.Error(err))

dannykopping · 2025-12-24T07:43:44Z

audit/socket_auditor.go

+			s.logger.Warn("failed to flush audit logs", "error", err)
+			if err.permanent {
+				// Data error: discard batch to avoid infinite retries.
+				clearBatch()


This needs to be logged.

dannykopping · 2025-12-24T07:54:51Z

audit/socket_auditor.go

+			if len(batch) >= s.batchSize {
+				doFlush()
+				if len(batch) >= s.batchSize {
+					s.logger.Warn("audit log dropped, batch full")


I wonder if we should report something up to the agent in this case.
If operators never look at the boundary logs, they'll never know that logs have been silently dropped.
The agent could receive a payload which informs it of how many were dropped and export a metric.
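The idea above could be sketched with a small counter that the flush path drains and attaches to the next payload. `dropCounter` is a hypothetical type; the protobuf field that would carry the count to the agent does not exist in this PR and is not shown.

```go
package main

import "sync/atomic"

// dropCounter tracks how many audit logs were discarded since the last
// successful report. It is safe for concurrent use.
type dropCounter struct {
	n atomic.Uint64
}

// Add records k dropped logs.
func (d *dropCounter) Add(k uint64) {
	d.n.Add(k)
}

// Take returns the number of drops recorded since the previous call and
// resets the counter to zero, so each drop is reported at most once.
func (d *dropCounter) Take() uint64 {
	return d.n.Swap(0)
}
```

If the flush that carries the count fails, the caller can pass the taken value back to `Add` so the drops are reported on the next attempt.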

dannykopping · 2025-12-24T07:57:29Z

audit/socket_auditor.go

+		}
+		connect()
+		if conn == nil {
+			// No connection: logs will be retried on next flush.


This should be logged.

dannykopping · 2025-12-24T08:01:44Z

audit/socket_auditor.go

+			doFlush()
+			closeConn()
+			return
+		case <-t.C:


Nit: t := time.NewTimer(0) will tick immediately, so if the batch is empty this will be a noop.

dannykopping · 2025-12-24T08:05:58Z

audit/socket_auditor.go

+		connect()
+		if conn == nil {
+			// No connection: logs will be retried on next flush.
+			return


‼️ t.Stop was called upfront, so when you return here you won't get any further ticks until the buffer is full.

This could lead to buffered logs being stuck until the buffer fills up, which may be a very long time.
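The stuck-batch problem described above can be avoided by re-arming the timer whenever a flush leaves data in the buffer. The following is a minimal sketch under assumptions, not the code in this PR: `runBatcher`, its `send` callback, and `demoRetry` are hypothetical, items are plain strings, and dropping when the buffer is full is omitted for brevity.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// runBatcher buffers items and flushes when size is reached or interval elapses.
func runBatcher(ctx context.Context, in <-chan string, send func([]string) error, size int, interval time.Duration) {
	var batch []string
	t := time.NewTimer(interval)
	t.Stop() // nothing buffered yet, so no deadline is pending
	defer t.Stop()

	flush := func() {
		if len(batch) == 0 {
			return
		}
		if err := send(batch); err != nil {
			// Key point: the batch is kept, so re-arm the timer. Otherwise
			// nothing retries until the buffer fills. A stale tick (possible
			// before Go 1.23) only causes one extra, harmless attempt.
			t.Reset(interval)
			return
		}
		batch = nil
		t.Stop()
	}

	for {
		select {
		case <-ctx.Done():
			flush() // best-effort final flush
			return
		case <-t.C:
			flush()
		case item := <-in:
			if len(batch) == 0 {
				t.Reset(interval) // first item starts the flush deadline
			}
			batch = append(batch, item)
			if len(batch) >= size {
				flush()
			}
		}
	}
}

// demoRetry feeds one item to a sender that fails once, then reports how many
// send attempts were made and what was eventually delivered.
func demoRetry() (attempts int, delivered []string) {
	ctx, cancel := context.WithCancel(context.Background())
	in := make(chan string)
	got := make(chan []string, 1)
	done := make(chan struct{})
	send := func(b []string) error {
		attempts++
		if attempts == 1 {
			return errors.New("agent socket unavailable")
		}
		got <- append([]string(nil), b...)
		return nil
	}
	go func() {
		defer close(done)
		runBatcher(ctx, in, send, 1, 10*time.Millisecond)
	}()
	in <- "log-1"
	delivered = <-got
	cancel()
	<-done
	return attempts, delivered
}
```

Without the `t.Reset` call in the failure branch, `demoRetry` would block forever on `<-got`, which is the behavior the review comment warns about.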

dannykopping · 2025-12-24T08:10:52Z

audit/socket_auditor_test.go

+
+// startTestServer starts a Unix socket server that reads length-prefixed protobuf messages,
+// and reports all received requests to the given channel.
+func startTestServer(t *testing.T, socketPath string, received chan<- *agentproto.ReportBoundaryLogsRequest) {


‼️ This is a red flag for me; you're reimplementing the upstream server for your tests.
This will undoubtedly lead to inconsistencies.

I'd suggest extracting the message framing code into its own package which can be reused; that way you centralize the core logic.

func WriteFrame(w io.Writer, tag byte, data []byte) error
func ReadFrame(r io.Reader, maxSize int) (tag byte, data []byte, err error)

Then in this test you can simply set up a net.Pipe instead of leaking implementation detail from upstream.
And likewise in the related coder PR you don't have to leak any details from this library (agent/boundary_logs_test.go -> sendBoundaryLogsRequest).
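A rough sketch of what such a shared framing package could contain, using the two signatures suggested above. The header layout here (1-byte tag followed by a 4-byte big-endian length) is an assumption for illustration only; the actual layout used by this PR is not shown in the thread. `roundTrip` is a hypothetical helper that shows the `net.Pipe` setup.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"net"
)

const (
	headerSize  = 5         // ASSUMED layout: 1-byte tag + 4-byte big-endian length
	maxFrameLen = 1<<32 - 1 // largest length the assumed 4-byte field can hold
)

// WriteFrame writes tag, the payload length, and the payload in one Write call.
func WriteFrame(w io.Writer, tag byte, data []byte) error {
	if uint64(len(data)) > maxFrameLen {
		return fmt.Errorf("payload of %d bytes does not fit the length field", len(data))
	}
	buf := make([]byte, headerSize+len(data))
	buf[0] = tag
	binary.BigEndian.PutUint32(buf[1:headerSize], uint32(len(data)))
	copy(buf[headerSize:], data)
	_, err := w.Write(buf)
	return err
}

// ReadFrame reads one frame and rejects payloads larger than maxSize before
// allocating memory for them.
func ReadFrame(r io.Reader, maxSize int) (tag byte, data []byte, err error) {
	var hdr [headerSize]byte
	if _, err = io.ReadFull(r, hdr[:]); err != nil {
		return 0, nil, err
	}
	n := binary.BigEndian.Uint32(hdr[1:])
	if int64(n) > int64(maxSize) {
		return 0, nil, fmt.Errorf("frame of %d bytes exceeds limit of %d bytes", n, maxSize)
	}
	data = make([]byte, n)
	if _, err = io.ReadFull(r, data); err != nil {
		return 0, nil, err
	}
	return hdr[0], data, nil
}

// roundTrip sends one frame across an in-memory net.Pipe, the way a test
// could exercise both ends without a real Unix socket.
func roundTrip(tag byte, payload []byte, maxSize int) (byte, []byte, error) {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()
	errc := make(chan error, 1)
	go func() { errc <- WriteFrame(client, tag, payload) }()
	gotTag, got, err := ReadFrame(server, maxSize)
	if err != nil {
		return 0, nil, err
	}
	return gotTag, got, <-errc
}
```

With both sides importing the same two functions, neither repository's tests need to re-encode the header by hand.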

zedkipp force-pushed the zedkipp/socket-auditor branch 3 times, most recently from fac8048 to a2ea4f9 Compare December 19, 2025 21:37

zedkipp mentioned this pull request Dec 20, 2025

feat: add boundary log forwarding from agent to coderd coder/coder#21345

Open

zedkipp marked this pull request as ready for review December 23, 2025 21:39

zedkipp requested a review from evgeniy-scherbina December 23, 2025 21:39

zedkipp force-pushed the zedkipp/socket-auditor branch from a2ea4f9 to 2365931 Compare December 24, 2025 00:13

dannykopping reviewed Dec 24, 2025
