I originally wanted to write a post about Server-Sent Events in general, and how delightfully cool they are. SSE provides a clean, standardized way for servers to push real-time updates to web clients over a simple HTTP connection. The MDN documentation showcases how straightforward the client-side implementation is, while Ktor’s SSE support makes the server-side equally elegant. SSE strikes a perfect balance: simpler than WebSockets when you only need one-way communication, yet more efficient than polling.
But this isn’t that post.
Instead, this is a story about how seemingly innocent Flow and coroutine code can bite you in production in the most unexpected ways. It’s about the subtle difference between “working” and “working under load.” And it’s about how a tiny change in flow control can mean the difference between a robust server and an accidental self-DDoS.
Note: I tell this tale with my client’s explicit permission, but I’ve intentionally obfuscated some details for obvious reasons.
The Setup: A Perfect Storm Link to heading
Picture this: It’s a Friday around lunch-time. Our team has just deployed a beautiful new SSE endpoint for real-time notifications. The code passed code review with flying colors, worked flawlessly in local development, and sailed through our staging environment. We were proud of our clean, idiomatic Kotlin—a textbook example of modern coroutine and Flow usage.
Then we deployed to production.
At the exact same time, a known hacker group decided to launch a DDoS attack against our infrastructure. Thousands of legitimate users were online, each with active SSE connections for real-time updates. The combination of external attack traffic and internal connection management created the perfect storm.
Our servers didn’t just struggle—they started consuming resources at an alarming rate. Memory usage spiked, CPU utilization maxed out, and we were essentially DDoS’ing ourselves from the inside while fighting off the external attack.
The Puzzle: Two Approaches, One Problem Link to heading
Here’s the code that went to production. Can you spot which approach will leak resources under load?
Approach A: Collect & Return Link to heading
```kotlin
routing {
    sse("/events") {
        val sessionId = call.sessionId()
        val eventFlow: Flow<Pair<String, Boolean>> = merge(someGlobalEventFlow, someClientSpecificEventFlow(sessionId))
            .map { event -> Pair(event, checkIfClientIsAuthenticated(sessionId)) }

        // Approach A: collect && return
        eventFlow.collect { (event, clientIsAuthenticated) ->
            // if the client is not authenticated, return from function to stop collecting events
            if (!clientIsAuthenticated) {
                sendLoggedOutEvent()
                return@collect
            }
            // try to send the event to the client, returning true if the client is still connected
            val clientIsConnected = trySendEvent(event)
            if (!clientIsConnected) {
                return@collect
            }
        }
        // home free: no longer trying to send events to this user
        close()
    }
}
```
Approach B: Collect & Cancel Link to heading
```kotlin
routing {
    sse("/events") {
        val sessionId = call.sessionId()
        val eventFlow: Flow<Pair<String, Boolean>> = merge(someGlobalEventFlow, someClientSpecificEventFlow(sessionId))
            .map { event -> Pair(event, checkIfClientIsAuthenticated(sessionId)) }

        // Approach B: collect && cancel
        try {
            eventFlow.collect { (event, clientIsAuthenticated) ->
                // if the client is not authenticated, cancel the flow to stop collecting events
                if (!clientIsAuthenticated) {
                    sendLoggedOutEvent()
                    cancel(IntentionalCloseException)
                }
                // try to send the event to the client, returning true if the client is still connected
                val clientIsConnected = trySendEvent(event)
                if (!clientIsConnected) {
                    cancel(IntentionalCloseException)
                }
            }
        } catch (e: IntentionalCloseException) {
            // do nothing, we've cancelled the flow intentionally
        } finally {
            // home free: no longer trying to send events to this user
            close()
        }
    }
}
```
Both approaches look reasonable at first glance. Both handle authentication checking and client disconnection. Both compile cleanly and work perfectly with a handful of concurrent connections.
But only one of them will behave correctly under production load.
The Difference: A Tale of Two Control Flows Link to heading
The critical difference lies in how each approach handles early termination of the Flow collection.
Approach A: The Resource Leak Link to heading
```kotlin
if (!clientIsAuthenticated) {
    return@collect // This only skips the current emission!
}
```
Here’s the subtle trap: `return@collect` doesn’t stop the collection; it only skips processing the current emission. The `collect` block continues waiting for the next emission from the Flow. This means:

- The coroutine keeps running
- The SSE connection remains open
- The Flow continues producing events
- `close()` is never reached
- Resources accumulate with each “disconnected” client

So while `return@collect` appears to be the coroutine equivalent of a `break` within a regular loop, it’s actually more similar to a `continue`. Precisely what we don’t want!
Under normal conditions with a few dozen connections, this might go unnoticed (and it sure did!). But when thousands of connections are established during a DDoS attack and then clients become unauthenticated or disconnect, those zombie collectors pile up quickly. Very quickly!
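To see the trap in isolation, here’s a minimal, self-contained sketch (plain kotlinx.coroutines, no Ktor, with invented names): the `return@collect` fires for one emission, yet the collection still runs to the end of the flow.

```kotlin
import kotlinx.coroutines.flow.flowOf
import kotlinx.coroutines.runBlocking

// Collects 1..4, attempting to "stop" at 2 via return@collect.
fun collectWithEarlyReturn(): List<Int> = runBlocking {
    val processed = mutableListOf<Int>()
    flowOf(1, 2, 3, 4).collect { value ->
        if (value == 2) {
            // Intended as "stop collecting", but this only skips the current emission
            return@collect
        }
        processed += value
    }
    processed
}

fun main() {
    // 2 was skipped, yet 3 and 4 were still collected: the flow ran to completion
    println(collectWithEarlyReturn()) // [1, 3, 4]
}
```

In the toy example the flow is finite, so the damage is invisible; with an infinite event flow, that same collector simply never exits.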
Approach B: Clean Termination Link to heading
```kotlin
if (!clientIsAuthenticated) {
    cancel(IntentionalCloseException) // This cancels the collecting coroutine; think `break` within a loop
}
```
The `cancel()` call throws an `IntentionalCloseException`, which:

- Terminates the collecting coroutine
- Exits the `collect` block
- Triggers the `finally` block
- Calls `close()` to clean up the SSE connection
- Properly releases all associated resources

The `try-catch-finally` structure ensures that when we intentionally cancel the operation, cleanup happens correctly.
(You could also use some variation of `transformWhile` or `takeWhile` before the collect instead of cancelling with an exception, which is what we ended up doing. But try/catch/finally was easier to explain.)
The Production Reality Link to heading
During our incident, Approach A created a cascading resource leak. Every time a client disconnected or became unauthenticated (which happened frequently during the DDoS), we accumulated:
- An active coroutine waiting for the next Flow emission
- An open SSE connection consuming server resources
- Memory allocated for the Flow processing pipeline
- Background tasks polling for authentication status
With thousands of connections being established and “abandoned” in this way, our servers quickly became overwhelmed—not just by the external attack, but by our own leaked resources.
The Fix and Lessons Learned Link to heading
The fix was embarrassingly simple: replace `return@collect` with `cancel(...)` and add proper exception handling. But the lessons were profound:
1. Load Testing Reveals Truth Link to heading
Code that works with 10 concurrent connections might fail catastrophically with 10,000. Our staging environment, optimized for cost over scale, simply couldn’t reproduce the production load patterns.
2. Resource Management Is Critical Link to heading
In languages with garbage collection, it’s easy to forget about resource leaks. But when dealing with network connections, coroutines, and flows, explicit cleanup becomes crucial.
3. Control Flow Matters Link to heading
The difference between “skip this iteration” and “stop collecting” is subtle in code (and in this case very easy to miss!) but massive in production impact. Understanding the exact semantics of coroutine cancellation is essential for robust server applications.
4. Timing Is Everything Link to heading
Our code worked perfectly—until it didn’t. The combination of high load and external pressure revealed edge cases that never appeared under normal conditions.
Best Practices for SSE and Flow Management Link to heading
- Always use explicit cancellation when you need to terminate Flow collection early
- Implement proper cleanup in `finally` blocks or using `use` functions
- Test under realistic load with tools that can simulate thousands of concurrent connections
- Monitor resource usage in production to catch accumulation patterns early
- Understand coroutine lifecycle and how cancellation propagates through your system
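On the cleanup point, Flow’s `onCompletion` operator is another option worth knowing: it runs whether the collection completes normally, fails, or is cancelled, making it a natural single place for per-connection teardown and metrics. A minimal sketch with invented names, not the production code:

```kotlin
import kotlinx.coroutines.flow.flowOf
import kotlinx.coroutines.flow.onCompletion
import kotlinx.coroutines.runBlocking

// Collects a small flow and records whether the completion hook fired.
fun collectWithCleanup(): Boolean = runBlocking {
    var cleanedUp = false
    flowOf(1, 2, 3)
        .onCompletion { cause ->
            // cause is null on normal completion, a CancellationException on
            // cancellation, or the failure that aborted the flow
            cleanedUp = true
        }
        .collect { /* deliver the event to the client */ }
    cleanedUp
}

fun main() {
    println(collectWithCleanup()) // true
}
```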
A Happy Ending Link to heading
After deploying the fix, our servers stabilized even under the continued DDoS attack. The external attackers were eventually blocked, but more importantly, we learned that our internal code was resilient under extreme load.
The corrected approach handles thousands of SSE connections gracefully, properly cleaning up resources when clients disconnect, and maintaining predictable memory usage even under attack conditions.
Conclusion Link to heading
Server-Sent Events are indeed a powerful and elegant technology for real-time web applications. Kotlin’s coroutines and Flow provide beautiful abstractions for handling asynchronous streams. But excellence, as always, is in the details.
The difference between `return@collect` and `cancel(...)` might seem trivial, but in production systems serving thousands of users, these subtleties become the difference between stability and catastrophic failure.
Sometimes the most dangerous bugs are the ones that hide in plain sight, looking perfectly reasonable until the moment they’re not.
Remember: when dealing with flows and coroutines, always clean up your resources. Your production servers will thank you.
Special thanks to the DDoS attackers for providing the load testing we apparently needed. Your service is not requested, but occasionally educational.