Graceful Shutdowns in NodeJS

Kris Zyp
Doctor Evidence Development
Jan 25, 2018


Our stack includes a NodeJS server that transforms data and caches content in LevelDB for fast access from the front end. Like any other HTTP server, it handles requests from users, so when we update our code and restart it, we want the restart to be as unobtrusive as possible. We therefore use a “graceful shutdown” process that restarts Node while avoiding lost requests wherever we can.

Fortunately, Node is well designed for graceful shutdowns, with several features that make for a smooth exit. First, Node exits on its own once there are no outstanding or pending events to process. To initiate a graceful shutdown, we simply stop queueing up new work: Node finishes any outstanding events and requests, then exits normally. Second, the HTTP module has its own graceful shutdown process. Calling close() on the server makes it refuse new connections while still finishing the requests and responses on existing connections (so pages that are in the middle of loading can complete). close() also accepts a callback that runs once all connections are closed, and alongside it we can set a timeout that forces an exit if closing takes too long. Once the HTTP server has finished its work and other outstanding events are complete, Node's event queue is empty and the process exits on its own.
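A minimal sketch of this pattern follows, using Node's built-in `http` module. The `shutdown()` helper and its 10-second default timeout are hypothetical choices for illustration, not our production code. It calls `server.close()`, logs when the callback reports that every connection has ended, and arms a forced-exit timer that is `unref()`ed so the timer never keeps the process alive on its own.

```typescript
import * as http from "node:http";

// A trivial request handler; any real handler works the same way.
const server = http.createServer((_req, res) => {
  res.end("ok\n");
});

// Hypothetical helper: stop accepting connections, let in-flight requests
// finish, and force an exit only if that takes longer than timeoutMs.
function shutdown(timeoutMs = 10_000): void {
  // close() stops listening right away; the callback runs once every
  // existing connection has ended.
  server.close((err) => {
    if (err) {
      console.error("server.close() failed:", err);
      return;
    }
    console.log("all connections closed; Node exits once other pending work finishes");
  });

  // Safety net for connections that never finish.
  const forceExit = setTimeout(() => {
    console.error(`connections still open after ${timeoutMs} ms; forcing exit`);
    process.exit(1);
  }, timeoutMs);
  // unref(): if the server closes cleanly first, Node exits without waiting
  // for this timer.
  forceExit.unref();
}
```

In a real server, `server.listen(...)` would run at startup and `shutdown()` would be triggered by whatever signals the restart. An idle keep-alive connection also counts as open, so depending on the Node version it can delay the `close()` callback until it times out.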

In our application, this alone is not enough to shut the server down, because we support real-time notifications through Server-Sent Events (SSE). SSE connections are long-lived, so an HTTP server that waits for them to close could wait indefinitely (hours or even days). Consequently, we proactively close these passive connections and responses, which lets the HTTP server shut down as soon as the actual active requests are finished.
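One way to close the SSE streams proactively is to track every open event-stream response in a `Set` and end them all when shutdown begins. This is a sketch under assumptions: the `/events` path, the `sseClients` set, and the `shutdown()` helper are hypothetical. Each stream is sent with `Connection: close` so that ending it also closes the underlying socket, rather than leaving an idle keep-alive connection for `server.close()` to wait on.

```typescript
import * as http from "node:http";

// Open Server-Sent Event responses, kept so they can be ended on shutdown.
const sseClients = new Set<http.ServerResponse>();

const server = http.createServer((req, res) => {
  if (req.url === "/events") {
    // "Connection: close" makes Node close the socket when this response
    // ends, instead of keeping it as an idle keep-alive connection.
    res.writeHead(200, {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "close",
    });
    res.write(": connected\n\n"); // SSE comment line; also flushes the headers
    sseClients.add(res);
    // Forget streams whose clients disconnect on their own.
    res.on("close", () => sseClients.delete(res));
    return;
  }
  res.end("ok\n");
});

// Stop accepting connections, then end every long-lived SSE stream so that
// close() only has to wait for ordinary, short-lived requests.
function shutdown(onClosed?: () => void): void {
  server.close(() => {
    console.log("HTTP server closed");
    onClosed?.();
  });
  for (const res of sseClients) res.end();
  sseClients.clear();
}
```

Browsers' `EventSource` normally tries to reconnect after a stream ends, so clients should find the restarted server on a later attempt.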

There are a number of other precautions to take, and Node features to use, to ensure a graceful shutdown. One of the main things that can keep Node from shutting down gracefully is a long-waiting setTimeout or setInterval timer, such as a periodic operation. By default, a callback queued through either function prevents Node from exiting. However, if these callbacks are not required to complete, we can specify that they should not block the exit. For example, we have interval timers that periodically clean up lock states, expire timer-based cache entries, and send keep-alives on real-time connections, and none of these should prevent Node from exiting. Fortunately, this is simple to handle as well: setTimeout and setInterval both return timer objects with an unref() method, which tells Node not to wait for that callback before exiting.
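As a sketch of the cache-expiry case, assuming a hypothetical in-memory `Map` with a per-entry expiry time (the `cache` and `expireEntries()` names are illustrative), the housekeeping interval below is `unref()`ed. It keeps running while the server is up, but never holds the process open once everything else has finished.

```typescript
// Hypothetical in-memory cache: each entry records when it expires (ms since epoch).
const cache = new Map<string, { value: string; expiresAt: number }>();

// Remove expired entries and report how many were dropped.
function expireEntries(now: number = Date.now()): number {
  let removed = 0;
  for (const [key, entry] of cache) {
    if (entry.expiresAt <= now) {
      cache.delete(key); // deleting during Map iteration is safe in JavaScript
      removed++;
    }
  }
  return removed;
}

// Run housekeeping once a minute. unref() tells Node this timer alone should
// not keep the process alive, so it never blocks a graceful exit.
const cleanupTimer = setInterval(() => expireEntries(), 60_000);
cleanupTimer.unref();
```

The same one-line `unref()` call applies to the lock-cleanup and keep-alive timers; the timers still fire normally as long as something else (such as the HTTP server) keeps the process running.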

Another consideration is how the graceful shutdown is initiated, or signaled. On Unix systems, a process can be stopped with the kill command, which by default sends a TERM signal. Node makes it easy to add a SIGTERM event handler, which can then initiate the shutdown (calling close() on the HTTP server and doing any other cleanup). Process managers can also take care of sending this signal, waiting for the process to exit, and restarting it (and can restart a process that terminates unexpectedly).
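On Unix, a minimal SIGTERM hook might look like the following sketch; `onSigterm()` and the `shuttingDown` flag are hypothetical names. Registering a `SIGTERM` listener replaces Node's default behavior of terminating immediately, so the handler only has to start the graceful path. Signal listeners do not keep the event loop alive, so once the server has closed and nothing else is pending, the process exits by itself.

```typescript
import * as http from "node:http";

const server = http.createServer((_req, res) => res.end("ok\n"));

let shuttingDown = false;

// Start the graceful shutdown once; repeated signals are ignored.
function onSigterm(): void {
  if (shuttingDown) return;
  shuttingDown = true;
  console.log("SIGTERM received; closing HTTP server");
  // Other cleanup (flushing caches, ending SSE streams) would also go here.
  server.close(() => console.log("HTTP server closed"));
}

// `kill <pid>` sends SIGTERM by default. With this listener registered,
// Node no longer terminates on SIGTERM; the handler decides what happens.
process.on("SIGTERM", onSigterm);
```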

However, we run our Node server on Windows, which unfortunately does not support Unix-style process signals. Instead, we use an Inter-Process Communication (IPC) channel to send a shutdown request. We have a small process manager module that monitors our source directories for file changes; when it detects a change, it recompiles the code with the TypeScript compiler and then sends a shutdown message to the main server process to initiate a graceful shutdown. Once the shutdown completes, the process manager simply restarts the process.
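On the server side, the IPC request can be handled with `process.on("message")`, which is available when the process is started through `child_process.fork()`. The `{ type: "shutdown" }` message shape and the `isShutdownMessage()` guard are assumptions for this sketch, not a fixed protocol. One detail worth noting: an open IPC channel with a `message` listener keeps the child's event loop alive, so the handler calls `process.disconnect()` after the HTTP server has closed to let Node exit naturally.

```typescript
import * as http from "node:http";

const server = http.createServer((_req, res) => res.end("ok\n"));

// Hypothetical message shape agreed between the process manager and the server.
interface ShutdownMessage {
  type: "shutdown";
}

function isShutdownMessage(msg: unknown): msg is ShutdownMessage {
  return (
    typeof msg === "object" &&
    msg !== null &&
    (msg as { type?: unknown }).type === "shutdown"
  );
}

// process.send exists only when this process has an IPC channel,
// e.g. when it was started with child_process.fork().
if (process.send) {
  process.on("message", (msg: unknown) => {
    if (!isShutdownMessage(msg)) return;
    server.close(() => {
      // The IPC channel would otherwise keep the event loop alive, so
      // disconnect it and let Node exit once all work is done.
      process.disconnect?.();
    });
  });
}
```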

This approach also simplifies deployment: merely copying the files to the server is enough to trigger compilation, shutdown, and restart. Furthermore, the restart does not happen unless the compilation completes successfully.
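The manager side, including the rule that a restart only follows a successful build, might be sketched as follows. Everything here is an assumption for illustration: the `src` and `dist/server.js` layout, running the compiler as `npx tsc -p .`, and the `makeChangeHandler()` and `startManager()` names. `fs.watch` with `recursive: true` works on Windows and macOS, and on Linux only in newer Node releases.

```typescript
import { fork, spawn, ChildProcess } from "node:child_process";
import { watch } from "node:fs";

// Assumed (hypothetical) layout: sources in ./src, compiled by `tsc -p .`
// into ./dist, with the server entry point at dist/server.js.
const SRC_DIR = "src";
const ENTRY = "dist/server.js";

// Run the TypeScript compiler; resolve to true only if it exits with code 0.
function compileTypeScript(): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    // shell: true lets Windows resolve npx.cmd
    const tsc = spawn("npx tsc -p .", { stdio: "inherit", shell: true });
    tsc.on("error", () => resolve(false));
    tsc.on("exit", (code) => resolve(code === 0));
  });
}

// Build the file-change handler: compile first, and only ask the server to
// shut down if compilation succeeded. Changes that arrive while a compile is
// already running are ignored.
function makeChangeHandler(
  compile: () => Promise<boolean>,
  requestShutdown: () => void,
): () => Promise<void> {
  let busy = false;
  return async () => {
    if (busy) return;
    busy = true;
    try {
      if (await compile()) requestShutdown();
      else console.error("compilation failed; leaving the current server running");
    } finally {
      busy = false;
    }
  };
}

function startManager(): void {
  let child: ChildProcess | undefined;
  const start = (): void => {
    child = fork(ENTRY); // fork() sets up the IPC channel used for "shutdown"
    // Whenever the server exits (gracefully or not), start a fresh one.
    child.on("exit", start);
  };
  start();
  const onChange = makeChangeHandler(compileTypeScript, () => {
    if (child?.connected) child.send({ type: "shutdown" });
  });
  watch(SRC_DIR, { recursive: true }, () => void onChange());
}
```

Calling `startManager()` from the manager's entry point starts the server and begins watching. A production manager would likely also debounce bursts of file events and back off if the new server keeps crashing on startup, which this sketch does not do.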

Together, these graceful shutdown and restart measures keep interruptions to users minimal during our deployments. Because the Node server waits for current requests to finish, and a restart takes just a few seconds, we can deploy changes to it with almost no noticeable impact on users.

If you are interested in joining our team and solving these kinds of challenges to provide a better platform for medical research, you can learn more here.
