Apache Worker, I Command Thee!
If you’re running a web server, no matter the traffic, you’ve most likely looked at pretty graphs from your server and turned into an efficiency nut of some kind. You’re always looking at what is using your RAM, what’s eating up your CPU, and what’s clogging up your network. If you’re not, you should be! There are tons of tips, tricks, and techniques you can apply to your server to run your software stack as efficiently as possible. Apache’s KeepAlive settings are often overlooked, yet they can have a dramatic impact on Apache’s memory usage. We’re only going to focus on two Apache directives found in your main Apache configuration file:
On Debian/Ubuntu = /etc/apache2/apache2.conf
On RHEL/CentOS = /etc/httpd/conf/httpd.conf
KeepAlive (On | Off)
KeepAliveTimeout (# of seconds to stay alive before timing out)
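In the configuration file these are ordinary one-line directives. As a reference point, the httpd core documentation lists KeepAlive On and KeepAliveTimeout 5 as the built-in defaults for Apache 2.2 and 2.4. Your distribution’s packaged config may set different values, so check your own file. Written out explicitly, the upstream defaults look like this:

```apache
# Main config file (apache2.conf or httpd.conf).
# Both directives are also allowed inside a <VirtualHost> block.

# Allow several HTTP requests over one TCP connection.
KeepAlive On

# Seconds an idle kept-alive connection waits for the next request
# before Apache closes it and the worker becomes free again.
KeepAliveTimeout 5
```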
In short, KeepAlive lets Apache process multiple HTTP requests over a single TCP connection. This can help serve your files faster, because the client (the visitor’s browser) and the server (Apache) don’t have to establish a new TCP connection for each and every file on your web page. However, if configured poorly, it can hurt your server by keeping connections open longer than they need to be, which causes unnecessary memory usage. That memory usage is what we’re going to tackle with KeepAlive.
Before we get started, let’s quickly go over how your Apache server is most likely set up. When you start the Apache server, a main process is created. Let’s call this main process the coordinator. The coordinator doesn’t actually handle client requests for your web pages and files; he’s too good for that. Instead, the coordinator spawns and manages a pool of worker processes to handle those requests. (This is the prefork model; the worker and event MPMs run threads inside a smaller number of processes, but the idea is the same.) The important thing to note here is that each of these workers uses RAM. When a user opens an image on your site, the client’s browser sends a request to Apache on your server. The coordinator looks at which workers are available, picks one, and has it deal with that particular request. While that worker is busy with that request, it cannot be used for a different one.
This is where KeepAlive comes into play. Once the worker has sent the client the requested image, it can close the TCP connection and report back to the coordinator as available for another request. Or, with KeepAlive enabled, it can keep the connection alive and listen for more requests from the same client. Perhaps the client also needs a CSS file or a JavaScript file. Because that worker kept the connection alive, it can simply keep processing those requests. This is wonderful, as it saves the client from having to reestablish a TCP connection for every single little file.
Now think about it: what if the client didn’t need any other files? What if it only wanted the image, and we knew for a fact it didn’t need anything else? That worker is going to keep the connection alive, and while it does, it is not available for requests from other clients. So when someone else opens that image on your site, the coordinator has to either hand the request to another idle worker or spawn a new one. Since each worker uses memory, this can get pretty ugly pretty quickly on a busy site. You don’t want your workers tied up waiting for something you know is never going to come.
This is where we set a sensible timeout for our KeepAlive sessions. The KeepAliveTimeout directive sets the number of seconds a worker will wait for a new request after the last one before closing the connection and reporting back to the coordinator as an available worker for a new client or request.
Different distributions that package Apache ship different default values for these settings. Almost all of them enable KeepAlive, but many set KeepAliveTimeout higher than most sites need. I’ve seen people run Apache with KeepAliveTimeout set to 15 seconds.
If you have a traditional site, where once things are loaded they stay loaded, 15 seconds can be a killer. You’re tying up a worker for 15 seconds for no reason, letting it wait for something that isn’t coming. All the while, the coordinator has to spawn new workers for other clients and requests, using up even more RAM. The goal should be to put each worker back into the available pool as quickly as possible, so you can serve the most clients with the fewest active workers. Fewer workers means less RAM used.
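A rough way to see the scale of this is Little’s law: the average number of busy workers is about the rate of new connections times how long each connection holds a worker. In the one-worker-per-connection model described above, that hold time is the time spent actually serving requests plus the KeepAliveTimeout idle wait, assuming clients leave the connection open until the timeout. The inputs below (10 new connections per second, half a second of real serving per connection) are hypothetical, chosen only to illustrate:

```latex
N_{\text{busy}} \approx \lambda \,\bigl(t_{\text{serve}} + t_{\text{keepalive}}\bigr)

\lambda = 10\ \text{s}^{-1},\quad t_{\text{serve}} = 0.5\ \text{s}:\qquad
N_{15} \approx 10 \times (0.5 + 15) = 155,\qquad
N_{2} \approx 10 \times (0.5 + 2) = 25
```

In this example, dropping the timeout from 15 to 2 seconds cuts the number of tied-up workers, and roughly the RAM they hold, by about a factor of six.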
For most websites, you’ll definitely want to keep KeepAlive enabled, since your pages probably have the client download multiple images, CSS files, JavaScript files, and who knows what else. However, you’ll want to turn KeepAliveTimeout down, even as low as 2 seconds.
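A minimal sketch of that recommendation for a typical page with many assets. The value 2 comes from the paragraph above and is a starting point to tune against your own traffic, not a universal number:

```apache
# Typical page: HTML plus several images, CSS, and JavaScript files.
# Keep connections reusable so the browser can fetch them all in one go...
KeepAlive On

# ...but release the worker quickly once the page has finished loading.
KeepAliveTimeout 2
```

After editing, apachectl configtest (apache2ctl configtest on Debian/Ubuntu) checks the syntax, and a graceful restart such as apachectl graceful applies the change.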
On the other hand, if you’re an image host and you know that every request to your web server will be answered with a single image, there is no need for KeepAlive at all. Go ahead and turn it off.
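For a server that answers every request with a single image, the change is one line. The commented-out virtual-host variant is an assumption about how you might scope it if images are served from their own site; images.example.com is a placeholder hostname:

```apache
# Every request is a single image, so there is nothing to reuse the
# connection for: close it right after the response.
KeepAlive Off

# Alternatively, scope it to one virtual host (hypothetical hostname):
# <VirtualHost *:80>
#     ServerName images.example.com
#     KeepAlive Off
# </VirtualHost>
```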
Used in conjunction with the other process and thread settings of your Apache MPM, this lets you fine-tune Apache beyond its default settings and run your web server more efficiently.
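As an illustration of those MPM settings, here is a sketch for the prefork MPM on Apache 2.4. The numbers are placeholders, not recommendations; Apache 2.2 uses the older names MaxClients and MaxRequestsPerChild for the last two directives. If you run the event MPM on 2.4, idle keep-alive connections are handed to a listener thread instead of occupying a worker, which softens the problem described above, although the timeout still matters.

```apache
<IfModule mpm_prefork_module>
    # Worker processes created at startup.
    StartServers             5
    # Idle workers kept ready for new requests; the parent spawns or
    # kills processes to keep the idle count within this range.
    MinSpareServers          5
    MaxSpareServers         10
    # Hard cap on simultaneous workers. A common rule of thumb:
    # RAM you can give Apache / average size of one worker process.
    MaxRequestWorkers      150
    # 0 = never recycle a worker based on how many connections it served.
    MaxConnectionsPerChild   0
</IfModule>
```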