Just had to kill multiple Java threads spawned by a hanging database query, or something so similar to a hanging db query that it makes no difference. On Solaris, this would be comparatively simple, as all the threads would have been represented by a single process id. One kill -9 <pid> would have done the trick. On Linux, or at least the Red Hat 7.2 version we run, each thread has its own pid, so before I could restart the Tomcat servlet engine I would have had to kill each one individually. It’s not difficult, typing “kill -9 <pid>” over and over again, but it is onerous, or at least as onerous as my job gets.
Here’s where I tell you a secret. The best sysadmins are lazy bastards, so lazy that typing in even just 10 “kill -9 <pid>” commands seems an overwhelming task. I am not a great sysadmin; I’m not quite lazy enough*. Most of the time I’ll just type in the multiple commands. Today I decided not to.
Here’s what the processes that needed killing looked like. (I’ve broken them up so that they won’t screw up viewers with small monitors. Normally everything in each of the 2 entries below would appear on one line.)
root 18935 0.0 6.7 236880 86704 ? S Apr28 0:00
/usr/Java/jdk1.3.1_07/bin/i386/native_threads/Java -Djava.endorsed.dirs=/usr/local/tomcat-smartask/bin:/usr/local/tomcat-smartask/common/lib -classpath /usr/Java/jdk1.3.1_07/lib/tools.jar:/usr/local/tomcat-smartask/bin/bootstrap.jar -Dcatalina.base=/usr/local/tomcat-smartask -Dcatalina.home=/usr/local/tomcat-smartask -Djava.io.tmpdir=/usr/local/tomcat-smartask/temp org.apache.catalina.startup.Bootstrap start
root 18936 0.0 6.7 236880 86704 ? S Apr28 0:00
/usr/Java/jdk1.3.1_07/bin/i386/native_threads/Java -Djava.endorsed.dirs=/usr/local/tomcat-smartask/bin:/usr/local/tomcat-smartask/common/lib -classpath /usr/Java/jdk1.3.1_07/lib/tools.jar:/usr/local/tomcat-smartask/bin/bootstrap.jar -Dcatalina.base=/usr/local/tomcat-smartask -Dcatalina.home=/usr/local/tomcat-smartask -Djava.io.tmpdir=/usr/local/tomcat-smartask/temp org.apache.catalina.startup.Bootstrap start
There were about 50 just like the above, and this command took care of all of them at once, after I took the two minutes needed to puzzle it out:
kill -9 `ps -auxwww | grep smart | cut -c10-16`
I’m going to assume that most everyone who made it this far understands the above, and is wondering why it took me to this point in my career before I used such a simple damn command.
My answer? Because I never needed it before, so there.
For the rest of you, who must have a masochistic streak a mile wide somewhere in your makeup, the command above is actually 3 commands, enclosed in backquotes and separated by pipes (|), that hand their output (the pids) to the fourth command, the kill -9.
Commands:
ps -auxwww – is part of the Berkeley implementation of the ps command, found in /usr/ucb for any Solaris users out there. It gives me the entire process entry, instead of cutting it off after one display line. It passes that information to
grep smart – grep looks at all the process entries and filters out all the ones where the word “smart” doesn’t appear. I can’t use “Java” because I have lots of processes with that word in the entry, and I don’t want to kill them all. However, if I could have used “Java” then I could have gotten away with the much simpler and more familiar “ps -ef” command at the beginning, rather than the more cumbersome “ps -auxwww”. The filtered processes are passed to
cut -c10-16 – which cuts out the characters in the 10th through 16th positions of each filtered process entry and serves them to
kill -9 – end a process, and do it now. It is the UNIX equivalent of your mother calling you by your entire name, and telling you to drop whatever you are doing and get in the house now.
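For anyone who wants to poke at the stages without endangering a live process, here is a rough sketch that runs them against one canned entry instead of real ps output. The entry and its column widths are assumptions modeled on the listing above (user in columns 1-8, pid right-aligned in columns 10-14); a different ps may line things up differently, and the kill is only echoed, never run.

```shell
# One made-up process entry, padded the way the listing above appears to be.
# Assumption: user in columns 1-8, pid right-aligned in columns 10-14.
entry=$(printf '%-8s %5s %4s %4s %s' root 18935 0.0 6.7 \
    '/usr/local/tomcat-smartask/bin/bootstrap.jar')

# grep stage: the entry survives only if it contains "smart".
matched=$(printf '%s\n' "$entry" | grep smart)

# cut stage: keep characters 10 through 16, i.e. the pid plus padding.
pid=$(printf '%s\n' "$matched" | cut -c10-16)

# kill stage, echoed rather than executed. Leaving $pid unquoted lets the
# shell drop the padding spaces, which is why they never mattered.
echo kill -9 $pid    # prints: kill -9 18935
```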
I copied the entire command to a text file so that rather than having to type multiple kill -9's in future, I can just change “smart” to whatever identifies the processes I want and run it again. Two minutes expended now, 30 minutes saved over the course of the next few months. The entire course of a system administration career can be traced in minutes of activity saved. Like I said, lazy.
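A text file works, but the same idea could be sketched as a pair of shell functions that take the identifying word as an argument. The names pids_from_ps and killmatching are made up for illustration, the ps auxwww spelling (no leading dash) is an assumption about what the local ps accepts, and the actual kill is left commented out so the list can be eyeballed first.

```shell
# pids_from_ps WORD: reads ps-style entries on stdin and prints
# characters 10-16 (the pid column, by assumption) of every entry
# that contains WORD as a plain string.
pids_from_ps() {
    grep -F -- "$1" | cut -c10-16
}

# killmatching WORD: hypothetical wrapper around the saved command.
# It only reports what it would kill until the last line is uncommented.
killmatching() {
    pids=$(ps auxwww | pids_from_ps "$1")
    echo "would kill:" $pids
    # kill -9 $pids
}

# Dry run against two canned entries; only the first contains "smart".
printf '%-8s %5s %4s %s\n' \
    root 18935 0.0 /usr/local/tomcat-smartask/bin/bootstrap.jar \
    root 18936 0.0 /usr/local/tomcat-other/bin/bootstrap.jar |
    pids_from_ps smart
```

Keeping the filter separate from the ps call is a small design choice: it lets the filter be tried on pasted output before anything touches a real process.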
The best sysadmins, the ones whose praises are sung in story and song, are entirely sessile, or invisible, having saved enough time that they need never come to work, which of course is the goal of every sysadmin.
Just think how much blogging I’ll be able to do then.
Update: Aside from the sloth, another nice thing about being a sysadmin is that there are always more elegant sysadmins around to point out inefficiencies in one’s code**, like Jeff of Caerdroia, who wrote to point out that
kill -9 `ps -auxwww | grep smart | grep -v grep | awk '{print $2}'`
is a much better solution, as it precludes the possibility (admittedly remote) that I would kill my own command before it had run its course. After all, “smart” would be found in its process entry as well. That’s where the grep -v grep command comes in, as it filters out any entry containing the word “grep.” awk '{print $2}' pulls out everything in the second field of the process entry, rather than a set number of characters, useful in case the pids that need killing are not all of the same length. As my “cut” would only pull out spaces in the case of a shorter pid and extra spaces are ignored in commands, it doesn’t actually matter, but it is a more elegant solution, and thus better.
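To see what the two additions buy, here is a small sketch that feeds both variants the same three canned entries in place of live ps output. The entries are invented for the demonstration: two Tomcat threads, one with a shorter pid, plus the grep that is doing the searching. fake_ps is a stand-in name, not a real command.

```shell
# Three made-up entries standing in for live ps output.
fake_ps() {
    printf '%-8s %5s %4s %4s %s\n' \
        root 18935 0.0 6.7 /usr/local/tomcat-smartask/bin/bootstrap.jar \
        root 812 0.0 6.7 /usr/local/tomcat-smartask/bin/bootstrap.jar \
        root 20001 0.0 0.0 'grep smart'
}

# Without the extra filter, the searching grep lists itself as a target.
without_filter=$(fake_ps | grep smart | awk '{print $2}')

# With grep -v grep, only the Tomcat pids are left, and awk picks up the
# short pid as cleanly as the long one because it splits on whitespace.
with_filter=$(fake_ps | grep smart | grep -v grep | awk '{print $2}')

echo "without filter:" $without_filter   # prints: without filter: 18935 812 20001
echo "with filter:" $with_filter         # prints: with filter: 18935 812
```

One caveat worth keeping in mind: grep -v grep drops every entry that mentions “grep” anywhere, so a process that legitimately has that word in its command line would be skipped too.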
*The sysadmin at Medfusion is, though, and both he and his boss read the site regularly. Heh.
**All it takes is a willingness to expose one’s shortcomings to the world, and I possess both the willingness and the shortcomings in ample measure.