After the September BBLISA meeting, several of us adjourned to the CBC for beer and discussion. We quickly got onto the topic of hiring, and I mentioned that I had developed a set of questions I used as a phone screen to decide whether to bring candidates in for face-to-face interviews. By popular request, I offer it to the BBLISA community.
Some of you might be wondering whether posting these questions publicly is a good idea; after all, couldn't a potential candidate study these questions and just repeat the answers from memory? I have several replies to this: First, if a candidate is smart enough to be on the BBLISA mailing list, I'd say that already puts them at least a few points ahead. Second, at least some of the questions are designed to show how a candidate thinks about and solves problems, so no amount of memory is going to help.
I don't expect any but the most senior of candidates to be able to answer all of these questions; besides, that's not the purpose of the screen process. Rather, I'm simply trying to find out what the candidate does and doesn't know, and what depth of knowledge the candidate has. I always make it a point to remind the candidate of this; I also remind them that there are very few "right" answers, and part of the idea is to gain some understanding of and insight into how they think about and solve problems.
But enough meta-discussion; here then, in the order I ask them (just to keep the candidate on their toes), are the questions. In some cases, any given question may seem overly simple; this is most often the case when I'm trying to steer the candidate in a particular direction for answering the next question or two.
===============================================================================
For the first three (3) questions, assume one or more colon-delimited files similar to /etc/passwd, with simple changes to be made; for example, change all GIDs to 4000 or change all GIDs < 1000 to 9999.
===============================================================================
Q. What are three commands OTHER THAN TEXT EDITORS that could be used to make changes to a file?

A. awk, perl, python, and sed are the obvious answers; C, C++, Java, and Tcl are also correct, but the intention of the question is to probe for knowledge of common scripting languages.
===============================================================================
Q. How would you make a similar change to 500 files? Assume all the files are in a single directory, and that there is no need to save the original copies of the files.

A. Something like this:

        for f in * ; do
            awk -F: '. . .' $f > $$
            mv $$ $f
        done
===============================================================================
Q. Can you solve this problem without use of a "language tool" (such as awk, perl, python, sed, etc.)?

A. Yes, like this:

        for f in * ; do
            > $$
            while read line ; do
                set $(echo $line | tr ':' ' ')
                echo "$1:$2:$3:4000:$5:$6:$7" >> $$
            done < $f
            mv $$ $f
        done

How to handle the case of one or more fields containing blanks, or of changing only certain fields based on their value, is left as an exercise for the interviewer. I will, however, suggest you consider additional uses of "tr" as well as some or all of "case," "if," "test," and "expr." Furthermore, a more complete solution would behave correctly if one or more fields was blank (as is typical for field #2 in /etc/group, or allowed-but-uncommon for field #7 in /etc/passwd).

Comments

Go ahead and try writing the full, complete, and correct solution; it's harder than you think.
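If you want something to compare notes against afterwards, here is one possible sketch, assuming a POSIX shell. It leans on IFS and "read" rather than "tr", which is a different technique from the one hinted at above. The function name set_gid and the seven variable names are made up for illustration, and it only covers the "change all GIDs to 4000" case.

```shell
# Hypothetical helper: rewrite field 4 (the GID) to 4000 on every line of
# stdin, using only shell built-ins (no awk, sed, perl, or tr).
set_gid() {
    # IFS=: splits on colons only, so blanks inside a field survive.
    # A colon is not IFS whitespace, so empty fields are kept, not collapsed.
    # read -r keeps any backslashes literal.
    while IFS=: read -r name pass uid gid gecos home shell ; do
        printf '%s:%s:%s:%s:%s:%s:%s\n' \
            "$name" "$pass" "$uid" 4000 "$gecos" "$home" "$shell"
    done
}

# Possible use on every file in a directory, as in the loops above:
#   for f in * ; do set_gid < "$f" > "$$" && mv "$$" "$f" ; done
```

Note that a final line with no trailing newline would be skipped by the "while read" loop, so this is still not the full, complete, and correct solution.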
Make sure to test your solution on lines like this:

        user1:pass1:101:100:User1:/home/user1:/bin/sh
        user2::102:100:User2:/home/user2:/bin/sh
        user3:pass3:103:100:User #3:/home/user3:/bin/sh
        user4:pass4:104:100:User4:/home/user4:
        user5::105:100:User Number Five:/home/user5:
===============================================================================
Q. Please describe, in some reasonable level of detail and for whichever version of Unix you are most familiar, what happens from the time you turn on the power until you get a login prompt.

A. (Very briefly) Boot prom, auto-start or user input, boot loader from boot block, possibly a secondary loader from somewhere else on the disk, starting the kernel, starting init, the inittab file, the rc files (/etc/rc?.d/S*), and (finally) getty.

Comments

If the candidate doesn't go into enough detail, I may ask them to describe one or more steps again but in greater detail. If, OTOH, they're giving me too much detail, I don't hesitate to tell them to skip to the next step.
===============================================================================
Q. Is there anything special about the files in /etc/rc?.d, or is there some particular relationship between the files in /etc/rc?.d and in /etc/init.d? If so, why?

A. The files in /etc/rc?.d are symlinks to the files in /etc/init.d. This is because 1) the "S" version and the "K" version of the files are run from different directories (for example, /etc/rc3.d/S27foo and /etc/rc0.d/K27foo), and 2) so there's only one version of the file to edit (even though it appears in multiple directories).
===============================================================================
Q. Other than the number of commands executed, is there any functional difference in the following pairs of pipelines? Assume the omitted material ("...") is identical in each pair.

        cat *.xyz | sed '...' | sort
        sed '...' *.xyz | sort

        cat *.xyz | grep '...' | sort
        grep '...' *.xyz | sort

        cat *.xyz | awk '...' | sort
        awk '...' *.xyz | sort

A. No, yes, maybe.
===============================================================================
Q. Explain why you might care about the difference between the grep and awk pipelines in the previous question.

A. cat | grep won't "clutter" the output with file names; this is especially useful when used with sort -u. cat | awk prevents awk from detecting when it crosses file boundaries; some awk programs use this ability in how they process the files.
===============================================================================
Q. The following pairs of code fragments are functionally equivalent:

        for f in * ; do                 for f in *.xyz ; do
            . . .                           . . .
        done                            done

        ls |                            ls |
        (                               grep '\.xyz$' |
            while read f ; do           (
                . . .                       while read f ; do
            done                                . . .
        )                                   done
                                        )

In some cases the top fragment of each pair will fail. Why would this happen and why doesn't the bottom fragment also fail?

A. The top fragments are limited by ARG_MAX (the system limit on the size of an argument list); the bottom fragments rely on stdin and stdout, which are unlimited.
===============================================================================
Q. Do you understand subnetting and subnet masks? If so, please explain them (briefly).

Q. Let's say our network number is 135.27.0.0, and we want to have at least six subnets, but each subnet should be able to have as many hosts as possible. What would the subnet mask be? How many hosts can we put on each network? (Optional: What is the first host address on the third subnet? The last subnet?)

A. The subnet mask would be 255.255.224.0

        (binary: 1111 1111 . 1111 1111 . 1110 0000 . 0000 0000)

Each subnet can have up to 8190 (2 ^ 13 - 2) hosts. [Extra credit: why not 8192?]

The first host address on the third subnet would be 135.27.64.1; on the last subnet it would be 135.27.224.1.

Comment

Asking for six subnets is intentionally misleading; I'm hoping they'll say something like "you can't have six, do you want four or eight?"
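If you'd like to check the subnet arithmetic yourself, a few lines of shell will do it. This is only a sketch: the function names subnet_info and hosts_per_subnet are made up, the network and mask are hard-coded from the question, and it assumes subnets are counted from 1 starting with the all-zeros subnet (so there are eight of them).

```shell
# Hypothetical helper: print the network address and first host address of
# the Nth subnet of 135.27.0.0 with mask 255.255.224.0.
# Assumption: N counts from 1 and the all-zeros subnet is number 1.
subnet_info() {
    n=$1
    step=$(( 256 - 224 ))          # 32: third-octet increment per subnet
    third=$(( (n - 1) * step ))    # third octet of the Nth network address
    echo "135.27.$third.0 first host 135.27.$third.1"
}

# Hosts per subnet: 13 host bits, minus the all-zeros and all-ones addresses.
hosts_per_subnet() {
    echo $(( (1 << 13) - 2 ))
}
```

With those definitions, "subnet_info 3" prints "135.27.64.0 first host 135.27.64.1", "subnet_info 8" prints "135.27.224.0 first host 135.27.224.1", and "hosts_per_subnet" prints 8190.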
I don't expect candidates to be able to do binary-to-decimal conversions in their head, so I'm willing to take most of their answers in binary. If they have some sort of conversion calculator handy, I will ask them the optional questions.
===============================================================================
Q. Is there a standard Unix program that will do base conversions? How would you specify binary input and decimal output? What about binary input and hexadecimal output?

A. Yes: dc. For binary->decimal, use "2i"; for binary->hex, use "16o2i" (or, if you're a real bit-banger, "2i10000o"). Please note that "2i16o" is *NOT* correct.

Comment

For those without a Palm and IPcalc, dc is a gift from the gods when mucking about with networks. Spend a few minutes learning to use it.
===============================================================================
Q. What is CIDR?

A. Classless Inter-Domain Routing. Traditional subnet masks apply only to local networks; CIDR extends these subnets across an internet.
===============================================================================
Q. Most modern Unix systems store three times for each file; what are their names and meanings?

A. 1) Access time (atime); when the file was last read OR written.
   2) Modification time (mtime); when the *file* was last modified.
   3) Change time (ctime); when the *inode* was last modified.

Comments

Unfortunately, more than a few candidates (even very senior ones) think ctime is "creation time." :-( Never was, and probably never will be.

Knowing the names isn't enough; I want them to be able to explain the relationship between the three times, too. Specifically, a change in mtime implies a change in atime as well (but not the reverse); a change in either atime or mtime implies a change in ctime as well, but chown/chmod/chgrp change only ctime.
===============================================================================
Q. Please diagnose the problem described in the following situation:

A new system was attached to a Cisco 5509 Ethernet switch; RedHat Linux was installed, the network was configured, and everything was working just fine. The system was then moved to a different port on the Ethernet switch (using the same cable); at this point, the network hung. *NOTHING* else had changed; both the Ethernet switch and the NIC showed link lights, and based on the flashing lights, traffic seemed to be getting from the system to the switch; however, no replies were received, and the system -- which had been using the network just fine less than 15 seconds ago -- now couldn't communicate with anything else on the network. We quickly tried another Ethernet cable; no difference. Suddenly, about a minute later, everything started working again. What had happened, and why?

Additional information:

   1) *Everything* on the Linux box was working correctly.
   2) The "tcpdump" command showed packets going out, but no packets ever came back.
   3) Had we run "tcpdump" on another node and watched ping traffic between the two nodes, everything would have looked like it was working.
   4) The same things would (most likely) have happened with a different Ethernet switch (either model, vendor, or both).

A. Many (most?) Ethernet switches keep tables of which MAC addresses are associated with which ports. Some (many? most?) switches don't check every packet against this table; instead, they have some sort of time-out mechanism. Until that mechanism clears the MAC address from the old port, the switch continues to (mis-) direct packets to the old port. This explains why ping appears to work on the target test node, but the recently-moved node never gets a reply.

Comments

I don't really care if the candidate figures out the cause of the problem. This is one of those questions that's designed to show how a candidate approaches problems. I'm far more interested in the process than in the solution.
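For anyone who wants to see the time-out behavior from the answer above in miniature, here is a toy model in shell. It is only a sketch of the explanation as given, not how any particular switch works; the 60-second limit and the names AGE_LIMIT, learn, and deliver are all invented for illustration.

```shell
# Toy model (not any vendor's algorithm): the switch remembers one port for
# our MAC address and only relearns it after the old entry has aged out.
AGE_LIMIT=60        # hypothetical aging time, in seconds
mac_port=""         # port currently recorded for our MAC address
mac_seen=0          # time (in seconds) at which that entry was learned

# learn PORT NOW: record PORT only if there is no entry yet or the existing
# entry is at least AGE_LIMIT seconds old.
learn() {
    if [ -z "$mac_port" ] || [ $(( $2 - mac_seen )) -ge "$AGE_LIMIT" ] ; then
        mac_port=$1
        mac_seen=$2
    fi
}

# deliver: print the port to which a reply for our MAC would be sent.
deliver() {
    echo "$mac_port"
}
```

Running "learn 3 0" and then "learn 7 15" (the box moves to port 7 fifteen seconds later) leaves "deliver" printing 3, so replies still go to the old port; after "learn 7 75" it prints 7 and everything starts working again.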
===============================================================================
Q. What is the most common cause of DNS problems (not counting syntax errors)?

A. Forgetting to increment the serial number.

Comment

If the candidate needs a hint, try asking this question instead: "What must you *always* do *any* time you make a change to a DNS zone file?"
===============================================================================
Q. How do you "restart" DNS serial numbers? That is, if you accidentally set your serial number to, say, 2099103100, how can you set it back to 2000103100?

A. Start by adding 2147483647 (2 ^ 31 - 1) to the serial number; set the refresh time to 600 (10 minutes); SIGHUP named; wait 10 minutes. Set the serial number to 1; SIGHUP named; wait 10 minutes. Set the serial number to the desired value; set the refresh time back to whatever it was before you started this whole mess; SIGHUP named.

Comments

What that second step *really* does is this: serial + (4294967296 - serial) - 4294967296 + 1. See RFC 1912 and RFC 1982 for why this works.

Again, if you don't understand this answer, don't ask the question!
===============================================================================
Q. Please describe, in terms of what bits in the TCP header are on or off, how a TCP connection is established.

Q. Please describe, in terms of header bits, how a TCP connection is closed.

Q. Please describe the "theory" of the ACK bit and TCP packet sequence numbers.

Comments

If you don't already know the answers to these questions, you probably shouldn't be asking them. This may seem like trivia, but if you're debugging networks (especially routers/routing and/or firewalls), it's essential.
===============================================================================
Q. Please explain (briefly) the difference between paging and swapping.

A. Paging requires VM hardware support, and moves *parts* of individual processes out of and back into main memory; swapping moves one or more whole processes (as needed).
===============================================================================
Q. Please diagnose the problem described in the following situation:

While booting a Sun system, messages on the console indicated that /var/spool/mail was full, and that there was no space left on the device. When the system had finished booting to multi-user mode (in what appeared to be an otherwise successful manner), "df" showed that /var/spool/mail was only 65% full, but / was 100% full. However, when "du" was run on all directories that actually resided on the root partition, only about 50% of the space could be accounted for. Attempts to write to /var/spool/mail worked just fine, but attempts to write to / failed with messages like "no space left on device."

Additional information:

   1) To prove that "du" and "df" were working correctly (which they were), tests were run on every other filesystem to compare the results of the two commands; in every case but /, they matched.
   2) All mounts on the system were local; NFS was not being used.
   3) There were no other error messages in any of the logs.
   4) Rebooting the system did not fix the problem.
   5) During the reboot, the console was watched carefully for other errors; there were none.
   6) Running fsck showed no errors, nor did it have any effect on the problem.
   7) The directories /var and /var/spool resided on the root partition, but /var/spool/mail was a mount point for a second disk.

A. Somehow, sometime in the past, the system had booted but the mount of /var/spool/mail had failed. It then ran in this state for some period of time, during which rather large files were written to /var/spool/mail -- WHICH WAS THEN ON THE ROOT PARTITION. Sometime later, the problem that had prevented /var/spool/mail from being mounted was fixed and the system was rebooted. Unfortunately, the operator forgot to remove the files in the /var/spool/mail still on the root partition. This left root at 100%, but 50% of the space was now hidden by the mount.

Comments

This is another "problem-solving question."
===============================================================================
Q. There is something "special" about how the "cd" command is implemented; with respect to implementation, what is the difference between "cd" and, say, "ls"? More importantly, why *must* "cd" be implemented in this way?

A. The "cd" command *MUST* be implemented as a shell built-in command. Current working directory is a process attribute, and it is not possible for one process to change the attributes of another process. If "cd" were a program, it would run as a different process from the shell and, as such, could not affect the CWD of the process in which the shell is running.

Comments

If the candidate needs a hint, ask if they've ever tried to do "man cd" (or, if they have a system in front of them, suggest that they try it).
===============================================================================
Q. What is an inode? What's the difference between an inode and its associated file? What is stored in the inode?

A. An inode holds information *about* the file, such as type, device info, permissions, owner, group, size, number of links, and the block map (whereas the file holds only data).
===============================================================================
Q. Is the file name stored in the inode?

A. No.

Q. Please describe (briefly) the difference between hard links and symbolic links.

A. Hard links are multiple directory entries pointing to the same inode; symlinks store the name of the "real" file as data (or maybe in spare space in the inode).

Comments

If the candidate gets the first question wrong, they'll almost certainly get the second one wrong, too.
===============================================================================
Q. Which conferences do you attend? How did you choose them? Given the opportunity, would you go to these same conferences again? If you haven't attended any conferences, why not?

Comments

For junior and *maybe* mid-level candidates, I will not hold it against them if their answer is "none, because my employer wouldn't pay for them." However, for senior-level candidates, I do not consider this acceptable.