After the September BBLISA meeting, several of us adjourned to the CBC for beer and discussion. We quickly got onto the topic of hiring, and I mentioned that I had developed a set of questions I used as a phone screen to decide whether to bring candidates in for face-to-face interviews. By popular request, I offer it to the BBLISA community.

Some of you might be wondering whether posting these questions publicly is a good idea; after all, couldn't a potential candidate study these questions and just repeat the answers from memory? I have two replies to this: First, if a candidate is smart enough to be on the BBLISA mailing list, I'd say that already puts them at least a few points ahead. Second, at least some of the questions are designed to show how a candidate thinks about and solves problems, so no amount of memorization is going to help.

I don't expect any but the most senior of candidates to be able to answer all of these questions; besides, that's not the purpose of the screen process. Rather, I'm simply trying to find out what the candidate does and doesn't know, and what depth of knowledge the candidate has. I always make it a point to remind the candidate of this; I also remind them that there are very few "right" answers, and part of the idea is to gain some understanding of and insight into how they think about and solve problems.

But enough meta-discussion; here then, in the order I ask them (just to keep the candidate on their toes), are the questions. In some cases, any given question may seem overly simple; this is most often the case when I'm trying to steer the candidate in a particular direction for answering the next question or two.

===============================================================================

	For the first three (3) questions, assume one or more
	colon-delimited files similar to /etc/passwd, with
	simple changes to be made; for example, change all GIDs
	to 4000 or change all GIDs < 1000 to 9999.

===============================================================================

Q. What are three commands OTHER THAN TEXT EDITORS that could be used to
   make changes to a file?

A. awk, perl, python, and sed are the obvious answers; C, C++, Java, and
   Tcl are also correct, but the intention of the question is to probe
   for knowledge of common scripting languages.

===============================================================================

Q. How would you make a similar change to 500 files? Assume all the
   files are in a single directory, and that there is no need to save
   the original copies of the files.

A. Something like this:

	for f in * ; do
	    awk -F: '. . .' "$f" > $$
	    mv $$ "$f"
	done
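
As a concrete illustration, here is a sketch of that loop with the awk program filled in for the "change all GIDs to 4000" case. The function name set_gid_all and the naming of the temporary file are assumptions made for this example; they are not part of the answer above.

```sh
# Hypothetical helper: set field 4 (the GID) to 4000 in every regular
# file in the directory given as $1.
set_gid_all() {
    for f in "$1"/* ; do
        [ -f "$f" ] || continue          # skip subdirectories and the like
        tmp="$f.tmp.$$"                  # scratch file next to the original
        # -F: splits input on colons; OFS=: rejoins the fields the same way.
        awk -F: -v OFS=: '{ $4 = 4000; print }' "$f" > "$tmp" &&
            mv "$tmp" "$f"
    done
}
```

It would be called as "set_gid_all /some/directory". As with the loop above, each rewritten file is really a new file, so it takes on the permissions of the temporary file rather than those of the original.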

===============================================================================

Q. Can you solve this problem without use of a "language tool" (such as
   awk, perl, python, sed, etc.)?

A. Yes, like this:

	for f in * ; do
	    > $$
	    while read line ; do
		set $(echo $line | tr ':' ' ')
		echo "$1:$2:$3:4000:$5:$6:$7" >> $$
	    done < $f

	    mv $$ $f
	done

   How to handle the case of one or more fields containing blanks or
   changing only certain fields based on their value is left as an
   exercise for the interviewer. I will, however, suggest you consider
   additional uses of "tr" as well as some or all of "case," "if,"
   "test," and "expr." Furthermore, a more complete solution would
   behave correctly if one or more fields were empty (as is typical for
   field #2 in /etc/group, or allowed-but-uncommon for field #7 in
   /etc/passwd).

Comments
	Go ahead and try writing the full, complete, and correct
	solution; it's harder than you think. Make sure to test your
	solution on lines like this:

	    user1:pass1:101:100:User1:/home/user1:/bin/sh
	    user2::102:100:User2:/home/user2:/bin/sh
	    user3:pass3:103:100:User #3:/home/user3:/bin/sh
	    user4:pass4:104:100:User4:/home/user4:
	    user5::105:100:User Number Five:/home/user5:

===============================================================================

Q. Please describe, in some reasonable level of detail and for whichever
   version of Unix you are most familiar, what happens from the time you
   turn on the power until you get a login prompt.

A. (Very briefly) Boot PROM, auto-start or user input, boot loader from
   boot block, possibly a secondary loader from somewhere else on the
   disk, starting the kernel, starting init, the inittab file, the rc
   files (/etc/rc?.d/S*), and (finally) getty.

Comments
	If the candidate doesn't go into enough detail, I may ask them
	to describe one or more steps again but in greater detail. If,
	OTOH, they're giving me too much detail, I don't hesitate to
	tell them to skip to the next step.

===============================================================================

Q. Is there anything special about the files in /etc/rc?.d, or is there
   some particular relationship between the files in /etc/rc?.d and in
   /etc/init.d? If so, why?

A. The files in /etc/rc?.d are symlinks to the files in /etc/init.d.
   This is because 1) the "S" version and the "K" version of the files
   are run from different directories (for example, /etc/rc3.d/S27foo
   and /etc/rc0.d/K27foo), and 2) linking means there's only one copy
   of the file to edit (even though it appears in multiple directories).

===============================================================================

Q. Other than the number of commands executed, is there any functional
   difference in the following pairs of pipelines? Assume the omitted
   material ("...") is identical in each pair.

	cat *.xyz | sed '...' | sort
	sed '...' *.xyz | sort

	cat *.xyz | grep '...' | sort
	grep '...' *.xyz | sort

	cat *.xyz | awk '...' | sort
	awk '...' *.xyz | sort

A. No, yes, maybe.

===============================================================================

Q. Explain why you might care about the difference between the grep and
   awk pipelines in the previous question.

A. cat | grep won't "clutter" the output with file names; this is
   especially useful when used with sort -u.

   cat | awk prevents awk from detecting when it crosses file
   boundaries; some awk programs use this ability in how they process
   the files.
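
Both differences are easy to see with a couple of scratch files. The following is only a sketch; the file names and contents are invented for the demonstration, and it assumes mktemp -d is available.

```sh
# Scratch files for illustration only; the names are made up.
demo=$(mktemp -d)
printf 'apple\nbanana\n' > "$demo/a.xyz"
printf 'apple\ncherry\n' > "$demo/b.xyz"

# Given several files, grep prefixes each match with its file name, so
# sort -u sees two different lines ...
( cd "$demo" && grep 'apple' *.xyz | sort -u )

# ... but reading one stream from cat, it prints the bare match, and
# sort -u collapses the duplicates into a single line.
( cd "$demo" && cat *.xyz | grep 'apple' | sort -u )

# awk resets FNR at the start of each file it opens itself, so this
# counts two files ...
( cd "$demo" && awk 'FNR == 1 { n++ } END { print n }' *.xyz )

# ... but behind cat there is only one input stream, so it counts one.
( cd "$demo" && cat *.xyz | awk 'FNR == 1 { n++ } END { print n }' )
```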

===============================================================================

Q. The following pairs of code fragments are functionally equivalent:

	for f in * ; do		for f in *.xyz ; do
	    . . .		    . . .
	done			done

	ls |			ls |
	(			grep '\.xyz$' |
	while read f ; do	(
	    . . .		while read f ; do
	done			    . . .
	)			done
				)

   In some cases the top fragment of each pair will fail. Why would this
   happen and why doesn't the bottom fragment also fail?

A. The top fragments are limited by ARG_MAX (the limit on the size of
   an argument list); the bottom fragments rely on stdin and stdout,
   which are unlimited.

===============================================================================

Q. Do you understand subnetting and subnet masks? If so, please explain
   them (briefly).

Q. Let's say our network number is 135.27.0.0, and we want to have at
   least six subnets, but each subnet should be able to have as many
   hosts as possible. What would the subnet mask be? How many hosts can
   we put on each network?  (Optional: What are the first and last host
   addresses on the third subnet?)

A. The subnet mask would be 255.255.224.0
   (binary: 1111 1111 . 1111 1111 . 1110 0000 . 0000 0000)

   Each subnet can have up to 8190 (2 ^ 13 - 2) hosts.

   [Extra credit: why not 8192?]

   The first host address on the third subnet would be 135.27.64.1;
   the last would be 135.27.95.254.

Comment
	Asking for six subnets is intentionally misleading; I'm hoping
	they'll say something like "you can't have six, do you want four
	or eight?"

	I don't expect candidates to be able to do binary-to-decimal
	conversions in their head, so I'm willing to take most of their
	answers in binary. If they have some sort of conversion
	calculator handy, I will ask them the optional questions.
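
The arithmetic behind this answer can be checked with nothing more than shell arithmetic. This is only a sketch: it hard-codes the three subnet bits and the 135.27.0.0 network from the question, the variable names are invented, and it counts subnets from zero (so the "third" subnet is number 2).

```sh
subnet_bits=3                               # smallest n with 2^n >= 6
host_bits=$(( 16 - subnet_bits ))           # a class B leaves 16 bits to split
step=$(( 1 << (8 - subnet_bits) ))          # subnets start every 32 in octet 3
mask_octet=$(( 256 - step ))                # third octet of the mask
hosts=$(( (1 << host_bits) - 2 ))           # minus all-zeros and all-ones
third=$(( 2 * step ))                       # third octet of subnet number 2

echo "mask:       255.255.$mask_octet.0"
echo "hosts:      $hosts"
echo "first host: 135.27.$third.1"
echo "last host:  135.27.$(( third + step - 1 )).254"
```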

===============================================================================

Q. Is there a standard Unix program that will do base conversions? How
   would you specify binary input and decimal output? What about binary
   input and hexadecimal output?

A. Yes: dc. For binary->decimal, use "2i"; for binary->hex, use "16o2i"
   (or, if you're a real bit-banger, "2i10000o"). Please note that
   "2i16o" is *NOT* correct.

Comment
	For those without a Palm and IPcalc, dc is a gift from the gods
	when mucking about with networks. Spend a few minutes learning
	to use it.
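
For anyone who hasn't used dc this way, here is a small sketch. The wrapper names bin2dec and bin2hex are made up; the dc commands themselves are the ones given in the answer, with "p" added to print the result.

```sh
# Hypothetical wrappers around dc; each takes a string of binary digits.

# Input base 2; the output base is left at its default of 10.
bin2dec() { echo "2i $1 p" | dc ; }

# Set the output base to 16 first (while 16 is still read as decimal),
# then switch the input base to 2.
bin2hex() { echo "16o 2i $1 p" | dc ; }
```

For example, "bin2dec 11100000" prints 224 (the third octet of the subnet mask above), and "bin2hex 11100000" prints E0.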

===============================================================================

Q. What is CIDR?

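A. Classless Inter-Domain Routing. Traditional subnet masks apply only to
   local networks; CIDR extends the same idea across an internet by
   dropping the fixed class A/B/C boundaries, so that a routing prefix
   can be any length and adjacent networks can be aggregated into a
   single route.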

===============================================================================

Q. Most modern Unix systems store three times for each file; what are
   their names and meanings?

A. 1) Access time (atime); when the file's data was last read.
   2) Modification time (mtime); when the *file* was last modified.
   3) Change time (ctime); when the *inode* was last modified.

Comments
	Unfortunately, more than a few candidates (even very senior
	ones) think ctime is "creation time." :-( Never was, and
	probably never will be.

	Knowing the names isn't enough; I want them to be able to
	explain the relationship between the three times, too.
	Specifically, reading a file changes only atime; writing to a
	file changes mtime and, with it, ctime (so a change in mtime
	implies a change in ctime, but not the reverse);
	chown/chmod/chgrp change only ctime.
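
For candidates with a system in front of them, ls can display each of the three times. A minimal sketch follows; show_times is a made-up name, and -u and -c are the standard ls options for selecting which time is shown.

```sh
# Hypothetical helper: print the three timestamps of the file named in $1.
show_times() {
    ls -l  "$1"    # mtime: last change to the file's contents
    ls -lu "$1"    # atime: last access
    ls -lc "$1"    # ctime: last change to the inode
}
```

Running it before and after a chmod on a scratch file is a quick way to watch ctime move while mtime stays put.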

===============================================================================

Q. Please diagnose the problem described in the following situation:

   A new system was attached to a Cisco 5509 Ethernet switch; RedHat
   Linux was installed, the network was configured, and everything was
   working just fine. The system was then moved to a different port on
   the Ethernet switch (using the same cable); at this point, the
   network hung. *NOTHING* else had changed; both the Ethernet switch
   and the NIC showed link lights, and based on the flashing lights,
   traffic seemed to be getting from the system to the switch; however,
   no replies were received, and the system -- which had been using the
   network just fine less than 15 seconds ago -- now couldn't communicate
   with anything else on the network. We quickly tried another Ethernet
   cable; no difference. Suddenly, about a minute later, everything
   started working again. What had happened, and why?

   Additional information: 1) *Everything* on the Linux box was working
   correctly. 2) The "tcpdump" command showed packets going out, but
   no packets ever came back. 3) Had we run "tcpdump" on another node
   and watched ping traffic between the two nodes, everything would have
   looked like it was working. 4) The same things would (most likely)
   have happened with a different Ethernet switch (either model, vendor,
   or both).

A. Many (most?) Ethernet switches keep tables of which MAC addresses are
   associated with which ports. Some (many? most?) switches don't check
   every packet against this table; instead, they have some sort of
   time-out mechanism. Until that mechanism clears the MAC address from
   the old port, the switch continues to (mis-) direct packets to the
   old port. This explains why ping appears to work on the target test
   node, but the recently-moved node never gets a reply.

Comments
	I don't really care if the candidate figures out the cause of
	the problem. This is one of those questions that's designed to
	show how a candidate approaches problems. I'm far more
	interested in the process than in the solution.

===============================================================================

Q. What is the most common cause of DNS problems (not counting syntax
   errors)?

A. Forgetting to increment the serial number.

Comment
	If the candidate needs a hint, try asking this question instead:
	"What must you *always* do *any* time you make a change to a DNS
	zone file?"

===============================================================================

Q. How do you "restart" DNS serial numbers? That is, if you accidentally
   set your serial number to, say, 2099103100, how can you set it back
   to 2000103100?

A. Start by adding 2147483647 (2 ^ 31 - 1) to the serial number; set the
   refresh time to 600 (10 minutes); SIGHUP named; wait 10 minutes.

   Set the serial number to 1; SIGHUP named; wait 10 minutes.

   Set the serial number to the desired value; set the refresh time back
   to whatever it was before you started this whole mess; SIGHUP named.

Comments
	What that second step *really* does is this:
	
		serial + (4294967296 - serial) - 4294967296 + 1.

	See RFC 1912 and RFC 1982 for why this works. Again, if you
	don't understand this answer, don't ask the question!
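
The comparison rule from RFC 1982 is short enough to sketch in shell, which makes it easy to check each step of the procedure. The function name serial_gt is invented for this example, and the sketch assumes the shell does its arithmetic in 64 bits.

```sh
# Hypothetical helper: succeed if serial $1 counts as "greater than"
# serial $2 under RFC 1982 arithmetic on 32-bit serial numbers.
serial_gt() {
    diff=$(( ($1 - $2) & 4294967295 ))      # (a - b) mod 2^32
    # Greater means the forward distance is non-zero and under 2^31.
    [ "$diff" -gt 0 ] && [ "$diff" -lt 2147483648 ]
}
```

With the serial numbers from the question, "serial_gt 4246586747 2099103100" (after adding 2147483647), "serial_gt 1 4246586747", and "serial_gt 2000103100 1" all succeed, so the secondaries accept every step. By contrast, "serial_gt 2000103100 2099103100" fails, which is why the serial number can't simply be set back directly.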

===============================================================================

Q. Please describe, in terms of what bits in the TCP header are on or
   off, how a TCP connection is established.

Q. Please describe, in terms of header bits, how a TCP connection is
   closed.

Q. Please describe the "theory" of the ACK bit and TCP packet sequence
   numbers.

Comments
	If you don't already know the answers to these questions, you
	probably shouldn't be asking them.

	This may seem like trivia, but if you're debugging networks
	(especially routers/routing and/or firewalls), it's essential.

===============================================================================

Q. Please explain (briefly) the difference between paging and swapping.

A. Paging requires VM hardware support, and moves *parts* of individual
   processes out of and back into main memory; swapping moves one or
   more whole processes (as needed).

===============================================================================

Q. Please diagnose the problem described in the following situation:

   While booting a Sun system, messages on the console indicated that
   /var/spool/mail was full, and that there was no space left on the
   device. When the system had finished booting to multi-user mode (in
   what appeared to be an otherwise successful manner), "df" showed that
   /var/spool/mail was only 65% full, but / was 100% full. However, when
   "du" was run on all directories that actually resided on the root
   partition, only about 50% of the space could be accounted for.
   Attempts to write to /var/spool/mail worked just fine, but attempts to
   write to / failed with messages like "no space left on device."

   Additional information: 1) To prove that "du" and "df" were working
   correctly (which they were), tests were run on every other filesystem
   to compare the results of the two commands; in every case but /, they
   matched. 2) All mounts on the system were local; NFS was not being
   used. 3) There were no other error messages in any of the logs. 4)
   Rebooting the system did not fix the problem. 5) During the reboot,
   the console was watched carefully for other errors; there were none.
   6) Running fsck showed no errors, nor did it have any effect on the
   problem. 7) The directories /var and /var/spool resided on the root
   partition, but /var/spool/mail was a mount point for a second disk.

A. Somehow, sometime in the past, the system had booted but the mount
   of /var/spool/mail had failed. It then ran in this state for some
   period of time, during which rather large files were written to
   /var/spool/mail -- WHICH WAS THEN ON THE ROOT PARTITION. Sometime
   later, the problem that had prevented /var/spool/mail from being mounted
   was fixed and the system was rebooted. Unfortunately, the operator
   forgot to remove the files in the /var/spool/mail still on the root
   partition. This left root at 100%, but 50% of the space was now
   hidden by the mount.

Comments
	This is another "problem-solving question."

===============================================================================

Q. There is something "special" about how the "cd" command is implemented;
   with respect to implementation, what is the difference between "cd"
   and, say, "ls"? More importantly, why *must* "cd" be implemented in
   this way?

A. The "cd" command *MUST* be implemented as a shell built-in command.
   Current working directory is a process attribute, and it is not
   possible for one process to change the attributes of another process.
   If "cd" were a program, it would run as a different process from the
   shell and, as such, could not affect the CWD of the process in which
   the shell is running.

Comments
	If the candidate needs a hint, ask if they've ever tried to do
	"man cd" (or, if they have a system in front of them, suggest
	that they try it).
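
The point can also be demonstrated directly. In this sketch (the function names are made up), the first function runs "cd" in a child process and the second runs the built-in in the current shell.

```sh
# Run "cd /" in a child process, then report this shell's directory.
cwd_after_child_cd() {
    sh -c 'cd /'       # the child changes its own CWD, then exits
    pwd                # the parent's CWD is unchanged
}

# Run the built-in "cd /" in the current shell, then report the directory.
cwd_after_builtin_cd() {
    cd / && pwd        # this shell's own CWD is now /
}
```

The first function prints whatever directory the shell was already in; only the second prints "/".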

===============================================================================

Q. What is an inode? What's the difference between an inode and its
   associated file? What is stored in the inode?

A. An inode holds information *about* the file, such as type, device
   info, permissions, owner, group, size, number of links, and the block
   map (whereas the file holds only data).

===============================================================================

Q. Is the file name stored in the inode?

A. No.

Q. Please describe (briefly) the difference between hard links and
   symbolic links.

A. Hard links are multiple directory entries pointing to the same inode;
   symlinks store the name of the "real" file as data (or maybe in spare
   space in the inode).

Comments
	If the candidate gets the first question wrong, they'll almost
	certainly get the second one wrong, too.
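
The difference shows up clearly in a scratch directory. This is only a sketch; the file names are invented, and it assumes mktemp -d is available.

```sh
demo=$(mktemp -d)
echo 'hello' > "$demo/file"
ln    "$demo/file" "$demo/hard"     # second directory entry, same inode
ln -s file         "$demo/soft"     # separate inode; its data is the name "file"

ls -li "$demo"                      # first column is the inode number

rm "$demo/file"                     # removes one name, not the inode
cat "$demo/hard"                    # still prints hello
cat "$demo/soft" 2>/dev/null || echo 'soft now points at nothing'
```

In the ls -li output, "file" and "hard" share an inode number and show a link count of 2, while "soft" has its own inode. After the rm, the data is still reachable through "hard", but the symlink dangles because the name it stores no longer exists.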

===============================================================================

Q. Which conferences do you attend? How did you choose them? Given the
   opportunity, would you go to these same conferences again? If you
   haven't attended any conferences, why not?

Comments
	For junior and *maybe* mid-level candidates, I will not hold it
	against them if their answer is "none, because my employer
	wouldn't pay for them." However, for senior-level candidates, I
	do not consider this acceptable.