After the September BBLISA meeting, several of us adjourned to the CBC for beer and discussion. We quickly got onto the topic of hiring, and I mentioned that I had developed a set of questions I used as a phone screen to decide whether to bring candidates in for face-to-face interviews. By popular request, I offer it to the BBLISA community.
Some of you might be wondering whether posting these questions publicly is a good idea; after all, couldn't a potential candidate study these questions and just repeat the answers from memory? I have several replies to this: First, if a candidate is smart enough to be on the BBLISA mailing list, I'd say that already puts them at least a few points ahead. Second, at least some of the questions are designed to show how a candidate thinks about and solves problems, so no amount of memorization is going to help.
I don't expect any but the most senior of candidates to be able to answer all of these questions; besides, that's not the purpose of the screening process. Rather, I'm simply trying to find out what the candidate does and doesn't know, and what depth of knowledge the candidate has. I always make it a point to remind the candidate of this; I also remind them that there are very few "right" answers, and part of the idea is to gain some understanding of and insight into how they think about and solve problems.
But enough meta-discussion; here then, in the order I ask them (just to keep the candidate on their toes), are the questions. In some cases, any given question may seem overly simple; this is most often the case when I'm trying to steer the candidate in a particular direction for answering the next question or two.
===============================================================================
For the first three (3) questions, assume one or more
colon-delimited files similar to /etc/passwd, with
simple changes to be made; for example, change all GIDs
to 4000 or change all GIDs < 1000 to 9999.
===============================================================================
Q. What are three commands OTHER THAN TEXT EDITORS that could be used to
make changes to a file?
A. awk, perl, python, and sed are the obvious answers; C, C++, Java, and
Tcl are also correct, but the intention of the question is to probe
for knowledge of common scripting languages.
===============================================================================
Q. How would you make a similar change to 500 files? Assume all the
files are in a single directory, and that there is no need to save
the original copies of the files.
A. Something like this:
     for f in * ; do
         awk -F: '. . .' "$f" > $$
         mv $$ "$f"
     done
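As one concrete instance of the loop above, here is a minimal sketch for the second sample change (every GID below 1000 becomes 9999). The function name set_low_gids and the temporary-file name are made up for illustration, and the sketch assumes the GID is field 4, as in /etc/passwd; it is one possible awk program, not necessarily the one elided above.

```shell
# Hypothetical helper: rewrite every regular file in directory $1 so that
# any GID (field 4) below 1000 becomes 9999. Originals are not kept.
set_low_gids() {
    dir=$1
    tmp="$dir/.tmp.$$"           # hidden name, so the "*" glob never matches it
    for f in "$dir"/* ; do
        [ -f "$f" ] || continue  # skip subdirectories and an unmatched glob
        awk -F: 'BEGIN { OFS = ":" }
                 $4 != "" && $4 < 1000 { $4 = 9999 }
                 { print }' "$f" > "$tmp" &&
            mv "$tmp" "$f"
    done
}
```

Setting OFS matters here: assigning to $4 makes awk rebuild the line, and without OFS = ":" the fields would be rejoined with spaces.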
===============================================================================
Q. Can you solve this problem without use of a "language tool" (such as
awk, perl, python, sed, etc.)?
A. Yes, like this:
     for f in * ; do
         > $$
         while read line ; do
             set $(echo $line | tr ':' ' ')
             echo "$1:$2:$3:4000:$5:$6:$7" >> $$
         done < "$f"
         mv $$ "$f"
     done
How to handle the case of one or more fields containing blanks or
changing only certain fields based on their value is left as an
exercise for the interviewer. I will, however, suggest you consider
additional uses of "tr" as well as some or all of "case," "if,"
"test," and "expr." Furthermore, a more complete solution would
behave correctly if one or more fields was blank (as is typical for
field #2 in /etc/group, or allowed-but-uncommon for field #7 in
/etc/passwd).
Comments
Go ahead and try writing the full, complete, and correct
solution; it's harder than you think. Make sure to test your
solution on lines like this:
user1:pass1:101:100:User1:/home/user1:/bin/sh
user2::102:100:User2:/home/user2:/bin/sh
user3:pass3:103:100:User #3:/home/user3:/bin/sh
user4:pass4:104:100:User4:/home/user4:
user5::105:100:User Number Five:/home/user5:
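One possible approach to the fuller solution, offered as a sketch rather than the definitive answer: let the shell's own "read" split on colons by setting IFS, which keeps embedded blanks and empty fields intact. The function name set_gid_pure_sh and the variable names are invented for illustration. Known gaps: a line with fewer than seven fields is padded with extra colons, and a final line lacking a newline is dropped.

```shell
# Hypothetical helper: copy stdin to stdout, replacing field 4 with $1.
# Only shell built-ins are used (read, printf); no awk, sed, or tr.
set_gid_pure_sh() {
    new_gid=$1
    # With IFS set to ":", adjacent colons yield empty fields, and the
    # last variable (shell) receives whatever remains on the line.
    while IFS=: read -r name pass uid gid gecos home shell ; do
        printf '%s:%s:%s:%s:%s:%s:%s\n' \
            "$name" "$pass" "$uid" "$new_gid" "$gecos" "$home" "$shell"
    done
}
```

Usage would look like "set_gid_pure_sh 4000 < $f > $$" inside the same for loop as before. Changing only certain GIDs would mean wrapping the printf in a "case" or "test" on $gid.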
===============================================================================
Q. Please describe, in some reasonable level of detail and for whichever
version of Unix you are most familiar, what happens from the time you
turn on the power until you get a login prompt.
A. (Very briefly) Boot prom, auto-start or user input, boot loader from
boot block, possibly a secondary loader from somewhere else on the
disk, starting the kernel, starting init, the inittab file, the rc
files (/etc/rc?.d/S*), and (finally) getty.
Comments
If the candidate doesn't go into enough detail, I may ask them
to describe one or more steps again but in greater detail. If,
OTOH, they're giving me too much detail, I don't hesitate to
tell them to skip to the next step.
===============================================================================
Q. Is there anything special about the files in /etc/rc?.d, or is there
some particular relationship between the files in /etc/rc?.d and in
/etc/init.d? If so, why?
A. The files in /etc/rc?.d are symlinks to the files in /etc/init.d.
This is done 1) because the "S" version and the "K" version of the
files are run from different directories (for example,
/etc/rc3.d/S27foo and /etc/rc0.d/K27foo), and 2) so that there's only
one copy of the file to edit (even though it appears in multiple
directories).
===============================================================================
Q. Other than the number of commands executed, is there any functional
difference in the following pairs of pipelines? Assume the omitted
material ("...") is identical in each pair.
cat *.xyz | sed '...' | sort
sed '...' *.xyz | sort
cat *.xyz | grep '...' | sort
grep '...' *.xyz | sort
cat *.xyz | awk '...' | sort
awk '...' *.xyz | sort
A. No, yes, maybe.
===============================================================================
Q. Explain why you might care about the difference between the grep and
awk pipelines in the previous question.
A. cat | grep won't "clutter" the output with file names; this is
especially useful when used with sort -u.
cat | awk prevents awk from detecting when it crosses file
boundaries; some awk programs use this ability in how they process the
files.
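Both effects are easy to see on a pair of scratch files. This is a small sketch; the file names and contents are arbitrary, and the awk program simply counts how many times FNR returns to 1.

```shell
# Build two small sample files in a scratch directory.
demo_dir=$(mktemp -d)
printf 'apple\nbanana\n' > "$demo_dir/a.xyz"
printf 'apple\ncherry\n' > "$demo_dir/b.xyz"

# grep given several file names prefixes each match with its file name,
# so identical lines from different files no longer compare equal.
with_names=$(cd "$demo_dir" && grep 'apple' *.xyz | sort -u)
without_names=$(cd "$demo_dir" && cat *.xyz | grep 'apple' | sort -u)

# awk given file names can tell where each file starts (FNR resets to 1);
# reading from a pipe, it sees one continuous stream.
per_file=$(cd "$demo_dir" && awk 'FNR == 1 { n++ } END { print n }' *.xyz)
from_pipe=$(cd "$demo_dir" && cat *.xyz | awk 'FNR == 1 { n++ } END { print n }')

printf '%s\n' "$with_names" "$without_names" "$per_file" "$from_pipe"
rm -rf "$demo_dir"
```

The grep pipeline yields two lines (a.xyz:apple and b.xyz:apple) where the cat pipeline yields one (apple); the awk count is 2 with file names and 1 through the pipe.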
===============================================================================
Q. The following pairs of code fragments are functionally equivalent:
     for f in * ; do             for f in *.xyz ; do
         . . .                       . . .
     done                        done

     ls |                        ls |
     (                           grep '\.xyz$' |
     while read f ; do           (
         . . .                   while read f ; do
     done                            . . .
     )                           done
                                 )
In some cases the top fragment of each pair will fail. Why would this
happen and why doesn't the bottom fragment also fail?
A. The top fragments are limited by ARG_MAX (the system's limit on the
size of an argument list); the bottom fragments rely on stdin and
stdout, which are unlimited.
===============================================================================
Q. Do you understand subnetting and subnet masks? If so, please explain
them (briefly).
Q. Let's say our network number is 135.27.0.0, and we want to have at
least six subnets, but each subnet should be able to have as many
hosts as possible. What would the subnet mask be? How many hosts can
we put on each network? (Optional: What is the first host address on
the third subnet? The last subnet?)
A. The subnet mask would be 255.255.224.0
(binary: 1111 1111 . 1111 1111 . 1110 0000 . 0000 0000)
Each subnet can have up to 8190 (2 ^ 13 - 2) hosts.
[Extra credit: why not 8192?]
The first host address on the third subnet would be 135.27.64.1;
on the last subnet it would be 135.27.224.1 (counting all eight
subnets, starting with 135.27.0.0).
Comment
Asking for six subnets is intentionally misleading; I'm hoping
they'll say something like "you can't have six, do you want four
or eight?"
I don't expect candidates to be able to do binary-to-decimal
conversions in their head, so I'm willing to take most of their
answers in binary. If they have some sort of conversion
calculator handy, I will ask them the optional questions.
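For anyone who wants to check the arithmetic without a calculator, shell arithmetic is enough. This is a sketch under the assumptions of the question (a class B network, three subnet bits); the function name prefix_to_mask is made up for illustration, and all eight subnets are listed, counting from 135.27.0.0.

```shell
# Hypothetical helper: print the dotted-quad mask for a given prefix length.
prefix_to_mask() {
    bits=$1
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    printf '%d.%d.%d.%d\n' \
        $(( (mask >> 24) & 255 )) $(( (mask >> 16) & 255 )) \
        $(( (mask >> 8) & 255 ))  $(( mask & 255 ))
}

# A class B network (16 network bits) with 3 subnet bits leaves 13 host bits.
subnet_bits=3
host_bits=$(( 32 - 16 - subnet_bits ))
prefix_to_mask $(( 16 + subnet_bits ))                  # 255.255.224.0
echo "subnets: $(( 1 << subnet_bits ))"                 # 8
echo "hosts per subnet: $(( (1 << host_bits) - 2 ))"    # 8190

# Third octet at which each subnet starts: 0, 32, 64, ... 224.
i=0
while [ "$i" -lt $(( 1 << subnet_bits )) ] ; do
    echo "subnet $(( i + 1 )): 135.27.$(( i << (host_bits - 8) )).0"
    i=$(( i + 1 ))
done
```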
===============================================================================
Q. Is there a standard Unix program that will do base conversions? How
would you specify binary input and decimal output? What about binary
input and hexadecimal output?
A. Yes: dc. For binary->decimal, use "2i"; for binary->hex, use "16o2i"
(or, if you're a real bit-banger, "2i10000o"). Please note that
"2i16o" is *NOT* correct.
Comment
For those without a Palm and IPcalc, dc is a gift from the gods
when mucking about with networks. Spend a few minutes learning
to use it.
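A short sketch of those dc commands wrapped for command-line use; the function names bin2dec and bin2hex are invented here, and dc is assumed to be installed.

```shell
# Binary in, decimal out: set the input radix to 2, then print.
bin2dec() {
    echo "2i $1 p" | dc
}

# Binary in, hex out: set the output radix FIRST, while "16" is still
# read as a decimal number, and only then switch the input radix to 2.
bin2hex() {
    echo "16o 2i $1 p" | dc
}
```

For example, "bin2dec 11100000" prints 224 and "bin2hex 11100000" prints E0. The order matters because once "2i" has been given, dc interprets every later number, including the "16" meant for "o", in base 2.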
===============================================================================
Q. What is CIDR?
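A. Classless Inter-Domain Routing. Traditional subnet masks apply only
within a local (classful) network; CIDR extends the idea across the
Internet by dropping the class A/B/C boundaries, so a prefix of any
length (135.27.0.0/19, for example) can be allocated and routed, and
adjacent prefixes can be aggregated into a single route.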
===============================================================================
Q. Most modern Unix systems store three times for each file; what are
their names and meanings?
A. 1) Access time (atime); when the file's data was last read.
2) Modification time (mtime); when the *file* was last modified.
3) Change time (ctime); when the *inode* was last modified.
Comments
Unfortunately, more than a few candidates (even very senior
ones) think ctime is "creation time." :-( Never was, and
probably never will be.
Knowing the names isn't enough; I want them to be able to
explain the relationship between the three times, too.
Specifically, a change in mtime implies a change in ctime as
well (but not the reverse), because writing data also updates
the inode; reading a file changes only atime; and
chown/chmod/chgrp change only ctime.
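The three times can be inspected with nothing more than ls. This is a minimal sketch; the function name show_times is hypothetical, and the exact date format depends on the local ls.

```shell
# Hypothetical helper: list a file three times, once per timestamp.
show_times() {
    printf 'mtime: ' ; ls -ld  "$1"   # the default long listing shows mtime
    printf 'atime: ' ; ls -ldu "$1"   # -u selects the access time
    printf 'ctime: ' ; ls -ldc "$1"   # -c selects the inode change time
}
```

Running show_times on a scratch file, waiting a minute, running chmod on it, and then running show_times again should show only the ctime line moving.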
===============================================================================
Q. Please diagnose the problem described in the following situation:
A new system was attached to a Cisco 5509 Ethernet switch; RedHat
Linux was installed, the network was configured, and everything was
working just fine. The system was then moved to a different port on
the Ethernet switch (using the same cable); at this point, the
network hung. *NOTHING* else had changed; both the Ethernet switch
and the NIC showed link lights, and based on the flashing lights,
traffic seemed to be getting from the system to the switch; however,
no replies were received, and the system -- which had been using the
network just fine less than 15 seconds ago -- now couldn't communicate
with anything else on the network. We quickly tried another Ethernet
cable; no difference. Suddenly, about a minute later, everything
started working again. What had happened, and why?
Additional information: 1) *Everything* on the Linux box was working
correctly. 2) The "tcpdump" command showed packets going out, but
no packets ever came back. 3) Had we run "tcpdump" on another node
and watched ping traffic between the two nodes, everything would have
looked like it was working. 4) The same things would (most likely)
have happened with a different Ethernet switch (either model, vendor,
or both).
A. Many (most?) Ethernet switches keep tables of which MAC addresses are
associated with which ports. Some (many? most?) switches don't check
every packet against this table; instead, they have some sort of
time-out mechanism. Until that mechanism clears the MAC address from
the old port, the switch continues to (mis-) direct packets to the
old port. This explains why ping appears to work on the target test
node, but the recently-moved node never gets a reply.
Comments
I don't really care if the candidate figures out the cause of
the problem. This is one of those questions that's designed to
show how a candidate approaches problems. I'm far more
interested in the process than in the solution.
===============================================================================
Q. What is the most common cause of DNS problems (not counting syntax
errors)?
A. Forgetting to increment the serial number.
Comment
If the candidate needs a hint, try asking this question instead:
"What must you *always* do *any* time you make a change to a DNS
zone file?"
===============================================================================
Q. How do you "restart" DNS serial numbers? That is, if you accidentally
set your serial number to, say, 2099103100, how can you set it back
to 2000103100?
A. Start by adding 2147483647 (2 ^ 31 - 1) to the serial number; set the
refresh time to 600 (10 minutes); SIGHUP named; wait 10 minutes.
Set the serial number to 1; SIGHUP named; wait 10 minutes.
Set the serial number to the desired value; set the refresh time back
to whatever it was before you started this whole mess; SIGHUP named.
Comments
What that second step *really* does is this:
serial + (4294967296 - serial) - 4294967296 + 1.
See RFC 1912 and RFC 1982 for why this works. Again, if you
don't understand this answer, don't ask the question!
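The three steps can be checked with shell arithmetic. This is a sketch of the RFC 1982 comparison only: the function name serial_increases is made up, and the rule coded here is that a change counts as an increase when the forward distance, modulo 2 ^ 32, is between 1 and 2 ^ 31 - 1.

```shell
# Hypothetical helper: succeed if moving from serial $1 to serial $2 looks
# like an increase to a secondary using serial number arithmetic.
serial_increases() {
    dist=$(( ($2 - $1 + 4294967296) % 4294967296 ))
    [ "$dist" -ge 1 ] && [ "$dist" -le 2147483647 ]
}

bad=2099103100
want=2000103100
step1=$(( (bad + 2147483647) % 4294967296 ))   # 4246586747

serial_increases "$bad"   "$want"  || echo "direct change: looks like a decrease"
serial_increases "$bad"   "$step1" && echo "step 1 ($step1): increase"
serial_increases "$step1" 1        && echo "step 2 (1): increase"
serial_increases 1        "$want"  && echo "step 3 ($want): increase"
```

Each hop is an increase from the secondaries' point of view, even though the end result is numerically smaller than the accidental value.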
===============================================================================
Q. Please describe, in terms of what bits in the TCP header are on or
off, how a TCP connection is established.
Q. Please describe, in terms of header bits, how a TCP connection is
closed.
Q. Please describe the "theory" of the ACK bit and TCP packet sequence
numbers.
Comments
If you don't already know the answers to these questions, you
probably shouldn't be asking them.
This may seem like trivia, but if you're debugging networks
(especially routers/routing and/or firewalls), it's essential.
===============================================================================
Q. Please explain (briefly) the difference between paging and swapping.
A. Paging requires VM hardware support, and moves *parts* of individual
processes out of and back into main memory; swapping moves one or
more whole processes (as needed).
===============================================================================
Q. Please diagnose the problem described in the following situation:
While booting a Sun system, messages on the console indicated that
/var/spool/mail was full, and that there was no space left on the
device. When the system had finished booting to multi-user mode (in
what appears to be an otherwise successful manner), "df" showed that
/var/spool/mail was only 65% full, but / was 100% full. However, when
"du" was run on all directories that actually resided on the root
partition, only about 50% of the space could be accounted for.
Attempts to write to /var/spool/mail worked just fine, but attempts to
write to / failed with messages like "no space left on device."
Additional information: 1) To prove that "du" and "df" were working
correctly (which they were), tests were run on every other filesystem
to compare the results of the two commands; in every case but /, they
matched. 2) All mounts on the system were local; NFS was not being
used. 3) There were no other error messages in any of the logs. 4)
Rebooting the system did not fix the problem. 5) During the reboot,
the console was watched carefully for other errors; there were none.
6) Running fsck showed no errors, nor did it have any effect on the
problem. 7) The directories /var and /var/spool resided on the root
partition, but /var/spool/mail was a mount point for a second disk.
A. Somehow, sometime in the past, the system had booted but the mount
of /var/spool/mail had failed. It then ran in this state for some
period of time, during which rather large files were written to
/var/spool/mail -- WHICH WAS THEN ON THE ROOT PARTITION. Sometime
later, the problem that had prevented /var/spool/mail from being mounted
was fixed and the system was rebooted. Unfortunately, the operator
forgot to remove the files in the /var/spool/mail still on the root
partition. This left root at 100%, but 50% of the space was now
hidden by the mount.
Comments
This is another "problem-solving question."
===============================================================================
Q. There is something "special" about how the "cd" command is implemented;
with respect to implementation, what is the difference between "cd"
and, say, "ls"? More importantly, why *must* "cd" be implemented in
this way?
A. The "cd" command *MUST* be implemented as a shell built-in command.
Current working directory is a process attribute, and it is not
possible for one process to change the attributes of another process.
If "cd" were a program, it would run as a different process from the
shell and, as such, could not affect the CWD of the process in which
the shell is running.
Comments
If the candidate needs a hint, ask if they've ever tried to do
"man cd" (or, if they have a system in front of them, suggest
that they try it).
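The process-attribute argument can be demonstrated in a few lines. This is a sketch; the function name cd_demo and its variable names are invented for illustration.

```shell
# Hypothetical demo: compare a cd done in a child process with the built-in.
cd_demo() {
    start=$(pwd)

    # The child changes its own working directory and then exits;
    # the calling shell's directory is untouched.
    sh -c 'cd /'
    after_child=$(pwd)

    # The built-in runs inside the shell process itself, so the change sticks.
    cd /
    after_builtin=$(pwd)

    cd "$start"    # put things back
    echo "after child: $after_child; after built-in: $after_builtin"
}
```

Run from any directory other than /, the first value printed is the starting directory and the second is /.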
===============================================================================
Q. What is an inode? What's the difference between an inode and its
associated file? What is stored in the inode?
A. An inode holds information *about* the file, such as type, device
info, permissions, owner, group, size, number of links, and the block
map (whereas the file holds only data).
===============================================================================
Q. Is the file name stored in the inode?
A. No.
Q. Please describe (briefly) the difference between hard links and
symbolic links.
A. Hard links are multiple directory entries pointing to the same inode;
symlinks store the name of the "real" file as data (or maybe in spare
space in the inode).
Comments
If the candidate gets the first question wrong, they'll almost
certainly get the second one wrong, too.
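A small experiment makes the difference visible. This is a sketch using a scratch directory; the function name link_demo and the file names are arbitrary.

```shell
# Hypothetical demo: one file, one hard link, one symbolic link.
link_demo() {
    dir=$(mktemp -d)
    echo 'some data' > "$dir/original"
    ln    "$dir/original" "$dir/hard"   # second directory entry, same inode
    ln -s original        "$dir/soft"   # separate inode holding the name "original"

    ls -li "$dir"                        # first column is the inode number

    # Removing one name leaves the inode alive as long as another name exists,
    # but the symlink now points at a name that is gone.
    rm "$dir/original"
    cat "$dir/hard"
    cat "$dir/soft" 2>/dev/null || echo 'soft: dangling link'
    rm -rf "$dir"
}
```

In the ls output, "original" and "hard" share an inode number and a link count of 2, while "soft" has its own inode. After the rm, "hard" still yields the data and "soft" does not.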
===============================================================================
Q. Which conferences do you attend? How did you choose them? Given the
opportunity, would you go to these same conferences again? If you
haven't attended any conferences, why not?
Comments
For junior and *maybe* mid-level candidates, I will not hold it
against them if their answer is "none, because my employer
wouldn't pay for them." However, for senior-level candidates, I
do not consider this acceptable.