Since the password file is publicly readable, any data derived from it is public as well, so there is no real need to restrict access to our program’s intermediate files. However, because all of us at times have to deal with sensitive data, it is good to develop the programming habit of allowing file access only to those users or processes that need it. We therefore reset the umask (see “Default permissions” in Appendix B) as the first action in our program:

    umask 077                                # Restrict temporary file access to just us

For accountability and debugging, it is helpful to have some commonality in temporary filenames, and to avoid cluttering the current directory with them: we name them with the prefix /tmp/pd. (the final dot is part of the prefix). To guard against name collisions if multiple instances of our program are running at the same time, we also need the names to be unique: the process number, available in the shell variable $$, provides a distinguishing suffix. (This use of $$ is described in more detail in Chapter 10.) We therefore define these shell variables to represent our temporary files:

    PERSON=/tmp/pd.key.person.$$             # Unique temporary filenames
    OFFICE=/tmp/pd.key.office.$$
    TELEPHONE=/tmp/pd.key.telephone.$$
    USER=/tmp/pd.key.user.$$

When the job terminates, either normally or abnormally, we want the temporary files to be deleted, so we use the trap command:

    trap "exit 1" HUP INT PIPE QUIT TERM
    trap "rm -f $PERSON $OFFICE $TELEPHONE $USER" EXIT

During development, we can just comment out the second trap, preserving temporary files for subsequent examination. (The trap command is described in “Trapping Process Signals” [13.3.2]. For now, it’s enough to understand that when the script exits, the trap command arranges to automatically run rm with the given arguments.)

We need fields one and five repeatedly, and once we have them, we don’t require the input stream from standard input again, so we begin by extracting them into a temporary file:

    awk -F: '{ print $1 ":" $5 }' > $USER

We make the key:person pair file first, with a two-step sed program followed by a simple line sort; the sort command is discussed in detail in “Sorting Text” [4.1].

    sed -e 's=/.*==' \

The script uses = as the separator character for sed’s s command, since both slashes and colons appear in the data. The first edit strips everything from the first slash to the end of the line, reducing a line like this:

    jones:Adrian W. Jones/OSD211/

to this:

    jones:Adrian W. Jones                    Result of first edit

The second edit is more complex, matching three subpatterns in the record. The first part, ^\([^:]*\), matches the username field (e.g., jones). The second part, \(.*\)❒, matches text up to a space (e.g., Adrian❒W.❒; the ❒ stands for a space character). The last part, \([^❒]*\), matches the remaining nonspace text in the record (e.g., Jones). The replacement text reorders the matches, producing something like Jones,❒Adrian W. The result of this single sed command is the desired reordering:

    jones:Jones, Adrian W.                   Printed result of second edit

Next, we make the key:office pair file:

    sed -e 's=^\([^:]*\):[^/]*/\([^/]*\)/.*$=\1:\2=' < $USER | sort > $OFFICE

The result is a list of users and offices:

    jones:OSD211

The key:telephone pair file creation is similar: we just need to adjust the match pattern:

    sed -e 's=^\([^:]*\):[^/]*/[^/]*/\([^/]*\)=\1:\2=' < $USER | sort > $TELEPHONE

At this stage, we have three separate files, each of which is sorted. Each file consists of the key (the username), a colon, and the particular data (personal name, office, telephone number). The $PERSON file’s contents look like this:

    ben:Franklin, Ben

The $OFFICE file has username and office data:

    ben:OSD212

The $TELEPHONE file records usernames and telephone numbers:

    ben:555-0022

By default, join outputs the common key, then the remaining fields of the line from the first file, followed by the remaining fields of the line from the second file. The common key defaults to the first field, but that can be changed by a command-line option: we don’t need that feature here.
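As a small illustration of that output order, the following sketch joins two tiny key:value files built from the sample records shown above. The scratch directory name /tmp/pd.demo.$$ and the variable names are assumptions made for this example only; the -t: option tells join that fields are separated by colons rather than spaces.

```shell
#!/bin/sh
# Sketch only: shows the order of fields that join emits.
umask 077
dir=/tmp/pd.demo.$$                 # hypothetical scratch directory
mkdir "$dir" || exit 1

# Two key:value files, already sorted on the key (the first field).
printf 'ben:Franklin, Ben\njones:Jones, Adrian W.\n' > "$dir/person"
printf 'ben:OSD212\njones:OSD211\n' > "$dir/office"

# Output: key, then the rest of the first file's line,
# then the rest of the second file's line.
joined=$(join -t: "$dir/person" "$dir/office")
printf '%s\n' "$joined"

rm -rf "$dir"
```

Each output line starts with the username, followed by the personal name from the first file and the office from the second, for example ben:Franklin, Ben:OSD212.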
Normally, spaces separate fields for join, but we can change the separator with its -t option: we use it as -t:. The join operations are done with a five-stage pipeline.
Here’s the complete pipeline:

    join -t: $PERSON $OFFICE |

The awk printf statement used here is similar enough to the shell printf command that its meaning should be clear: print the first colon-separated field left-adjusted in a 39-character field, followed by a tab, the second field, another tab, and the third field. Here are the full results:

    Franklin, Ben •OSD212•555-0022

That is all there is to it! Our entire script is slightly more than 20 lines long, excluding comments, with five main processing steps. We collect it together in one place in Example 5-1.

Example 5-1. Creating an office directory

    #! /bin/sh
    umask 077
    PERSON=/tmp/pd.key.person.$$
    OFFICE=/tmp/pd.key.office.$$
    TELEPHONE=/tmp/pd.key.telephone.$$
    USER=/tmp/pd.key.user.$$
    trap "exit 1" HUP INT PIPE QUIT TERM
    trap "rm -f $PERSON $OFFICE $TELEPHONE $USER" EXIT
    awk -F: '{ print $1 ":" $5 }' > $USER
    sed -e 's=/.*==' \
    sed -e 's=^\([^:]*\):[^/]*/\([^/]*\)/.*$=\1:\2=' < $USER | sort > $OFFICE
    sed -e 's=^\([^:]*\):[^/]*/[^/]*/\([^/]*\)=\1:\2=' < $USER | sort > $TELEPHONE
    join -t: $PERSON $OFFICE |

The real power of shell scripting shows itself when we want to modify the script to do a slightly different job, such as insertion of the job title from a separately maintained file:

    join -t: $PERSON /etc/passwd.job-title |   # Extra join with job title

The total cost for the extra directory field is one more join, a change in the sort fields, and a small tweak in the final awk formatting command.

Because we were careful to preserve special field delimiters in our output, we can trivially prepare useful alternative directories like this:

    passwd-to-directory < /etc/passwd | sort -t'•' -k2,2 > dir.by-office

As usual, • represents an ASCII tab character.

A critical assumption of our program is that there is a unique key for each data record. With that unique key, separate views of the data can be maintained in files as key:value pairs.
Here, the key was a Unix username, but in larger contexts, it could be a book number (ISBN), credit card number, employee number, national retirement system number, part number, student number, and so on. Now you know why we get so many numbers assigned to us! You can also see that those handles need not be numbers: they just need to be unique text strings.
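To see the key:value views come together, here is a minimal sketch that runs the steps of this section on two sample records. Several pieces are assumptions rather than the script’s confirmed text: the second -e expression of the key:person edit is reconstructed from the three subpatterns described earlier; the middle pipeline stages (a second join, a cut to drop the key, and a sort) and the awk format string are one plausible reading of the five-stage description; and the scratch directory, the numeric password-file fields, and the jones telephone number are invented for illustration.

```shell
#!/bin/sh
# Sketch only: office-directory steps on two invented password-file records.
umask 077
work=/tmp/pd.demo.$$                # hypothetical scratch directory
mkdir "$work" || exit 1
trap 'rm -rf "$work"' EXIT

# Only fields 1 and 5 matter; the other fields are made up.
cat > "$work/passwd" <<'EOF'
jones:*:32713:899:Adrian W. Jones/OSD211/555-0123:/home/jones:/bin/ksh
ben:*:301:10:Ben Franklin/OSD212/555-0022:/home/ben:/bin/bash
EOF

# Extract username and personal-information fields.
awk -F: '{ print $1 ":" $5 }' < "$work/passwd" > "$work/user"

# key:person -- the second -e is an assumed reconstruction that turns
# "first-names last-name" into "last-name, first-names".
sed -e 's=/.*==' \
    -e 's=^\([^:]*\):\(.*\) \([^ ]*\)=\1:\3, \2=' < "$work/user" |
    sort > "$work/person"

# key:office and key:telephone, using the patterns shown in the text.
sed -e 's=^\([^:]*\):[^/]*/\([^/]*\)/.*$=\1:\2=' < "$work/user" |
    sort > "$work/office"
sed -e 's=^\([^:]*\):[^/]*/[^/]*/\([^/]*\)=\1:\2=' < "$work/user" |
    sort > "$work/telephone"

# Assumed five stages: join, join, drop the key, sort by name, format.
directory=$(join -t: "$work/person" "$work/office" |
    join -t: - "$work/telephone" |
    cut -d: -f 2- |
    sort -t: -k1,1 -k2,2 -k3,3 |
    awk -F: '{ printf("%-39s\t%s\t%s\n", $1, $2, $3) }')
printf '%s\n' "$directory"
```

Because every intermediate file carries the username as its first field, each join only has to match on that key; the cut stage then discards it once it is no longer needed.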