Site Administration Page 4 - Dealing with Files and Filesystems
Combine basic Unix tools to become a formatting expert.

Don't let the syntax of the sed command scare you off. sed is a powerful utility capable of handling most of your formatting needs. For example, have you ever needed to add or remove comments from a source file? Perhaps you need to shuffle some text from one section to another.

In this hack, I'll demonstrate how to do that. I'll also show some handy formatting tricks using two other built-in Unix commands, tr and col.

Adding Comments to Source Code

sed allows you to specify an address range using a pattern, so let's put this to use. Suppose we want to comment out a block of text in a source file by adding // to the start of each line we wish to comment out. We might use a text editor to mark the block with bc-start and bc-end:

% cat source.c

and then apply a sed script such as:

% sed '/bc-start/,/bc-end/s/^/\/\//' source.c

to get:

if (tTd(27, 1))

The script uses search and replace to add // to the start of all lines (s/^/\/\//) that lie between the two markers (/bc-start/,/bc-end/). This applies to every block in the file that sits between a pair of markers. Note that in the sed script, the / character has to be escaped as \/ so it is not mistaken for a delimiter.

Removing Comments

When we need to delete the comments and the two bc- lines (let's assume that the edited contents were copied back to source.c), we can use a script such as:

% sed '/bc-start/d;/bc-end/d;/bc-start/,/bc-end/s/^\/\///' source.c

Oops! My first attempt won't work. The bc- lines must be deleted after they have been used as address ranges. Trying again we get:

% sed '/bc-start/,/bc-end/s/^\/\///;/bc-start/d;/bc-end/d' source.c

If you want to leave the two bc- marker lines in but comment them out, use this piece of trickery (shown here as typed in tcsh):

% sed '/bc-start/,/bc-end/{/^\/\/bc-/\!s/\/\///;}' source.c

to get:

if (tTd(27, 1))

Note that in the bash shell you must use:

% sed '/bc-start/,/bc-end/{/^\/\/bc-/!s/\/\///;}' source.c

because inside single quotes in bash the bang character (!) does not need to be escaped, as it does in tcsh.

What's with the curly braces? They prevent a common mistake. You may imagine that this example:

% sed -n '/$USER/p;p' *

prints each line containing $USER twice because of the p;p commands. It doesn't, though, because the second p is not restrained by the /$USER/ line address and therefore applies to every line. To print twice just those lines containing $USER, use:

% sed -n '/$USER/p;/$USER/p' *

or:

% sed -n '/$USER/{p;p;}' *

The construct {...} introduces a function list that applies to the preceding line address or range. A line address followed by ! (or \! in the tcsh shell) reverses the address range, and so the function (list) that follows is applied to all lines not matching. The net effect is to remove // from all lines that don't start with //bc- but that do lie within the bc- markers.

Using the Holding Space to Mark Text

sed reads input into the pattern space, but it also provides a buffer (called the holding space) and functions to move text from one space to the other. All other functions (such as s and d) operate on the pattern space, not the holding space. Check out this sed script:

% cat case.script

First, I have written the script to a file instead of typing it in on the command line. Lines starting with # are comments and are ignored. Other lines specify a sed command, and commands are separated by either a newline or a ; character. sed reads one line of input at a time and applies the whole script file to each line. The following functions are applied to each line as it is read:

h
y/ABC/abc/
/test/ {...}
x
p
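
To make the interplay of h, y, x and p concrete, here is a minimal sketch that follows the same sequence of functions. The contents of case.script are not reproduced here, so the body of the /test/ { ... } block below is an assumption chosen for illustration (it prefixes the saved original line with ">> "), and the function name mark_test_lines and the sample input are hypothetical.

```shell
# Sketch only: the body of the braces is an assumption, not the original case.script.
mark_test_lines() {
    sed -n '
        # save an untouched copy of the line in the holding space
        h
        # lowercase the pattern space so the match below ignores case
        y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
        /test/ {
            # swap in the saved original, tag it, and swap it back out
            x
            s/^/>> /
            x
        }
        # bring the original (tagged or not) back and print it
        x
        p
    '
}

printf 'This is a TEST\nplain line\n' | mark_test_lines
```

The lowercased copy exists only to make the /test/ address match regardless of case; what gets printed is always the line held by h, so the first sample line comes out as ">> This is a TEST" and the second is printed unchanged.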

Here is the test file:

% cat case

Here are the results of running our sed script on it:

% sed -n -f case.script case

Notice the vvv ^^^ markers around lines that contain test.

Translating Case

The tr command can translate one character to another. To change the contents of case into all lowercase and write the results to file lower-case, we could use:

% tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' \

tr works with standard input and output only, so to read and write files we must use redirection.

Translating Characters

To translate carriage return characters into newline characters, we could use:

% tr \\r \\n < cr > lf

where cr is the original file and lf is a new file containing line feeds in place of carriage returns. \n represents a line feed character, but we must escape the backslash character in the shell, so we use \\n instead. Similarly, a carriage return is specified as \\r.

Removing Duplicate Line Feeds

tr can also squeeze multiple consecutive occurrences of a particular character into a single occurrence. For example, to remove duplicate line feeds from the lines file:

% tr -s \\n < lines > tmp ; mv tmp lines

Here we use the tmp file trick again because redirecting the output of tr (or of grep or sed) onto its own input file trashes that file: the shell truncates it before the command has read anything.

Deleting Characters

tr can also delete selected characters. If, for instance, you hate vowels, run your documents through this:

% tr -d aeiou < file

Translating Tabs to Spaces

To translate tabs into multiple spaces, use col with the -x flag:

% cat tabs

In this example I have used od -x to dump the contents of the before and after files in hexadecimal, which shows more clearly that the translation has worked. (09 is the code for Tab and 20 is the code for Space.)
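
The tr recipes above can be tried safely on scratch data. The following is a minimal sketch under a few assumptions: the scratch directory and its file names are illustrative only, the arguments are quoted so that tr itself interprets \r and \n (instead of doubling the backslash for the shell), and the [:upper:] and [:lower:] character classes stand in for spelling out both alphabets.

```shell
# Work in a scratch directory so no real files are touched (names are illustrative).
dir=$(mktemp -d)
printf 'Hello\r\rWORLD\r' > "$dir/cr"

# Translate carriage returns to line feeds.
tr '\r' '\n' < "$dir/cr" > "$dir/lf"

# Squeeze runs of line feeds; write to a temporary file first, because
# redirecting straight onto the input file would truncate it.
tr -s '\n' < "$dir/lf" > "$dir/tmp" && mv "$dir/tmp" "$dir/lf"

# Lowercase everything using character classes.
lower=$(tr '[:upper:]' '[:lower:]' < "$dir/lf")

# Delete the vowels from the lowercased text.
novowels=$(printf '%s\n' "$lower" | tr -d aeiou)

printf '%s\n' "$lower" "$novowels"
rm -r "$dir"
```

With the sample input, this prints hello and world on two lines, followed by hll and wrld.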