Site Administration Page 3 - Dealing with Files and Filesystems
If you've ever had to change the formatting of a file, you know that it can be a time-consuming process. Why waste your time making manual changes to files when Unix systems come with many tools that can very quickly make the changes for you?

Removing Blank Lines

Suppose you need to remove the blank lines from a file. This invocation of grep will do the job:

% grep -v '^$' letter1.txt > tmp ; mv tmp letter1.txt

The pattern ^$ anchors to both the start and the end of a line with no intervening characters: the regexp definition of a blank line. The -v option reverses the search, printing all nonblank lines, which are then written to a temporary file, and the temporary file is moved back to the original.
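The one-liner above runs mv even if grep fails, which could replace the original with an empty file. As a minimal sketch of a more defensive variant, the helper below (strip_blank_lines is a hypothetical name, not part of the original text) writes to a file created by mktemp, which is assumed to be installed, and only touches the original when grep reports no error. The sample contents are made up for the demonstration.

```shell
#!/bin/sh
# strip_blank_lines FILE
# Hypothetical helper (not from the original text): removes empty lines
# from FILE in place, touching FILE only if grep did not report an error.
strip_blank_lines() {
    file=$1
    tmp=$(mktemp) || return 1          # assumes mktemp(1) is available
    grep -v '^$' "$file" > "$tmp"
    status=$?
    # grep exits 0 if it selected lines, 1 if it selected none, >1 on error.
    if [ "$status" -gt 1 ]; then
        rm -f "$tmp"
        return "$status"
    fi
    # Copy rather than move so FILE keeps its original permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstration on a throwaway file with made-up contents.
demo=$(mktemp) || exit 1
printf 'Dear Sir,\n\nThank you.\n\n' > "$demo"
strip_blank_lines "$demo"
cat "$demo"
rm -f "$demo"
```

Copying the result back with cat instead of mv is a design choice: it keeps the original file's permissions, at the cost of not being an atomic replacement.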
You can rewrite the preceding example in sed as:

% sed '/^$/d' letter1.txt > tmp ; mv tmp letter1.txt

'/^$/d' is actually a sed script. sed's normal mode of operation is to read each line of input, process it according to the script, and then write the processed line to standard output. In this example, the expression /^$/ is a regular expression matching a blank line, and the trailing d is a sed function that deletes the line. Blank lines are deleted and all other lines are printed. Again, the results are redirected to a temporary file, which is then moved back to the original file.

Searching with sed

sed can also do the work of grep:

% sed -n '/$USER/p' *

This command will yield much the same results as:

% grep '$USER' *

The only difference is that grep, when given several files, prefixes each matching line with its filename. The -n (no-print, perhaps) option prevents sed from outputting each line. The pattern /$USER/ matches lines containing the literal text $USER (the single quotes keep the shell from expanding it), and the p function prints matched lines to standard output, overriding -n.

Replacing Existing Text

One of the most common uses for sed is to perform a search and replace on a given string. For example, to change all occurrences of 2003 into 2004 in a file called date, include the two strings in the format 's/oldstring/newstring/', like so:

% sed 's/2003/2004/' date

Almost! Any second or later 2003 on a line remains unchanged. This is because without the g (global) flag, sed will change only the first occurrence on each line. This command will give the desired result:

% sed 's/2003/2004/g' date

Search and replace takes other flags too. To output only changed lines, use:

% sed -n 's/2003/2004/gp' date

Note the use of the -n flag to suppress normal output and the p flag to print changed lines.

Multiple Transformations

Perhaps you need to perform two or more transformations on a file. You can do this in a single run by specifying a script with multiple commands:

% sed 's/2003/2004/g;/^$/d' date

This performs both substitution and blank line deletion. Use a semicolon to separate the two commands.
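To see how the g flag, the -n/p pair, and a two-command script differ, it can help to run them on the same small input. The sketch below uses a hypothetical sample function in place of the date file; its four lines are invented for illustration.

```shell
#!/bin/sh
# sample: hypothetical stand-in for the "date" file (contents are made up).
# It prints four lines: one with a single 2003, one blank, one with two
# occurrences of 2003, and one with none.
sample() {
    printf '%s\n' \
        'Copyright 2003' \
        '' \
        'Updated 2003-01-01, reviewed 2003-06-30' \
        'No year here'
}

echo '--- first occurrence per line only'
sample | sed 's/2003/2004/'

echo '--- every occurrence (g flag)'
sample | sed 's/2003/2004/g'

echo '--- only the lines that changed (-n with the p flag)'
sample | sed -n 's/2003/2004/gp'

echo '--- substitute, then delete blank lines'
sample | sed 's/2003/2004/g;/^$/d'
```

In the first run the third line still contains "reviewed 2003-06-30", which is the behavior the g flag corrects.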
Here is a more complex example that translates HTML tags of the form <font> into PHP bulletin board tags of the form [font]:

% cat index.html
<title>hello
</title>
% sed 's/<\(.*\)>/[\1]/g' index.html
[title]hello
[/title]

How did this work? The script searched for an HTML tag using the pattern '<.*>'. Angle brackets match literally. In a regular expression, a dot (.) represents any character and an asterisk (*) means zero or more of the previous item. Escaped parentheses, \( and \), capture the matched pattern lying between them and place it in a numbered buffer. In the replace string, \1 refers to the contents of the first buffer. Thus the text between the angle brackets in the search string is captured into the first buffer and written back inside square brackets in the replace string. sed takes full advantage of the power of regular expressions to copy text from the pattern to its replacement.

Things get harder when a line holds more than one tag:

% cat index1.html

This time the same command fails because the pattern .* is greedy and grabs as much as it can, matching up to the second >. To prevent this behavior, we need to match zero or more of any character except <. Recall that [...] is a regular expression that lists characters to match, but if the first character is the caret (^), the match is reversed. Thus the regular expression [^<] matches any single character other than <. I can modify the previous example as follows:

% sed 's/<\([^<]*\)>/[\1]/g' index1.html

Remember, grep will perform a case-insensitive search if you provide the -i flag. sed, unfortunately, has no portable equivalent (some implementations, such as GNU sed, accept a nonstandard I flag on the s command). To search for title in a case-insensitive manner, form regular expressions using [...], each listing a character of the word in both upper- and lowercase forms:

% sed 's/[Tt][Ii][Tt][Ll][Ee]/title/g' title.html
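The difference between the greedy pattern and the negated character class is easiest to see side by side. The following is a small sketch; the input strings are hypothetical and stand in for files such as index1.html and title.html, whose contents are not shown here.

```shell
#!/bin/sh
# Hypothetical one-line input with an opening and a closing tag.
line='<title>hello</title>'

# Greedy: .* runs to the last > on the line, so the whole line is
# treated as a single tag.
printf '%s\n' "$line" | sed 's/<\(.*\)>/[\1]/g'
# prints: [title>hello</title]

# Negated class: [^<]* cannot cross into the next tag, so each tag
# is matched and rewritten separately.
printf '%s\n' "$line" | sed 's/<\([^<]*\)>/[\1]/g'
# prints: [title]hello[/title]

# Case-insensitive match spelled out with bracket expressions.
printf '%s\n' '<TITLE>Hi</Title>' | sed 's/[Tt][Ii][Tt][Ll][Ee]/title/g'
# prints: <title>Hi</title>
```

The bracket-expression form is verbose, but it relies only on basic regular expression syntax, so it should behave the same across sed implementations.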