
HACK#15: Manipulate Files with sed - Administration

In this first of a two-part article, you will learn how to get the most out of certain BSD commands, as well as some useful ways to handle your filesystem. It is excerpted from chapter two of the book BSD Hacks, written by Dru Lavigne (O'Reilly, 2005; ISBN: 0596006799). Copyright 2005 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.

TABLE OF CONTENTS:
  1. Dealing with Files and Filesystems
  2. HACK#14: Get the Most Out of grep
  3. HACK#15: Manipulate Files with sed
  4. HACK#16: Format Text at the Command Line
  5. HACK#17: Delimiter Dilemma
By: O'Reilly Media
December 28, 2006


If you've ever had to change the formatting of a file, you know that it can be a time-consuming process.

Why waste your time making manual changes to files when Unix systems come with many tools that can very quickly make the changes for you?

Removing Blank Lines

Suppose you need to remove the blank lines from a file. This invocation of grep will do the job:

  % grep -v '^$' letter1.txt > tmp ; mv tmp letter1.txt

The pattern ^$ anchors to both the start and the end of a line with no intervening characters, which is the regexp definition of a blank line. The -v option reverses the search, printing all nonblank lines, which are then written to a temporary file, and the temporary file is moved back to the original.

grep must never output to the same file it is reading, or the file will end up empty: the shell truncates the output file before grep starts reading it.

You can rewrite the preceding example in sed as:

  % sed '/^$/d' letter1.txt > tmp ; mv tmp letter1.txt

'/^$/d' is actually a sed script. sed's normal mode of operation is to read each line of input, process it according to the script, and then write the processed line to standard output. In this example, the expression /^$/ is a regular expression matching a blank line, and the trailing d is a sed function that deletes the line. Blank lines are deleted and all other lines are printed. Again, the results are redirected to a temporary file, which is then moved back to the original file.
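As a minimal sketch of the temporary-file pattern described above, the following runs the sed version against a throwaway copy of a letter. The directory name, the sample text, and the use of mktemp and && are illustrative assumptions, not part of the original hack; && simply keeps mv from running if sed fails.

```shell
# Work in a throwaway directory so no real file is touched (assumed setup).
workdir=$(mktemp -d "${TMPDIR:-/tmp}/sedhack.XXXXXX")
printf 'Dear Sir,\n\nThank you.\n\nRegards\n' > "$workdir/letter1.txt"

# Write the filtered text to a temporary file, then replace the
# original only if sed succeeded (&& instead of ;).
sed '/^$/d' "$workdir/letter1.txt" > "$workdir/tmp" &&
    mv "$workdir/tmp" "$workdir/letter1.txt"

# Show the result: the three nonblank lines remain.
cleaned=$(cat "$workdir/letter1.txt")
printf '%s\n' "$cleaned"

rm -rf "$workdir"
```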

Searching with sed

sed can also do the work of grep:

  % sed -n '/$USER/p' *

This command will match the same lines as:

  % grep '$USER' *

The -n (no-print, perhaps) option prevents sed from outputting each line. The pattern /$USER/ matches lines containing $USER, and the p function prints matched lines to standard output, overriding -n.
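The equivalence can be checked on a single file, as in this small sketch (the file name and its contents are made up for illustration). Because the pattern is in single quotes, the shell does not expand it, so both commands search for the literal text $USER. When several files are given, grep also prefixes each match with its file name, which sed does not do.

```shell
# Build a small sample file in a scratch directory (assumed setup).
workdir=$(mktemp -d "${TMPDIR:-/tmp}/sedhack.XXXXXX")
printf 'echo $USER\nplain line\nhome=/home/$USER\n' > "$workdir/sample.sh"

# -n suppresses the default output; p prints only the matching lines.
sed_out=$(sed -n '/$USER/p' "$workdir/sample.sh")

# grep prints matching lines by default.
grep_out=$(grep '$USER' "$workdir/sample.sh")

printf 'sed:\n%s\ngrep:\n%s\n' "$sed_out" "$grep_out"

rm -rf "$workdir"
```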

Replacing Existing Text

One of the most common uses for sed is to perform a search and replace on a given string. For example, to change all occurrences of 2003 into 2004 in a file called date, include the two search strings in the format 's/oldstring/newstring/', like so:

  % sed 's/2003/2004/' date
  Copyright 2004
  ...
  This was written in 2004, but it is no longer 2003.
  ...

Almost! Notice that the last 2003 remains unchanged. This is because without the g (global) flag, sed will change only the first occurrence on each line. This command will give the desired result:

  % sed 's/2003/2004/g' date

Search and replace takes other flags too. To output only changed lines, use:

  % sed -n 's/2003/2004/gp' date

Note the use of the -n flag to suppress normal output and the p flag to print changed lines.
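To compare the three forms side by side, here is a short sketch that feeds the same text to each command through a pipe. The sample text stands in for the date file and is an assumption for illustration only.

```shell
# Sample input standing in for the "date" file (assumed content).
input='Copyright 2003
no year here
This was written in 2003, but it is no longer 2003.'

# No flag: only the first 2003 on each line is replaced.
first_only=$(printf '%s\n' "$input" | sed 's/2003/2004/')

# g flag: every 2003 on each line is replaced.
all=$(printf '%s\n' "$input" | sed 's/2003/2004/g')

# -n with the p flag: print only the lines that were changed.
changed_only=$(printf '%s\n' "$input" | sed -n 's/2003/2004/gp')

printf '%s\n' "$changed_only"
# Copyright 2004
# This was written in 2004, but it is no longer 2004.
```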

Multiple Transformations

Perhaps you need to perform two or more transformations on a file. You can do this in a single run by specifying a script with multiple commands:

  % sed 's/2003/2004/g;/^$/d' date

This performs both substitution and blank line deletion. Use a semicolon to separate the two commands.
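As an alternative spelling, sed also accepts each command in its own -e option. The sketch below, using made-up sample text, suggests that the two forms produce the same output.

```shell
# Sample input with a blank line in the middle (assumed content).
input='Copyright 2003

Written in 2003, revised in 2003.'

# One script, two commands separated by a semicolon.
with_semicolon=$(printf '%s\n' "$input" | sed 's/2003/2004/g;/^$/d')

# The same two commands, each passed with its own -e option.
with_e=$(printf '%s\n' "$input" | sed -e 's/2003/2004/g' -e '/^$/d')

printf '%s\n' "$with_semicolon"
# Copyright 2004
# Written in 2004, revised in 2004.
```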

Here is a more complex example that translates HTML tags of the form <font> into PHP bulletin board tags of the form [font]:

  % cat index.html
  <title>hello
  </title>
  % sed 's/<\(.*\)>/[\1]/g' index.html
  [title]hello
  [/title]

How did this work? The script searched for an HTML tag using the pattern '<.*>'. Angle brackets match literally. In a regular expression, a dot (.) represents any character and an asterisk (*) means zero or more of the previous item. Escaped parentheses, \( and \), capture the matched pattern lying between them and place it in a numbered buffer. In the replace string, \1 refers to the contents of the first buffer. Thus the text between the angle brackets in the search string is captured into the first buffer and written back inside square brackets in the replace string. sed takes full advantage of the power of regular expressions to copy text from the pattern to its replacement.

  % cat index1.html
  <title>hello</title>
  % sed 's/<\(.*\)>/[\1]/g' index1.html
  [title>hello</title]

This time the same command fails because the pattern .* is greedy and grabs as much as it can, matching up to the second >. To prevent this behavior, we need to match zero or more of any character except <. Recall that [...] is a regular expression that lists characters to match, but if the first character is the caret (^), the match is reversed. Thus the regular expression [^<] matches any single character other than <. I can modify the previous example as follows:

  % sed 's/<\([^<]*\)>/[\1]/g' index1.html
  [title]hello[/title]
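The difference between the greedy pattern and the negated character class can be reproduced without any files, as in this brief sketch that pipes the same line through both scripts. The variable names are chosen only for illustration.

```shell
line='<title>hello</title>'

# Greedy: .* runs from the first < to the last > on the line.
greedy=$(printf '%s\n' "$line" | sed 's/<\(.*\)>/[\1]/g')

# Bounded: [^<]* cannot cross into the next tag, so each tag matches alone.
bounded=$(printf '%s\n' "$line" | sed 's/<\([^<]*\)>/[\1]/g')

printf '%s\n%s\n' "$greedy" "$bounded"
# [title>hello</title]
# [title]hello[/title]
```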

Remember, grep will perform a case-insensitive search if you provide the -i flag. sed, unfortunately, does not have such an option. To search for title in a case-insensitive manner, form regular expressions using [...], each listing a character of the word in both upper- and lowercase forms:

  % sed 's/[Tt][Ii][Tt][Ll][Ee]/title/g' title.html
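A quick way to see the bracket technique at work is to pipe a line with mixed-case tags through the same script. The sample line here is an assumption for illustration.

```shell
# Tags written in different cases (assumed sample input).
line='<TITLE>Hi</Title>'

# Each bracket pair accepts either case of one letter of "title".
normalized=$(printf '%s\n' "$line" | sed 's/[Tt][Ii][Tt][Ll][Ee]/title/g')

printf '%s\n' "$normalized"
# <title>Hi</title>
```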

See Also

  1. man grep
  2. man sed
  3. man re_format (regular expressions)
  4. "sed & Regular Expressions" at http://main.rtfiber.com.tw/~changyj/sed/
  5. Cool sed tricks at http://www.wagoneers.com/UNIX/SED/sed.html
  6. The sed FAQ (http://doc.ddart.net/shell/sedfaq.htm)
  7. The sed Script Archive (http://sed.sourceforge.net/grabbag/scripts/)
 
