
HACK#16: Format Text at the Command Line - Administration

In this first of a two-part article, you will learn how to get the most out of certain BSD commands, as well as some useful ways to handle your filesystem. It is excerpted from chapter two of the book BSD Hacks, written by Dru Lavigne (O'Reilly, 2005; ISBN: 0596006799). Copyright 2005 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.

TABLE OF CONTENTS:
  1. Dealing with Files and Filesystems
  2. HACK#14: Get the Most Out of grep
  3. HACK#15: Manipulate Files with sed
  4. HACK#16: Format Text at the Command Line
  5. HACK#17: Delimiter Dilemma
By: O'Reilly Media
December 28, 2006


Combine basic Unix tools to become a formatting expert.

Don't let the syntax of the sed command scare you off. sed is a powerful utility capable of handling most of your formatting needs. For example, have you ever needed to add or remove comments from a source file? Perhaps you need to shuffle some text from one section to another.

In this hack, I'll demonstrate how to do that. I'll also show some handy formatting tricks using two other built-in Unix commands, tr and col.

Adding Comments to Source Code

sed allows you to specify an address range using a pattern, so let's put this to use. Suppose we want to comment out a block of text in a source file by adding // to the start of each line we wish to comment out. We might use a text editor to mark the block with bc-start and bc-end:

  % cat source.c
  if (tTd(27, 1))
      sm_dprintf("%s (%s, %s) aliased to %s\n",
          a->q_paddr, a->q_host, a->q_user, p);
  bc-start
    if (bitset(EF_VRFYONLY, e->e_flags))
    {
        a->q_state = QS_VERIFIED;
        return;
    }
  bc-end
  message("aliased to %s", shortenstring(p, MAXSHORTSTR));

and then apply a sed script such as:

  % sed '/bc-start/,/bc-end/s/^/\/\//' source.c

to get:

  if (tTd(27, 1))
      sm_dprintf("%s (%s, %s) aliased to %s\n",
          a->q_paddr, a->q_host, a->q_user, p);
  //bc-start
  //  if (bitset(EF_VRFYONLY, e->e_flags))
  //  {
  //      a->q_state = QS_VERIFIED;
  //      return;
  //  }
  //bc-end
  message("aliased to %s", shortenstring(p, MAXSHORTSTR));

The script uses search and replace to add // to the start of all lines (s/^/\/\//) that lie between the two markers (/bc-start/,/bc-end/). This applies to every block in the file that is delimited by a marker pair. Note that in the sed script, the / character has to be escaped as \/ so it is not mistaken for a delimiter.
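sed accepts other characters as the delimiter of the s command, so the same edit can be written without the escaped slashes. Here is a minimal sketch, not part of the original hack; the five input lines are made up for illustration:

```shell
# Hypothetical sample input: one marked block between two ordinary lines.
commented=$(printf '%s\n' 'int a;' 'bc-start' 'int b;' 'bc-end' 'int c;' |
  sed '/bc-start/,/bc-end/s|^|//|')   # | is the s delimiter, so // needs no escaping

printf '%s\n' "$commented"
```

Only the delimiter of the s command changes here. The address patterns still use /, so a slash inside them would still need escaping.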

Removing Comments

When we need to delete the comments and the two bc- lines (let's assume that the edited contents were copied back to source.c), we can use a script such as:

  % sed '/bc-start/d;/bc-end/d;/bc-start/,/bc-end/s/^\/\///' source.c

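Oops! My first attempt won't work: d deletes a marker line and immediately moves on to the next line of input, so the address range never sees the markers. The bc- lines must be deleted after they have been used as address ranges. Trying again we get: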

  % sed '/bc-start/,/bc-end/s/^\/\///;/bc-start/d;/bc-end/d' source.c
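To see the difference between the two orderings, both scripts can be run against a small commented block. This is only a sketch; the sample lines are invented and stand in for the edited source.c:

```shell
# Hypothetical input: a block that has already been commented out.
sample() {
  printf '%s\n' 'int a;' '//bc-start' '//int b;' '//bc-end' 'int c;'
}

# Deleting the markers first: the range never starts, so //int b; survives.
wrong=$(sample | sed '/bc-start/d;/bc-end/d;/bc-start/,/bc-end/s/^\/\///')

# Using the markers as a range first, then deleting them.
right=$(sample | sed '/bc-start/,/bc-end/s/^\/\///;/bc-start/d;/bc-end/d')

printf 'wrong order:\n%s\n\nright order:\n%s\n' "$wrong" "$right"
```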

If you want to leave the two bc- marker lines in, but still commented out, use this piece of trickery:

  % sed '/bc-start/,/bc-end/{/^\/\/bc-/\!s/\/\///;}' source.c

to get:

  if (tTd(27, 1))
      sm_dprintf("%s (%s, %s) aliased to %s\n",
          a->q_paddr, a->q_host, a->q_user, p);
  //bc-start
    if (bitset(EF_VRFYONLY, e->e_flags))
    {
        a->q_state = QS_VERIFIED;
        return;
    }
  //bc-end
  message("aliased to %s", shortenstring(p, MAXSHORTSTR));

Note that in the bash shell you must use:

  % sed '/bc-start/,/bc-end/{/^\/\/bc-/!s/\/\///;}' source.c

because inside single quotes the bang character (!) does not need to be escaped in bash, as it does in tcsh.

What's with the curly braces? They prevent a common mistake. You may imagine that this example:

  % sed -n '/$USER/p;p' *

prints each line containing $USER twice because of the p;p commands. It doesn't, though, because the second p is not restrained by the /$USER/ line address and therefore applies to every line. To print twice just those lines containing $USER, use:

  % sed -n '/$USER/p;/$USER/p' *

or:

  % sed -n '/$USER/{p;p;}' *

The construct {...} introduces a function list that applies to the preceding line address or range.
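The effect of the braces is easy to check on a few lines of input. In this sketch the three input lines and the pattern beta are made up; they stand in for the files and the $USER pattern used above:

```shell
# Hypothetical input: two of the three lines contain "beta".
sample() {
  printf '%s\n' 'alpha' 'beta' 'alpha beta'
}

# Without braces, the second p has no address and prints every line.
ungrouped=$(sample | sed -n '/beta/p;p')

# With braces, both p commands are tied to the /beta/ address.
grouped=$(sample | sed -n '/beta/{p;p;}')

printf 'ungrouped:\n%s\n\ngrouped:\n%s\n' "$ungrouped" "$grouped"
```

In the ungrouped run, alpha still appears once, while the matching lines appear twice. In the grouped run, alpha is gone.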

A line address followed by ! (or \! in the tcsh shell) reverses the address range, and so the function (list) that follows is applied to all lines not matching. The net effect is to remove // from all lines that don't start with //bc- but that do lie within the bc- markers.
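The negated address can be tried on a small commented block. This is a sketch in sh/bash syntax (no backslash before the !), with invented sample lines:

```shell
# Hypothetical input: a commented-out block with commented marker lines.
uncommented=$(printf '%s\n' 'int a;' '//bc-start' '//int b;' '//bc-end' 'int c;' |
  sed '/bc-start/,/bc-end/{/^\/\/bc-/!s/\/\///;}')

# Inside the range, only lines that do NOT start with //bc- lose their //.
printf '%s\n' "$uncommented"
```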

Using the Holding Space to Mark Text

sed reads input into the pattern space, but it also provides a buffer (called the holding space) and functions to move text from one space to the other. All other functions (such as s and d) operate on the pattern space, not the holding space.

Check out this sed script:

  % cat case.script
  # Sed script for case insensitive search
  #
  # copy pattern space to hold space to preserve it
  h
  y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
  # use a regular expression address to search for lines containing:
  /test/ {
  i\
  vvvv
  a\
  ^^^^
  }
  # restore the original pattern space from the hold space
  x;p

First, I have written the script to a file instead of typing it in on the command line. Lines starting with # are comments and are ignored. Other lines specify a sed command, and commands are separated by either a newline or ; character. sed reads one line of input at a time and applies the whole script file to each line. The following functions are applied to each line as it is read:

h

Copies the pattern space (the line just read) into the holding space.

y/ABC/abc/

Operates on the pattern space, translating A to a, B to b, C to c, and so on, ensuring the line is all lowercase.

/test/ {...}

Matches the line just read if it includes the text test (whatever the original case, because the line is now all lowercase) and then applies the list of functions that follow. This example inserts text before (i\) and appends text after (a\) the matched line to highlight it.

x

Exchanges the pattern and hold space, thus restoring the original contents of the pattern space.

p

Prints the pattern space.
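The same h, y, and x functions can be combined into a one-line, case-insensitive search that prints only the matching lines in their original case. This is a variation on the script above, not the script itself, and the four input lines are made up:

```shell
# Hypothetical input: "test" appears in two different mixes of case.
matches=$(printf '%s\n' 'Hello' 'a TeSt line' 'XXXX' 'TEST again' |
  sed -n 'h;y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/;/test/{x;p;}')

# h saves the original line, y lowercases the working copy, and on a match
# x swaps the original back into the pattern space so that p prints it.
printf '%s\n' "$matches"
```

Because x and p sit inside the braces, lines that don't match are never printed.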

Here is the test file:

  % cat case
  This contains text      Hello
  that we want to         TeSt
  search for, but in      test
  a case insensitive      XXXX
  manner using the sed    TEST
  editor.                 Bye bye.
  %

Here are the results of running our sed script on it:

  % sed -n -f case.script case
  This contains text      Hello
  vvvv
  that we want to         TeSt
  ^^^^
  vvvv
  search for, but in      test
  ^^^^
  a case insensitive      XXXX
  vvvv
  manner using the sed    TEST
  ^^^^
  editor.                 Bye bye.

Notice the vvvv and ^^^^ markers around lines that contain test.

Translating Case

The tr command can translate one character to another. To change the contents of case into all lowercase and write the results to file lower-case, we could use:

  % tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' \
    < case > lower-case

tr works with standard input and output only, so to read and write files we must use redirection.
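POSIX tr also understands character classes, which saves typing out both alphabets. This is a sketch with a made-up input line rather than the case file:

```shell
# [:upper:] and [:lower:] are standard tr character classes.
lower=$(printf 'Hello TeSt\n' | tr '[:upper:]' '[:lower:]')

printf '%s\n' "$lower"
```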

Translating Characters

To translate carriage return characters into newline characters, we could use:

  % tr \\r \\n < cr > lf

where cr is the original file and lf is a new file containing line feeds in place of carriage returns. \n represents a line feed character, but we must escape the backslash character in the shell, so we use \\n instead. Similarly, a carriage return is specified as \\r.
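Single quotes are another way to get the backslash past the shell, in which case one backslash is enough. This sketch uses printf to generate a made-up two-line input instead of reading the cr file:

```shell
# Inside single quotes the shell leaves \r and \n alone, and tr interprets them.
converted=$(printf 'one\rtwo\r' | tr '\r' '\n')

printf '%s\n' "$converted"
```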

Removing Duplicate Line Feeds

tr can also squeeze multiple consecutive occurrences of a particular character into a single occurrence. For example, to remove duplicate line feeds from the lines file:

  % tr -s \\n < lines > tmp ; mv tmp lines

Here we use the tmp file trick again because tr, like grep and sed, will trash the input file if it is also the output file: the shell truncates the redirected output file before the command reads any input.
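The squeeze is easy to try without touching any files. This sketch feeds tr a made-up input that contains runs of empty lines:

```shell
# Three lines of text separated by runs of blank lines.
squeezed=$(printf 'a\n\n\nb\n\nc\n' | tr -s '\n')

# Each run of newlines collapses to one, so the blank lines disappear.
printf '%s\n' "$squeezed"
```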

Deleting Characters

tr can also delete selected characters. If, for instance, you hate vowels, run your documents through this:

  % tr -d aeiou < file
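Note that tr -d matches characters exactly, so the command above leaves uppercase vowels alone. A small sketch with a made-up input string shows the difference:

```shell
# Only the lowercase vowels listed in the set are deleted.
lower_only=$(printf 'Unix Administration\n' | tr -d aeiou)

# Listing the uppercase vowels as well removes them too.
both_cases=$(printf 'Unix Administration\n' | tr -d aeiouAEIOU)

printf '%s\n%s\n' "$lower_only" "$both_cases"
```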

Translating Tabs to Spaces

To translate tabs into multiple spaces, use col with the -x flag:

  % cat tabs
  col     col    col
  % od -x tabs
  0000000     636f 6c09  636f  6c09  636f 6c0a  0a00
  0000015
  % col -x < tabs > spaces
  % cat spaces
  col  col  col
  % od -h spaces
  0000000    636f  6c20  2020  2020  636f 6c20  2020 2020
  0000020    636f  6c0a  0a00
  0000025

In this example I have used od -x to dump the contents of the before and after files in hexadecimal, which shows more clearly that the translation has worked. (09 is the code for Tab and 20 is the code for Space.)
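If col is not installed, the POSIX expand utility performs a similar tab-to-space conversion. Note that expand is a different tool from the col command this hack uses; the sketch below assumes the default tab stops of eight columns and a made-up input line:

```shell
# Each tab is replaced by enough spaces to reach the next 8-column tab stop.
spaces=$(printf 'col\tcol\tcol\n' | expand)

printf '%s\n' "$spaces"
```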

See Also

  • man sed
  • man tr
  • man col
  • man od


 
 