More Amazing Things to Do With Pipelines
By: O'Reilly Media
    2008-07-02




    5.6 Summary



    This chapter has shown how to solve several text processing problems, none of which would be simple to do in most programming languages. The critical lessons of this chapter are:

    1. Data markup is extremely valuable, although it need not be complex. A unique single character, such as a tab, colon, or comma, often suffices.
    2. Pipelines of simple Unix tools and short, often inline, programs in a suitable text processing language, such as awk, can exploit data markup to pass multiple pieces of data through a series of processing stages, emerging with a useful report.
    3. By keeping the data markup simple, the output of our tools can readily become input to new tools, as shown by our little analysis of the output of the word-frequency filter, wf, applied to Shakespeare’s texts.
    4. By preserving some minimal markup in the output, we can later come back and massage that data further, as we did to turn a simple ASCII office directory into a web page. Indeed, it is wise never to consider any form of electronic data as final: there is a growing demand in some quarters for page-description languages, such as PCL, PDF, and PostScript, to preserve the original markup that led to the page formatting. Word processor documents currently are almost devoid of useful logical markup, but that may change in the future. At the time of this writing, one prominent word processor vendor was reported to be considering an XML representation for document storage. The GNU Project’s gnumeric spreadsheet, the Linux Documentation Project,* and the OpenOffice.org† office suite already do that. 
    5. Lines with delimiter-separated fields are a convenient format for exchanging data with more complex software, such as spreadsheets and databases. Although such systems usually offer some sort of report-generation feature, it is often easier to extract the data as a stream of lines of fields, and then to apply filters written in suitable programming languages to manipulate the data further. For example, catalog and directory publishing are often best done this way.
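
    As a minimal sketch of lessons 1 through 3, the pipeline below counts entries per department in a colon-delimited directory and emits a tab-delimited report that a later stage can consume. The sample records and the function names directory, dept_counts, and largest_dept are hypothetical, invented here for illustration; they are not taken from the book.

```shell
#!/bin/sh
# Hypothetical sample data: name:department:extension, one record per line.
directory() {
    printf '%s\n' \
        'Jones:Sales:5-0101' \
        'Smith:Support:5-0102' \
        'Brown:Sales:5-0103'
}

TAB=$(printf '\t')

# Count people per department and emit department<TAB>count.
# Assumes department names contain no whitespace, because the final
# awk stage relies on default field splitting of the uniq -c output.
dept_counts() {
    cut -d: -f2 |                        # keep only the department field
        sort |                           # group identical departments
        uniq -c |                        # prefix each with its count
        awk '{ print $2 "\t" $1 }'       # re-emit as department<TAB>count
}

# Because the report is still tab-delimited, it can feed another stage:
# here, a numeric sort on the count field picks the largest department.
largest_dept() {
    dept_counts | sort -t "$TAB" -k2,2nr | head -n 1
}

directory | dept_counts     # prints Sales<TAB>2, then Support<TAB>1
directory | largest_dept    # prints Sales<TAB>2
```

    The single-character delimiters do all the work: the colon lets cut isolate one field, and the tab in the output lets sort address the count without any further parsing.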

     


     

    * On some systems, file formats are in Section 7; thus, you might need to use man 7 passwd instead.

    * In addition to this book (listed in the Bibliography), hundreds of books on SGML and derivatives are listed at
    http://www.math.utah.edu/pub/tex/bib/sgml.html and http://www.math.utah.edu/pub/tex/bib/sgml2000.html.

    a E. F. Codd, A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, 13(6) 377–387, June (1970), and Relational Database: A Practical Foundation for Productivity, Communications of the ACM, 25(2) 109–117, February (1982) (Turing Award lecture).

    b By Kevin Kline and Daniel Kline, O’Reilly & Associates, 2000, ISBN 1-56592-744-3. See also http://www.math.utah.edu/pub/tex/bib/sqlbooks.html for an extensive list of SQL books.

    * Available at http://www.math.utah.edu/pub/sgml/.

    * Available at ftp://ftp.ox.ac.uk/pub/wordlists/, ftp://qiclab.scn.rain.com/pub/wordlists/, ftp://ibiblio.org/pub/docs/books/gutenberg/etext96/pgw*, and http://www.phreak.org/html/wordlists.shtml. A search for “word list” in any Internet search engine turns up many more.

    * Programming Pearls: A Literate Program: A WEB program for common words, Comm. ACM 29(6), 471–483, June (1986), and Programming Pearls: Literate Programming: Printing Common Words, 30(7), 594–599, July (1987). Knuth’s paper is also reprinted in his book Literate Programming, Stanford University Center for the Study of Language and Information, 1992, ISBN 0-937073-80-6 (paper) and 0-937073-81-4 (cloth).

    * Programming Pearls: Associative Arrays, Comm. ACM 28(6), 570–576, June (1985). This is an excellent introduction to the power of associative arrays (tables indexed by strings, rather than integers), a common feature of most scripting languages.

    † Available in the wonderful Project Gutenberg archives at http://www.gutenberg.net/.

    * Indeed, the only word related to the root of computer that Shakespeare used is computation, just once in each of two plays, Comedy of Errors and King Richard III. “Arithmetic” occurs six times in his plays, “calculate” twice, and “mathematics” thrice.

    * See http://www.tldp.org/.

    † See http://www.openoffice.org/.



    This article is excerpted from chapter five of Classic Shell Scripting, written by Arnold Robbins and Nelson H.F. Beebe (O'Reilly; ISBN: 0596005954).

       





    © 2003-2008 by Developer Shed. All rights reserved.