More Amazing Things to Do With Pipelines
By: O'Reilly Media
    2008-07-02


    Table of Contents:
  • More Amazing Things to Do With Pipelines
  • 5.4 Word Lists
  • Word Lists, continued
  • 5.5 Tag Lists
  • 5.6 Summary

    In this second part of a two-part series on pipelines in Unix, you will learn some fun ways to cheat at word puzzles and other, more useful tricks. This article is excerpted from chapter 5 of Classic Shell Scripting, written by Arnold Robbins and Nelson H.F. Beebe (O'Reilly; ISBN: 0596005954). Copyright © 2007 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.

    5.3 Cheating at Word Puzzles

    Crossword puzzles give you clues about words, but most of us get stuck when we cannot think of, say, a ten-letter word that begins with a b and has either an x or a z in the seventh position.

    Regular-expression pattern matching with awk or grep is clearly called for, but what files do we search? One good choice is the Unix spelling dictionary, available as /usr/dict/words on many systems. (Other popular locations for this file are /usr/share/dict/words and /usr/share/lib/dict/words.) This is a simple text file, with one word per line, sorted in lexicographic order. We can easily create similar files from any collection of text files, like this:

      cat file(s) | tr A-Z a-z | tr -c a-z\' '\n' | sort -u

    The second pipeline stage converts uppercase to lowercase, the third replaces nonletters with newlines, and the last sorts the result, keeping only unique lines. The third stage treats apostrophes as letters, since they are used in contractions. Every Unix system has collections of text that can be mined in this way: for example, the formatted manual pages in /usr/man/cat*/* and /usr/local/man/cat*/*. On one of our systems, they supplied more than 1 million lines of prose and produced a list of about 44,000 unique words. There are also word lists for dozens of languages in various Internet archives.
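As a rough illustration of the word-list pipeline, the sketch below wraps the last three stages in a function and feeds it one line of sample text. The function name make_wordlist and the sample sentence are assumptions made for this example; they do not appear in the original.

```shell
#! /bin/sh
# make_wordlist: hypothetical helper name, not from the original text.
# Reads text on standard input and writes a sorted list of unique
# lowercase words, one per line.
make_wordlist() {
    tr A-Z a-z |            # fold uppercase to lowercase
    tr -c a-z\' '\n' |      # turn every character except a-z and ' into a newline
    sort -u                 # sort, keeping one copy of each distinct line
}

# Sample input, invented for the demonstration.
printf '%s\n' "Don't panic. The PIPELINE isn't magic; the pipeline is simple." |
    make_wordlist
```

Because adjacent nonletters (such as a period followed by a space) each become a newline, the result can include one empty line. If that matters, adding the -s option to the second tr should squeeze the repeated newlines.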

    Let us assume that we have built up a collection of word lists in this way and stored them in a standard place that we can reference from a script. We can then write the program shown in Example 5-4.

    Example 5-4. Word puzzle solution helper

    #! /bin/sh
    # Match an egrep(1)-like pattern against a collection of
    # word lists.
    #
    # Usage:
    #   puzzle-help egrep-pattern [word-list-files]

    FILES="
      /usr/dict/words
      /usr/share/dict/words
      /usr/share/lib/dict/words
      /usr/local/share/dict/words.biology
      /usr/local/share/dict/words.chemistry
      /usr/local/share/dict/words.general
      /usr/local/share/dict/words.knuth
      /usr/local/share/dict/words.latin
      /usr/local/share/dict/words.manpages
      /usr/local/share/dict/words.mathematics
      /usr/local/share/dict/words.physics
      /usr/local/share/dict/words.roget
      /usr/local/share/dict/words.sciences
      /usr/local/share/dict/words.unix
      /usr/local/share/dict/words.webster
          "
    pattern="$1"

    egrep -h -i "$pattern" $FILES 2> /dev/null | sort -u -f

    The FILES variable holds the built-in list of word-list files, customized to the local site. The egrep option -h suppresses filenames from the report, the -i option ignores lettercase, and we discard the standard error output with 2> /dev/null, in case any of the word-list files don't exist or lack the necessary read permission. (This kind of redirection is described in "File Descriptor Manipulation" [7.3.2].) The final sort stage reduces the report to just a list of unique words, ignoring lettercase.
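The effect of those options can be checked on a small scale. The sketch below is an illustration under stated assumptions, not part of Example 5-4: the temporary directory, the file names, and the helper name search_lists are all made up, and grep -E stands in for egrep, since some recent GNU grep releases warn that egrep is obsolescent.

```shell
#! /bin/sh
# Sketch only: the directory and file names below are invented for the demo.
dir=$(mktemp -d) || exit 1
printf '%s\n' apple Banana cherry > "$dir/words.one"
printf '%s\n' banana BANANA date  > "$dir/words.two"

# search_lists: hypothetical helper mirroring the last line of Example 5-4,
# with grep -E used in place of egrep.
search_lists() {
    pattern="$1"
    shift
    # -h: no filename prefixes; -i: ignore lettercase;
    # 2> /dev/null: hide complaints about missing or unreadable files.
    grep -E -h -i "$pattern" "$@" 2> /dev/null | sort -u -f
}

# The third file does not exist; its error message is discarded.
search_lists '^b' "$dir/words.one" "$dir/words.two" "$dir/no-such-file"

rm -rf "$dir"
```

The three spellings of "banana" collapse to a single line because sort -u -f treats them as equal; which spelling survives is left to sort.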

    Now we can find the word that we were looking for:

      $ puzzle-help '^b.....[xz]...$' | fmt
      bamboozled Bamboozler bamboozles bdDenizens bdWheezing Belshazzar
      botanizing Brontozoum Bucholzite bulldozing
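To see why the pattern is anchored at both ends, it can be tried on a short inline list. This is a sketch under assumptions: the sample words and the helper name ten_letter_b are invented for the illustration, and grep -E stands in for egrep.

```shell
#! /bin/sh
# Made-up sample list standing in for the real word-list files.
sample='bamboozled
bulldozing
bulldozer
embezzling
Belshazzar
bamboozlement'

# ten_letter_b: hypothetical helper. The pattern reads: a b, five arbitrary
# characters, an x or z in position seven, then exactly three more characters.
ten_letter_b() {
    grep -E -i '^b.....[xz]...$'
}

printf '%s\n' "$sample" | ten_letter_b
# prints:
#   bamboozled
#   bulldozing
#   Belshazzar
```

Without the trailing $, the longer word bamboozlement would also match, and without the leading ^, a word such as embezzling could match if it contained a suitable substring.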

    Can you think of an English word with six consonants in a row? Here’s some help:

      $ puzzle-help '[^aeiouy]{6}' /usr/dict/words
      Knightsbridge
      mightn't
      oughtn't

    If you don't count y as a vowel, many more turn up: encryption, klystron, porphyry, syzygy, and so on.

    We could readily exclude the contractions from the word lists by a final filter step—egrep -i '^[a-z]+$'—but there is little harm in leaving them in the word lists.
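A minimal sketch of that filter step follows. The function name letters_only is hypothetical, and grep -E is used in place of egrep; the input is the three-word output shown earlier.

```shell
#! /bin/sh
# letters_only: hypothetical name for the final filter step.
# Keeps only lines made up entirely of letters, so words that
# contain an apostrophe are dropped.
letters_only() {
    grep -E -i '^[a-z]+$'
}

printf '%s\n' Knightsbridge "mightn't" "oughtn't" | letters_only
# prints: Knightsbridge
```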



     
     