More Amazing Things to Do With Pipelines
By: O'Reilly Media
    2008-07-02


    Table of Contents:
  • More Amazing Things to Do With Pipelines
  • 5.4 Word Lists
  • Word Lists, continued
  • 5.5 Tag Lists
  • 5.6 Summary



    More Amazing Things to Do With Pipelines - Word Lists, continued
    ( Page 3 of 5 )

    The results are about as expected for English prose. More interesting, perhaps, is to ask how many unique words there are in the play:

    $ wf 999999 < hamlet | wc -l
    4548

    and to look at some of the least-frequent words:

    $ wf 999999 < hamlet | tail -n 12 | pr -c4 -t -w80
          1 yaw             1 yesterday       1 yielding        1 younger
          1 yawn            1 yesternight     1 yon             1 yourselves
          1 yeoman          1 yesty           1 yond            1 zone

    There is nothing magic about the argument 999999: it just needs to be a number larger than any expected count of unique words, and the keyboard repeat feature makes it easy to type.
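
    For readers without the earlier pages at hand, the following is a minimal sketch of a wf-like filter. The name wf_sketch and the word-splitting rule (letters and apostrophes) are assumptions made for illustration, and it may differ in detail from the wf used in the examples here. It shows why the argument only needs to be "large enough": the final sed simply stops after n lines, so any n at least as big as the number of distinct words lets the whole list through.

```shell
# wf_sketch: hypothetical stand-in for the wf command used in the text.
# Reads text on standard input and prints the n (default 25) most
# frequent words, one "count word" pair per line.
wf_sketch() {
    tr -cs "A-Za-z'" '\n' |     # turn each run of non-word characters into a newline
        tr A-Z a-z |            # fold uppercase to lowercase
        sort |                  # bring identical words together
        uniq -c |               # prefix each distinct word with its count
        sort -k1,1nr -k2 |      # highest count first, ties in alphabetical order
        sed "${1:-25}q"         # quit after n lines
}

# Any sufficiently large n lists every distinct word, so wc -l counts them.
printf 'To be, or not to be\n' | wf_sketch 999999 | wc -l
```

    On this one-line sample the count is four distinct words (to, be, or, not); 999999, 5000, or any other number of at least 4 would give the same answer.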

    We can also ask how many of the 4548 unique words were used just once:

    $ wf 999999 < hamlet | grep -c '^ *1•'
    2634

    The • following the digit 1 in the grep pattern represents a tab. This result is surprising, and probably atypical of most modern English prose: although the play’s vocabulary is large, nearly 58 percent of the words occur only once. And yet, the core vocabulary of frequently occurring words is rather small:

    $ wf 999999 < hamlet | awk '$1 >= 5' | wc -l
    740

    This is about the number of words that a student might be expected to learn in a semester course on a foreign language, or that a child learns before entering school.
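
    Both counts can also be taken with awk field tests alone, which avoids typing a literal tab in the grep pattern and works whether the count is followed by a tab or a space. The sketch below assumes only that each line of wf output is a count, some whitespace, and a word; the sample data and the helper name sample are hypothetical, standing in for wf 999999 < hamlet.

```shell
# Hypothetical sample in a "count word" layout; with the real data,
# replace calls to sample with: wf 999999 < hamlet
sample() {
    printf '%7d %s\n' 9 the 5 and 5 of 2 king 1 yaw 1 yawn 1 zone
}

# Words used exactly once (the question the grep -c command answers).
sample | awk '$1 == 1' | wc -l

# Words used at least five times.
sample | awk '$1 >= 5' | wc -l

# Share of once-only words, as a percentage of all distinct words.
sample | awk '$1 == 1 { once++ } END { printf "%.1f\n", 100 * once / NR }'
```

    The same arithmetic applied to the figures quoted for Hamlet, 2634 once-only words out of 4548, gives about 57.9 percent, which is the "nearly 58 percent" mentioned earlier.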

    Shakespeare didn’t have computers to help analyze his writing, but we can speculate that part of his genius was in making most of what he wrote understandable to the broadest possible audience of his time.

    When we applied wf to the individual texts of Shakespeare’s plays, we found that Hamlet has the largest vocabulary (4548), whereas Comedy of Errors has the smallest (2443). The total number of unique words in the Shakespeare corpus of plays and sonnets is nearly 23,700, which shows that you need exposure to several plays to enjoy the richness of his work. About 36 percent of those words are used only once, and only one word begins with x: Xanthippe, in Taming of the Shrew. Clearly, there is plenty of fodder in Shakespeare for word-puzzle enthusiasts and vocabulary analysts!
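
    A per-play comparison of this kind can be sketched as a loop over files. Everything named here is an assumption for illustration: the functions vocab_size and vocab_report, the directory of one text file per play, and the word-splitting rule. Because only the vocabulary size is needed, not the frequencies, sort -u is enough and no counting step is required.

```shell
# vocab_size: count the distinct words on standard input (case-folded).
vocab_size() {
    tr -cs "A-Za-z'" '\n' |     # one word per line
        tr A-Z a-z |            # fold uppercase to lowercase
        sort -u |               # keep one copy of each word
        grep -c .               # count the nonempty lines
}

# vocab_report: print "size<TAB>file" for every file in directory $1,
# largest vocabulary first.
vocab_report() {
    for f in "$1"/*; do
        printf '%s\t%s\n' "$(vocab_size < "$f")" "$f"
    done | sort -k1,1nr
}

# Usage (hypothetical directory holding one text file per play):
#   vocab_report plays | sed -n '1p;$p'    # largest and smallest vocabulary
```

    Counts from this sketch need not match the figures quoted above exactly, since they depend on how words are delimited and on how each text was prepared.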



     
     