
Word Lists, continued

In this second part of a two-part series on pipelines in Unix, you will learn some fun ways to cheat at word puzzles and other, more useful tricks. This article is excerpted from chapter 5 of Classic Shell Scripting, written by Arnold Robbins and Nelson H.F. Beebe (O'Reilly; ISBN: 0596005954). Copyright © 2007 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.

By: O'Reilly Media
July 02, 2008


The results are about as expected for English prose. More interesting, perhaps, is to ask how many unique words there are in the play:

  $ wf 999999 < hamlet | wc -l
  4548

and to look at some of the least-frequent words:

  $ wf 999999 < hamlet | tail -n 12 | pr -c4 -t -w80
        1 yaw               1 yesterday         1 yielding          1 younger
        1 yawn              1 yesternight       1 yon               1 yourselves
        1 yeoman            1 yesty             1 yond              1 zone

There is nothing magic about the argument 999999: it just needs to be a number larger than any expected count of unique words, and the keyboard repeat feature makes it easy to type.
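The wf script itself is not reproduced in this part of the article. To show what the argument controls, here is a minimal sketch of a word-frequency filter of that kind, written as a shell function. The tokenization rule and the sort keys are assumptions and may differ from the script used above. The argument becomes the line number at which sed quits, so any number past the end of the list simply prints every line.

```shell
#!/bin/sh
# Sketch of a word-frequency filter in the style of wf (details assumed).
# Usage: wf [n] < text      (n defaults to 25)
wf() {
    tr -cs "A-Za-z'" '\n' |     # turn every run of non-letters into one newline
        tr A-Z a-z |            # fold uppercase to lowercase
        sort |                  # bring identical words together
        uniq -c |               # collapse duplicates, prefixing each with its count
        sort -k1,1nr -k2 |      # descending count, then ascending word
        sed "${1:-25}q"         # stop after the first n lines
}
```

With this function defined, `wf 999999 < hamlet` lists every distinct word, and `wf < hamlet` lists only the 25 most frequent.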

We can also ask how many of the 4548 unique words were used just once:

  $ wf 999999 < hamlet | grep -c '^ *1•'

The • following the digit 1 in the grep pattern represents a tab. This result is surprising, and probably atypical of most modern English prose: although the play’s vocabulary is large, nearly 58 percent of the words occur only once. And yet, the core vocabulary of frequently occurring words is rather small:

  $ wf 999999 < hamlet | awk '$1 >= 5' | wc -l

This is about the number of words that a student might be expected to learn in a semester course on a foreign language, or that a child learns before entering school.
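The counts discussed above can also be gathered in a single pass. The following is a sketch, assuming the input is a list of "count word" lines like the one wf prints; the function name vocab_summary and its threshold argument are hypothetical. Because awk splits fields on any run of blanks or tabs, this version does not depend on typing a literal tab into a pattern.

```shell
#!/bin/sh
# Summarize a "count word" list such as the one wf prints (format assumed).
# Usage: wf 999999 < hamlet | vocab_summary [threshold]
vocab_summary() {
    awk -v min="${1:-5}" '
        $1 == 1   { once++ }    # words that occur exactly once
        $1 >= min { core++ }    # words at or above the threshold
        END {
            printf "unique words:      %d\n", NR
            printf "used once:         %d\n", once
            printf "used %d+ times:    %d\n", min, core
            if (NR > 0)
                printf "percent used once: %.1f\n", 100 * once / NR
        }'
}
```

The threshold defaults to 5 to match the awk test used above; pass a different number to explore how quickly the core vocabulary shrinks.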

Shakespeare didn’t have computers to help analyze his writing, but we can speculate that part of his genius was in making most of what he wrote understandable to the broadest possible audience of his time.

When we applied wf to the individual texts of Shakespeare’s plays, we found that Hamlet has the largest vocabulary (4548), whereas Comedy of Errors has the smallest (2443). The total number of unique words in the Shakespeare corpus of plays and sonnets is nearly 23,700, which shows that you need exposure to several plays to enjoy the richness of his work. About 36 percent of those words are used only once, and only one word begins with x: Xanthippe, in Taming of the Shrew. Clearly, there is plenty of fodder in Shakespeare for word-puzzle enthusiasts and vocabulary analysts!
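A per-play comparison like this one can be sketched as a loop over text files. The function name vocab_sizes is hypothetical, the file names are whatever you pass in, and the word-splitting rule is an assumption, so the counts it prints need not match the figures quoted above.

```shell
#!/bin/sh
# Sketch: print the vocabulary size of each text file, smallest first.
# Usage: vocab_sizes file...
vocab_sizes() {
    for f in "$@"
    do
        n=$(tr -cs "A-Za-z'" '\n' < "$f" |   # one word per line
            tr A-Z a-z |                      # fold case
            grep -v '^$' |                    # drop the empty line a leading non-letter leaves
            sort -u |                         # keep one copy of each word
            wc -l |                           # count the distinct words
            tr -d ' ')                        # strip padding some wc versions add
        printf '%s\t%s\n' "$n" "$f"
    done | sort -n
}
```

Concatenating all the files and feeding them through the same pipeline once would give the vocabulary of the whole collection rather than of each text.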
