There is nothing magic about the argument 999999: it just needs to be a number larger than any expected count of unique words, and the keyboard repeat feature makes it easy to type.
We can also ask how many of the 4548 unique words were used just once:
$ wf 999999 < hamlet | grep -c '^ *1•'
2634
The • following the digit 1 in the grep pattern represents a tab. This result is surprising, and probably atypical of most modern English prose: although the play’s vocabulary is large, nearly 58 percent of the words occur only once. And yet, the core vocabulary of frequently occurring words is rather small:
$ wf 999999 < hamlet | awk '$1 >= 5' | wc -l
740
This is about the number of words that a student might be expected to learn in a semester course on a foreign language, or that a child learns before entering school.
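The two counts above can also be wrapped in small helper functions, which avoids having to type a literal tab into the grep pattern. What follows is only a sketch: the function names singleton_share and count_at_least are hypothetical, and the code assumes that each input line has the form suggested by the grep pattern, namely optional leading spaces, a count, a tab, and then the word.

```shell
# Hypothetical helpers; both read "count<TAB>word" lines on standard input.
# The input format is an assumption inferred from the grep pattern '^ *1<TAB>'.

# Report how many words occur exactly once, and their share of all
# unique words. awk splits on spaces and tabs by default, so $1 is the count.
singleton_share() {
    awk '{ total++; if ($1 == 1) once++ }
         END { if (total > 0)
                   printf "%d of %d (%.1f%%)\n", once, total, 100 * once / total }'
}

# Count the words that occur at least $1 times.
count_at_least() {
    awk -v min="$1" '$1 + 0 >= min + 0 { n++ } END { print n + 0 }'
}

# An alternative to typing a literal tab: let printf generate it.
tab=$(printf '\t')
count_singletons() {
    grep -c "^ *1$tab"
}
```

Assuming wf behaves as described, wf 999999 < hamlet | singleton_share should report the same 2634 and 4548 quoted above, and wf 999999 < hamlet | count_at_least 5 should reproduce the count of 740.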
Shakespeare didn’t have computers to help analyze his writing,* but we can speculate that part of his genius was in making most of what he wrote understandable to the broadest possible audience of his time.
When we applied wf to the individual texts of Shakespeare’s plays, we found that Hamlet has the largest vocabulary (4548), whereas Comedy of Errors has the smallest (2443). The total number of unique words in the Shakespeare corpus of plays and sonnets is nearly 23,700, which shows that you need exposure to several plays to enjoy the richness of his work. About 36 percent of those words are used only once, and only one word begins with x: Xanthippe, in Taming of the Shrew. Clearly, there is plenty of fodder in Shakespeare for word-puzzle enthusiasts and vocabulary analysts!