Finishing the System’s Outlines

This second part of a two-part article completes our coverage of how to talk to a client so that you are both on the same page when designing a system and understanding what it will be required to do. It is excerpted from Prefactoring, written by Ken Pugh (O’Reilly; ISBN: 0596008740). Copyright © 2007 O’Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O’Reilly Media.

Clumping

When Sam described his customers in detail, he mentioned that he needed to keep track of each customer’s home address, including street, city, state, and Zip Code, as well as credit card billing address, including street, city, state, and Zip Code.

I asked him, "Do both of those addresses contain the same information?"

He replied affirmatively.

I said, "Then let’s just describe the combination as an Address. That way, you don’t have to keep mentioning all the parts unless there is something different about them."

"OK," he answered.

We clumped the data into a class, as follows:

  class Address
      {
      String line1;
      String line2;
      String city;
      String state;
      String zip;
      }

At this point, we simply clump the related data, even though we have not assigned any behavior to the class. This data object helps in abstraction and in cutting down parameter lists. Even though the class contains only data at this point, we might be able to assign responsibility to it later on.*
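To make the point about parameter lists concrete, here is a minimal Java sketch. The Address fields come from the class above; the Customer class, its name field, and its constructor are assumptions made only for illustration.

```java
// Hypothetical sketch: Address mirrors the clump above; Customer is assumed.
final class Address {
    final String line1;
    final String line2;
    final String city;
    final String state;
    final String zip;

    Address(String line1, String line2, String city, String state, String zip) {
        this.line1 = line1;
        this.line2 = line2;
        this.city = city;
        this.state = state;
        this.zip = zip;
    }
}

final class Customer {
    final String name;
    final Address homeAddress;
    final Address billingAddress;

    // Two Address parameters stand in for ten separate String parameters.
    Customer(String name, Address homeAddress, Address billingAddress) {
        this.name = name;
        this.homeAddress = homeAddress;
        this.billingAddress = billingAddress;
    }
}
```

A customer whose billing address matches the home address can simply pass the same Address object twice.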

Clumping and lumping look similar, but they have distinctly different meanings. Clumping involves combining a set of attributes into a single named concept. The attributes should form a cohesive whole. Lumping involves using a single name for two different concepts. Clumping is an abstraction technique, which makes for an efficient description of a set of data. Lumping can hide relevant distinctions between concepts.

CLUMP DATA SO THAT THERE IS LESS TO THINK ABOUT

Clumping data cuts down on the number of concepts that have to be kept in mind.

Abstracting

In creating a description of a use case or a model of a possible class, avoid using primitive data types. Pretend that ints or doubles do not exist. Almost every type of number can be described with an abstract data type (ADT). Items are priced in Dollars (or CurrencyUnits, if you are globally oriented). The number of physical copies of an item in an inventory is a Count. The discount that a good customer receives is denoted with a Percentage. The size of a CDDisc is expressed as a Length (or LengthInMeters or LengthInInches if you are going to be sending a satellite into space). The time for a single song on a CDRelease could be stored in a TimePeriod.

Using an ADT places the focus on what can be done with the type, not on how the type is represented. An ADT shows what you intend to do with the variable. You can declare the variable as a primitive data type and name the variable to reflect that intent. However, a variable declared as an abstract data type can have built-in validation, whereas a variable declared as a primitive cannot.

Each ADT needs a related description. For example, a Count represents a number of items. A Count can be zero or positive. If a Count is negative, it represents an invalid count. Declaring a variable as a Count conveys this information. You can create variations of Count. You may have a CountWithLimit data type with a maximum count that, if exceeded, would signal an error.
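One way the Count description might look in Java is sketched below. The constructor checks, the accessor names, and the choice of IllegalArgumentException to signal an invalid count are all assumptions; the text only says that a negative count is invalid and that exceeding the limit signals an error.

```java
// Hypothetical sketch of Count and CountWithLimit as validated types.
class Count {
    private final int value;

    Count(int value) {
        // A Count can be zero or positive; a negative value is invalid.
        if (value < 0) {
            throw new IllegalArgumentException("Count cannot be negative: " + value);
        }
        this.value = value;
    }

    int value() {
        return value;
    }
}

final class CountWithLimit extends Count {
    private final int limit;

    CountWithLimit(int value, int limit) {
        super(value);
        // Exceeding the maximum signals an error.
        if (value > limit) {
            throw new IllegalArgumentException(
                "Count " + value + " exceeds limit " + limit);
        }
        this.limit = limit;
    }

    int limit() {
        return limit;
    }
}
```

Because the check lives in the constructor, any variable declared as a Count is known to hold a valid count.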

You can place limits on many different data types. For example, Ages (of humans) can range between 0 and 150 years, SpeedLimits (for automobiles) between 5 and 80 mph, and Elevations (for normal flying) between 0 and 60,000 feet.* All these types can be represented by an int or a double, but that is an implementation issue, not an abstraction issue.

Abstract types can contain more than just validation. A price can be represented in Dollars. The string representation of a Dollar differs from the string representation of a double. A string for Dollar has at least a currency symbol and perhaps some thousands separators (e.g., commas). Multiplying a Dollar by a number can result in a Dollar with cents, but not fractions of a cent. Here is a possible Dollar class:

  class Dollar
      {
      Dollar multiply_with_rounding(double multiplier);
      Dollar add(Dollar another_dollar);
      Dollar subtract(Dollar another_dollar);
      String to_string();
      };

If your language provides the ability to define operators for a class (such as + and -), you can use arithmetic symbols for the corresponding operations. You can also use the appropriate method name to have a Dollar be converted automatically to a String if it appears in an appropriate context. How you represent the abstract data type that you use for a value is an implementation detail. You can make up a class for each type. If you work with C++, you can make up typedefs for each simple type for which there is no additional functionality. For other languages, you can convert some simple types into primitive types. In that case, you might want to use variable names that include the type (e.g., double price_in_dollars).
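As a rough illustration, the Dollar interface could be filled in as follows in Java. Storing the amount as a whole number of cents, the ofCents factory, the camelCase method names, and the U.S. formatting are assumptions of this sketch, not details given in the text.

```java
// Hypothetical sketch of a Dollar type; the representation is an assumption.
final class Dollar {
    // Whole cents only, so a Dollar can never hold a fraction of a cent.
    private final long cents;

    private Dollar(long cents) {
        this.cents = cents;
    }

    static Dollar ofCents(long cents) {
        return new Dollar(cents);
    }

    // Multiplies, then rounds the result to the nearest cent.
    Dollar multiplyWithRounding(double multiplier) {
        return new Dollar(Math.round(cents * multiplier));
    }

    Dollar add(Dollar other) {
        return new Dollar(cents + other.cents);
    }

    Dollar subtract(Dollar other) {
        return new Dollar(cents - other.cents);
    }

    // Currency symbol, thousands separators, and exactly two decimal places.
    @Override
    public String toString() {
        String sign = cents < 0 ? "-" : "";
        long abs = Math.abs(cents);
        return String.format(java.util.Locale.US, "%s$%,d.%02d",
                             sign, abs / 100, abs % 100);
    }
}
```

In Java, overriding toString is what lets a Dollar be converted to a String automatically when it appears in string concatenation.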

WHEN YOU’RE ABSTRACT, BE ABSTRACT ALL THE WAY

Do not describe data items using primitive data types.

This guideline suggests using explicit typing in describing the problem and the solution. By using abstract data types in the beginning, you are, in effect, more abstract. If you explicitly type all your attributes and parameters, you can always switch to implicit typing if the explicit typing gets in the way. It is much harder to go in reverse.


ADTS AND SPEED OF DEVELOPMENT

I was the chief judge of the Droege Developer Competition. In this event, pairs of developers competed to code a project for a nonprofit organization in one day. They were provided specifications and the results were judged the next day. Developers who used a product from Magic Software (http://www.magicsoftware.com/) consistently scored high and often won the competition. That product includes the ability to define data types that contain validation rules, display rules (e.g., drop-down lists or radio buttons), and formatting rules. The record layout in the tables is defined using these data types, rather than primitive Structured Query Language (SQL) types. To display a record, the record fields were placed onto the display window. The requisite formatting and validation for each field were already coded. Additional validation could be added when necessary.

The data typing feature, which is a particular implementation of abstract data typing, was key in the ability to create a working system quickly.


Not Just a String

A more descriptive data type can represent many of the data items that are typically represented by a String. For example, although a name can be declared as a String, you could declare it as a clumped data object:

  class Name
      {
      String first_name;
      String last_name;
      String title;        // e.g. Mr. Mrs.
      String suffix;    // e.g. Jr. III
      };

To avoid the "everything is a String" syndrome, come up with a different type name to describe a variable that holds a set of characters that does not have any validation, formatting, or other meta-information associated with it. Suppose you decide on CommonString. Use that name in place of String to declare the data types of attributes, and reserve String as an implementation type. Then ask the question "Is that attribute really a CommonString?"

Let us revisit the Address class. Using CommonString, we can describe the class as follows:

  class Address
      {
      CommonString line1;
      CommonString line2;
      CommonString city;
      CommonString state;
      CommonString zipcode;
      }

CommonStrings can contain any characters, just like an int can contain any integer values (within hardware limits). In Address, some fields are definitely not CommonStrings. A state is not a CommonString. Only certain values represent a valid state. For U.S. postal addresses, if we use abbreviations to represent the state, the abbreviation must appear on the U.S. Postal Service’s official list of abbreviations. So the state should be declared as a data type called State. This data type can provide an appropriate validation mechanism. That mechanism can check to see that a string is in the official list, or it can supply a set of strings of all official abbreviations for use in a drop-down display box.

A U.S. Zip Code is not just a CommonString either. You can describe it as a NumericString data type (e.g., one with all digits), as a FormattedString data type (one with five digits plus a dash plus four digits), or as a ZipCode data type. If any combination of digits was valid, using NumericString or FormattedString might be appropriate. However, declaring the attribute as a ZipCode type allows us to abstract away its actual representation.
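The State and ZipCode types might be sketched in Java as shown here. The abbreviation set is a deliberately partial sample rather than the official list, and the ZipCode check covers only the five-digit and five-plus-four formats, not whether a code is actually assigned. The class layouts and method names are assumptions.

```java
// Hypothetical sketch of State and ZipCode as validated types.
final class State {
    // Assumption: a partial sample; a real version would load the full official list.
    private static final java.util.Set<String> SAMPLE_ABBREVIATIONS =
        java.util.Set.of("CA", "NC", "NY", "TX");

    private final String abbreviation;

    State(String abbreviation) {
        if (!SAMPLE_ABBREVIATIONS.contains(abbreviation)) {
            throw new IllegalArgumentException("Unknown state: " + abbreviation);
        }
        this.abbreviation = abbreviation;
    }

    // Supplies the valid values, e.g. for a drop-down display box.
    static java.util.Set<String> validAbbreviations() {
        return SAMPLE_ABBREVIATIONS;
    }

    @Override
    public String toString() {
        return abbreviation;
    }
}

final class ZipCode {
    // Five digits, optionally followed by a dash and four more digits.
    private static final java.util.regex.Pattern FORMAT =
        java.util.regex.Pattern.compile("\\d{5}(-\\d{4})?");

    private final String value;

    ZipCode(String value) {
        if (value == null || !FORMAT.matcher(value).matches()) {
            throw new IllegalArgumentException("Not a Zip Code: " + value);
        }
        this.value = value;
    }

    @Override
    public String toString() {
        return value;
    }
}
```

Code that uses ZipCode never sees the regular expression, so the representation can change later without touching the callers.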

Do not combine classes simply for implementation purposes. You can define both SocialSecurityNumbers and PhoneNumbers as strings of digits with two dashes. That does not make them equivalent–that is just accidental cohesion. They are two distinctly different classes with different validation. A phone number in the U.S. cannot begin with a 1 or a 0. Certain ranges of Social Security numbers are not used. These numbers can use the same type of formatted string for input or display purposes, but the semantics of each class are entirely different. You would never send a Social Security number to be dialed, nor would you attempt to record payroll taxes against a phone number. (All right, someone at some time will come up with a counterexample, so perhaps I should never say "never.")

Much data that might be a CommonString can be assigned its own data type. For example, filenames are usually typed as strings, but they cannot contain certain characters. In the Windows world, you cannot have any of the following characters in a filename: \, /, :, *, ", ?, <, >, or |. A FileName data type can represent a filename and enforce this limitation. An advantage of using a data type becomes apparent in graphical user interface (GUI) development. For instance, the user interface code could recognize the FileName data type and automatically insert a browse button next to a text field.
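A FileName type that enforces the Windows character restriction could look roughly like this in Java. The class shape and the rejection of empty names are assumptions; the forbidden set is the usual Windows list (\ / : * " ? < > |).

```java
// Hypothetical sketch of a FileName type that rejects forbidden characters.
final class FileName {
    // Characters Windows does not allow in a filename: \ / : * " ? < > |
    private static final String FORBIDDEN = "\\/:*\"?<>|";

    private final String value;

    FileName(String value) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Filename cannot be empty");
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (FORBIDDEN.indexOf(c) >= 0) {
                throw new IllegalArgumentException(
                    "Filename contains forbidden character: " + c);
            }
        }
        this.value = value;
    }

    @Override
    public String toString() {
        return value;
    }
}
```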

On the Web, parameter values for the Hypertext Transfer Protocol (HTTP) commands GET and POST use encoded strings. Characters that are not alphabetic or numeric are encoded using their hexadecimal values. The encoded string is sent to the web server. Although the unencoded string and the encoded string are both implemented with strings, they are different. You can have invalid encoded strings–ones with unencoded punctuation, such as a hacker might send to a server. You could use an EncodedWebString class to represent strings on a server. If an input were not validated as an EncodedWebString, it would be rejected.
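An EncodedWebString class might be sketched in Java along these lines. The exact set of characters allowed to appear unencoded (letters, digits, and - _ . ~ +) is an assumption of this sketch; everything else must appear as a percent sign followed by two hexadecimal digits.

```java
// Hypothetical sketch: accepts only strings that are already properly encoded.
final class EncodedWebString {
    // Assumption: letters, digits, and - _ . ~ + may appear as-is;
    // any other character must be written as %HH (two hex digits).
    private static final java.util.regex.Pattern VALID =
        java.util.regex.Pattern.compile("(?:[A-Za-z0-9._~+-]|%[0-9A-Fa-f]{2})*");

    private final String value;

    EncodedWebString(String value) {
        if (value == null || !VALID.matcher(value).matches()) {
            throw new IllegalArgumentException("Not a validly encoded string");
        }
        this.value = value;
    }

    @Override
    public String toString() {
        return value;
    }
}
```

Server code that accepts only EncodedWebString parameters cannot be handed a raw, unvalidated String by accident.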

MOST STRINGS ARE MORE THAN JUST A STRING

Treat String as a primitive data type. Describe attributes with abstract data types, instead of as Strings.

Constant Avoidance

Similar to the way in which most strings are more than strings, most constant values are more than just constants. A constant value can usually be assigned a name that denotes the meaning of that constant. Avoid using the explicit value in a specification or executable code.* Declare the value as a constant and use the name of that constant in the document or the code. 

If Sam mentions that the late fee for a rental is $3, I create a constant:

  Dollar RENTAL_LATE_FEE = 3.00;

When reading the relevant documents later on, I need not concentrate on the actual value, only on the assigned name. Suppose this value was not transformed into a constant and the value of 3.00 was used frequently in the documents for other purposes. If I went searching for it, I would have to examine each appearance carefully to see if it was a reference to the rental late fee or to some other value.

You might not get rid of every constant value. The value 0 often appears in initializing variables or setting the initial index for an array. There is little to be gained by creating a named constant for zero.

If the value that a name represents is subject to change, the value should be kept with a configuration mechanism. In that case, the code would use the symbolic name to look up the configured value. The configuration mechanism could use an XML configuration file, a database table, or another form of persistence to store the values. For example, RENTAL_LATE_FEE is probably something that should exist in a configuration file rather than a constant.
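The lookup by symbolic name can be sketched in Java as follows. This sketch uses Java properties text in place of the XML file or database table mentioned above, and the RentalConfig class, the rental.late.fee key, and the amount method are all hypothetical names.

```java
// Hypothetical sketch: code refers to a symbolic name; the value lives in configuration.
final class RentalConfig {
    // The symbolic name used throughout the code (the key itself is assumed).
    static final String RENTAL_LATE_FEE = "rental.late.fee";

    private final java.util.Properties values = new java.util.Properties();

    // Takes properties-format text; a real system might read a file or a table.
    RentalConfig(String propertiesText) {
        try {
            values.load(new java.io.StringReader(propertiesText));
        } catch (java.io.IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
    }

    // Looks up the configured amount for a symbolic name.
    java.math.BigDecimal amount(String name) {
        String raw = values.getProperty(name);
        if (raw == null) {
            throw new IllegalArgumentException("No configured value for " + name);
        }
        return new java.math.BigDecimal(raw.trim());
    }
}
```

If Sam changes the late fee, only the configuration text changes; every use of RentalConfig.RENTAL_LATE_FEE in the code stays the same.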

NEVER LET A CONSTANT SLIP INTO CODE

Use a symbolic name for all values.

Prototypes Are Worth a Thousand Words

It is often said that a picture is worth a thousand words. A prototype is like a picture. A user interface described in text is often harder for the customer to visualize than the same interface described with a diagram or picture. Use cases can provide excellent textual descriptions. A prototype (or screen mockup) gives a more concrete perspective on a program’s intended operation. The prototype can spark feedback from the client in both the program’s operation and in missing requirements.


DON’T THROW AWAY INFORMATION

When I was presenting these prefactoring guidelines in a talk at the Software Development Conference, Jerry Weinberg made an interesting observation about some of the guidelines in this chapter. He stated that they revolve around the central principle of not throwing away information. For example, describing a price as a double, rather than as a Dollar, decreases the information about the price. Lumping a group of concepts into a single class, rather than splitting them into multiple classes, hides information. Now, if I only could have convinced my mother that my comic book collection was really information…


One of the dangers of making a perfect-looking GUI for a prototype is that the interface represents the program to the user. If the interface is complete, the user might expect that the system is almost complete. Some user interface experts suggest that interfaces be designed using whiteboards or Post-it notes. If you are programming in Java, you can use the Napkin Look and Feel (http://napkinlaf.sourceforge.net/). Tim and I created a rough-draft prototype of the screens for the use cases we worked on with Sam (Figure 2-1). We went over it with Sam. The cases are simple, so he had no changes to the interface. He did note that the buttons should use a large font so that he could read them without his glasses.

Figure 2-1.  Rental screens

PROTOTYPES ARE WORTH A THOUSAND WORDS

A picture of an interface, such as a screen, can be more powerful than just a description.

 


 

* For further information on use cases, see Writing Effective Use Cases by Alistair Cockburn (Addison-Wesley Professional, 2000).

* Finding existing solutions can be problematic. Sometimes it can be hard to describe the solution you seek in such a way that Google™ can find a match.

† See Software Tools by Brian W. Kernighan (Addison-Wesley Professional, 1976) for the earliest discussion I have found on the issue of using tools to create solutions.

* See http://www.literateprogramming.com/ for a discussion of names.

* Not all objects have behavior. Objects that contain just data (sometimes called data transfer objects) are useful in interfacing with GUI classes and passing as objects between networked systems. Data transfer objects are covered in Chapter 7.

† One reviewer notes that clumping without a responsibility definition for the class can lead to data-polluted classes. Clumping should also involve assigning operations to the clumped concept.

* Once upon a time, Montana had no speed limit other than “reasonable speed.” The upper limit still could be a reasonable number (e.g., 200).

† For an article on strong testing with languages that have implicit typing, see http://www.artima.com/weblogs/viewpost.jsp?thread=4639.

* Michael Green, a reviewer, called this principle “No magic numbers!” after having to deal with numbers that could not be changed or removed (without negative side effects in apparently unrelated code). He could find no one who knew what they were for or why they were there.
