An SCM system does exactly what the name implies: it helps you manage source code and related resources. To appreciate why SCM is important, consider for a moment what life without it might be like. Assume that you are a developer working on a new project that will consist of C++ source files, a Makefile, and perhaps some resources such as icons or images. Obviously, these items must live somewhere, so on day one of the project, you create a directory on your local desktop system, write a few hundred lines of code, and store the results in that directory. After a few weeks of hacking, you come up with version 0.1 of your creation, and after some light testing, you decide that it is ready to be pushed out onto the Web, in the hope of generating some feedback from a small community of users.
After a few weeks, your e-mail inbox has accumulated several feature requests from users, and perhaps a dozen or so bug reports (in addition to a ton of spam because you gave out your e-mail address to the public). You get busy implementing some of these features and fixing the worst of the bugs, and after a few more weeks, you are ready to post version 0.2. This process repeats itself for a few months, and before you know it, you are shipping version 1.0 to an even wider audience.
The 1.0 release is a success, but isn’t without its share of problems. First off, users are beginning to report a nasty bug in a feature that was first introduced in version 0.8, and was working flawlessly until version 1.0 was released to the public. You are able to duplicate the bug in a release version of the 1.0 binary, but can’t duplicate it in a debugger. In an attempt to understand the problem, you pore over the code in search of clues; but after numerous hours of looking, you realize that you have no idea what might have caused this bug to surface. About the only way you can think of to identify the cause of the problem is to determine which specific changes to the codebase might have led to the bug’s manifestation. However, all you have to work with is the 1.0 source code, and you have no way of identifying the changes you made between 0.9 and 1.0.
The second problem before you is a request. It turns out that, in version 1.0, you removed a feature that was present in earlier versions, and removing it has angered a lot of your users, who are clamoring for it to be reinstated in version 1.1. However, the code for this feature no longer exists, having been deleted from the source code long ago.
It is these two situations that, in my experience, make the use of a source code management system a necessity, cross-platform or not, no matter how large the project is or how many developers are involved. A good source code management system will allow you to re-create, in an instant, a snapshot of the source tree at some point in the past, either in terms of a specific date and time, or in terms of a specific release version. It will also help you to keep track of where and when changes to the source code have been made, so that you can go back and isolate specific changes related to a feature or a bug fix.
Had the developer used a source code management system, he or she could have retrieved versions of the source code starting at 0.9 and used them to isolate exactly what change(s) to the source caused the bug to first surface. And, to recover the feature that was removed in 1.0, the developer could have used the source code management system to retrieve the code associated with the feature, and undo its removal (or reengineer it back into the current version of the source code).
The benefits of a source code management system increase significantly as soon as multiple developers are assigned to a project. The main benefits from the point of view of a multideveloper project are accountability and consistency. To see how these benefits are realized, I need to describe in more detail how a source code management system works. Earlier, I described how a developer without a source code management system will typically manage a body of source code: in a directory, created somewhere on the developer’s system, where a single copy of the source is stored and edited. The use of a source code management system changes things dramatically.
When using a source code management system, the source code for a project is maintained in something called a repository, which you can think of as a database that stores the master copy of a project’s source code and related files. To work with the project source code, a developer retrieves a copy of the source code from the repository. Changes made to the local copy of the source code do not affect the repository. When the developer is done making changes to his or her copy of the source, he or she submits the changes to the source code management system, which updates the master copy maintained in the repository. It is important to realize that the repository does not keep a complete copy of every version of a file; instead, it records the changes made from one version to the next.
Because each change is recorded, you can easily retrieve earlier versions of files stored in the repository. The date of the change, the name or the ID of the developer who made the change, and any comments provided by the developer along with the change are all stored in the repository along with the change itself. The implications for developer accountability should be obvious; at any time, you can query the source code management system for a log of changes, when they were made, and by whom. This is a great help in locating the source of a bug and determining who may have introduced it.
The ability to attribute a change in the repository to a developer, bug, or feature is directly affected by the granularity of check-ins made by developers on the project. Frequent, small changes to the repository will increase the ability of a developer to use the source code management system to identify and isolate changes, and will also help ensure that other developers on the project gain access to the latest changes in a timely manner. A good rule of thumb is to limit the number of bugs fixed by a check-in to the repository to one (unless there are multiple, related bugs fixed by the same change).
So, now that you know the basic ideas behind using an SCM, let’s talk briefly about the implications for portability. First off, using an SCM is not a magic pill that makes your project portable. Portability requires attention to a lot more than just a source code management system. (If that were not the case, this book would not need to be written.) But using a source code management system that is available on each of the platforms that your organization is supporting (or plans to support) is, in my view, a critical part of any successful cross-platform project. After all, it does no one any good if only Windows developers are able to pull source code, while Linux and Macintosh developers are left without a solution. Not only should the SCM software be available everywhere, it should at least support a “lowest common denominator” user interface that behaves the same on all platforms, and to me, that means that the user interface needs to be command line based (both CVS and Subversion [SVN] support a command-line interface).
Because cross-platform availability and a common user interface are requirements, there are only two choices for an SCM system that I can see at the time of writing this book: CVS and SVN. At Netscape, and at countless other places (open source or not), CVS is the SCM of choice. It has stood the test of time, and is capable. It has been ported nearly everywhere, and its user interface is command line based. A very close cousin of CVS is SVN. After using SVN in a professional project, I have come to the conclusion that for the programmers using it, SVN is quite similar to CVS in terms of how one approaches it and the commands that it offers, so either would be a good choice. (It is not without its quirks, however.) In this book, when I refer to an SCM, I am referring to CVS, but I could have easily said the same thing about SVN.
Besides providing a location from which Tinderbox can pull sources (see Item 12) and its support for Windows, Mac OS X, and Linux, perhaps the most important contribution of CVS to cross-platform development is its ability to create diff (or patch) files. The implications of patch files for cross-platform development are detailed in Item 14; in the following paragraphs, I describe what a patch file is and how CVS can be used to create one.
A diff file, or a patch, is created by executing the cvs diff command. For example, assume I have added a method called GetAlignment() to a file named nsLabel.h in the Mozilla source tree. By typing cvs diff, I can easily identify the lines containing changes that I made:
$ cvs diff
The preceding output tells us that a line was added around line 61 of the file nsLabel.h, and one was removed around line 71 of the file. I can take this output, mail it to others on my team, and ask them to review it for errors or comments before checking in the changes. I can also look at this patch and make sure that it contains only those changes that I intended to land in the repository. I can’t stress enough how important cvs diff is as a tool for identifying inadvertent check-ins before they are made.
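To make the default format concrete without needing a CVS repository, here is a minimal sketch that uses the plain diff utility, on the assumption that the default output of cvs diff follows the same classic "change command" layout. The two header files and the Label class are hypothetical stand-ins, not the real nsLabel.h.

```shell
#!/bin/sh
# Hypothetical "before" and "after" versions of a small header file.
workdir=$(mktemp -d)

cat > "$workdir/old.h" <<'EOF'
class Label {
public:
  void SetText(const char *aText);
  void SetFont(int aFont);
  void Paint();
  void Obsolete();
};
EOF

cat > "$workdir/new.h" <<'EOF'
class Label {
public:
  void SetText(const char *aText);
  int GetAlignment();
  void SetFont(int aFont);
  void Paint();
};
EOF

# diff exits with status 1 when the files differ, so that is not an error here.
normal_patch=$(cd "$workdir" && diff old.h new.h || true)
printf '%s\n' "$normal_patch"

rm -rf "$workdir"
```

The output should contain a change command 3a4 (a line added after line 3 of the old file, followed by the new line prefixed with >) and a change command 6d6 (line 6 of the old file deleted, followed by that line prefixed with <). Nothing else about the surrounding code is shown, which is why this format becomes hard to read as patches grow.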
With a lot of changes, the default output format that is shown here can be difficult to understand. A better output would show context lines, and make it more obvious which lines were added to the source, and which lines were deleted. The -u argument to cvs diff causes it to generate a “unified” diff, as follows:
$ cvs diff -u
NS_IMETHOD PreCreateWidget(nsWidgetInitData *aInitData);
@@ -68,7 +69,6 @@
The differences in the output are the inclusion of context lines before and after the affected lines, and the use of + and - to indicate lines that have been added or removed, respectively, from the source. This format is generally much easier on everyone who must read the patch, and it is the format that I recommend you use. You can change the number of lines of context generated by cvs diff by appending a count after the -u argument. For example, to generate only one line of context, issue the following command:
$ cvs diff -u1
@@ -70,3 +71,2 @@
Generally, you’ll want to generate somewhere between three and five lines of context for patches of moderate complexity. I use -u3 almost religiously, and it is the default number of lines for svn diff (which produces unified diffs by default, too). However, don’t be surprised if developers working with your patch files ask for more lines of context.
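The effect of the context count can be tried outside a repository with the plain diff utility, whose -U option takes the number of context lines. The sketch below is illustrative only: the file names and contents are made up, and it assumes a POSIX diff rather than cvs diff itself.

```shell
#!/bin/sh
# Build a hypothetical 20-line file, then change only line 10 in a copy.
workdir=$(mktemp -d)

i=1
while [ "$i" -le 20 ]; do
  echo "line $i"
  i=$((i + 1))
done > "$workdir/old.txt"

sed 's/^line 10$/line ten/' "$workdir/old.txt" > "$workdir/new.txt"

# Same change, two context sizes. diff exits 1 when files differ, hence "|| true".
one_line=$(cd "$workdir" && diff -U 1 old.txt new.txt || true)
three_lines=$(cd "$workdir" && diff -U 3 old.txt new.txt || true)

echo "--- one line of context ---"
printf '%s\n' "$one_line"
echo "--- three lines of context ---"
printf '%s\n' "$three_lines"

rm -rf "$workdir"
```

With one line of context the hunk header should read @@ -9,3 +9,3 @@, covering lines 9 through 11; with three lines it should read @@ -7,7 +7,7 @@, covering lines 7 through 13. In both cases the change itself appears as a -line 10 followed by a +line ten. More context makes a patch longer, but it gives reviewers more surrounding code to read and gives patch tools more to match against if the target file has drifted.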