The Future of SQL

In this article, Vikram Vaswani discusses the ways in which SQL plays an important role in the computer market today, and what may be in store for this database language in the future. This excerpt comes from chapter 26 of MySQL: The Complete Reference, by Vikram Vaswani (McGraw-Hill/Osborne, ISBN 0-07-222477-0, 2004).

SQL and SQL-based relational databases are among the most important foundation technologies in the computer market today. From its first commercial implementation about two decades ago, SQL has grown to become the standard database language. In its first decade, the backing of IBM, the blessing of standards bodies, and the enthusiastic support of DBMS vendors made SQL a dominant standard for enterprise-class data management. In its second decade, the dominance of SQL extended to personal computer and workgroup environments and to new, database-driven market segments, such as data warehousing. In the early part of its third decade, SQL stands as the standard database technology for Internet-based computing. The market evidence clearly shows the importance of SQL:

  • The world’s second-largest software company, Oracle, has been built on the success of SQL-based relational data management, through both its flagship database servers and tools and its SQL-based enterprise applications.
  • IBM, the world’s largest computer company, offers its SQL-based DB2 product line as a common foundation across all of its systems, and for use on competitors’ systems as well, and has expanded its commitment to SQL with the acquisition of Informix’s SQL DBMS.

  • Microsoft, the world’s largest software company, uses SQL Server as a critical part of its strategy to penetrate the enterprise computing market with server editions of its Windows operating systems, and a key part of its .NET architecture for delivering Internet web services.

  • Every significant database company offers either a SQL-based relational database product or SQL-based access to its nonrelational products.

  • All of the major packaged enterprise applications, including Enterprise Resource Planning (ERP), Supply Chain Management (SCM), Human Resource Management (HRM), Sales Force Automation (SFA), and Customer Relationship Management (CRM), are built on SQL-based databases.

  • SQL is emerging as a standard for specialized databases in applications ranging from data warehousing to mobile laptop databases to embedded applications in telecomm and data communications networks.

  • SQL-based access to databases is an integral feature of Windows, available on the vast majority of personal computer systems, and it is a built-in capability of popular PC software products such as spreadsheets and report writers.

  • SQL-based access to databases is a standard part of Internet application servers, required by the J2EE specification.

This chapter describes some of the most important current trends and developments in the database market, and projects the major forces acting on SQL and database management over the next several years.

Database Market Trends

Today’s market for database management products exceeds $12 billion per year in products and services revenues, up from about $5 billion per year a decade ago. On several occasions over the last decade, lower year-over-year growth in the quarterly revenues of the major database vendors has led analysts to talk about a maturing database market. Each time, a wave of new products or new data management applications has returned the market to double-digit growth. Client/server architecture, ERP applications, data warehousing and business intelligence, three-tier web site architectures—each of these spurred a new wave of database technology and a new wave of SQL-based database deployments. If the history of the last two decades is any indication, database technology will continue to find new applications and generate increasing revenues for years to come. The trends shaping the market bode well for its continued health and point to a continuing tension between market maturity and consolidation on the one hand, and exciting new database capabilities and applications on the other.

Enterprise Database Market Maturity

Relational database technology has become accepted as a core enterprise data processing technology, and relational databases have been deployed by virtually all large corporations. Because of the importance of corporate databases and years of experience in using relational technology, many, if not most, large corporations have selected a single DBMS brand as an enterprisewide database standard. Once such a standard has been established and widely deployed within a company, there is strong resistance to switching brands. Even though an alternative DBMS product may offer advantages for a particular application or may pioneer a new, useful feature, an announcement by the standard vendor that such features are planned for a future release is often enough to keep the customer from switching.

The trend to corporate database standards has tended to reinforce and strengthen the market positions of the established major DBMS vendors. The existence of large direct sales forces, established customer support relationships, and multiyear volume purchase agreements has become as important as, or more important than, technology advantage. With this market dynamic, the large established players tend to concentrate on growing their business within their existing installed base instead of attempting to take customers away from competitors. In the late 1990s, industry analysts saw and predicted this tendency at both Informix and Sybase. Oracle, with a much larger share of the market, was forced to aggressively compete for new accounts in its attempt to maintain its database license revenue growth. Microsoft, as the upstart in the enterprise database market, was cast in the role of challenger, attempting to leverage its position in workgroup databases into enterprise-level prototypes and pilot projects as a way to pry enterprise business away from the established players.

One important impact of the trend to corporate DBMS vendor standardization has been a consolidation in the database industry. New startup database vendors tend to pioneer new database technology and grow by selling it to early adopters. These early adopters have helped to shape the technology and identified the solution areas where it can deliver real benefits. After a few years, when the advantages of the new technology have been demonstrated, the startup vendors are acquired by large established players. These acquirers can bring the new technology into their installed base, and bring their marketing and sales muscle to bear in an attempt to win business in their competitors’ accounts. The early 1990s saw this cycle play out with database vendor acquisitions of database tools vendors. In the late 1990s, the same cycle applied to mergers and acquisitions of database vendors. Informix’s purchases of Illustra (a pioneering object-relational vendor), Red Brick (a pioneering data warehousing vendor), and Cloudscape (a pioneering pure Java database vendor) are three examples of the pattern. Just a few years later, Informix itself was acquired by IBM, continuing this particular chain of consolidation.

Market Diversity and Segmentation

Despite the maturing of some parts of the database market (especially the market for corporate enterprise-class database systems), the market continues to develop new segments and niches that appear and then grow rapidly. For much of the 1990s, the most useful way to segment the database market was by database size and scale: there were PC databases, minicomputer databases, mainframe databases, and later, workgroup databases. Today’s database market is much more diverse and is more accurately segmented based on target application and on specialized database capabilities that address unique application requirements. Market segments that have appeared and have experienced high growth include:

  • Data warehousing databases, focused on managing thousands of gigabytes of data, such as historical retail purchase data.

  • Online analytic processing (OLAP) and relational online analytic processing (ROLAP) databases, focused on carrying out complex analyses of data to discover underlying trends (data mining), allowing organizations to make better business decisions.

  • Mobile databases, in support of mobile workers such as salespeople, support personnel, field service people, consultants, and mobile professionals. Often, these mobile databases are tied back to a centralized database for synchronization.

  • Embedded databases, which are an integral, transparent part of an application sold by an independent software vendor (ISV) or a value-added reseller (VAR). These databases are characterized by small footprints and very simple administration.

  • Microdatabases, designed for appliance-type devices, such as smart cards, network computers, smart phones, and handheld PCs and organizers.

  • In-memory databases, designed for ultra-high-performance OLTP applications, such as those embedded in telecomm and data communications networks and used to support customer interaction in very high-volume Internet applications.

  • Clustered databases, designed to take advantage of powerful, low-cost servers used in parallel to perform database management tasks with high scalability and reliability.

Packaged Enterprise Applications

A decade or two ago, the vast majority of corporate applications were developed in-house by the company’s information systems department. Decisions about database technology and vendor standardization were part of the company’s IS architecture planning function. Leading-edge companies sometimes took a risk on new, relatively unproven database technologies in the belief that they could gain competitive advantage by using them. Sybase’s rise to prominence in the financial services sector during the late 1980s and early 1990s is an example.

Today, most corporations have shifted from “make” to “buy” strategies for major enterprisewide applications. Examples include ERP applications, SCM applications, HRM applications, SFA applications, CRM applications, and others. All of these areas are now supplied as enterprise-class packaged applications, along with consulting, customization, and installation services, by groups of software vendors. Several of these vendors have reached multihundred-million-dollar annual revenues. All of these packages are built on a foundation of SQL-based relational databases.

The emergence of dominant purchased enterprise applications has had a significant effect on the dynamics of the database market. The major enterprise software package vendors have tended to support DBMS products from only two or three of the major DBMS vendors. For example, if a customer chooses to deploy SAP as its enterprisewide ERP application, the choice of underlying database is restricted to those supported by the SAP packages. This has tended to reinforce the dominant position of the current top-tier enterprise database players and make it more difficult for newer database vendors. It has also tended to lower average database prices, as the DBMS is viewed more as a component part of an application-driven decision rather than a strategic decision in its own right.

The emergence of packaged enterprise software has also shifted the relative power of corporate IS organizations and the packaged software vendors. The DBMS vendors today have marketing and business development teams focused on the major enterprise application vendors to ensure that the latest versions of the applications support their DBMS and to support performance tuning and other activities. The largest independent DBMS vendor, Oracle Corporation, is playing both roles, supplying both DBMS software and major enterprise applications (that run on the Oracle DBMS, of course). Oracle’s single-vendor approach has created considerable tension between Oracle and the largest enterprise applications vendors, especially in the ranks of their field sales organizations. Some industry analysts attribute the growing DBMS market share of IBM and Microsoft to a tendency for enterprise application vendors to steer prospective customers away from Oracle’s DBMS products as a result.

Hardware Performance Gains

One of the most important contributors to the rise of SQL has been a dramatic increase in the performance of relational databases. Part of this performance increase was due to advances in database technology and query optimization. However, most of the DBMS performance improvement came from gains in the raw processing power of the underlying computer systems, and from changes in the DBMS software designed to capitalize on those gains. While the performance of mainframe systems steadily increased, the most dramatic performance gains have been in the UNIX-based and Windows-based server markets, where processing power has doubled or more from year to year.

Some of the most dramatic advances in server performance have come from the growth of symmetric multiprocessing (SMP) systems, where two, four, eight, or even dozens of processors operate in parallel, sharing the processing workload. A multiprocessor architecture can be applied to OLTP applications, where the workload consists of many small, parallel database transactions. Traditional OLTP vendors, such as Tandem, have always used a multiprocessor architecture, and the largest mainframe systems have used multiprocessor designs for more than a decade. In the 1990s, multiprocessor systems became a mainstream part of the UNIX-based server market, and somewhat later, an important factor at the high end of the PC server market.

With Intel’s introduction of multiprocessor chipsets, SMP systems featuring two-way and four-way multiprocessing achieved near-commodity status in the LAN server market, and were available for well under $10,000. In the midrange of the UNIX-based server market, database servers from Sun, Hewlett-Packard, and IBM routinely have 8 or 16 processors and sell in the hundred-thousand-dollar price range. High-end UNIX servers today can be configured with more than 100 processors and tens of gigabytes of main memory. These systems, which rival the computing power of traditional mainframes, carry multimillion-dollar price tags.

SMP systems also provided performance benefits in decision support and data analysis applications. As SMP servers became more common, the DBMS vendors invested in parallel versions of their systems that were able to take the work of a single complex SQL query and split it into multiple, parallel paths of execution. When a DBMS with parallel query capabilities is installed on a four-way or eight-way SMP system, a query that might have taken two hours on a single-processor system can be completed in less than an hour. Companies are taking advantage of this hardware-based performance boost in two ways: either by obtaining business analysis results in a fraction of the time previously required or by leaving the timeframe constant and carrying out much more complex and sophisticated analysis.
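
To make this concrete, here is a minimal sketch of the kind of decision-support query that benefits from parallel execution; the table and column names (sales, region, sale_date, amount) are hypothetical, and the parallelism itself is provided transparently by a DBMS with parallel query support rather than expressed in the SQL.

    -- Hypothetical decision-support query over a large fact table. On a
    -- parallel-capable DBMS running on an SMP server, the scan, grouping,
    -- and aggregation work can be split across processors without any
    -- change to the SQL itself.
    SELECT region,
           YEAR(sale_date) AS sale_year,
           SUM(amount)     AS total_sales,
           COUNT(*)        AS num_transactions
      FROM sales
     WHERE sale_date BETWEEN '1999-01-01' AND '1999-12-31'
     GROUP BY region, YEAR(sale_date)
     ORDER BY total_sales DESC;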

Operating system support for new hardware features (such as multiprocessor architectures) has often lagged the availability of the hardware capabilities—often by several quarters or even years. This has posed a special dilemma for DBMS vendors, who need to decide whether to bypass the operating system in an attempt to improve database performance. The Sybase DBMS, for example, when originally introduced, operated as a single process and took responsibility for its own task management, event handling, and input/output—functions that are usually handled by an operating system such as UNIX or VMS. In the short term, this gave Sybase a major performance advantage over rival DBMS products with less parallel processing capability.  

But when operating system SMP support arrived, many of its benefits were automatically available to rival systems that had relied on the operating system for task management, while Sybase had the continuing burden of extending and enhancing its low-level performance-oriented software. This cycle has played out for SMP designs, with major database vendors now relying on operating systems for thread support and SMP scaling. But the same trade-offs continue to apply to new hardware features as they appear and require explicit strategic decisions on the part of the DBMS vendors.

Today, the quest for higher and higher database performance certainly shows no signs of stopping. With today’s highest-performance servers featuring hundreds of multigigahertz processors, hardware advances have more than overcome the higher overhead of the relational data model, giving it performance equal to, or better than, the best nonrelational databases of the past. At the same time, of course, the demand for higher and higher transaction rates against larger and larger databases continues to grow. At the top end of the database market, it appears that one can never have too much database performance.

Database Server Appliances

Another hardware-based market trend in the 1980s and early 1990s was the emergence of companies that combined high-performance microprocessors, fast disk drives, and multiprocessor architectures to build dedicated systems that were optimized as database servers. These vendors argued that they could deliver much better database performance with a specially designed database engine than with a general-purpose computer system. In some cases, their systems included application-specific integrated circuits (ASICs) that implement some of the DBMS logic in hardware for maximum speed. Dedicated database systems from companies such as Teradata and Sharebase (formerly Britton-Lee) found some acceptance in applications that involve complex queries against very large databases. However, they have not become an important part of the mainstream database market, and these vendors eventually disappeared or were acquired by larger, general-purpose computer companies.

Interestingly, the notion of a packaged, all-in-one database server appliance was briefly rekindled at the end of the 1990s by Oracle Corporation and its CEO, Larry Ellison. Ellison argued that the Internet era had seen the success of other all-in-one products, such as networking equipment and web cache servers. Oracle announced partnerships with several server hardware vendors to build Oracle-based database appliances. Over time, however, these efforts had little market impact, and Oracle’s enthusiasm for database appliances faded, along with the media attention.

Several venture-backed startups have recently embraced the idea of database server appliances once again, this time in the form of database caching servers that reside in a network between the application and an enterprise database. These startups point to the widespread success of web page caching within the Internet architecture, and posit a similar opportunity for data caching. Unlike web pages, however, database contents tend to have an inherent transactional character, which makes the synchronization of cache contents with the main database both much more important (to ensure that requests satisfied by the database cache come up with the right response) and much more difficult. Whether the notion of a database caching appliance will catch on or not remains an open question as of this writing.

Benchmark Wars

As SQL-based relational databases have moved into the mainstream of enterprise data processing, database performance has become a critical factor in DBMS selection. User focus on database performance, coupled with the DBMS vendors’ interest in selling high-priced, high-margin, high-end DBMS configurations, has produced a series of benchmark wars among DBMS vendors. Virtually all of the DBMS vendors have joined the fray at some point over the last decade. Some have focused on maximum absolute database performance. Others emphasize price/performance and the cost-effectiveness of their DBMS solution. Still others emphasize performance for specific types of database processing, such as OLTP or OLAP. In every case, the vendors tout benchmarks that show the superior performance of their products while trying to discredit the benchmarks of competitors.

The early benchmark claims focused on vendor-proprietary tests, and then on two early vendor-independent benchmarks that emerged. The Debit/Credit benchmark simulated simple accounting transactions. The TP1 benchmark, first defined by Tandem, measured basic OLTP performance. These simple standardized benchmarks were still easy for the vendors to manipulate to produce results that cast them in the most favorable light.

In an attempt to bring more stability and meaning to the benchmark data, several vendors and database consultants banded together to produce standardized database benchmarks that would allow meaningful comparisons among various DBMS products. This group, called the Transaction Processing Performance Council (TPC), defined a series of official OLTP benchmarks, known as TPC-A, TPC-B, and TPC-C. The Council has also assumed a role as a clearinghouse for validating and publishing the results of benchmarks run on various brands of DBMS and computer systems. The results of TPC benchmarks are usually expressed in transactions per minute (e.g., tpmC), but it’s common to hear the results referred to simply by the benchmark name (e.g., “DBMS Brand X on hardware Y delivered 10,000 TPC-Cs”).

The most recent TPC OLTP benchmark, TPC-C, attempts to measure not just raw database server performance, but the overall performance of a client/server configuration. Modern multiprocessor workgroup-level servers are delivering thousands or tens of thousands of transactions per minute on the TPC-C test. Enterprise-class UNIX-based SMP servers are delivering multiple tens of thousands of tpmC. The maximum published results, achieved on a multimillion-dollar 64-bit Alpha processor cluster, exceed 100,000 tpmC.
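
As a rough illustration only (not the official benchmark specification), a TPC-C-style order-entry transaction bundles a handful of reads and writes into a single unit of work; the table and column names below are simplified stand-ins for the benchmark’s schema.

    -- Simplified sketch of a TPC-C-style "new order" transaction. The real
    -- benchmark defines the schema and transaction mix precisely; this only
    -- illustrates the shape of the workload being measured.
    START TRANSACTION;

    SELECT c_discount, c_credit
      FROM customer
     WHERE c_id = 42;

    UPDATE district
       SET d_next_o_id = d_next_o_id + 1
     WHERE d_id = 7;

    INSERT INTO orders (o_id, o_c_id, o_d_id, o_entry_d)
    VALUES (10001, 42, 7, NOW());

    UPDATE stock
       SET s_quantity = s_quantity - 5
     WHERE s_i_id = 314 AND s_w_id = 1;

    COMMIT;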

The Council has branched out beyond OLTP to develop benchmarks for other areas of database performance. The TPC-D benchmark focuses on data warehousing applications. The suite of tests that makes up TPC-D is based on a database schema typical of warehousing environments, and it includes more complex data analysis queries, rather than the simple database operations more typical of OLTP environments. Interestingly, the TPC benchmarks specify that the size of the database must increase as the claimed number of transactions per minute goes up. A TPC benchmark result of 5000 tpmC may reflect results on a database of hundreds of megabytes of data, for example, while a result of 20,000 tpmC on the same benchmark may reflect a test on a multigigabyte database. This provision of the TPC benchmarks is designed to add more realism to the benchmark results since the size of database and computer system needed to support an application with demands in the 5000 tpm range is typically much smaller than the scale required to support an application with 20,000 tpm demands.

In addition to raw performance, the TPC benchmarks also measure database price/performance. The price used in the calculation is specified by the Council as the five-year ownership cost of the database solution, including the purchase price of the computer system, the purchase price of the database software, five years of maintenance and support costs, and so on. The price/performance measure is expressed in dollars per tpmC (e.g., “Oracle on a Dell four-way server broke through the $500-per-TPC-C barrier”). While higher numbers are better for transactions-per-minute results, lower numbers are better for price/performance measures.
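
To make the arithmetic concrete (with purely hypothetical figures), a configuration with a five-year ownership cost of $2,500,000 that delivers 5,000 tpmC works out to $2,500,000 / 5,000 = $500 per tpmC; a cheaper configuration delivering the same throughput would score better on this measure.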

Over the last several years, vendor emphasis on TPC benchmark results has waxed and waned. The existence of the TPC benchmarks, and the requirement that published TPC results be audited, have added a level of integrity and stability to benchmark claims. It appears that benchmarking and performance testing will be part of the database market environment for some time to come. In general, benchmark results can help with matching database and hardware configurations to the rough requirements of an application. On an absolute basis, small advantages in benchmark performance for one DBMS over another will probably be masked by other factors.

SQL Standardization

The adoption of an official ANSI/ISO SQL standard was one of the major factors that secured SQL’s place as the standard relational database language in the 1980s. Compliance with the ANSI/ISO standard has become a checkoff item for evaluating DBMS products, so each DBMS vendor claims that its product is compatible with or based on the ANSI/ISO standard. Through the late 1980s and early 1990s, all of the popular DBMS products evolved to conform to the parts of the standard that represented common usage. Other parts, such as the module language, were effectively ignored. This produced slow convergence around a core SQL language in popular DBMS products.

As discussed in Chapter 3, the SQL1 standard was relatively weak, with many omissions and areas that were left as implementation choices. For several years, the standards committee worked on an expanded SQL2 standard that remedied these weaknesses and significantly extended the SQL language. Unlike the first SQL standard, which specified features that were already available in most SQL products, the SQL2 standard, when it was published in 1992, was an attempt to lead rather than follow the market. It specified features and functions that were not yet widely implemented in current DBMS products, such as scroll cursors, standardized system catalogs, much broader use of subqueries, and a new error message scheme. DBMS vendors are still in the process of evolving their products to support the full features of SQL2. In practice, proprietary extensions (such as enhanced support for multimedia data or stored procedures or object extensions) have often been more important to a DBMS vendor’s success than higher levels of SQL2 compliance.
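
Scroll cursors are a representative example. The following is a hedged, embedded-SQL-style sketch of the SQL-92 syntax; the table, cursor, and host-variable names are hypothetical, and the exact level of support varies from one DBMS to another.

    -- SQL-92 scroll cursor (sketch; vendor support varies). Unlike a
    -- conventional cursor, a scroll cursor can move backward and jump
    -- to an absolute position in the query results.
    DECLARE emp_cursor SCROLL CURSOR FOR
        SELECT emp_id, emp_name, salary
          FROM employees
         ORDER BY salary DESC;

    OPEN emp_cursor;

    FETCH NEXT     FROM emp_cursor INTO :id, :name, :salary;
    FETCH PRIOR    FROM emp_cursor INTO :id, :name, :salary;
    FETCH ABSOLUTE 10 FROM emp_cursor INTO :id, :name, :salary;

    CLOSE emp_cursor;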

The progress of the SQL standards groups continued, with work on a SQL3 standard begun even before the SQL2 standard was published. As delays set in and the number of different areas to be addressed by the next standard grew, the work on SQL3 was divided into separate, parallel efforts, focused on the core of the language, a Call-Level Interface (CLI), persistent stored modules (stored procedures), distributed transaction capabilities, time-based data, and so forth. Some of these efforts were published a few years later as enhancements to the 1992 SQL2 standard. A SQL2-compatible CLI standard was released in 1995, as SQL-CLI. A year later, in 1996, a standardized stored procedure capability was released as SQL-PSM. In 1998, object language bindings for SQL were standardized in the SQL-OLB specification. A basic set of OLAP capabilities was published in a SQL-OLAP standard in 2000.
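
For illustration, here is a small stored procedure written in the SQL/PSM style standardized in 1996; the procedure, table, and column names are hypothetical. (MySQL’s own stored procedure language, introduced in version 5.0, follows this style closely.)

    -- Sketch of a SQL/PSM-style stored procedure. The business rule
    -- (do not let the credit limit go negative) lives inside the database.
    CREATE PROCEDURE adjust_credit (IN cust_id INTEGER, IN amount DECIMAL(10,2))
    BEGIN
        DECLARE current_limit DECIMAL(10,2);

        SELECT credit_limit INTO current_limit
          FROM customers
         WHERE id = cust_id;

        IF current_limit + amount >= 0 THEN
            UPDATE customers
               SET credit_limit = credit_limit + amount
             WHERE id = cust_id;
        END IF;
    END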

While progress continued on these additions to the SQL2 standard, the work on the core language part of SQL3 (called the foundation part of the standard) focused on how to add object capabilities to SQL2. This quickly became a very controversial activity. Relational database theorists and purists took a strong stand against many of the proposed extensions. They claimed that the proposals confused conceptual and architectural issues (e.g., adding substructure beyond the row/column tables) with implementation issues (e.g., performance issues of normalized databases and multitable joins). Proponents of the proposed SQL3 object extensions pointed to the popularity of object-oriented programming and development techniques, and insisted that the rigid row/column structure of relational databases must be extended to embrace object concepts or it would be bypassed by the object revolution. Their argument was bolstered in the marketplace as the major relational DBMS vendors added object-oriented extensions to their products, to blunt the offensive from pure object-oriented databases, and were largely successful with this strategy.

The controversy over the SQL3 work was finally resolved after a seven-year effort, with the publication of the SQL:1999 standard. (The term SQL3, which was used during the development of the standard, has now been replaced by the official term SQL:1999.) The SQL:1999 standard is structured in a series of parts:  

  • Part 1: Framework. Describes the overall goals and structure of the standard, and the organization of its other parts.

  • Part 2: Foundation. Is the main body of the standard, focused on the SQL language itself. SQL statements and clauses, transaction semantics, database structure, privileges, and similar capabilities are specified here. This part also contains the object-oriented extensions to SQL.

  • Part 3: Call-Level Interface. Contains the SQL-CLI (1995) extensions to the SQL-92 standard, updated to conform to SQL:1999.

  • Part 4: Persistent Stored Modules. Similarly contains the SQL-PSM (1996) extensions to the SQL-92 standard, updated to conform to SQL:1999.

  • Part 5: Host Language Bindings. Deals with the interactions between procedural host languages (such as C or COBOL) and SQL.

  • Part 9: Management of External Data. Describes how a SQL-based database should manage data external to the database itself.

  • Part 10: Object Language Bindings. Deals with the same issues as Part 5, but for object-oriented languages.

Some parts of the standard are still under development at this writing, as indicated by the missing part numbers. In addition, other SQL-related standardization efforts have broken off into their own, separate standards activities. A separate standard is under development for SQL-based handling of multimedia data, such as full-text documents, audio, and video content. This is itself a multipart standard; some parts have already been published. Another separate standard makes official the embedded SQL for Java work known as SQLJ.

In the progression from SQL1 to SQL2, and then to SQL:1999, the official ANSI/ISO SQL standards have ballooned in scope. The original SQL1 standard was less than 100 pages; the Framework section (Part 1) of the SQL:1999 standard alone is nearly that large. The Foundation section of the SQL:1999 standard runs well over 1000 pages, and the currently published parts, taken together, run over 2000 pages. The broadly expanded scope of the SQL:1999 standard reflects the wide usefulness and applicability of SQL, but the challenge of implementing and conforming to such a voluminous set of standards is very formidable, even for large DBMS vendors with large development staffs.

It’s worth noting that the SQL:1999 standard takes a very different approach to standards conformance claims than the SQL1 and SQL2 standards. The SQL2 standard defined three levels of conformance, Entry, Intermediate, and Full, and laid out the specific features of the standard that must be implemented to claim conformance at each level. In practice, DBMS vendors found some features at each level to be important to their customers, and others relatively unimportant. So virtually all current SQL implementations claim some form of compliance with SQL2, but very few, if any, implement all of the features required for formal Intermediate or Full conformance.

With this experience in mind, the SQL:1999 standards group instead defined only one Core SQL level of conformance, which corresponds roughly to the Entry level of SQL2 plus selected features from the Intermediate and Full levels. Beyond this Core SQL, additional features are grouped together in packages, to which conformance can individually be claimed. There is a package for the SQL-CLI capabilities, one for SQL-PSM, one for enhanced data integrity functions, one for enhanced date and time functions, and so on. This structure allows individual DBMS vendors to pick and choose the areas of the standard that are most important to the particular markets they serve, and makes conformance to parts of the standard more practical.

At this writing, the SQL:1999 standard is too new to fully gauge its impact on the DBMS market. If the experience with SQL2 is any guide, vendors will carefully evaluate individual new pieces of SQL:1999 functionality and seek feedback from their customer base about which ones are useful. With the very large new functionality required by SQL:1999 features such as user-defined types and recursive queries, implementation of some parts of SQL:1999 will be a multiyear project for even the largest DBMS vendors. In practice, the SQL1 (SQL-89) standard defines the core SQL capabilities supported by virtually all products; the SQL2 (SQL-92) standard represents the current state of the art in large enterprise database products, and the SQL:1999 standard is a roadmap for future development.
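
Two of the larger SQL:1999 additions mentioned above can be sketched briefly. The syntax below follows the standard rather than any particular product, the type, table, and column names are hypothetical, and actual vendor support varies widely.

    -- A SQL:1999 structured user-defined type and a table that uses it.
    CREATE TYPE address_t AS (
        street  VARCHAR(40),
        city    VARCHAR(30),
        zip     CHAR(10)
    ) NOT FINAL;

    CREATE TABLE customers (
        id       INTEGER PRIMARY KEY,
        name     VARCHAR(60),
        address  address_t
    );

    -- A SQL:1999 recursive query: walk an organization chart downward
    -- from employee 1, tracking how deep each employee sits.
    WITH RECURSIVE org_chart (emp_id, manager_id, depth) AS (
        SELECT emp_id, manager_id, 0
          FROM employees
         WHERE emp_id = 1
        UNION ALL
        SELECT e.emp_id, e.manager_id, o.depth + 1
          FROM employees e
          JOIN org_chart o ON e.manager_id = o.emp_id
    )
    SELECT emp_id, manager_id, depth
      FROM org_chart;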

In addition to the official SQL standard, IBM’s and Oracle’s SQL products will continue to be a powerful influence on the evolution of SQL. As the developer of SQL and a major influencer of corporate IS management, IBM’s SQL decisions have always had a major impact on other vendors of SQL products. Oracle’s dominant market position has given it similar clout when it has added new SQL features to its products. When the IBM, Oracle, and ANSI SQL dialects have differed in the past, most independent DBMS vendors have chosen to follow the IBM or Oracle standards.

The likely future path of SQL standardization thus appears to be a continuation of the history of the last several years. The core of the SQL language will continue to be highly standard. More features will slowly become a part of the core, or will be defined as add-on packages or new standards in their own right. Database vendors will continue to add new, proprietary features in an ongoing effort to differentiate their products and offer customers a reason to buy.

SQL in the Next Decade

Predicting the path of the database market and SQL over the next five to ten years is a risky proposition. The computer market is in the midst of a major transition into an Internet-driven era. The early stages of that era, dominated by the World Wide Web and user/browser interaction, are giving way to a ubiquitous Internet used to deliver all communication services, information services, and e-business interaction. The emergence of the PC and its creation of the client/server era of the 1980s and 1990s illustrates how shifts in the underlying computer systems market can produce major changes in data management architectures. It’s likely that the Internet will have at least as large, if not a larger, impact on the data management architectures of the next ten years. Nonetheless, several trends appear to be safe predictions for the future evolution of database management. They are discussed in the final sections of this chapter.  

Distributed Databases

As more and more applications are used on an enterprisewide basis or beyond, the ability of a single, centralized database to support dozens of major applications and thousands of concurrent users will continue to erode. Instead, major corporate databases will become more and more distributed, with dedicated databases supporting the major applications and functional areas of the corporation. To meet the higher service levels required of enterprisewide or Internet-based applications, data must be distributed; but to ensure the integrity of business decisions and operations, the operation of these distributed databases must be tightly coordinated.

Another strain on centralized database architectures will be the continuing growth of mobile personal computers and other mobile information appliance devices. These devices are, by their nature, more useful if they can become an integral part of a distributed network. However, they are also only occasionally connected: they work in a sometimes-disconnected, sometimes-connected mode, using either wired or wireless networks. The databases at the heart of mobile applications must be able to operate in this occasionally connected environment.

These trends will drive heavy demand for data distribution, database integration, data synchronization, data caching, data staging, and distributed database technology. A one-size-fits-all model of distributed data and transactions is inadequate for the highly distributed, anywhere/anytime environment that will emerge. Instead, some transactions will require absolute synchronization with a centralized master database, while others will demand support for long-duration transactions where synchronization may take hours or days. Developing ways to create and operate these distributed environments, without having them become a database administrator’s nightmare, will be a major challenge for DBMS vendors in the next decade, and a major source of revenues for the vendors that provide practical, relatively easy-to-use solutions.
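
As one concrete mechanism in this space, MySQL’s built-in replication can keep a dedicated copy of the data close to a departmental application or to a synchronization tier for mobile users. The sketch below uses placeholder host names, credentials, and log coordinates.

    -- Point a MySQL replica (slave) at a central master server and start
    -- copying its changes. All values shown are placeholders.
    CHANGE MASTER TO
        MASTER_HOST = 'central-db.example.com',
        MASTER_USER = 'repl_user',
        MASTER_PASSWORD = 'repl_password',
        MASTER_LOG_FILE = 'binlog.000042',
        MASTER_LOG_POS = 4;

    START SLAVE;

    -- Confirm that the replica is connected and applying changes.
    SHOW SLAVE STATUS;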

Massive Data Warehousing

The last few years have demonstrated that companies that use database technology aggressively and treat their data as a valuable corporate asset can gain tremendous competitive advantage. The competitive success of WalMart, for example, is widely attributed to its use of information technology (led by database technology) to track its inventory and sales on a daily basis, based on cash register transaction data. This allowed the company to minimize its inventory levels and closely manage its supplier relationships. Data mining techniques have allowed companies to discover unexpected trends and relationships based on their accumulated data—including the legendary discovery by one retailer that late-night sales of diapers were highly correlated with sales of beer.

It seems clear that companies will continue to accumulate as much information as they can on their customers, sales, inventories, prices, and other business factors. The Internet creates enormous new opportunities for this kind of information-gathering. Literally every customer or prospective customer’s interaction with a company’s web site, click-by-click, provides potential clues to the customer’s wants, needs, and behavior. That type of click-by-click information can easily generate tens of gigabytes of data or more per day on a busy web site. The databases to manage these massive quantities of data will need to support multilevel storage systems. They will need to rapidly import vast quantities of new data, and rapidly peel off large data subsets for analysis. Despite the high failure rate of data warehousing projects, the large potential payoffs in reduced operating costs and more on-target marketing and sales activities will continue to drive data warehousing growth.
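
A hedged sketch of the bulk-loading side of such a system follows, using MySQL’s LOAD DATA statement against a hypothetical clickstream table and then peeling off one subset for analysis; the file path, table, and column names are illustrative only.

    -- Hypothetical clickstream fact table for a web-site data warehouse.
    CREATE TABLE clickstream (
        visit_time  DATETIME,
        visitor_id  BIGINT,
        page_url    VARCHAR(255),
        referrer    VARCHAR(255)
    );

    -- Rapidly import a day's worth of web-server click data.
    LOAD DATA INFILE '/var/log/clicks.csv'
    INTO TABLE clickstream
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';

    -- Peel off a subset for analysis: the pages viewed most often by
    -- repeat visitors.
    SELECT page_url, COUNT(*) AS views
      FROM clickstream
     WHERE visitor_id IN (SELECT visitor_id
                            FROM clickstream
                           GROUP BY visitor_id
                          HAVING COUNT(*) > 10)
     GROUP BY page_url
     ORDER BY views DESC;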

Beyond the collection and warehousing of data, pressure will build to perform business analyses in real time. IS consulting groups are writing about the zero-latency enterprise or the real-time enterprise to describe an architecture in which customer interactions translate directly into changes in business plans with zero or very little delay. To meet this challenge, database systems will continue to take advantage of processor speed advances and multiprocessing technologies.

Ultra-High-Performance Databases

The emergence of an Internet-centric architecture is exposing enterprise data processing infrastructures to new peak-load demands that dwarf the workloads of just a few years ago. When databases primarily supported in-house applications used by a few dozen employees at a time, database performance issues may have produced employee frustration, but they did not really impact customers. The advent of call centers and other customer support applications produced a closer coupling between data management and customer satisfaction, but applications were still limited to at most hundreds of concurrent users (the people manning the phones in the call center).

With the Internet, the connection between a customer and the company’s databases becomes a direct one. Database performance problems translate directly into slow customer response times. Database unavailability translates directly into lost sales. Furthermore, databases and other parts of the data processing infrastructure are no longer buffered from peak-load transaction rates. If a financial services firm offers online trading or portfolio management, it will need to prepare for peak-load volumes on days of heavy stock price movement that may be 10 or 20 times the average daily volume. Similarly, an online retailer must gear up to support the heaviest end-of-year selling season, not just mid-March transaction rates.  

The demands of e-commerce and real-time Internet information access are already producing peak-load transaction rates from the most popular Internet services that are one or two orders of magnitude higher than the fastest conventional disk-based RDBMS systems. To cope with these demands, companies will increasingly turn to distributed and replicated databases. They will pull hot data forward and cache it closer to the customer interaction within the network. To meet peak-load demands, they will use in-memory databases. This will, in turn, require new database support for deciding which data to cache, and which levels of synchronization and replication are appropriate. At first, these issues will apply only to the largest and highest-volume sites, but just as web page caching has become an accepted and then an essential technique for maintaining adequate web browser performance, hot data caching will become a mainstream Internet data management architecture as volumes grow.
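
As a small, MySQL-flavored illustration of hot-data caching, a frequently read table can be copied into an in-memory table that sits closer to the customer interaction; the table and column names are hypothetical, and deciding how and when to refresh the copy is exactly the synchronization problem described above.

    -- Sketch: cache a hot, read-mostly product list in memory using
    -- MySQL's MEMORY storage engine.
    CREATE TABLE product_cache (
        product_id  INT PRIMARY KEY,
        name        VARCHAR(80),
        price       DECIMAL(10,2)
    ) ENGINE = MEMORY;

    -- Load (and periodically reload) the hot rows from the disk-based table.
    REPLACE INTO product_cache
    SELECT product_id, name, price
      FROM products
     WHERE active = 1;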

Internet and Network Services Integration

In the Internet era, database management will increasingly become just one more network service, and one that must be tightly integrated with other services, such as messaging, transaction services, and network management. In some of these areas, standards are well established, such as the XA standard for distributed transaction management. In others, standards are in their infancy or are just emerging, such as the SOAP standard for sending XML data over the Internet’s HTTP protocol and the UDDI standards for finding services in a distributed network environment.

The multitier architecture that is dominating Internet-centric applications also poses new questions about which roles should be played by the database manager and by other components of the overall information system. For example, when network transactions are viewed from the point of view of distributed databases, a two-phase commit protocol, implemented in a proprietary way by a DBMS vendor, may provide a solution. When network transactions involve a combination of legacy applications (e.g., mainframe CICS transactions), relational database updates, and interapplication messages, the transaction management problem moves outside the database, and external mechanisms are required.
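
To make the distinction concrete, the XA-style statement sequence below sketches how an external transaction manager drives one participating database through a two-phase commit. The syntax follows the XA statements supported by MySQL (from version 5.0) and other DBMSs; the transaction identifier and table are hypothetical.

    -- Two-phase commit as seen by one participating database.
    XA START 'order-7431';

    UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1;

    XA END 'order-7431';
    XA PREPARE 'order-7431';   -- phase one: promise to commit if asked

    -- ...the transaction manager collects "prepared" votes from every
    -- participant (other databases, message queues, legacy systems)...

    XA COMMIT 'order-7431';    -- phase two: make the change permanent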

A similar trade-off surrounds the emergence of Java-based application servers as a middle-tier platform for executing business logic. Before the Internet era, stored procedures became the accepted DBMS technique for embedding business logic within the database itself. More recently, Java has emerged as a viable stored procedure language, an alternative to earlier, vendor-proprietary languages. Now, application servers create an alternative platform for business logic written in Java, in this case external to the database. It’s not yet clear how these two trends will be rationalized, and whether business logic will continue its migration into the database or will settle in an application server layer. Whichever trend predominates, tighter integration between database servers and application servers will be required. Several of the DBMS vendors now produce their own application servers, and it seems likely that they will provide the best integration within their own product lines. Whether this approach will prevail against a best-of-breed approach remains another open question.

Embedded Databases

Relational database technology has reached into many parts of the computer industry, from small handheld devices to large mainframes. Databases underlie nearly all enterprise-class applications as the foundation for storing and managing their information. Lightweight database technology underlies an even broader range of applications. Directory services, a foundation technology for the new era of value-added data communications network services, are a specialized form of database technology. Lightweight, very-high-performance databases also form an integral part of telecommunications networks, enabling cellular networks, advanced billing schemes, smart messaging services, and similar capabilities.

These embedded database applications have traditionally been implemented using proprietary, custom-written data management code tightly integrated with the application. This application-specific approach produced the highest possible performance, but at the expense of an inflexible, hard-to-maintain data management solution. With declining memory prices and higher-performance processors, lightweight SQL-based relational databases are now able to economically support these applications.

The advantages of a standards-based embedded database are substantial. Without a serious compromise in performance, an application can be developed in a more modular fashion, changes in database structure can be handled transparently, and new services and applications can be rapidly deployed atop existing databases. With these advantages, embedded database applications appear destined to be a new area of growth potential for SQL and relational database technology. As in so many other areas of information technology, the ultimate triumph of SQL-based databases may be that they disappear into the fabric of other products and services—invisible as a stand-alone component, but vital to the product or service that contains them.

Object Integration

The most significant unknown in the future evolution of SQL is how it will integrate with object-oriented technologies. Modern application development tools and methodologies are all based on object-oriented techniques. Two object-oriented languages, C++ and Java, dominate serious software development, for both client-side and server-side software. The core row/column principles of the relational data model and SQL, however, are rooted in a much earlier COBOL era of records and fields, not objects and methods.

The object database vendors’ solution to the relational/object mismatch has been the wholesale discarding of the relational model in favor of pure object database structures. But the lack of standards, steep learning curve, lack of simple query facilities, and other disadvantages have prevented pure object databases from having any significant market success to date. The relational database vendors have responded to the object database challenge by embracing object-oriented features, but the result has been a proliferation of nonstandard, proprietary database features and SQL extensions.  

It’s clear that relational database technology and object technology must be more tightly integrated if relational databases are to remain an integral part of the next generation of applications. Several trends are visible today:

  • Java-based interfaces to RDBMSs, such as JDBC and embedded SQL for Java, will continue to grow rapidly in popularity.
  • Java will become a more important stored procedure language for implementing business logic within an RDBMS. Virtually all of the major DBMS vendors have announced plans to support Java as an alternative to their proprietary stored procedure languages.

  • DBMS products will expand support for abstract, complex data types that exhibit object-oriented capabilities such as encapsulation and inheritance. Beyond high-level agreement on the need to store objects within a row/column structure, the specifics (nested tables, arrays, complex columns) vary dramatically.

  • The SQL:1999 standard for object-oriented extensions to SQL will influence vendor products, but slowly, as vendors continue to seek competitive advantages and user lock-in through proprietary object-oriented extensions.

  • Message-oriented interfaces, including database triggers that produce messages external to the DBMS for integration with other applications, will grow in importance, as the database becomes a more active component for integrating systems together.

  • XML will emerge as an important standard format for representing both data retrieved from a SQL database, and data to be entered into or updated in a database.

  • DBMS vendors will offer SQL extensions to store and retrieve XML documents, and to search and retrieve their contents.

Whether these extensions to SQL and the relational model can successfully integrate the worlds of RDBMS and objects remains to be seen. The object-oriented database vendors continue to maintain that object capabilities bolted onto an RDBMS can’t provide the kind of transparent integration needed. Most of them have enthusiastically embraced XML as the newest wave of object technology. The enterprise DBMS vendors have announced and added substantial object-relational capabilities, and more recently, XML integration products and features, but it’s hard to determine how many of them are actually being used. In addition, the emergence of XML as an important Internet standard has given birth to a new round of database challengers, offering native XML databases. With all of these competing alternatives, the further integration of object technologies into the world of relational databases seems certain. The specific path that this evolution will take remains the largest unknown in the future of SQL.

Summary

SQL continues to play a major role in the computer industry, and appears poised to continue as an important core technology:

  • SQL-based databases are flagship software products for the three largest software vendors in the world: Microsoft, Oracle, and IBM.

  • SQL-based databases operate on all classes of computer systems, from mainframes and database servers to desktop computer clients, notebook computers, and handheld PDAs.

  • All of the major enterprise applications used in large organizations rely on enterprise-class SQL databases to store and structure their data.

  • SQL-based databases have responded successfully to the challenges of the object model, with SQL extensions in object/relational databases.

  • SQL-based databases are responding to the needs of Internet-based architectures by incorporating XML and integrating with application servers.
