In this article, Vikram gives us a sneak-peek under the hood of MySQL to see what makes it tick, all the while explaining the various MySQL subsystems and how they interact with each other. This excerpt comes from Chapter two of MySQL: The Complete Reference (McGraw-Hill/Osborne, ISBN 0-07-222477-0, 2004).
To take advantage of multiprocessor architecture, MySQL is built using a multi-threaded design, which allows threads to be allocated between processors to achieve a higher degree of parallelism. This is important to know not only for the database administrator, who needs to understand how MySQL takes best advantage of processing power, but also for developers, who can extend MySQL with custom functions. All custom functions must be thread safe—that is, that they must not interfere with the workings of other threads in the same process as MySQL.
Threading the Needle
To better understand threads in general, we must look at the relationship between threads and processes. A process contains both an executing program and a collection of resources, such as the file map and address space. All threads associated with a given task share the task’s resources. Thus, a thread is essentially a program counter, a stack, and a set of registers; all the other data structures belong to the task. A process effectively starts out as a task with a single thread.
A thread is a miniprocess that has its own stack and that executes a given piece of code. A thread normally shares its memory with other threads, unlike true processes, which will usually have a different memory area for each one. A thread group is a set of threads all executing inside the same process. They all share the same memory and thus can access the same global variables, the same heap memory, the same set of file descriptors, and so on. All the threads in a thread group execute either by using time slices when running on a single processor or in parallel if multiple processors are available.
The advantage of using a thread group over using a process group is that context switching between threads is much faster than context switching between processes; in other words, the system switches from running one thread to running another thread much faster than it can switch from one process to another. Also, communication between two threads is usually faster and easier to implement than communication between two processes, since the threads already share common address space in which to share variables.
The POSIX thread libraries are a standards-based thread API for C and C++. They are most effectively used on multiprocessor systems, where the process flow can be scheduled to run on another processor, thus increasing speed through parallel or distributed processing. Threads require less overhead than forking, or spawning a new process, because the system will not initialize a new virtual memory space and environment for the process. While POSIX is most beneficial on a multiprocessor system, gains are also found on single processor systems, which exploit latency in input/output (I/O) and other system functions that can halt process execution.
MySQL makes use of various thread packages, depending on the platform. POSIX threads are used on most UNIX variants, such as FreeBSD and Solaris. LinuxThreads are used for Linux distributions, while, for efficiency reasons, Windows threads are used on the Windows platform (but the code that handles them is designed to simulate POSIX threads).
Because MySQL is a threaded application, it is able to let the operating system take over the task of coordinating the allocation of threads to balance the workload across multiple processors. MySQL uses these threads to do the following:
A global connection thread handles all connection requests and creates a new dedicated thread to handle authentication and SQL query processing for each connection. This type of thread works on both TCP/IP and namedpipe connections.
Internal semaphores and alarm listening functions are handled by a separate thread.
When requested, a dedicated thread is created to flush all tables at the specified interval.
Every single table on which a delayed INSERT is being used gets its own thread.
In replication, master-host synchronization is handled by separate threads.
Of course, another way to take advantage of multiprocessing is to run multiple instances of MySQL on the same machine, thereby spawning a separate process for each instance. This approach is especially practical for hosting companies and even for internal hosting within corporate environments. By running multiple instances of MySQL on the same computer, you can easily accommodate multiple user bases that need different configuration options.
Remember: this is chapter two of MySQL: The Complete Reference, by Vikram Vaswani (McGraw-Hill/Osborne, ISBN 0-07-222477-0, 2004). Vikram is the founder of Melonfire, and has had numerous articles featured on Dev Shed. Buy this book now.