Microsoft SQL Server and many mainstream database products on the market today use the Write-Ahead Logging (WAL) protocol.
Write-Ahead Logging (WAL) Protocol
The term protocol is an excellent way to describe WAL. It is a specific and defined set of implementation steps necessary to ensure that data is stored and exchanged properly and can be recovered to a known state in the event of a failure. Just as a network contains a defined protocol to exchange data in a consistent and protected manner, so too does the WAL describe the protocol to protect data.
All versions of SQL Server open the log and data files using the Win32
CreateFile function. The
dwFlagsAndAttributes member includes the
FILE_FLAG_WRITE_THROUGH option when opened by SQL Server.
FILE_FLAG_WRITE_THROUGH
This option instructs the system to write through any intermediate cache and go directly to disk. The system can still cache write operations, but cannot lazily flush them.
The FILE_FLAG_WRITE_THROUGH option ensures that when a write operation returns successful completion the data is correctly stored in stable storage. This aligns with the Write Ahead Logging (WAL) protocol specification to ensure the data.
Many disk drives (SATA, ATA, SCSI and IDE based) contain onboard caches of 512 KB, 1 MB, and larger. Drive caches usually rely on a capacitor and not a
battery-backed solution. These caching mechanisms cannot guarantee writes across a power cycle or similar failure point. They only guarantee the completion of the sector write operations. As the drives continue to grow in size, the caches become larger, and they can expose larger amounts of data during a failure.
Enhanced caching controller systems disable on-disk cache and provide a functional battery-backed caching solution. These caches can maintain the data in the cache for several days and even allow the caching card to be placed in a second computer. When power is properly restored, the unwritten data is completely flushed before any further data access is allowed. Many of them allow percentage of read versus write cache to be established for optimal performance. Some contain large memory storage areas. In fact, for a very specific segment of the market, some hardware vendors provide high-end battery-backed disk caching systems with multiple gigabytes of cache. These can significantly improve database performance.
I/O transfers that are performed without the use of a cache can be significantly longer because of hard drive spin rates, the mechanical time that is needed to move the drive heads, and other limiting factors. SQL Server installations are targeted at systems that provide caching controllers. These controllers disable the on-disk caches and provide stable media caches to satisfy SQL Server I/O requirements. They avoid performance issues related to disk seek and write times by using the various optimizations of the caching controller.
There are many types of subsystem implementations. RAID and SAN are two examples of these types of subsystem implementations. These systems are typically built with SCSI-based drives. There are several reasons for this. The following section generically describes high level drive considerations.
SCSI drives:
- Are typically manufactured for heavy duty use.
- Are typically targeted at multiuser, server-based implementations.
- Typically have better meantime to failure rates than other implementations.
- Contain sophisticated heuristics to help predict imminent failures.
Other drive implementations, such as IDE, ATA, and SATA:
- Are typically manufactured for light and medium duty use.
- Are typically targeted at single user-based applications.
- Some newer implementations contain sophisticated heuristics to help predict imminent failures.
Non-SCSI, desktop-based controllers require more main processor (CPU) bandwidth, and are frequently limited by a single active command. For example, when a non-SCSI drive is adjusting a bad block, the drive requires that the host commands wait. The ATA bus presents another example. The ATA bus supports 2 devices, but only a single command can be active. This leaves one drive idle while the other drive services the pending command. Raid systems built on desktop technologies can all experience these symptoms and be greatly affected by the slowest responder. Unless these systems use advanced designs, their performance will not be as efficient as the performance of SCSI-based systems.
There are situations in which a desktop-based drive or array is an appropriate low cost solution. For example, if you set up a read-only database for reporting, you should not encounter many of the performance factors of an OLTP database when disk caching is disabled.
Drive sizes continue to increase. Low cost, high capacity drives can be very appealing. But when you configure the drive for SQL Server and your business response time needs, you should carefully consider the following issues:
- Access path design
- The requirement to disable the on-disk cache
The following table provides high level comments. The comment information is based on common manufacture configurations.
Drive type System base | Comments |
IDE and ATA | - Spin at 7,200 RPM.
- Targeted at desktops and single user applications.
- By default, the disk cache is enabled. Use the 'Disk Properties', Hardware tab to access the 'Properties', 'Policy' tab to control the disk cache setting.
Note Some drives do not honor this setting. These drives require a specific manufacturer utility to disable cache. - Disk caching should be disabled in order to use the drive with SQL Server.
- IDE and ATA-based systems can postpone host commands when they perform activities such as bad block adjustment. This could lead to periods of stalled I/O activity.
|
IDE/Portable computer | - Spin at 5,200 RPM.
- Targeted at portable computers.
- See other comments in IDE and ATA section.
- ReadyDrive and NAND Flash storage may provide stable I/O capabilities.
|
SATA | - Spin at 7,200 RPM.
- Targeted at desktops and medium range servers.
- By default, the disk cache is enabled. Use the 'Disk Properties', Hardware tab to access the 'Properties', 'Policy' tab to control the disk cache setting.
Note Some drives do not honor this setting. These drives require a specific manufacturer utility to disable cache. - Larger storage capacity and larger caches than IDE and ATA counter parts.
- Disk caching should be disabled in order to use the drive with SQL Server.
- Smaller cabling and interconnects.
- Easy interconnect with Serial Attached SCSI (SAS) drives and installations.
- Some include failure prediction constructs.
|
SCSI | - Spin at 10,000 and 15,000 RPM.
- Designed for servers and multiuser applications.
- Designed for heavier duty cycles and decreased failure rates.
- By default, the disk cache is enabled. Use the 'Disk Properties', Hardware tab to access the 'Properties', 'Policy' tab to control the disk cache setting.
Note Some drives do not honor this setting. These drives require a specific manufacturer utility to disable cache. - Disk caching should be disabled in order to use the drive with SQL Server.
- Array and SAN systems typically use smaller cabling and interconnects.
- Include failure prediction constructs.
|
Serial Attached SCSI (SAS) includes advanced queuing up to 256 levels. SAS also includes features such as head of queue and out of order queuing. The SAS backplane is designed in a way that enables the use of both SAS and SATA drives within the same system. The benefits of the SAS design are obvious.
Your SQL Server installation depends on the controller's ability to disable the on-disk cache and to provide a stable I/O cache. Writing data out of order to various disks is not a hindrance to SQL Server as long as the controller provides the correct stable media caching capabilities. The complexity of the controller design increases with advanced data security techniques such as mirroring.