Software Engineering

Thursday, July 1, 2010

The request failed with HTTP status 417: Expectation failed.

When using WebServices and calling some Web Methods, you may come across to this exception:

"The request failed with HTTP status 417: Expectation failed."

Solution:

It is easy, here are two alternative ways:

1) We need to have this line of code before making any web requests.

System.Net.ServicePointManager.Expect100Continue = false;

2) Add these lines to the application's configuration file (between <configuration> and </configuration>):

1:
2:
3:
4:
5:
<system.net>
 <settings>
  <servicePointManager expect100Continue="false" />
 </settings>
</system.net>

While this workaround is nice, I think a better long term solution is to upgrade/replace the proxy server to handle 100-continue calls.

So far in this short tip we've learned what is Expect 100, how it is caused and how to workaround this expectation.

ASP.NET AJAX - Update Panel Timeout

When using the asp:UpdatePanel server control for asyncronous communication with the web server the default timeout is 90 seconds. For some processes this may not be long enough.

For example, I am currently working on a reporting application and at the moment I have to query a table with over 36 million records. Some queries take longer than 90 seconds. But unfortunately I had trouble finding out how to extend this.

It turns out that this is a page-level setting. The asp:ScriptManager server control has a property namedAsyncPostBackTimeout. Set this value to the number of seconds you need to resonably run the process in question and you'll be fine.

<asp:scriptmanager id="scriptManager1" runat="server" enablepartialrendering="true" asyncpostbacktimeout="1000"></asp:ScriptManager>

Friday, April 9, 2010

SOAP vs REST Web Services

For the past two years, the hype surrounding the Simple Object Access Protocol (SOAP) has barely waned, although its opponents have gradually risen in number. While some critics are simply tired of hearing about Web services, a small handful of Internet architects have come up with a surprisingly good argument for pushing SOAP aside: there's a better method for building Web services in the form of Representational State Transfer (REST).

REST is more an old philosophy than a new technology. Whereas SOAP looks to jump-start the next phase of Internet development with a host of new specifications, the REST philosophy espouses that the existing principles and protocols of the Web are enough to create robust Web services. This means that developers who understand HTTP and XML can start building Web services right away, without needing any toolkits beyond what they normally use for Internet application development.

Interface Flexibility
The key to the REST methodology is to write Web services using an interface that is already well known and widely used: the URI. For example, exposing a stock quote service, in which a user enters a stock quote symbol to return a real-time price, could be as simple as making a script accessible on a Web server via the following URI: http://www.somebrokerage.com/quote?symbol=QQQ.

Any client or server application with HTTP support could easily call that service with an HTTP GET command. Depending on how the service provider wrote the script, the resulting HTTP response might be as simple as some standard headers and a text string containing the current price for the given ticker symbol. Or, it might be an XML document.

This interface method has significant benefits over SOAP-based services. Any developer can figure out how to create and modify a URI [a1] to access different Web resources. SOAP, on the other hand, requires specific knowledge of a new XML specification, and most developers will need a SOAP toolkit to form requests and parse the results.

Lighter on Bandwidth
Another benefit of the RESTful interface is that requests and responses can be short. SOAP requires an XML wrapper around every request and response. Once namespaces and typing are declared, a four- or five-digit stock quote in a SOAP response could require more than 10 times as many bytes as would the same response in REST.

SOAP proponents argue that strong typing is a necessary feature for distributed applications. In practice, though, both the requesting application and the service know the data types ahead of time; thus, transferring that information in the requests and responses is gratuitous.

How does one know the data types—and their locations in the response—ahead of time? Like SOAP, REST still needs a corresponding document that outlines input parameters and output data. The good part is that REST is flexible enough that developers could write WSDL files for their services if such a formal declaration was necessary. Otherwise, the declaration could be as simple as a human-readable Web page that says, "Give this service an input of some stock ticker symbol, in the format q=symbol, and it will return the current price of one share of stock as a text string."

Security Safeguards
Probably the most interesting aspect of the REST vs. SOAP debate is the security angle. Although the SOAP camp insists that sending remote procedure calls through standard HTTP ports is a good way to ensure Web services support across organizational boundaries, REST followers argue that the practice is a major design flaw that compromises network safety. REST calls also go over HTTP or HTTPS, but with REST the administrator (or firewall) can discern the intent of each message by analyzing the HTTP command used in the request. For example, a GET request can always be considered safe because it can't, by definition, modify any data. It can only query data.

A typical SOAP request, on the other hand, will use POST to communicate with a given service. And without looking into the SOAP envelope—a task that is both resource-consuming and not built into most firewalls—there's no way to know whether that request simply wants to query data or delete entire tables from the database.

As for authentication and authorization, SOAP places the burden in the hands of the application developer. The REST methodology instead takes into account the fact that Web servers already have support for these tasks. Through the use of industry-standard certificates and a common identity management system, such as an LDAP server, developers can make the network layer do all the heavy lifting.

This is not only helpful to developers, but it eases the burden on administrators, who can use something as simple as ACL[a2] files to manage their Web services the same way they would any other URI.

Not For Everything
To be fair, REST isn't the best solution for every Web service. Data that needs to be secure should not be sent as parameters in URIs[a3] . And large amounts of data, like that in detailed purchase orders, can quickly become cumbersome or even out of bounds within a URI. In these cases, SOAP is indeed a solid solution. But it's important to try REST first and resort to SOAP only when necessary. This helps keep application development simple and accessible.

Fortunately, the REST philosophy is catching on with developers of Web services. The latest version of the SOAP specification now allows certain types services to be exposed through URIs (although the response is still a SOAP message). Similarly, users of Microsoft .NET platform can publish services so that they use GET requests. All this signifies a shift in thinking about how best to interface Web services.

Developers need to understand that sending and receiving a SOAP message isn't always the best way for applications to communicate. Sometimes a simple REST interface and a plain text response does the trick—and saves time and resources in the process.

[a1]Universal Resouce Identifier

[a2]Active Control List

[a3]Very Important Notes

Thursday, March 25, 2010

C# MIDI

Introduction

This is the fifth version of my .NET MIDI toolkit. I had thought that the previous version was the final one, but I have made many changes that have warranted a new version. This version takes a more traditional C#/.NET approach to flow-based programming, which I'll describe below. I wasn't comfortable with version four's implementation along these lines, so I took a step back and made changes that keep the flow-based approach while remaining within C#/.NET accepted idioms. I'm hoping that this will make the toolkit easier to use and understand.
The toolkit has seen many revisions over the past two to three years. Each revision has been an almost total rewrite of the previous one. When writing software, it is usually a bad idea to make updates that break software using previous versions. However, my goal in creating this toolkit has been to provide the best design possible. As I have grown as a programmer, I have improved my skills and understanding of software design. This has led me to revise the earlier designs of the toolkit without regard to how these revisions will break code. Not exactly the attitude one wants to adopt in a professional setting, but since the toolkit is free and since I have used it as a learning experience to improve my craft, my priorities are different.

Flow-based programming

Before I get into the specifics of the toolkit, I would like to talk about its architecture. With each version of the toolkit, I have struggled with how to structure the flow of messages through the system. I wanted an approach that would be versatile and allow customization. It would be nice, I thought, if users could plug their own classes into the flow of MIDI messages to do whatever they want. For example, say you wanted to write a harmonizer, a class capable of transposing notes in a specified key to create harmony parts automatically. It should be easy to simply plug the harmonizer into the toolkit without affecting other classes. In other words, the toolkit should be customizable and configurable.
Investigating this problem led me to J. Paul Morrison's excellent website on flow-based programming. He has written a book on the subject, which can be found on his website as well as at Amazon.
The idea is simple and will probably seem familiar to most: Data flows through a network of components. Each component can do something interesting with the data before passing it along to the next component. In design pattern terms, this approach is most like the Pipe and Filter pattern and is also similar to the Chain of Responsibility pattern. Please check out J. Paul Morrison's excellent book for more information.
(Just to be clear: when I say "component," I'm not necessarily talking about classes that implement the IComponent interface. I'm speaking in more general terms. A component is simply an object in a chain of objects designed to process the flow of messages.)
Below is a very basic network of components designed to handle the flow of MIDI channel messages:
Screenshot - MessageFlow.PNG

The flow of messages begins with the input device. An input device receives MIDI messages from an external source. Next, the messages flow through a user component. This component might want to do something like change the MIDI channel, transpose note messages, or change the messages in some way. Then the messages pass through the channel stopper. This component simply keeps track of all currently sounding notes. When the message flow stops, the channel stopper can turn off all sounding notes so that none of them hang. Finally, the messages reach the output device. Here they are sent to an external MIDI device.

Implementing flow-based programming in C#

Well, this is something I really struggled with. You can read a bit about the different ways I tried to achieve this by reading my blog. I found myself going round in circles on this. In version four of the toolkit, I settled on the idea of using a source/sink abstraction. I created an interface representing "Sources." A source represents a source of MIDI messages. "Sinks" were represented by delegates that could be connected to sources; a sink is simply a method capable of receiving a MIDI message. This worked well but it was a little confusing because the implementation looked a little "funny." That is to say, a C# programmer looking at the code for the first time might be confused as to what is going on.
I decided to do away with the sink/source infrastructure and use something more idiomatic. Sources of MIDI messages raise events when they have messages to send. Instead of implementing an interface and having Connect and Disconnect methods for hooking to sinks, they would simply have events. There are two advantages here: First, sources no longer have to implement an ISource interface, and second, .NET events are something very familiar to us. So sources of MIDI messages now look like your everyday class that just happens to have one or more events.
How about sinks, those objects that can receive MIDI messages? A sink can be anything. It's just an object that has a method that can receive a MIDI message. In version four, I had a Sink delegate for representing methods of objects capable of receiving MIDI messages. These delegates were used to connect with sources. This approach of using delegates to "connect" sources and sinks is still used in the toolkit but not as before. Instead, delegates are used as adaptors that connect to the events raised in sources and adapts the events so that objects that need to receive the messages can do so without any knowledge of the source.
Let's look at an example. Say that we're using an InputDevice to receive MIDI messages from a MIDI device, such as your soundcard. The InputDevice raises a ChannelMessageReceived event each time it receives a channel message. Suppose that we want to keep track of any note-on channel messages so that when we decide to stop receiving messages, we can turn off any currently sounding notes to keep them from "hanging." The ChannelStopper class is just for this purpose. However, the ChannelStopper has no knowledge of the InputDevice class. We need a way to hook them up so that messages generated by the InputDevice can be passed along to the ChannelStopper. Here is how we can do this with an anonymous method:

InputDevice inDevice = new InputDevice(0);
ChannelStopper stopper = new ChannelStopper();

inDevice.ChannelMessageReceived += delegate(object sender, ChannelMessageEventArgs e)
{
    stopper.Process(e.Message);
};

inDevice.StartRecording();

// ...

In this example, an anonymous method adapts events raised by the InputDevice so that they can be processed by a ChannelStopper. The InputDevice is the source of channel messages and the ChannelStopper is a sink capable of receiving and processing channel messages. The nice thing about this approach is that no explicit source/sink infrastructure is needed. Neither class knows anything about being a source or sink. The flow of messages is orchestrated by an external agent, in this case, an anonymous method.

MIDI messages

There are several categories of MIDI messages: Channel, System Exclusive, Meta, etc. In designing a MIDI toolkit, the challenge is to decide how to represent these messages. One approach is to create two or three general MIDI classes and have specific types of MIDI messages represented through the properties of those classes. The Java MIDI API takes this route. Another approach is to create a large collection of finely grained classes to represent all of the different types of MIDI messages. For example, there are many types of Channel messages such as the Note-on, Note-off, Program change, and Pitch Bend messages. The fine grained approach would create a class for each of those message types. My approach was to take a middle ground. I created classes for the general categories of MIDI messages but left the specific types of messages as properties within those classes. This kept the class hierarchy lightweight and manageable while providing enough specialization to make working with MIDI messages easy.
Here is the hierarchy of MIDI message classes in the MIDI toolkit:

IMidiMessage
- ShortMessage
  - ChannelMessage
  - SysCommonMessage
  - SysRealtimeMessage
- SysExMessage
- MetaMessage

Specific types of messages are represented through properties. For example, the ChannelMessage class has a Command property that can be set to represent the various types of Channel messages.

Message builders

All message classes are immutable. This makes sharing messages throughout an application safe. To create messages, you pass the desired property values to their constructor. Additionally, the toolkit provides a set of builder classes for making message creation more convenient.
The toolkit provides the following message builders:

ChannelMessageBuilder
SysCommonMessageBuilder
KeySignatureBuilder
MetaTextBuilder
SongPositionPointerBuilder
TempoChangeBuilder
TimeSignatureBuilder

The ChannelMessageBuilder and the SysCommonBuilder also use the Flyweight design pattern. When a new message is built, it is stored in a cache. When another message is needed that has the exact same properties as a message that has already been built, the previous message is retrieved rather than creating a new one. When you consider that a typical MIDI sequence is made up of thousands of messages, many of them identical, it is easy to see how the Flyweight pattern is applicable.
Here is an example of creating a ChannelMessage object representing a note-on message:

ChannelMessageBuilder builder = new ChannelMessageBuilder();

builder.Command = ChannelCommand.NoteOn;
builder.MidiChannel = 0;
builder.Data1 = 60;
builder.Data2 = 127;
builder.Build();
Console.WriteLine(builder.Result);

After the builder has been initialized with the desired properties, the MIDI message is built with a call to the Build method. The MIDI message can then be retrieved via the Result property.
There are several builder classes for creating specific types of meta messages. For example, to create a key signature meta message, you use the KeySignatureBuilder class:

KeySignatureBuilder builder = new KeySignatureBuilder();
builder.Key = Key.CMajor;
builder.Build();
Console.WriteLine(builder.Result);

MessageDispatcher class

Often there is a need to process a collection of IMidiMessages. How each message is processed depends on its type. The problem is that you cannot tell an IMidiMessage's type without an explicit check. The IMidiMessage provides a MessageType property just for this purpose. However, having to repeatedly check message types throughout your code can be cumbersome.
The MessageDispatch class is designed to automate these checks. This class acts as a source for every type of MIDI message. It raises an event each time it dispatches a message. The type of event is determined by the type of message it is dispatching.

Clocks

MIDI playback is driven by ticks that occur periodically. The source of these ticks are MIDI clocks. MIDI clocks come in all shapes and sizes. For example, playback can be driven by an internal or external clock. Also, the way in which the ticks are generated depends on whether the MIDI sequence has pulses per quarter note resolution or SMPTE resolution. For the vast majority of situations, an internal clock generating ticks with pulses per quarter note resolution is all you need.
The IClock interface represents the basic functionality for all MIDI clocks:

public interface IClock
{
    event EventHandler Tick;
    event EventHandler Started;
    event EventHandler Continued;
    event EventHandler Stopped;
    bool IsRunning
    {
        get;
    }
}

The Tick event occurs when a MIDI tick has elapsed. The Started, Continued, and Stopped events are self-explanatory. However, it should be pointed out that when the Started event occurs, sequence playback starts from the beginning of the sequence. When the Continued event occurs, playback starts from the current position. The IsRunning property indicates whether the clock is running.
You may notice that there are no methods in the interface for starting and stopping a clock. That is because with clocks that are driven by an external source, the source is responsible for starting and stopping the clock. The clocks receive messages via MIDI and based on those messages starts or stops generating ticks. Since all MIDI clocks implement IClock, it only represents the functionality common to all the clocks.
At this time, the toolkit provides only one clock class, the MidiInternalClock. This clock generates MIDI ticks internally using pulses per quarter note resolution. For the majority of situations, this clock will work fine.
The MidiInternalClock has a Tempo property for setting the tempo in microseconds per beat. To set the tempo to 120 bpm, for example, you would set the Tempo property to 500000. It can receive meta tempo change messages. When a meta tempo change message is passed to it, it changes its tempo to match the tempo represented by the message.

MidiEvent class

A MIDI file is made up of several tracks. Each track contains one or more timestamped MIDI messages. The timestamp represents the number of ticks since the last message was played. This timestamp is called delta ticks. The MidiEvent class represents a timestamped MIDI message. It has three public properties:

DeltaTicks
AbsoluteTicks
MidiMessage

The DeltaTicks property represents the number of ticks since the last MidiEvent. In other words, this value represents how long to wait after playing the previous MidiEvent before playing the current MidiEvent. For example, if the DeltaTicks value is 10, we would allow 10 ticks to elapse before playing the MIDI message represented by the current MidiEvent.
The AbsoluteTicks represents the overall position of the MidiEvent. This is the total number of ticks that have elapsed until the current MidiEvent.
The MidiMessage property is the IMidiMessage represented by the MidiEvent.
In addition there are two internal properties, one which points to the previous MidiEvent in the track, and one which points to the next MidiEvent in the track. In other words, the MidiEvent class acts as a node in a doubly linked list of MidiEvents.

Track class

The Track class represents a collection of MidiEvents. It is responsible for maintaining a collection of MidiEvents in proper order. MidiEvents are not directly added to a Track. Instead, you add an IMidiMessage, specifying its absolute position in the Track. The Track then creates a MidiEvent to represent the message and inserts it into its collection of MidiEvents.
In addition to providing functionality for adding and removing MIDI events, the Track class also provides several iterators. There is a standard iterator that simply iterates over the MidiEvents one at a time. Another iterator takes a MessageDispatcher object and passes each IMidiMessage to the dispatcher which in turn raises an event specific to the type of message it is dispatching. The value the iterator returns is the absolute ticks for the current MidiEvent.
Perhaps the most useful iterator is the one that when advanced moves forward only one tick at a time. The iterator keeps track of its tick position in the Track. When the tick count has reached a value in which it is time to play the next MidiEvent, it passes the IMidiMessage represented by the MidiEvent to the MessageDispatcher and returns the absolute tick count. This iterator also takes a ChannelChaser object as well as a start position value and "chases" up to the start position before switching to the playback mode. In essence, this iterator allows us to stream the Track in real-time.

Sequence class

The Sequence class represents a collection of Tracks. It also provides functionality for loading and saving MIDI files, so Sequences can load and save themselves.
Every Sequence has a division value. This value represents the resolution of the Sequence and is represented by a property. There are two types of division values: Pulses per quarter note and SMPTE. The Sequence has a SequenceType property indicating the sequence type. Unfortunately, SMPTE sequences aren't supported at this time.

MIDI devices

There are several MIDI device classes in the toolkit. Each device class is derived directly or indirectly from the abstract Device class in the Sanford.Multimedia namespace. The InputDevice class represents a MIDI device capable of receiving MIDI messages from an external source, such as a MIDI keyboard controller or synthesizer. The OutputDeviceBase class is an abstract class that serves as the base class for the output device classes. The OutputDevice class represents a MIDI device capable of sending MIDI messages to an external source or your soundcard. And the OutputStream class encapsulates the Windows Multimedia MIDI output stream API. It is capable of playing back timestamped MIDI messages.
There can be more than one of these devices present on your computer. To determine the number of input devices present, for example, you would query the InputDevice's static DeviceCount property. The output device classes also have this property.
Each MIDI device has its own unique ID. This is simply an integer value representing the device's order in the list of devices available. For example, the first output device on your system would have an ID of 0. The second output device would have an ID of 1, and so on. The same is true for the input devices. When you create a MIDI device, you pass it the ID of the device you wish to use to its constructor. If there was an error in opening the device, an exception is thrown.
To find out the capabilities of a device, you query the class' static GetDeviceCapabilities method, passing it the device ID of the device you are interested in. This method will return a structure filled with values representing the capabilities of the specified MIDI device.
Let's describe each device class in detail:

InputDevice class

The InputDevice class represents a MIDI device capable of receiving MIDI messages. It has an event for each of the major MIDI message it can receive. To receive MIDI messages, you connect to one or more of these events. Then you call the StartRecording method. Recording will continue until either StopRecording or Reset is called. The InputDevice lets you set the size of the sysex buffer it uses to receive sysex messages. When the InputDevice has received a complete sysex message, it raises the SysExReceived event.

OutputDeviceBase class

The OutputDeviceBase class is an abstract class that provides basic functionality for sending MIDI messages. It has several overloaded Send methods for sending various types of MIDI messages.

OutputDevice class

The OutputDevice class represents a MIDI device capable of sending MIDI messages. It inherits most of its functionality from the OutputDeviceBase class. It also provides running status functionality.
The following code creates an OutputDevice, builds and sends a note-on message, sleeps for one second, and then builds and sends a note-off message:

using(OutputDevice outDevice = new OutputDevice(0))
{
    ChannelMessageBuilder builder = new ChannelMessageBuilder();

    builder.Command = ChannelCommand.NoteOn;
    builder.MidiChannel = 0;
    builder.Data1 = 60;
    builder.Data2 = 127;
    builder.Build();

    outDevice.Send(builder.Result);

    Thread.Sleep(1000);

    builder.Command = ChannelCommand.NoteOff;
    builder.Data2 = 0;
    builder.Build();

    outDevice.Send(builder.Result);
}

OutputStream class

The OutputStream class is also derived from the OutputDeviceBase class. It encapsulates the Windows multimedia MIDI output stream API. It provides functionality for playing back MIDI messages.
To play MIDI messages, you call StartPlaying. The OutputStream will then begin playing back any MIDI messages in its queue. To place MIDI messages in the queue, you first write one or more MidiEvents using the Write method. After writing the desired number of MidiEvents, you call Flush. This flushes the events to the stream causing it to play them back.

Sequencer Class

The Sequencer class is back. It's a lightweight class for playing back Sequences. I felt the previous MidiFilePlayer class was not the best means for playing back MIDI sequences. I wanted to give the toolkit the ability to play Sequences your create programmatically. One issue that caused me to shy away from a Sequencer class (after having created one in earlier versions) is the problem of a Sequence changing as it's being played by a Sequencer. I still haven't solved that problem, but I didn't want that issue to prevent easy Sequence playback. So I'm putting in a new version of the Sequencer class with the understanding that it's meant to be used for simple playback. For something more sophisticated, you can use it as the basis for creating something more.

Dependencies

The MIDI toolkit depends on the DelegateQueue class from my Sanford.Threading namespace; the InputDevice and OutputDevice classes use it for queueing MIDI events. In turn, the Sanford.Threading namespace depends on my Sanford.Collection namespace, so that assembly is also necessary for the toolkit to compile. Finally, the toolkit uses the Sanford.Multimedia namespace. I've provided all of the assemblies with the download. I've linked the projects that use them to the assemblies in hopes that the toolkit will compile out of the box. Hopefully, the days of having trouble compiling my projects because of not having the right assemblies are over.

Tuesday, March 9, 2010

DBMS: Indexes By Abdul Malik

DBMS: Indexes By Abdul Malik

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random look ups and efficient access of ordered records. The disk space required to store the index is typically less than that required by the table (since indexes usually contain only the key-fields according to which the table is to be arranged, and excludes all the other details in the table), yielding the possibility to store indexes in memory for a table whose data is too large to store in memory.

In a relational database, an index is a copy of one part of a table. Some databases extend the power of indexing by allowing indexes to be created on functions or expressions. For example, an index could be created on upper(last_name), which would only store the upper case versions of the last_name field in the index. Another option sometimes supported is the use of "filtered" indexes, where index entries are created only for those records that satisfy some conditional expression. A further aspect of flexibility is to permit indexing on user-defined functions, as well as expressions formed from an assortment of built-in functions.

Indexes may be defined as unique or non-unique. A unique index acts as a constraint on the table by preventing duplicate entries in the index and thus, the backing table. Microsoft SQL Server allows a unique index to contain a single NULL value.

Index Architecture

Index architectures can be classified clustered or unclustered.

Unclustered

An unclustered index is structured in a way that doesn't correspond to the order of the actual data records. It resembles the words index at the back of a book. For this reason, it will perform worse than clustered indexes on ranged searches where the result set is large, since each result could cost an additional I/O-operation to get the actual data record. One can have any number of these kinds of indexes—all that is required is the space to store the index itself; one does not copy the actual data to create a new index.

Clustered

Clustering alters the data block into a certain distinct order to match the index, hence it is also an operation on the data storage blocks as well as on the index. An address book ordered by first name resembles a clustered index in its structure and purpose. The exact operation of database systems vary, but because storing data is very redundant the row data can only be stored in one order. Therefore, only one clustered index can be created on a given database table. Clustered indexes can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items are selected.

Since the physical records are in this sort order on disk[AM1] , the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required. The primary feature of a clustered index is therefore the ordering of the physical data rows in accordance with the index blocks that point to them. Some databases separate the data and index blocks into separate files, others put two completely different data blocks within the same physical file(s).

Index implementations

Indexes can be implemented using a variety of data structures. Popular indexes include balanced trees, B+ trees and hashes.

In Microsoft SQL Server, the leaf node of the clustered index corresponds to the actual data, not simply a pointer to data that resides elsewhere, as is the case with a non-clustered index. Each relation can have a single clustered index and many unclustered indexes.

Applications and limitations

Indexes are useful for many applications but come with some limitations. Consider the following SQL statement:

SELECT first_name FROM people WHERE last_name = 'Smith';.

To process this statement without an index the database software must look at the last_name column on every row in the table (this is known as a full table scan). With an index the database simply follows the b-tree data structure [AM2] until the Smith entry has been found; this is much less computationally expensive than a full table scan.

Consider this SQL statement:

SELECT email_address FROM customers WHERE email_address LIKE '%@yahoo.com';.

This query would yield an email address for every customer whose email address ends with "@yahoo.com", but even if the email_address column has been indexed the database still must perform a full table scan. This is because the index is built with the assumption that words go from left to right. With a wildcard at the beginning of the search-term, the database software is unable to use the underlying b-tree data structure (in other words, the WHERE-clause is not sargable). This problem can be solved through the addition of another index created on reverse(email_address) and a SQL query like this:

SELECT email_address FROM customers WHERE reverse(email_address) LIKE reverse('%@yahoo.com');. This puts the wild-card at the right-most part of the query (now moc.oohay@%) which the index on reverse(email_address) can satisfy.

Types

Bitmap index

A bitmap index is a special kind of index that stores the bulk of its data as bit arrays (bitmaps) and answers most queries by performing bitwise logical operations on these bitmaps. The most commonly used index, such as B+trees, are most efficient if the values it indexes do not repeat or repeat a smaller number of times. In contrast, the bitmap index is designed for cases where the values of a variable repeat very frequently. For example, the gender field in a customer database usually contains two distinct values: male or female. For such variables, the bitmap index can have a significant performance advantage over the commonly used trees.

Dense index

A dense index in databases is a file with pairs of keys and pointers for every record in the data file. Every key in this file is associated with a particular pointer to a record in the sorted data file. In clustered indexes with duplicate keys, the dense index points to the first record with that key.[4]

Sparse index

A sparse index in databases is a file with pairs of keys and pointers for every block in the data file. Every key in this file is associated with a particular pointer to the block in the sorted data file. In clustered indexes with duplicate keys, the sparse index points to the lowest search key in each block.

Reverse index

A reverse key index reverses the key value before entering it in the index. E.g., the value 24538 becomes 83542 in the index. Reversing the key value is particularly useful for indexing data such as sequence numbers, where new key values monotonically increase.

Covering Index

In most cases, an index is used to quickly locate the data record(s) from which the required data is read. In other words, the index is only used to locate data records in the table and not to return data.

A covering index is a special case where the index itself contains the required data field(s) and can return the data.

Consider the following table (other fields omitted):

ID	Name	Other Fields
12	Plug	...
13	Lamp	...
14	Fuse	...

To find the Name for ID 13, an index on (ID) will be useful, but the record must still be read to get the Name. However, an index on (ID, Name) contains the required data field and eliminates the need to look up the record.

A covering index can dramatically speed up data retrieval but may itself be large due to the additional keys, which slows down data insertion & update. To reduce such index size, some systems allow non-key fields to be included in the index. Non-key fields are not themselves part of the index ordering but only included at the leaf level, allowing for a covering index with less overall index size.

[AM1]Usman – Plz comment on this statement.

[AM2]This should be discussed