Optimizer

The optimizer chooses the proper index to access the data in tables.

The tables in a FROM clause are processed in the order they appear.

How to write an efficient query on one table

A table is very much like a telephone book (white pages).

When you create a table, the rows you insert into the base table file will always be physically ordered by the CLUSTERED INDEX (aka native index) order.

A database table can have many supplementary indexes to speed up data retrieval.

For example, the customer table from the mytest database is declared as follows:

CREATE TABLE customer (
    custid      INT         NOT NULL IDENTITY(1000, 1) PRIMARY KEY,
    firstname   VARCHAR(40) NULL,
    lastname    VARCHAR(40) NOT NULL,
    birthdate   DATE        NULL,
    gender      VARCHAR(1)  NULL,
    street      VARCHAR(20) NULL,
    city        VARCHAR(20) NULL,
    country_id  INT         NULL
)

CREATE INDEX idx_name ON customer(lastname, firstname)

CREATE INDEX idx_name_birthdate ON customer(lastname, birthdate) -- useful if many searches are performed on name and birth date

Three indexes exist on this table:

If you specify the leading columns of an index in the WHERE clause of a SELECT command, the optimizer may choose this index to access the rows.

All possibilities are listed below:

SELECT * from customer                             -- will use the primary key index
WHERE custid = 1234

SELECT * from customer                             -- will use index `idx_name`
WHERE lastname = 'DOE' AND firstname = 'John'

SELECT * from customer                             -- will use index `idx_name`
WHERE firstname = 'John' AND lastname = 'DOE'      --   the order of appearance of the columns is not important

SELECT * from customer                             -- will use index `idx_name`
WHERE 'John' = firstname AND 'DOE' = lastname      --   the order of arguments is not important

SELECT * from customer                             -- will use index `idx_name_birthdate`
WHERE lastname = 'DOE' AND birthdate = '19501001'

SELECT * from customer                             -- will use either index `idx_name` or index `idx_name_birthdate`
WHERE lastname = 'DOE'                             --   because both indexes contain the leading column [lastname]

For an index to be used, the WHERE clause must contain the leading columns of the index.

For example, if we create the index below:

CREATE INDEX idx_abc ON customer(lastname, firstname, birthdate, gender)

All the following queries can use the index idx_abc:

SELECT * from customer
WHERE lastname = ...

SELECT * from customer
WHERE lastname = ... AND firstname = ...

SELECT * from customer
WHERE lastname = ... AND firstname = ... AND birthdate = ...

SELECT * from customer
WHERE lastname = ... AND firstname = ... AND birthdate = ... AND gender = ...

If the WHERE clause doesn’t contain the leading columns of the index, it will not be used.

Some examples of queries that won't use idx_abc index:

SELECT * from customer
WHERE firstname = ...

SELECT * from customer
WHERE birthdate = ...

SELECT * from customer
WHERE firstname = ... AND birthdate = ... AND gender = ...

The operator OR is an optimizer killer

The comparison operators must be separated by AND operator, not OR. Else, index will not be used.

SELECT * from customer
WHERE custid = 1234 OR custid = 3000  -- NO INDEX WILL BE USED


It is better to do:

SELECT * from customer  -- will use the primary key index
WHERE custid = 1234

UNION ALL

SELECT * from customer  -- will use the primary key index
WHERE custid = 3000

Range Lookup

Range lookup are like equality lookup, except that >, >=, <, >=, BETWEEN operators are used in the WHERE clause.

LIKE also can use index if the argument is a literal string (not an expression nor a variable) of the form LIKE 'prefix%', where prefix contains no placeholder character (%, _, [, ]).

SELECT * from customer                     -- will use index `idx_name` or `idx_name_birthdate`
WHERE lastname >= 'k'

SELECT * from customer                     -- will use index `idx_name` or `idx_name_birthdate`
WHERE lastname >= 'k' AND lastname < 'p'

SELECT * from customer                     -- will use index `idx_name` or `idx_name_birthdate`
WHERE lastname < 'k'

SELECT * from customer                     -- will use index `idx_name` or `idx_name_birthdate`
WHERE lastname LIKE 'SMI%'

SELECT * from customer                     -- will use the primary key index
WHERE custid BETWEEN 1000 AND 1020

Sometimes, if the range is large and the optimizer chooses an index, a table scan may be faster.

Table Scan

If no suitable index is found, a table scan is performed, which reads all record in the base table, from the first to the last.

How to write an efficient query on many tables

You start by writing an efficient query on one table.

Then, you join another table and specify in the JOIN ... ON or WHERE clause the leading columns (or all columns) of any existing index, exactly like you do when you optimize a query on a single table.

Run the query and measure the execution time. If it is not good, try to understand why an index has not been used.

Repeat this for each table you add.

Example with the mytest database.

SELECT *                 -- query on the first table
FROM items i

SELECT *                 -- add orders table
FROM items i, orders o
WHERE o.oid = i.oid      -- for each item, the primary key index of 'orders' will be used to lookup the order row

SELECT *                 -- add customer table
FROM items i, orders o, customer c
WHERE o.oid = i.oid AND c.custid = o.custid  -- primary key of 'custid' will be used

SELECT cy.cid, cy.name, sum(i.price)
FROM items i, orders o, customer c, country cy
WHERE o.oid = i.oid AND c.custid = o.custid AND cy.cid = c.country_id
GROUP BY cy.cid, cy.name

Test database

During development, you should always work on a test database, with tables containing realistic number of rows. You can fill them with dummy data, like the mytest sample database.

You will test your queries on your database, so that you can measure the execution time.

Education

For a database developper, understanding the internal working of indexes is the most important skill.

Read the CREATE TABLE page, in particular the Physical File Content part.

It describes the physical layout on disk of indexes.

And search the web, there is plenty of information on indexes.

RSQL, a simple alternative to Microsoft SQL Server