Database Indexing: How to Create an Index in MySQL

Asghar Mirzaie

March 05, 2024

Welcome to the world behind the scenes of databases, where optimizing data retrieval time is the name of the game. Whether you’re a complete beginner or have some intermediate understanding, this guide will take you on a journey through the intricate world of database indexing. We’ll break down complex concepts into easy-to-understand principles and provide practical examples to help you grasp the fundamentals. So, buckle up, and let’s dive into database indexing!

How Does Database Indexing Work

Imagine we have a table of users that includes their names and salaries:

--------------------------------------
| id | name  | salary |
--------------------------------------
| 1  | Alice | 50000  |
| 2  | Bob   | 60000  |
| 3  | Sam   | 55000  |
| 4  | Eve   | 70000  |
| 5  | Carol | 45000  |
| 6  | Dave  | 58000  |
--------------------------------------

Querying without an index

Suppose we want to execute the query SELECT * FROM users WHERE name='Sam' and retrieve all records for the user Sam. In normal cases where we do not have indexing, the database system would need to scan through each row in the table sequentially to find all occurrences of “Sam.” This process is called a full table scan. For our sample table with 6 records, this may not seem like a big deal, but imagine what would happen if we had millions of records. The time it takes to scan through each row would increase significantly, resulting in slower query performance.

Like SELECT statements, UPDATE and DELETE statements that have a WHERE clause also benefit from indexes, because the database must first locate the rows it is going to change.

Querying with an index: B-tree

An index is an additional data structure that the database maintains to speed up queries. When you add an index to a column, the indexed values are stored in a structure of their own, separate from the table rows, together with a reference back to each row. In MySQL, most indexes are B-trees, and where they live on disk depends on the storage engine: MyISAM keeps them in a separate index file, while InnoDB, the default engine, stores them in the table’s tablespace alongside the data. In either case the DBMS (Database Management System) manages the index and keeps it in sync with the table. When you insert, update, or delete rows in the table, the DBMS automatically updates the index to reflect the changes.

With the index on the ‘name’ column in place, querying for records based on names becomes much more efficient. Essentially, an index acts like a map or guide for the database system, allowing it to quickly locate specific rows based on the values stored in the indexed column. So, when we say ‘index on the name column,’ it means that the database has created a special data structure (like a B-tree) to organize and store the names in a way that facilitates rapid retrieval. This indexing process significantly speeds up queries that involve searching, sorting, or filtering by name. Let’s visualize how the index on the “name” column is searched, step by step:

Initial index:

--------------------------------------
| id | name  | salary |
--------------------------------------
| 1  | Alice | 50000  |
| 2  | Bob   | 60000  |
| 5  | Carol | 45000  |
| 6  | Dave  | 58000  |
| 4  | Eve   | 70000  |
| 3  | Sam   | 55000  |
--------------------------------------

A B-tree (balanced tree) is a way of organizing data in a tree structure that keeps its entries sorted, which makes searching for specific information in the database quick and efficient, whether you’re looking for exact matches or ranges of values.

Step 1

The database system looks at the sorted index’s middle value, “Dave.” Since “Dave” comes before “Sam” alphabetically, the system knows that “Dave” and every name before it cannot match. Thus, it eliminates the first half of the index and keeps only the entries after “Dave.”

--------------------------------------
| id | name  | salary |
--------------------------------------
| 4  | Eve   | 70000  |
| 3  | Sam   | 55000  |
--------------------------------------

Step 2

The system focuses on the remaining half of the index and repeats the process. It looks at the middle value, which is “Eve.” Since “Eve” also comes before “Sam,” it eliminates “Eve” and everything before it.

--------------------------------------
| id | name  | salary |
--------------------------------------
| 3  | Sam   | 55000  |
--------------------------------------

Step 3

The only remaining entry is “Sam,” so the search ends, and the index entry tells the database which row (id 3) to fetch. This walkthrough is a simplification: a real B-tree groups many sorted keys into each node and descends from the root to a leaf, but the principle of discarding a large part of the data at every step is the same. It significantly reduces the number of comparisons needed compared to scanning through all records sequentially. The result is faster query performance, especially when dealing with large datasets, as the database system can leverage the sorted index to narrow down the search space and retrieve the desired records more efficiently.

Indexing effect on the speed of different types of queries

It’s important to consider that while indexes can significantly improve the speed of SELECT, UPDATE, and DELETE queries that search for rows, they also add a cost to writes. Every time a new record is inserted into a table with indexes, the indexes need to be updated accordingly, and the same applies when an UPDATE changes an indexed column or a DELETE removes a row. Therefore, it’s crucial to strike a balance between the benefits of indexing for query performance and the potential trade-off in write speed, especially for databases that experience a higher rate of inserts.

Adding An Index To A MySQL Table

The MySQL CREATE INDEX statement is used to create an index on one or more columns of a table. The basic syntax for creating an index is as follows:

CREATE INDEX index_name 
ON table_name(column_list);

This statement requires specifying the name of the index, the table to which the index belongs, and the list of columns included in the index.

Let’s illustrate this with an example. We’ll create a table named users:

CREATE TABLE users (
    id INT NOT NULL,
    name VARCHAR(255) NOT NULL,
    salary INT NOT NULL
);

CREATE INDEX idx_users_salary 
ON users(salary);

In this example, we’ve created an index named idx_users_salary on the salary column of the users table. Now, let’s insert some data into the users table:

INSERT INTO users (id, name, salary)
VALUES (1, 'Alice', 50000),
       (2, 'Bob', 60000),
       (5, 'Carol', 45000),
       (6, 'Dave', 58000),
       (4, 'Eve', 70000),
       (3, 'Sam', 55000);

Now, if we query data based on a specific salary, MySQL will utilize the index to locate the data efficiently. To check if MySQL uses the index for a query, you can put the EXPLAIN statement before the SELECT statement:

EXPLAIN
SELECT
    id,
    name,
    salary
FROM
    users
WHERE
    salary = 60000;
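
As a rough guide to reading the result, the sketch below runs EXPLAIN on two queries against the users table as it stands at this point, where only idx_users_salary exists. The comments describe which output columns are worth checking; the exact values depend on the MySQL version and the table size, so treat them as things to look for rather than guaranteed output.

```sql
-- Filter on the indexed column: idx_users_salary should appear in
-- `possible_keys`, and in `key` if the optimizer decides to use it.
EXPLAIN
SELECT id, name, salary
FROM users
WHERE salary = 60000;

-- Filter on a column that has no index yet: expect `key` to be NULL
-- and `type` to be ALL, which is how EXPLAIN reports a full table scan.
EXPLAIN
SELECT id, name, salary
FROM users
WHERE name = 'Sam';
```

The rows column shows the optimizer’s estimate of how many rows it has to examine. On a table as small as this one, the optimizer may decide that a scan is cheaper than using an index.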

Adding Multicolumn Indexes To A MySQL Table

MySQL also supports multicolumn indexes. When creating an index with multiple columns, the order of columns is crucial, as MySQL sorts the index entries by the value of the leftmost column first, followed by the subsequent columns in the order specified in the CREATE INDEX statement.

For instance, let’s create a multicolumn index on the name and salary columns of the users table:

CREATE INDEX idx_users_name_salary 
ON users(name, salary);

MySQL can use this multicolumn index when the query’s conditions include the leftmost column or columns of the index, in the order defined in the index (a leftmost prefix).

Queries where the multicolumn index can be used

Filtering data based on the name column (leftmost column) alone:

SELECT *
FROM users
WHERE name = 'Sam';

Filtering data based on both name and salary columns together:

SELECT *
FROM users
WHERE name = 'Sam' AND salary > 3000;

Queries where the multicolumn index cannot be used

Filtering data based solely on the salary column (which is not the leftmost column):

SELECT *
FROM users
WHERE salary > 3000;

Filtering on name or salary combined with OR:

SELECT *
FROM users
WHERE name = 'Sam' OR salary > 3000;
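
One common workaround for OR conditions across different columns, sketched here as an option rather than a rule, is to split the query into two branches so that each branch can use its own index. The example assumes that the idx_users_name_salary and idx_users_salary indexes created earlier both exist and that the table contains no fully identical rows, since UNION removes duplicates.

```sql
-- Branch 1 can use idx_users_name_salary (leftmost column: name)
SELECT id, name, salary
FROM users
WHERE name = 'Sam'
UNION
-- Branch 2 can use idx_users_salary
SELECT id, name, salary
FROM users
WHERE salary > 3000;
```

Whether this is faster than the OR form depends on the data. Running EXPLAIN on both versions shows which plan MySQL actually picks.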

Retrieve all indexes associated with a table

You can use either the SHOW INDEX command or query the information_schema.statistics table.

Using SHOW INDEX command:

SHOW INDEX FROM users;

This command will display information about all indexes associated with the users table, including the index name, column names, and other details such as cardinality and index type. Keep in mind that indexes increase the size of the database due to the additional storage they require.
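
To get a feel for how much space a table’s indexes occupy, one option is the information_schema.tables table, whose DATA_LENGTH and INDEX_LENGTH columns hold approximate sizes in bytes. This is a sketch: the figures are estimates maintained by the storage engine, and for InnoDB the primary key is counted as part of the data rather than under INDEX_LENGTH.

```sql
-- Approximate data and index sizes for the users table, in bytes.
SELECT
    TABLE_NAME,
    DATA_LENGTH,
    INDEX_LENGTH
FROM
    information_schema.tables
WHERE
    TABLE_SCHEMA = DATABASE()  -- current database only
    AND TABLE_NAME = 'users';
```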

Using information_schema.statistics table:

SELECT DISTINCT
    INDEX_NAME
FROM
    information_schema.statistics
WHERE
    TABLE_SCHEMA = DATABASE()  -- limit the result to the current database
    AND TABLE_NAME = 'users';

This query provides a similar result to the SHOW INDEX command, listing all indexes associated with the specified table.
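
When a table has multicolumn indexes, it can help to see the column order as well. The following sketch selects a few more columns from the same table: SEQ_IN_INDEX gives each column’s position within its index, and the TABLE_SCHEMA filter keeps tables with the same name in other databases out of the result.

```sql
-- One row per indexed column, grouped by index and ordered by position.
SELECT
    INDEX_NAME,
    SEQ_IN_INDEX,
    COLUMN_NAME,
    NON_UNIQUE,
    INDEX_TYPE
FROM
    information_schema.statistics
WHERE
    TABLE_SCHEMA = DATABASE()
    AND TABLE_NAME = 'users'
ORDER BY
    INDEX_NAME,
    SEQ_IN_INDEX;
```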

Remove an index from a database

To remove an index from a table, you can use the DROP INDEX statement. In MySQL, the statement must name the table the index belongs to:

DROP INDEX index_name ON table_name;

For example, to remove the idx_users_salary index, you would use:

DROP INDEX idx_users_salary ON users;
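
The same removal can also be written as an ALTER TABLE statement, which is convenient when several changes to one table are made together. A small sketch, using the index name from this article:

```sql
-- Equivalent to: DROP INDEX idx_users_salary ON users
ALTER TABLE users
    DROP INDEX idx_users_salary;

-- Confirm that the index is gone
SHOW INDEX FROM users;
```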

More tips on multicolumn indexes

Tip 1: The importance of index ordering in multicolumn indexes

Suppose we have a scenario where the name column is frequently used in queries for filtering data. In this case, we should place the name column first in the multicolumn index to cater to these queries.

-- Create a multicolumn index with name as the first column
CREATE INDEX idx_users_name_salary ON users(name, salary);

By placing name as the first column in the index, queries filtered by name will benefit from the index’s efficiency.

Tip 2: Handling range conditions in multicolumn indexes

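Consider a scenario where queries filter on an exact name and a range of salaries, for example WHERE name = 'Sam' AND salary > 50000. We should place columns with equality checks before columns with range conditions in the multicolumn index, because MySQL can narrow the search with the index columns only up to and including the first range condition.

-- Create a multicolumn index with the equality column (name) first and the range column (salary) second
CREATE INDEX idx_users_name_salary ON users(name, salary);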
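
To illustrate the equality-before-range rule, here is a sketch of a query that combines both kinds of condition. It assumes the idx_users_name_salary index on (name, salary) from earlier in the article exists; the salary bounds are arbitrary example values.

```sql
-- The equality on name narrows the index to Sam's entries; within them,
-- salaries are sorted, so the range condition is a short contiguous scan.
EXPLAIN
SELECT id, name, salary
FROM users
WHERE name = 'Sam'
  AND salary BETWEEN 50000 AND 60000;
```

With the columns reversed, as in an index on (salary, name), the range on salary comes first and the index can no longer narrow the search by name as precisely.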

Tip 3: Overlapping indexes in multicolumn indexes

Suppose we have an overlapping index on (name, salary) and another single-column index on name. Since the multicolumn index covers queries that the single-column index can handle, keeping both indexes would introduce unnecessary overhead during table modifications.

-- Remove the redundant single-column index on name
DROP INDEX idx_users_name ON users;

By removing redundant indexes, we streamline index maintenance and improve overall database performance.
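
One way to spot candidates for this kind of cleanup, offered as a rough sketch, is to list the columns that lead more than one index on the same table. The query only flags indexes that start with the same column. Whether one of them is truly redundant still needs a manual check; for example, a unique index should not be dropped just because a longer index starts with the same column.

```sql
-- Leading columns shared by two or more indexes on the users table.
SELECT
    COLUMN_NAME AS leading_column,
    GROUP_CONCAT(INDEX_NAME ORDER BY INDEX_NAME) AS indexes_starting_here
FROM
    information_schema.statistics
WHERE
    TABLE_SCHEMA = DATABASE()
    AND TABLE_NAME = 'users'
    AND SEQ_IN_INDEX = 1       -- first column of each index only
GROUP BY
    COLUMN_NAME
HAVING
    COUNT(*) > 1;
```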

Indexing Principles To Supercharge Your Queries

Follow these principles when indexing to make your queries even more efficient.

1. Fast lookup regardless of database size

When you search for data in a database without an index, it’s like scanning through every page of every book in the library to find what you need – it’s slow and inefficient. However, with indexing, the database can quickly narrow down the search to a much smaller subset of data that matches the criteria specified in the query. This makes your queries lightning-fast, especially when dealing with large datasets.

Some developers might worry that creating indexes will increase the size of the database and slow down performance. However, in most cases, the opposite is true. While indexes do add some overhead in terms of storage space, they significantly improve query performance by enabling fast lookups. Additionally, modern databases are optimized to manage indexes efficiently, so developers can focus on improving query performance without worrying too much about the index size.

2. Scan in one direction

Consider a scenario where we want to efficiently retrieve records from a users table based on age, utilizing an index on the age column. Our objective is to execute the query: SELECT * FROM users WHERE age >= 30 ORDER BY age ASC LIMIT 2.

Users table:

--------------------------------------
| id | name  | age |
--------------------------------------
| 1  | Alice | 21  |
| 2  | Bob   | 42  |
| 3  | Sam   | 38  |
| 4  | Eve   | 18  |
| 5  | Carol | 23  |
| 6  | Dave  | 36  |
--------------------------------------

Initial index:

--------------------------------------
| id | name  | age |
--------------------------------------
| 4  | Eve   | 18  |
| 1  | Alice | 21  |
| 5  | Carol | 23  |
| 6  | Dave  | 36  |
| 3  | Sam   | 38  |
| 2  | Bob   | 42  |
--------------------------------------

Because the index is sorted by age, the database can answer this query in two steps:

Locating the starting point: It locates the first entry with an age of 30 or above. In our case, Dave (age 36) is the first person in the sorted index who fulfills this criterion.
Scanning forward: With Dave identified, it scans forward in the index. The sorted nature of the index means the entries come back in increasing order of age, so no separate sort is needed, and the scan stops as soon as the limit of two entries is reached. The next eligible person is Sam (age 38), which completes the result.
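
A minimal sketch of this scenario in SQL follows. It assumes the users table has the age column shown in this section, and idx_users_age is a hypothetical index name chosen for the example.

```sql
CREATE INDEX idx_users_age ON users(age);

-- The index is already sorted by age, so MySQL can seek to the first
-- entry with age >= 30 and read forward until it has two rows.
SELECT id, name, age
FROM users
WHERE age >= 30
ORDER BY age ASC
LIMIT 2;
-- With the sample data above, this returns Dave (36) and Sam (38).
```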

Considerations For Advanced Index Optimization

In terms of SQL execution order, operations proceed in a specific logical sequence: joining and filtering, then grouping, and finally sorting. For indexes to be effectively utilized, they must accommodate this sequence, prioritizing columns used for filtering before those used for sorting in queries.

Effective index creation also requires consideration of complex queries, which involve multiple operations beyond simple WHERE conditions. In practical terms, this means designing indexes based on the specific requirements of each query and creating them strategically rather than adding them indiscriminately to every column.

Inequality (!=)

Indexes are highly efficient in locating specific values but encounter difficulties with “not-equals” operations like WHERE role != 'ceo' clauses. This results in the database having to scan the entire index, nullifying the benefits of its sorted structure.

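Adding another column to the index can help queries that combine an inequality with an equality condition. For instance, with an index on the name and role columns, a query such as WHERE name = 'Sam' AND role != 'ceo' can use the equality on name to jump to a small section of the index and apply the inequality only to those entries, reducing the need for full index scans.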
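
As a sketch of that idea, the example below assumes the users table also has a role column (it is not part of the table definition used earlier) and uses the hypothetical index name idx_users_name_role.

```sql
CREATE INDEX idx_users_name_role ON users(name, role);

-- The equality on name picks a narrow slice of the index;
-- the inequality on role is then checked only within that slice.
SELECT id, name, role
FROM users
WHERE name = 'Sam'
  AND role != 'ceo';
```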

Pattern Matching (LIKE)

Pattern matching conditions, often used with the LIKE operator, allow searching for strings that match a specified pattern. For example, name LIKE 'Caro%' searches for name values that start with “Caro.”

Internally, a LIKE condition whose pattern starts with literal characters is translated into a range condition to facilitate efficient querying. This means the database engine interprets the pattern-matching operation as a range of values that align with the specified pattern.

It’s crucial to note that an index on the column (here, the name column) helps only when the pattern starts with literal characters, that is, when the wildcards are positioned in the middle or at the end of the search term. With a wildcard in the middle, only the part before the first wildcard narrows the range. A pattern with a leading wildcard, such as '%arol', cannot take advantage of the index’s sorted order.
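
The difference can be sketched with three queries. They assume an index whose leftmost column is name, such as idx_users_name_salary from earlier; the explicit range in the second query is only an approximation of what the engine derives from the pattern, since the exact bounds depend on the column’s collation.

```sql
-- Prefix pattern: can use an index on name, roughly as if it were
-- the range condition shown in the second query.
SELECT id, name FROM users WHERE name LIKE 'Caro%';
SELECT id, name FROM users WHERE name >= 'Caro' AND name < 'Carp';

-- Leading wildcard: the sorted order of the index does not help,
-- so every entry has to be checked.
SELECT id, name FROM users WHERE name LIKE '%arol';
```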

ORDER BY

The ORDER BY clause is used to sort the result set of a query based on specified columns. When executing a query with an ORDER BY clause, the database engine sorts the results according to the specified columns, which can be resource-intensive, particularly for large result sets. However, if the columns used in the ORDER BY clause are part of an index, the database can avoid additional sorting steps during query execution, leading to improved query performance.
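
A short sketch of both cases, assuming the idx_users_name_salary index on (name, salary) from earlier exists. The comments describe what EXPLAIN typically reports; the actual plan can vary with the MySQL version and the data.

```sql
-- The filter and the sort both follow the index column order (name, salary),
-- so rows can be read from the index already sorted by salary.
EXPLAIN
SELECT name, salary
FROM users
WHERE name = 'Sam'
ORDER BY salary;

-- Sorting by a column the chosen index does not order (here, id)
-- typically shows "Using filesort" in the Extra column of EXPLAIN.
EXPLAIN
SELECT id, name, salary
FROM users
WHERE name = 'Sam'
ORDER BY id;
```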

Aggregations (DISTINCT and GROUP BY)

Queries involving GROUP BY and DISTINCT operations are often used to aggregate data in databases. When optimizing these queries for efficient performance, it’s important to consider proper index creation. An index that contains the columns of the GROUP BY clause, in the same order, lets the database engine read the rows already grouped, without requiring unnecessary sorting or temporary-table steps.
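
As an illustration, the two queries below group on name, the leftmost column of the idx_users_name_salary index assumed from earlier. When the index is used for grouping, the Extra column of EXPLAIN usually mentions "Using index" or "Using index for group-by" rather than "Using temporary"; this is a hint about what to look for, not a guaranteed result.

```sql
-- Grouping on the leftmost index column: groups can be read in index order.
EXPLAIN
SELECT name, MAX(salary) AS top_salary
FROM users
GROUP BY name;

-- DISTINCT on the same column can be served from the index in a similar way.
EXPLAIN
SELECT DISTINCT name
FROM users;
```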

Table Joins

When performing joins between database tables, the presence of appropriate indexes is paramount for query optimization. Without the necessary indexes, join operations can lead to significant performance degradation, resulting in slow query execution times.

Consider the following scenario where we have two tables: employees and departments. We aim to retrieve employee names along with their corresponding department names using a JOIN operation.

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    department_id INT NOT NULL
);

CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(255) NOT NULL
);

-- The primary keys (employee_id and department_id) are indexed automatically.
-- Create an index on the column that employees uses in the join condition
CREATE INDEX idx_employees_department_id ON employees(department_id);

-- Query to retrieve employee names and their department names using a JOIN operation
SELECT e.name AS employee_name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

In the above example, both sides of the join condition are indexed: departments.department_id by its primary key, and employees.department_id by the index we created. Separate indexes on the primary key columns are not needed, because MySQL indexes every primary key automatically. These indexes significantly improve the efficiency of the join operation by facilitating quick lookup of matching records based on the join condition. Without an index on the join columns, the database engine would resort to full table scans.

It’s essential to align the creation of indexes with the join conditions specified in the query. In our example, the columns involved in the join condition (department_id in both tables) are indexed, ensuring good performance for the join operation. Additionally, understanding the execution order of joins is crucial for optimizing query performance. In multi-table join scenarios, MySQL’s optimizer chooses the order in which the tables are read, which is not necessarily the order in which they are written in the query, and utilizes indexes on join fields for efficient data retrieval.
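
To check how a join is actually executed, EXPLAIN can be placed in front of the query, as sketched below. The comments describe what to look for in the output rather than a guaranteed result.

```sql
EXPLAIN
SELECT e.name AS employee_name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;
-- One output row per table. For the table that is looked up through the
-- join condition, a `type` of eq_ref or ref with a non-NULL `key` indicates
-- an index lookup, while ALL indicates a full scan of that table.
```

The output lists the tables in the order in which MySQL reads them.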

Conclusion

We discovered that indexing reduces query time by employing efficient data structures during searches. This article aimed to provide an overview of database indexes, covering aspects such as:

How database indexing works
How to add single or multicolumn indexes
Indexing principles that supercharge queries
Considerations for advanced index optimization

However, while we strived to offer a conceptual understanding of indexes to facilitate improved database operation and the creation of more effective indexes, there is still much more to explore. We will delve deeper into databases in the future, so check our blog again soon or join us on LinkedIn to not miss a thing.
