Today’s subject is a little bit advanced, but I think that beginners should be able to follow if they read carefully.
Key/value approach in database design could come in handy when we need to store some arbitrary data about another table. For example, we have a users table that holds our user data. Everything is working fine, but some day our client decides that he wants to collect 2 telephone numbers, sex of the user, date of birth …
If we try to predefine all the potential wishes of the customer in our table, it would be awkward and our table would grow horizontally beyond reason.
Here comes key/value table to the rescue. Let us create a users table (as simple as it can be):
[code lang=”sql”]
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`pass` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM ;
[/code]
This table will hold nothing but a login data. Let us create a user_data table which will hold everything else:
[code lang=”sql”]
CREATE TABLE IF NOT EXISTS`user_details` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`key` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`value` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
[/code]
The fields key and value will actually store the user data. Because this table is going to grow vertically, potentially we can store an unlimited and completely arbitrary information about a user. Two different parts of your app can handle storing the user details completely independent of each other. This is the power of this approach.
This approach is highly flexible, adding a new attribute at a late design phase is very simple. We can even create a mechanism that adds attributes according to real time configuration.
Enough about good sides, let me talk about drawbacks of this design. One major drawback is that there is no database data types in our user_data table. For example, consider this statement:
[code lang=”sql”]
UPDATE user_data SET value = ‘What?’ WHERE key = ‘date_of_birth’
[/code]
We are setting a string value for a key that should hold user’s date of birth. Of course, this can be handled through data validation on the application side, so it is not such a drawback.
A bigger problem is if we use user_data table for filtering. Imagine that we have only one users table with all the fields and need to retrieve all users that live in Chicago. In classic database design this would be the query:
[code lang=”sql”]
SELECT username FROM users WHERE city = ‘Chicago’
[/code]
Simple. But in our new key/value database design this query would look like this:
[code lang=”sql”]
SELECT username FROM users a
INNER JOIN user_data b
ON a.id = b.user_id
WHERE b.key = ‘city’
AND b.value = ‘Chicago’
[/code]
Pretty complex for such an easy task. Wait till you see this example. What if you need to filter data by 2 criteria. For example in classic design:
[code lang=”sql”]
SELECT username FROM users WHERE city = ‘Chicago’ AND state = ‘Illinois’
[/code]
In key/value design you can easily fall in the, so called, AND trap. At fist glance everybody will say: This is easy and type something like this:
[code lang=”sql”]
SELECT username FROM users a
INNER JOIN user_data b
ON a.id = b.user_id
WHERE
(b.key = ‘city’ AND b.value = ‘Chicago’)
AND
(b.key = ‘state’ AND b.value = ‘Illinois’)
[/code]
But this query will ALWAYS return nothing. Zero. Nada. That is because there is never ONE row in user_data that has both criteria met. So, now this complex query becomes monster:
[code lang=”sql”]
SELECT username FROM users a
INNER JOIN user_data b
ON a.id = b.user_id
INNER JOIN user_data c
ON b.user_id = c.user_id
WHERE
(b.key = ‘city’ AND b.value = ‘Chicago’)
AND
(c.key = ‘state’ AND c.value = ‘Illinois’)
[/code]
Conclusion is that this approach is great but must be used with caution. I would mainly use it for storing some arbitrary data that will never be used for filtering.