Today’s subject is a little bit advanced, but I think that beginners should be able to follow if they read carefully.
Key/value approach in database design could come in handy when we need to store some arbitrary data about another table. For example, we have a users table that holds our user data. Everything is working fine, but some day our client decides that he wants to collect 2 telephone numbers, sex of the user, date of birth …
If we try to predefine all the potential wishes of the customer in our table, it would be awkward and our table would grow horizontally beyond reason.
Here comes key/value table to the rescue. Let us create a users table (as simple as it can be):
[code lang=”sql”]
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`pass` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM ;
[/code]
This table will hold nothing but a login data. Let us create a user_data table which will hold everything else:
[code lang=”sql”]
CREATE TABLE IF NOT EXISTS`user_details` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`key` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`value` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
[/code]
The fields key and value will actually store the user data. Because this table is going to grow vertically, potentially we can store an unlimited and completely arbitrary information about a user. Two different parts of your app can handle storing the user details completely independent of each other. This is the power of this approach.
This approach is highly flexible, adding a new attribute at a late design phase is very simple. We can even create a mechanism that adds attributes according to real time configuration.
Enough about good sides, let me talk about drawbacks of this design. One major drawback is that there is no database data types in our user_data table. For example, consider this statement:
[code lang=”sql”]
UPDATE user_data SET value = ‘What?’ WHERE key = ‘date_of_birth’
[/code]
We are setting a string value for a key that should hold user’s date of birth. Of course, this can be handled through data validation on the application side, so it is not such a drawback.
A bigger problem is if we use user_data table for filtering. Imagine that we have only one users table with all the fields and need to retrieve all users that live in Chicago. In classic database design this would be the query:
[code lang=”sql”]
SELECT username FROM users WHERE city = ‘Chicago’
[/code]
Simple. But in our new key/value database design this query would look like this:
[code lang=”sql”]
SELECT username FROM users a
INNER JOIN user_data b
ON a.id = b.user_id
WHERE b.key = ‘city’
AND b.value = ‘Chicago’
[/code]
Pretty complex for such an easy task. Wait till you see this example. What if you need to filter data by 2 criteria. For example in classic design:
[code lang=”sql”]
SELECT username FROM users WHERE city = ‘Chicago’ AND state = ‘Illinois’
[/code]
In key/value design you can easily fall in the, so called, AND trap. At fist glance everybody will say: This is easy and type something like this:
[code lang=”sql”]
SELECT username FROM users a
INNER JOIN user_data b
ON a.id = b.user_id
WHERE
(b.key = ‘city’ AND b.value = ‘Chicago’)
AND
(b.key = ‘state’ AND b.value = ‘Illinois’)
[/code]
But this query will ALWAYS return nothing. Zero. Nada. That is because there is never ONE row in user_data that has both criteria met. So, now this complex query becomes monster:
[code lang=”sql”]
SELECT username FROM users a
INNER JOIN user_data b
ON a.id = b.user_id
INNER JOIN user_data c
ON b.user_id = c.user_id
WHERE
(b.key = ‘city’ AND b.value = ‘Chicago’)
AND
(c.key = ‘state’ AND c.value = ‘Illinois’)
[/code]
Conclusion is that this approach is great but must be used with caution. I would mainly use it for storing some arbitrary data that will never be used for filtering.
Storing “key/value” tables is also called an Entity-Attribute-Value design. I agree one should be cautious and think twice before using EAV. Then think a third time, and as many times as necessary until you decide not to use it.
Read these debunkings of the EAV design:
http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/
http://www.simple-talk.com/sql/t-sql-programming/avoiding-the-eav-of-destruction/
Since you can’t query the user_details efficiently, you might as well store a TEXT blob column in the users table, and store all the user details in XML or JSON or YAML or something.
Seems like a good place to store user settings from a preferences page.
Another important design disadvantage of EAV is that for real data stores, the “values” column can not be indexed meaningfully, can not be constrained and thus can not be a foreign key.
tri bui
Thanks for the comment.
You got a point there. Anyway, we can use EAV for some things, but for real optimizing we should use some of the NOSql solutinons.
why not use memcacheDB?
MemcacheDB is key-value ,and faster than mysql.
Yes, I know. Actually it is called Membase as MemcacheDB is not persistent. But the theme of this tutorial was using this in MySql.
In future tutorials I am planning to show all NOSql “databases” and how to use them.
Thanks for the comment
@Zvonko — memcached is not persistent, but memcachedb is.
Sorry, my mistake. Firm which produced memcached, published persistent key/value storage called Membase: http://www.membase.org/
I see that memcachedb also exist. And is persistent.
I’d rather store the key – value pairs in PHP array and in the database will have only one table users: with plus one more field which will be text (config_options) and will keep the serialized (use serialize() function) array from php in the database as string.
When you take it out from the database (without JOINs) use unserialize() and you will have your PHP array.
I think it’s much faster solution than joining queries from two tables which can get big and without proper indexing it will get slower.
Just to add that you can keep as many as you want config options pairs of key values and you can add unlimited fields to your key = value pair, for example language_id (to translate every key, value pair), and all of that in just one text field.
I am using LEFT JOINS too, but always trying to avoid it when i can
@Zoran Nice, but how would you search a user using one of the keys in array as condition?
you can make a simpler sql statement like this:
SELECT username FROM users a
where id in (select user_id from users_data where
(b.key = ‘city’ AND b.value = ‘Chicago’)
AND
(c.key = ‘state’ AND c.value = ‘Illinois’) )
Using subselects is a good option for performance and cleaner coding.
I’m using this in postgresql in bdigital.uncu.edu.ar
Add entity on EAV, Write datas on EAV and JSON, search information on EAV (thx horaciod) with SQL (sphinx or Lucene too), read datas with JSON.
Just one more table with all the datas encoded with JSON ?
or you could use prepared statements or mysql stored procedures. With some views it’s easy to do a where statement.
SET @sql = NULL;
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
‘MAX(IF(metaKey = ”’,
metaKey,
”’, metaValue, NULL)) AS ‘,
metaKey
)
) INTO @sql
FROM appDataEntryValues;
SET @sql = CONCAT(‘SELECT entryID, ‘, @sql, ‘ FROM appDataEntryValues GROUP BY entryID’);
PREPARE stmt FROM @sql;
EXECUTE stmt;
this sql doesn’t use views it’s just a prepare statement. I haven’t yet figured out how to make it work with php.
the sql for the table i used is
CREATE TABLE `appDataEntryValues` (
`id` int(20) NOT NULL AUTO_INCREMENT,
`metaKey` varchar(100) DEFAULT NULL,
`metaValue` varchar(150) NOT NULL,
`entryID` bigint(20) NOT NULL,
PRIMARY KEY (`id`)
)
Comments are closed.