Database Design Using Key-Value Tables

In the following article we are going to cover the use of key-value tables in a database design. They can help you with flexibility in design, but will trip you up in other areas (such as data filtering) if you are not careful. I assume that the audience for the following article is familiar with basic traditional relational database design and SQL language; nevertheless, someone that is not familiar with it should be able to comprehend it from the walkthrough example of this article.

Key-value tables – what is all about?

Consider the following example table:

 

CREATE TABLE EMPLOYEES (
ID          INT               NOT NULL,
NAME        VARCHAR (10)      NOT NULL,
SALARY      INT               NOT NULL);

ALTER TABLE EMPLOYEES ADD CONSTRAINT EMPLOYEES_PK PRIMARY KEY (ID);

INSERT INTO EMPLOYEES (ID, NAME, SALARY) VALUES (1, ‘Dany’,
4000);
INSERT INTO EMPLOYEES (ID, NAME, SALARY) VALUES (2, ‘Brit’,
5000);
INSERT INTO EMPLOYEES (ID, NAME, SALARY) VALUES (3, ‘Sharon’,
1000);

The above script produces the following table:

EMPLOYEES

ID

NAME

SALARY

1

Dany

4000

2

Brit

5000

3

Sharon

1000

This table holds employee description, specifically employee name and salary.

You might have noticed that there is an additional column called “ID.” We use this column as a primary key, i.e. a unique identifier of each row in the table, which can never be set with a NULL value.

In a general development process, after collecting the requirements and designing the model, the DBA would create this table, and the programmer would start coding with it.

In case the requirement changes, we will need to change the table design. For example, what happens if the customer adds a new requirement? We will look at the case of a customer wanting to include additional information which describes whether the employee has a company vehicle and the year that the employee started working with the company.

{mospagebreak title=Traditional database design vs. key-value table}

Consider the following two scenarios:

Scenario no. 1 – traditional database design method

The DBA will use the following SQL statements in order to create two additional columns:

ALTER TABLE EMPLOYEES ADD HAS_COMPANY_VEHICLE INT NULL;
ALTER TABLE EMPLOYEES ADD START_WORK_YEAR INT NULL;

In our example, we will update the data that already exists in the table with the following update sql statements:

UPDATE EMPLOYEES SET HAS_COMPANY_VEHICLE = 1, START_WORK_YEAR = 2004 WHERE ID = 1;
UPDATE EMPLOYEES SET HAS_COMPANY_VEHICLE = 0, START_WORK_YEAR = 2004 WHERE ID = 2;
UPDATE EMPLOYEES SET HAS_COMPANY_VEHICLE = 1, START_WORK_YEAR = 2005 WHERE ID = 3;

This will make our table look like this:

EMPLOYEES

ID

NAME

SALARY

HAS_COMPANY_VEHICLE

START_WORK_YEAR

1

Dany

4000

1

2004

2

Brit

5000

0

2004

3

Sharon

1000

1

2005

We might want to be able to make a new descriptive change without altering the database schema structure. In order to be able to do that, we can use key-value tables as described in the following scenario.

Scenario no. 2 – use of key-value table

The following scenario describes a generic approach to the challenge.

The DBA can create an additional table which will hold any additional information that describes the employee better. Such a table can be produced by the following sql statements:

CREATE TABLE EMPLOYEE_ATTRIBUTES (
EMPLOYEE_ID       INT               NOT NULL,
KEY1              VARCHAR (100)     NOT NULL,
VALUE1            VARCHAR (100)     NOT NULL);

ALTER TABLE EMPLOYEE_ATTRIBUTES ADD CONSTRAINT EMPLOYEE_ATTRIBUTES_PK PRIMARY KEY (EMPLOYEE_ID, KEY1);

Updating existing data in our example can be done using the following insert sql statements:

INSERT INTO EMPLOYEE_ATTRIBUTES (EMPLOYEE_ID, KEY1, VALUE1) VALUES (1, ‘HAS_COMPANY_VEHICLE’, ‘1’);
INSERT INTO EMPLOYEE_ATTRIBUTES (EMPLOYEE_ID, KEY1, VALUE1) VALUES (1, ‘START_WORK_YEAR’, ‘2004’);
INSERT INTO EMPLOYEE_ATTRIBUTES (EMPLOYEE_ID, KEY1, VALUE1) VALUES (2, ‘HAS_COMPANY_VEHICLE’, ‘0’);
INSERT INTO EMPLOYEE_ATTRIBUTES (EMPLOYEE_ID, KEY1, VALUE1) VALUES (2, ‘START_WORK_YEAR’, ‘2004’);
INSERT INTO EMPLOYEE_ATTRIBUTES (EMPLOYEE_ID, KEY1, VALUE1) VALUES (3, ‘HAS_COMPANY_VEHICLE’, ‘1’);
INSERT INTO EMPLOYEE_ATTRIBUTES (EMPLOYEE_ID, KEY1, VALUE1) VALUES (3, ‘START_WORK_YEAR’, ‘2005’);

This would yield the following table:

EMPLOYEE_ATTRIBUTES

EMPLOYEE_ID

KEY1

VALUE1

1

HAS_COMPANY_VEHICLE

1

1

START_WORK_YEAR

2004

2

HAS_COMPANY_VEHICLE

0

2

START_WORK_YEAR

2004

3

HAS_COMPANY_VEHICLE

1

3

START_WORK_YEAR

2005

This information is stored in a more generic structure, allowing the programmer to add employee attributes on his own as well as create a mechanism that adds employee attributes at run-time.

The data type of the two new columns, KEY1 and VALUE1, should be VARCHAR so it can hold both character or numeric data. It is possible to limit the data type only to numeric data if we want to for some reason.

When a new description item for an employee will be required, the programmer would simply need to add this attribute by inserting a new record to EMPLOYEE_ATTRIBUTES table.

Consider the following insert statement as a new description item, describing the sex type of the employee named Dany.

INSERT INTO EMPLOYEE_ATTRIBUTES (EMPLOYEE_ID, KEY1, VALUE1) VALUES (1, ‘SEX’, ‘MALE’);

For completion of the above example you can use the following SQL statements:

INSERT INTO EMPLOYEE_ATTRIBUTES (EMPLOYEE_ID, KEY1, VALUE1) VALUES (2, ‘SEX’, ‘FEMALE’);
INSERT INTO EMPLOYEE_ATTRIBUTES (EMPLOYEE_ID, KEY1, VALUE1) VALUES (3, ‘SEX’, ‘FEMALE’);

The benefits of such a model are:

–     High flexibility – as we can see, adding a new attribute at a late design phase is very simple. We can even create a mechanism that adds attributes according to real time configuration.

–     We do not need to deal with data types when designing the schema.

–     Some database types has a record length limitation, e.g. MS SQL Server’s 2000 maximum record length is 8K. This means that, if we have five attributes that are each 2000 characters long, our only option is to add those attributes to an additional table, as we did with EMPLOYEE_ATTRIBUTES table.

Regarding the drawbacks, they are a little complex, so we are going to drill down into each one of them in the following paragraphs of this article.

{mospagebreak title=Drawback no. 1: No real use of database data types}

When designing a database table, in most cases the design includes exact data types. This is important because the use of correct data types can help enforce data integrity.

In our EMPLOYEES example, the VALUE1 column is able to contain any value that is less than or equal to 100 charters length.

Consider the following sql statement:

UPDATE EMPLOYEE_ATTRIBUTES SET VALUE1 = ‘AAA’ WHERE EMPLOYEE_ID = 1 AND KEY1 = ‘START_WORK_YEAR’;

This statement will execute without success and the result would be as follows:

EMPLOYEE_ATTRIBUTES

EMPLOYEE_ID

ATTRIBUTE_ID

KEY

VALUE

1

2

START_WORK_YEAR

AAA

Of course, having our data describing that employee no. 1 (Dany) has started working on date ‘AAA’ is indeed a data integrity failure.

In the case of scenario 1, the equivalent statement would be:

UPDATE EMPLOYEES SET START_WORK_YEAR = ‘AAA’;

This of course will raise a database error saying that the data provided for column START_WORK_YEAR has a data type mismatch.

Another data type issue would arise when we will need to cast the string VALUE1 to a numeric data type in order to perform some mathematic computations with it.

These problems can be worked around by defining a column of NUMERIC data type for numeric values and another column of data type VARCHAR for alphanumeric values, but it will never be as good as having the exact data type that matches our data.

{mospagebreak title=Drawback no. 2: Awkward use of database constraints}

Database constraints help us keep our data correct.

In the above (EMPLOYEES) example, assuming that our company was founded in 2004, let’s figure that we want to keep our data integrity in such a way that the employee start working date will always be between 2004 and 2005.

All we have to do in scenario no. 1 is add a check constraint that will enforce this rule. Check constraints place a value restriction on the contents of a column, allowing the table owner to define which exact values are permitted for this specific column. Such a constraint creation would look like this:

ALTER TABLE EMPLOYEES ADD CONSTRAINT START_WORK_YEAR_CK CHECK (START_WORK_YEAR >= 2004 AND START_WORK_YEAR <= 2005);

By defining such a constraint, a DML attempting to update a value of the column START_WORK_YEAR with values above 2005 or below 2004 would fail, with an error indicating that the update failed due to our check constraint (START_WORK_YEAR_CK).

In scenario 2, there is also an option to add such a database constraint. Consider the following constraint added:

 

ALTER TABLE EMPLOYEE_ATTRIBUTES ADD CONSTRAINT START_WORK_YEAR_CK
CHECK (     1 = CASE KEY1
WHEN ‘START_WORK_YEAR’ THEN
CASE VALUE1
WHEN ‘2004’ THEN 1
WHEN ‘2005’ THEN 1
ELSE 0
END
ELSE 1
END
);

As you can see, the above constraint use is possible but cumbersome, and most important, it will force us to add a constraint for each new ATTRIBUTE, which means that the model is no longer generic.

{mospagebreak title=Drawback no. 3: Problematic use of AND operator when relating to more than one attribute in the query criteria}

This is the most problematic issue that needs to be taken into consideration. Since it is complex, let’s start with an example. Consider the following report (report 1):

Fetch all employee names and their start working date, that had started working in 2004. The result of course should look like the following table:

NAME

START_WORK_YEAR

Dany

2004

Brit

2004

In scenario 1, the following query will provide the required results:

SELECT NAME, START_WORK_YEAR
FROM EMPLOYEES
WHERE START_WORK_YEAR = 2004;

In scenario 2, the following query will provide the required results:

SELECT E.NAME, EA.VALUE1 AS START_WORK_YEAR
FROM EMPLOYEES E INNER JOIN EMPLOYEE_ATTRIBUTES EA
    ON (E.ID = EA.EMPLOYEE_ID)
WHERE KEY1 = ‘START_WORK_YEAR’ AND VALUE1 = ‘2004’;

But in case we want to check more than one attribute in the criteria, and the relation between the two attributes would need to be defined by an AND operator, we need to be careful not to fall into the following trap. Consider the following report (report 2):

Fetch all employee names, and their start working date that are males AND had started working in 2004. The result of course should look like the following table:

In scenario 1, the following query will provide the required results:

SELECT NAME, START_WORK_YEAR
FROM EMPLOYEES
WHERE START_WORK_YEAR = 2004 AND SEX = ‘MALE’;

NAME

START_WORK_YEAR

Dany

2004

In scenario 2, many developers fall into the trap and introduce the following query as the report implementation:

SELECT E.NAME, EA.VALUE1 AS START_WORK_YEAR
FROM EMPLOYEES E INNER JOIN EMPLOYEE_ATTRIBUTES EA 
    ON (E.ID = EA.EMPLOYEE_ID)
WHERE (EA.KEY1 = ‘START_WORK_YEAR’ AND EA.VALUE1 = ‘2004’) 
    AND (EA.KEY1 = ‘SEX’ AND EA.VALUE1 = ‘MALE’);

The above query will provide the following results:

NAME

START_WORK_YEAR

We get an empty result-set!

The reason for it is that no record in the Cartesian product of EMPLOYEES and EMPLOYEE_ATTRIBUTES fulfill both conditions of

(EA.KEY1 = ‘START_WORK_YEAR’ AND EA.VALUE1 = ‘2004’)
AND (EA.KEY1 = ‘SEX’ AND EA.VALUE1 = ‘MALE’)

In order to better understand this issue, let’s look at what the database engine is doing:

First the engine creates the following Cartesian product of EMPLOYEES and EMPLOYEE_ATTRIBUTES:

E.ID

E.NAME

E.SALARY

EA.EMPL-
OYEE_ID

EA.KEY

EA.VALUE

1

Dany

4000

1

HAS_COMPANY_VEHICLE

1

1

Dany

4000

1

START_WORK_YEAR

2004

1

Dany

4000

1

SEX

MALE

2

Brit

5000

2

HAS_COMPANY_VEHICLE

0

2

Brit

5000

2

START_WORK_YEAR

2004

2

Brit

5000

2

SEX

FEMALE

3

Sharon

1000

3

HAS_COMPANY_VEHICLE

1

3

Sharon

1000

3

START_WORK_YEAR

2005

3

Sharon

1000

3

SEX

FEMALE

Then the database engine removes records which do not comply with the where criteria. As we can see, there is no record which complies with both:

(EA.KEY1 = ‘START_WORK_YEAR’ AND EA.VALUE1 = ‘2004’)
AND (EA.KEY1 = ‘SEX’ AND EA.VALUE1 = ‘MALE’)

The above pitfall can be workaround using the following query:

SELECT E.NAME, EA1.VALUE1 AS START_WORK_YEAR
FROM EMPLOYEES E INNER JOIN EMPLOYEE_ATTRIBUTES EA1 
    ON (E.ID = EA1.EMPLOYEE_ID) 
    INNER JOIN EMPLOYEE_ATTRIBUTES EA2 
        ON (EA1.EMPLOYEE_ID = EA2.EMPLOYEE_ID)
WHERE (EA1.KEY1 = ‘START_WORK_YEAR’ AND EA1.VALUE1 = ‘2004’) 
    AND (EA2.KEY1 = ‘SEX’ AND EA2.VALUE1 = ‘MALE’);

In this case the database engine will create the following Cartesian product and select the marked record using the criteria clause:

E.
ID

E.
NAME

E.
SALARY

EA1/2.
EMPL-
OYEE_ID

EA1.
KEY

EA1.
VALUE

EA2.
KEY

EA2.
VALUE

1

Dany

4000

1

SEX

MALE

SEX

MALE

1

Dany

4000

1

SEX

MALE

START_WORK_YEAR

2004

1

Dany

4000

1

START_WORK_YEAR

2004

SEX

MALE

1

Dany

4000

1

START_WORK_YEAR

2004

START_WORK_YEAR

2004

2

Brit

5000

2

SEX

FEMALE

SEX

FEMALE

2

Brit

5000

2

SEX

FEMALE

START_WORK_YEAR

2004

2

Brit

5000

2

START_WORK_YEAR

2004

SEX

FEMALE

2

Brit

5000

2

START_WORK_YEAR

2004

START_WORK_YEAR

2004

3

Sharon

1000

3

SEX

FEMALE

SEX

FEMALE

3

Sharon

1000

3

SEX

FEMALE

START_WORK_YEAR

2005

3

Sharon

1000

3

START_WORK_YEAR

2005

SEX

FEMALE

3

Sharon

1000

3

START_WORK_YEAR

2005

START_WORK_YEAR

2005

Conclusion

We can use key-value tables in a database design, but we should keep in mind that the same flexibility which has been enabling us to add attributes in a quick and easy way can also cause data integrity errors.

Furthermore, using key-value tables is recommended mostly with attributes which are required for long descriptions, rather than attributes which are used for data filtering. 

[gp-comments width="770" linklove="off" ]

chat sex hikayeleri Ensest hikaye