This is an edited version of an argument I wrote in hopes of convincing some DBA’s to start adopting surrogate keys. This was for an Oracle shop, hence the heavy use of of Sequence speak, but the arguments are pretty much the same for any DB. We’re also using NHibernate, so that tool is also talked about here, however, other popular ORM frameworks will benefit from the arguments as well.
There’s long been a debate amongst practitioners as to what is better – a natural key, which is often a composite, or a surrogate key. Application and Database Developers tend to favor surrogate keys for their simplicity and ease to work with while DBA’s often favor natural keys for the same reasons. There are many arguments from both sides of the debate, each having validity. As with any decision, the “right” choice depends on the level of risk, cost and ROI.
A surrogate key, or artificial key, is a database field that acts as the primary key for a table but has no real meaning in the problem domain. Surrogate keys are typically in the form of an auto-incrementing integer (Sequence/Identity) or a UUID/GUID.
- The primary key values never change, so foreign key values are stable.
- All of the primary key and foreign key indexes are compact, which is very good for join performance.
- The SQL code expression of a relationship between any two tables is simple and very consistent.
- Data retrieval code can be written faster and with fewer bugs due to the consistency and simplicity provided by a surrogate key.
- With surrogate keys there is only one column from each table involved in most joins, and that column is obvious.
- Object Relational Mapper (ORM) frameworks, such as NHibernate, SubSonic, LLBLGEN and others are designed to work optimally with surrogate keys, offering much simpler implementations over composite keys.
- Allows for a higher degree of normalization since key data doesn’t need to be repeated.
- In order to guarantee data integrity, a unique index must be created for the fields that would have made up the composite key. This can increase administrative overhead.
- If a sequence/identity is used for the surrogate key, it must be created, which can increase administrative overhead.
- In Oracle, Sequences can have slight performance penalties, typically realized only under very heavy load, when a proper caching strategy is not utilized.
- Tables can be perceived by some as “cluttered” when an extra column of meaningless data is added.
- The primary key of the table has no real business meaning.
- The natural key values are often included in the WHERE clause, although not part of the join semantics.
- Extra disk space is needed to store the key values, although a sequence will account for only 8 bytes.
A natural key is the true unique indicators of a database record. It is this value, or composite values, that have business meaning and allow applications to distinguish one row from another. Unlike the surrogate key, the natural key can be one or more columns of any type. Examples include [social security number], [last name, date of birth, phone number]
- Natural keys have real business meaning and identify unique records by nature.
- When there is no surrogate key, there is no need to create unique indexes or sequences, thereby reducing administrative overhead.
- Fewer sequences and database objects give DBA’s less to worry about.
- Reduced performance concerns that could result from mismanaged sequences.
- Reduced disk space usage.
- Querying with joins can become more complicated as multiple columns are involved.
- The use of date fields in keys often requires error prone casting to write queries.
- The keys can change which can cause a ripple effect of breaking queries and require participating tables to be updated.
- Reduced form of normalization since key values will be duplicated throughout tables.
- Keys names and types are inconsistent which may require developers to visually inspect table definitions to understand how to query.
- Makes application development that interacts with a database more complex and time consuming due to the semantics of the keys and how they join to other tables.
- Makes using ORM frameworks very difficult and time consuming because they are designed to work best with surrogate keys.
Common Arguments against Surrogate Keys
- Using a sequence in Oracle will decrease performance.
It is true that using a sequence for a surrogate key comes with some overhead. This overhead is very minimal and apparent only under heavy load. Using a sequence caching strategy, however, can greatly improve performance issues associated with sequence generation.
- Using a surrogate key means more data has to be stored which will require more disk space. Disk space is cheap these days, but the cost of a software developer is not. Hardware costs will continue to decrease over time while the cost of developer staff will continue to rise.
- The record already has a meaningful, natural key. The key will be maintained in the form of a unique index, which provides the benefits of the natural key with the benefits of a surrogate key.
- Software development tools, such as ORMs, shouldn’t dictate database architecture. Although using surrogate keys opens many doors for the use of an ORM, there is also significant benefits beyond (see the list of Pros). A database is a place to store data used by applications. Without applications, the database offers little value. Because the database exists solely to support applications, it stands to reason that there is great benefit in optimizing the database to work with the applications that offer the real value to the business.
Object Relational Mapping
Why use an ORM
Using an ORM, such as NHibernate, provides significant value to applications and application developers. It removes the need to write tedious and time consuming database access code. It also optimizes queries and makes database query code more consistent and easy to read and write. In addition, an ORM manages database sessions and provides a consistent way for reusing connections while employing data caching and lazy loading to reduce unnecessary traffic. When implemented correctly, an ORM can save a tremendous amount of developer time and remove countless database related concerns during application development.
ORM’s with a Natural Key
Because ORM’s are designed to work with surrogate keys, it takes substantial effort and testing to “fit the square peg in the round hole”. Below is an outline of the ramifications of using natural keys in NHibernate (NH) and how to get around them.
|Natural Key Issue||Impact||Work Around|
|Because natural keys are assigned by the application, NH has no way to know if a record is a new or existing record (insert vs. update).||Upon saving an object or group of objects, NH must query the database first to determine how the record should be persisted.||Using a version column to each table, such as a timestamp that changes automatically with each update, can eliminate this. Without a version column, the extra trip to the database can’t be avoided.|
|NH defaults its queries to use a parent’s primary key as the foreign key into related records.||Loading related tables doesn’t “just work” out of the box. Lazy loading is lost and the developer must explicitly retrieve child objects with hand-written code.||To get around this, it may be possible to force developers to use hand-written XML configuration files and write queries that take the expected parameters but use them in an unexpected way to get the desired result. For example, when a needless date parameter is passed, the clause may contain a statement like “where ‘1/1/1800’ NOT EQUAL olicyDate”. Where this isn’t an option, hand written code will have to be written and called.|
Surrogate keys offer many benefits when a database is used for software development. Aside from the simplicity, consistency and stability it also makes the use of an ORM extremely viable. That’s not to say that they come without a price. This price falls mainly in the area of database administration and is relatively low, especially when weighed against cost benefits of using them. Since the databases exists to support applications, optimizing the database for that purpose seems like a sensible choice. Using an ORM in software development can have an extremely positive impact on not only development time but also quality. Both of these reasons lead to an increased bottom line in the form of lower development costs and decreased cost of ownership.
“Humans should use natural keys, and computers should use machine-generated surrogate keys”
- Jeffrey Palermo, CIO HeadSpring Systems, Austin TX