New Posts New Posts RSS Feed: TechTip: Unnatural Primary Keys
  FAQ FAQ  Forum Search   Calendar   Register Register  Login Login

TechTip: Unnatural Primary Keys

 Post Reply Post Reply
Author
CODDdisciple View Drop Down
Newbie
Newbie


Joined: 05-Jul-2007
Location: United States
Posts: 1
Post Options Post Options   Quote CODDdisciple Quote  Post ReplyReply Direct Link To This Post Topic: TechTip: Unnatural Primary Keys
    Posted: 05-Jul-2007 at 1:15pm
WELL DONE! Should be required reading for every Masters in CS program (so should a lot of other DB principals).
Back to Top
IdeaBlade View Drop Down
Moderator Group
Moderator Group
Avatar

Joined: 30-May-2007
Location: United States
Posts: 353
Post Options Post Options   Quote IdeaBlade Quote  Post ReplyReply Direct Link To This Post Posted: 30-May-2007 at 4:23pm
Level 100
DevForce Express
December 13, 2006
 
Most seasoned developers and database administrators know that primary key values should be intrinsically meaningless. They exist for internal use only as permanent record identifiers. They really shouldn't be shown to users.

By such reasoning, the primary key of an Order record should be completely arbitrary. It shouldn't matter if it is "1234567" or "96b28a7b-a328-4bf3-95a0-06a29e45f582" (which many of you recognize as the string representation of a "Globally Unique Identifier" or "Guid").

Our customers - the end users of our applications - are accustomed to referring to Orders by a unique reference number. Over the years, we've trained them to like calling an order by its number, "1234567". Thus acclimated, they start to expect that the next invoice created will be numbered "1234568". They start to worry if there are gaps in the sequence - if the order jumps to "1234570", for example. "What happened to '68' and '69' ?", they cry. They'll have no patience at all for "96b28a7b-a328-4bf3-95a0-06a29e45f582" or its successor, "81063553-6686-41f4-8b50-130a4deb444d".

Most seasoned developers and database administrators know that primary key values should be intrinsically meaningless. They exist for internal use only as permanent record identifiers. They really shouldn't be shown to users.

By such reasoning, the primary key of an Order record should be completely arbitrary. It shouldn't matter if it is "1234567" or "96b28a7b-a328-4bf3-95a0-06a29e45f582" (which many of you recognize as the string representation of a "Globally Unique Identifier" or "Guid").

Our customers - the end users of our applications - are accustomed to referring to Orders by a unique reference number. Over the years, we've trained them to like calling an order by its number, "1234567". Thus acclimated, they start to expect that the next invoice created will be numbered "1234568". They start to worry if there are gaps in the sequence - if the order jumps to "1234570", for example. "What happened to '68' and '69' ?", they cry. They'll have no patience at all for "96b28a7b-a328-4bf3-95a0-06a29e45f582" or its successor, "81063553-6686-41f4-8b50-130a4deb444d".

 
Key Misery

The customer demand for an uninterrupted progression of reference numbers is at cross purposes with our need as developers for programming flexibility and primary key stability.

We want to set a record's primary key upon insertion and never change it again. If the user creates a temporary record and then deletes it, we don't want to care that we just burned the next id.

It's a dead certainty that the customers' affection for meaningful primary keys is going to plague us. Next thing you know, they're going to ask us to fill in the sequence gaps. Then they will want to prepend the reference number with the year of the order ("06-1234567") and suffix it with the customer initials ("06-1234567-IB"). If the customer changes its name, they will want us to update the initials ("06-1234567-JC").

Did I mention that the customer wants to sort the orders by reference number? Imagine the fun we'll have sorting a string consisting of a mix of alpha decorations and an integer counter; one of my clients requires that orders sort by integer value, regardless of prefix, yielding sequences such as "A23", "A123", "B124", "A1123". No "leading zero" trick allowed!

"Surrogate" vs. "Natural" Keys

The customer is always right.

Sure, the customer vaguely understands that our pain will become his pain if his requirements become too demanding. He sort of understands that messing with the primary key could lead to bugs and data integrity problems. But, at the end of the day, that is our problem, not his.

Our best hope lies in damage control. Damage control begins with a distinction between the "surrogate key" and the "natural key".

The "surrogate key" is the permanent, arbitrary primary key that keeps our code and data consistent and simple. The "natural key" is the protean beast our customer loves. We can finesse the key misery problem if we can store both values in our database tables.

Woe to the developer whose database schema is fixed. I offer no words of comfort, no path to salvation.

If you are fortunate and can change the database even a little, this tip is for you.

Separating Surrogate and Natural Keys

I'm going to assume that you have an existing database and that you want to make as few changes as necessary.

In most applications, there are only a small number of tables with natural keys; the customer doesn't care about the ids of code tables (e.g., OrderStatus) or child tables (OrderItemDetail). The tables with natural keys tend to have comparatively few records; there are many more OrderItemDetail records than parent Order records.

So our approach will waste some table space in order to achieve simplicity and flexibility.

Create a script to do the following to each table with a natural key:
  1. Commit to yourself and the world that the current primary key is permanent and inviolate.
  2. Add a Natural Key column, preferrably a string.
  3. Add a Natural Key Sort column (typically an integer type).
  4. Copy the primary key values into the two new columns (converting the primary key to string for the Natural Key column).
  5. Constrain the new columns to be non-null.
  6. Constrain the new columns to be unique.
  7. Add an insert trigger to set the values of the new columns (more about this shortly).
You play your scripts and update your development database. You confirm that you get exactly what you expected. You check the scripts into source control.

The Insert Trigger

I'll get to the definition of the insert trigger in a moment. Let's establish the game plan first.

While a natural key may change, it isn't going to change often. Our objective is to establish a routine for creating the natural key when we insert a new record into the table.

The new value should be unique and should avoid sequence gaps. The easiest way to satisfy these constraints is to let the data tier calculate the natural key at the moment of insertion. That's the job of the trigger.

DevForce will play along nicely as we'll soon see.

Rebuild the Business Object Model

You have to rebuild the business model to pick up both new natural key column properties. That's easy.
  1. Launch the DevForce Object Mapper.
  2. Confirm that you want it to update the model based on the schema changes.
  3. Include the two new columns.
  4. Save and close the mapper.
  5. Rebuild the Model project.
  6. Update the UI to use the Natural Key Property
You are probably displaying the primary key right now. You want to switch to the natural key.

Fortunately, your application binds your UI controls to business object properties and these properties are easy to find. You should be able to search and replace references to the primary key property with the natural key property, as in replacing "OrderId" with "OrderNumber".

Obviously you'll be careful about preserving the few places where you really want the primary key.

Saving a New Business Object

This part is automatic. You won't have to change your code at all. Here's the save sequence.
  1. The client application indicates its intention to save the new object (e.g., user presses the "Save" button).
  2. DevForce id generation sets the permanent primary key of the new object (the key was temporary until this point).
  3. DevForce id fixup corrects all other entity references to the new object so that their foreign key values now reflect the new, permanent id.
  4. The natural key properties in the new object may be null or may have temporary values; it doesn't matter.
  5. DevForce requests that the database insert the new object.
  6. The database insert trigger fires, calculates the natural key and natural key sort values, and replaces those values in the newly inserted record.
    DevForce re-reads the inserted record.
  7. DevForce converts the updated record back into a business object.
  8. DevForce returns the business object to the client.
  9. The client UI refreshes, revealing the natural key.

Step #7 is what makes this process so easy. DevForce always re-reads the entity after saving - both on inserts and updates - in case a database trigger updates the record. That's what happens here.

Inside the Insert Trigger

Your trigger will be simple if you're lucky.

Suppose the customer requires that the order reference number be an integer that increments without gaps. The ideal solution is to define the Natural Key Sort column as an autoincrement column (assuming MS SQL Server).

Note:
Add a step to your SQL script: you will initiallize the Natural Key Sort Column the seed to the next available integer.

The insert operation automatically sets the next Natural Key Sort Column value via the autoincrement; the insert trigger merely converts this to a string and inserts it into the Natural Key Column.

Note:
By the way, autoincrement columns do a decent job of reducing sequence gaps but they are not a guarantee. See this link (http://databases.aspfaq.com/database/why-are-there-gaps-in-my-identity/autoincrement-column.html) for an account of ways that gaps can creep back into Natural Key values.

A slight variation on this scheme will satisfy the scenarios we considered earlier in which extra information is prepended (the year) and appended (the customer initials) to an integer value that otherwise increments by one with each insert. Your task is to write a function that reads the Natural Key Sort Column value and returns the calculated string; the trigger stores this string into the Natural Key Column.

More exotic scenarios require more extreme calculations. You may have to abandon the autoincrementing Natural Key Sort Column in favor of a next-integer-key lookup table. This is why they pay you the big money.

The essential points are (a) that you have separated natural key generation from surrogate key generation and (b) you've consolidated the logic into a function and trigger on the data tier.

Why both a Natural Key and Natural Key Sort Column?

The Natural Key is superflous in the simple case wherein the reference number is an integer. You don't really need both columns; you can convert the Natural Key Sort column into a string in the client UI. You can add a custom property ("OrderNumber") to the business object for this purpose.

The wheels will come off when your customer demands complex decoration of the reference number.

You may think you are still in luck if you can calculate the reference number on the fly (as you could in the scenarios we've described). Your Natural Key property is a tad more complicated but not by much.

This approach can get you in trouble. The reference number will change if the user modifies the order date or the customer name; "06-1234567-IB" becomes "07-1234567-JC" . That may be ok but I'll bet your customer is not going to be happy when reconciling printed invoices and payments to the altered order reference number.

Note:
At least your program is safe because the data for orders, invoices, and payments are linked internally via the permanent surrogate keys of these records!

I suggest that you inscribe the calculated Natural Key value into the record rather than calculate it in the UI. You can run a database scrub process to update the Natural Key column when (and if) your customer decides to "corrrect" the Natural Key to reflect year and customer name changes.

Keep the Natural Key Sort Column

You may be tempted to keep the Natural Key column and do away with the Natural Key Sort column. After all, once you've generated the Natural Key, it can be sorted easily right?

My bet is your luck will not hold and the customer is going to require complex sorting rules that are tough to implement. A designated sort column makes life easier for your application and for other applications (including server side reports) that should all apply the same sorting rules consistently. It doesn't cost much; leave it in the data.

Conclusion

It's never easy to keep your customer happy and keep your sanity. Separating surrogate keys from natural keys can help. Use a trigger, rely on Object Mapping, and let the DevForce persistence implementation carry the load.

Happy Coding!



Edited by IdeaBlade - 30-May-2007 at 6:36pm
Back to Top
 Post Reply Post Reply

Forum Jump Forum Permissions View Drop Down