Data is awesome.
Maybe it would be more accurate to say that having an ability to sift through mounds of data and come up with a specific answer or conclusion is an awesome ability to have.
In my experience as a developer, building queries and logic into databases has always been one of my favorite types of development.
Linking together disparate pieces of data into something functional and usable is really cool.
However, one aspect of data that was drilled into my head as a young Engineer was the ability to audit data.
Who did What and When did they do it?
Now, years later I find myself building an integrated custom data solution for Product and Project Management, Time Entry, and Product Release Management.
So where did I start? With the data of course.
The following is a quick crash course in setting up tables and functionality in a relational database for audit purposes. This should be generally applicable to most relational database systems out there.
For this code, SQL Server will be used, however, this same methodology can be easily modified to run on most systems.
Before we get going, a few rules need to be defined:
- Rule 1 – All database tables must have a unique identifier column. Here the term sequence is used, typically with a
_seq
identifier. This sequence is always unique and is always present in every table even if other data columns are also considered to be unique - Rule 2 – Table, Column, Stored Procedure, and other database elements naming following a naming standard. I have my own standard that I made up that makes sense to me. You should come up with one at the beginning of your development. This makes a huge difference in development speed and in understanding.
- Example 1 – I like to group things together in my gui. So each table gets a standard preface so that they are logically grouped. For this instance, my table names will all start with
DEMO_<tablename>
- Example 2 – I prefer to keep column names unique. It does not make any sense to me to have a ‘field_id’ column in one table be of type
integer
and then to also have a ‘field_id’ column in another table be of typenvarchar
. Ugh. That creates headaches.
- Example 1 – I like to group things together in my gui. So each table gets a standard preface so that they are logically grouped. For this instance, my table names will all start with
- Rule 3 – Coding and scripting up front save a lot of time down the road. Yes, the code below may seem like a lot, but I prefer the ‘fire-and-forget’ style approach. I like to write it, make sure it works, then forget about it
Now the rules are set so let’s get coding! Remember, we are build an audit structure so the goal is to be able to define ‘Who did What and When did they do it?‘
Step 1 – Define the ‘Who‘
We need to start by defining the person making the data changes. For this instance, I want to avoid user management within the database itself.
Why? Because I know that all of my data access will not be through database user credentials. I don’t want to set up and manage users inside my database. Since this system will eventually be used with a no-code front end based in Microsoft Powerapps and Microsoft Power Automate, those systems will already have user management in place.
So how do we handle user management in this case to define the ‘Who‘?
The answer was provided above, you just may not have realized it. You will need to do a little bit more design on your system before you can really determine how to manage users. I have already made those decisions – users will be coming from different Microsoft systems. I also know that from those systems I have access to a few elements of that user:
User().Email
User().FullName
User().Image
Here we will ignore the FullName and Image. The full name data might not be unique in all instances and I don’t really want to mess with blobs or image files right now. Also, not everyone may have an image in that system or it might not be a good identifier.
That leaves us with the email address. This is a good identifier, and it should be unique. So now we have a starting point.
The first thing to do is to set up a User table.
The simplest and fastest thing to do would be to set up a 2 column table like this:
CREATE TABLE [DBO].[DEMO_Users](
[user_seq] [INT] NULL,
[user_email] [NVARCHAR](100) NOT NULL
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Ok – stop here. The above code works, but it will need a lot of work to answer the target of ‘Who did What and When did they do it?‘
First of all, we want [user_seq]
to be unique and also to be the primary key for the table. So let’s update it to this:
CREATE TABLE [DBO].[DEMO_Users](
[user_seq] [INT] IDENTITY(1,1) NOT NULL,
[user_email] [NVARCHAR](100) NOT NULL,
CONSTRAINT [PK_DEMO_Users] PRIMARY KEY CLUSTERED
(
[user_seq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Getting better. We have now fulfilled Rule #1 above.
But what if we want more information about a user? That is easy, we add some more columns. This is also a decision point. Is the data being added a single data point for each user? Or is it possibly multiple data points per user? This is where I try and define whether to add more columns to this table or to add a child table. In this case, let’s add the User’s first name, birthday, and favorite color because we might want to use those later on.
Now our table definition looks like this:
CREATE TABLE [DBO].[DEMO_Users](
[user_seq] [INT] IDENTITY(1,1) NOT NULL,
[user_email] [nvarchar](100) NOT NULL,
[user_first_name] [NVARCHAR](50) NOT NULL,
[engineer_birthday] [DATE] NULL,
[user_fav_color] [NVARCHAR](50) NOT NULL,
CONSTRAINT [PK_DEMO_Users] PRIMARY KEY CLUSTERED
(
[user_seq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Great! Step 1 is complete. We have a solid definition of the user in a data table, answering the question of ‘Who‘.
Step 2 – Define the ‘When‘
We are going to jump over the ‘What‘ and go straight to the ‘When‘.
At this point we should stop and take a look at the SQL statements to insert and update data in our Users table. Keep these statements in mind as we move forward. For the update, we know the email won’t change as it is unique to the external system. Here we will make a business logic assumption that the user’s first name and birthday won’t change, making the update much simpler.
INSERT:
INSERT INTO [DBO].[DEMO_Users]
([user_email], [user_first_name], [engineer_birthday], [user_fav_color])
VALUES
('test@kevinwanke.com', 'Kevin', '1/1/2000', 'green');
UPDATE:
UPDATE [DBO].[DEMO_Users] SET [user_fav_color] = 'blue' WHERE [user_seq] = 1;
Now we get into storing data on when things occur. Here we will add 2 new columns to the table.
These columns are: [user_cre_ts]
and [user_upd_ts]
.
The first column, [user_cre_ts]
, can be read as: User Create Timestamp. This column is simply a timestamp showing when the overall record is created. It is initialized when the record is inserted and should not be changed or modified beyond that point.
The second column, [user_upd_ts]
, can be read as: User Update Timestamp. If you are guessing that this is a timestamp field that is updated anytime the record changes then you nailed it. One thing to note here, the code that we will be adding to the table treats the initial insert as an update meaning that the update timestamp will equal the create timestamp until a subsequent update is done.
The updated table creation code looks like this:
CREATE TABLE [DBO].[DEMO_Users](
[user_seq] [INT] IDENTITY(1,1) NOT NULL,
[user_email] [nvarchar](100) NOT NULL,
[user_first_name] [NVARCHAR](50) NOT NULL,
[engineer_birthday] [DATE] NULL,
[user_fav_color] [NVARCHAR](50) NOT NULL,
[user_cre_ts] [datetime] NULL,
[user_upd_ts] [datetime] NULL,
CONSTRAINT [PK_DEMO_Users] PRIMARY KEY CLUSTERED
(
[user_seq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [DBO].[DEMO_Users] ADD CONSTRAINT [CNST_DEMO_Users_user_cre_ts] DEFAULT (getdate()) FOR [user_cre_ts]
GO
ALTER TABLE [DBO].[DEMO_Users] ADD CONSTRAINT [CNST_DEMO_Users_user_upd_ts] DEFAULT (getdate()) FOR [user_upd_ts]
GO
CREATE TRIGGER [dbo].[TRG_DEMO_Users_AfterUpdate] ON [DBO].[DEMO_Users]
AFTER UPDATE
AS
UPDATE DMOUSR SET [user_upd_ts] = GETDATE()
FROM [dbo].[DEMO_Users] as DMOUSR INNER JOIN inserted as INS ON DMOUSR.user_seq = INS.user_seq;
GO
So what the heck did we just add here?
The two ALTER TABLE
lines can be read as: Alter the table to add a constraint that sets the default value for the column to getdate()
– which means that it is adding in the current timestamp. All these lines are doing is that they are setting the current timestamp value into these two column when an insert occurs. These constraints are also why you don’t need to insert a value into these columns on an insert statement.
The CREATE TRIGGER
block creates a trigger event that runs after an update to a row on the table. It will update the [DEMO_Users]
table to set the [user_upd_ts]
equal to the current timestamp after the update is completed.
That is about it. With these pieces in place we have now answered the question of ‘When’. At any point in time when the table is queried each record now shows when it was created and when it was last inserted.
One last note before we move into the final step. It is worthwhile to stop and take a look at our INSERT
and UPDATE
statements again:
INSERT:
INSERT INTO [DBO].[DEMO_Users]
([user_email], [user_first_name], [engineer_birthday], [user_fav_color])
VALUES
('test@kevinwanke.com', 'Kevin', '1/1/2000', 'green');
UPDATE:
UPDATE [DBO].[DEMO_Users] SET [user_fav_color] = 'blue' WHERE [user_seq] = 1;
They haven’t changed at all. They are still nice and simple. All of the timestamps and sequences are happening behind the scenes and keep our logic nice and simple here. Awesome!
Step 3 – Define the ‘What‘
Everything up to this point is all well and good. We know the record information. We have a truly unique identifier in our Primary Key sequence. We know when the record was created. We know when it was last updated and the value of that last update.
But what if we want to know what the previous values were for the User’s favorite color?
Sure – silly example here. But in a production-level system we may often need to understand what the old data was and when it was changed, especially if we have a system where the data changes frequently and where the specific data being changed carries a lot of context for the overall system.
So, now let’s build in a way to track the favorite color changes. One quick note – we already defined business logic rules stating that we aren’t going to be changing the User Email since it comes form a different system and should not change and we have decided to not change the User First Name and User Birthday.
So the first thing to do here is to create an audit table. This will be considered to be a child table under [DEMO_Users]
as now each record in the main table will have potentially multiple update audit records. The table will look like this:
CREATE TABLE [DBO].[DEMO_User_Audits](
[user_audit_seq] [INT] IDENTITY(1,1) NOT NULL,
[user_seq] [INT] NOT NULL,
[user_fav_color_prev] [NVARCHAR](50) NOT NULL,
[user_fav_color_new] [NVARCHAR](50) NOT NULL,
[user_audit_cre_ts] [datetime] NULL,
CONSTRAINT [PK_DEMO_User_Audits] PRIMARY KEY CLUSTERED
(
[user_audit_seq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [DBO].[DEMO_User_Audits] ADD CONSTRAINT [CNST_DEMO_User_Audits_user_audit_cre_ts] DEFAULT (getdate()) FOR [user_audit_cre_ts]
GO
A few notes on this table:
- The column
[user_seq]
will be used to link to the parent[DEMO_Users]
table. Probably should throw a foreign key constraint on this one, but not defining that at this point to keep these examples simple - The column
[user_fav_color_prev]
is meant to show the value of the User’s favorite color before the change - The column
[user_fav_color_new]
is meant to show the value of the User’s favorite color after the change - The column
[user_audit_cre_ts]
shows the timestamp of the change. This will match the[user_upd_ts]
column in the parent[DEMO_Users]
table after the change until a new update is performed - There is no
[user_audit_upd_ts]
or trigger for the update here. This table is intended to be inserted into with audit data ONLY. There should never be an update to this table.
Now, at this point it may seem that the next logical step here would be to go back and update the UPDATE
trigger on the main table. We have all of the elements necessary to fill all of the data columns and it would be easy to add an INSERT
statement into the main table trigger to populate the child audit table.
However, there is one critical piece of information missing here!
Yes, Step 1 above was all about defining the ‘Who‘. The gap here is that we kind of had a scenario of what comes first, the chicken or the egg? Only in this case, what comes first, the data or the Users?
So we will set aside the User’s favorite color for now. Let’s instead create a table to hold widget information. The parent table and child audit table creation SQL looks like this:
PARENT WIDGETS TABLE:
CREATE TABLE [DBO].[DEMO_Widgets](
[widget_seq] [INT] IDENTITY(1,1) NOT NULL,
[widget_name] [NVARCHAR](100) NOT NULL,
[widget_type] [NVARCHAR](50) NOT NULL,
[widget_version] [INT] NOT NULL,
[widget_cre_ts] [datetime] NULL,
[widget_upd_ts] [datetime] NULL,
CONSTRAINT [PK_DEMO_Widgets] PRIMARY KEY CLUSTERED
(
[widget_seq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [DBO].[DEMO_Widgets] ADD CONSTRAINT [CNST_DEMO_Widgets_user_cre_ts] DEFAULT (getdate()) FOR [widget_cre_ts]
GO
ALTER TABLE [DBO].[DEMO_Widgets] ADD CONSTRAINT [CNST_DEMO_Widgets_user_upd_ts] DEFAULT (getdate()) FOR [widget_upd_ts]
GO
CREATE TRIGGER [dbo].[TRG_DEMO_Widgets_AfterUpdate] ON [DBO].[DEMO_Widgets]
AFTER UPDATE
AS
UPDATE DmoWdgt SET [widget_upd_ts] = GETDATE()
FROM [dbo].[DEMO_Widgets] as DmoWdgt INNER JOIN inserted as INS ON DmoWdgt.widget_seq = INS.widget_seq;
GO
CHILD WIDGET AUDITS TABLE:
CREATE TABLE [DBO].[DEMO_Widget_Audits](
[widget_audit_seq] [INT] IDENTITY(1,1) NOT NULL,
[widget_seq] [INT] NOT NULL,
[user_seq] [INT] NOT NULL,
[widget_name_prev] [NVARCHAR](100) NOT NULL,
[widget_name_new] [NVARCHAR](100) NOT NULL,
[widget_type_prev] [NVARCHAR](50) NOT NULL,
[widget_type_new] [NVARCHAR](50) NOT NULL,
[widget_version_prev] [INT] NOT NULL,
[widget_version_new] [INT] NOT NULL,
[widget_audit_cre_ts] [datetime] NULL,
CONSTRAINT [PK_DEMO_Widget_Audits] PRIMARY KEY CLUSTERED
(
[widget_audit_seq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [DBO].[DEMO_Widget_Audits] ADD CONSTRAINT [CNST_DEMO_Widget_Audits_widget_audit_cre_ts] DEFAULT (getdate()) FOR [widget_audit_cre_ts]
GO
Hopefully this all looks pretty straightforward at this point. Please note that there is now a [user_seq]
column in the child audits table. This is important! That is where the ‘Who‘ is defined!
What this means is that we really don’t have a fully automated method for creating the audits table. That is because the context for ‘Who‘ is actually not available in the database logic layer. That context is only available in the higher GUI logic layers, so it is up to those layers to do the insert logic here. The INSERT
and UPDATE
statements would look like this:
INSERT:
INSERT INTO [DBO].[DEMO_Widgets]
([widget_name], [widget_type], [widget_version])
VALUES
('Muffler Bearings', 'Engine Parts', 1);
UPDATE and audit INSERT (where [widget_seq] is 1 and the user making the change is [user_seq] 7):
UPDATE [DBO].[DEMO_Widgets]
SET [widget_name] = 'Muffler Bearings v2',
[widget_type] = 'Engine Parts',
[widget_version] = 2
WHERE [widget_seq] = 1;
INSERT INTO [DBO].[DEMO_Widget_Audits]
([widget_seq],
[user_seq],
[widget_name_prev],
[widget_name_new],
[widget_type_prev],
[widget_type_new],
[widget_version_prev],
[widget_version_new])
VALUES
(1,
7,
'Muffler Bearings',
'Muffler Bearings v2',
'Engine Parts',
'Engine Parts',
1,
2);
And there you have it! A full table and audit record for this data.
One final note. With the additional context of needing to know ‘Who‘ made changes, we should go back and add a new column to the Users audit table like so:
CREATE TABLE [DBO].[DEMO_User_Audits](
[user_audit_seq] [INT] IDENTITY(1,1) NOT NULL,
[user_seq] [INT] NOT NULL,
[user_seq_audit] [INT] NOT NULL,
[user_fav_color_prev] [NVARCHAR](50) NOT NULL,
[user_fav_color_new] [NVARCHAR](50) NOT NULL,
[user_audit_cre_ts] [datetime] NULL,
CONSTRAINT [PK_DEMO_User_Audits] PRIMARY KEY CLUSTERED
(
[user_audit_seq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [DBO].[DEMO_User_Audits] ADD CONSTRAINT [CNST_DEMO_User_Audits_user_audit_cre_ts] DEFAULT (getdate()) FOR [user_audit_cre_ts]
GO
The change is in adding the [user_seq_audit]
column. This will link back to the [DEMO_Users]
table as well, but will not point to the parent record. It will instead point to the user that initiated the change to the User data. While it would be nice to simply call this [user_seq]
to try and follow the other naming standards, we already have a column with this name with the context of linking back to the parent record. Therefore, adding the _audit
to the column differentiates this column from the original [user_seq]
.
Code from this article is available in a public repository on Github. The following files are available under a MIT License:
Thanks for reading!
Comments are closed, but trackbacks and pingbacks are open.