Wednesday, May 15, 2013

Impossibly Lean Audit System for Postgres with hstore

I really like Postgres! I recently discovered a new trick that plays especially well with the access control pattern I recently described: a simple (but rather powerful) audit/logging pattern to track the changes made to a database (i.e. inserts, updates and deletes), and most importantly, allow to query them in a flexible way.

It's easy to set up a trigger to record changes. However, if you follow this basic pattern, you end up with your update data stored as unstructured text, which is not the ideal format to query. You could modify that scheme to use as many audit tables as you want to log (thus each reproducing the structure of its target table), but there's a much nicer solution, using PG's hstore data type.

hstore is PG's answer to NoSQL: it's a string-based key-value data type, which you can embed inside your relational schema, within which it interoperates with other data types and SQL constructs seamlessly. There's also the newer JSON type, which plays a similar role, but since its API is not yet as mature as hstore's, we're not going to use it here.

So here's the audit table:


create extension hstore;

create table audit (
    audit_id serial primary key,
    table_name text not null,
    user_name text not null,
    action_timestamp timestamp not null default current_timestamp,
    action text not null check (action in ('i','d','u')),
    old_values hstore,
    new_values hstore,
    updated_cols text[],
    query text
);

And here's the audit trigger that goes with it:


create or replace function if_modified_func() returns trigger as $body$
begin
    if tg_op = 'UPDATE' then
        insert into audit (table_name, user_name, action, old_values, new_values, updated_cols, query)
        values (tg_table_name::text, current_user::text, 'u', hstore(old.*), hstore(new.*),
               akeys(hstore(new.*) - hstore(old.*)), current_query());
        return new;
    elsif tg_op = 'DELETE' then
        insert into audit (table_name, user_name, action, old_values, query)
        values (tg_table_name::text, current_user::text, 'd', hstore(old.*), current_query());
        return old;
    elsif tg_op = 'INSERT' then
        insert into audit (table_name, user_name, action, new_values, query)
        values (tg_table_name::text, current_user::text, 'i', hstore(new.*), current_query());
        return new;
    end if;
end;
$body$
language plpgsql;

Notice how the rows get converted (hstore(old.*) and hstore(new.*)). Now suppose that we have a book table that we'd like to audit:


create table book (
    book_id serial primary key,
    title text,
    author text,
    n_pages int
);

create trigger book_audit after insert or update or delete on book for each row execute procedure if_modified_func();

If we insert a book in it, we can see the audit mechanism in action:


insert into book (title, n_pages) values ('PG is Great', 250);

select action, table_name, new_values -> 'title' as title from audit;

 action | table_name |    title    
--------+------------+-------------
 i      | book       | PG is Great

The third retrieved value (title) demontrates hstore's syntax for value retrieval from a key (using the -> operator), which is just an example among the bunch of operators and functions offered.

If we perform an update on the book:


update book set author = 'Christian Jauvin', n_pages = 300 where book_id = 1;

select action, table_name, updated_cols from audit;

 action | table_name |   updated_cols   
--------+------------+------------------
 i      | book       | 
 u      | book       | {author,n_pages}

The third column now sports the columns that have been updated, using the updated_cols mechanism, implemented as a PG array (another very nice data structure) along with the akeys function, with which we collect the keys of the hstore structure resulting from the hstore(new.*) "minus" hstore(old.*) operation performed in the "update" part of the trigger function.

It's also easy to perform "before and after" type queries:


select old_values -> 'n_pages' as before, new_values -> 'n_pages' as after from audit where audit_id = 2;

 before | after 
--------+-------
 250    | 300

Don't forget however that hstore values are stored as strings, so this for instance wouldn't work:


select old_values -> 'n_pages' + new_values -> 'n_pages' from audit where audit_id = 2;

HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.

whereas this would:


select (old_values -> 'n_pages')::int + (new_values -> 'n_pages')::int from audit where audit_id = 2;

One final aspect to note: I find that this pattern works well with my access control pattern, where every application user has its own PG role, on behalf of which the database operations are performed. In this scenario, the user/role gets automatically logged by the trigger (using the current_user variable). In contrast, an access control system implemented at the level of the application (i.e. with some kind of user table and all database operations performed on behalf of a single PG role) wouldn't enjoy that much simplicity I believe.