# Autotelic Computing

## Wednesday, May 15, 2013

### Impossibly Lean Audit System for Postgres with hstore

I really like Postgres! I recently discovered a new trick that plays especially well with the access control pattern I recently described: a simple (but rather powerful) audit/logging pattern to track the changes made to a database (i.e. inserts, updates and deletes), and most importantly, allow to query them in a flexible way.

It's easy to set up a trigger to record changes. However, if you follow this basic pattern, you end up with your update data stored as unstructured text, which is not the ideal format to query. You could modify that scheme to use as many audit tables as you want to log (thus each reproducing the structure of its target table), but there's a much nicer solution, using PG's hstore data type.

hstore is PG's answer to NoSQL: it's a string-based key-value data type, which you can embed inside your relational schema, within which it interoperates with other data types and SQL constructs seamlessly. There's also the newer JSON type, which plays a similar role, but since its API is not yet as mature as hstore's, we're not going to use it here.

So here's the audit table:


create extension hstore;

create table audit (
audit_id serial primary key,
table_name text not null,
user_name text not null,
action_timestamp timestamp not null default current_timestamp,
action text not null check (action in ('i','d','u')),
old_values hstore,
new_values hstore,
updated_cols text[],
query text
);



And here's the audit trigger that goes with it:


create or replace function if_modified_func() returns trigger as $body$
begin
if tg_op = 'UPDATE' then
insert into audit (table_name, user_name, action, old_values, new_values, updated_cols, query)
values (tg_table_name::text, current_user::text, 'u', hstore(old.*), hstore(new.*),
akeys(hstore(new.*) - hstore(old.*)), current_query());
return new;
elsif tg_op = 'DELETE' then
insert into audit (table_name, user_name, action, old_values, query)
values (tg_table_name::text, current_user::text, 'd', hstore(old.*), current_query());
return old;
elsif tg_op = 'INSERT' then
insert into audit (table_name, user_name, action, new_values, query)
values (tg_table_name::text, current_user::text, 'i', hstore(new.*), current_query());
return new;
end if;
end;
$body$
language plpgsql;



Notice how the rows get converted (hstore(old.*) and hstore(new.*)). Now suppose that we have a book table that we'd like to audit:


create table book (
book_id serial primary key,
title text,
author text,
n_pages int
);

create trigger book_audit after insert or update or delete on book for each row execute procedure if_modified_func();



If we insert a book in it, we can see the audit mechanism in action:


insert into book (title, n_pages) values ('PG is Great', 250);

select action, table_name, new_values -> 'title' as title from audit;

action | table_name |    title
--------+------------+-------------
i      | book       | PG is Great



The third retrieved value (title) demontrates hstore's syntax for value retrieval from a key (using the -> operator), which is just an example among the bunch of operators and functions offered.

If we perform an update on the book:


update book set author = 'Christian Jauvin', n_pages = 300 where book_id = 1;

select action, table_name, updated_cols from audit;

action | table_name |   updated_cols
--------+------------+------------------
i      | book       |
u      | book       | {author,n_pages}



The third column now sports the columns that have been updated, using the updated_cols mechanism, implemented as a PG array (another very nice data structure) along with the akeys function, with which we collect the keys of the hstore structure resulting from the hstore(new.*) "minus" hstore(old.*) operation performed in the "update" part of the trigger function.

It's also easy to perform "before and after" type queries:


select old_values -> 'n_pages' as before, new_values -> 'n_pages' as after from audit where audit_id = 2;

before | after
--------+-------
250    | 300



Don't forget however that hstore values are stored as strings, so this for instance wouldn't work:


select old_values -> 'n_pages' + new_values -> 'n_pages' from audit where audit_id = 2;

HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.



whereas this would:


select (old_values -> 'n_pages')::int + (new_values -> 'n_pages')::int from audit where audit_id = 2;



One final aspect to note: I find that this pattern works well with my access control pattern, where every application user has its own PG role, on behalf of which the database operations are performed. In this scenario, the user/role gets automatically logged by the trigger (using the current_user variable). In contrast, an access control system implemented at the level of the application (i.e. with some kind of user table and all database operations performed on behalf of a single PG role) wouldn't enjoy that much simplicity I believe.

## Monday, April 1, 2013

### Impossibly Lean Access Control with Postgres + Flask

One of the most common feature for a database-backed web application is to implement some kind of access control mechanism to regulate what different types of users (admins, guests, etc) can do. There are of course many ways to implement such a pattern: you can do it client-side, by modifying your UI dynamically (e.g. graying out certain widgets) according to the identity of the user, or server-side (as part of the so-called business rules maybe). Often a mix of both approaches will be used.

In this article, I will to show a way to implement such a pattern in a very simple way, using Postgres, Flask, Flask-Login and little_pger, a very thin SQL wrapper just above psycopg2 which I created.

Because the idea has received some criticism, I must say that (1) it is only a proof of concept (i.e. probably not robust enough for production in its current form) and (2) I don't think that such a pattern would scale well to a big userbase (it's rather meant for small-scale, intranet-style webapps with a limited number of users, which is what I've been mainly developing). The idea for the pattern was inspired by this blog post, that I read a while ago:

http://database-programmer.blogspot.ca/2009/02/comprehensive-database-security-model.html

while my skepticism toward ORMs in general stems from this even older (and admittedly controversial) post:

http://database-programmer.blogspot.ca/2008/06/why-i-do-not-use-orm.html

I say my pattern is "impossibly lean" because it introduces a minimal amount of code and concepts (not even the shadow of an ORM will be found here), by leveraging PG's role access management subsystem. These powerful features are already available the moment you create a PG database, so why would you need to reinvent the wheel?

Let's start by supposing that our application has a very simple database, with only one table:


create table foo (
foo_id serial primary key
);


Suppose also that we have two users for it: an almighty admin (who can do whatever pleases him), and a shy guest (only allowed to look around). Although we could create an additional SQL "user" table, with a column dedicated to the representation of their privilege levels, another solution is to simply create corresponding PG roles:

grant select on all tables in schema public to joe_guest;


joe_admin, being a superuser can perform any SQL operation, while joe_guest is restricted to only being able to select things.. that seems fine, but is there a way to leverage these SQL-level constraints into our application? Sure we can, and here's all it takes to implement the pattern:

import psycopg2, psycopg2.extras
import little_pger as pg

app.secret_key = 'a secret!'

class User(UserMixin):
def __init__(self, oid):
self.id = oid

# called first: here we connect as an admin (with the login privilege)
@app.before_request
def before_request():
connection_factory=psycopg2.extras.RealDictConnection)

# called next: from the admin connection, we fetch the name of the
# target role (identified with the unique (o)id encrypted in the session cookie)
cur = g.db.cursor()
rn = pg.select1(cur, 'pg_authid', 'rolname', where={'oid': id})
if rn:
# executed on behalf of the admin
cur.execute('set role %s', [rn])
# from this point any command performed via g.db.cursor() will be
# on the target role's behalf
return User(id)
return None


The interesting part is the interplay between before_request (Flask) and load_user (Flask-Login) which are both called automatically, before serving any view request. The first simply sets an admin connection from which the second will be able to switch user (by first fetching the appropriate role name in the pg_authid table with the id stored in the session cookie and then issuing the set role SQL command using it). We can view this mechanism in action by adding two views with different access policies:

# this read-only view should be available to anyone
@app.route('/look_at_something')
def look_at_something():
return jsonify({'success': True, 'count': pg.count(g.db.cursor(), 'foo')})

# this view should only be available to an admin; it will raise an exception
# (at the SQL level) if accessed by anyone else
@app.route('/change_something')
def change_something():
pg.insert(g.db.cursor(), 'foo')
g.db.commit()
return jsonify({'success': True})


which we can then drive using some Flask testing code:

if __name__ == '__main__':
for view in ['/look_at_something', '/change_something']:
with app.test_request_context(view):
app.preprocess_request()
oid = pg.select1(g.db.cursor(), 'pg_authid', 'oid',
where={'rolname': user})
print '%s %s..' % (user, view),
try:
app.dispatch_request()
print 'and succeeds!'
except:
print 'but fails..'

# joe_guest /look_at_something.. and succeeds!
# joe_guest /change_something.. but fails..


Of course an access violation exception by itself wouldn't be terribly useful, so an easy way to use that pattern in a real environment would be to intercept it using Flask's errorhandler decorator, in order to return, for instance, a JSON formatted error message, along with a 403 response (upon which the client should presumably react appropriately):

@app.errorhandler(psycopg2.ProgrammingError)
def db_access_error(e):
msg = "This user doesn't have the required privilege to perform this action."
return jsonify({'success': False, 'message': msg}), 403


Finally, note that we didn't make use of the role passwords up to this point, because we assumed that a user had already been logged in, but of course it must occur at some point in the process, and here's a possible way to do it (which implies knowing about the particular way PG encrypts passwords in the pg_authid table):