Wednesday, July 30, 2008

Implementing business rules

This is the first post about implementing business rules in an Oracle database. In the previous millennium I implemented a lot of them using CDM RuleFrame and its business rule classification scheme. This was all done on Oracle7, and business rules were implemented either with check constraints or, most of the time, with database triggers. A lot has changed since then, and those changes have altered my preferred way of implementing business rules. These changes include:

  1. The current Oracle database version is now 11gR1. Since version 7, a lot of new techniques have become available for implementing business rules, for example function-based indexes, on-commit refreshable materialized views and virtual columns (a short sketch of a function-based index follows below).

  2. Database triggers are considered evil, at least by me.

Also, there are people out there, even some intelligent ones, who claim that business rules should be implemented in an application logic layer and not in a persistence layer. I suspect most readers of this blog will disagree with that claim, because it is the data that matters most to a company, not the application. So business rules should be implemented as close to the data as possible: in the database. But just in case you need a little persuasion, I've collected a few links. Needless to say, I agree with all of them heartily.
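
To illustrate the first point in the list above, a function-based unique index can enforce a rule that a plain check constraint cannot, for instance "a customer may have at most one active address". Below is a minimal sketch; the table and column names are made up for the example:

create table cust_address
( id          number        primary key
, customer_id number        not null
, address     varchar2(100) not null
, active_ind  varchar2(1)   not null check (active_ind in ('Y','N'))
)
/

-- Rows where the CASE expression evaluates to null are not indexed,
-- so at most one row with active_ind = 'Y' is allowed per customer.
create unique index cust_address_one_active_uk
  on cust_address (case when active_ind = 'Y' then customer_id end)
/
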
So now that we agree business rules should be implemented in the database, we can move on to the business rule classification scheme. This scheme hasn't changed over the years; it is as applicable now as it was ten years ago. With a business rule classification scheme you can categorize all business rules you have identified during the analysis phase. And when you have categorized them properly, you will see that most business rules in the same category are implemented in the same way. So when implementing business rules, a classification scheme is indispensable. CDM RuleFrame uses the one below, which I like. You can also see this scheme here.

[Image: CDM RuleFrame business rule classification scheme]

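To give a flavour of how the classification drives the implementation: an attribute rule constrains a single column value and a tuple rule relates columns within one row, and both typically map to check constraints. A minimal sketch, again with made-up names:

create table project
( id        number primary key
, budget    number not null
, startdate date   not null
, enddate   date
  -- attribute rule: the budget must be positive
, constraint project_budget_chk check (budget > 0)
  -- tuple rule: a project cannot end before it starts
, constraint project_dates_chk check (enddate is null or enddate >= startdate)
)
/
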
In the next posts I will walk through this classification scheme and discuss, for each class, how business rules can or should be implemented nowadays, in my opinion.

Next posts:
Implementing attribute and tuple rules
Implementing entity rules
Implementing inter-entity rules
Implementing dynamic data constraints
Implementing change event rules

Thursday, July 24, 2008

SYSDATE@!

If you use SYSDATE or SYSTIMESTAMP multiple times in a query that takes some time, will you see different times?

No, because queries are read consistent: they see the data as of the timestamp the query began. So SYSDATE and SYSTIMESTAMP will produce the date and timestamp at the time the query begins, right? You can see this in this little test:

rwijk@ORA11G> select sysdate
2 , systimestamp
3 from dual
4 /

SYSDATE SYSTIMESTAMP
------------------- -----------------------------------------
24-07-2008 22:35:05 24-07-08 22:35:05,046000 +02:00

1 row selected.

rwijk@ORA11G> select min(n)
2 , max(n)
3 , min(d)
4 , max(d)
5 , min(t)
6 , max(t)
7 from ( select level n
8 , sysdate d
9 , systimestamp t
10 from dual
11 connect by level <= 2000000
12 )
13 /

MIN(N) MAX(N) MIN(D) MAX(D)
---------- ---------- ------------------- -------------------
MIN(T)
-------------------------------------------------------------
MAX(T)
-------------------------------------------------------------
1 2000000 24-07-2008 22:35:05 24-07-2008 22:35:05
24-07-08 22:35:05,062000 +02:00
24-07-08 22:35:05,062000 +02:00


1 row selected.

rwijk@ORA11G> select sysdate
2 , systimestamp
3 from dual
4 /

SYSDATE SYSTIMESTAMP
------------------- -----------------------------------------
24-07-2008 22:35:10 24-07-08 22:35:10,031000 +02:00

1 row selected.

The query takes about 5 seconds, but the reported minimum and maximum sysdate and systimestamp values are the time the query began, as you can see.

But a comment by James Padfield in this OTN thread convinced me that it is not really read consistency that is responsible for this behaviour of sysdate and systimestamp. He says:

I don't believe SYSDATE is truly read-consistent, if it was then references to SYSDATE within a serializable (or read-only) transaction would return the same value, it is trivial to show that they do not.

However It appears that SYSDATE has been made to behave as if it is read-consistent at the statement level and this implementation appears to involve re-write as previously shown. Possibly (and this is pure speculation) it is persisted somewhere at the start of the statement and this value is referenced by SYSDATE@!. This might give benefits in terms of both performance and statement level read-consistent behaviour (you might see the latter as a side-effect). I notice a similar rewrite is used with the USER function.

Of course the implementation is largely irrelevant as long as we clearly understand the behaviour.


Let's check the first remark. In a serializable transaction they produce different values:

rwijk@ORA11G> set transaction isolation level serializable
2 /

Transaction set.

rwijk@ORA11G> select sysdate
2 , systimestamp
3 from dual
4 /

SYSDATE SYSTIMESTAMP
------------------- --------------------------------------
24-07-2008 22:35:10 24-07-08 22:35:10,031000 +02:00

1 row selected.

rwijk@ORA11G> exec dbms_lock.sleep(2)

PL/SQL procedure successfully completed.

rwijk@ORA11G> select sysdate
2 , systimestamp
3 from dual
4 /

SYSDATE SYSTIMESTAMP
------------------- --------------------------------------
24-07-2008 22:35:12 24-07-08 22:35:12,031000 +02:00

1 row selected.


So this blows up the read consistency theory. But the first test showed that it behaves as if it were statement-level read consistent. And the SYSDATE@! internal variable/function (so no, I was not swearing in the title) can be looked up by querying the filter_predicates column of plan_table. I also included USER and SYSTIMESTAMP in the predicates:

rwijk@ORA11G> explain plan
2 for
3 select dummy
4 from dual
5 where sysdate > date '2008-01-01'
6 and systimestamp > timestamp '2008-01-01 00:00:00'
7 and user != 'A'
8 /

Explained.

rwijk@ORA11G> select filter_predicates
2 from plan_table
3 where filter_predicates is not null
4 /

FILTER_PREDICATES
--------------------------------------------------------------------------------
USER@!<>'A' AND SYS_EXTRACT_UTC(SYSTIMESTAMP(6))>SYS_EXTRACT_UTC(TIMESTAMP' 2008
-01-01 00:00:00,000000000') AND SYSDATE@!>TO_DATE(' 2008-01-01 00:00:00', 'syyyy
-mm-dd hh24:mi:ss')


1 row selected.

James' story about SYSDATE sounds very plausible to me. But now I wonder why SYSTIMESTAMP is not rewritten to SYSTIMESTAMP@! but to SYS_EXTRACT_UTC(SYSTIMESTAMP(6)), especially since SYSTIMESTAMP also behaves as statement-level read consistent...

rwijk@ORA11G> select sysdate@!
2 , user@!
3 from dual
4 /

SYSDATE@! USER@!
------------------- ------------------------------
24-07-2008 23:11:41 RWIJK

1 row selected.

rwijk@ORA11G> select systimestamp@!
2 from dual
3 /
select systimestamp@!
*
ERROR at line 1:
ORA-00923: FROM keyword not found where expected

Saturday, July 19, 2008

Scalability of dbms_lock.request

For some time now, I have recommended using dbms_lock.request for implementing business rules correctly. In short, the idea is to serialize access when validating an entity rule over rows that belong to the same parent record; think, for example, of a business rule that checks for overlapping periods. An alternative to dbms_lock.request is to physically lock the parent row, but sometimes there is no real parent row to lock, and then you can resort to the dbms_lock.request procedure, which allows you to lock a number. Your only remaining challenge is then to determine which rows should map to the same number.
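
For context, here is a sketch of the kind of validation such serialization protects: a check whether the period of one child row overlaps another period of the same parent. The table and column names match the test tables created below, but the query itself and its bind variables are just an illustration, not the exact implementation:

-- Returns rows when the period (:p_startdate, :p_enddate) of child row :p_id
-- overlaps another period of the same parent. Without serializing per
-- parent_id, two sessions can each pass this check and still commit
-- overlapping periods.
select c.id
  from child c
 where c.parent_id = :p_parent_id
   and c.id       != :p_id
   and c.startdate <= :p_enddate
   and c.enddate   >= :p_startdate
/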

A few weeks ago, a colleague was implementing my suggestion and noticed performance problems when executing a single update statement that updated a lot of rows. The performance problems disappeared when he commented out the lock procedure. I have used this method several times now and never experienced problems with it, probably because I never processed really large sets with it. Also, when doing row-by-row processing on a large set, the process itself is slow, and the fact that dbms_lock does not scale is masked by the inefficiency of the row-by-row processing. So, to show what I mean, I will process a large set with a single SQL statement, and for each row a trigger takes a lock based on the parent id. First things first, here are the tables:

rwijk@ORA11G> create table parent (id,name)
2 as
3 select level
4 , 'name' || to_char(level)
5 from dual
6 connect by level <= 100000
7 /

Table created.

rwijk@ORA11G> alter table parent add primary key (id)
2 /

Table altered.

rwijk@ORA11G> create table child (id,parent_id,startdate,enddate)
2 as
3 select level
4 , ceil(level/10)
5 , date '1999-01-01' + numtoyminterval(mod(level,10),'year')
6 , date '1999-12-31' + numtoyminterval(mod(level,10),'year')
7 from dual
8 connect by level <= 1000000
9 /

Table created.

rwijk@ORA11G> alter table child add primary key (id)
2 /

Table altered.

rwijk@ORA11G> alter table child add foreign key (parent_id) references parent(id)
2 /

Table altered.

rwijk@ORA11G> create index i1 on child(parent_id)
2 /

Index created.

rwijk@ORA11G> exec dbms_stats.gather_table_stats(user,'parent',cascade=>true)

PL/SQL procedure successfully completed.

rwijk@ORA11G> exec dbms_stats.gather_table_stats(user,'child',cascade=>true)

PL/SQL procedure successfully completed.

rwijk@ORA11G> select * from child where id between 1 and 12 order by id
2 /

ID PARENT_ID STARTDATE ENDDATE
---------- ---------- ------------------- -------------------
1 1 01-01-2000 00:00:00 31-12-2000 00:00:00
2 1 01-01-2001 00:00:00 31-12-2001 00:00:00
3 1 01-01-2002 00:00:00 31-12-2002 00:00:00
4 1 01-01-2003 00:00:00 31-12-2003 00:00:00
5 1 01-01-2004 00:00:00 31-12-2004 00:00:00
6 1 01-01-2005 00:00:00 31-12-2005 00:00:00
7 1 01-01-2006 00:00:00 31-12-2006 00:00:00
8 1 01-01-2007 00:00:00 31-12-2007 00:00:00
9 1 01-01-2008 00:00:00 31-12-2008 00:00:00
10 1 01-01-1999 00:00:00 31-12-1999 00:00:00
11 2 01-01-2000 00:00:00 31-12-2000 00:00:00
12 2 01-01-2001 00:00:00 31-12-2001 00:00:00

12 rows selected.

So 100,000 parent records, each with 10 child records, resulting in a total of 1,000,000 child records. Next, create a package with two variants of serializing at the parent level: one using dbms_lock.request and the other by locking the real parent record. The package also includes a procedure that prints the elapsed time and some text:

rwijk@ORA11G> create package mypkg
2 as
3 procedure lck (p_parent_id in child.parent_id%type);
4 procedure lck_real_parent (p_parent_id in child.parent_id%type);
5 procedure print_time_and_text (p_text in varchar2);
6 procedure reset_timestamp;
7 end;
8 /

Package created.

rwijk@ORA11G> create package body mypkg
2 as
3 l_previous_timestamp timestamp
4 ;
5 procedure lck (p_parent_id in child.parent_id%type)
6 is
7 l_lock_dummy integer;
8 begin
9 l_lock_dummy := dbms_lock.request
10 ( release_on_commit => true
11 , id => dbms_utility.get_hash_value
12 ( name => 'child' || to_char(p_parent_id)
13 , base => 1
14 , hash_size => power(2,30)
15 )
16 );
17 end lck
18 ;
19 procedure lck_real_parent (p_parent_id in child.parent_id%type)
20 is
21 l_dummy parent.id%type;
22 begin
23 select id
24 into l_dummy
25 from parent
26 where id = p_parent_id
27 for update of id
28 ;
29 end lck_real_parent
30 ;
31 procedure print_time_and_text (p_text in varchar2)
32 is
33 cn_new_timestamp timestamp := systimestamp;
34 begin
35 dbms_output.put_line
36 ( to_char(cn_new_timestamp,'hh24:mi:ss.ff') ||
37 ': ' ||
38 p_text || ' ' ||
39 (cn_new_timestamp - l_previous_timestamp)
40 );
41 l_previous_timestamp := cn_new_timestamp
42 ;
43 end print_time_and_text
44 ;
45 procedure reset_timestamp
46 is
47 begin
48 l_previous_timestamp := null;
49 end reset_timestamp
50 ;
51 end mypkg;
52 /

Package body created.

Now I will run three variants:
1) without locking, to know how much time the statement itself takes
2) including locking of the parent, implemented with a dbms_lock.request
3) including locking of the parent, by locking the real parent record
For each variant, a single update statement updates all 1,000,000 rows. The elapsed time is reported after every 50,000 records.

rwijk@ORA11G> create trigger mytrg
2 before update on child
3 for each row
4 begin
5 if mod(:new.id,50000) = 0
6 then
7 mypkg.print_time_and_text(lpad(:new.id,7));
8 end if;
9 end;
10 /

Trigger created.

rwijk@ORA11G> begin
2 mypkg.print_time_and_text(' start');
3 update child
4 set enddate = enddate - 1
5 ;
6 end;
7 /
06:42:33.671000000: start
06:42:35.843000000: 50000 +000000000 00:00:02.172000000
06:42:38.671000000: 100000 +000000000 00:00:02.828000000
06:42:40.828000000: 150000 +000000000 00:00:02.157000000
06:42:43.531000000: 200000 +000000000 00:00:02.703000000
06:42:46.656000000: 250000 +000000000 00:00:03.125000000
06:42:49.531000000: 300000 +000000000 00:00:02.875000000
06:42:51.609000000: 350000 +000000000 00:00:02.078000000
06:42:54.437000000: 400000 +000000000 00:00:02.828000000
06:42:56.796000000: 450000 +000000000 00:00:02.359000000
06:42:59.453000000: 500000 +000000000 00:00:02.657000000
06:43:02.031000000: 550000 +000000000 00:00:02.578000000
06:43:05.109000000: 600000 +000000000 00:00:03.078000000
06:43:07.703000000: 650000 +000000000 00:00:02.594000000
06:43:10.625000000: 700000 +000000000 00:00:02.922000000
06:43:13.203000000: 750000 +000000000 00:00:02.578000000
06:43:15.781000000: 800000 +000000000 00:00:02.578000000
06:43:18.234000000: 850000 +000000000 00:00:02.453000000
06:43:21.156000000: 900000 +000000000 00:00:02.922000000
06:43:23.953000000: 950000 +000000000 00:00:02.797000000
06:43:26.703000000: 1000000 +000000000 00:00:02.750000000

PL/SQL procedure successfully completed.

rwijk@ORA11G> rollback
2 /

Rollback complete.

So without the lock, each batch of 50,000 rows takes between 2.0 and 3.1 seconds; let's round it to 2.5 seconds per 50,000 rows. Now let's include the lck procedure with dbms_lock.request:

rwijk@ORA11G> exec mypkg.reset_timestamp

PL/SQL procedure successfully completed.

rwijk@ORA11G> create or replace trigger mytrg
2 before update on child
3 for each row
4 begin
5 mypkg.lck(:new.parent_id)
6 ;
7 if mod(:new.id,50000) = 0
8 then
9 mypkg.print_time_and_text(lpad(:new.id,7));
10 end if;
11 end;
12 /

Trigger created.

rwijk@ORA11G> begin
2 mypkg.print_time_and_text(' start');
3 update child
4 set enddate = enddate - 1
5 ;
6 end;
7 /
06:45:12.156000000: start
06:45:15.375000000: 50000 +000000000 00:00:03.219000000
06:45:21.296000000: 100000 +000000000 00:00:05.921000000
06:45:31.375000000: 150000 +000000000 00:00:10.079000000
06:45:45.093000000: 200000 +000000000 00:00:13.718000000
06:46:01.515000000: 250000 +000000000 00:00:16.422000000
06:46:21.234000000: 300000 +000000000 00:00:19.719000000
06:46:43.609000000: 350000 +000000000 00:00:22.375000000
06:47:08.796000000: 400000 +000000000 00:00:25.187000000
06:47:36.531000000: 450000 +000000000 00:00:27.735000000
06:48:06.828000000: 500000 +000000000 00:00:30.297000000
06:48:40.546000000: 550000 +000000000 00:00:33.718000000
06:49:16.875000000: 600000 +000000000 00:00:36.329000000
06:49:56.328000000: 650000 +000000000 00:00:39.453000000
06:50:38.109000000: 700000 +000000000 00:00:41.781000000
06:51:23.734000000: 750000 +000000000 00:00:45.625000000
06:52:11.093000000: 800000 +000000000 00:00:47.359000000
06:53:02.765000000: 850000 +000000000 00:00:51.672000000
06:53:56.796000000: 900000 +000000000 00:00:54.031000000
06:54:52.265000000: 950000 +000000000 00:00:55.469000000
06:55:50.218000000: 1000000 +000000000 00:00:57.953000000

PL/SQL procedure successfully completed.

rwijk@ORA11G> rollback
2 /

Rollback complete.

A linear slowdown can be seen here. It starts off reasonably efficient, but every additional 100,000 rows add roughly 6 seconds to the time per batch of 50,000 rows, and the last batch takes almost a full minute!

Here is the last variant which uses the lck_real_parent procedure:

rwijk@ORA11G> exec mypkg.reset_timestamp

PL/SQL procedure successfully completed.

rwijk@ORA11G> create or replace trigger mytrg
2 before update on child
3 for each row
4 begin
5 mypkg.lck_real_parent(:new.parent_id)
6 ;
7 if mod(:new.id,50000) = 0
8 then
9 mypkg.print_time_and_text(lpad(:new.id,7));
10 end if;
11 end;
12 /

Trigger created.

rwijk@ORA11G> begin
2 mypkg.print_time_and_text(' start');
3 update child
4 set enddate = enddate - 1
5 ;
6 end;
7 /
07:07:24.781000000: start
07:07:30.171000000: 50000 +000000000 00:00:05.390000000
07:07:36.046000000: 100000 +000000000 00:00:05.875000000
07:07:41.515000000: 150000 +000000000 00:00:05.469000000
07:07:47.640000000: 200000 +000000000 00:00:06.125000000
07:07:53.234000000: 250000 +000000000 00:00:05.594000000
07:07:59.125000000: 300000 +000000000 00:00:05.891000000
07:08:05.062000000: 350000 +000000000 00:00:05.937000000
07:08:10.921000000: 400000 +000000000 00:00:05.859000000
07:08:16.703000000: 450000 +000000000 00:00:05.782000000
07:08:22.593000000: 500000 +000000000 00:00:05.890000000
07:08:28.156000000: 550000 +000000000 00:00:05.563000000
07:08:33.953000000: 600000 +000000000 00:00:05.797000000
07:08:39.703000000: 650000 +000000000 00:00:05.750000000
07:08:46.031000000: 700000 +000000000 00:00:06.328000000
07:08:51.796000000: 750000 +000000000 00:00:05.765000000
07:08:57.828000000: 800000 +000000000 00:00:06.032000000
07:09:03.453000000: 850000 +000000000 00:00:05.625000000
07:09:09.562000000: 900000 +000000000 00:00:06.109000000
07:09:15.015000000: 950000 +000000000 00:00:05.453000000
07:09:20.859000000: 1000000 +000000000 00:00:05.844000000

PL/SQL procedure successfully completed.


This one is less efficient for smaller sets, but its cost per batch is constant, and therefore this solution is far more scalable than the previous one. The point where the dbms_lock.request variant becomes the slower of the two seems to be a bit below 100,000 rows. That may be more rows than most statements will ever touch, so sticking with dbms_lock.request can be fine, but it's always good to know about this scalability issue.

My colleague noted that this behaviour is documented, but I could not find it in the dbms_lock documentation. It turned out he had seen the following text in the package specification itself:

rwijk@ORA11G> select text
2 from all_source
3 where owner = 'SYS'
4 and name = 'DBMS_LOCK'
5 order by line
6 /

TEXT
----------------------------------------------------------------------------------------------

...
package dbms_lock is
...
---------------
-- LIMITATIONS
--
-- The implementation does not support large numbers of locks efficiently.
-- A few hundred locks per session should be the limit.


So it is a documented limitation after all. Too bad I only found out about it now ...

Friday, July 11, 2008

A new blog and a new book

Just noticed that Christian Antognini started blogging a couple of weeks ago. I added the blog to my reader, hoping for more interesting posts like this one.

Christian's book Troubleshooting Oracle Performance is available now. I'm impressed by the table of contents alone, so the book is on my wish list. The book also has two forewords, written by Cary Millsap and Jonathan Lewis, which can be read here. And if guys like Cary and Jonathan recommend a book, it must be good.