Thursday, December 27, 2007

Top 3 of 2007

I don't know about other countries, but here in the Netherlands every radio station seems to have its own top 50/100/500/2000 of hits. So I decided to go along with the trend and make my own top 3 ... of OTN forum posts that made me laugh in 2007:

1) How much a three year old DBA is worth?

2) PL/SQL New Development Standard: Frameworkia

3) Why Padfield earns twice as much as Robertson

Have fun reading and since this will be my last post for 2007: I wish all readers a happy and healthy 2008.

Monday, December 24, 2007

Check constraints with sysdate

Implementing a check constraint with sysdate in it, is not possible. Not until 11g that is, where you can use a trick to do this using virtual columns. Here is a small demo:

First, let's create a table and try to place a check constraint to check that a date should be a date in the past:

rwijk@ORA11G> create table mytable
2 ( id number(4)
3 , mydate date
4 )
5 /

Tabel is aangemaakt.

rwijk@ORA11G> alter table mytable add constraint myck1 check (mydate < sysdate)
2 /
alter table mytable add constraint myck1 check (mydate < sysdate)
*
FOUT in regel 1:
.ORA-02436: Datum- of systeemvariabele is onjuist opgegeven in CHECK-beperking.


As expected, ORA-02436 (date or system variable wrongly specified in CHECK constraint) is returned.

Wrapping the check inside a function does not help:

rwijk@ORA11G> create function mydate_in_past
2 ( p_mydate in mytable.mydate%type
3 ) return varchar2
4 is
5 begin
6 return case when p_mydate >= sysdate then 'N' else 'Y' end;
7 end mydate_in_past;
8 /

Functie is aangemaakt.

rwijk@ORA11G> alter table mytable
2 add constraint myck1 check (mydate_in_past(mydate) = 'Y')
3 /
add constraint myck1 check (mydate_in_past(mydate) = 'Y')
*
FOUT in regel 2:
.ORA-00904: "MYDATE_IN_PAST": ongeldige ID


Since a function is not allowed inside a check constraint. So let's add the function as a virtual column to the table:

rwijk@ORA11G> alter table mytable
2 add (mydate_in_past_ind as (mydate_in_past(mydate)))
3 /
add (mydate_in_past_ind as (mydate_in_past(mydate)))
*
FOUT in regel 2:
.ORA-30553: De functie is niet deterministisch.


Oracle is clever enough to know that you can only use deterministic functions for virtual columns. And a function using sysdate is never deterministic of course. Validating input of 25-12-2007 will give FALSE as a result, but tomorrow it will be TRUE. But let's try to fool Oracle by just saying the function is deterministic, where it is not:

rwijk@ORA11G> create or replace function mydate_in_past
2 ( p_mydate in mytable.mydate%type
3 ) return varchar2 deterministic
4 is
5 begin
6 return case when p_mydate >= sysdate then 'N' else 'Y' end;
7 end mydate_in_past;
8 /

Functie is aangemaakt.

rwijk@ORA11G> alter table mytable
2 add (mydate_in_past_ind as (cast(mydate_in_past(mydate) as varchar2(1))))
3 /

Tabel is gewijzigd.

rwijk@ORA11G> desc mytable
Naam Null? Type
-------------------------------------------------------------------------- -------- --------------------
ID NUMBER(4)
MYDATE DATE
MYDATE_IN_PAST_IND VARCHAR2(1)


It works! And now you can check whether mydate is a date in the past:

rwijk@ORA11G> alter table mytable
2 add constraint myck1 check (mydate_in_past_ind = 'Y')
3 /

Tabel is gewijzigd.


So we have fooled Oracle by saying the function is deterministic and we can reason that in this case it is not bad, because when a row satisfies the constraint, it will keep satisfying the constraint as time passes.

Let's check if the constraint works:

rwijk@ORA11G> insert into mytable (id,mydate) values (1,date '2007-01-01')
2 /

1 rij is aangemaakt.

rwijk@ORA11G> insert into mytable (id,mydate) values (2,date '2020-01-01')
2 /
insert into mytable (id,mydate) values (2,date '2020-01-01')
*
FOUT in regel 1:
.ORA-02290: CHECK-beperking (RWIJK.MYCK1) is geschonden.


Great, but you have to be careful.
Watch what happens when checking whether a date is in the future. This cannot ever be a real constraint, as there will always be a time that a date will not be in the future anymore. But using the same technique as above it is possible:

rwijk@ORA11G> rollback
2 /

Rollback is voltooid.

rwijk@ORA11G> alter table mytable add (mydate2 date)
2 /

Tabel is gewijzigd.

rwijk@ORA11G> create function mydate_in_future
2 ( p_mydate in mytable.mydate%type
3 ) return varchar2 deterministic
4 is
5 begin
6 return case when p_mydate > sysdate then 'Y' else 'N' end;
7 end mydate_in_future;
8 /

Functie is aangemaakt.

rwijk@ORA11G> alter table mytable
2 add (mydate2_in_future_ind as (cast(mydate_in_future(mydate2) as varchar2(1))))
3 /

Tabel is gewijzigd.

rwijk@ORA11G> alter table mytable
2 add constraint myck2 check (mydate2_in_future_ind = 'Y')
3 /

Tabel is gewijzigd.

rwijk@ORA11G> desc mytable
Naam Null? Type
-------------------------------------------------------------------------- -------- -----------------
ID NUMBER(4)
MYDATE DATE
MYDATE_IN_PAST_IND VARCHAR2(1)
MYDATE2 DATE
MYDATE2_IN_FUTURE_IND VARCHAR2(1)

rwijk@ORA11G> insert into mytable (id,mydate2) values (1,date '2007-01-01')
2 /
insert into mytable (id,mydate2) values (1,date '2007-01-01')
*
FOUT in regel 1:
.ORA-02290: CHECK-beperking (RWIJK.MYCK2) is geschonden.


rwijk@ORA11G> insert into mytable (id,mydate2) values (2,date '2020-01-01')
2 /

1 rij is aangemaakt.


So it is quite easy to misuse this feature. But when used carefully it is useful for implementing more business rule declaratively.

Previously, this type of constraint would be implemented using a database trigger. But we all know by now :-) that database triggers are evil. For those of us that don't know this yet, let's do a small performance test, to prove this once again, after dropping all previous objects:

rwijk@ORA11G> create function mydate_in_past
2 ( p_mydate in date
3 ) return varchar2 deterministic
4 is
5 begin
6 return case when p_mydate >= sysdate then 'N' else 'Y' end;
7 end mydate_in_past;
8 /

Functie is aangemaakt.

rwijk@ORA11G> create table mytable_virtual_column
2 ( id number(6)
3 , mydate date
4 , mydate_in_past_ind as (cast(mydate_in_past(mydate) as varchar2(1)))
5 , constraint myck1 check (mydate_in_past_ind = 'Y')
6 )
7 /

Tabel is aangemaakt.

rwijk@ORA11G> create table mytable_database_trigger
2 ( id number(6)
3 , mydate date
4 )
5 /

Tabel is aangemaakt.

rwijk@ORA11G> create trigger mytrigger
2 before insert or update on mytable_database_trigger
3 for each row
4 begin
5 if inserting
6 or updating('mydate')
7 then
8 if :new.mydate >= sysdate
9 then
10 raise_application_error(-20000,'mydate must be a date in the past.');
11 end if;
12 end if;
13 end;
14 /

Trigger is aangemaakt.

rwijk@ORA11G> exec runstats_pkg.rs_start

PL/SQL-procedure is geslaagd.

rwijk@ORA11G> insert into mytable_virtual_column (id,mydate)
2 select level
3 , date '2007-01-01'
4 from dual
5 connect by level <= 100000
6 /

100000 rijen zijn aangemaakt.

rwijk@ORA11G> exec runstats_pkg.rs_middle

PL/SQL-procedure is geslaagd.

rwijk@ORA11G> insert into mytable_database_trigger (id,mydate)
2 select level
3 , date '2007-01-01'
4 from dual
5 connect by level <= 100000
6 /

100000 rijen zijn aangemaakt.

rwijk@ORA11G> exec runstats_pkg.rs_stop(100)
Run1 draaide in 42 hsecs
Run2 draaide in 172 hsecs
Run1 draaide in 24,42% van de tijd

Naam Run1 Run2 Verschil
STAT.DB time 43 168 125
STAT.CPU used when call started 42 169 127
STAT.CPU used by this session 39 169 130
STAT.redo size 2,458,396 2,458,560 164
LATCH.object queue header operation 773 994 221
STAT.free buffer inspected 0 250 250

Run1 latches totaal versus run2 -- verschil en percentage
Run1 Run2 Verschil Pct
15,027 15,528 501 96.77%

PL/SQL-procedure is geslaagd.


I repeated the test several times, and got consistent results: using declarative constraints with virtual columns is approximately four times faster than using those dreaded database triggers.

Friday, December 21, 2007

Simulating print_table with unpivot

With the advent of UNPIVOT in Oracle11g, a SQL only solution for printing table contents vertically has been made a little easier. I've used Tom Kyte's print_table function for this purpose occasionally. You can read about this function here on AskTom. The following SQL using unpivot is definitely not easier than issuing

exec print_table('select * from emp')
, so it's just for fun:

rwijk@ORA11G>  select decode
2 ( col
3 , 'LINE', '======================'
4 , rpad(col,8) || ': ' || value
5 ) "Contents of table EMP"
6 from ( select to_char(empno) empno
7 , ename
8 , job
9 , to_char(mgr) mgr
10 , to_char(hiredate,'dd-mm-yyyy') hiredate
11 , to_char(sal) sal
12 , to_char(comm) comm
13 , to_char(deptno) deptno
14 , '=' line
15 from emp
16 )
17 unpivot include nulls
18 (value for col in (empno,ename,job,mgr,hiredate,sal,comm,deptno,line))
19 /

Contents of table EMP
--------------------------------------------------
EMPNO : 7369
ENAME : SMITH
JOB : CLERK
MGR : 7902
HIREDATE: 17-12-1980
SAL : 800
COMM :
DEPTNO : 20
======================
EMPNO : 7499
ENAME : ALLEN
JOB : SALESMAN
MGR : 7698
HIREDATE: 20-02-1981
SAL : 1600
COMM : 300
DEPTNO : 30
======================
EMPNO : 7521
ENAME : WARD
JOB : SALESMAN
MGR : 7698
HIREDATE: 22-02-1981
SAL : 1250
COMM : 500
DEPTNO : 30
======================
EMPNO : 7566
ENAME : JONES
JOB : MANAGER
MGR : 7839
HIREDATE: 02-04-1981
SAL : 2975
COMM :
DEPTNO : 20
======================
EMPNO : 7654
ENAME : MARTIN
JOB : SALESMAN
MGR : 7698
HIREDATE: 28-09-1981
SAL : 1250
COMM : 1400
DEPTNO : 30
======================
EMPNO : 7698
ENAME : BLAKE
JOB : MANAGER
MGR : 7839
HIREDATE: 01-05-1981
SAL : 2850
COMM :
DEPTNO : 30
======================
EMPNO : 7782
ENAME : CLARK
JOB : MANAGER
MGR : 7839
HIREDATE: 09-06-1981
SAL : 2450
COMM :
DEPTNO : 10
======================
EMPNO : 7788
ENAME : SCOTT
JOB : ANALYST
MGR : 7566
HIREDATE: 09-12-1982
SAL : 3000
COMM :
DEPTNO : 20
======================
EMPNO : 7839
ENAME : KING
JOB : PRESIDENT
MGR :
HIREDATE: 17-11-1981
SAL : 5000
COMM :
DEPTNO : 10
======================
EMPNO : 7844
ENAME : TURNER
JOB : SALESMAN
MGR : 7698
HIREDATE: 08-09-1981
SAL : 1500
COMM : 0
DEPTNO : 30
======================
EMPNO : 7876
ENAME : ADAMS
JOB : CLERK
MGR : 7788
HIREDATE: 12-01-1983
SAL : 1100
COMM :
DEPTNO : 20
======================
EMPNO : 7900
ENAME : JAMES
JOB : CLERK
MGR : 7698
HIREDATE: 03-12-1981
SAL : 950
COMM :
DEPTNO : 30
======================
EMPNO : 7902
ENAME : FORD
JOB : ANALYST
MGR : 7566
HIREDATE: 03-12-1981
SAL : 3000
COMM :
DEPTNO : 20
======================
EMPNO : 7934
ENAME : MILLER
JOB : CLERK
MGR : 7782
HIREDATE: 23-01-1982
SAL : 1300
COMM :
DEPTNO : 10
======================

126 rijen zijn geselecteerd.


Note that I have to convert all columns to varchar2's to be able to unpivot them. Also, the non default "including nulls" is used to explicitly generate rows when the COMM column is null.

Saturday, December 15, 2007

Parallellism in a skip locked scenario

During the Oracle 11g for developers session, more specifically when I was talking about the FOR UPDATE SKIP LOCKED, I received a question about the possibility of implementing parallellism if a batch program used the FOR UPDATE SKIP LOCKED strategy. Would it work to just kick off N processes simultaneously without altering the code?

Normally you would have to group the workload somehow - probably using the NTILE analytic function - and assign a piece of the workload to each of the slaves. But using the FOR UPDATE SKIP LOCKED, if one of the processes locks some rows, the other processes would skip those rows, and lock some others, and so on. Interesting question and I didn't know the answer right away, so I just tested it.

Let's create a table with a processed flag. This one is necessary in a skip locked scenario as you need to be able to track down which rows were processed and which ones were not. You can handle the ones that were skipped next day or next week.

rwijk@ORA11G> create table parallel_skiplocked_test
2 (id, procesnumber, processed_flag, amount, processed_timestamp)
3 as
4 select level
5 , cast(null as integer)
6 , 'N'
7 , cast(null as number(9))
8 , cast(null as timestamp)
9 from dual
10 connect by level <= 100000
11 /

Tabel is aangemaakt.

rwijk@ORA11G> alter table parallel_skiplocked_test add constraint pst_pk primary key (id)
2 /

Tabel is gewijzigd.

rwijk@ORA11G> exec dbms_stats.gather_table_stats(user,'parallel_skiplocked_test',cascade=>true)

PL/SQL-procedure is geslaagd.


And create a procedure that processes all rows using the FOR UPDATE SKIP LOCKED clause:

rwijk@ORA11G> create procedure process_all(p_procesnumber in integer)
2 is
3 cursor c_pst
4 is
5 select id
6 , amount
7 from parallel_skiplocked_test
8 where processed_flag = 'N'
9 for update skip locked
10 ;
11 type tr_pst is record
12 ( id parallel_skiplocked_test.id%type
13 , amount parallel_skiplocked_test.amount%type
14 );
15 type t_pst is table of tr_pst;
16 r_pst t_pst;
17 begin
18 open c_pst;
19 loop
20 fetch c_pst bulk collect into r_pst limit 100
21 ;
22 for i in 1..r_pst.count
23 loop
24 r_pst(i).amount := trunc(mod(10000 * i * dbms_random.value,10000));
25 end loop
26 ;
27 forall i in 1..r_pst.count
28 update parallel_skiplocked_test
29 set procesnumber = p_procesnumber
30 , amount = r_pst(i).amount
31 , processed_flag = 'Y'
32 , processed_timestamp = systimestamp
33 where id = r_pst(i).id
34 ;
35 exit when c_pst%notfound
36 ;
37 end loop;
38 close c_pst;
39 end process_all;
40 /

Procedure is aangemaakt.


Now let's run the procedure in a single process using dbms_job:

rwijk@ORA11G> declare
2 l_job_id1 binary_integer;
3 begin
4 dbms_job.submit(l_job_id1,'process_all(1);');
5 end;
6 /

PL/SQL-procedure is geslaagd.

rwijk@ORA11G> commit
2 /

Commit is voltooid.


This is the result:

rwijk@ORA11G> select procesnumber
2 , count(*)
3 , min(processed_timestamp) min_timestamp
4 , max(processed_timestamp) max_timestamp
5 from parallel_skiplocked_test
6 group by procesnumber
7 /

PROCESNUMBER COUNT(*) MIN_TIMESTAMP MAX_TIMESTAMP
------------ ---------- ------------------------- -------------------------
1 100000 15-12-07 01:04:20,734000 15-12-07 01:04:30,375000

1 rij is geselecteerd.


It takes 10 seconds.
Now let's try with 4 processes simultaneously, after recreating the entire scenario the same way as above:

rwijk@ORA11G> declare
2 l_job_id1 binary_integer;
3 l_job_id2 binary_integer;
4 l_job_id3 binary_integer;
5 l_job_id4 binary_integer;
6 begin
7 dbms_job.submit(l_job_id1,'process_all(1);');
8 dbms_job.submit(l_job_id2,'process_all(2);');
9 dbms_job.submit(l_job_id3,'process_all(3);');
10 dbms_job.submit(l_job_id4,'process_all(4);');
11 end;
12 /

PL/SQL-procedure is geslaagd.

rwijk@ORA11G> commit
2 /

Commit is voltooid.

rwijk@ORA11G> pause

rwijk@ORA11G> select procesnumber
2 , count(*)
3 , min(processed_timestamp) min_timestamp
4 , max(processed_timestamp) max_timestamp
5 from parallel_skiplocked_test
6 group by procesnumber
7 /

PROCESNUMBER COUNT(*) MIN_TIMESTAMP MAX_TIMESTAMP
------------ ---------- ------------------------- -------------------------
1 23555 15-12-07 01:05:05,734000 15-12-07 01:05:16,656000
2 25445 15-12-07 01:05:05,953000 15-12-07 01:05:17,703000
4 26400 15-12-07 01:05:05,968000 15-12-07 01:05:16,656000
3 24600 15-12-07 01:05:05,796000 15-12-07 01:05:16,640000

4 rijen zijn geselecteerd.


And look: it works! All 4 processes did approximately 25,000 rows. However, instead of 1 time 10 seconds, you now consumed 4 times 11 or 12 seconds. Likely because all four processes are hitting the same resources at the same time, causing contention. So it is of no practical use, but still fun to know. And thanks to Tijmen for the interesting question.

Friday, December 14, 2007

Several 11g oddities

In an earlier post about setting up journaling using flashback data archives I mentioned the risk of losing undo information when adding a constraint to the table. During the preparation for the session Oracle 11g for developers I discovered four oddities:


1) Result Cache:

In the concepts manual, there is a picture of the Result Cache in the SGA. The drawing tells us that the Result Cache is a separate memory area, just like you have the Shared Pool, Large Pool or Buffer Cache. However when executing the procedure dbms_result_cache.memory_report, the output shows this:

rwijk@ORA11G> exec dbms_result_cache.memory_report
R e s u l t C a c h e M e m o r y R e p o r t
[Parameters]
Block Size = 1K bytes
Maximum Cache Size = 1056K bytes (1056 blocks)
Maximum Result Size = 52K bytes (52 blocks)
[Memory]
Total Memory = 5140 bytes [0.003% of the Shared Pool]

... Fixed Memory = 5140 bytes [0.003% of the Shared Pool]
... Dynamic Memory = 0 bytes [0.000% of the Shared Pool]


Which makes you believe that the Result Cache is part of the Shared Pool. So which one of the two is true? The important part for me to know is that it is in shared memory, but still the difference is strange.


2) Reference partitioning

Suppose the following situation: three tables in a master-detail-detail relationship. The master is range partitioned, the child table is reference partitioned, and the grandchild table is also reference partitioned. When trying to insert a row in the grandchild table that violates the foreign key you'd expect the rather well known error message ORA-02291, integrity constraint violated. The error message you receive however is this one:

ORA-14400: De ingevoerde partitiesleutel wordt niet toegewezen aan een partitie.

(translation: inserted partition key does not map to any partition)


3) Fine Grained Dependency Tracking

This feature reduces the odds that your packages become invalid, by storing the dependencies at the column level instead of table level. It works - I've seen it - but I cannot see the column dependencies anywhere. The obvious place, USER_DEPENDENCIES, does not show any column level information. The documentation does not seem to mention an alternative place where this information is stored. Hopefully I overlooked something, and I would appreciate it if someone would tell me where to find it. And I don't mean some X$-table, but in the regular USER_... views.


4) Native compilation

This one was staring at my face when I developed the demonstration script, but I missed it. Only after I was being notified by a colleague watching the session, I saw it too. When compiling a procedure natively, the PLSQL_OPTIMIZE_LEVEL has to be at least 2 according to the documentation. So in my script I set it to 1, and show that a warning is generated and the procedure is NOT natively compiled. But when querying the USER_PLSQL_OBJECT_SETTINGS view, it says that it is natively compiled:

rwijk@ORA11G> alter session set plsql_optimize_level = 1
2 /

Sessie is gewijzigd.

rwijk@ORA11G> alter session set plsql_code_type = native
2 /

Sessie is gewijzigd.

rwijk@ORA11G> create procedure p2
2 is
3 begin
4 for i in 1..100000000
5 loop
6 null;
7 end loop
8 ;
9 dbms_output.put_line('p2');
10 end;
11 /

SP2-0804: Procedure aangemaakt met compilatiewaarschuwingen

rwijk@ORA11G> show err
Fouten voor PROCEDURE P2:

LINE/COL ERROR
-------- -----------------------------------------------------------------
0/0 PLW-06014: Het genereren van native code is uitgeschakeld omdat
PLSQL_OPTIMIZE_LEVEL <= 1.

rwijk@ORA11G> select name
2 , type
3 , plsql_code_type
4 , plsql_optimize_level
5 , plsql_warnings
6 from user_plsql_object_settings
7 where name like 'P_'
8 order by name
9 /

NAME TYPE PLSQL_CODE_TYPE PLSQL_OPTIMIZE_LEVEL PLSQL_WARNINGS
------------------------------ ------------ ------------------------------ -------------------- ------------------------------
P1 PROCEDURE INTERPRETED 2 ENABLE:ALL
P2 PROCEDURE NATIVE 1 ENABLE:ALL
P3 PROCEDURE NATIVE 2 ENABLE:ALL

Wednesday, December 12, 2007

Oracle 11g for developers

Today I will be doing an in-house knowledge session about Oracle 11g for Developers. The powerpoint presentation and the demonstration scripts can be downloaded here:

Powerpoint presentatie Oracle 11g voor ontwikkelaars
Bijbehorende demonstratiescripts

Warning: they are in Dutch.

Friday, December 7, 2007

Multicolumn statistics

Of all new features of 11g, there are the ones that are talked about a lot, like the result cache and virtual columns. I haven't seen much about multicolumn statistics yet, apart from appearing in several bullet lists about 11g. However, I have seen several performance issues in the past, that could have benefitted from this feature, so I am quite excited about this one. The feature addresses what Wolfgang Breitling calls "Fallacy II - The Predicate Independence Assumption" in his paper Fallacies of the Cost Based Optimizer. Here is an example of how it works:

Let's create a fake order table:

rwijk@ORA11G> create table orders
2 as
3 select level nr
4 , 'Customer ' || to_char(level) customer_name
5 , case
6 when level <= 500 then 'London'
7 when level <= 550 then 'New York'
8 when level <= 600 then 'Paris'
9 when level <= 650 then 'Vienna'
10 when level <= 700 then 'Moscow'
11 when level <= 750 then 'Berlin'
12 when level <= 800 then 'Rome'
13 when level <= 850 then 'Madrid'
14 when level <= 900 then 'Brussels'
15 else 'Amsterdam'
16 end city
17 , case
18 when level < 500 then 'UK'
19 when level <= 550 then 'USA'
20 when level <= 600 then 'France'
21 when level <= 650 then 'Austria'
22 when level <= 700 then 'Russia'
23 when level <= 750 then 'Germany'
24 when level <= 800 then 'Italy'
25 when level <= 850 then 'Spain'
26 when level <= 900 then 'Belgium'
27 else 'Netherlands'
28 end country
29 from dual
30 connect by level <= 1000
31 /

Tabel is aangemaakt.


And gather statistics normally.

rwijk@ORA11G> exec dbms_stats.gather_table_stats(user,'orders')

PL/SQL-procedure is geslaagd.


Because the case expression in the create table statement may not read easily, I did these select statements to give you a better idea of what is in the table:

rwijk@ORA11G> select city
2 , count(*)
3 from orders
4 group by city
5 order by city
6 /

CITY COUNT(*)
--------- --------------------------------------
Amsterdam 100
Berlin 50
Brussels 50
London 500
Madrid 50
Moscow 50
New York 50
Paris 50
Rome 50
Vienna 50

10 rijen zijn geselecteerd.

rwijk@ORA11G> select country
2 , count(*)
3 from orders
4 group by country
5 order by country
6 /

COUNTRY COUNT(*)
----------- --------------------------------------
Austria 50
Belgium 50
France 50
Germany 50
Italy 50
Netherlands 100
Russia 50
Spain 50
UK 499
USA 51

10 rijen zijn geselecteerd.

rwijk@ORA11G> select city
2 , country
3 , count(*)
4 from orders
5 group by city
6 , country
7 order by city
8 , country
9 /

CITY COUNTRY COUNT(*)
--------- ----------- --------------------------------------
Amsterdam Netherlands 100
Berlin Germany 50
Brussels Belgium 50
London UK 499
London USA 1
Madrid Spain 50
Moscow Russia 50
New York USA 50
Paris France 50
Rome Italy 50
Vienna Austria 50

11 rijen zijn geselecteerd.


So 10 cities, 10 countries, with a clear dependency between the columns city and country. Although not exactly one-on-one, because - as everybody knows :-) - London is also a town in Ohio, USA. Don't get extracted by this fact too much; it won't play a role in the example.

Ok, so let's have a look at the cardinalities in the following query:

rwijk@ORA11G> explain plan
2 for
3 select *
4 from orders
5 where city = 'London'
6 and country = 'UK'
7 /

Uitleg is gegeven.

rwijk@ORA11G> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------

Plan hash value: 1275100350

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 290 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORDERS | 10 | 290 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CITY"='London' AND "COUNTRY"='UK')

13 rijen zijn geselecteerd.


Since histograms are not present on the table yet, the cost based optimizer has predicted 10 rows, because the filter predicate city = 'London' has 10 distinct values, and the predicate country = 'UK' also has 10 distinct values. Because of the predicate independence assumption, the cost based optimizer calculates 1000 rows * (1/10) * (1/10) = 10 rows.

Also it doesn't matter which value I put in the query:

rwijk@ORA11G> exec dbms_lock.sleep(1)

PL/SQL-procedure is geslaagd.

rwijk@ORA11G> explain plan
2 for
3 select *
4 from orders
5 where city = 'London'
6 and country = 'Netherlands'
7 /

Uitleg is gegeven.

rwijk@ORA11G> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------

Plan hash value: 1275100350

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 290 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORDERS | 10 | 290 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CITY"='London' AND "COUNTRY"='Netherlands')

13 rijen zijn geselecteerd.


Even when searching for London in the Netherlands, it predicts 10. We all know that the query would of course return 0 rows, but the cost based optimizer does not.

So let's create histograms on the columns, to have better estimates:

rwijk@ORA11G> exec dbms_stats.gather_table_stats(user,'orders',
method_opt=>'FOR ALL COLUMNS')

PL/SQL-procedure is geslaagd.

rwijk@ORA11G> explain plan
2 for
3 select *
4 from orders
5 where city = 'London'
6 and country = 'UK'
7 /

Uitleg is gegeven.

rwijk@ORA11G> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------

Plan hash value: 1275100350

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 250 | 7250 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORDERS | 250 | 7250 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("COUNTRY"='UK' AND "CITY"='London')

13 rijen zijn geselecteerd.


Now the prediction is 250 rows. This is calculated based on the histogram information that tells the cost based optimizer that London occurs 500 times out of 1000 rows and the UK occurs 499 times out of 1000 rows. 1000 rows * (500/1000) * (499/1000) is approximately 250 rows.

And for the query using London in the Netherlands:

rwijk@ORA11G> exec dbms_lock.sleep(1)

PL/SQL-procedure is geslaagd.

rwijk@ORA11G> explain plan
2 for
3 select *
4 from orders
5 where city = 'London'
6 and country = 'Netherlands'
7 /

Uitleg is gegeven.

rwijk@ORA11G> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------

Plan hash value: 1275100350

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 1450 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORDERS | 50 | 1450 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("COUNTRY"='Netherlands' AND "CITY"='London')

13 rijen zijn geselecteerd.


Now 50 rows are calculated, because 1000 rows * (500/1000) * (100/1000) = 50.

The above is the best you could get before 11g. With the use of multicolumn statistics, the row counts will be REALLY accurate:

rwijk@ORA11G> exec dbms_stats.gather_table_stats(user,'orders',
method_opt=>'FOR COLUMNS (city,country)')

PL/SQL-procedure is geslaagd.

rwijk@ORA11G> pause

rwijk@ORA11G> explain plan
2 for
3 select *
4 from orders
5 where city = 'London'
6 and country = 'UK'
7 /

Uitleg is gegeven.

rwijk@ORA11G> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------

Plan hash value: 1275100350

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 499 | 15469 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORDERS | 499 | 15469 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("COUNTRY"='UK' AND "CITY"='London')

13 rijen zijn geselecteerd.


499 exactly! And for London in the Netherlands:

rwijk@ORA11G> exec dbms_lock.sleep(1)

PL/SQL-procedure is geslaagd.

rwijk@ORA11G> explain plan
2 for
3 select *
4 from orders
5 where city = 'London'
6 and country = 'Netherlands'
7 /

Uitleg is gegeven.

rwijk@ORA11G> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------

Plan hash value: 1275100350

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 31 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORDERS | 1 | 31 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("COUNTRY"='Netherlands' AND "CITY"='London')

13 rijen zijn geselecteerd.


Since the optimizer won't calculate with 0, it uses 1, but again very accurate! Much more precise cardinality estimates will lead to better decisions regarding whether to use an index or not, for example. It can really make a big difference in a production system. Problem queries can now be tuned more often by analyzing instead of adding hints to your code.