Wednesday, August 8, 2012

OEM DBconsole locking and huge resource usage

For some time now, we have had an 11g 2-node RAC database up and running (actually, we have more, but this one behaved somewhat differently than expected). It has never really been necessary to plug it into Grid Control, so we still use DBConsole to monitor it when we need to. Obviously, this is no production system we are talking about here...

Today, I came across something funny. The servers had been restarted some time ago, and we had decided to shut down the second node, to determine the effect on the application the developers were working on. DBConsole hadn't been switched on yet, so I started it. And that is when the trouble began.

Within seconds, the CPU and I/O usage went way up. DBConsole was still (somewhat) responsive, so I investigated the problem using OEM. Two major problems were visible:
  • Thousands of "active sessions" reported on the Cluster Home page
  • A lot of active SYSMAN sessions, mainly locking the MGMT_FAILOVER_TABLE and executing EMD_NOTIFICATION.OMS_FAILOVER

One of the SYSMAN sessions was blocking several others, but apparently getting nowhere with the work it should be doing. Searching the internet, I found some possible causes, but none of them seemed to apply here.
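I used OEM to spot the blocker here, but the same information can be pulled straight from gv$session; a minimal sketch (column names as in the standard 11g view):

```sql
-- List blocked sessions and the session blocking them, across all RAC instances
SELECT inst_id, sid, serial#, username, event
     , blocking_instance, blocking_session
  FROM gv$session
 WHERE blocking_session IS NOT NULL;
```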

Then I decided to just use trial-and-error. Obviously, the MGMT_FAILOVER_TABLE was involved somewhere. Looking at the data in that table revealed that there were 135 rows in there. Not knowing whether this is normal, I looked it up in other RAC databases, which had only 1 row in the table. So I stopped DBConsole, made a backup of the table contents, removed all rows (I probably could have left the most recent one in there..) and started DBConsole again. It's been quite some time already, and the system is still running smoothly, with low CPU and I/O usage and a normal load from the developers.
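For reference, the cleanup boiled down to something like this (a sketch; the backup table name is mine, and MGMT_FAILOVER_TABLE lives in the SYSMAN schema):

```sql
-- Stop DBConsole first (emctl stop dbconsole, on each node)
-- Keep a copy of the rows, just in case
CREATE TABLE sysman.mgmt_failover_backup
  AS SELECT * FROM sysman.mgmt_failover_table;

DELETE FROM sysman.mgmt_failover_table;
COMMIT;
-- Then restart DBConsole (emctl start dbconsole)
```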

I still don't know why there were that many rows in this table, but I am confident that there shouldn't be. Luckily, the system is smart enough to recover: delete all the rows, and normal operations resume. If anyone knows what causes this, feel free to comment.

I'm looking forward to any answer that clarifies what really happened here ;-)


For those who wish to reproduce: the DB and OS used are
  • 11gR2 (11.2.0.2.0)
  • RedHat Linux 5.5

Friday, February 24, 2012

SRW.RUN_REPORT and the dreaded REP-0178

I was working on a Forms&Reports migration last month (6i to 11gR2) and had little trouble upgrading the entire application, except for one small part: some reports acted as a starting report, calling other reports (based on various criteria) using SRW.RUN_REPORT.

SRW.RUN_REPORT was calling the other reports using both the "userid" and "server" parameters, which were deprecated in 10g. Using "userid" will net you an Oracle error: REP-01434.
If you set the "REPORTS_SRWRUN_TO_SERVER" environment variable to YES, this error can be avoided, but I chose to eliminate these keywords altogether (as there weren't that many reports that used them). So after this, I expected the reports to run smoothly...
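The change itself amounted to stripping the deprecated keywords from the SRW.RUN_REPORT command strings, roughly like this (report and parameter names are made up):

```sql
-- Before (6i style): userid and server are deprecated in 10g and up
srw.run_report('report=child_rep userid=scott/tiger@orcl server=repsrv par_dept=10');

-- After: let the called report inherit the connection from the calling report
srw.run_report('report=child_rep par_dept=10');
```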

When invoking the starting report via the URL, the report itself ran OK, but the SRW.RUN_REPORT did not. Eventually, in the rwEng-0_diagnostic.log file I found only "REP-1428 An error occurred while running procedure SRW.RUN_REPORT in program unit beforereport". Not much help there.

Because SRW.RUN_REPORT effectively calls rwclient.exe (it's a Windows 2008 implementation), I tried using rwclient from the command prompt. At first, this didn't work either (REP-0178 Cannot connect to Reports Server), until I saw that one of the rwclient.bat scripts invoked reports.bat (setting the environment) and the other did not. After setting only two environment variables (ORACLE_HOME and ORACLE_INSTANCE), rwclient.exe worked as well.

Encouraged by this, I set these two environment variables as system variables, so they will always be available to any program. Just to be sure, I rebooted the entire server, as this was only an acceptance server at the moment. After the reboot, the reports using SRW.RUN_REPORT also worked. Mission accomplished ;-).
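On Windows 2008 R2 this can be done via the System Properties dialog or, for example, with setx from an elevated command prompt (the paths below are placeholders, not the actual ones):

```
setx ORACLE_HOME "C:\Oracle\Middleware\reports_home" /M
setx ORACLE_INSTANCE "C:\Oracle\Middleware\asinst_1" /M
```

The /M switch makes them system-wide (machine) variables instead of user variables; note that already-running processes only pick them up after a restart.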


Some details of the environment:
• OS = Windows 2008 R2
• AS = WebLogic 10.3.5
• FMW = 11.1.2 Forms&Reports Services
• Database (different server) = 11.2.0.3

Saturday, December 17, 2011

Oracle Certified Master 11g - finally !

It has been quiet on my blog lately, but that was for a reason. I was very busy studying and preparing for my OCM (Oracle Certified Master) 11g exam. This month, I took the (upgrade) exam and last week I got the highly anticipated "Congratulations" e-mail from Oracle!

So, as of now I am an Oracle Certified Master for 9i, 10g and 11g. Needless to say, I am very happy with this result. I will also try to post more frequently on my blog, as from now on I should have some time left to do this again.

Tuesday, June 28, 2011

Dynamic SQL: Using a variable number of binds a variable number of times

Well-known fact: most queries you execute often perform better when you use bind variables. On two separate occasions in the last month, a developer came to me with a question about dynamic SQL.

The problem they both had was that when dynamically creating a (complex) SQL statement, they wanted to use bind variables. However, they didn't know how many variables they were actually going to use and they didn't know how many times they were going to use them throughout the statement.

Let's start out with something simple to clarify the problem:

CREATE OR REPLACE PROCEDURE dynamic_sql_example
  ( par_a IN varchar2
  , par_b IN number
  , par_c IN varchar2
  )
IS
  TYPE  t_cur IS REF CURSOR;
  c_cur t_cur;
  type t_rec is record (par_a varchar2(20), par_b number, par_c varchar2(20));
  r_rec t_rec;
  --
  l_qt varchar2(1) := chr(39);
BEGIN
  OPEN c_cur FOR 'SELECT '||l_qt||par_a||l_qt||
                      ', '||par_b||
                      ', '||l_qt||par_c||l_qt||
                  ' FROM dual';
  FETCH c_cur INTO r_rec;
  -- do something here
  CLOSE c_cur;
END;
/


All that this actually does is open a cursor for a simple SQL select statement. The problem with this statement is that it will generate a different execution plan for each distinct set of parameters. In order to prevent this, we quickly switch to bind variables:

CREATE OR REPLACE PROCEDURE dynamic_sql_example
  ( par_a IN varchar2
  , par_b IN number
  , par_c IN varchar2
  )
IS
  TYPE  t_cur IS REF CURSOR;
  c_cur t_cur;
  type t_rec is record (par_a varchar2(20), par_b number, par_c varchar2(20));
  r_rec t_rec;
  --
  l_qt varchar2(1) := chr(39);
BEGIN
  OPEN c_cur FOR 'SELECT :x par_a'||
                      ', :y par_b'||
                      ', :z par_c'||
                  ' FROM dual'
           USING par_a, par_b, par_c;
  FETCH c_cur INTO r_rec;
  CLOSE c_cur;
END;
/

This statement is superior to the previous one: because it uses bind variables, it will generate one execution plan. If you think that's just a small detail, try googling "bind variables" and see why you should use them...

So, now the developer wants to create a dynamic SQL statement using an as of yet unknown number and order of bind variables:

CREATE OR REPLACE PROCEDURE dynamic_sql_example
  ( par_a IN varchar2
  , par_b IN number
  , par_c IN varchar2
  )
IS
  TYPE  t_cur IS REF CURSOR;
  c_cur t_cur;
  type t_rec is record (par_a varchar2(20), par_b number, par_c varchar2(20));
  r_rec t_rec;
  --
  l_qt varchar2(1) := chr(39);
  --
  l_stmt varchar2(1000);
  --
BEGIN
  IF par_b = 1 THEN
    l_stmt := 'SELECT :x par_a'||
                  ' , :y par_b'||
                  ' , :z par_c'||
               ' FROM dual'||
              ' WHERE :y = 1'||
                ' AND :x = ''A''';
  ELSE
    l_stmt := 'SELECT :x par_a'||
                  ' , :y par_b'||
                  ' , :z par_c'||
               ' FROM dual'||
              ' WHERE :x = ''Z''';
  END IF;
  --
  OPEN c_cur FOR l_stmt
           USING par_a, par_b, par_c;
  FETCH c_cur INTO r_rec;
  CLOSE c_cur;
END;
/

Depending on the value of par_b, the statement will need either 5 bind values (when the value = 1) or 4 (when the value != 1). Thus, we cannot use the OPEN..FOR..USING construction without resorting to another IF..THEN..ELSE construct. When the statement grows, this becomes too complex to handle. Executing the code as displayed above will net you an "ORA-01008: not all variables bound" error.

The alternative is to take the bind variables out of the equation and store them in an unambiguous and predetermined order:

CREATE OR REPLACE PROCEDURE dynamic_sql_example
  ( par_a IN varchar2
  , par_b IN number
  , par_c IN varchar2
  )
IS
  TYPE  t_cur IS REF CURSOR;
  c_cur t_cur;
  type t_rec is record (par_a varchar2(20), par_b number, par_c varchar2(20));
  r_rec t_rec;
  --
  l_qt varchar2(1) := chr(39);
  --
  l_with varchar2(1000);
  l_stmt varchar2(1000);
  --
BEGIN
  l_with := 'WITH my_parameters '||
             ' AS (SELECT :x par_a'||
                      ' , :y par_b'||
                      ' , :z par_c'||
                   ' FROM dual) ';
  --
  IF par_b = 1 THEN
    l_stmt := 'SELECT par.par_a par_a'||
                  ' , par.par_b par_b'||
                  ' , par.par_c par_c'||
               ' FROM dual'||
                  ' , my_parameters  par'||
              ' WHERE par.par_b = 1'||
                ' AND par.par_a = ''A''';
  ELSE
    l_stmt := 'SELECT par.par_a par_a'||
                  ' , par.par_b par_b'||
                  ' , par.par_c par_c'||
               ' FROM dual'||
                  ' , my_parameters  par'||
              ' WHERE par.par_a = ''Z''';
  END IF;
  --
  OPEN c_cur FOR l_with || l_stmt
           USING par_a, par_b, par_c;
  FETCH c_cur INTO r_rec;
  CLOSE c_cur;
END;
/

Using a WITH..AS construct, we now have a predetermined set of bind variables in a predetermined order. By referencing them in the rest of the code as column values, we are completely independent of the number and order of the bind variables in that part.

Mission Accomplished :-)

Thursday, April 21, 2011

ORA-32321 MV FAST REFRESH after TRUNCATE - example

When you have (fast refreshable on commit) Materialized Views in your schema and want to truncate one or more of the underlying tables, you have a great chance of running into trouble after the truncate. This is when the ORA-32321 error usually appears:

  ORA-32321: REFRESH FAST of "..." unsupported after detail table TRUNCATE


This error occurs because the truncate operation does not record any changes in the MV log and thus a fast refresh will be out of the question for this MV. To solve this, you will have to issue a manual complete refresh on that MV. After that, fast refresh will be possible again.

So far, there is not much news. It gets interesting if you want to truncate and/or delete multiple (e.g. master-detail) tables that are the basis for an MV. When truncating a master-detail pair of tables, you basically have two choices:
  1. disable the foreign key between the tables, truncate them both and re-enable the foreign key
  2. truncate the detail table and after that, delete from the master table

As I have something against disabling constraints (because you must be really confident that re-enabling them again will succeed), I slightly prefer the second option.

There is a small difference in the timing of when the materialized view should be refreshed. In the first scenario, the MV should be completely refreshed after both tables are truncated and the constraint has been enabled again. In the second scenario, you should refresh the MV right after the truncate, and again after the delete on the master table has been committed. If you forget the first refresh, you will get the ORA-32321 error on commit of the delete on the master table. Should you forget the second refresh, the MV will not be refreshed at all, although you will not get an error!

In order to see this in action, I devised a simple example (see below).
  1. we create a master-detail pair of tables, and give them an initial rowset. After that, an MV is created (and at the same time populated).
  2. the detail table is truncated and rows from the master table are deleted. Note the timing of the complete refreshes of the MV
  3. the tables are populated again in two steps, proving that the MV is populated on commit, as expected

If you want to see this in action, run the statements as a script, or use them statement-by-statement to follow every step along the way. To see what happens when you omit one of the refreshes, just try it: leave either the first or the second (or both) of the refreshes out and spot the differences. Leaving out the first returns an obvious error. Leaving out the second can cause some trouble, as it isn't really obvious that something is wrong. Right up to the point that users start complaining, should this happen in a production environment...

Example - using manual refresh on MV's to re-enable fast refresh on commit


  --
  -- Cleanup
  --

  DROP MATERIALIZED VIEW train_min_max_times_mv;
  DROP TABLE train_times;
  DROP TABLE trains;

  --
  -- Setup for the tables
  --

  CREATE TABLE trains
  ( train_number    number(10)
  , train_date      date
  , description     varchar2(100)
  , CONSTRAINT trains_pk PRIMARY KEY (train_number) USING INDEX
  );

  CREATE TABLE train_times
  ( train_number    number(10)
  , train_time      timestamp(0)
  , description     varchar2(100)
  , CONSTRAINT train_times_pk PRIMARY KEY (train_number, train_time) USING INDEX
  , CONSTRAINT train_times_trains_fk FOREIGN KEY (train_number)
                REFERENCES trains (train_number)
  );

  INSERT INTO trains VALUES (1, sysdate, 'first train');
  INSERT INTO trains VALUES (2, sysdate, 'second train');

  INSERT INTO train_times VALUES (1, systimestamp       , 'station A');
  INSERT INTO train_times VALUES (1, systimestamp+1/1440, 'station B');
  INSERT INTO train_times VALUES (1, systimestamp+2/1440, 'station C');
  INSERT INTO train_times VALUES (1, systimestamp+3/1440, 'station D');

  INSERT INTO train_times VALUES (2, systimestamp+4/1440, 'station X');
  INSERT INTO train_times VALUES (2, systimestamp+5/1440, 'station Y');
  INSERT INTO train_times VALUES (2, systimestamp+6/1440, 'station Z');

  COMMIT;

  --
  -- Create the materialized view and view logs
  --

  CREATE MATERIALIZED VIEW LOG ON trains
    WITH SEQUENCE, ROWID (train_number, train_date)
    INCLUDING NEW VALUES;

  CREATE MATERIALIZED VIEW LOG ON train_times
    WITH SEQUENCE, ROWID (train_number, train_time)
    INCLUDING NEW VALUES;

  CREATE MATERIALIZED VIEW train_min_max_times_mv
    BUILD IMMEDIATE
    REFRESH FAST ON COMMIT
    ENABLE QUERY REWRITE
    AS SELECT tr.train_number
            , tr.train_date
            , min(tt.train_time)    min_train_time
            , max(tt.train_time)    max_train_time
            , count(*)              total_times
         FROM trains       tr
            , train_times  tt
        WHERE tt.train_number = tr.train_number
        GROUP BY tr.train_number
            , tr.train_date;

  --
  -- Truncate detail table, delete master table
  -- and make MV fast refreshable again
  --

  TRUNCATE TABLE train_times;

  exec dbms_mview.refresh('TRAIN_MIN_MAX_TIMES_MV','C');

  DELETE FROM trains;

  COMMIT;

  exec dbms_mview.refresh('TRAIN_MIN_MAX_TIMES_MV','C');

  SELECT * FROM train_min_max_times_mv;

  INSERT INTO trains VALUES (1, sysdate, 'first train');

  INSERT INTO train_times VALUES (1, systimestamp       , 'station A');
  INSERT INTO train_times VALUES (1, systimestamp+1/1440, 'station B');
  INSERT INTO train_times VALUES (1, systimestamp+2/1440, 'station C');
  INSERT INTO train_times VALUES (1, systimestamp+3/1440, 'station D');

  COMMIT;

  SELECT * FROM train_min_max_times_mv;

  INSERT INTO trains VALUES (2, sysdate, 'second train');

  INSERT INTO train_times VALUES (2, systimestamp+4/1440, 'station X');
  INSERT INTO train_times VALUES (2, systimestamp+5/1440, 'station Y');
  INSERT INTO train_times VALUES (2, systimestamp+6/1440, 'station Z');

  COMMIT;

  SELECT * FROM train_min_max_times_mv;


Example - disabling constraints when truncating

The first example can be modified to use disabled constraints, by replacing the "truncate-refresh-delete-commit-refresh" portion of the example with the lines below.

  ALTER TABLE train_times MODIFY CONSTRAINT train_times_trains_fk DISABLE;

  TRUNCATE TABLE train_times;
  TRUNCATE TABLE trains;

  ALTER TABLE train_times MODIFY CONSTRAINT train_times_trains_fk ENABLE;

  exec dbms_mview.refresh('TRAIN_MIN_MAX_TIMES_MV','C');

Wednesday, April 20, 2011

RMAN pipe mode example - a simple interface for remote rman access

When reading the backup and recovery user's guide of the 11gR2 database (again), I came across the RMAN pipe interface. Maybe I came across this before, but I had never really paid any attention to it. Until now.

The RMAN pipe interface enables you to interact with RMAN while connected via a different tool or interface, like SQL*Plus. This means that you don't have to be connected to the host on which the database resides, or even be connected as sysdba, to execute RMAN commands. Let's first look at the way to do this, and then we'll get back to why you would want this (or not...)

According to the documentation, it is fairly easy to set up pipe mode. Simply start the rman executable like this:

  [oracle@localhost ~]$ rman pipe orcl target / catalog rman@rmancat
  Recovery Manager: Release 11.2.0.2.0 - Production on Wed Apr 20 11:16:05 2011
  Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
  recovery catalog database Password:


After supplying the password, the prompt does nothing. This means that rman is now running in pipe mode and is not accepting any input via the CLI.

If you want to verify that pipe mode is enabled, use sqlplus to check if the pipes are created as private pipes:

  [oracle@localhost ~]$ sqlplus / as sysdba
  SQL*Plus: Release 11.2.0.2.0 Production on Wed Apr 20 12:27:30 2011
  Copyright (c) 1982, 2010, Oracle. All rights reserved.
  Connected to:
  Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
  With the Partitioning, OLAP, Data Mining and Real Application Testing options
  SQL> select type, name from v$db_pipes;
  TYPE    NAME
  ------- ----------------------------------------
  PRIVATE ORA$RMAN_ORCL_IN
  PRIVATE ORA$RMAN_ORCL_OUT


So, now we have the pipes that the documentation told us we would have. How do we use them? The documentation suggests using dbms_pipe.pack_message and dbms_pipe.send_message:

  DECLARE
    l_return integer;
  BEGIN
    dbms_pipe.pack_message('list backup summary;');
    l_return := dbms_pipe.send_message('ORA$RMAN_ORCL_IN');
  END;
  /


Now, we have sent the first command to rman using the "IN" pipe. This means that we should expect some output on the "OUT" pipe as well:

  DECLARE
    l_return integer;
    l_text varchar2(4000);
  BEGIN
    l_return := dbms_pipe.receive_message('ORA$RMAN_ORCL_OUT');
    dbms_pipe.unpack_message(l_text);
    dbms_output.put_line(l_text);
  END;
  /


This returns one line of text. Usually, rman returns more than one line, so we'll have to repeat this until we get all the lines. The last line to expect says "RMAN-00572: waiting for DBMS_PIPE input", which means that rman is waiting for a new command on the pipe.

Example interface

This calls for a procedure to handle some of this stuff for us. Or even better: we create a package. The following is a simplified example of how to handle an rman command using a pipe.

  CREATE OR REPLACE PACKAGE rman_pipe
  AS
    PROCEDURE do_command( p_command IN varchar2 );
  END;
  /

  CREATE OR REPLACE PACKAGE BODY rman_pipe
  AS
    PROCEDURE pack_message (p_command IN varchar2)
    IS
      l_return  integer;
    BEGIN
      --
      -- Pack the command (check for closing semi-colon)
      --
      IF substr(p_command,-1) != ';' THEN
        dbms_pipe.pack_message(p_command||';');
      ELSE
        dbms_pipe.pack_message(p_command);
      END IF;
      --
      -- Send the message
      --
      l_return := dbms_pipe.send_message('ORA$RMAN_ORCL_IN');
      IF l_return = 0 THEN
        dbms_output.put_line('.');
        dbms_output.put_line('RMAN command sent');
        dbms_output.put_line('.');
      ELSIF l_return = 1 THEN
        dbms_output.put_line('Timed Out');
      ELSIF l_return = 3 THEN
        dbms_output.put_line('An Interrupt Occurred');
      ELSE
        dbms_output.put_line('Unknown Return Code '||l_return);
      END IF;
    END pack_message;
    --
    PROCEDURE unpack_message (p_timeout IN integer default dbms_pipe.maxwait)
    IS
      l_return  integer         := 0;
      l_text    varchar2(4000)  := 'No text';
    BEGIN
      --
      -- Check the pipe for messages until RMAN-00572 is received
      --
      WHILE l_return  = 0
      AND   l_text   != 'RMAN-00572: waiting for DBMS_PIPE input'
      LOOP
        l_return := dbms_pipe.receive_message('ORA$RMAN_ORCL_OUT', p_timeout);
        IF l_return = 0 THEN
          dbms_pipe.unpack_message(l_text);
          dbms_output.put_line(l_text);
        ELSIF l_return = 1 THEN
          dbms_output.put_line('Timed Out');
        ELSIF l_return = 2 THEN
          dbms_output.put_line('Record in the pipe is too large for the buffer');
        ELSIF l_return = 3 THEN
          dbms_output.put_line('An Interrupt Occurred');
        ELSE
          dbms_output.put_line('Unknown Return Code '||l_return);
        END IF;
      END LOOP;
    END unpack_message;
    --
    PROCEDURE do_command( p_command IN varchar2 )
    IS
    BEGIN
      unpack_message(0);
      pack_message(p_command);
      unpack_message;
    END do_command;
  END;
  /
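
Using the package is then a one-liner in SQL*Plus (with serveroutput on to see the rman output; this assumes the pipe names from the example above):

```sql
SET SERVEROUTPUT ON
exec rman_pipe.do_command('list backup summary')
```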


This is not as basic as when I started out, though. The problems I ran into, in chronological order:
  • When retrieving the results, you'll have to check if you receive the RMAN-00572 message
=> after receiving this, stop extracting from the pipe
  • When packing the command, I sometimes forgot to put a semi-colon after the command
=> so I created the part that checks for a semi-colon
  • When issuing the first command, there is something in the buffer:
  connected to target database: ORCL (DBID=1229390655)
  connected to recovery catalog database
  RMAN-00572: waiting for DBMS_PIPE input

So the first time you request the messages up to RMAN-00572, you will only get this part. On the next command you will get the output of the previous command, etc.
=> first, receive any old messages on the pipe, then execute the command and get the results
  • When no previous messages are on the pipe, you will wait indefinitely for a message
=> use a timeout for the first request for messages on the pipe, so you won't get stuck there

Conclusion

Still, this is a simple interface, but think of the possibilities:
  • You can now grant "execute" privileges to a user, making that user the rman admin for this database
  • No connection via the host or as sysdba is needed

Sure, there are some (major?) drawbacks to overcome at this point:
  • When starting pipe mode as a background process (nohup &), pipe mode will end when rman receives an error (as with a malformed command)
  • When starting pipe mode as a background process, you have to provide the password on the command line
Friday, April 15, 2011

BULK COLLECT %FOUND and %NOTFOUND behaviour

When using the BULK COLLECT feature in PL/SQL, a lot of programmers get confused about using %FOUND and %NOTFOUND in their code. As has been said multiple times (on various sites by various people), it seems better to avoid this by using .count, but I think it is good to point out the behaviour of %FOUND and %NOTFOUND in these cases.

Usually, you will have a code block similar to this:

  OPEN c_cur;
  LOOP
    FETCH c_cur BULK COLLECT INTO t_cur [LIMIT <n>];
    l_found := c_cur%FOUND;
    [PL/SQL code here...]
    EXIT WHEN NOT l_found;
  END LOOP;
  CLOSE c_cur;

When you are determined to use %FOUND or %NOTFOUND, note the meaning of these when using BULK COLLECT:
  • %FOUND will return TRUE if exactly [limit] rows were retrieved
  • If %FOUND returns TRUE, there might be more rows to be fetched
  • %FOUND will return FALSE if less than [limit] rows were retrieved
  • %FOUND will always return FALSE if [limit] is not set (you cannot retrieve less than unlimited rows)
  • %NOTFOUND will always return the opposite of %FOUND (as is to be expected...)
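
To see these rules in action, here is a minimal sketch: 10 rows fetched with a limit of 5, with the exit on .count as the alternative mentioned above:

```sql
DECLARE
  CURSOR c_cur IS SELECT level FROM dual CONNECT BY level <= 10;
  TYPE t_tab IS TABLE OF number;
  t_cur t_tab;
BEGIN
  OPEN c_cur;
  LOOP
    FETCH c_cur BULK COLLECT INTO t_cur LIMIT 5;
    -- Two fetches return exactly 5 rows (%FOUND = TRUE), the third
    -- returns 0 rows (%FOUND = FALSE): one extra, empty iteration.
    dbms_output.put_line('fetched '||t_cur.count||' rows, %FOUND = '||
      CASE WHEN c_cur%FOUND THEN 'TRUE' ELSE 'FALSE' END);
    EXIT WHEN t_cur.count = 0;  -- the .count alternative
  END LOOP;
  CLOSE c_cur;
END;
/
```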