Sunday, November 14, 2010

Why won't developers use indexes?

It keeps turning up again and again. The application is released to the customer and after a few weeks, or even a few days, the first complaints about slow performance come in.

Time and again, this is due to the application queries not using the right indexes, or not using any indexes at all. The queries themselves can be well written, as I keep repeating to the developers how important that is. On the other hand, I also keep telling everybody about the usefulness of indexes, and somehow that doesn't seem to stick...

So, let's try this one more time ;-)

If you write a query (any query), it is important to ask yourself a few questions:
  • What data will this query access?
  • How are the tables in this query connected? Am I joining on the right attributes?
  • Am I using the right filter conditions on the data?

These are questions of a more logical nature: they have everything to do with constructing a good query. The "hard" part is the technical part of the puzzle:
  • How will the database access my data?
  • Are there enough fast access paths (at least one) to choose from?
  • Will this query still perform when retrieving the needed amount of data?

These questions are really not that tough. Most of the time, when writing a query, you will have a good idea of what data, and how much of it, is to be retrieved. Try to picture the query in the data model: how could the database retrieve your data in the fastest way possible? On many occasions, the answer will be to use an index. When a table is accessed through an index, only a portion of the table has to be scanned, and the access can be much, much faster.
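As a small illustration (the table and column names here are made up): suppose orders are frequently looked up by customer. Without an index on the filter column, the database has no choice but to scan the whole table; with one, it can go straight to the matching rows.

```sql
-- Hypothetical table: orders are frequently looked up by customer.
-- Without an index on customer_id, this query forces a full table scan.
SELECT o.order_id
     , o.order_date
     , o.total_amount
  FROM orders o
 WHERE o.customer_id = 4711;

-- An index on the filter column gives the optimizer a fast access path:
CREATE INDEX orders_customer_idx ON orders (customer_id);

-- Verify that the index is actually used (Oracle):
EXPLAIN PLAN FOR
  SELECT o.order_id FROM orders o WHERE o.customer_id = 4711;
SELECT * FROM TABLE(dbms_xplan.display);
```

Checking the execution plan after creating the index is the quickest way to see whether the optimizer actually picks it up.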

So, to all developers out there: please consider your queries carefully. After creating a query, try to anticipate how it will behave. If there is any doubt whether the query will perform adequately, at least test it with a representative amount and distribution of data.

Of course, I am fully aware that there are many types of queries out there, not to mention the many types of indexes you can use. It usually pays off to invest some time reading up on the use of indexes. There are plenty of resources out there on the internet, so I won't even try to begin explaining them here. As a developer, do yourself a favor and spend some time getting to know the workings of the most common types of indexes.

And if all else fails: please contact your DBA...

Friday, November 5, 2010

Using PL/SQL tables as IN- and OUTPUT parameters can be bad for performance

Sometimes, when you start a new project, you want to go that extra mile and "do things right this time". That is just what one of our programmers thought when setting up a new package to manipulate large amounts of data. Instead of repeating the code to merge two PL/SQL tables everywhere, he wrote a procedure to do it for him. Nice and clean, modular programming, etc., etc.

First day of testing: everything works fine.
Second day: still going strong.
Third day: the load test. With amounts of data not even close to the amounts to be processed in production, the system slows to a crawl. Processing seems to take forever and memory usage is going through the roof.

In this case, the separate procedure for merging two PL/SQL tables was the problem. Stripped of any irrelevant code (Seven of Nine: Code is irrelevant...), the developer had created something like this:

procedure concat_plsql_table
          ( p_tab1  IN      t_test_tab
          , p_tab2  IN OUT  t_test_tab
          )
IS
BEGIN
  --
  -- LOOP through all records of the first table
  -- and place them in the output table
  --
  FOR i IN 1..p_tab1.count LOOP
    p_tab2(p_tab2.count+1) := p_tab1(i);
  END LOOP;
END;


The problem is that the parameters are passed by value. The compiler creates a copy of the actual parameter, works on that copy, and copies the result back to the caller when the procedure finishes. For an IN OUT PL/SQL table holding thousands of records, it is all this copying that wreaks havoc on your performance and memory usage.

The solution can be very simple:
  • either do not use a separate procedure
  • or use the NOCOPY compiler hint in your code

procedure concat_plsql_table
          ( p_tab1  IN            t_test_tab
          , p_tab2  IN OUT NOCOPY t_test_tab
          )
IS
BEGIN
  --
  -- LOOP through all records of the first table
  -- and place them in the output table
  --
  FOR i IN 1..p_tab1.count LOOP
    p_tab2(p_tab2.count+1) := p_tab1(i);
  END LOOP;
END;


With the NOCOPY compiler hint, the compiler is instructed not to copy the variable, but to pass a reference to the original variable and work on that directly. There are some limitations to its use (see the Oracle documentation): NOCOPY is only a hint, so the compiler may silently ignore it, and if the procedure raises an unhandled exception, any changes already made through a NOCOPY parameter are kept, whereas a by-value parameter would have remained untouched. So be careful how you use it.
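A small sketch of that exception caveat (the procedure name and values are made up for illustration): with NOCOPY, modifications happen on the caller's variable immediately, so they survive an exception; with a plain IN OUT parameter, the copy-out step is skipped on failure and the caller's variable is left untouched.

```sql
DECLARE
  TYPE t_num_tab IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;

  -- Hypothetical procedure: modifies its parameter, then fails.
  PROCEDURE fill_and_fail ( p_tab IN OUT NOCOPY t_num_tab )
  IS
  BEGIN
    p_tab(1) := 42;
    RAISE_APPLICATION_ERROR(-20001, 'something went wrong');
  END;

  l_tab t_num_tab;
BEGIN
  fill_and_fail(l_tab);
EXCEPTION
  WHEN OTHERS THEN
    -- With NOCOPY, the assignment survives the exception.
    -- Without NOCOPY, the copy-out never happens and l_tab stays empty.
    dbms_output.put_line('Records after exception: '||l_tab.count);
END;
/
```

If your code relies on an IN OUT parameter being rolled back to its original value when an exception escapes, NOCOPY changes that behavior.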

In this case, performance went up by a factor of about 200: processing time dropped from minutes to seconds. For any procedure passing large amounts of data around in arrays, this option is worth considering.


For a more complete example, see the code below. Note that concat_plsql_table deliberately omits NOCOPY here, so you can compare the by-value procedure call against the inlined copy:

create or replace package test_plsql_table
as
  --
  -- Main procedure
  --
  procedure test_response;
  --
end;
/

create or replace package body test_plsql_table
as
  --
  -- pl/sql table
  --
  type t_test_rec IS RECORD ( id       number
                            , desc_col varchar2(100)
                            );
  type t_test_tab IS TABLE of t_test_rec
       index by binary_integer;
  --
  -- Concatenate two pl/sql tables into a third
  --
  procedure concat_plsql_table
            ( p_tab1  IN     t_test_tab
            , p_tab2  IN OUT t_test_tab
            )
  IS
  BEGIN
    --
    -- LOOP through all records of the first table
    -- and place them in the output table
    --
    FOR i IN 1..p_tab1.count LOOP
      p_tab2(p_tab2.count+1) := p_tab1(i);
    END LOOP;
  END;
  --
  -- Main procedure
  --
  procedure test_response
  IS
    l_tab1  t_test_tab;
    l_tab2  t_test_tab;
    l_tab3  t_test_tab;
  BEGIN
    --
    -- Display the start time
    --
    dbms_output.put_line('Procedure started at '||to_char(systimestamp,'hh24:mi:ss'));
    --
    -- Initial content for the first table: 500 records
    --
    FOR i IN 1..500 LOOP
      l_tab1(i).id       := i;
      l_tab1(i).desc_col := 'Record '||i||' in initial table';
    END LOOP;
    --
    -- Concatenate the initial table into a second one, 1000 times, via the procedure
    --
    FOR i IN 1..1000 LOOP
      concat_plsql_table( l_tab1
                        , l_tab2);
      IF mod(i,100) = 0 THEN
        dbms_output.put_line('.. '||lpad(i*500,6,' ')||' records at '||to_char(systimestamp,'hh24:mi:ss'));
      END IF;
    END LOOP;
    --
    -- tab2 should now contain a lot of records
    --
    dbms_output.put_line('Number of records created in result table 2 = '||l_tab2.count);
    --
    -- Display the intermediate time and start the second test
    --
    dbms_output.put_line('.');
    dbms_output.put_line('Second (inline) test started at '||to_char(systimestamp,'hh24:mi:ss'));
    --
    -- The same concatenation, 1000 times, but now inlined instead of via the procedure
    --
    FOR i IN 1..1000 LOOP
      FOR j IN 1..l_tab1.count LOOP
        l_tab3(l_tab3.count+1) := l_tab1(j);
      END LOOP;
      IF mod(i,100) = 0 THEN
        dbms_output.put_line('.. '||lpad(i*500,6,' ')||' records at '||to_char(systimestamp,'hh24:mi:ss'));
      END IF;
    END LOOP;
    --
    -- tab3 should now contain a lot of records (same amount as tab2)
    --
    dbms_output.put_line('Number of records created in result table 3 = '||l_tab3.count);
    --
    -- Display the end time
    --
    dbms_output.put_line('.');
    dbms_output.put_line('Procedure finished at '||to_char(systimestamp,'hh24:mi:ss'));
  END;
  --
end;
/