Building a transformative function

A transformative function is a pipelined function that accepts as a parameter a result set (via a CURSOR expression) and returns a result set. This functionality is also new to Oracle9i and can have a very positive effect on application performance.

You can define a table function with an IN argument of type REF CURSOR and call it with a CURSOR expression as the actual parameter—all inside a SQL statement. This technique allows you to "pipe" data from one function to the next, or from one SQL operation to the next, without needing to rely on any intermediate storage of data. Here is an example of how you can use this functionality:
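To make the chaining idea concrete, here is a hedged sketch (the table functions f1 and f2 are hypothetical, each accepting a REF CURSOR parameter and returning a collection type) of how one function's result set can feed directly into the next within a single query:

```sql
-- Hypothetical sketch: rows flow from the innermost SELECT through f1
-- and then f2 without ever being staged in an intermediate table.
SELECT *
  FROM TABLE (f2 (CURSOR (
          SELECT *
            FROM TABLE (f1 (CURSOR (
                    SELECT * FROM stocktable))))));
```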

All the code shown here may be found in the tabfunc.sql script on the O'Reilly site.

Consider the following scenario. I have a table of stock ticker information that contains a single row for the open and close prices of each stock:

CREATE TABLE StockTable (
   ticker      VARCHAR2(10),
   open_price  NUMBER,
   close_price NUMBER);

I need to transform (or pivot) that information into another table:

CREATE TABLE TickerTable (
   ticker    VARCHAR2(10),
   PriceType VARCHAR2(1),
   price     NUMBER);

In other words, a single row in StockTable becomes two rows in TickerTable.
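As a quick illustration (using made-up sample data), a single StockTable row pivots like this:

```sql
-- Hypothetical sample row in StockTable:
--    TICKER    OPEN_PRICE    CLOSE_PRICE
--    ORCL      13.50         14.25
--
-- ...becomes these two rows in TickerTable:
--    TICKER    PRICETYPE     PRICE
--    ORCL      O             13.50
--    ORCL      C             14.25
```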

There are many ways to achieve this goal. For example, when using traditional methods in pre-Oracle9i versions of the database, I could write code like this:

FOR rec IN (SELECT * FROM stocktable)
LOOP
   INSERT INTO tickertable (ticker, pricetype, price)
      VALUES (rec.ticker, 'O', rec.open_price);

   INSERT INTO tickertable (ticker, pricetype, price)
      VALUES (rec.ticker, 'C', rec.close_price);
END LOOP;

It works, but for very large volumes of data it may not be as efficient as it could be. Let's see whether a transformative function can do the job more quickly.

First I create an object type, and then a collection (nested table) type of that object type, to serve as the return type of my function:

CREATE TYPE TickerType AS OBJECT (
   ticker    VARCHAR2(10),
   PriceType VARCHAR2(1),
   price     NUMBER);

CREATE TYPE TickerTypeSet AS TABLE OF TickerType;

I then create a package defining a strong REF CURSOR type based on the structure of StockTable rows. I do this in a package specification so that the type can be referenced in my function's parameter list:

CREATE OR REPLACE PACKAGE refcur_pkg
IS
   TYPE refcur_t IS REF CURSOR RETURN StockTable%ROWTYPE;
END refcur_pkg;

And here is my stock pivot function:

CREATE OR REPLACE FUNCTION StockPivot (p refcur_pkg.refcur_t)
   RETURN TickerTypeSet PIPELINED
IS
   out_rec TickerType := TickerType (NULL, NULL, NULL);
   in_rec  p%ROWTYPE;
BEGIN
   LOOP
      FETCH p INTO in_rec;
      EXIT WHEN p%NOTFOUND;

      -- First row: the opening price
      out_rec.ticker := in_rec.ticker;
      out_rec.PriceType := 'O';
      out_rec.price := in_rec.open_price;
      PIPE ROW (out_rec);

      -- Second row: the closing price
      out_rec.PriceType := 'C';
      out_rec.price := in_rec.close_price;
      PIPE ROW (out_rec);
   END LOOP;

   CLOSE p;
   RETURN;
END;

And here is that same function defined in a nonpipelined way:

CREATE OR REPLACE FUNCTION StockPivot_nopl (p refcur_pkg.refcur_t)
   RETURN TickerTypeSet
IS
   out_rec TickerType := TickerType (NULL, NULL, NULL);
   in_rec  p%ROWTYPE;
   retval  TickerTypeSet := TickerTypeSet();
BEGIN
   LOOP
      FETCH p INTO in_rec;
      EXIT WHEN p%NOTFOUND;

      out_rec.ticker := in_rec.ticker;
      out_rec.PriceType := 'O';
      out_rec.price := in_rec.open_price;
      retval.EXTEND;
      retval (retval.LAST) := out_rec;

      out_rec.PriceType := 'C';
      out_rec.price := in_rec.close_price;
      retval.EXTEND;
      retval (retval.LAST) := out_rec;
   END LOOP;

   CLOSE p;
   RETURN retval;
END;

So many different approaches to solving the same problem! How does one choose among them? Well, if you have lots of data to process, you will certainly want to choose the most efficient implementation. You will find in the tabfunc.sql file[1] an anonymous block that compares the performance of the following four approaches:

[1] The tabfunc.sql file relies on the PL/Vision timer mechanism, PLVtmr, to calculate elapsed time. You can install this package by running the plvtmr.pkg script.

- A direct insert from a SELECT using the pipelined function:

   INSERT INTO tickertable
      SELECT *
        FROM TABLE (StockPivot (CURSOR (SELECT * FROM stocktable)));

- A direct insert from a SELECT using the nonpipelined function:

   INSERT INTO tickertable
      SELECT *
        FROM TABLE (StockPivot_nopl (CURSOR (SELECT * FROM stocktable)));

- A deposit of the pivoted data into a local collection, followed by a simple loop that transfers the collection's contents to the table:

   OPEN curvar FOR SELECT * FROM stocktable;
   mystock := StockPivot_nopl (curvar);
   indx := mystock.FIRST;

   LOOP
      EXIT WHEN indx IS NULL;

      INSERT INTO tickertable (ticker, pricetype, price)
         VALUES (mystock(indx).ticker,
                 mystock(indx).pricetype,
                 mystock(indx).price);

      indx := mystock.NEXT (indx);
   END LOOP;

- The "old-fashioned" method: a cursor FOR loop that expands each single row from stocktable into two rows in tickertable:

   FOR rec IN (SELECT * FROM stocktable)
   LOOP
      INSERT INTO tickertable (ticker, pricetype, price)
         VALUES (rec.ticker, 'O', rec.open_price);

      INSERT INTO tickertable (ticker, pricetype, price)
         VALUES (rec.ticker, 'C', rec.close_price);
   END LOOP;

When I execute the block comparing these four approaches in my SQL*Plus session, I see these results:

All SQL with pipelined function      Elapsed: 2.47 seconds.
All SQL with nonpipelined function   Elapsed: 1.78 seconds.
Intermediate collection              Elapsed: 6.71 seconds.
Cursor FOR loop and two INSERTs      Elapsed: 6.9 seconds.

I draw two conclusions from this output:

- The ability of the table function (whether pipelined or not) to transform data "in-line" (i.e., within a single SQL statement) noticeably improves performance.

- Pipelining doesn't help us in this scenario; it actually seems to slow things down a bit. In fact, I am not really taking advantage of pipelining in this code: in every case, I wait until the logic has executed to completion before doing anything with the data (or computing elapsed time).

I would expect (or at least hope) to see some improvement in elapsed time when executing this logic in parallel or, more generally, when we fetch the first N rows and start processing them before all of the data has been retrieved. The file tabfunc.sql offers a simulation of such a scenario.
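As an aside, a pipelined function can also be declared to participate in parallel query. The following is only a sketch of how the StockPivot header might be modified using the Oracle9i PARALLEL_ENABLE clause; it is not taken from tabfunc.sql, so verify the syntax against your release's documentation:

```sql
-- Sketch (an assumption, not part of tabfunc.sql): PARALLEL_ENABLE with
-- PARTITION p BY ANY tells Oracle it may distribute the rows of the
-- strongly typed REF CURSOR p across parallel execution servers in any
-- fashion. The body is the same as the StockPivot shown earlier.
CREATE OR REPLACE FUNCTION StockPivot (p refcur_pkg.refcur_t)
   RETURN TickerTypeSet
   PIPELINED
   PARALLEL_ENABLE (PARTITION p BY ANY)
IS
   out_rec TickerType := TickerType (NULL, NULL, NULL);
   in_rec  p%ROWTYPE;
BEGIN
   LOOP
      FETCH p INTO in_rec;
      EXIT WHEN p%NOTFOUND;

      out_rec.ticker := in_rec.ticker;
      out_rec.PriceType := 'O';
      out_rec.price := in_rec.open_price;
      PIPE ROW (out_rec);

      out_rec.PriceType := 'C';
      out_rec.price := in_rec.close_price;
      PIPE ROW (out_rec);
   END LOOP;

   CLOSE p;
   RETURN;
END;
```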

I compare the time it takes to execute each of these statements:

-- With pipelining
INSERT INTO tickertable
   SELECT *
     FROM TABLE (StockPivot (CURSOR (SELECT * FROM stocktable)))
    WHERE ROWNUM < 10;

-- Without pipelining
INSERT INTO tickertable
   SELECT *
     FROM TABLE (StockPivot_nopl (CURSOR (SELECT * FROM stocktable)))
    WHERE ROWNUM < 10;

And the contrasting timings are very interesting:

Pipelining first 10 rows      Elapsed: .08 seconds.
No pipelining first 10 rows   Elapsed: 1.77 seconds.

Clearly, piping rows back does work and does make a difference!