Generic Architecture for Caching Table Data: Supercharge Your PL/SQL Applications

In this series I am going to introduce you to a generic PL/SQL caching architecture. The goal is to give you an understanding of how you can store Table data in PL/SQL collections (“the cache”) and how to retrieve data from those same PL/SQL collections. The proposed architecture will be generic enough for you to use throughout your applications everywhere you need to retrieve data from the database (and isn’t that what PL/SQL applications are all about?).

Why Caching?

The PL/SQL language was specifically created by Oracle for interacting with data in a database, yet ironically that same interaction is what makes most PL/SQL code slow.  You could write a PL/SQL program containing tens of thousands of lines of code performing extremely complex arithmetic, and it would probably run quicker than a 500-line procedure that needs to query 1000 invoices from a table containing 100 million rows.  It is exactly because of this inefficiency that one of the most effective performance enhancing techniques available to PL/SQL developers is caching data in PL/SQL collections.

Accessing data in the database involves (most of the time) reading data from a disk.  This is inherently slow and you should try to avoid it as much as possible.  Caching is a technique whereby you store data you have read from the disk previously in memory so that the next time you need to access it, you retrieve it from memory instead of from the disk.  Reading data from the memory is much faster than reading data from a hard drive, so caching gives your applications a serious boost.

As you might have guessed, caching is only useful if you need to access the same data over and over again.  There is no point to caching your data if you are not going to need it anymore afterwards.

{mospagebreak title=Why a Caching Architecture?}

I have implemented caches myself on many occasions, but they were always an afterthought: code had been released and a customer complained about performance.  You verify that you have tuned your queries to perfection and then you realize that caching will significantly increase performance, so you start patching up your code and add the caching logic.

I also started noticing that the way I implemented caching in different places was never exactly the same.  Obviously you are dealing with different data so there have to be differences, but I am referring to my own implementations varying from code to code.  Basically I was re-inventing the wheel every time I implemented caching logic and sometimes the wheel had 10 spokes and sometimes 11.  Worst of all, occasionally, there should have been 10 spokes but there were only 9, i.e. I would introduce bugs into my own caching logic (that I implemented perfectly two weeks before in some other piece of code).

After having gone through several iterations of this, I finally decided enough is enough.  I sat down to once and for all come up with a proper solution, some simple guidelines that I can just follow whenever I need to implement caching logic, a proper caching architecture.  I also figured that maybe when I have these guidelines, I would not wait to implement caching till a customer tells me to, but actually design them in the system from the start. 

And low and behold, that is exactly what happened.  Once I was certain that this actually works and can be implemented anywhere, I got so confident about the architecture that I now build caching logic into my applications without even thinking about it.  It takes me almost no extra effort or time to add caching logic to the code, they code has less bugs (nobody is perfect) and it always looks exactly the same.  I cannot overstate how important that last note is.  I for one have difficulties remembering why I myself coded some things the way I did, let alone when reviewing someone else’s code.  Now it is easy because it is all the same, even if someone else actually wrote the code.

{mospagebreak title=An Example}

To keep this example simple, I am going to use the EMP and DEPT tables that are installed (in the SCOTT schema) when you install an Oracle database.  Typically an organization has many employees but relatively few departments for which all of these employees work.  Now imagine that you are receiving a collection of employee data and you are asked to write a procedure that finds out what the department name is for which each of these employees work and the location of that department.  This is a perfect scenario for caching.

We start with the construction of our cache.  We know that it needs to contain the department name and location because that is what is being asked from us.  Furthermore it needs to contain the department identifier because that is the only way to join the employee to his or her department.  This makes the department id a special attribute of our cache as we shall see in a bit.

Here is the declaration of my PL/SQL type to hold the department data:

   TYPE department_tp IS RECORD (
      NAME       dept.dname%TYPE,
      LOCATION   dept.loc%TYPE
   );

Now I can hear you think: hold on a minute, where is the department identifier?  As I mentioned, the department identifier is a special attribute of my cache.  It will be used to access the data in the cache and as such it will be the index of the record it represents.  So when I declare my associative array (formerly known as Index-By table) as

   TYPE department_tbl IS TABLE OF department_tp
      INDEX BY BINARY_INTEGER;

my department identifier is actually the INDEX BY BINARY_INTEGER bit in this declaration.  This will become clearer later on in the example.

With all my types in place, I can now declare my actual cache:

   g_dept_cache   department_tbl;

Let’s start with some basic code for our cache.  I need to be able to write data to my cache and read data back from it.  I also need a function that can read data from the database.  Here is how these modules look:

   PROCEDURE write_to_cache (
      p_dept_id     IN   dept.deptno%TYPE,
      p_dept_data   IN   department_tp
   )
   AS
   BEGIN
      g_dept_cache (p_dept_id).NAME := p_dept_data.NAME;
      g_dept_cache (p_dept_id).LOCATION := p_dept_data.LOCATION;
   END write_to_cache;
   FUNCTION read_from_cache (p_dept_id IN dept.deptno%TYPE)
      RETURN department_tp
   AS
      l_dept_data   department_tp;
   BEGIN
      IF (g_dept_cache.EXISTS (p_dept_id))
      THEN
         l_dept_data.NAME := g_dept_cache (p_dept_id).NAME;
         l_dept_data.LOCATION := g_dept_cache
(p_dept_id).LOCATION;
      END IF;
      RETURN l_dept_data;
   END read_from_cache;
   FUNCTION read_from_db (p_dept_id IN dept.deptno%TYPE)
      RETURN department_tp
   AS
      l_dept_data   department_tp;
      CURSOR csr_dept_data (p_dept_id IN dept.deptno%TYPE)
      IS
         SELECT dname, loc
           FROM dept
          WHERE deptno = p_dept_id;
   BEGIN
      OPEN csr_dept_data (p_dept_id);
      FETCH csr_dept_data INTO l_dept_data;
      CLOSE csr_dept_data;
      RETURN l_dept_data;
   END read_from_db;

Notice how the department identifier (p_dept_id) is used everywhere to identify the department in the cache.

{mospagebreak title=Retrieving department data}

With these three procedures in place, the logic for retrieving department data becomes very simple really:

 

  • Read the department data from the cache

§        If it was not found in the cache, query it from the database

§        If it was found in the database, write it to the cache.

   FUNCTION dept_data (p_dept_id IN dept.deptno%TYPE)
      RETURN department_tp
   AS
      l_dept_data   department_tp;
   BEGIN
      l_dept_data := read_from_cache (p_dept_id);
      IF (l_dept_data.NAME IS NULL)
      THEN
         l_dept_data := read_from_db (p_dept_id);
         IF (l_dept_data.NAME IS NOT NULL)
         THEN
            write_to_cache (p_dept_id, l_dept_data);
         END IF;
      END IF;
      RETURN l_dept_data;
   END dept_data;

Now wrap all these functions and procedures in a package body, call it dept_cache and expose all the types and the dept_data function (by putting them in the package header).  There is no need to expose any of the other procedures.

Now, if you want to get department data, you just call the dept_data function.

   DECLARE
      l_dept_data   dept_cache.department_tp;
   BEGIN
      l_dept_data := dept_cache.dept_data (’10’);
   END;

Notice how the whole caching logic is completely transparent for the users of your package.  Nobody knows that it is using a cache (and neither should they have to know).

When you get your collection of employee data you can now simply loop over the data and invoke dept_cache.dept_data passing in the DEPTNO of the record you are currently processing, e.g.:

   DECLARE
      l_dept_data   dept_cache.department_tp;
   BEGIN
      FOR emp_rec IN (SELECT *
                        FROM emp)
     LOOP
         l_dept_data := dept_cache.dept_data (emp_rec.deptno);
         DBMS_OUTPUT.put_line (   emp_rec.ename
                               || ‘ works in the ‘
                               || INITCAP (l_dept_data.NAME)
                               || ‘ department which is located
in ‘
                               || INITCAP (l_dept_data.LOCATION)
                              );
      ENDLOOP;
   END;

In order to find out how much of a performance gain this caching is giving us I conducted a little test.  We will query 200.000 records using the read_from_db function (note that you will have to make the read_from_db function temporarily public to be able to run the test).  Then we will do the same test using the dept_data function.  Here is what I get on my Test DB:

   Timings when reading 200000 records from the DB = 2132 ms
   Timings when reading 200000 records, using the cache = 238 ms

These results will obviously vary, depending on your DB (version, setup…) and the volume of data in your department table, but typically you will see a huge increase in speed when caching your data.

{mospagebreak title=Conclusion}

So far we have implemented only the very basic parts of the generic Caching Architecture, yet you should have an understanding of how caching can help you in building more performant applications.

Here are the steps we have implemented thus far:

First of all, you have to design your cache. It basically needs to be able to hold the data that you need to query from the database, which in turn is the data that the user is requesting.  In our case, the user is requesting the department name and location, so we have to write a query that retrieves the department name and location, and we create our cache based on that.  Whichever attribute the user is using to identify the data he needs, i.e. the where clause of your query, that attribute needs to become the index of your cache. 

Note that from Oracle 9iR2 onwards, this index does not have to be a numeric value anymore, it can just as well be a VARCHAR2  — which is just as well, because the department identifier is actually a VARCHAR2 field in the dept table, so you might want to use a VARCHAR2 instead as an index.  The only reason I got away with using a binary_integer is because all the identifiers in my DEPT table are numbers, so they get implicitly converted for me without a problem.

Once you have your cache, you need to create three private procedures, one to write to the cache, one to read from the cache, and one that queries the DB.  Finally you create a public function using all of the above procedures to manage the data flow as defined above and serve up the data to your users:

1. Create a cache

2. Create a read, write and query procedure

3. Create a data retrieval function

In the next article, we will expand the functionality of our cache.

[gp-comments width="770" linklove="off" ]
antalya escort bayan antalya escort bayan