Generic Architecture for Caching Table Data: Supercharge Your Cache

In the first part of this series we started off by putting the basic structures in place for a cache and wrote some code to manage it. In this next part, we will extend the functionality of our cache.

Things to look out for when using a cache

Although caching is a fabulous way to increase performance, it has a couple of drawbacks.  First of all, data in the cache can get out of sync with the data in the database: a record you cached might get updated or even removed from the database.  This is one of the reasons you should not use caching on transactional data that gets changed a lot.  These synchronization issues can result in “unpredictable” behavior of your code and are potentially very hard to debug.  Because we hide our caching logic, a fellow developer who is using your package might not understand why, when the database says “The Department Name of department 20 is R&D”, your procedure consistently returns that “The Department Name of department 20 is Research”.

A second issue with caching is that, by its very nature, it consumes memory.  This becomes a problem when either your cache becomes very big or there are many users who simultaneously, in separate sessions, use your package, spawning many caches.  Besides causing memory issues, big caches also tend to become slower; that is, they do not scale very well.  This obviously defeats the purpose of having a cache in the first place, so try to keep them as small as possible.

In a later part of this series we will try to implement solutions for these issues but for now, like a true developer, I first would like to try an easy solution.

Switching the cache on and off

When my code produces truly unexpected results, I always immediately suspect the cache: is it out of sync, does it contain corrupted data, and so on. So what would be an easy way to confirm whether the cache really is to blame?  How about switching it off!  That’s right, if you switch the cache off, and your code suddenly starts working, there must be something wrong with your cache; if it still does not work, you are going to have to dig deeper, but at least you ruled out the cache.  Let’s see how we can implement such a switch.

You will first need to create the switch.  This is basically a private global BOOLEAN variable that you declare in your package body and will signal if the cache is in use (TRUE) or not (FALSE).

   g_caching      BOOLEAN        := TRUE;

Here we default g_caching to TRUE meaning the cache is on by default.

In order for anybody to “set” the switch, we need a procedure, which we will make public by publishing it in the package specification:

   PROCEDURE set_caching (p_on IN BOOLEAN)
   IS
   BEGIN
      g_caching := NVL (p_on, TRUE);
   END set_caching;

Some of you are probably wondering why we don’t just make our g_caching variable public and forget about the set_caching procedure.  It is considered good practice to hide your global variables (all of them, not just this one) in the package body.  You should never expose them, because then you lose control over their contents.  Anybody can set your public variable to whatever they want, making it impossible to track what happens.

Using this technique, you have complete control over your global variable and you can add your own business rules.  E.g. my business rule is that if NULL is passed in, I set the value to TRUE.  If I had exposed the variable, I could not enforce my business rule, and somebody could have easily set it to NULL, probably breaking the caching logic further on.  It is true that I could change all the other code to handle NULL values in my g_caching variable, but this is much cleaner (and easier!).  I could also extend the set_caching procedure later on, adding some security measures (e.g. only allow certain users to set the switch).  You can even log who sets the switch, what it was set to and when.
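As an illustration of that last point, here is a sketch of what an extended set_caching could look like.  The caching_log table and its columns are hypothetical, invented for this example; only the assignment to g_caching comes from the code above.

```sql
   PROCEDURE set_caching (p_on IN BOOLEAN)
   IS
   BEGIN
      -- Business rule: NULL means "use the default", which is TRUE
      g_caching := NVL (p_on, TRUE);

      -- Hypothetical audit trail: record who set the switch, to what, and when
      INSERT INTO caching_log (username, switched_to, switched_on)
      VALUES (USER,
              CASE WHEN g_caching THEN 'ON' ELSE 'OFF' END,
              SYSDATE);
   END set_caching;
```

Because callers can only go through the procedure, adding the audit trail later requires no change to any calling code.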

We also create a public function that can tell us whether the cache is on or off:

   FUNCTION caching
      RETURN BOOLEAN
   IS
   BEGIN
      RETURN g_caching;
   END caching;

Don’t forget to add this procedure and function to the package specification.

Now we are ready to make our modifications to the caching logic. Here is the new code:

   FUNCTION dept_data (p_dept_id IN dept.deptno%TYPE)
      RETURN department_tp
   IS
      l_dept_data   department_tp;
   BEGIN
      l_dept_data := read_from_cache (p_dept_id);
      IF (l_dept_data.NAME IS NULL)
      THEN
         l_dept_data := read_from_db (p_dept_id);
         IF ((l_dept_data.NAME IS NOT NULL) AND (caching))
         THEN
            write_to_cache (p_dept_id, l_dept_data);
         END IF;
      END IF;
      RETURN l_dept_data;
   END dept_data;

We do the check before we write to the cache.  Notice though that we don’t check whether caching is switched on before we read.  This isn’t really necessary as the cache would be empty anyway, forcing a DB read later on (you could add the check though if you want).
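If you do want to add that optional check, here is a sketch of the relevant fragment of dept_data only, skipping the cache read entirely when caching is off:

```sql
      -- Optional: only consult the cache when caching is switched on
      IF (caching)
      THEN
         l_dept_data := read_from_cache (p_dept_id);
      END IF;
```

The rest of the function stays as shown above: a NULL name still triggers a database read.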

Anybody can now control the behavior of your cache (use it or not), by calling set_caching:

   DECLARE
      l_dept_data   dept_cache.department_tp;
   BEGIN
      -- Cache will be used: the default
      l_dept_data := dept_cache.dept_data (10);
      -- Cache will not be used
      dept_cache.set_caching (FALSE);
      l_dept_data := dept_cache.dept_data (10);
   END;

Don’t Forget to Flush

If you run this code you will notice some very peculiar behavior.  Although the cache gets switched off before the second call, it will refuse to query the database.  In fact, if you run it a second time (and a third etc.), you will see the same behavior, it just will not go to the DB.  That is until you reconnect (i.e. close this session and open a new one).  The reason is that switching the cache off doesn’t actually clear it.  The data that was in the cache before you switched it off stays in there until you disconnect your session.  Therefore, when you query department 10, it will still find it in the cache and return the cached value.

Now you could have avoided this by adding the check to verify whether the cache is in use or not before you read from the cache.  But that would hide the fact that there is still data in the cache, occupying valuable memory for no reason.  To resolve this issue we will create a procedure that empties the cache and we will call this when the cache gets switched off.

   PROCEDURE flush_cache
   IS
   BEGIN
      -- g_dept_cache is the associative array that holds the cached
      -- records (the name may differ in your package body from part 1)
      g_dept_cache.DELETE;
   END flush_cache;

Here are the changes needed in set_caching:

   PROCEDURE set_caching (p_on IN BOOLEAN)
   IS
   BEGIN
      g_caching := NVL (p_on, TRUE);
      IF (NOT g_caching)
      THEN
         flush_cache;
      END IF;
   END set_caching;

Now, when you switch the cache off, you also clean it up, and no further cache reads will occur until you switch the cache back on.

And how do I know this switch actually works?  Well, very simple, I can now do my performance test that I did in the first article with the same procedure, dept_data, and just switch the cache on and off before each test (in part 1 of this series I could only demonstrate this by calling the read_from_db function directly).  It should be significantly slower with the cache switched off.
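Such a test could be sketched as follows.  This anonymous block assumes the dept_cache package built in this series and uses DBMS_UTILITY.GET_TIME, which returns elapsed time in hundredths of a second; the iteration count of 10,000 and department 10 are arbitrary choices.

```sql
   DECLARE
      l_start       PLS_INTEGER;
      l_dept_data   dept_cache.department_tp;
   BEGIN
      -- Run with the cache on
      dept_cache.set_caching (TRUE);
      l_start := DBMS_UTILITY.GET_TIME;
      FOR i IN 1 .. 10000
      LOOP
         l_dept_data := dept_cache.dept_data (10);
      END LOOP;
      DBMS_OUTPUT.PUT_LINE
         ('Cache on : ' || TO_CHAR (DBMS_UTILITY.GET_TIME - l_start) || ' hsecs');

      -- Run with the cache off (every call hits the database)
      dept_cache.set_caching (FALSE);
      l_start := DBMS_UTILITY.GET_TIME;
      FOR i IN 1 .. 10000
      LOOP
         l_dept_data := dept_cache.dept_data (10);
      END LOOP;
      DBMS_OUTPUT.PUT_LINE
         ('Cache off: ' || TO_CHAR (DBMS_UTILITY.GET_TIME - l_start) || ' hsecs');
   END;
```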

Conclusion

When you create a cache, you should always implement a switch that allows you to set caching off when needed. I cannot tell you how much time this has saved me while debugging my code.  When you do switch off the cache, make sure you also clear it out before continuing, else you might see some unexpected behavior and cause “memory leaks.” 

Given that this is so useful, I always implement my caches with this functionality and so I have made it a part of the Caching Architecture, which now basically looks like this:

  1. Create a cache.
  2. Create a procedure to empty the cache.
  3. Create a switch for the cache, a public procedure to set the switch and a function to verify what the switch is set to.
  4. Create a read, write and query procedure.
  5. Create a data retrieval function that uses the switch.
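Put together, the public face of such a package could look like the sketch below.  The record fields (dname and loc, from the classic DEPT table) are assumptions standing in for whatever was defined in part 1; the read, write and query routines from step 4 stay private in the package body.

```sql
   CREATE OR REPLACE PACKAGE dept_cache
   AS
      -- Record type for the cached data (fields as defined in part 1)
      TYPE department_tp IS RECORD
      (
         name   dept.dname%TYPE,
         loc    dept.loc%TYPE
      );

      -- Step 3: the switch
      PROCEDURE set_caching (p_on IN BOOLEAN);
      FUNCTION caching RETURN BOOLEAN;

      -- Step 2: empty the cache (also called by set_caching)
      PROCEDURE flush_cache;

      -- Step 5: data retrieval through the cache
      FUNCTION dept_data (p_dept_id IN dept.deptno%TYPE)
         RETURN department_tp;
   END dept_cache;
```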

In the next part of this series we will add even more functionality to the cache and try to find solutions for some of the drawbacks we mentioned earlier.

