Generic Architecture for Caching Table Data: Keeping It Real (and Small)

In the third installment of this series we will add some more functionality to the cache that will alleviate some of the drawbacks that we discussed in earlier articles.

Refreshing the Cache

One of the issues with using cached data is that your cache can get out of sync with the database.  This can lead to unpredictable behavior in your code and is the reason why we introduced the caching switch.  Obviously that is not really a solution to the problem.  The real solution would be to implement a mechanism that “detects” changes in the DB and automatically refreshes the cached data.  Unfortunately, such a mechanism is really beyond the scope of this series, but we can at least implement cache-refreshing procedures so that, if you want to create a refreshing mechanism, you have everything ready.

We will create two different procedures, a procedure that refreshes one record, and another one that refreshes the whole cache.  Let’s start with the first one, refreshing one record from the database.  We can actually achieve this very simply by adding a new Boolean parameter p_force_db_read to the FUNCTION dept_data which defaults to false:

   FUNCTION dept_data (
      p_dept_id         IN   dept.deptno%TYPE,
      p_force_db_read   IN   BOOLEAN DEFAULT FALSE
   )
      RETURN department_tp
   AS
      l_dept_data   department_tp;
   BEGIN
      IF ((caching) AND (NOT (p_force_db_read)))
      THEN
         l_dept_data := read_from_cache (p_dept_id);
      END IF;

      IF (l_dept_data.NAME IS NULL)
      THEN
         l_dept_data := read_from_db (p_dept_id);

         IF ((l_dept_data.NAME IS NOT NULL) AND (caching))
         THEN
            write_to_cache (p_dept_id, l_dept_data);
         END IF;
      END IF;

      RETURN l_dept_data;
   END dept_data;

Now we will skip the cache read if either the caching is turned off, or if we set p_force_db_read to TRUE.  This in turn will trigger a read from the DB, refreshing the cached record in the process.  So now we just need to write our refresh procedure calling dept_data with the correct mode:

   PROCEDURE refresh_cache (p_dept_id IN dept.deptno%TYPE)
   AS
      l_dept_data   department_tp;
   BEGIN
      l_dept_data := dept_data (p_dept_id, TRUE);
   END refresh_cache;

We will ignore the department data that gets returned from dept_data.  It would probably be handy to have a function that does both, i.e. refreshes the cache and returns the found record, but you should not put that functionality in this procedure.  The procedure is called refresh_cache, not refresh_cache_and_return_value.  You should always be careful not to introduce “side effects” into your code.  In the long run, this will ALWAYS lead to problems.  If you really want a function that does this, create one:

   FUNCTION force_read_from_db (p_dept_id IN dept.deptno%TYPE)
      RETURN department_tp
   AS
      l_dept_data   department_tp;
   BEGIN
      l_dept_data := dept_data (p_dept_id, TRUE);
      RETURN l_dept_data;
   END force_read_from_db;

And if you are thinking that this function now has the side effect of refreshing the cache, you are wrong.  Well, actually you are half right, it does refresh the cache, but that is not a “side-effect,” but the desired effect. I am caching, am I not?

For the second form of refresh, we will create an overriding procedure that just loops over the cache content and calls refresh_cache for every record:

   PROCEDURE refresh_cache
   AS
      l_cache_idx   PLS_INTEGER;
   BEGIN
      l_cache_idx := g_dept_cache.FIRST;
      <<all_records_in_cache>>
      LOOP
         EXIT all_records_in_cache WHEN NOT g_dept_cache.EXISTS
(l_cache_idx);
         refresh_cache (l_cache_idx);
         l_cache_idx := g_dept_cache.NEXT (l_cache_idx);
      END LOOP all_records_in_cache;
   END refresh_cache;

You publish these procedures in the package specification for people to use as they please.  For example, the developer who is going to design the “keep in sync with DB” feature can use these procedures to achieve his goal.

{mospagebreak title=Limit the size of the cache}

Another drawback of caching is that it consumes memory.  If you design your cache incorrectly, or if you pick the wrong tables to cache, you might actually bring the whole database to its knees, something that does not impress customers (I speak from experience).  Big caches also tend to become slower.  It is therefore useful to design for this up front and somehow limit the size of the cache.  Unfortunately, the “optimal” size, i.e. the maximum size of the cache without causing memory issues, varies from system to system.  Some customers have huge servers with massive amounts of RAM, some don’t.  So you can only accommodate, not actually set the cache size (although we will default it to 1000 records).

When you limit the size of your cache, you must confront the following issue: what do I do when the cache is full?  In this case, what if I already have 1000 records in the cache and now I want to cache a new record?  The only option you have available is to overwrite an existing record.  Actually, we are not really overwriting but deleting an existing record and creating a new one.  But which one?  Well, that is the subject of many debates and there are many selection algorithms out there that you are welcome to implement (MRU, LRU, LFU …).  Again, those algorithms are out of the scope of this series.  I am just going to remove a random record from the cache and then add the new one.

First we need a new global variable that holds the maximum cache size:

   g_max_cache_size   PLS_INTEGER;

and getter and setter procedures to manage the variable:

   PROCEDURE set_max_cache_size (p_cache_size IN PLS_INTEGER)
   AS
   BEGIN
      g_max_cache_size := NVL (p_cache_size, 1000);
   END set_max_cache_size;
   FUNCTION max_cache_size
      RETURN PLS_INTEGER
   AS
   BEGIN
      RETURN g_max_cache_size;
   END max_cache_size;

Then we need a procedure that can delete one record from the cache:

   PROCEDURE delete_from_cache (p_dept_id IN dept.deptno%TYPE)
   AS
   BEGIN
      g_dept_cache.DELETE (p_dept_id);
   END delete_from_cache;

And finally we need a “randomizer,” a function that returns a random dept_id that exists in the cache.

   FUNCTION random_cached_dept_id
      RETURN dept.deptno%TYPE
   AS
      l_random_dept_id   dept.deptno%TYPE;
      l_cache_idx        PLS_INTEGER;
      l_cache_counter    PLS_INTEGER;
      l_random_number    PLS_INTEGER;
   BEGIN
      l_cache_idx := g_dept_cache.FIRST;
      l_cache_counter := 1;
      l_random_number := DBMS_RANDOM.VALUE (1, g_max_cache_size);
      <<random_loop>>
      LOOP
         EXIT random_loop WHEN l_cache_counter = l_random_number;
         l_cache_counter :=   l_cache_counter
                            + 1;
         l_cache_idx := g_dept_cache.NEXT (l_cache_idx);
      END LOOP random_loop;
      RETURN l_cache_idx;
   END random_cached_dept_id;

Basically this function runs through the cache a random number of times, at which point it stops and returns the index of the record it stopped at (remember that the index is not just an index number in our cache, but the Department Identifier).  It’s probably not the finest piece of code ever created, but it works.

Now we modify the write_to_cache procedure to make sure we never exceed the maximum cache size:

   PROCEDURE write_to_cache (
      p_dept_id     IN   dept.deptno%TYPE,
      p_dept_data   IN   department_tp
   )
   AS
   BEGIN
      IF (g_dept_cache.COUNT >= max_cache_size)
      THEN
         delete_from_cache (random_cached_dept_id);
      END IF;
      g_dept_cache (p_dept_id).NAME := p_dept_data.NAME;
      g_dept_cache (p_dept_id).LOCATION := p_dept_data.LOCATION;
   END write_to_cache;

Now your cache will never grow beyond the maximum size, and your customers can set the cache size to whatever is suitable for their system.  It is sufficient to say that the smaller the maximum cache size, the less efficient your cache actually becomes, although that also depends on the amount of distinct values.  In my case for example, there are only four distinct values in DEPTNO, so if I set the cache to four I’ll cover all of them, and there will never be any in and out swapping, truly an ideal cache size.

{mospagebreak title=Showing the cache content}

Another useful ability is to be able to see what is actually stored in the cache, so we will create a log_cache procedure:

   PROCEDURE log_cache
   AS
      l_cache_idx   PLS_INTEGER;
   BEGIN
      l_cache_idx := g_dept_cache.FIRST;
      DBMS_OUTPUT.put_line (‘Name           Location’);
      DBMS_OUTPUT.put_line (‘————– ————-‘);
      <<all_records_in_cache>>
      LOOP
         EXIT all_records_in_cache WHEN NOT g_dept_cache.EXISTS
(l_cache_idx);
         DBMS_OUTPUT.put_line (   LPAD (g_dept_cache
(l_cache_idx).NAME, 14)
                               || ‘ ‘
                               || LPAD (g_dept_cache
(l_cache_idx).LOCATION,
                                        13
                                       )
                              );
         l_cache_idx := g_dept_cache.NEXT (l_cache_idx);
      END LOOP all_records_in_cache;
   END log_cache;

I am sending the output of this procedure to the screen, but you might have another mechanism for this, e.g. maybe you log trace messages in a trace table.  You can modify the procedure to use your own logging mechanism.  It is always very handy to be able to peek into the cache, especially when you are debugging.

{mospagebreak title=Conclusion}

We have shown some techniques that can be used to manage the cache size and refresh records in the cache.  If we add this to the Caching Architecture we get the following:

  1. Create a cache.
  2. Create a procedure to empty the cache.
  3. Create a switch for the cache, a public procedure to “set” the switch and a function to verify what the switch is set to.
  4. Create a read, write and query procedure.
  5. Create Procedures to refresh the cache content.
  6. Create a global variable to store the maximum cache size, a public procedure to “set” the size and a function to verify what the size is set to.
  7. Create a data retrieval function.
  8. Create a log procedure.

In the next part of the series we will add procedures that monitor the effectiveness of our cache.

[gp-comments width="770" linklove="off" ]

chat sex hikayeleri Ensest hikaye