Generic Architecture for Caching Table Data: Hello Cache, How Are You Doing?

Now that you have set up a functioning cache, are you sure it’s giving you the performance enhancements you need? Keep reading to learn three ways to check the performance of your cache.

At this stage, we have all the basic functionality in place for our cache to be useful and usable.  We can read and write to it, refresh it, flush it and show its content.  In this part we will add code that allows you to gauge the effectiveness of the cache.  This can be used to help you fine-tune the optimum size of your cache in a particular instance.  Note that this does not add any functionality to the actual cache, it is just a tool that helps you understand how the cache is being used and allows you to make some informed decisions later on.

The Hit Ratio

When data is found in the cache, it is called a cache hit, and the effectiveness of a cache is judged by its hit rate.  A cache is only effective if it has a high hit rate.  The hit ratio can be calculated by dividing the number of times a record gets found in the cache (cache hit), versus the number of times a record was looked for:

            # Cache Hits

Hit Ratio = ————- * 100

            # Total Reads

The higher the hit ratio, the better your cache usage, and the more performance gain you are getting from using the cache.  A low cache hit ratio can indicate that the cache size is too small, or that the data you are caching is not suitable for caching, i.e. there are too many distinct values that get queried resulting in high turnover (see later).

In order to be able to calculate the ratio, we just need to keep track of the number of successful cache reads and the total number of reads.  I do not want to slow down the cache’s performance, however, by gathering all these statistics, so first we introduce another switch to control whether cache stats are gathered or not:

   g_gather_stats     BOOLEAN        := FALSE;

This is followed by the usual getters and setters:

   PROCEDURE set_gather_stats (p_on IN BOOLEAN)
   AS
   BEGIN
      g_gather_stats := NVL (p_on, FALSE);
   END set_gather_stats;

   FUNCTION gather_stats
      RETURN BOOLEAN
   AS
   BEGIN
      RETURN g_gather_stats;
   END gather_stats;

We also need places in which we can store the numbers, so let’s create some more global variables:

   g_total_reads        PLS_INTEGER    := 0;
   g_total_cache_hits   PLS_INTEGER    := 0;

And then change our existing code to gather the numbers (if requested):

   FUNCTION read_from_cache (p_dept_id IN dept.deptno%TYPE)
      RETURN department_tp
   AS
      l_dept_data   department_tp;
   BEGIN
      IF (g_dept_cache.EXISTS (p_dept_id))
      THEN
         l_dept_data.NAME := g_dept_cache (p_dept_id).NAME;
         l_dept_data.LOCATION := g_dept_cache
(p_dept_id).LOCATION;
         IF (gather_stats)
         THEN
            g_total_cache_hits :=   g_total_cache_hits
                                  + 1;

         END IF;
      END IF;

      RETURN l_dept_data;
   END read_from_cache;

   FUNCTION dept_data (
      p_dept_id         IN   dept.deptno%TYPE,
      p_force_db_read   IN   BOOLEAN
   )
      RETURN department_tp
   AS
      l_dept_data   department_tp;
   BEGIN
      IF ((caching) AND (NOT (p_force_db_read)))
      THEN
         l_dept_data := read_from_cache (p_dept_id);
      END IF;

      IF (l_dept_data.NAME IS NULL)
      THEN
         l_dept_data := read_from_db (p_dept_id);

         IF ((l_dept_data.NAME IS NOT NULL) AND (caching))
         THEN
            write_to_cache (p_dept_id, l_dept_data);
         END IF;
      END IF;

      IF (gather_stats)
      THEN
         g_total_reads :=   g_total_reads
                          + 1;

      END IF;
      RETURN l_dept_data;
   END dept_data;

Now we just need to create a procedure to actually perform the calculation of the ratio:

   FUNCTION cache_hit_ratio
      RETURN PLS_INTEGER
   AS
      l_cache_hit_ratio   PLS_INTEGER;
   BEGIN
      IF (g_total_reads <> 0)
      THEN
         l_cache_hit_ratio := (g_total_cache_hits /
g_total_reads) * 100;
      END IF;

      RETURN l_cache_hit_ratio;
   END cache_hit_ratio;

Note that your hit ratio will never be 100% because the cache needs to get filled first, i.e. the first records that get queried will always result in a DB read because there is nothing in the cache yet.  The goal is to get as close to 100% as possible.

{mospagebreak title=The Cache Miss Ratio aka DB Reads}

When data is not found in the cache, it is called a cache miss.  A cache is only effective if it has a low miss rate.  The miss ratio can be calculated by dividing the number of times a record gets read from the DB (it only gets read from the DB if it wasn’t found in the cache, indicating a cache miss), versus the number of times a record was looked for:

            # Cache Misses

Hit Ratio = ————– * 100

            # Total Reads

The lower the miss ratio, the fewer the data items get read from the database and the more performance gain you are experiencing from using the caching logic.  A high cache miss ratio can indicate that the cache size is too small, or that the data you are caching is not suitable for caching, i.e. there are too many distinct values that get queried resulting in high turnover (see later).

In order to be able to calculate the ratio, we just need to keep track of the number of db reads (on top of the total number of reads which we already keep track of). 

   g_total_db_hits      PLS_INTEGER    := 0;

And then we change our existing code to gather the numbers (if requested):

   FUNCTION read_from_db (p_dept_id IN dept.deptno%TYPE)
      RETURN department_tp
   AS
      l_dept_data   department_tp;
      CURSOR csr_dept_data (p_dept_id IN dept.deptno%TYPE)
      IS
         SELECT dname, loc
           FROM dept
          WHERE deptno = p_dept_id;
   BEGIN
      OPEN csr_dept_data (p_dept_id);
      FETCH csr_dept_data INTO l_dept_data;

      IF (csr_dept_data%FOUND)
      THEN
         IF (gather_stats)
         THEN
            g_total_db_hits :=   g_total_db_hits
                               + 1;

         END IF;
      END IF;

      CLOSE csr_dept_data;
      RETURN l_dept_data;
   END read_from_db;

Now we just need to create a procedure to actually perform the calculations of the ratio:

   FUNCTION db_hit_ratio
      RETURN NUMBER
   AS
      l_db_hit_ratio   NUMBER (10, 5);
   BEGIN
      IF (g_total_reads <> 0)
      THEN
         l_db_hit_ratio := (g_total_db_hits / g_total_reads) *
100;
      END IF;

      RETURN l_db_hit_ratio;
   END db_hit_ratio;

{mospagebreak title=The Cache Turnover Ratio}

When data is removed from the cache to make room for another record, it is called a turnover.  A cache is most effective if it has a low turnover rate.  The turnover ratio can be calculated by dividing the number of times a record gets removed from the cache, versus the total number of cache hits:

            # Cache Turnovers

Hit Ratio = —————– * 100

            # Cache Hits

The lower the turnover ratio, the better your cache usage, and the more performance gain you are getting from using the cache.  A high cache turnover can indicate that the cache size is too small, i.e. it can help you pinpoint the issue (cache size too small) when used in conjunction with the Cache Hit and/or Cache Miss ratios.

In order to be able to calculate the ratio, we just need to keep track of the number of cache turnovers.

   g_total_cache_turnover   PLS_INTEGER    := 0;

And then we change our existing code to gather the numbers (if requested):

   PROCEDURE write_to_cache (
      p_dept_id     IN   dept.deptno%TYPE,
      p_dept_data   IN   department_tp
   )
   AS
   BEGIN
      IF (g_dept_cache.COUNT >= max_cache_size)
      THEN
         delete_from_cache (random_cached_dept_id);

         IF (gather_stats)
         THEN
            g_total_cache_turnover :=   g_total_cache_turnover
                                      + 1;
         END IF;
      END IF;

      g_dept_cache (p_dept_id).NAME := p_dept_data.NAME;
      g_dept_cache (p_dept_id).LOCATION := p_dept_data.LOCATION;
   END write_to_cache;

Now we just need to create a procedure to actually perform the calculations of the ratio:

   FUNCTION cache_turnover_ratio
      RETURN NUMBER
   AS
      l_cache_turnover_ratio   NUMBER (10, 5);
   BEGIN
      IF (g_total_reads <> 0)
      THEN
         l_cache_turnover_ratio :=
                                  (g_total_cache_turnover /
g_total_cache_hits)
                                * 100;
      END IF;

      RETURN l_cache_turnover_ratio;
   END cache_turnover_ratio;

Note that your turnover ratio can be 0% and the goal is to get as close to 0% as possible.

{mospagebreak title=Reporting the Ratio}

After all this data gathering you probably would want to report on the data, so let’s write a procedure that writes all the ratios to the screen:

   PROCEDURE log_ratios
   AS
   BEGIN
      DBMS_OUTPUT.put_line (‘Hit Ratio Miss Ration Turnover Ratio’);
      DBMS_OUTPUT.put_line (‘——— ———– ————–
‘);
      DBMS_OUTPUT.put_line (   LPAD (TO_CHAR
(cache_hit_ratio,’9990.00′), 7)
                            || ‘ %’
                            || ‘ ‘
                            || LPAD (TO_CHAR
(db_hit_ratio,’999990.00′), 9)
                            || ‘ %’
                            || ‘ ‘
                            || LPAD (TO_CHAR
(cache_turnover_ratio,’999990.00′), 12)
                            || ‘ %’
                           );
   END log_ratios;

As usual, you can do whatever you want with these ratios, you don’t have to write them to the screen.  You can store them in a table, e.g. to see how your cache’s ratios evolves over time.  That way you can spot trends and act pro-actively, e.g. adjusting the cache size if over time the turnover ratio worsens.

Notice that these ratios are also session specific. If you lose your connection or reconnect, the ratio components will get reset (to NULL).  If you want to record ratios spanning several sessions you will have to store the values of the ratio components at the end of your code, and restore them at the beginning. 

On the other hand, if you do not want to record over several sessions and not even over different runs of your code in the same session, you will have to reset the cache manually.  You can add a Cache Reset procedure to your package that you call at the start of your code.  That way the ratio components are always the same (zero) at the start of your program, regardless of whether you just started a session, or have been in that same session for a while.

   PROCEDURE reset_stats
   AS
   BEGIN
      g_total_reads := 0;
      g_total_cache_hits := 0;
      g_total_db_hits := 0;
      g_total_cache_turnover := 0;
   END reset_stats;

Conclusion

This concludes the series on my Caching Architecture.  We have seen how to create a cache for our PL/SQL applications and how to make good use of it.  You should now be able find good candidate queries for caching and just follow the architecture to implement good caching logic.  Hopefully you can put this into good practice and your customers or manager will thank you for the fine job you did on improving the performance of the application.

Google+ Comments

Google+ Comments