usefulness of the query cache

Lot of talk lately about the effectiveness or lack thereof of the MySQL query cache.  I’m kind of surprised we are still having this discussion in late 2009, I really thought these issues have been long settled.

It’s well known that the query cache is terrible for environments with a constant stream of writes.  Each write invalidates the cached results for every query touching that table.  This means that your database needs to be constantly inserting and purging the query cache in an environment with mixed reads and writes (aka the most common type of “Web 2.0″ workload you’ll pardon the outdated cliché).

But now, Mark’s post goes one step further and reveals that even in a completely read-only environment, the query cache is still not effective.  This was news to me, as I haven’t even bothered testing this feature in a few years.  I’m sure if you really tried you could devise a scenario where it would outperform the InnoDB default, and it seems that consultants have encountered some installations where it has helped.  However I’m quite certain that in a well-designed modern workload, the query cache is not going to be a good approach.

The obvious alternative solution is to use an external caching system, preferably a L1/L2 cache using APC/memcached or similar.  Combine that with the most excellent MySQL/memcached UDF to intelligently prune your cache and you’ve got a system that easily can achieve performance at high concurrency.

I know what I’m saying here is not exactly breaking news (memcached was created 6 years ago), this really is the standard implementation now… my only point is that any discussion about the query cache that doesn’t suggest an external L1/L2 cache is missing the most important aspect.

Drizzle made the right call to remove the query cache feature entirely, with a plugin interface to optionally support that as needed.  That approach may be too drastic for MySQL itself, which still has some low-end use cases that may be able to get a performance benefit from it (and where the caches may be too much overhead).  In my opinion though, it should be disabled or set to DEMAND by default in a future version (and maybe ship the memcached UDF plugin too!).  Is it too late for 5.4 GA?

5 Responses to “usefulness of the query cache”

  1. ryan Says:

    “But now, Mark’s post goes one step further and reveals that even in a completely read-only environment, the query cache is still not effective”

    To be clear, those are my words, not his. He pointed out that in the comments to Sheeri’s post (http://www.pythian.com/news/4142/is-the-query-cache-useful#comment-377304) that his main interest was in the regression between MySQL versions, not the effectiveness of the query cache itself.

    Still, I read the numbers he presents as pointing to a very strong conclusion.

  2. Mark Callaghan Says:

    My post was very unclear and what I wrote didn’t match my intent. I am not a fan of the query cache, but the point of the post is that worst-case performance has changed. I want to get memcache like performance from InnoDB when using memcache like queries, such as fetch by primary key. I think we can get there.

  3. Sheeri Says:

    Except that my entire post was debunking the fact that Mark’s numbers proved anything about the effectiveness of the query cache.

    Mark’s numbers that turning the query cache on destroyed throughput, but that wasn’t his conclusion. And my post http://www.pythian.com/news/4142/is-the-query-cache-useful showed how Mark’s numbers aren’t to be trusted. So the conclusion that the query cache is useless, based on Mark’s numbers, is EXACTLY what NOT to do.

  4. ryan Says:

    Sheeri:

    Useless is not the right word, I’d prefer the phrase “ineffective for modern workloads” instead. One can easily devise a data access pattern where the query cache would outperform the InnoDB Buffer Pool. Have small amount of queries which do a lot of work in the database — scanning lots of rows, sorting, creating disk temporary tables, and so forth — and repeat those queries many times rapidly.

    But in my opinion, that is not how one should design systems to interact with the database. The idea case is more like the sysbench test in Mark’s post, where the database access pattern involves many small queries at high concurrency. In this design pattern, the query cache will underperform the InnoDB buffer pool, especially in a fast moving table. This is especially true given the well known limitations of the query cache at high concurrency.

    Then for queries which don’t fit this ideal design pattern, you can use a L1/L2 cache external to the database. Then you use intelligent pruning methods (I suggest the memcached UDF) to invalidate only the query set results which are affected by the DML. You still have to be smart about it with your design to make sure you aren’t thrashing the keys in your caches, but this should outperform the MySQL query cache just based on how many results have to be invalidated for every DML statement.

  5. ryan Says:

    Mark:

    To get memcached performance in MySQL is an impressive goal. Using the InnoDB Buffer Pool instead of the query cache seems like the right approach. I don’t want to say it’s impossible, but in addition to the nuts and bolts I see several theoretical obstacles that I see in achieving your end goal:

    1) The most obvious one: persistence. Unless you are talking about a pure read only environment (and maybe you are), your database is going to have to do extra work than a memory cache to persist the DML to disk, vs a SET operation that remains entirely in memory.
    2) MVCC. InnoDB cares about this, while memcached obviously doesn’t have to. You may be able to mitigate this somewhat by using InnoDB transaction modes or HANDLER to get dirty reads if you don’t have that as design requirement.
    3) Protocol. I have not researched this myself, but I have listened much to the Drizzle folks about the lack of optimization in the existing implementation during their discussions in creating a replacement protocol. By contrast, memcached is an extremely simple and fast protocol (especially with UDP) — and when their binary protocol will be stable it promises to be even better.
    4) Authentication. MySQL’s has an authentication while memcached does not — intuitively this has to provide some overhead. This is again something that I have not measured myself, and my thoughts have been influenced by lurking on the Drizzle list.
    5) Binlog/Replication. I know many smart people are hard at work on alternatives and improvements, but right now the binlog replication is a huge impediment. But then again, this presumes you care about replication and I’m not sure about all the details of your use case.

    I say this not trying to dissuade you from pursuing this goal — it’s also possible to implement InnoDB outside of MySQL which overcomes some of those challenges. Any movement in this direction is going to be a huge improvement from the status quo, because incremental improvement means that system designers could increasingly lean less heavily on L2 caches, even if they wouldn’t go away completely. Good luck!

Leave a Reply