Why is running your data warehouse and your OLTP systems on the same server (instance) a bad idea?
This is the 9th post in a series of twelve in Tim Ford's Entry-Level Content Challenge.
In this blog post, I will explain the basics of IO and the buffer cache (also called the buffer pool) in a database, and show a way to measure them for queries in SQL Server. Then I will describe OLTP (Online Transaction Processing) and data warehouse workloads. Finally, I will illustrate why mixing these two types of workloads on a single database instance is a bad idea.
IO and the buffer cache
One of the optimizations an RDBMS uses to deliver query results faster is a memory cache called the “buffer cache,” or “buffer pool.” Reading a block of data from memory is typically measured in microseconds, whereas reading a block of data from disk takes milliseconds. Internally, SQL Server never persists a row to disk on its own, but always as part of an 8 KB block called a page. The buffer cache is an area of memory that holds these pages after they have been read from disk. The size of the buffer cache is determined at startup time by the SQL Server engine, limited by the physical memory available in the server.
We can query the DMV (Dynamic Management View) sys.dm_os_sys_info for information about physical memory and the maximum and current size of the buffer cache (there is more to these numbers than shown here; please take a look at the documentation for the full picture):
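A sketch of what such a query could look like (the column names below are for SQL Server 2012 and later; older versions use names such as physical_memory_in_bytes and bpool_committed):

```sql
-- Physical memory on the server, and the memory the engine has
-- committed / aims to commit (which bounds the buffer cache):
SELECT
    physical_memory_kb / 1024  AS physical_memory_mb,   -- RAM in the server
    committed_kb / 1024        AS committed_mb,         -- memory currently committed by the engine
    committed_target_kb / 1024 AS committed_target_mb   -- memory the engine aims to commit
FROM sys.dm_os_sys_info;
```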
Using another DMV, sys.dm_os_buffer_descriptors, we can also peek into what is currently stored in the buffer cache:
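For example, a query along these lines returns one row per cached page:

```sql
-- Sample a few of the pages currently held in the buffer cache:
SELECT TOP (10)
    database_id,
    file_id,
    page_id,
    page_type,      -- e.g. DATA_PAGE, INDEX_PAGE
    row_count,      -- rows stored on the page
    is_modified     -- 1 = dirty page, not yet written back to disk
FROM sys.dm_os_buffer_descriptors;
```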
And we can create a report on how the pages from different databases are represented in the buffer cache (this report was run just after starting up my instance, so not all user databases have pages in the buffer cache):
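The report can be built by grouping the buffer descriptors per database; since each page is 8 KB, the page count also gives the cached size in MB:

```sql
-- Number of cached pages (and MB) per database in the buffer cache:
SELECT
    DB_NAME(database_id)  AS database_name,
    COUNT(*)              AS cached_pages,
    COUNT(*) * 8 / 1024   AS cached_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_pages DESC;
```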
Using the command
SET STATISTICS IO ON
in SSMS, it is possible to measure, for a query, how many pages are read from cache and how many pages are read from disk. In the following example, we read all the rows from the Orders table:
SELECT * FROM Sales.Orders
The output from the statistics command is written to the Messages tab (I have only slightly formatted the text for better visibility):
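The original screenshot is not reproduced here, but the output has roughly this shape (the numbers match the run discussed below):

```text
Table 'Orders'. Scan count 1, logical reads 692, physical reads 3,
read-ahead reads 667, lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
```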
We can see that 692 pages in total were read through the buffer cache (shown as “logical reads”). Of these, 670 pages (3 “physical reads” plus 667 “read-ahead reads”) first had to be fetched from disk (and were cached in the buffer cache afterwards), so only 692 - 670 = 22 pages were already in the cache.
Running the query again, we get a new output from the statistics command:
Now, we see that all 692 pages are read from the buffer cache.
Least Recently Used algorithm
SQL Server uses an algorithm called Least Recently Used (LRU) to determine which pages should stay in the cache. A consequence of the LRU algorithm is that pages in the buffer cache are kept longer in the cache if they are being used by queries. This is important when trying to understand database workloads.
OLTP workloads
Workloads in an OLTP system are characterized by many transactions with a small scope. Think of an ERP system like Dynamics NAV: you look up a customer (data will probably be fetched using an index range scan followed by an index seek) and add an order (DML with a couple of single-row insert statements). Or you look up an item in the warehouse system and change the number of available items (probably an index range scan followed by an index seek, and DML with a single-row update statement). After some usage of the system, the warm parts of the data will reside in the buffer cache, and user operations will mostly result in index seeks and small insert/update transactions.
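Such an OLTP interaction could be sketched like this (all table and column names here are illustrative, not an actual Dynamics NAV schema):

```sql
-- Look up a customer: a narrow read, typically an index range scan/seek.
SELECT CustomerID, Name, City
FROM Sales.Customers
WHERE Name LIKE N'Contoso%';

-- Add an order: a short transaction with a couple of single-row inserts.
BEGIN TRANSACTION;
INSERT INTO Sales.Orders (OrderID, CustomerID, OrderDate)
VALUES (1001, 42, SYSDATETIME());
INSERT INTO Sales.OrderLines (OrderID, LineNumber, ItemID, Quantity)
VALUES (1001, 1, 7, 3);
COMMIT TRANSACTION;

-- Adjust the stock for one item: a single-row update.
UPDATE Warehouse.Items
SET QuantityOnHand = QuantityOnHand - 3
WHERE ItemID = 7;
```

Each of these statements touches only a handful of pages, so the pages they need tend to stay hot in the buffer cache.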
Data warehouse workloads
When running data warehouse queries, we are typically interested in all available data, maybe filtered to the current financial year. In a data warehouse, data is often organized in fact tables with business transaction data (e.g. sales order lines) and dimension tables with master data (e.g. customer or product). There is typically much more data in the fact tables than in the dimension tables. Many queries will be aggregations grouped by dimensions, something along the lines of “give me the total number of products sold, and the total amount in DKK and USD, of all products sold in this fiscal year, grouped by month and product category.” After some usage of the data warehouse, dimension data will reside in the buffer cache, and data warehouse queries will mostly result in full table/index scans of fact tables.
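The query quoted above could look roughly like this against a hypothetical star schema (FactSales, DimDate, and DimProduct are assumed names, not from the source):

```sql
-- Aggregate a large fact table, grouped by dimension attributes.
-- The joins are dimension lookups; the fact table is scanned in full
-- (or via the fiscal-year filter).
SELECT
    d.MonthName,
    p.ProductCategory,
    SUM(f.Quantity)  AS total_units_sold,
    SUM(f.AmountDKK) AS total_amount_dkk,
    SUM(f.AmountUSD) AS total_amount_usd
FROM dbo.FactSales AS f
JOIN dbo.DimDate    AS d ON d.DateKey    = f.DateKey
JOIN dbo.DimProduct AS p ON p.ProductKey = f.ProductKey
WHERE d.FiscalYear = 2016
GROUP BY d.MonthName, p.ProductCategory;
```

Note how the small dimension tables are read on every query, which is why they stay in the buffer cache, while the fact pages stream through it.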
Mixing workload types
Mixing OLTP and data warehouse workloads in the same SQL Server instance is not a good idea. Why? Because the workload types don't mix. As described above, the ideal buffer cache for an OLTP system contains the pages used in searches, as well as the pages for the most-used customers, items, etc. Running a few data warehouse queries will flush much of this out and fill the cache with dimension and fact-table data instead, making OLTP operations slow. On the other hand, the ideal buffer cache in a data warehouse contains all the dimension data, which makes aggregations fast because all dimension lookups are done in memory. OLTP operations may flush parts of this dimension data out of the cache, hurting the performance of data warehouse queries. The two workload types will keep fighting over the contents of the buffer cache, and both systems will suffer.
So: please separate the databases for your OLTP systems and your data warehouse into dedicated instances (each with its own buffer cache). You probably want to run them on separate servers as well, since the optimal CPU configuration is very different for the two. But that's another story…
Thanks for reading along, I hope that you liked it. Stay tuned…
Links
Latency Numbers Every Programmer Should Know:
About the buffer cache in SQL Server:
Deep dive into the internals of the LRU-K algorithm in this answer from my former colleague Martin Schmidt: