The (Microsoft?) data platform anno 2026 (T-SQL Tuesday #100)
Picture taken from https://www.pexels.com/photo/crystal-ball-in-hands-6101/
This month’s host is Adam himself, and the topic is: Looking Forward 100 Months: Your mission for month #100 is to put on your speculative shades and forecast the bright future ahead. Tell us what the world will be like when T-SQL Tuesday #200 comes to pass. If you’d like to describe your vision and back it up with points, trends, data, or whatever, that’s fine. If you’d like to have a bit more fun and maybe go slightly more science fiction than science, come up with a potential topic for the month and create a technical post that describes an exciting new feature or technology to your audience of June 2026. (Did I do the math properly there?)
Wow, what a cool topic. I commute 12 km on my bicycle from home to work at Microsoft in Lyngby, and I typically spend that time reflecting on work, life, the past and the future. Last week, I spent my commute letting my mind wander and reflect on this topic. Most of what I discuss in the following is rooted in experience I have gained in data-related projects over the last 10+ years.
So, in 2026 we will see databases that move beyond the System R design. Central to System R is the concept of a query optimizer, which pre-compiles (and caches) execution plans for queries based on dynamic programming. Last year at the SQL Nexus 2017 conference, I saw professor Fritz Henglein present recent work in functional language research, where lazy evaluation features in the Haskell programming language were used to produce blazingly fast query performance on certain types of queries, without the need for an optimization phase or B-tree indexes. Optimization was, in a sense, built into the engine. This is just one example of modern methods that can challenge the System R design; remember that hardware looked quite different back in the 1970s: memory was a scarce resource, (disk) I/O was slow, and CPUs didn't have the advanced caching we see today. Another example that we already have in SQL Server is the Hekaton (in-memory OLTP) engine, where the page size is optimized for modern CPU cache sizes (moving from 8 KB pages to a 1 MB page size).
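The core idea behind Henglein's discrimination-based approach can be illustrated with a toy Python sketch (this is my own simplification, not his Haskell implementation): partition both relations into buckets by the join key in one linear pass, then pair up matching buckets, so an equi-join runs in linear time without a sort, a B-tree index, or a separate optimization phase.

```python
from collections import defaultdict

def discriminate(rows, key):
    """Partition rows into buckets by the value of `key` in one linear pass."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row)
    return buckets

def equi_join(left, right, key):
    """Linear-time equi-join: discriminate both sides, pair matching buckets."""
    left_buckets = discriminate(left, key)
    right_buckets = discriminate(right, key)
    for k in left_buckets.keys() & right_buckets.keys():
        for l in left_buckets[k]:
            for r in right_buckets[k]:
                yield {**l, **r}

orders = [{"cust": 1, "total": 100}, {"cust": 2, "total": 50}]
customers = [{"cust": 1, "name": "Ada"}, {"cust": 3, "name": "Bob"}]
print(list(equi_join(orders, customers, "cust")))
# → [{'cust': 1, 'total': 100, 'name': 'Ada'}]
```

Note that no plan is chosen up front: the "optimization" is a property of how the operator itself is evaluated, which is the flavour of the research referenced below.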
We will see true set-based databases, not the multi-set approach that was chosen for SQL, when quantum computing kicks in. Because quantum programmers will have a foundation in physics and mathematics, they will naturally ask for database products that handle sets as first-class citizens. Possibly even infinite (non-countable) sets. And vector spaces… And Hilbert spaces… and and and…
Database vendors will also (finally) start to add concepts related to data modelling to their products. We will get domains (app user, language, country, …) as first-class citizens in database schemas. This will enable out-of-the-box tool support for things like full-stack validation rules and standard widgets. We will also see support for metadata annotations on single data elements (just like what cameras add to a photo file). It will be possible to define concepts like data classification and retention policies as metadata and let the database handle how long data can be kept before it is deleted or de-linked/anonymized. And backups would respect this – when a backup is restored, retention policies are applied before it goes online.
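To make the restore scenario concrete, here is a minimal Python sketch of what such a retention-aware restore could do (all names and policies are invented for illustration; no real product works this way today): each row carries a classification, the classification maps to a retention window, and expired rows are dropped before the restored data goes online.

```python
from datetime import datetime, timedelta

# Hypothetical retention policies keyed by data classification (names are mine).
RETENTION = {
    "personal": timedelta(days=365),       # delete after one year
    "financial": timedelta(days=7 * 365),  # keep seven years
}

def apply_retention(rows, now):
    """Drop rows whose retention window has expired (one could anonymize instead)."""
    kept = []
    for row in rows:
        policy = RETENTION.get(row["classification"])
        if policy is None or now - row["created"] <= policy:
            kept.append(row)
    return kept

backup = [
    {"id": 1, "classification": "personal", "created": datetime(2016, 1, 1)},
    {"id": 2, "classification": "financial", "created": datetime(2016, 1, 1)},
]
print([r["id"] for r in apply_retention(backup, datetime(2018, 3, 1))])  # → [2]
```

The point is that the policy lives with the data as metadata, so the engine – not the application – enforces it, even on a restore path the application never sees.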
Have you ever designed a system that needed support for multi-language texts? The static texts are easy; this is already part of the .NET Framework. But how do you design a database where a given text can be represented in several languages? A product like Apache Cassandra supports wide rows, where the number of columns can vary between rows (in the same table). Relational databases anno 2026 will allow database designers to annotate nvarchar attributes with the language(s) a text can be in. When inserting data, (string, language) pairs are specified in the INSERT statement, and in SELECT statements, the desired language must be specified to retrieve the text.
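A small Python sketch of the semantics I have in mind (the SQL syntax in the comments is invented purely for illustration – no such T-SQL exists): a language-annotated attribute behaves like a mapping from language code to text, written with (string, language) pairs and read back for one requested language.

```python
# A hypothetical future SQL might read something like:
#   INSERT INTO Product (Name) VALUES ((N'cykel' LANGUAGE 'da'), (N'bicycle' LANGUAGE 'en'));
#   SELECT Name LANGUAGE 'en' FROM Product;
# Below, the same idea modelled as a {language: text} mapping per attribute.

def insert_text(row, column, pairs):
    """Store (text, language) pairs for one multi-language attribute."""
    row[column] = {lang: text for text, lang in pairs}

def select_text(row, column, language):
    """Retrieve the text in the requested language, or None if absent."""
    return row[column].get(language)

product = {}
insert_text(product, "name", [("cykel", "da"), ("bicycle", "en")])
print(select_text(product, "name", "en"))  # → bicycle
```

The wide-row trick Cassandra uses solves the storage side; what I am asking for is the annotation being part of the schema, so the engine knows the column is multi-language.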
Finally, the concept of a database will be diluted; we will be talking about data swarms – data truly distributed across cloud services, edge devices, laptops, and on-prem databases, all mixed, but with data classification applied to every data element. Users will be able to trust that the grid of query engines respects whether data is allowed to cross boundaries (e.g. personal, close circles of trust, company, region).
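As a toy sketch of such a boundary check (every name here is invented): order the trust boundaries from most to least restrictive, tag each data element with the widest boundary its classification allows, and let a query engine ship the element only to destinations at or inside that boundary.

```python
# Hypothetical trust boundaries, ordered from most restrictive to least.
BOUNDARIES = ["personal", "circle-of-trust", "company", "region"]

def may_cross(classification, boundary):
    """True if an element classified for `classification` may be shipped to `boundary`."""
    return BOUNDARIES.index(boundary) <= BOUNDARIES.index(classification)

print(may_cross("company", "circle-of-trust"))  # → True
print(may_cross("personal", "region"))          # → False
```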
That’s it for this month’s T-SQL Tuesday. Please reread this post in 100 months, and tell me how we did…
Read more about T-SQL Tuesday #100 here:
Read an example of Fritz Henglein’s research here:
Fritz Henglein: "Optimizing relational algebra operations using generic equivalence discriminators and lazy products", in: Proceedings of PEPM '10, the 2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, pages 73–82, https://dl.acm.org/citation.cfm?id=1706372
Read more about different temporal models here: