CTS logo
hazy blue Catskill Mountains in distance


A Thought…

There was the time when Yogi [Berra] and his wife were traveling by car. His wife said, “Yogi, I think we’re lost”, and Yogi responded “Yeah, but we’re making really good time.”

Spreadsheet vs Database

Posted on 2024-03-23 at 16:55:00 by Phil

I was reading a rather interesting article in Ars Technica about the Williams Formula 1 (F1) racing team using Excel to manage its car build: Formula 1 chief appalled to find team using Excel to manage 20000 car parts. Quite a lively discussion ensued in the comments, mostly bashing Excel as totally unsuited for the purpose (which it was), but not really offering a solution.

I get the impression that the chief problem here is something in common for all businesses and government organizations: a spreadsheet is fine for a single user, where the input data from other sources changes slowly enough that a person can keep up with updating it. However, when data from multiple sources is needed, the usual practice is to email the spreadsheet around to other people, and rely on them to update certain cells (often by copy-paste). Needless to say, this is often a recipe for disaster! Not only is the data usually stale by the time it gets back to the spreadsheet owner, but it is too easy to miss an update, corrupt data, or someone is on vacation/holiday and not available to update their data.

In such cases, where data sources are spread over multiple people, not to mention multiple departments and even outside suppliers, the best solution is usually to put the data in a central database that all can tap into. The database of course needs adequate access controls and data integrity checks to keep the data clean and secure. And Excel is not a database.

The problem is, most people don’t want to sit down with some generic IT department analyst, wait months or years for a solution to be developed (and often over multiple iterations until the IT guy understands what you want), and then to pay handsomely for the whole thing. And when a change is needed, rinse and repeat. They want to be able to do it themselves, immediately, and at no cost to their department. Yes, spreadsheets have their own issues, and most people writing them are not computer scientists and can make stupid mistakes (come to think of it, so can computer scientists!) and not implement adequate safeguards on data cleanliness and security (read in the comments about a budget spreadsheet that contained everyone in the company’s salary information). People like spreadsheets because it’s so quick and easy to get the results you need.

For anything sharing data (as inputs or outputs), the data should most often live in a proper database, with proper controls. How then can we provide a spreadsheet-like interface that end-users can easily update in the manner of a spreadsheet? A spreadsheet usually carries its data around with it statically, and requires manual operations to refresh or update that data. It needs some way of tapping into the database to get live data, so that it’s never working with stale data, and can (within limits) make its calculated results available to others (probably through the database). The entire show should look to the database as just another client.

This might be implemented in several ways. One might be as some sort of “glue layer” between the database and the spreadsheet. It would act as a database client, making queries to extract data (including operations such as taking a minimum or maximum of a column), and inserting that data into cells of the spreadsheet, which then act only as a data cache and not as a permanent store. Another way might be to combine the glue layer with a spreadsheet-like user interface, where data storage cells would just be some sort of SQL query to talk to the database (both input and output of data). Actually, there is no reason to restrict it to relational databases — other kinds of databases could work just as well, but would need something other than SQL queries (in any case, these could be supplied by the author of the spreadsheet). The formula entry and cell formatting would be just like any other spreadsheet’s.

Whether existing databases and spreadsheets would work well with this architecture is an open question. Certainly, existing databases should be able to work with the new client, but there is a good possibility that the load on the database would be too high if the spreadsheet is demanding a fresh load of data at each recalculation sweep of the spreadsheet. I don’t know how many databases offer asynchronous signaling to a client of when a given piece of data has changed, and the client should re-query that field. You would like to avoid constantly requerying all the data from the database, but still avoid missing data updates. After all, the basic problem is data going stale. The spreadsheet end of things should refrain from updating any database fields until the recalculation sweep has finished and all values are stable. I don’t know how many spreadsheets permit an outside program to poke data into cells (or to read from those cells), so if such interfaces don’t exist, the combined client-spreadsheet might be the best choice.

Another possibility would be to continue to allow users to work with standalone spreadsheets, but not to use them as long-term data stores. Instead of loading (opening) an existing spreadsheet file, users would run a quick program to generate the latest copy of the data cells, merging it with the cells containing the formulas and presentation, and then load that file into any spreadsheet. The generator would be a database client (it could even query multiple databases). If it can understand a spreadsheet file’s layout well enough, it might be able to update an existing saved spreadsheet file with fresh data, rather than generating the whole thing over and over (requiring more setup by the user, to hold the formulas and presentation separately). This might make it easier for users, who could edit and save spreadsheets in the usual manner, and would only once have to tell the generator where to find various cells’ data.


All content © copyright 2005 – 2024 by Catskill Technology Services, LLC.
All rights reserved.
Note that Third Party software (whether Open Source or proprietary) on this site remains under the copyright and license of its owners. Catskill Technology Services, LLC does not claim copyright over such software.


This page is https://www.catskilltech.com/utils/show.php?link=spreadsheet-vs-database

Search Quotations database.

Last updated Mon, 25 Mar 2024 at 10:44 AM

Valid HTML 5

Thu, 18 Apr 2024 at 9:37 AM EDT