PDF::Builder
v3.028 Released, 31 December 2025
Please also see the
CPAN listing, the
GitHub entry, and the latest changes list.
One death is a tragedy; one million is a statistic.
— Joseph Stalin
Posted on 2024-03-23 at 16:55:00 by Phil
I was reading a rather interesting article in Ars Technica about the Williams Formula 1 (F1) racing team using Excel to manage its car build: Formula 1 chief appalled to find team using Excel to manage 20000 car parts. Quite a lively discussion ensued in the comments, mostly bashing Excel as totally unsuited for the purpose (which it was), but not really offering a solution.
I get the impression that the chief problem here is something in common for all businesses and government organizations: a spreadsheet is fine for a single user, where the input data from other sources changes slowly enough that a person can keep up with updating it. However, when data from multiple sources is needed, the usual practice is to email the spreadsheet around to other people, and rely on them to update certain cells (often by copy-paste). Needless to say, this is often a recipe for disaster! Not only is the data usually stale by the time it gets back to the spreadsheet owner, but it is too easy to miss an update, corrupt data, or someone is on vacation/holiday and not available to update their data.
In such cases, where data sources are spread over multiple people, not to mention multiple departments and even outside suppliers, the best solution is usually to put the data in a central database that all can tap into. The database of course needs adequate access controls and data integrity checks to keep the data clean and secure. And Excel is not a database.
The problem is, most people don’t want to sit down with some generic IT department analyst, wait months or years for a solution to be developed (and often over multiple iterations until the IT guy understands what you want), and then to pay handsomely for the whole thing. And when a change is needed, rinse and repeat. They want to be able to do it themselves, immediately, and at no cost to their department. Yes, spreadsheets have their own issues, and most people writing them are not computer scientists and can make stupid mistakes (come to think of it, so can computer scientists!) and not implement adequate safeguards on data cleanliness and security (read in the comments about a budget spreadsheet that contained everyone in the company’s salary information). People like spreadsheets because it’s so quick and easy to get the results you need.
For anything sharing data (as inputs or outputs), the data should most often live in a proper database, with proper controls. How then can we provide a spreadsheet-like interface that end-users can easily update in the manner of a spreadsheet? A spreadsheet usually carries its data around with it statically, and requires manual operations to refresh or update that data. It needs some way of tapping into the database to get live data, so that it’s never working with stale data, and can (within limits) make its calculated results available to others (probably through the database). The entire show should look to the database as just another client.
This might be implemented in several ways. One might be as some sort of “glue layer” between the database and the spreadsheet. It would act as a database client, making queries to extract data (including operations such as taking a minimum or maximum of a column), and inserting that data into cells of the spreadsheet, which then act only as a data cache and not as a permanent store. Another way might be to combine the glue layer with a spreadsheet-like user interface, where data storage cells would just be some sort of SQL query to talk to the database (both input and output of data). Actually, there is no reason to restrict it to relational databases — other kinds of databases could work just as well, but would need something other than SQL queries (in any case, these could be supplied by the author of the spreadsheet). The formula entry and cell formatting would be just like any other spreadsheet’s.
Whether existing databases and spreadsheets would work well with this architecture is an open question. Certainly, existing databases should be able to work with the new client, but there is a good possibility that the load on the database would be too high if the spreadsheet is demanding a fresh load of data at each recalculation sweep of the spreadsheet. I don’t know how many databases offer asynchronous signaling to a client of when a given piece of data has changed, and the client should re-query that field. You would like to avoid constantly requerying all the data from the database, but still avoid missing data updates. After all, the basic problem is data going stale. The spreadsheet end of things should refrain from updating any database fields until the recalculation sweep has finished and all values are stable. I don’t know how many spreadsheets permit an outside program to poke data into cells (or to read from those cells), so if such interfaces don’t exist, the combined client-spreadsheet might be the best choice.
Another possibility would be to continue to allow users to work with standalone spreadsheets, but not to use them as long-term data stores. Instead of loading (opening) an existing spreadsheet file, users would run a quick program to generate the latest copy of the data cells, merging it with the cells containing the formulas and presentation, and then load that file into any spreadsheet. The generator would be a database client (it could even query multiple databases). If it can understand a spreadsheet file’s layout well enough, it might be able to update an existing saved spreadsheet file with fresh data, rather than generating the whole thing over and over (requiring more setup by the user, to hold the formulas and presentation separately). This might make it easier for users, who could edit and save spreadsheets in the usual manner, and would only once have to tell the generator where to find various cells’ data.
Posted on 2026-04-02 at 14:57:00 by Phil
One hazard of putting fresh data (from a database) into a standard spreadsheet is that some users will allow their data to grow stale. They will happily grab data and work with it for months, never refreshing it. This means that whatever conclusions they draw and present are outdated and potentially dangerously wrong.
This means that some mechanism is needed to guarantee that any time a spreadsheet is presented, its data is updated (refreshed). We don’t want this to present too heavy a load on the database, and we may not want to use new data that has not been checked over in some manner. You wouldn’t want to be presenting to the Board of Directors and an erroneous update creeps in via a just-made database update that blows up the whole show! It is reasonable to permit stable data that has been error-checked in some manner, but not to use very stale data.
Certainly the spreadsheet could include a field that states how long it has been since its data has been refreshed, and users should be in the habit of looking for this age field before accepting results and conclusions. Of course, this could be faked by someone, but there are limits to how well a system can detect such things. At some point you have to trust that someone presenting to you isn’t trying to deceive you!
What is the proper balance between stable data that has been checked or verified in some manner, placing an extra load on the database; and allowing someone to work with old, stale data in a saved spreadsheet? It may be as simple as working with two or more generations of data in the database, one the very latest (but not necessarily validated), and other(s) that are vetted and checked in some manner. A spreadsheet drawing data might be told to use one of the older data sets rather than the latest and greatest. Or, just have a process to not to allow data into the database until it’s been checked (vetted) in some manner (possibly internal versioning of data, so the newest data is hidden from users until validated).
If you wish to contribute to this discussion in a constructive manner, please email your comment to this discussion group.Include which discussion thread title you are responding to, and the nom de plume you would like to be listed under (we allow only one email address per user name, and vice versa). Due to the large amounts of spam received in the past, as well as abuse of posting privileges (e.g., attacking others), all posts must be received by email and will be individually vetted. If we deem it a useful, factual, and polite post, we will enter it. Anything ugly will result in your being permanently blacklisted. All decisions are made by the management of CTS and are final. We reserve the right to discard submissions without any feedback to you, and to fix errors in spelling and grammar in something we post. Please give a minimum of a few days for us to review and post your entry. Don’t be impatient and resubmit!
All content © copyright 2005 – 2026
by Catskill Technology Services, LLC.
All rights reserved.
Note that Third Party software (whether Open Source or proprietary) on this
site remains under the copyright and license of its owners.
Catskill Technology Services, LLC does not claim copyright over such software.
This page is https://www.catskilltech.com/utils/show.php?link=spreadsheet-vs-database
Search Quotations database.
Last updated Thu, 02 Apr 2026 at 2:57 PM