
Detecting unset variables

Posted on 2017-Apr-21 at 12:47:54 by Phil

A major problem in all languages (compiled or interpreted) is detecting when the programmer has used a variable without putting a value into it. This discussion isn’t about whether a value is valid for the problem set (e.g., Methuselah is stated to be 969 years old, rather than the actual 969 months), but whether the variable’s value has been set at all, or whether it’s just random junk.

Interpreters with loose typing need to carry along a great deal of information about the variable, including its type and value, which need to be checked before it’s used. It’s no big deal to include a little extra information on whether the variable has been set at some point (a value assigned), or whether it’s still undefined. Note that some languages will let you undefine a previously defined variable!
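Here is a minimal sketch, in Python, of the per-variable bookkeeping an interpreter might keep. The `Cell` class, its `defined` flag, and `UndefinedVariableError` are hypothetical names chosen for illustration and are not taken from any particular interpreter. The `undefine()` method plays the role of what Perl's `undef` or Python's `del` let a program do.

```python
from dataclasses import dataclass
from typing import Any, Optional


class UndefinedVariableError(Exception):
    """Raised when a variable is read before any value was assigned."""


@dataclass
class Cell:
    # Hypothetical interpreter cell: the value travels together with its
    # type and a "defined" flag, and the flag is checked on every read.
    name: str
    value: Any = None
    type_name: Optional[str] = None
    defined: bool = False

    def assign(self, value: Any) -> None:
        self.value = value
        self.type_name = type(value).__name__
        self.defined = True

    def undefine(self) -> None:
        # Return the variable to the unset state.
        self.value = None
        self.type_name = None
        self.defined = False

    def read(self) -> Any:
        if not self.defined:
            raise UndefinedVariableError(
                f"variable '{self.name}' used before assignment")
        return self.value
```

Because the interpreter already has to look at `type_name` before it can use the value, testing one more flag costs very little.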

Especially with compiled programs, where the symbol table is no longer available at run time and machine instructions usually access a variable's storage directly (to load or store it), there is no easy way to tell from the bit pattern alone whether that variable contains random junk or a validly set value. The compiler can store a separate flag to indicate the state of the variable (assigned at some point, or unassigned/undefined), but then it has to check that flag each time it reads the variable, and remember to set the flag each time it writes to the variable (or at least the first time). There are optimization methods to minimize the amount of checking code needed (e.g., check only at the beginning of a basic block, and pass the information on to subsequent blocks if the variable has been assigned), but the program will still end up spending a considerable amount of its time checking whether variables are valid.
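The following hedged Python sketch shows the cost of the separate-flag approach. A hypothetical `CheckedFrame` holds slots filled with random bits, which stand in for uninitialized stack memory, plus one `assigned` flag per slot. Every `store` sets a flag and every `load` tests one. A real compiler would emit equivalent machine instructions inline rather than call methods. The class and exception names are illustrative only.

```python
import random


class UnsetReadError(RuntimeError):
    """Raised when a slot is loaded before anything was stored in it."""


class CheckedFrame:
    """Hypothetical model of compiler-inserted checks on a stack frame."""

    def __init__(self, n_slots: int) -> None:
        # Fresh stack memory holds whatever junk was left there; no bit
        # pattern in self.slots can tell us whether it was ever assigned.
        self.slots = [random.getrandbits(32) for _ in range(n_slots)]
        # The separate flag, one per variable.
        self.assigned = [False] * n_slots

    def store(self, slot: int, value: int) -> None:
        self.slots[slot] = value
        self.assigned[slot] = True  # extra work on every write

    def load(self, slot: int) -> int:
        if not self.assigned[slot]:  # extra test on every read
            raise UnsetReadError(f"slot {slot} read before being assigned")
        return self.slots[slot]
```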

Except in some limited cases, a compiler cannot tell in general whether a variable has a valid value assigned to it at an arbitrary point in the program. Loops and conditional expressions can make it very difficult to determine whether something has been assigned (i.e., there are multiple paths to get to this point, some of which may have assigned the variable, and some not!). A similar problem arises with constant folding and propagation, where some optimizations have to be left undone because the compiler can't be certain which path was taken to a particular point. This is closely related to the famous Halting Problem: in general, no compiler can predict from the program text alone which paths will actually be taken at run time.
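Compilers typically answer this question conservatively with a "must" dataflow analysis. A variable counts as definitely assigned at a block only if it is assigned on every path into that block. Java's definite-assignment rules are one well-known example of this kind of conservative check. The sketch below is a simplified illustration only. It assumes a control-flow graph given as a map from each block to its predecessors, and the function name and graph encoding are hypothetical.

```python
from typing import Dict, List, Set


def definitely_assigned(
    preds: Dict[str, List[str]],
    assigns: Dict[str, Set[str]],
    entry: str,
) -> Dict[str, Set[str]]:
    """For each block, return the variables assigned on *every* path that
    reaches the start of that block (a classic 'must' analysis)."""
    all_vars: Set[str] = set().union(*assigns.values())
    # Optimistic start: non-entry blocks assume everything is assigned,
    # the entry block assumes nothing; iteration then shrinks the sets.
    in_sets = {b: (set() if b == entry else set(all_vars)) for b in preds}
    changed = True
    while changed:
        changed = False
        for b in preds:
            if b == entry:
                continue
            # Variables assigned at the end of each predecessor.
            outs = [in_sets[p] | assigns[p] for p in preds[b]]
            # Intersection: assigned on all incoming paths, not just some.
            new_in = set.intersection(*outs) if outs else set()
            if new_in != in_sets[b]:
                in_sets[b] = new_in
                changed = True
    return in_sets
```

For an `if`/`else` where only the `then` branch assigns `x`, the analysis reports that `x` is not definitely assigned at the join. This holds even if, at run time, the `else` branch happens never to be taken. That is exactly the conservatism described above.

```python
preds = {"entry": [], "then": ["entry"], "else": ["entry"], "join": ["then", "else"]}
assigns = {"entry": set(), "then": {"x"}, "else": set(), "join": set()}
print(definitely_assigned(preds, assigns, "entry")["join"])  # set()
```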

A few possible tricks

  • Initialize everything: Some compilers simply initialize all variables, whether static or stack-based, to some value such as 0. At least an erroneous program's behavior will be repeatable, but without detection and flagging of undefined variables, the programmer may not realize that there is a problem. A zero, or a cute pattern such as 0xdeadbeef, may be perfectly valid data for an ordinary variable, and thus usually not caught, even though it would be an invalid value for a pointer.
  • Indirect access: Perhaps all variables would be loaded indirectly, through a pointer (rather than directly from their memory location). The pointer would be initialized to 0 (null), so a read before any write would be caught by runtime support as a null-pointer access. The first time the variable is written to, the pointer is updated to the actual address. This shifts the code burden for checking from reads to writes (assigns). If a variable is known to be initialized in its declaration, or within the first basic block (and thus definitely determined to be valid), code could instead be generated to load the variable directly.
  • Parity checking: This one would be the fastest and slimmest, although it would probably require hardware modifications (as well as parity-checked memory). The idea would be to load all unassigned variables with a bit pattern containing invalid parity bits for each byte. It would be necessary to bypass the normal parity bit generation mechanism. If such a variable is read without having been written to first, the parity error would be detected and runtime support would be alerted. When a value is written to the variable, the normal parity bit generation will be done, so subsequent reads will be without error. It would be possible to undefine a variable by overwriting it with parity-invalid bits, and unused stack locations (including popped locations) would be set to invalid.
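Here is a minimal Python model of the indirect-access idea. The hypothetical `IndirectVar` reaches its storage through a pointer that starts out null. The explicit `None` test in `read` stands in for the null-pointer trap that hardware or runtime support would provide at no extra instruction cost. Note that the bookkeeping (updating the pointer) happens on the write side.

```python
class NullDereference(RuntimeError):
    """Stands in for the trap raised when a null pointer is followed."""


class IndirectVar:
    """Hypothetical variable that is always read through a pointer."""

    def __init__(self, memory: list, address: int) -> None:
        self.memory = memory    # simulated RAM
        self.address = address  # where the variable actually lives
        self.pointer = None     # null until the first write

    def write(self, value) -> None:
        self.memory[self.address] = value
        # The checking burden moves here: point at the real storage.
        self.pointer = self.address

    def read(self):
        if self.pointer is None:  # models the hardware null-pointer trap
            raise NullDereference("read of unassigned variable")
        return self.memory[self.pointer]
```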
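The parity-checking proposal can be simulated in software to show the mechanics. This is a sketch only: real parity memory does the check in hardware. `ParityMemory`, `parity_bit`, and `ParityError` are hypothetical names. Each byte carries an even-parity bit. Cells start out (and can be returned to) a "poisoned" state whose parity bit is deliberately wrong, which bypasses the normal parity generator. A normal write computes correct parity, so only reads of never-written or undefined cells fail.

```python
class ParityError(RuntimeError):
    """Models the parity fault reported to runtime support."""


def parity_bit(byte: int) -> int:
    """Even parity: the bit that makes the total number of 1 bits even."""
    return bin(byte).count("1") & 1


class ParityMemory:
    def __init__(self, size: int) -> None:
        # Every cell starts poisoned: data 0 with the parity bit inverted.
        self.data = [0] * size
        self.parity = [parity_bit(0) ^ 1] * size

    def write(self, addr: int, byte: int) -> None:
        byte &= 0xFF
        self.data[addr] = byte
        self.parity[addr] = parity_bit(byte)  # normal parity generation

    def undefine(self, addr: int) -> None:
        # E.g. for a popped stack location: force invalid parity again.
        self.parity[addr] = parity_bit(self.data[addr]) ^ 1

    def read(self, addr: int) -> int:
        if parity_bit(self.data[addr]) != self.parity[addr]:
            raise ParityError(f"read of unassigned byte at address {addr}")
        return self.data[addr]
```

Once a cell has been written, reads carry no extra cost beyond the parity check the memory already performs.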

Does anyone know of other mechanisms in use for preventing the use of invalid unassigned variable data? Do you have any other proposals?


Posted on 2017-Apr-22 at 09:08:17 by sciurius

  • I’ve been working (in the past) with programming languages that always supplied a defined default value (zero) to variables. There was simply no such thing as an undefined variable.
  • Some hardware architectures have more flag bits for purposes like this. For example, the Burroughs mainframes had 51-bit words, of which 48 were data bits.

Posted on 2017-Apr-22 at 19:57:37 by Phil

Well, it’s still undefined/unset, just that it has a consistent value now (0). It still doesn’t solve the problem that you failed to assign a legitimate value at some point, and have garbage in your calculations.

The Intel 8087 numeric coprocessor and its descendants I think also had some extra bits for declaring NaN, infinities, etc., or possibly it just reserved some bit patterns?


Posted on 2017-Apr-23 at 15:07:49 by sciurius

“Well, it’s still undefined/unset, just that it has a consistent value now (0). It still doesn’t solve the problem that you failed to assign a legitimate value at some point, and have garbage in your calculations.”

No, you don’t have garbage. But it takes a short while to get used to the “default zero” paradigm.
Whether this is good from an educational point of view is subject to debate.

“The Intel 8087 numeric coprocessor and its descendants I think also had some extra bits for declaring NaN, infinities, etc., or possibly it just reserved some bit patterns?”

NaN and friends are defined by IEEE 754.
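In the IEEE 754 binary formats, NaNs and infinities are reserved bit patterns rather than extra bits. They are the values whose exponent field is all ones. A NaN is also contagious through arithmetic, which is why it is sometimes used as an "unset" marker for floating-point data. The Python sketch below, with hypothetical helper names `bits64` and `is_special`, prints the binary64 layout and shows the propagation. IEEE 754 also defines signaling NaNs, which are meant to raise an invalid-operation exception when used and are closer to the trap idea in the original post. This sketch sticks to the quiet NaNs that Python exposes directly.

```python
import math
import struct


def bits64(x: float) -> str:
    """IEEE 754 binary64 bit pattern of x, shown as sign|exponent|fraction."""
    (n,) = struct.unpack(">Q", struct.pack(">d", x))
    s = f"{n:064b}"
    return f"{s[0]}|{s[1:12]}|{s[12:]}"


def is_special(x: float) -> bool:
    """NaNs and infinities are exactly the patterns with all 11 exponent bits set."""
    (n,) = struct.unpack(">Q", struct.pack(">d", x))
    return ((n >> 52) & 0x7FF) == 0x7FF


unset = float("nan")        # stand-in for "never assigned"
total = unset * 2.0 + 1.0   # the NaN propagates through the calculation
print(math.isnan(total))    # True
print(bits64(1.0))          # an ordinary value, for comparison
print(bits64(unset))        # exponent field is all ones
```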


Posted on 2017-Apr-24 at 08:54:29 by Phil

One man’s trash is another man’s treasure. I still call it [default 0 value] garbage simply because that might not be what you intended when you devised the algorithm and wrote the code. What if the variable were to be used as an index value, starting at 1, and you forgot to give it a value, leaving it at 0? That could be a very subtle error to debug. Relying on the compiler to initialize variables automatically is a very dangerous habit to get into, even if they are supposed to be 0.

The 8087 actually came out before IEEE 754 was finally codified, and mostly (but not fully) implements the standard.


Posted on 2017-Apr-24 at 14:55:54 by sciurius

“One man’s trash is another man’s treasure. I still call it [default 0 value] garbage simply because that might not be what you intended when you devised the algorithm and wrote the code. What if the variable were to be used as an index value, starting at 1, and you forgot to give it a value, leaving it at 0?”

For this, we have 0-based indices :)

But seriously, I consider it decent programming to explicitly initialize variables. I like the way Perl uses an out-of-band (OOB) ‘undefined’ value (undef). Several programming languages allow trapping undefined values (by generating exceptions), but often you cannot check whether a variable is (still) undefined.
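Python happens to illustrate both halves of this remark. Reading a local variable that was never bound raises `UnboundLocalError`, so the unset read is trapped. Separately, a unique sentinel object gives an explicit "not yet set" value that can be tested, loosely comparable to Perl's undef. The function names and `UNSET` below are illustrative only.

```python
def trapped(flag: bool) -> int:
    if flag:
        x = 1
    # When flag is False, x was never bound, so Python raises
    # UnboundLocalError here: the language traps the unset read.
    return x


# An out-of-band marker: a unique object that no legitimate value can be,
# so "is it still unset?" becomes a plain identity test.
UNSET = object()


def first_negative(values: list):
    found = UNSET
    for v in values:
        if v < 0:
            found = v
            break
    return found  # callers check `result is UNSET`
```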

 

All content © copyright 2005 – 2025 by Catskill Technology Services, LLC.
All rights reserved.
Note that Third Party software (whether Open Source or proprietary) on this site remains under the copyright and license of its owners. Catskill Technology Services, LLC does not claim copyright over such software.

 

This page is https://www.catskilltech.com/utils/show.php?link=detecting-unset-variables


Last updated Sat, 28 Dec 2024 at 11:29 PM
