Interpreters with loose typing need to carry along a great deal of information about the variable, including its type and value, which need to be checked before it's used. It's no big deal to include a little extra information on whether the variable has been set at some point (a value assigned), or whether it's still undefined. Note that some languages will let you undefine a previously defined variable!
Especially with compiled programs, where the symbol table is no longer available and machine instructions usually directly access the storage for a variable (to load or store it), there is no easy way to tell whether that variable contains random junk, or is a valid set value, just from looking at the bit pattern. The compiler can store a separate flag to indicate the state of the variable (assigned at some point, or unassigned/undefined), but then has to check that flag each time it reads that variable, and remember to set the flag each time it writes to the variable (or, at least once). There are optimization methods to minimize the amount of checking code needed (i.e., check only at the beginning of a basic block, and pass on the information, if it's been assigned, to subsequent blocks), but the program will still end up doing a considerable amount of its time checking whether a variable is valid.
Except for some limited cases, it is not possible at an arbitrary time for a compiler to tell in general whether a variable has a valid value assigned to it. Loops and conditional expressions can make it very difficult to determine when something has been assigned (i.e., there are multiple paths to get to this point, some of which may have assigned the variable, and some not!). A similar problem can arise with constant folding and propagation, where some optimizations have to be left undone because the compiler can't be certain of the path taken to a particular point. The famed Halting Problem is for similar reasons.
A few possible tricks
- Initialize everything: Some compilers simply initialize all variables, whether static or stack-based, to some value such as 0. At least, an erroneous program will be repeatable, but without detection and flagging of undefined variables, the programmer may not realize that there is a problem. A zero, or a cute pattern such as 0xdeadbeef, may be perfectly valid data for ordinary variables, although invalid for pointers, and thus usually not caught.
- Indirect access: Perhaps all variables would be loaded indirectly, through a pointer (rather than directly from their memory location). The pointer would be initialized to 0, and thus caught by runtime support. The first time the variable is written to, the pointer is updated to the actual address. This shifts the code burden for checking from reads to writes (assigns). If a variable is known to be initialized in its declaration, or within the first basic block (and thus definitely determined to be valid), code could instead be generated to directly load the variable.
- Parity checking: This one would be the fastest and slimmest, although it would probably require hardware modifications (as well as parity-checked memory). The idea would be to load all unassigned variables with a bit pattern containing invalid parity bits for each byte. It would be necessary to bypass the normal parity bit generation mechanism. If such a variable is read without having been written to first, the parity error would be detected and runtime support would be alerted. When a value is written to the variable, the normal parity bit generation will be done, so subsequent reads will be without error. It would be possible to undefine a variable by overwriting it with parity-invalid bits, and unused stack locations (including popped locations) would be set to invalid.
Does anyone know of other mechanisms in use for preventing the use of invalid unassigned variable data? Do you have any other proposals?