Posted on 2017-Jan-06 at 15:39:48 by Phil
In my experience, the primary cause of run-time problems in C code has been mistakes in the handling of pointers. Here are some, ah, pointers on good coding practices.
No, this has nothing to do with BDSM, whips with pointy tips, and the like. If you came here expecting such a discussion, sorry, but this is a lot more boring! The C language is happy to provide a million ways to hang yourself, and plenty of rope to do it[1]. High performance comes from low run-time overhead, and the price is that C assumes you are a grownup who knows what you are doing. Unfortunately, most programmers don’t quite know what they’re doing, especially when it comes to manipulating pointers. This leads to many, often subtle, run-time problems that could have been avoided had proper care been taken.
This is not to say that most programmers are incompetent; it’s just that they have never been taught the fine points of pointer usage. This article will attempt to remedy that by discussing a number of trouble spots.
Some of these problems can be avoided by the use of macros and/or function wrappers to protect the programmer, some may require a new preprocessor step, and some may just be too restrictive to warrant anything beyond a checklist of things to watch out for. People work in C because they want blazing speed, and are leery of adding overhead for the purpose of protecting themselves. Fair enough, but one way or the other, you need to be aware of these things and to avoid falling into various traps.
Misuse of pointers falls into three main categories:
The most common cause of dangling references and memory leaks is improper handling of pointers in heap allocation. There are three safeguards you can either implement in macros/functions, or just remember to check off on your list:
Here is a bit of code illustrating macros or function wrappers (the choice is yours) to handle heap pointers:
And here is how it might expand (macros) or be hand-written by a careful coder:
Notice that ptrA is initialized to NULL, so that if it is accidentally used before being assigned, the run-time will choke on it. Before being assigned by malloc(), it is confirmed to still be NULL (preventing a memory leak). The macro or wrapper checks that the malloc() was successful. When we’re done with this heap memory, not only is the memory freed, but ptrA is reset to NULL, so you can’t accidentally keep using it to point to that now-gone allocation (preventing a dangling reference). Also, being NULL, it is eligible to be assigned again by another allocation.
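As a minimal sketch of macros providing those safeguards: the names HEAP, MALLOC and FREE echo the macro names mentioned later in this article, but these definitions are illustrative assumptions, not the original code. Wrapping each multi-statement body in do { ... } while (0) lets a macro be used like an ordinary statement.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macros, not a standard or library API. */

/* Declare a heap pointer that always starts out NULL. */
#define HEAP(type, name) type *name = NULL

/* Allocate count elements, but only into a NULL pointer (anything else
   would leak the block it already points to), and stop if malloc() fails. */
#define MALLOC(ptr, count)                                            \
    do {                                                              \
        if ((ptr) != NULL) {                                          \
            fprintf(stderr, "%s:%d: MALLOC into non-NULL %s\n",       \
                    __FILE__, __LINE__, #ptr);                        \
            abort();                                                  \
        }                                                             \
        (ptr) = malloc((count) * sizeof *(ptr));                      \
        if ((ptr) == NULL) {                                          \
            fprintf(stderr, "%s:%d: malloc failed for %s\n",          \
                    __FILE__, __LINE__, #ptr);                        \
            abort();                                                  \
        }                                                             \
    } while (0)

/* Free the block and reset the pointer to NULL, so it can neither dangle
   nor be freed twice, and is eligible for another allocation. */
#define FREE(ptr)      \
    do {               \
        free(ptr);     \
        (ptr) = NULL;  \
    } while (0)

/* Example use: sum 1..n via a heap array (n must be at least 1). */
long sum_to(int n)
{
    HEAP(long, ptrA);             /* long *ptrA = NULL; */
    long total = 0;

    MALLOC(ptrA, n);
    for (int i = 0; i < n; i++)
        ptrA[i] = i + 1;
    for (int i = 0; i < n; i++)
        total += ptrA[i];
    FREE(ptrA);
    assert(ptrA == NULL);         /* no dangling pointer left behind */
    return total;
}
```

Written out by hand, sum_to is the same pattern: declare with = NULL, check for NULL before malloc(), check the result, and follow every free() with an assignment of NULL.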
Unfortunately, it is not uncommon to see code where a pointer has been assigned in a deeper block (either a function call, or a nested { } code block) and not handled properly. This can cause two problems:
The two cases above are mirror images of each other: you are either successfully returning a pointer to something that no longer exists, or allocating something on the heap and failing to return that pointer before it goes out of scope. It’s very easy to have a heap allocation pointer and forget to free it when returning from a block (we get used to having the run-time take care of such things for automatic variables).
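Both mirror-image mistakes fit in a few lines. In the sketch below, make_greeting is a hypothetical example, not from the original article: the dangling-reference version is kept only in a comment, and the working version hands its heap block up to the caller through an out-parameter and frees the block on its error path, so neither problem arises.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Dangling reference (do not do this): buf is an automatic array, so the
   returned address points at storage that no longer exists.

   char *make_greeting_bad(const char *name)
   {
       char buf[64] = "hello, ";
       strncat(buf, name, sizeof buf - strlen(buf) - 1);
       return buf;
   }
*/

/* Heap version: on success, ownership of the block passes to the caller
   through *out; on every failure path, nothing is left allocated. */
int make_greeting(const char *name, char **out)
{
    char *tmp = NULL;

    assert(name != NULL && out != NULL);
    assert(*out == NULL);               /* don't overwrite a live pointer */

    tmp = malloc(strlen("hello, ") + strlen(name) + 1);
    if (tmp == NULL)
        return -1;                      /* nothing allocated, nothing leaked */
    strcpy(tmp, "hello, ");
    strcat(tmp, name);

    if (strchr(name, '\n') != NULL) {   /* a check made after allocating... */
        free(tmp);                      /* ...must free before returning */
        return -1;
    }
    *out = tmp;                         /* success: caller now owns the block */
    return 0;
}
```

The caller then takes over the usual duties: free() the result when done and reset its own pointer to NULL.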
Say you malloc() a block on the heap and assign it to ptrA. While writing or reading that chunk of memory (e.g., by ptrA++ while traversing the array), you alter ptrA’s value. Unless you saved the original value and restore it when done, ptrA is no longer valid as a heap pointer! At best, it will point somewhere within that allocated block, and can’t be used to free() the block later.
It is better to declare a scratch or working pointer to be modified in use, and be careful to preserve the original pointer as a read-only value. Unfortunately, you can’t declare ptrA to be a const, as you still need to write to it several times after it’s declared and initialized. This one will probably just require extra care in coding, rather than being able to rely on the compiler or some sort of lint to discover that you’ve written to this variable where and when you shouldn’t have. Perhaps a new preprocessor could detect writes outside of HEAP, MALLOC/CALLOC, and FREE macros (e.g., ptrA =), but that’s a lot of work and might not catch everything.
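A minimal sketch of that discipline follows (sum_of_squares is a hypothetical example). ptrA is written only by the allocation and by the reset after free(), while all traversal goes through a separate scratch pointer, walk.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill a heap array with 1, 4, 9, ..., n*n and return its sum, or -1 if
   the allocation fails. ptrA is written only by malloc() and by the reset
   after free(); all traversal goes through the scratch pointer walk. */
long sum_of_squares(size_t n)
{
    long *ptrA = NULL;
    long *walk = NULL;            /* scratch pointer: free to move */
    long total = 0;

    if (n == 0)
        return 0;
    ptrA = malloc(n * sizeof *ptrA);
    if (ptrA == NULL)
        return -1;

    walk = ptrA;                  /* start at the beginning of the block */
    for (size_t i = 1; i <= n; i++)
        *walk++ = (long)(i * i);
    assert(walk == ptrA + n);     /* walk has moved; ptrA has not */

    for (walk = ptrA; walk < ptrA + n; walk++)   /* restart from ptrA */
        total += *walk;

    free(ptrA);                   /* still the address malloc() returned */
    ptrA = NULL;
    return total;
}
```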
The previous section discussed the use of alias pointers in lieu of modifying the primary pointer to a heap allocation. This also applies to pointer aliases to things like strings and arrays (not just heap allocations): you want to be careful not to destroy the only pointer you have to the string or array. Sometimes it’s easy enough to re-create the pointer, but if you’re in another function or some deeper block (where the original item is not currently accessible), it may not be so easy. In that case, create another pointer that you don’t mind incrementing or decrementing, as you can always get back to the start with the original pointer.
Let’s say you have your primary pointer to something, and you’ve created one or more alias pointers for working purposes (i.e., they will likely be changed during usage, such as walking through a string or array). What are the rules for these alias pointers?
A new preprocessor might have an ALIAS macro and a POINTER macro to automatically initialize to NULL, and possibly to detect certain situations and insert code to check for them.
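As a sketch of what such macros might look like: POINTER and ALIAS are hypothetical here, defined as plain declarations that start out NULL. The function reverses a string in place using two alias pointers that move toward each other. The parameter s is never modified, so it can still be returned as the start of the string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical declaration macros: both start their pointer out as NULL.
   POINTER is for a primary pointer that should stay put; ALIAS is for a
   working copy that is expected to move. */
#define POINTER(type, name) type *name = NULL
#define ALIAS(type, name)   type *name = NULL

/* Reverse a string in place and return it. s itself is never modified,
   so it still points at the start of the string afterward. */
char *reverse_in_place(char *s)
{
    ALIAS(char, front);
    ALIAS(char, back);
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (len < 2)
        return s;

    front = s;                    /* walks forward */
    back = s + len - 1;           /* walks backward */
    while (front < back) {
        char tmp = *front;
        *front++ = *back;
        *back-- = tmp;
    }
    return s;
}
```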
Our final look (for now) is at using pointers to traverse something (a variable or heap allocation) and accidentally running off the end of it. With a smart enough preprocessor, it might be possible to insert code to check where you are within the block of memory, and stop things if you’ve gone too far and are in foreign territory. However, in most cases, the (possibly substantial) overhead may be deemed not worth it, and you have to rely on careful coding to avoid such problems.
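To illustrate the kind of check such a preprocessor might insert, here is a sketch using two hypothetical macros built on the standard assert(): one goes before each dereference and one before each step of the walking pointer. The walk never forms a pointer beyond one past the end of the array (which C permits), and building with -DNDEBUG removes the checks, and their overhead, entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checks a preprocessor might insert for a pointer p that
   walks forward over count elements starting at base. */
#define CHECK_IN(p, base, count)   assert((p) >= (base) && (p) < (base) + (count))
#define CHECK_STEP(p, base, count) assert((p) < (base) + (count))

/* Count how many elements of arr[0..count-1] equal value. */
size_t count_value(const int *arr, size_t count, int value)
{
    const int *end = arr + count;     /* one past the end: legal to form */
    size_t hits = 0;

    for (const int *p = arr; p != end; ) {
        CHECK_IN(p, arr, count);      /* inserted before reading *p */
        if (*p == value)
            hits++;
        CHECK_STEP(p, arr, count);    /* inserted before advancing p */
        p++;
    }
    return hits;
}
```

If the loop condition were mistakenly written as p <= end, CHECK_IN would stop the program at the first out-of-bounds read instead of letting it quietly read foreign memory.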
Some of the things discussed here could be implemented with macros and/or function wrappers, some might be handled with a new preprocessor pass, and still others just require careful attention to detail. You could simply be aware of all these pitfalls, and print yourself up a checklist of things to look for at some point in the desk check or formal code review.
Posted on 2017-May-08 at 11:14:25 by Phil
A preprocessor might call a check routine at the end of scope for a declared pointer value, and issue a run-time warning if a non-NULL pointer is about to go out of scope (i.e., become a memory leak). This could also be done manually, but would depend upon the vigilance of the programmer (remember to test all pointers). Heap pointers should be NULL at exit, having been either freed, or copied back up to the caller (preserved) and then set to NULL (but not freed) in this routine, to avoid tripping the leak test. If the heap pointer itself is a parameter, it may need a flag to tell the preprocessor not to worry about the block ending with a non-NULL value. For other pointers, it may or may not matter that they are going out of scope with values.
The safest implementation would be to always make such a check, but it would not be unreasonable (for performance reasons) to only do this during testing and debugging, and hope that all problems have been caught by this time!
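A manual version of that end-of-scope check might look like the sketch below. CHECK_SCOPE_EXIT and leak_warnings are hypothetical names. The check is meant to be placed just before a heap pointer goes out of scope, and it compiles away when NDEBUG is defined, in keeping with the idea of checking only during testing and debugging.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical end-of-scope leak check for heap pointers. Place it just
   before a heap pointer goes out of scope; a non-NULL value means the block
   was neither freed nor handed up to the caller (with the local copy set to
   NULL). With NDEBUG defined, it compiles to nothing. */
#ifndef NDEBUG
static int leak_warnings = 0;
#define CHECK_SCOPE_EXIT(ptr)                                          \
    do {                                                               \
        if ((ptr) != NULL) {                                           \
            fprintf(stderr, "%s:%d: %s is non-NULL at scope exit\n",   \
                    __FILE__, __LINE__, #ptr);                         \
            leak_warnings++;                                           \
        }                                                              \
    } while (0)
#else
#define CHECK_SCOPE_EXIT(ptr) ((void)0)
#endif

/* Allocate n bytes set to fill and hand the block up through *out. */
int make_filled_buffer(size_t n, char fill, char **out)
{
    char *buf = NULL;

    assert(out != NULL);
    if (n == 0 || *out != NULL)
        return -1;
    buf = malloc(n);
    if (buf == NULL)
        return -1;
    memset(buf, fill, n);

    *out = buf;        /* preserved: copied back up to the caller */
    buf = NULL;        /* set to NULL (not freed) so the check stays quiet */
    CHECK_SCOPE_EXIT(buf);
    return 0;
}
```

If make_filled_buffer omitted the buf = NULL line, the block would not actually leak (the caller has it), but the check would report buf on every call; that is why the local copy is cleared after being handed up. GCC and Clang also provide a non-standard __attribute__((cleanup(function))) for local variables, which calls the named function with the variable’s address when it goes out of scope and could run such a check automatically, at the cost of portability.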
Posted on 2018-Oct-05 at 22:11:31 by Phil
Here is a proposal for safe pointer usage that could involve macros, subroutines, language extensions, and/or “lint”-like checking. Naturally, the less that is done at run time, the smaller the speed penalty for checking; and the less the scheme depends on the programmer to insert and use special routines, the more likely it is that the right things will be done.
The first thing is to separate out the heap pointers (to allocated blocks) from other pointer usage. These will be special pointers, in the sense that their usage is very restricted, to avoid getting into trouble. They might have their own type name, such as HeapPointer. A heap pointer should only be used for one thing: keeping track of an allocated heap block. If it is a macro, it would translate to a type of void *; no other types are allowed (this restriction might be relaxed if the macro or external preprocessing permits passing in an arbitrary type). When declared, it must always be initialized to NULL. Only two operations are permitted: allocating a block and freeing it.
A HeapPointer would be an ideal candidate for an object, if the language permits it, as the limited suite of operations can be implemented as methods, and the value hidden (a “get” or “copy” method would be needed so that the value could be copied to a working pointer). A HeapPointer may never be assigned to except by an allocation routine/macro or a free. It should not be tampered with, because of the risk of losing contact with the block of allocated memory on the heap.
In addition to allocation and freeing, a HeapPointer gets special treatment upon going out of scope (end of block, exit, return, long jump, etc.). At that time, if it has a non-NULL value, it must be freed unless the programmer has somehow informed the checking system that the pointer has been transferred to a parameter and is being returned to a higher-level routine. Perhaps this could be done with a HeapPointerCopy routine to copy a HeapPointer to a parameter. Freeing memory blocks when their pointers go out of scope (unless the pointer is being transferred to another routine) should prevent still more memory leaks. Ideally, the parameter at a higher (or lower??) level routine should somehow be tagged as a HeapPointer, so the same rules about allocating and freeing can be applied. Unfortunately, this could really complicate things, unless a full analysis of data flow across multiple routines (and possibly multiple files) is done to check that HeapPointers are handled with absolute safety (a “lint”-like program).
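C has no objects, but a struct plus a small set of functions can approximate the HeapPointer idea. In this sketch, HeapPointer, hp_alloc, hp_free, hp_get, hp_transfer and HP_SCOPE_END are all hypothetical names. hp_transfer plays the role of the suggested HeapPointerCopy routine: it hands the block to another HeapPointer and leaves the source NULL, so the end-of-scope rule has nothing left to free. Nothing in C stops code from touching .block directly; that would still need discipline or a lint-like checker.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical HeapPointer: wraps the only pointer to a heap block. By
   convention, .block is touched only through the functions below. */
typedef struct {
    void *block;                  /* NULL when nothing is allocated */
} HeapPointer;

#define HEAP_POINTER_INIT { NULL }  /* every HeapPointer starts out NULL */

/* Permitted operation 1: allocate, but only into an empty HeapPointer. */
int hp_alloc(HeapPointer *hp, size_t size)
{
    assert(hp != NULL);
    if (hp->block != NULL || size == 0)
        return -1;                /* would leak the existing block */
    hp->block = malloc(size);
    return hp->block != NULL ? 0 : -1;
}

/* Permitted operation 2: free the block and reset to NULL. */
void hp_free(HeapPointer *hp)
{
    assert(hp != NULL);
    free(hp->block);              /* free(NULL) is a no-op */
    hp->block = NULL;
}

/* "get" method: copy the address into a working pointer. */
void *hp_get(const HeapPointer *hp)
{
    assert(hp != NULL);
    return hp->block;
}

/* Hand the block to another (e.g. the caller's) empty HeapPointer, leaving
   the source NULL so that its end-of-scope handling frees nothing. */
int hp_transfer(HeapPointer *from, HeapPointer *to)
{
    assert(from != NULL && to != NULL);
    if (to->block != NULL)
        return -1;
    to->block = from->block;
    from->block = NULL;
    return 0;
}

/* End-of-scope rule: whatever is still held gets freed. */
#define HP_SCOPE_END(hp) hp_free(&(hp))

/* Example: build an array holding 0..n-1 and pass it up to the caller. */
int make_sequence(size_t n, HeapPointer *result)
{
    HeapPointer local = HEAP_POINTER_INIT;
    int *work = NULL;             /* ordinary working pointer */
    int rc;

    if (hp_alloc(&local, n * sizeof *work) != 0)
        return -1;
    work = hp_get(&local);
    for (size_t i = 0; i < n; i++)
        work[i] = (int)i;

    rc = hp_transfer(&local, result);
    HP_SCOPE_END(local);          /* frees the block only if the transfer failed */
    return rc;
}
```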
Once heap pointers have been dealt with, what’s left are ordinary working pointers. Apart from the rule that they may not be used to allocate or free heap blocks, it’s difficult to analyze and restrict their behavior. Simply assigning the address of a variable (e.g., ptr = &someVar;) gets you into trouble if someVar goes out of scope while ptr is still being used somewhere, possibly returned to a calling routine! This is somewhat the opposite of a HeapPointer: what the pointer is pointing to is nonpersistent (and the pointer is possibly persistent), rather than the other way around. If the pointer is (initially) a copy of a HeapPointer, that heap block may be automatically freed even if the working pointer is returned to another routine! This would still take some care on the part of the programmer, and might not be possible to detect automatically.
Working pointers are still subject to dangling references, if the underlying heap block is freed. It might be possible to tie a set of working pointers to a HeapPointer, in effect saying, “these pointers are going to be working in an allocated heap block pointed to by this HeapPointer. If that block is freed, automatically NULL out all these working pointers.” That in turn could cause its own failures, but at least there would be less of a chance of invalid pointers continuing to be used. It should be OK for a working pointer to go out of scope (and itself disappear) while holding a non-NULL value, as it’s not responsible for a heap block.
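Here is a sketch of tying working pointers to a heap block so that freeing the block NULLs them all. To stay within C’s type rules it is written for int elements only (a generic version would need per-type macros), and TrackedIntBlock, tb_alloc, tb_attach, tb_detach and tb_free are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_WORKERS 8   /* arbitrary limit for this sketch */

/* Hypothetical tracked block: the owning pointer plus the addresses of the
   working pointers registered as operating inside it. */
typedef struct {
    int    *block;
    int   **workers[MAX_WORKERS];
    size_t  nworkers;
} TrackedIntBlock;

int tb_alloc(TrackedIntBlock *tb, size_t n)
{
    assert(tb->block == NULL && n > 0);
    tb->block = malloc(n * sizeof *tb->block);
    tb->nworkers = 0;
    return tb->block != NULL ? 0 : -1;
}

/* Tie a working pointer to the block and point it at the block's start. */
int tb_attach(TrackedIntBlock *tb, int **worker)
{
    if (tb->block == NULL || tb->nworkers == MAX_WORKERS)
        return -1;
    tb->workers[tb->nworkers++] = worker;
    *worker = tb->block;
    return 0;
}

/* Untie a working pointer, e.g. before it goes out of scope. */
void tb_detach(TrackedIntBlock *tb, int **worker)
{
    for (size_t i = 0; i < tb->nworkers; i++) {
        if (tb->workers[i] == worker) {
            tb->workers[i] = tb->workers[--tb->nworkers];
            return;
        }
    }
}

/* Free the block and NULL out every working pointer still tied to it. */
void tb_free(TrackedIntBlock *tb)
{
    for (size_t i = 0; i < tb->nworkers; i++)
        *tb->workers[i] = NULL;
    tb->nworkers = 0;
    free(tb->block);
    tb->block = NULL;
}
```

This scheme invites its own failures: a registered working pointer must be detached before it goes out of scope, or tb_free will later write through the address of a variable that no longer exists.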
Finally, there is the age-old problem of pointers running off either end of the data they’re pointing to (such as an array). The only way to really keep this under control is to know in advance what the pointer is playing with and its size, and to check the pointer against those bounds. Such bounds checking has been available for a long time, but it usually requires programmer cooperation and has to execute at run time (a large time penalty). A related problem is buffer overflow, where data can be written beyond the extent of the space intended to receive it, but it will not be covered here, as the tools to prevent it are almost always available.
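A sketch of the “know the extent and check against it” approach follows. It is a small cursor object that carries the base address and element count of an int array and tracks a position index rather than a raw pointer, so no out-of-range address is ever formed. IntCursor and its functions are hypothetical names. Every read and every move is checked, which is exactly the run-time cost mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bounded cursor over an int array: it knows the array's base
   and length and refuses to read or move outside it. */
typedef struct {
    const int *base;
    size_t     len;   /* number of elements */
    size_t     pos;   /* 0..len; pos == len means one past the end */
} IntCursor;

IntCursor cursor_make(const int *base, size_t len)
{
    IntCursor c;
    assert(base != NULL || len == 0);
    c.base = base;
    c.len = len;
    c.pos = 0;
    return c;
}

/* Read the current element; returns false instead of reading out of bounds. */
bool cursor_get(const IntCursor *c, int *out)
{
    if (c->pos >= c->len)
        return false;
    *out = c->base[c->pos];
    return true;
}

/* Step forward; refuses to go beyond one past the end. */
bool cursor_next(IntCursor *c)
{
    if (c->pos >= c->len)
        return false;
    c->pos++;
    return true;
}

/* Step backward; refuses to run off the front. */
bool cursor_prev(IntCursor *c)
{
    if (c->pos == 0)
        return false;
    c->pos--;
    return true;
}
```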
A language with proper heap and pointer management built right in, without the programmer having to remember to clean up after themselves, might well be a better deal than using something like C. A “lint” to analyze pointer usage across routines and files (if it’s even possible) would be helpful, but providing tools and procedures for a careful programmer to follow would be better than nothing at all. Even providing pointers as objects, with limited operations and bound checking built into the methods, would be an improvement, as a language like C provides so much rope for you to hang yourself with. If you’re stuck with C (and no objects), about the best you can do is use some macros, preprocessors, limited check utilities, and a lot of care in writing code to properly use these facilities!
All content © copyright 2005 – 2025
by Catskill Technology Services, LLC.
All rights reserved.
Note that Third Party software (whether Open Source or proprietary) on this
site remains under the copyright and license of its owners.
Catskill Technology Services, LLC does not claim copyright over such software.
This page is https://www.catskilltech.com/utils/show.php?link=pointer-discipline-in-c
Last updated Sat, 28 Dec 2024 at 11:29 PM