Pointer discipline in C

Posted on 2017-Jan-06 at 15:39:48 by Phil

In my experience, the primary cause of run-time problems in C code has been mistakes in the handling of pointers. Here are some, ah, pointers on good coding practices.

No, this has nothing to do with BDSM, whips with pointy tips, and the like. If you came here expecting such a discussion, sorry, but this is a lot more boring! The C language is happy to provide a million ways to hang yourself, and plenty of rope to do it with[1]. High performance demands low run-time overhead, so C does very little checking on your behalf and assumes that you are a grownup who knows what you are doing. Unfortunately, most programmers don’t quite know what they’re doing, especially when it comes to manipulating pointers. This leads to many, often subtle, run-time problems that could have been avoided had proper care been taken.

This is not to say that most programmers are incompetent; it’s just that they have never been taught the fine points of pointer usage. This article will attempt to address that problem by discussing a number of trouble spots.

Some of these problems can be avoided by the use of macros and/or function wrappers to protect the programmer, some may require a new preprocessor step, and some may just be too restrictive to warrant anything beyond a checklist of things to watch out for. People work in C because they want blazing speed, and are leery of adding overhead for the purpose of protecting themselves. Fair enough, but one way or the other, you need to be aware of these things and to avoid falling into various traps.

What Kinds of Problems?

Misuse of pointers falls into three main categories:

  1. Leaving dangling references when the target of the pointer has gone away in one way or another (e.g., you still have a pointer to a heap variable even after you've freed the allocation). The danger here is that you use that pointer reference again without realizing that it no longer points to what you thought it did, and end up reading or writing memory that you didn’t intend to. As the pointer address (what it’s pointing to) likely belongs to your process or program, run-time support is unlikely to flag it.
  2. Causing memory leaks by destroying or changing a pointer too soon, while it’s still pointing to something (especially something on the heap). Essentially, you lose contact with your targeted memory. At that point, with no-one responsible for it, that memory just sits there used and unavailable. The heap manager isn’t aware that this memory is no longer needed, and can’t recover it (there is no garbage collection in C). Usually, this memory leak is finally recovered when the program or process ends, but in the meantime, multiple memory leaks can put quite a bit of heap space (and virtual memory) out of action.
  3. Running off the end of a string or array, and reading or writing where you shouldn’t be. This is related to the buffer overflow problem, where user input is not length-constrained and can overwrite memory well beyond the allocated buffer (whether the buffer is on the heap or is an ordinary variable). This is a favorite exploit of hackers, who can craft their input so that it carries executable code and also overwrites control data, such as a function’s return address on the stack, so that the injected code ends up being executed.

Heap Allocation

The most common cause of dangling references and memory leaks is improper handling of pointers in heap allocation. There are three safeguards you can either implement in macros/functions, or just remember to check off on your list:

  1. Variables declared as pointers should always be initialized to some reasonable initial value, no matter how they will be used. If they’re going to be set to point to some object, you can do that in the declaration, if you wish. Otherwise, all pointers should be initialized to NULL. A NULL value marks the pointer as not pointing to anything, and on most systems any attempt to dereference it is caught immediately as an error. If you don’t initialize a pointer (local, automatic pointers are not initialized for you), and it has some non-zero garbage in it, the run-time may or may not pick it up (Segment Violation or Bus Error). If you’re unlucky, it points to somewhere in your own data (read/write memory) and won’t be caught, resulting in a difficult-to-debug error. So, to summarize, all pointers should have either a NULL value, or point to a real target, at all times.
  2. As a corollary to the previous rule, all pointer variables no longer in use should be reset to NULL. This must be done immediately after a free() call, or anything else that removes the targeted memory from use. The pointer then is visible to run-time support as invalid, ready to be reused.
  3. Before allocating a block of the heap (malloc(), calloc(), etc.), check that the pointer variable that will receive the address is NULL. This indicates that it is currently unused, and therefore safe to overwrite. If you should overwrite a pointer variable that has a non-NULL value, especially if you’re following the first two rules, there’s a good chance that you just orphaned a heap block and created a memory leak. So, by only allowing NULL pointer variables to be written to, you reduce the chances of losing heap allocations.

Here is a bit of code illustrating macros or function wrappers (the choice is yours) to handle heap pointers:

    HEAP int *ptrA
    ⋮
    MALLOC(ptrA, int *, int, 1024)   /* ptrA will be 1024 integers */
    ⋮
    /* READ ptrA (never write) for various things */
    ⋮
    /* done with ptrA */
    FREE(ptrA)

And here is how it might expand (macros) or be hand-written by a careful coder:

    /* heap pointer */
    int *ptrA = NULL;
    ⋮
    if (ptrA != NULL) fatal_error("memleak", __FILE__, __LINE__);
    ptrA = (int *)malloc(1024 * sizeof(int));   /* ptrA will be 1024 integers */
    if (ptrA == NULL) fatal_error("mallocfail", __FILE__, __LINE__);
    ⋮
    /* READ ptrA (never write) for various things */
    ⋮
    /* done with ptrA */
    free(ptrA);
    ptrA = NULL;

Notice that ptrA is initialized to NULL, so that if it is accidentally used before being assigned, the run-time will choke on it. Before being assigned by malloc(), it is confirmed to still be NULL (preventing a memory leak). The macro or wrapper checks that the malloc() was successful. When we’re done with this heap memory use, not only is the memory freed, but ptrA is reset to NULL, so you can’t accidentally keep using it to point to that now-gone allocation (preventing a dangling reference). Also, being NULL, it is eligible to be assigned again by another allocation.
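To make the macro version concrete, here is one possible set of definitions for HEAP, MALLOC and FREE that expand roughly as described. This is a sketch under assumptions, not a fixed design: HEAP is written here as HEAP(type, name), because a plain prefix macro cannot append = NULL after the variable name; the types are passed unquoted; and fatal_error() is a hypothetical stand-in that reports the file and line and then exits.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error handler: report where the problem happened, then stop. */
static void fatal_error(const char *what, const char *file, int line)
{
    fprintf(stderr, "%s:%d: fatal: %s\n", file, line, what);
    exit(EXIT_FAILURE);
}

/* HEAP: declare a heap pointer that always starts out as NULL. */
#define HEAP(ptype, name) ptype name = NULL

/* MALLOC: refuse to overwrite a live heap pointer (that would be a leak),
   allocate count elements of etype, and check that malloc() succeeded. */
#define MALLOC(ptr, ptype, etype, count)                        \
    do {                                                        \
        if ((ptr) != NULL)                                      \
            fatal_error("memleak", __FILE__, __LINE__);         \
        (ptr) = (ptype)malloc((count) * sizeof(etype));         \
        if ((ptr) == NULL)                                      \
            fatal_error("mallocfail", __FILE__, __LINE__);      \
    } while (0)

/* FREE: release the block and immediately reset the pointer to NULL,
   so it can neither dangle nor be freed twice by accident. */
#define FREE(ptr)       \
    do {                \
        free(ptr);      \
        (ptr) = NULL;   \
    } while (0)

/* Example use: fill part of a 1024-int heap array with 1..n and sum it. */
static long sum_of_first_n(int n)
{
    HEAP(int *, ptrA);
    long total = 0;

    MALLOC(ptrA, int *, int, 1024);     /* ptrA will be 1024 integers */
    for (int i = 0; i < n && i < 1024; i++)
        ptrA[i] = i + 1;                /* write the elements, not ptrA */
    for (int i = 0; i < n && i < 1024; i++)
        total += ptrA[i];               /* READ ptrA, never change it */
    FREE(ptrA);                         /* done with ptrA: freed and NULLed */

    assert(ptrA == NULL);
    return total;
}
```

The do { ... } while (0) wrapper lets each macro be used like a single statement, for example as the body of an if without braces.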

Pointers in Deeper Blocks

Unfortunately, it is not uncommon to see code where a pointer has been assigned in a deeper block (either a function call, or a nested { } code block) and not handled properly. This can cause two problems:

  1. The pointer is to an automatic variable, living on the stack, and that pointer’s target goes out of scope and gets overwritten when control returns from that block! The pointer works perfectly well while control is still within that block, but as soon as you leave the block, that stack-based variable may be overwritten. This error can be quite insidious, as the target may not be overwritten immediately, and run-time won’t see anything wrong with a pointer within your stack (unless possibly it checks for a pointer outside the active stack). It’s quite easy to accidentally do this, forgetting that the pointer’s target’s lifetime is not eternal. Basically, this is a dangling reference.
  2. The pointer is to an allocation on the heap, and you neglect to pass that pointer back to a less-deeply nested block before it goes out of scope. That heap allocation has now been orphaned, and is a memory leak. Don’t forget, when setting up a function’s parameters to return the pointer, that you have to pass the address of the pointer to the function, and not the pointer value itself[2]. A related mistake is to allocate space on the heap for some purpose, but neglect to free() it before the pointer goes out of scope.

The two cases above are sort of mirror images of each other: you are either successfully returning a pointer to something that no longer exists, or allocating something on the heap and failing to return that pointer before it goes out of scope. It’s very easy to have a heap allocation pointer and forget to free it when returning from a block (we get used to having the run-time take care of such things with automatic variables).
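The second case, and the point made in footnote [2], can be seen in a small sketch. The names make_counting_array(), reset_by_value() and reset_by_address() are hypothetical. The allocating function receives the address of the caller’s pointer, so the new block is handed back rather than orphaned, and it refuses to overwrite a pointer that is already in use; the two reset functions show that changing a plain pointer parameter has no effect on the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* The first case, for contrast, would be something like
       int local[4]; ... return local;
   which hands back the address of storage that vanishes on return. */

/* Correct: the caller passes the ADDRESS of its pointer (int **), so the
   new heap block is handed back up instead of being orphaned.
   Returns 0 on success, -1 on failure. */
static int make_counting_array(size_t n, int **out)
{
    int *block = NULL;

    if (out == NULL || *out != NULL)
        return -1;                  /* caller's pointer already in use */
    block = malloc(n * sizeof *block);
    if (block == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        block[i] = (int)i;
    *out = block;                   /* ownership passes to the caller */
    return 0;
}

/* p is only a local copy of the caller's pointer: setting it to NULL
   here changes nothing for the caller (footnote [2]). */
static void reset_by_value(int *p)
{
    p = NULL;
    (void)p;
}

/* Writing through the pointer-to-pointer changes the caller's variable. */
static void reset_by_address(int **pp)
{
    *pp = NULL;
}
```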

Overwriting Heap Allocation Pointers

Say you malloc() a block on the heap, and assign it to ptrA. While in the process of writing or reading that chunk of memory (e.g., by ptrA++ while traversing the array), you alter ptrA’s value. Unless you have saved it, and restore it when done, ptrA is no longer valid as a heap pointer! At best, it will point somewhere within that allocated block, and it can’t be used to free() the block later (passing free() anything other than the exact address that malloc() returned is undefined behavior).

It is better to declare a scratch or working pointer to be modified in use, and be careful to preserve the original pointer as a read-only value. Unfortunately, you can’t declare ptrA to be a const, as you still need to write to it several times after it’s declared and initialized. This one will probably just require extra care in coding, rather than being able to rely on the compiler or some sort of lint to discover that you’ve written to this variable where and when you shouldn’t have. Perhaps a new preprocessor could detect writes outside of HEAP, MALLOC/CALLOC, and FREE macros (e.g., ptrA =), but that’s a lot of work and might not catch everything.
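A minimal illustration of keeping the primary pointer untouched: make_squares() (a name invented for this sketch) never modifies ptrA after malloc(), and a separate scratch pointer, work, does all the walking. The caller can then free() the block using the exact address malloc() returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate n ints, fill them with squares using a scratch pointer,
   and return the untouched base pointer (or NULL on failure). */
static int *make_squares(size_t n)
{
    int *ptrA = malloc(n * sizeof *ptrA);   /* primary heap pointer */
    int *work = NULL;                       /* scratch pointer, free to move */

    if (ptrA == NULL)
        return NULL;
    work = ptrA;                            /* start the scratch copy at the base */
    for (size_t i = 0; i < n; i++)
        *work++ = (int)(i * i);             /* only 'work' is incremented */

    /* work now sits one past the end; free(work) would be undefined.
       ptrA still holds malloc()'s result, so it stays valid for free(). */
    assert(work == ptrA + n);
    return ptrA;
}
```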

Pointer Aliases

The previous section discussed the use of alias pointers in lieu of modifying the primary pointer to a heap allocation. This also applies to pointer aliases to things like strings and arrays (not just heap allocations): you want to be careful not to destroy the only pointer you have to the string or array. Sometimes it’s easy enough to re-create the pointer, but if you’re in another function or some deeper block (where the original item is not currently accessible), it may not be so easy. In that case, you would want to create another pointer that you don’t mind incrementing or decrementing, as you can always get back to the start with the original pointer.

Let’s say you have your primary pointer to something, and you’ve created one or more alias pointers for working purposes (i.e., they will likely be changed during usage, such as walking through a string or array). What are the rules for these alias pointers?

  1. Like all other pointers, they should be initialized to NULL or to some real value, to prevent random garbage in them being interpreted as a valid pointer address.
  2. If they are an alias for a heap pointer, when the heap pointer is freed (and set to NULL), all aliases (modified or not) for that heap pointer should also be set to NULL (but not freed — that’s already been done!).
  3. Aliases for pointers other than heap pointers should follow the lead of their primaries when it comes to resetting them to NULL or some other value when they are no longer of use. If control is returning or falling off the end of a block, a local pointer alias that is going out of scope doesn’t need to be explicitly NULLed. Pointer aliases inherited from a higher-level block should be NULLed to indicate that they are free to be reassigned.

A new preprocessor might have an ALIAS macro and a POINTER macro to automatically initialize to NULL, and possibly to detect certain situations and insert code to check for them.
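Rules 1 and 2, together with the hypothetical ALIAS and POINTER macros, might look something like this. Both macros are assumptions and do nothing more here than guarantee a NULL start (a real preprocessor could use the ALIAS marker to do more), and free_with_aliases() and second_word_length() are illustrative helpers, not library calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical macros: a pointer never starts life holding garbage. */
#define POINTER(type, name) type *name = NULL
#define ALIAS(type, name)   type *name = NULL

/* Free a heap block, then NULL the primary pointer and every alias
   passed in. The aliases are NOT freed: the block is freed only once. */
static void free_with_aliases(char **primary, char **aliases[], size_t count)
{
    free(*primary);
    *primary = NULL;
    for (size_t i = 0; i < count; i++)
        *aliases[i] = NULL;
}

/* Example: length of the second word of s, found with a walking alias. */
static size_t second_word_length(const char *s)
{
    POINTER(char, text);        /* primary heap pointer */
    ALIAS(char, cursor);        /* working alias that walks the copy */
    size_t len = 0;

    text = malloc(strlen(s) + 1);
    if (text == NULL)
        return 0;
    strcpy(text, s);
    cursor = strchr(text, ' ');
    if (cursor != NULL)
        len = strcspn(cursor + 1, " ");

    char **aliases[] = { &cursor };
    free_with_aliases(&text, aliases, 1);
    assert(text == NULL && cursor == NULL);
    return len;
}
```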

Buffer Overflows and the Like

Our final topic (for now) is using pointers to traverse something (a variable or a heap allocation) and accidentally running off the end of it. With a smart enough preprocessor, it might be possible to insert code to check where you are within the block of memory, and to stop things if you’ve gone too far and are in foreign territory. However, in most cases the (possibly substantial) overhead may be deemed not worth it, and you have to rely on careful coding to avoid such problems.
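When full run-time checking is judged too expensive, one cheap habit still helps: compute a one-past-the-end pointer once and bound every traversal loop by it, rather than trusting the data itself (a terminator, or a count stored in the data) to stop you. A minimal sketch, with count_value() as an illustrative name:

```c
#include <assert.h>
#include <stddef.h>

/* Count occurrences of value in arr[0..n-1]. The loop is bounded by a
   one-past-the-end pointer computed once, so p cannot walk beyond the
   array no matter what the data contains. */
static size_t count_value(const int *arr, size_t n, int value)
{
    const int *p = arr;
    const int *const end = arr + n;   /* one past the last element */
    size_t hits = 0;

    for (; p < end; p++)              /* end is compared, never dereferenced */
        if (*p == value)
            hits++;
    return hits;
}
```

C guarantees that forming the address one past the last element, and comparing against it, is valid as long as that address is never dereferenced.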

Some of the things discussed here could be implemented with macros and/or function wrappers, some might be handled with a new preprocessor pass, and still others just require careful attention to detail. You could simply be aware of all these pitfalls, and print yourself up a checklist of things to look for at some point in the desk check or formal code review.


Posted on 2017-May-08 at 11:14:25 by Phil

A preprocessor might call a check routine at the end of scope for each declared pointer, and issue a run-time warning if a non-NULL pointer is about to go out of scope (i.e., become a memory leak). This could also be done manually, but would depend upon the vigilance of the programmer (remember to test all pointers). Heap pointers should be NULL at exit, having either been freed, or copied back up to the caller (preserved) and then set to NULL (though not freed) in this routine, to avoid tripping the leak test. If the heap pointer itself is a parameter, it may need a flag to tell the preprocessor not to worry about the block ending with a non-NULL value. For other pointers, it may or may not matter that they are going out of scope with values.

The safest implementation would be to always make such a check, but it would not be unreasonable (for performance reasons) to only do this during testing and debugging, and hope that all problems have been caught by this time!
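GCC and Clang already offer a hook for this kind of end-of-scope check: the non-standard cleanup variable attribute, which calls a given function with the variable’s address when the variable goes out of scope. Assuming one of those compilers, the leak test might be sketched as follows; CHECKED_HEAP_PTR and check_heap_ptr() are hypothetical names, and the leak_warnings counter exists only so the behavior can be observed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Count of warnings issued, kept only so the behavior can be observed. */
static int leak_warnings = 0;

/* End-of-scope check routine; the cleanup attribute passes it the
   ADDRESS of the variable that is going out of scope. */
static void check_heap_ptr(void **pp)
{
    if (*pp != NULL) {
        leak_warnings++;
        fprintf(stderr, "warning: heap pointer %p leaving scope non-NULL\n", *pp);
    }
}

/* Declare a NULL-initialized heap pointer that is checked at scope exit.
   NON-STANDARD: relies on __attribute__((cleanup)), GCC/Clang only. */
#define CHECKED_HEAP_PTR(name) \
    void *name __attribute__((cleanup(check_heap_ptr))) = NULL

/* Frees its block and resets the pointer: no warning. */
static void tidy(void)
{
    CHECKED_HEAP_PTR(buf);
    buf = malloc(64);
    /* ... use buf ... */
    free(buf);
    buf = NULL;
}

/* Copies the block up to the caller, then NULLs (does not free) its
   own pointer, as recommended above: no warning. */
static void hand_up_properly(void **out)
{
    CHECKED_HEAP_PTR(buf);
    buf = malloc(64);
    *out = buf;
    buf = NULL;
}

/* Hands the block up but leaves its own pointer set: the check fires,
   even though the caller now owns the block. */
static void hand_up_carelessly(void **out)
{
    CHECKED_HEAP_PTR(buf);
    buf = malloc(64);
    *out = buf;
}
```

The attribute fires on normal exits from the block, such as reaching its end or a return; it is not run by exit() or longjmp(). For the testing-only approach, the macro could be defined to expand to a plain NULL-initialized declaration in release builds.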


Posted on 2018-Oct-05 at 22:11:31 by Phil

Here is a proposal for safe pointer usage that could involve macros, subroutines, language extensions, and/or “lint”-like checking. Naturally, the less that is done at run time, the smaller the speed penalty for checking; and the less programmer cooperation is needed to insert and use special routines, the more likely it is that the right things will be done.

The first thing is to separate out the heap pointers (allocated blocks) from other pointer usage. These will be special pointers, in the sense that they are very restricted in their usage, to avoid getting into trouble. It might have its own name, such as HeapPointer. A heap pointer should only be used for one thing: keeping track of an allocated heap block. If it is a macro, it would translate to a type of void * — no other types are allowed (this restriction might be relaxed, if the macro or external preprocessing permits passing in an arbitrary type). When declared, it must always be initialized to NULL. Only two operations are permitted:

  • malloc (also calloc, realloc) via a subroutine or macro — which checks that the current value is NULL, and if it is, runs the normal allocation. A heap pointer currently pointing to something (non-NULL) is a fatal error because this would leave a memory leak (the current block is lost and is floating around unused and unrecoverable). A realloc call would not have this restriction. It is assumed that the old block is destroyed after its contents are copied to the new block, or the old block is extended.
  • free: if the HeapPointer is non-NULL, the normal free is done, and the heap pointer is then set to NULL. It then may be reused, if desired. It may be good to make freeing a NULL HeapPointer a warning or error of some sort (even though free(NULL) itself is legal C and does nothing), as it may indicate a logic problem in the program. Ensuring that a HeapPointer is set to NULL when freed prevents dangling references (a pointer that claims to still point to something, when that memory block is now gone) and enables checking that a HeapPointer about to be allocated isn’t already pointing to something (memory leak about to happen).

A HeapPointer would be ideal for an object, if a language permits it, as the limited suite of operations can be implemented as methods, and the value hidden (a “get” or “copy” method would be needed so that the value could be copied to a working pointer). A HeapPointer may never be assigned to, outside of an allocation routine/macro, or a free. It should not be tampered with because of the risk of losing contact with the block of allocated memory on the heap.
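Without objects, C can get part of the way with a struct and a small set of functions. The sketch below is one possible reading of the proposal, not a finished design: HeapPointer is a struct rather than a bare void *, so that hp = malloc(...) or hp++ will not compile, and heap_malloc(), heap_realloc(), heap_free() and heap_get() are hypothetical names for the permitted operations and the "get" method.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Wrapping the address in a struct means ordinary pointer arithmetic
   or direct assignment from malloc() will not compile; the functions
   below are the only intended way to change it. */
typedef struct {
    void *block;
} HeapPointer;

#define HEAP_POINTER_INIT { NULL }

/* Allocate only into an unused (NULL) HeapPointer; anything else would
   orphan the current block, so it is treated as fatal. */
static void heap_malloc(HeapPointer *hp, size_t size)
{
    if (hp->block != NULL) {
        fprintf(stderr, "fatal: heap_malloc on a live HeapPointer (leak)\n");
        abort();
    }
    hp->block = malloc(size);
    if (hp->block == NULL) {
        fprintf(stderr, "fatal: malloc failed\n");
        abort();
    }
}

/* realloc() leaves the old block intact when it fails, so the
   HeapPointer is only overwritten on success. Returns 0 or -1. */
static int heap_realloc(HeapPointer *hp, size_t size)
{
    void *moved = realloc(hp->block, size);

    if (moved == NULL)
        return -1;              /* old block still valid, still owned */
    hp->block = moved;
    return 0;
}

/* Free a live block and NULL the HeapPointer. Freeing a NULL
   HeapPointer is reported as a possible logic error (returns -1). */
static int heap_free(HeapPointer *hp)
{
    if (hp->block == NULL) {
        fprintf(stderr, "warning: heap_free on a NULL HeapPointer\n");
        return -1;
    }
    free(hp->block);
    hp->block = NULL;
    return 0;
}

/* The "get" operation: copy the address out into a working pointer. */
static void *heap_get(const HeapPointer *hp)
{
    return hp->block;
}
```

One gap remains: C cannot forbid copying the whole struct (hp2 = hp1), which would create a second owner of the same block, so that still depends on programmer discipline or a lint-like check.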

In addition to allocation and freeing, a HeapPointer gets special treatment upon going out of scope (end of block, exit, return, long jump, etc.). At that time, if it has a non-NULL value, it must be freed unless the programmer has somehow informed the checking system that the pointer has been transferred to a parameter, and is being returned to a higher-level routine. Perhaps this could be done with a HeapPointerCopy routine to copy a HeapPointer to a parameter. Freeing memory blocks at scope exit (unless the pointer to a block is being transferred to another routine) should prevent further memory leaks. Ideally, the parameter at a higher (or lower??) level routine should somehow be tagged as a HeapPointer, so the same rules about allocating and freeing can be applied. Unfortunately, this could really complicate things, unless a full analysis of data flow across multiple routines (and possibly multiple files) is done to check that HeapPointers are handled with absolute safety (a “lint”-like program).
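The out-of-scope rule and the HeapPointerCopy idea can be approximated in a similar way, assuming GCC or Clang for the non-standard cleanup attribute (which covers normal block exits and returns, but not exit() or a long jump). In this sketch, any HeapPointer that still owns a block when it leaves scope is freed, and heap_transfer() (a hypothetical stand-in for HeapPointerCopy) moves ownership into the caller’s HeapPointer and NULLs the local one, so the automatic free skips it. All names here are illustrative, and the type is defined locally so the example stands alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void *block;
} HeapPointer;

/* Scope-exit handler: whatever a HeapPointer still owns is freed. */
static void heap_scope_exit(HeapPointer *hp)
{
    free(hp->block);            /* free(NULL) is a harmless no-op */
    hp->block = NULL;
}

/* NON-STANDARD: the GCC/Clang cleanup attribute runs heap_scope_exit()
   automatically when the variable leaves scope. */
#define SCOPED_HEAP_POINTER(name) \
    HeapPointer name __attribute__((cleanup(heap_scope_exit))) = { NULL }

/* Allocate only into an unused HeapPointer. Returns 0 or -1. */
static int heap_malloc(HeapPointer *hp, size_t size)
{
    if (hp->block != NULL)
        return -1;              /* would orphan the current block */
    hp->block = malloc(size);
    return hp->block == NULL ? -1 : 0;
}

/* Move ownership to the caller's (unused) HeapPointer and NULL ours. */
static void heap_transfer(HeapPointer *from, HeapPointer *to)
{
    assert(to->block == NULL);
    to->block = from->block;
    from->block = NULL;
}

/* Build a heap copy of s and hand it up through *out. */
static int duplicate_string(const char *s, HeapPointer *out)
{
    SCOPED_HEAP_POINTER(tmp);
    SCOPED_HEAP_POINTER(scratch);   /* working space, never handed up */

    if (heap_malloc(&tmp, strlen(s) + 1) != 0 || heap_malloc(&scratch, 64) != 0)
        return -1;                  /* anything allocated is freed at exit */
    strcpy(tmp.block, s);
    /* ... scratch would be used here ... */
    heap_transfer(&tmp, out);       /* tmp is NULL now, so it is skipped */
    return 0;                       /* scratch is freed automatically */
}
```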

Once heap pointers have been dealt with, what’s left are ordinary working pointers. Other than not being permitted to be allocated or freed (no heap blocks), it’s difficult to analyze and restrict their behavior. Simply assigning the address of a variable (e.g., ptr = &someVar;) gets you into trouble if someVar goes out of scope while ptr is still being used somewhere, possibly after being returned to a calling routine! This is somewhat the opposite of a HeapPointer: what the pointer is pointing to is nonpersistent (and the pointer is possibly persistent) rather than the other way around. If the pointer is (initially) a copy of a HeapPointer, that heap block may be automatically freed even if the working pointer is returned to another routine! This would still take some care on the part of the programmer, and might not be possible to automatically detect.

Working pointers are still subject to dangling references, if the underlying heap block is freed. It might be possible to tie a set of working pointers to a HeapPointer, in effect saying, “these pointers are going to be working in an allocated heap block pointed to by this HeapPointer. If that block is freed, automatically NULL out all these working pointers.” That in turn could cause its own failures, but at least there would be less of a chance of invalid pointers continuing to be used. It should be OK for a working pointer to go out of scope (and itself disappear) while holding a non-NULL value, as it’s not responsible for a heap block.
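One way to sketch that tie in C is a small table of addresses of working pointers kept next to the block. TrackedHeap, tie_working() and tracked_free() are hypothetical, the table size is arbitrary, and working pointers are limited to char * here because C offers no portable way to store the address of an arbitrary T * as a void **.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_WORKING 8   /* arbitrary limit chosen for this sketch */

/* A heap block plus the working pointers that have been tied to it. */
typedef struct {
    void  *block;
    char **working[MAX_WORKING];   /* addresses of tied working pointers */
    size_t nworking;
} TrackedHeap;

/* Tie a working pointer to the block and start it at the block's base.
   Returns 0, or -1 if the table is full. */
static int tie_working(TrackedHeap *th, char **wp)
{
    if (th->nworking == MAX_WORKING)
        return -1;
    th->working[th->nworking++] = wp;
    *wp = th->block;
    return 0;
}

/* Free the block, then NULL every tied working pointer so none of
   them can be used afterward as a dangling reference. */
static void tracked_free(TrackedHeap *th)
{
    free(th->block);
    th->block = NULL;
    for (size_t i = 0; i < th->nworking; i++)
        *th->working[i] = NULL;
    th->nworking = 0;
}
```

This also shows one of the new failure modes the paragraph anticipates: a tied working pointer that goes out of scope before the block is freed leaves a stale address in the table, so in practice an untie operation would be needed too.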

Finally, there is the age-old problem with pointers of running off either end of the data they’re pointing to (such as an array). The only way to really keep this under control is to know in advance what the pointer is playing with and its size, and check the pointer against its current bounds. Such bounds checking has been available for a long time, but it usually requires programmer cooperation, and has to execute at run time (a large time penalty). A related problem is buffer overflow, where data can be written beyond the extent of the space intended to receive it, but it will not be covered here, as the tools to prevent it are almost always available.
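A minimal sketch of knowing "what the pointer is playing with and its size": a cursor that carries its own bounds and is checked on every access. CheckedIntPtr and its functions are hypothetical names; each access pays for a comparison, which is the run-time cost mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A "fat" pointer: the cursor travels with the bounds it must stay in. */
typedef struct {
    int   *base;   /* first element */
    size_t len;    /* number of elements */
    size_t pos;    /* current index; pos == len means "at the end" */
} CheckedIntPtr;

static CheckedIntPtr checked_make(int *base, size_t len)
{
    CheckedIntPtr cp = { base, len, 0 };
    return cp;
}

/* Read the current element; false (and no read) if out of bounds. */
static bool checked_get(const CheckedIntPtr *cp, int *out)
{
    if (cp->pos >= cp->len)
        return false;
    *out = cp->base[cp->pos];
    return true;
}

/* Write the current element; false (and no write) if out of bounds. */
static bool checked_set(CheckedIntPtr *cp, int value)
{
    if (cp->pos >= cp->len)
        return false;
    cp->base[cp->pos] = value;
    return true;
}

/* Move by delta elements, refusing any move that would leave [0, len]. */
static bool checked_advance(CheckedIntPtr *cp, ptrdiff_t delta)
{
    if (delta < 0) {
        if ((size_t)(-delta) > cp->pos)
            return false;                  /* would run off the front */
        cp->pos -= (size_t)(-delta);
    } else {
        if ((size_t)delta > cp->len - cp->pos)
            return false;                  /* would run off the back */
        cp->pos += (size_t)delta;
    }
    return true;
}
```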

A language with proper heap and pointer management built right in, without the programmer having to remember to clean up after themselves, might well be a better deal than using something like C. A “lint” to analyze pointer usage across routines and files (if it’s even possible) would be helpful, but providing tools and procedures for a careful programmer to follow would be better than nothing at all. Even providing pointers as objects, with limited operations and bound checking built into the methods, would be an improvement, as a language like C provides so much rope for you to hang yourself with. If you’re stuck with C (and no objects), about the best you can do is use some macros, preprocessors, limited check utilities, and a lot of care in writing code to properly use these facilities!


  • [1]  Much of this probably also applies to other languages which expose low-level pointer capabilities, such as C++.
  • [2]  All arguments in a C function call are passed by value (on the stack or in registers), including pointer addresses. The called function simply sees them more or less as local variables, and may modify them. However, C does not copy the modified values back to the caller! They are simply local variables, and disappear when the function returns. Therefore, to let a function change one of the caller’s variables (including a pointer variable), you must pass that variable’s address as the value, and declare the corresponding parameter as a pointer (a pointer to a pointer, for a pointer variable), so that the code ends up writing to the given address (presumably somewhere safely above), rather than just to the local variable/parameter.
 

All content © copyright 2005 – 2022 by Catskill Technology Services, LLC.
All rights reserved.
Note that Third Party software (whether Open Source or proprietary) on this site remains under the copyright and license of its owners. Catskill Technology Services, LLC does not claim copyright over such software.

 

This page is https://www.catskilltech.com/point-discipline-in-c.html

Last updated Mon, 18 Apr 2022 at 11:03 PM
