Pointer Discipline in C

No, this has nothing to do with BDSM, whips with pointy tips, and such. If you came here expecting such a discussion, sorry, but this is a lot more boring! The C language is happy to provide a million ways to hang yourself, and plenty of rope to do it with[1]. The price of high performance is that there is very little run-time checking, and C assumes that you are a grownup who knows what you are doing. Unfortunately, most programmers don't quite know what they're doing, especially when it comes to manipulating pointers. This leads to many, often subtle, run-time problems that could be avoided had proper care been taken.

This is not to say that most programmers are incompetent; it's just that they have never been taught the fine points of pointer usage. This article will attempt to deal with that problem with the discussion of a number of trouble spots.

Some of these problems can be avoided by the use of macros and/or function wrappers to protect the programmer, some may require a new preprocessor step, and some may just be too restrictive to warrant anything beyond a checklist of things to watch out for. People work in C because they want blazing speed, and are leery of adding overhead for the purpose of protecting themselves. Fair enough, but one way or the other, you need to be aware of these things and to avoid falling into various traps.

What Kinds of Problems?

Misuse of pointers falls into three main categories:

  1. Leaving dangling references when the target of the pointer has gone away in one way or another (e.g., you still have a pointer to a heap variable even after you've freed the allocation). The danger here is that you use that pointer reference again without realizing that it no longer points to what you thought it did, and end up reading or writing memory that you didn't intend to. As the pointer address (what it's pointing to) likely belongs to your process or program, run-time support is unlikely to flag it.
  2. Causing memory leaks by destroying or changing a pointer too soon, while it's still pointing to something (especially something on the heap). Essentially, you lose contact with your targeted memory. At that point, with no-one responsible for it, that memory just sits there used and unavailable. The heap manager isn't aware that this memory is no longer needed, and can't recover it (there is no garbage collection in C). Usually, this memory leak is finally recovered when the program or process ends, but in the meantime, multiple memory leaks can put quite a bit of heap space (and virtual memory) out of action.
  3. Running off the end of a string or array, and reading or writing where you shouldn't. This is related to the buffer overflow problem, where user input is not length-constrained and can overwrite memory well beyond the allocated buffer (whether on the heap or in an ordinary variable). This is a favorite exploit of hackers, as they can place executable code in their input and arrange for it to be executed (classically by overwriting a function's return address on the stack, so that control jumps into the injected data).

Heap Allocation

The most common cause of dangling references and memory leaks is improper handling of pointers in heap allocation. There are three safeguards you can either implement in macros/functions, or just remember to check off on your list:

  1. Variables declared as pointers should always be initialized to some reasonable initial value, no matter how they will be used. If they're going to be set to point to some object, you can do that in the declaration, if you wish. Otherwise, all pointers should be initialized to NULL. On most platforms, dereferencing a NULL pointer is trapped by run-time support (the C standard itself only calls it undefined behavior), so any use of a pointer that has not been set to point to anything gets flagged as an error. If you don't initialize a pointer, and it has some non-zero garbage in it, run-time may or may not pick it up (Segmentation Violation or Bus Error). If you're unlucky, it points to somewhere in your own data (read/write memory) and won't be caught, resulting in a difficult-to-debug error. So, to summarize, all pointers should have either a NULL value, or point to a real target, at all times.
  2. As a corollary to the previous rule, all pointer variables no longer in use should be reset to NULL. This must be done immediately after a free() call, or anything else that removes the targeted memory from use. The pointer then is visible to run-time support as invalid, ready to be reused.
  3. Before allocating a block of the heap (malloc(), calloc(), etc.), check that the pointer variable that will receive the address is NULL. This indicates that it is currently unused, and therefore safe to overwrite. If you should overwrite a pointer variable that has a non-NULL value, especially if you're following the first two rules, there's a good chance that you just orphaned a heap block and created a memory leak. So, by only allowing NULL pointer variables to be written to, you reduce the chances of losing heap allocations.

Here is a bit of code illustrating macros or function wrappers (the choice is yours) to handle heap pointers:

    HEAP int *ptrA
      ⋮
    MALLOC(ptrA, int *, int, 1024)  /* ptrA will be 1024 integers */
      ⋮
    /* READ ptrA (never write) for various things */
      ⋮
    /* done with ptrA */
    FREE(ptrA)

And here is how it might expand (macros) or be hand-written by a careful coder:

    /* heap pointer */
    int *ptrA = NULL;
      ⋮
    if (ptrA != NULL) fatal_error("memleak", __FILE__, __LINE__);
    ptrA = (int *)malloc(1024 * sizeof(int));  /* ptrA will be 1024 integers */
    if (ptrA == NULL) fatal_error("mallocfail", __FILE__, __LINE__);
      ⋮
    /* READ ptrA (never write) for various things */
      ⋮
    /* done with ptrA */
    free(ptrA);
    ptrA = NULL;

Notice that ptrA is initialized to NULL, so that if it is accidentally used before being assigned, the run-time will choke on it. Before being assigned by malloc(), it is confirmed to still be NULL (preventing a memory leak). The macro or wrapper checks that the malloc() was successful. When we're done with this heap memory use, not only is the memory freed, but ptrA is reset to NULL, so you can't accidentally keep using it to point to that now-gone allocation (preventing a dangling reference). Also, being NULL, it is eligible to be assigned again by another allocation.
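For the macro route, the definitions might look something like the following. This is only a sketch under a few assumptions: standard C has no way to make a bare HEAP prefix add the "= NULL" initializer, so a hypothetical HEAP_PTR(type, name) macro stands in for it; the type names are passed to MALLOC without quotes; and fatal_error() is an assumed helper that prints a message and exits.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed helper: report the problem and stop the program. */
static void fatal_error(const char *what, const char *file, int line)
{
    fprintf(stderr, "%s at %s:%d\n", what, file, line);
    exit(EXIT_FAILURE);
}

/* Hypothetical stand-in for HEAP: declare a heap pointer, initialized to NULL. */
#define HEAP_PTR(type, name) type *name = NULL

/* Allocate count elements of type; refuse to overwrite a live pointer. */
#define MALLOC(ptr, ptrtype, type, count)                                  \
    do {                                                                   \
        if ((ptr) != NULL) fatal_error("memleak", __FILE__, __LINE__);     \
        (ptr) = (ptrtype)malloc((size_t)(count) * sizeof(type));           \
        if ((ptr) == NULL) fatal_error("mallocfail", __FILE__, __LINE__);  \
    } while (0)

/* Free the block and reset the pointer so it cannot dangle. */
#define FREE(ptr)      \
    do {               \
        free(ptr);     \
        (ptr) = NULL;  \
    } while (0)

/* Usage: returns 1 if the pointer is NULL again after FREE. */
static int heap_macro_demo(void)
{
    HEAP_PTR(int, ptrA);
    MALLOC(ptrA, int *, int, 1024);  /* ptrA is now 1024 integers */
    ptrA[0] = 42;                    /* use the block; ptrA itself is only read */
    FREE(ptrA);
    assert(ptrA == NULL);            /* FREE left no dangling pointer behind */
    return ptrA == NULL;
}
```

The do { ... } while (0) wrapper is just the usual trick to make a multi-statement macro behave like a single statement after an if. Function wrappers would do the same job, but they would need the address of the pointer passed in so that they can check and reset it.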

Pointers in Deeper Blocks

Unfortunately, it is not uncommon to see code where a pointer has been assigned in a deeper block (either a function call, or a nested { } code block) and not handled properly. This can cause two problems:

  1. The pointer is to an automatic variable, living on the stack, and that pointer's target goes out of scope and gets overwritten when control returns from that block! The pointer works perfectly well while control is still within that block, but as soon as you leave the block, that stack-based variable may be overwritten. This error can be quite insidious, as the target may not be overwritten immediately, and run-time won't see anything wrong with a pointer within your stack (unless possibly it checks for a pointer outside the active stack). It's quite easy to accidentally do this, forgetting that the pointer's target's lifetime is not eternal. Basically, this is a dangling reference.
  2. The pointer is to an allocation on the heap, and you neglect to pass that pointer back to a less-deeply nested block before it goes out of scope. That heap allocation has now been orphaned, and is a memory leak. Don't forget, when setting up a function's parameters to return the pointer, that you have to pass the address of the pointer to the function, and not the pointer value itself[2]. A related mistake is allocating space on the heap for use only within the block, and then neglecting to free() it before going out of scope.

The two cases above are mirror images of each other: you either are successfully returning a pointer to something that no longer exists, or you are allocating something on the heap, and failing to return that pointer before it goes out of scope. It's very easy to have a pointer to a heap allocation, and forget to free the allocation when returning from a block (we get used to having run-time take care of such things with automatic variables).
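Both cases can be sketched in a few lines. The function names here (make_ints, out_param_demo) are made up for illustration. The wrong version is left in a comment, and the working version hands the allocation back through the address of the caller's pointer.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * WRONG (kept in a comment so the example compiles cleanly):
 *
 *     int *bad(void) { int local = 5; return &local; }
 *
 * local lives on the stack and is gone once bad() returns,
 * so the caller receives a dangling reference.
 */

/* Hand a heap allocation back through an out-parameter.
 * Returns 0 on success, -1 if *out is still in use or allocation fails. */
static int make_ints(int **out, size_t count)
{
    assert(out != NULL);
    if (*out != NULL)
        return -1;            /* would orphan whatever *out points to */
    int *block = calloc(count, sizeof *block);
    if (block == NULL)
        return -1;
    *out = block;             /* written to the caller's variable, not a local copy */
    return 0;
}

/* Usage: allocate in the callee, use and free in the caller. */
static int out_param_demo(void)
{
    int *data = NULL;
    if (make_ints(&data, 8) != 0)
        return -1;
    data[7] = 99;
    int last = data[7];
    free(data);               /* the caller owns the block, so the caller frees it */
    data = NULL;
    return last;
}
```

Returning the pointer as the function's return value works just as well. The out-parameter form is shown because it is the one that trips people up.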

Overwriting Heap Allocation Pointers

Say you malloc() a block on the heap, and assign it to ptrA. While in the process of writing or reading that chunk of memory (e.g., by ptrA++ while traversing the array), you alter ptrA's value. Unless you have saved it, and restore it when done, ptrA is no longer valid as a heap pointer! At best, it will point to somewhere within that allocated block, and can't be used to free() the block later (handing free() anything other than the original address is undefined behavior).

It is better to declare a scratch or working pointer to be modified in use, and be careful to preserve the original pointer as a read-only value. Unfortunately, you can't declare ptrA to be a const, as you still need to write to it several times after it's declared and initialized. This one will probably just require extra care in coding, rather than being able to rely on the compiler or some sort of lint to discover that you've written to this variable where and when you shouldn't have. Perhaps a new preprocessor could detect writes outside of HEAP, MALLOC/CALLOC, and FREE macros (e.g., ptrA =), but that's a lot of work and might not catch everything.
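Here is a small sketch of the scratch-pointer habit. It also shows one partial workaround that is an assumption of this example rather than part of the macros above: inside a helper function, the parameter can be const-qualified, so at least that function can't move the pointer without the compiler complaining. The names sum_ints and scratch_demo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* base is const-qualified twice: the data is read-only here, and the
 * parameter itself cannot be reassigned, so base++ is a compile error. */
static long sum_ints(const int *const base, size_t count)
{
    assert(base != NULL);
    long total = 0;
    /* p is the scratch pointer; base stays put. */
    for (const int *p = base; p < base + count; p++)
        total += *p;
    return total;
}

/* Usage: ptrA is never modified between malloc() and free(). */
static long scratch_demo(void)
{
    int *ptrA = malloc(4 * sizeof *ptrA);
    if (ptrA == NULL)
        return -1;
    for (size_t i = 0; i < 4; i++)
        ptrA[i] = (int)i + 1;        /* indexing leaves ptrA itself alone */
    long total = sum_ints(ptrA, 4);
    free(ptrA);                      /* still the address malloc() returned */
    ptrA = NULL;
    return total;
}
```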

Pointer Aliases

The previous section discussed the use of alias pointers in lieu of modifying the primary pointer to a heap allocation. This also applies to pointer aliases to things like strings and arrays (not just heap allocations): you want to be careful not to destroy the only pointer you have to the string or array. Sometimes it's easy enough to re-create the pointer, but if you're in another function or some deeper block (where the original item is not currently accessible), it may not be so easy. In that case, you would want to create another pointer that you don't mind incrementing or decrementing, as you can always get back to the start with the original pointer.

Let's say you have your primary pointer to something, and you've created one or more alias pointers for working purposes (i.e., they will likely be changed during usage, such as walking through a string or array). What are the rules for these alias pointers?

  1. Like all other pointers, they should be initialized to NULL or to some real value, to prevent random garbage in them being interpreted as a valid pointer address.
  2. If they are an alias for a heap pointer, when the heap pointer is freed (and set to NULL), all aliases (modified or not) for that heap pointer should also be set to NULL (but not freed — that's already been done!).
  3. Aliases for pointers other than heap pointers should follow the lead of their primaries when it comes to resetting them to NULL or some other value when they are no longer of use. If control is returning or falling off the end of a block, a local pointer alias that is going out of scope doesn't need to be explicitly NULLed. Pointer aliases inherited from a higher-level block should be NULLed to indicate that they are free to be reassigned.
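A minimal sketch of the first two rules, using a made-up function (count_char) that walks a heap copy of a string with an alias pointer:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Count occurrences of ch in a heap copy of text.
 * Returns the count, or -1 if the allocation fails. */
static int count_char(const char *text, char ch)
{
    assert(text != NULL);

    char *copy = NULL;   /* primary heap pointer */
    char *walk = NULL;   /* alias: rule 1, starts out NULL */
    int count = 0;

    size_t len = strlen(text) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, len);

    /* Only the alias moves; copy keeps the address malloc() returned. */
    for (walk = copy; *walk != '\0'; walk++)
        if (*walk == ch)
            count++;

    free(copy);          /* free through the primary only */
    copy = NULL;
    walk = NULL;         /* rule 2: the alias is reset, never freed */
    return count;
}
```

Since walk is local and about to go out of scope, resetting it is belt-and-suspenders here. It matters much more when the alias lives on in an enclosing block.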

A new preprocessor might have an ALIAS macro and a POINTER macro to automatically initialize to NULL, and possibly to detect certain situations and insert code to check for them.

Buffer Overflows and the Like

Our final topic (for now) is using a pointer to traverse something (a variable or a heap allocation) and accidentally running off the end of it. With a smart enough preprocessor, it might be possible to insert code to check where you are within the block of memory, and stop things if you've gone too far and are in foreign territory. However, in most cases, the (possibly substantial) overhead may be deemed not worth it, and you have to rely on careful coding to avoid such problems.
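Careful coding here mostly means carrying the size of the target along with the pointer and checking against it. As a rough sketch (bounded_copy is a hypothetical helper, not a standard library function), a string copy that can't write past the end of its destination might look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Copy src into dst without ever writing past dst[dst_size - 1].
 * Returns 0 if everything fit, 1 if the text was truncated,
 * -1 if dst has no room at all. */
static int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return -1;

    char *out = dst;                               /* scratch pointer */
    const char *const limit = dst + dst_size - 1;  /* slot reserved for '\0' */

    while (*src != '\0' && out < limit)
        *out++ = *src++;
    *out = '\0';                                   /* always terminated */

    return (*src != '\0') ? 1 : 0;
}
```

The cost is one extra comparison per character, which gives a feel for the kind of overhead being traded for safety.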


Some of the things discussed here could be implemented with macros and/or function wrappers, some might be handled with a new preprocessor pass, and still others just require careful attention to detail. You could simply be aware of all these pitfalls, and print yourself up a checklist of things to look for at some point in the desk check or formal code review.



  • [1]  Much of this probably also applies to other languages which expose low-level pointer capabilities, such as C++.
  • [2]  All items in an argument list are passed as values on the stack, including pointer addresses. The called function simply sees them more or less as local variables on its stack, and may modify them (write back to its parameter list). However, C does not copy back the values to the caller! They are simply local variables, and disappear when the function returns. Therefore, it is necessary to pass addresses as values, and declare them on the function's end to be pointers, so that the code ends up writing to the given address (presumably somewhere safely above), rather than just to the local variable/parameter.
 

All content © copyright 2005 – 2018 by Catskill Technology Services, LLC.
All rights reserved.
Note that Third Party software (whether Open Source or proprietary) on this site remains under the copyright and license of its owners. Catskill Technology Services, LLC does not claim copyright over such software.

 

This page is https://www.catskilltech.com/Discussions/name/pointerdisp/dir/PointerDisp/discussion_full


Last updated Sat, 30 Dec 2017 at 2:43 PM
