
Heap variables in C

  • 2 Replies
  • 2587 Views
*

Offline Phil

  • Global Moderator
  • Sr. Member
  • *****
  • 430
    • View Profile
Heap variables in C
« March 01, 2017, 10:00:05 AM »
One eternal problem when programming in the C language is properly handling pointers into the heap (dynamically allocated memory). It's very easy, if you're not absolutely anal about checking your code, to use unallocated pointers and to fail to free memory when done with it (or to free it too early). The use of three macros and/or library routines would help with this situation:

  • Heap variable (pointer) declarations: don't simply declare them like any other variable, but also force them to be initialized to NULL. The purpose is to make sure that a pointer used too early doesn't hold non-NULL garbage, which would fool a NULL check (or the runtime) into not flagging it.
  • Wrapper around malloc/calloc calls: first check if the pointer variable being written to is NULL. The purpose is to avoid accidentally overwriting a pointer already in use, resulting in the loss (memory leak) of whatever it was pointing to. Only NULL pointers should be written to. It may also be possible to automatically cast the returned pointer to the same type as in the declaration, but that is left as an exercise for the reader.
  • Freeing the pointer: do not call free() directly, but in some sort of wrapper, so that after the heap memory is freed, the pointer variable is set to NULL. This will permit the pointer to be reused and not flagged as "still in use" by the *alloc call.

The second and third items are pretty easy to implement as C routines, but the first is a macro that I never managed to get working in a really general manner. It would be nice to keep track of the pointer declaration (type), so that the *alloc call could automatically cast the pointer to the right type, but that may be beyond the regular C preprocessor. Perhaps a new preprocessor could be written that could handle this, and other improvements?
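
A minimal sketch of the three pieces described above, done as plain preprocessor macros. All the names (HEAP_DECL, HEAP_ALLOC, HEAP_FREE) are hypothetical, not from any library. It leans on two facts about C: a void * result converts to any object pointer type without a cast, and sizeof *(ptr) recovers the element size from the declaration, so the type does not have to be tracked separately. Whether this is general enough for the case the post has in mind is an open question.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macros; a sketch, not a finished library. */

/* 1. Declaration: a heap pointer always starts out NULL. */
#define HEAP_DECL(type, name) type *name = NULL

/* 2. Allocation: only a NULL pointer may be written to. Overwriting a
 *    pointer still in use (or running out of memory) is treated as fatal.
 *    Note that ptr is evaluated more than once, so it must be a plain
 *    variable without side effects. */
#define HEAP_ALLOC(ptr, count)                                          \
    do {                                                                \
        if ((ptr) != NULL) {                                            \
            fprintf(stderr, "%s: pointer already in use\n", #ptr);      \
            abort();                                                    \
        }                                                               \
        (ptr) = calloc((count), sizeof *(ptr)); /* no cast needed */    \
        if ((ptr) == NULL) {                                            \
            fprintf(stderr, "%s: out of memory\n", #ptr);               \
            abort();                                                    \
        }                                                               \
    } while (0)

/* 3. Free: release the block, then reset the pointer so it can be reused. */
#define HEAP_FREE(ptr)                                                  \
    do {                                                                \
        free(ptr);                                                      \
        (ptr) = NULL;                                                   \
    } while (0)

/* Usage: allocate, use, free, then reuse the same variable. */
static int demo(void)
{
    HEAP_DECL(int, nums);
    HEAP_ALLOC(nums, 4);        /* four zeroed ints */
    nums[3] = 7;
    int total = nums[0] + nums[3];
    HEAP_FREE(nums);
    assert(nums == NULL);       /* reset by HEAP_FREE */
    HEAP_ALLOC(nums, 2);        /* legal again, since nums is NULL */
    HEAP_FREE(nums);
    return total;
}
```

A second HEAP_ALLOC(nums, ...) without an intervening HEAP_FREE would abort instead of silently leaking the first block.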

Update: Expanded into an article on pointer usage. This topic is the discussion for that article.
« Last Edit: March 11, 2017, 05:37:52 PM by Phil »

*

Offline Phil

Re: Heap variables in C
« Reply #1: May 08, 2017, 11:14:25 AM »
A preprocessor might call a check routine at the end of scope for a declared pointer value, and issue a run time warning if a non-NULL pointer is about to go out of scope (i.e., become a memory leak). This could also be done manually, but would depend upon the vigilance of the programmer (remembering to test all pointers). Heap pointers should be NULL at exit, having either been freed, or copied back up to the caller (preserved) and set to NULL (not freed, though) in this routine (to avoid tripping the leak test). If the heap pointer itself is a parameter, it may need a flag to tell the preprocessor not to worry about the block ending with a non-NULL value. For other pointers, it may or may not matter that they are going out of scope with values.

The safest implementation would be to always make such a check, but it would not be unreasonable (for performance reasons) to only do this during testing and debugging, and hope that all problems have been caught by this time!
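
The manual version of that check could look something like the sketch below. The macro name HEAP_SCOPE_CHECK, the HEAP_CHECKS switch, and the warning counter are all assumptions made for illustration. With HEAP_CHECKS set to 0 the check compiles away to nothing, which is the "testing and debugging only" option.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch: define as 0 to compile the checks out. */
#ifndef HEAP_CHECKS
#define HEAP_CHECKS 1
#endif

/* Number of warnings issued so far, kept so a test can inspect it. */
static int heap_leak_warnings = 0;

#if HEAP_CHECKS
#define HEAP_SCOPE_CHECK(ptr)                                               \
    do {                                                                    \
        if ((ptr) != NULL) {                                                \
            heap_leak_warnings++;                                           \
            fprintf(stderr, "%s:%d: warning: '%s' leaves scope non-NULL\n", \
                    __FILE__, __LINE__, #ptr);                              \
        }                                                                   \
    } while (0)
#else
#define HEAP_SCOPE_CHECK(ptr) ((void)0)
#endif

/* Correct: the block is freed and the pointer reset before the check. */
static void tidy(void)
{
    char *buf = malloc(16);
    free(buf);
    buf = NULL;
    HEAP_SCOPE_CHECK(buf);
}

/* Ownership goes to the caller: copy up, then NULL the local (no free). */
static void hand_off(char **out)
{
    assert(out != NULL);
    char *buf = malloc(16);
    *out = buf;
    buf = NULL;
    HEAP_SCOPE_CHECK(buf);
}

/* Forgetful: buf is still set when the check runs, so a warning is issued.
 * The block is freed afterwards only to keep this demo itself leak-free. */
static void forgetful(void)
{
    char *buf = malloc(16);
    HEAP_SCOPE_CHECK(buf);
    free(buf);
}
```

The weak spot is the one the post already names: the programmer has to remember to put the check before every exit from the block.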

*

Offline Phil

Re: Heap variables in C
« Reply #2: October 05, 2018, 10:11:31 PM »
Here is a proposal for safe pointer usage that could involve macros, subroutines, language extensions, and/or "lint"-like checking. Naturally, the less that is done at run time, the less speed penalty there is for checking, and the less that needs programmer cooperation to put in and use special routines, the more likely it is that the right things will be done.

The first thing is to separate out the heap pointers (allocated blocks) from other pointer usage. These will be special pointers, in the sense that they are very restricted in their usage, to avoid getting into trouble. It might have its own name, such as HeapPointer. A heap pointer should only be used for one thing: keeping track of an allocated heap block. If it is a macro, it would translate to a type of void * — no other types are allowed (this restriction might be relaxed, if the macro or external preprocessing permits passing in an arbitrary type). When declared, it must always be initialized to NULL. Only two operations are permitted:
  • malloc (also calloc, realloc) via a subroutine or macro, which checks that the current value is NULL, and if it is, runs the normal allocation. A heap pointer currently pointing to something (non-NULL) is a fatal error because this would leave a memory leak (the current block is lost and is floating around unused and unrecoverable). A realloc call would not have this restriction. It is assumed that the old block is destroyed after its contents are copied to the new block, or the old block is extended. (If realloc fails, it returns NULL and leaves the old block in place, so the wrapper should only update the heap pointer on success.)
  • free: if non-NULL the normal free is done, and the heap pointer is then set to NULL. It then may be reused, if desired. It may be good to make freeing a NULL HeapPointer a warning or error of some sort, as it may indicate a logic problem in the program (the standard free() silently accepts NULL). Ensuring that a HeapPointer is set to NULL when freed prevents dangling references (a pointer that claims to still point to something, when that memory block is now gone) and enables checking that a HeapPointer about to be allocated isn't already pointing to something (memory leak about to happen).
A HeapPointer would be ideal for an object, if a language permits it, as the limited suite of operations can be implemented as methods, and the value hidden (a "get" or "copy" method would be needed so that the value could be copied to a working pointer). A HeapPointer may never be assigned to, outside of an allocation routine/macro, or a free. It should not be tampered with because of the risk of losing contact with the block of allocated memory on the heap.
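
Plain C has no objects, but the restricted set of operations can be approximated with a small struct and a handful of functions. The sketch below is one possible shape, with hypothetical names (HeapPointer, hp_alloc, hp_realloc, hp_free, hp_get). C cannot actually hide the field, so "never assigned to directly" remains a convention here. The sketch returns error codes where the proposal calls for a fatal error or a warning; a caller could just as well abort on a nonzero result.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical handle type: the only field is the owned block. */
typedef struct {
    void *block;
} HeapPointer;

#define HEAP_POINTER_INIT { NULL }   /* declarations always start NULL */

/* Allocate only into an empty handle. Returns 0 on success, -1 if the
 * handle is already in use (the would-be leak), -2 if malloc fails. */
static int hp_alloc(HeapPointer *hp, size_t size)
{
    assert(hp != NULL);
    if (hp->block != NULL)
        return -1;
    hp->block = malloc(size);
    return hp->block != NULL ? 0 : -2;
}

/* Resize: allowed on a handle that is in use. realloc leaves the old
 * block intact on failure, so the handle is updated only on success.
 * A size of zero is rejected, since realloc's behaviour there varies. */
static int hp_realloc(HeapPointer *hp, size_t size)
{
    assert(hp != NULL);
    if (size == 0)
        return -2;
    void *p = realloc(hp->block, size);
    if (p == NULL)
        return -2;
    hp->block = p;
    return 0;
}

/* Free and reset. Returns 0 normally, 1 if the handle was already empty,
 * which the caller may treat as a sign of a logic problem. */
static int hp_free(HeapPointer *hp)
{
    assert(hp != NULL);
    if (hp->block == NULL)
        return 1;
    free(hp->block);
    hp->block = NULL;
    return 0;
}

/* "Get": copy the value out to an ordinary working pointer. */
static void *hp_get(const HeapPointer *hp)
{
    assert(hp != NULL);
    return hp->block;
}
```

Note that any working pointer obtained from hp_get before an hp_realloc has to be fetched again afterwards, since the block may have moved.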

In addition to allocation and freeing, a HeapPointer gets special treatment upon going out of scope (end of block, exit, return, long jump, etc.). At that time, if it has a non-NULL value, it must be freed unless the programmer has somehow informed the checking system that the pointer has been transferred to a parameter, and is being returned to a higher level routine. Perhaps this could be done with a HeapPointerCopy routine to copy a HeapPointer to a parameter. Freeing memory blocks (unless the pointer to a block is being transferred to another routine) at out-of-scope should prevent more memory leaks. Ideally, the parameter at a higher (or lower??) level routine should somehow be tagged as a HeapPointer, so the same rules about allocating and freeing can be applied. Unfortunately, this could really complicate things, unless a full analysis of data flow across multiple routines (and possibly multiple files) is done to check that HeapPointers are handled with absolute safety (a "lint"-like program).
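
Standard C has no hook for "variable is going out of scope", but GCC and Clang offer one as an extension, the cleanup variable attribute. The sketch below uses it to approximate the behaviour described above. This is an assumption about the toolchain, not portable C, and the hook does not run on longjmp or exit(), so those cases are not covered. HEAP_AUTO, heap_auto_release, and heap_pointer_copy are hypothetical names; the last stands in for the HeapPointerCopy routine suggested in the post.

```c
#include <assert.h>
#include <stdlib.h>

/* Count of automatic frees, kept only so the behaviour can be observed. */
static int auto_frees = 0;

/* Run by the compiler when a HEAP_AUTO variable leaves scope; it is
 * handed the address of that variable. */
static void heap_auto_release(void **pp)
{
    if (*pp != NULL) {
        free(*pp);
        *pp = NULL;
        auto_frees++;
    }
}

/* GCC/Clang extension, not standard C. */
#define HEAP_AUTO __attribute__((cleanup(heap_auto_release)))

/* Transfer ownership upward: copy the value to the caller's pointer and
 * NULL the local, so the scope-exit hook finds nothing left to free. */
static void heap_pointer_copy(void **dest, void **src)
{
    assert(*dest == NULL);      /* same rule as allocation: never overwrite */
    *dest = *src;
    *src = NULL;
}

/* Both return paths release the scratch block automatically. */
static int uses_scratch(int fail_early)
{
    HEAP_AUTO void *scratch = malloc(32);
    if (scratch == NULL || fail_early)
        return -1;
    return 0;
}

/* Builds a block and hands it to the caller through an out-parameter. */
static int make_block(void **out)
{
    HEAP_AUTO void *block = malloc(32);
    if (block == NULL)
        return -1;
    heap_pointer_copy(out, &block);
    return 0;                   /* block is NULL here, so nothing is freed */
}
```

The receiving pointer in the caller is an ordinary void * in this sketch. Tagging it as a heap pointer too, as the post suggests, would mean declaring it HEAP_AUTO at that level as well.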

Once heap pointers have been dealt with, what's left are ordinary working pointers. Other than not being permitted to be allocated or freed (no heap blocks), it's difficult to analyze and restrict their behavior. Simply assigning the address of a variable (e.g., ptr = &someVar;) gets you into trouble if someVar goes out of scope while ptr is still being used somewhere, possibly after being returned to a calling routine! This is somewhat the opposite of a HeapPointer: what the pointer is pointing to is nonpersistent (and the pointer is possibly persistent) rather than the other way around. If the pointer is (initially) a copy of a HeapPointer, that heap block may be automatically freed even if the working pointer is returned to another routine! This would still take some care on the part of the programmer, and might not be possible to automatically detect.

Working pointers are still subject to dangling references, if the underlying heap block is freed. It might be possible to tie a set of working pointers to a HeapPointer, in effect saying, "these pointers are going to be working in an allocated heap block pointed to by this HeapPointer. If that block is freed, automatically NULL out all these working pointers." That in turn could cause its own failures, but at least there would be less of a chance of invalid pointers continuing to be used. It should be OK for a working pointer to go out of scope (and itself disappear) while holding a non-NULL value, as it's not responsible for a heap block.
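
One way to tie working pointers to a heap block is for the handle to keep a list of the addresses of the working pointers that refer into it. The sketch below does that for a byte buffer. TrackedHeap, th_alloc, th_attach, th_free, and the MAX_WORKERS limit are all hypothetical. It assumes every registered working pointer variable is still alive when the block is freed, which the sketch does not check.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_WORKERS 4   /* arbitrary limit chosen for the sketch */

/* Hypothetical handle: the owned block, its size, and the addresses of
 * the working pointers registered against it. */
typedef struct {
    char  *block;
    size_t size;
    char **workers[MAX_WORKERS];
    size_t n_workers;
} TrackedHeap;

/* Allocate a zeroed block into an empty handle. Returns 0 on success,
 * -1 if the handle is already in use, -2 if the allocation fails. */
static int th_alloc(TrackedHeap *th, size_t size)
{
    if (th->block != NULL)
        return -1;
    th->block = calloc(size, 1);
    if (th->block == NULL)
        return -2;
    th->size = size;
    return 0;
}

/* Point *worker at the given offset inside the block and remember where
 * the working pointer lives. Returns 0 on success, -1 otherwise. */
static int th_attach(TrackedHeap *th, char **worker, size_t offset)
{
    assert(worker != NULL);
    if (th->block == NULL || offset >= th->size ||
        th->n_workers == MAX_WORKERS)
        return -1;
    *worker = th->block + offset;
    th->workers[th->n_workers++] = worker;
    return 0;
}

/* Free the block and NULL out every registered working pointer. */
static void th_free(TrackedHeap *th)
{
    for (size_t i = 0; i < th->n_workers; i++)
        *th->workers[i] = NULL;
    th->n_workers = 0;
    free(th->block);
    th->block = NULL;
    th->size = 0;
}
```

Copies of a working pointer made by plain assignment are not registered, so they can still dangle. That is the "less of a chance" rather than "no chance" the paragraph above describes.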

Finally, there is the age-old problem with pointers of running off either end of the data they're pointing to (such as an array). The only way to really keep this under control is to know in advance what the pointer is playing with and its size, and check the pointer against its current bounds. This bounds checking has been available for a long time, but usually requires programmer cooperation, and has to execute at run time (a large time penalty). A related problem is buffer overflow, where data can be written beyond the extent of the space intended to receive it, but it will not be covered here, as the tools to prevent it are almost always available.
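
As an illustration of what "know what the pointer is playing with and its size" means in practice, here is a sketch of a working pointer that carries its bounds with it and refuses to step outside them. BoundedIntPtr and the bp_ functions are hypothetical names, and the sketch is limited to int arrays to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat" working pointer: the bounds travel with it. */
typedef struct {
    int   *base;    /* first element of the array */
    size_t len;     /* number of elements */
    size_t pos;     /* current index */
} BoundedIntPtr;

static BoundedIntPtr bp_make(int *base, size_t len)
{
    assert(base != NULL || len == 0);
    BoundedIntPtr bp = { base, len, 0 };
    return bp;
}

/* Step forward one element. Returns 0 on success, or -1 (leaving the
 * position unchanged) if that would run off the end. */
static int bp_next(BoundedIntPtr *bp)
{
    if (bp->pos + 1 >= bp->len)
        return -1;
    bp->pos++;
    return 0;
}

/* Step back one element. Returns -1 if already at the start. */
static int bp_prev(BoundedIntPtr *bp)
{
    if (bp->pos == 0)
        return -1;
    bp->pos--;
    return 0;
}

/* Address of the current element, or NULL if the array is empty. */
static int *bp_deref(const BoundedIntPtr *bp)
{
    return bp->pos < bp->len ? &bp->base[bp->pos] : NULL;
}
```

Every step costs a comparison at run time, which is the time penalty mentioned above.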

A language with proper heap and pointer management built right in, without the programmer having to remember to clean up after themselves, might well be a better deal than using something like C. A "lint" to analyze pointer usage across routines and files (if it's even possible) would be helpful, but providing tools and procedures for a careful programmer to follow would be better than nothing at all. Even providing pointers as objects, with limited operations and bounds checking built into the methods, would be an improvement, as a language like C provides so much rope for you to hang yourself with. If you're stuck with C (and no objects), about the best you can do is use some macros, preprocessors, limited check utilities, and a lot of care in writing code to properly use these facilities!