Null-terminated strings in C

Phil
« October 17, 2019, 01:05:45 PM »
In 1972, Dennis Ritchie defined a string in the C language as an array of bytes (characters) that ends at the first byte with the value 0 (the null byte). Multibyte character encodings may (e.g., UTF-8) or may not (e.g., UTF-16) take care to avoid 0 bytes that would end the string prematurely. Arbitrary binary data that may contain a 0 byte must be kept out of strings, at least if you are going to use the standard string functions, which stop at the first null byte. A string (including its null terminator) does not have to fill the allocated array completely, but it certainly cannot be any longer.
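A small sketch of what that definition means in practice, using made-up array names: strlen() counts only the bytes before the first 0 byte, while sizeof reports the whole array. The UTF-16 and binary examples show a 0 byte ending the string early.

```c
#include <string.h>

/* 16-byte array holding the 2-byte string "hi"; a short initializer
   zero-fills the remaining 14 bytes. */
static char greeting[16] = "hi";

/* UTF-8 encodes U+00E9 (e-acute) as two nonzero bytes, so strlen() sees
   2 bytes (not 1 character) and nothing ends the string early. */
static const char utf8_e_acute[] = "\xC3\xA9";

/* "AB" in UTF-16LE is 41 00 42 00: the 0 byte after 'A' ends the C string. */
static const char utf16le_AB[] = { 'A', '\0', 'B', '\0', '\0', '\0' };

/* Binary data with an embedded 0 byte: str* functions never see the 'c'. */
static const char blob[] = { 'a', 'b', '\0', 'c' };
```

Here strlen(greeting) is 2, strlen(utf8_e_acute) is 2, strlen(utf16le_AB) is 1, and strlen(blob) is 2 even though blob occupies 4 bytes.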

This brings up the problem of trying to put more data into a string than was allocated to the array in the first place. Particularly insidious is forgetting to account for the null terminator when sizing the array, and ending up writing the null byte one position beyond the end of the array, overwriting whatever happens to be stored there. A compiler may notice this when it initializes a string, but don't count on it! You may see something like
  #define STRINGLEN  100  /* 100 bytes of string content */
    ...
  char MyString[STRINGLEN+1];  /* space for terminator */
    ...
  strncpy(MyString, SrcString, STRINGLEN);  /* copies at most STRINGLEN bytes; adds no \0 if SrcString is that long */
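On the point about compilers: C (unlike C++) accepts a string initializer that exactly fills the array and quietly drops the terminator, often without any warning at default settings. The array names below are made up for illustration.

```c
#include <string.h>

/* Legal C (but an error in C++): "hello" needs 6 bytes, the array has 5,
   so the compiler silently drops the terminating '\0'. */
static const char tight[5] = "hello";

/* Sized with room for the terminator, as in the STRINGLEN+1 pattern. */
static const char roomy[5 + 1] = "hello";
```

Passing tight to strlen() or any other string function would read past the end of the array, because it contains no 0 byte at all.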

An even worse problem is using string functions (such as byte copies or string concatenations) that never check whether the array is long enough to hold the desired string (again, including that null terminator). There was really no excuse for defining string functions that don't know the array length, but that's what was done. You're taking a long walk off a short pier! The all-too-common result is the bytes of a string being written past the end of the character array, overwriting other data, such as return addresses and function pointers that determine which code runs next. This has been exploited many times in buffer overflow attacks.
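For contrast with strcpy(), here is a minimal sketch of a copy function that is told the destination size and refuses to overrun it. The name copy_checked() and its refuse-rather-than-truncate policy are assumptions for illustration, not a standard library function.

```c
#include <string.h>

/* Hypothetical helper: unlike strcpy(), it is told the size of the
   destination array (terminator included) and refuses, rather than
   overruns, when src does not fit. Returns 0 on success, -1 if too long. */
static int copy_checked(char *dst, size_t dstsize, const char *src)
{
    size_t need = strlen(src) + 1;   /* string bytes plus '\0' */
    if (need > dstsize)
        return -1;                   /* dst is left untouched */
    memcpy(dst, src, need);          /* copies the terminator too */
    return 0;
}
```

Refusing leaves the caller to decide what to do with an oversized input; truncating instead (as strncpy() plus a terminator does) always produces a string, but may silently lose data.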

In the above code snippet, note the use of STRINGLEN in string calls to try to avoid overflow problems. It's better than nothing… but it can still leave you with an unterminated string (no 0 byte) if SrcString has STRINGLEN or more bytes of data (before its terminating null byte), because strncpy() copies the terminator only if it reaches the end of SrcString before the STRINGLEN limit!
  MyString[STRINGLEN] = '\0';
might be added after the strncpy() to take care of that problem (or written once beforehand, since a strncpy() limited to STRINGLEN bytes never touches that last position). Note that the index is STRINGLEN, not STRINGLEN+1, as the latter would be beyond the end of the array! Even with this fix, a SrcString longer than STRINGLEN bytes is silently truncated to fit.
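Put together, the snippet and the terminator fix might look like the sketch below. STRINGLEN is set to 8 (instead of 100) only so the truncation is easy to see, and set_my_string() is a hypothetical wrapper name.

```c
#include <string.h>

#define STRINGLEN 8                    /* small value so truncation is easy to see */

static char MyString[STRINGLEN + 1];   /* space for terminator */

/* Copy at most STRINGLEN bytes, then plant the terminator that strncpy()
   omits whenever SrcString holds STRINGLEN or more bytes. */
static void set_my_string(const char *SrcString)
{
    strncpy(MyString, SrcString, STRINGLEN);
    MyString[STRINGLEN] = '\0';        /* last valid index, not STRINGLEN+1 */
}
```

With this sizing, a source of exactly STRINGLEN bytes (such as "exactly8") is copied whole; only longer sources lose their tail.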

The best solution would be an object holding the array of bytes, along with the array's length and perhaps the current string length (excluding the null terminator). However, if you're working in C, you don't have real objects, and at best have to cart the associated lengths around manually, making sure you don't accidentally overwrite them, while also avoiding the blind use of most string functions in the standard library. Perhaps a pseudo-object can be placed on the heap, with a single pointer giving access to the byte array and its associated data (lengths). There could be wrapper functions around all the naïve standard string functions, which would first check whether there is enough array space to hold the end result. There's no harm in tracking the actual length of the string yourself (provided that your wrapper functions update it) while also keeping a terminating null for the use of standard library functions.
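A minimal sketch of such a pseudo-object, assuming heap allocation and a refuse-on-overflow policy. SafeStr, ss_new(), ss_append(), and ss_free() are hypothetical names, not a standard API; only one wrapper (concatenation) is shown.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical "pseudo-object": the byte array travels with its capacity
   and current length, and is always kept null-terminated so it can still
   be handed to standard library functions. */
typedef struct {
    size_t cap;   /* bytes allocated, terminator included */
    size_t len;   /* string length, terminator excluded (always < cap) */
    char  *buf;
} SafeStr;

static SafeStr *ss_new(size_t cap)
{
    if (cap == 0)
        return NULL;                 /* need room for at least '\0' */
    SafeStr *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->buf = calloc(cap, 1);         /* zero-filled: starts as "" */
    if (s->buf == NULL) {
        free(s);
        return NULL;
    }
    s->cap = cap;
    s->len = 0;
    return s;
}

/* Wrapper around concatenation: checks the space first and refuses
   (returns -1) instead of overflowing. */
static int ss_append(SafeStr *s, const char *src)
{
    size_t n = strlen(src);
    if (n > s->cap - 1 - s->len)     /* room left for characters */
        return -1;
    memcpy(s->buf + s->len, src, n + 1);  /* copies src's '\0' too */
    s->len += n;                     /* keep the tracked length in sync */
    return 0;
}

static void ss_free(SafeStr *s)
{
    if (s != NULL) {
        free(s->buf);
        free(s);
    }
}
```

Because len is tracked, appending never has to rescan the existing string, and s->buf stays usable with printf() or strcmp().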

Naturally, introducing additional checking like this will slow down the code, but may be worth it to avoid nasty buffer overflow errors. If you're writing in C in the first place, it's likely for the raw performance needed for real-time data processing (e.g., video conversion), and you can't afford a lot of sanity checking. In that case, it may be a worthwhile tradeoff to develop the code using macros and functions that do a lot of checking and verification, and then (for production) switch to lighter-weight macros that don't do such checking, and hope that your thorough code testing has found all the problem areas! Your debug/development code might even issue a run-time warning if switching to the faster (unchecked) code could produce a buffer overflow.
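One way to set up that tradeoff, as a sketch: a hypothetical str_copy() compiled in two versions, selected by a hypothetical UNCHECKED_STRINGS build flag. The development version warns on stderr whenever the unchecked version would have overflowed, and truncates safely instead.

```c
#include <stdio.h>
#include <string.h>

#ifndef UNCHECKED_STRINGS   /* hypothetical flag: define it for production builds */

/* Development version: verifies that src fits, warns when the unchecked
   version would overflow, and truncates safely instead. */
static size_t str_copy(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0) {
        fprintf(stderr, "str_copy: zero-size buffer\n");
        return 0;
    }
    size_t len = strlen(src);
    if (len >= dstsize) {
        fprintf(stderr, "str_copy: %zu-byte string, %zu-byte buffer: "
                        "the unchecked build would overflow here\n",
                len, dstsize);
        len = dstsize - 1;           /* keep the last byte for '\0' */
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}

#else

/* Production version: no checks at all; testing must have shown that
   every call fits. */
static size_t str_copy(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);
    (void)dstsize;                   /* deliberately ignored */
    memcpy(dst, src, len + 1);       /* copies the '\0' too */
    return len;
}

#endif
```

Keeping the same signature in both versions means call sites don't change when you switch builds.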

Phil
« Reply #1: November 02, 2020, 12:56:57 PM »
With buffer overflow attacks in the news almost every day, matters have come to a head. It is time for everyone to program defensively, whether they're writing in C, in another language whose strcpy- and strcat-like functions behave the same way, or in code that calls these C library functions.

First of all, convert all strcpy and strcat (and similar) calls to strncpy and strncat. You really ought to know the maximum length (including the terminating null) of any array you're writing into! This is a major reason that object-oriented programming can be so nice, if used correctly: a string object already carries its length around as part of its internal data, and can do the length checking for you.

Second, you need to make sure that the target is always a proper string after the write, that is, one with a terminating null byte. strncpy does not write a null byte if it hits the limit. strncat always appends one, but it may therefore write one byte more than the count you pass it, so that count must leave room for the terminator. You can handle this in one of three ways, where the buffer in question is STRINGLEN+1 bytes long:
  • Have the discipline to write '\0' to array[STRINGLEN] (or some earlier position) after each write (strncpy or strncat).
  • Have the discipline to use macros or local functions for copying and concatenating strings, which use strncpy/strncat internally and write the null byte for you.
  • Have the discipline to use STRINGLEN in the strncpy or strncat call, and write the terminating null byte at the last position once, when the array is first declared. There might be another null byte earlier in the string, but the one at the end is there as a safety stop and should never be touched. Perhaps you can use a macro to declare the string and initialize it with null bytes at [0] and [STRINGLEN]?
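The second and third options might be sketched as macros like these. DECLARE_STRING and COPY_STRING are hypothetical names; COPY_STRING relies on sizeof, so it only works when dst is a real array, not a pointer.

```c
#include <string.h>

/* Hypothetical macro for option 3: declares name[len + 1] with every byte,
   including [0] and the safety stop at [len], set to '\0' (an initializer
   of {0} zero-fills the whole array). */
#define DECLARE_STRING(name, len) char name[(len) + 1] = { 0 }

/* Hypothetical macro for option 2: strncpy() at most sizeof(dst) - 1 bytes,
   then rewrite the safety stop. dst MUST be an actual array, not a
   pointer, or sizeof gives the size of the pointer instead. */
#define COPY_STRING(dst, src)                            \
    do {                                                 \
        strncpy((dst), (src), sizeof(dst) - 1);          \
        (dst)[sizeof(dst) - 1] = '\0';                   \
    } while (0)
```

For example, DECLARE_STRING(name, 5) followed by COPY_STRING(name, "abcdefgh") leaves name holding "abcde", with name[5] still '\0'.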
Whichever way you choose, it takes some discipline not to do the lazy thing and call strcpy or strcat "just this once, it can't hurt." Until C library suppliers get their act together and remove strcpy and strcat from the library, you should consider making your own dummy stubs in a library that comes ahead of the standard libraries. They would simply print a message to STDERR informing you that you called strcpy or strcat, and then die. At the very least, this would compel you to fix that oversight!
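A compile-time variation on the stub idea, offered as an alternative rather than the approach described above: GCC's `#pragma GCC poison` (also honored by Clang, but not necessarily by other compilers) turns any later use of the listed identifiers into a hard error. This sidesteps two weaknesses of link-time stubs: defining your own strcpy is technically undefined behavior in ISO C, and a compiler that expands strcpy as a built-in may never call the stub. The str_set() replacement is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* GCC/Clang extension: from here on, any mention of these identifiers is a
   compile error. Include every system header BEFORE this line. */
#pragma GCC poison strcpy strcat

/* Hypothetical replacement helper: snprintf() always null-terminates when
   size > 0 and returns the length it wanted to write, so truncation is
   detectable. Returns 1 if src fit, 0 if it was truncated. */
static int str_set(char *dst, size_t size, const char *src)
{
    int needed = snprintf(dst, size, "%s", src);
    return needed >= 0 && (size_t)needed < size;
}

/* strcpy(buf, "x");   <- would now fail to compile */
```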

Now comes the tricky part: strncat's count limits only how many bytes are taken from the incoming string (the one being added to the target), not how much room is left in the target string (the one being added to)! Therefore, it is still possible to overflow the target string. You need to work out the maximum number of bytes to append from the target array's size and the target string's current length, remembering that strncat always adds a terminator after the bytes it copies. Ideally, you also know the current length of the source string (the one being concatenated to the target), which tells you whether it will be truncated. The idea is that you don't want to overfill your target string past its safety stop. Again, a macro or function might help with this.
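A sketch of such a function, assuming the caller passes the full array size; append_str() and its return codes are hypothetical. The count handed to strncat is the array size, minus the target's current length, minus one for the terminator strncat always writes.

```c
#include <string.h>

/* Hypothetical helper: append src to dst, where cap is the total size of
   dst's array (terminator included). Returns 0 if src fit, 1 if it was
   truncated, -1 if dst was not a proper string to begin with. */
static int append_str(char *dst, size_t cap, const char *src)
{
    const char *end = memchr(dst, '\0', cap);   /* look only inside the array */
    if (end == NULL)
        return -1;                              /* no terminator: refuse */
    size_t used = (size_t)(end - dst);          /* current length of dst */
    size_t room = cap - used - 1;               /* free bytes, minus 1 for '\0' */
    int fits = strlen(src) <= room;

    strncat(dst, src, room);   /* takes at most room bytes, then adds '\0' */
    return fits ? 0 : 1;
}
```

Using memchr() rather than strlen() on the target means an already-unterminated target is detected instead of being scanned past its end.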

Be aware that if a strcpy or strcat call overflows the target buffer, the damage has already been done: something valuable may have been overwritten, possibly with malicious code! Writing a terminating null byte at the very end of the buffer afterward isn't going to undo the damage. It merely turns the buffer's contents into a proper (null-terminated) string that fits within its array; it won't restore whatever was overwritten beyond it. Therefore it is important to avoid writing past the end of the buffer in the first place!

Finally, there is the question of what to do (if anything) about a source string that lacks a terminating null byte. In that situation, a copy could take too much data, possibly including sensitive material beyond the proper end of the string. If you have been careful about your string lengths and your use of strncpy and strncat, you probably won't run into a corrupted input string that has lost its terminating null byte for some reason (but never say never!).
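For sources you can only trust to occupy a known number of bytes (for example, a fixed-width record field, which strncpy itself can leave unterminated), one defensive option is to bound the search for the terminator. A sketch, with copy_field() as a hypothetical name: memchr() never looks past srcsize bytes, unlike strlen().

```c
#include <string.h>

/* Hypothetical helper: copy from a source that occupies at most srcsize
   bytes and may lack its '\0'. Always terminates dst (if dstsize > 0)
   and returns the number of characters copied. */
static size_t copy_field(char *dst, size_t dstsize,
                         const char *src, size_t srcsize)
{
    const char *nul = memchr(src, '\0', srcsize);   /* bounded search */
    size_t len = nul ? (size_t)(nul - src) : srcsize;
    if (dstsize == 0)
        return 0;
    if (len > dstsize - 1)
        len = dstsize - 1;           /* truncate to fit, keeping room for '\0' */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```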
« Last Edit: November 02, 2020, 01:03:58 PM by Phil »