Posted on 2019-Oct-17 at 13:05:45 by Phil
In 1972, Dennis Ritchie defined a string in the C language as an array of bytes (characters) that runs up to the first byte of value 0 (the null byte), which terminates the string. Multibyte character systems may (e.g., UTF-8) or may not (e.g., UTF-16) be careful to avoid 0 bytes that would prematurely end the string. Arbitrary binary data that may include a 0 byte must be avoided, at least if you are going to use the standard string functions, which look for a null byte to end the string. There is no reason that a string (including the null terminator) has to completely fill the allocated array, but it certainly cannot be any longer.
This brings up the problem of trying to fill a string with more data than was allocated to the array in the first place. Particularly insidious is forgetting to account for the null terminator byte when sizing the array, and ending up writing the null byte one beyond the end of the array, where who knows what it’s overwriting. A compiler initializing a string may notice this, but don’t count on it! You may see something like
An even worse problem is using string functions (such as byte copies or string concatenations) that fail to check whether the array is long enough to hold the desired string (again, including that null terminator). There was really no excuse to define any string functions that fail to know about the array length, but that’s what was done. You’re taking a long walk off a short pier! The all-too-common result is the bytes of a string being written past the end of the character array, overwriting other data or even code. This has been exploited many times in buffer overflow attacks.
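As a hedged illustration of length-limited string calls, here is a minimal sketch. It assumes an array of exactly STRINGLEN bytes, and the helper name copy_bounded is made up for this example. Passing STRINGLEN to strncpy keeps the copy inside the array, and the return value reports whether the result actually ended up terminated.

```c
#include <string.h>

#define STRINGLEN 8  /* assumed array size in bytes, terminator included */

/* Copy SrcString into String without ever writing past the array.
 * Returns 1 if String holds a null-terminated string afterwards,
 * 0 if strncpy filled all STRINGLEN bytes and left no terminator. */
static int copy_bounded(char String[STRINGLEN], const char *SrcString)
{
    strncpy(String, SrcString, STRINGLEN);
    return memchr(String, '\0', STRINGLEN) != NULL;
}
```

With these assumptions, a 7-byte source still ends up terminated, while a source of 8 or more bytes fills the array and leaves no null byte behind.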
In the above code snippet, note the use of STRINGLEN in string calls to try to avoid overflow problems. It’s better than nothing… but can still leave you with an unterminated string (no 0 byte) if SrcString has more than STRINGLEN-1 bytes of data (before its terminating null byte)!
A statement that writes a null byte ('\0') into the last element of the array might be added after the strncpy() to take care of that problem (or anywhere before it, on the assumption that it won’t be overwritten by a string operation). Note that the index is not STRINGLEN+1, as that would be beyond the end of the array! Even with this fix, one character (byte) might be lost in making this a proper string.
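The terminating fix can be sketched as follows. This is only an illustration: the function name copy_terminated and the dst_size parameter are assumptions, chosen so that the last valid index is always dst_size - 1 no matter how the array was declared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst (an array of dst_size bytes) and guarantee that dst
 * ends up null-terminated.
 * Returns 1 if src fit completely, 0 if it had to be truncated. */
static int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    strncpy(dst, src, dst_size);        /* zero-pads when src is short */
    if (dst[dst_size - 1] != '\0') {    /* last byte taken by data */
        dst[dst_size - 1] = '\0';       /* last valid index, not beyond */
        return 0;
    }
    return 1;
}
```

The return value makes the lost byte visible to the caller instead of silently dropping it.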
The best solution would be an object to hold the array of bytes, along with the current array length and perhaps the current string length (less the null terminator). However, if you’re working in C, it’s likely that you don’t have real objects, and at best, have to manually cart around the associated lengths and make sure you don’t accidentally overwrite them, as well as avoiding the blind use of most string functions in the standard library. Perhaps a pseudo-object can be placed on the heap, with a single pointer to the byte array and associated data (lengths). There could be wrapper functions around all naïve native string functions, that would first check if there is sufficient array space to hold the end result. There’s no harm done in manually tracking the actual length of the string (provided that your wrapper function updates it) and keeping a terminating null for the use of standard library functions.
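One way such a pseudo-object might look is sketched below. Everything here (the CheckedString type, the cs_ function names, the choice to refuse rather than truncate) is hypothetical and only meant to show the idea of wrappers that check the available space before writing.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pseudo-object: the byte array plus its bookkeeping. */
typedef struct {
    size_t capacity;  /* bytes allocated for data, terminator included */
    size_t length;    /* current string length, terminator excluded */
    char  *data;      /* always kept null-terminated */
} CheckedString;

/* Allocate an empty string on the heap; returns NULL on failure. */
static CheckedString *cs_new(size_t capacity)
{
    CheckedString *cs;

    if (capacity == 0)
        return NULL;
    cs = malloc(sizeof *cs);
    if (cs == NULL)
        return NULL;
    cs->data = calloc(capacity, 1);   /* zero-filled, so already terminated */
    if (cs->data == NULL) {
        free(cs);
        return NULL;
    }
    cs->capacity = capacity;
    cs->length = 0;
    return cs;
}

static void cs_free(CheckedString *cs)
{
    if (cs != NULL) {
        free(cs->data);
        free(cs);
    }
}

/* Checked replacement for a plain copy: 0 on success, -1 if src won't fit. */
static int cs_copy(CheckedString *cs, const char *src)
{
    size_t n = strlen(src);

    if (n + 1 > cs->capacity)
        return -1;
    memcpy(cs->data, src, n + 1);     /* includes the terminator */
    cs->length = n;
    return 0;
}

/* Checked replacement for concatenation: 0 on success, -1 if no room. */
static int cs_append(CheckedString *cs, const char *src)
{
    size_t n = strlen(src);

    if (n >= cs->capacity - cs->length)   /* need n bytes plus terminator */
        return -1;
    memcpy(cs->data + cs->length, src, n + 1);
    cs->length += n;
    return 0;
}
```

Because data always keeps its terminating null, it can still be handed to read-only standard library functions such as strlen or printf.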
Naturally, introducing additional checking like this will slow down the code, but may be worth it to avoid nasty buffer overflow errors. If you’re writing in C in the first place, it’s likely for the raw performance needed for real-time data processing (e.g., video conversion), and you can’t afford a lot of sanity checking. In that case, it may be a worthwhile tradeoff to develop the code using macros and functions that do a lot of checking and verification, and then (for production) switch to lighter-weight macros that don’t do such checking, and hope that your thorough code testing has found all the problem areas! Your debug/development code might even issue a run-time warning if switching to the faster (unchecked) code could produce a buffer overflow.
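A rough sketch of that development/production switch follows. The macro names CHECKED_STRINGS and STR_COPY, the helper checked_copy, and the overflow_warnings counter are all assumptions made for this example. The checked build warns and truncates; the production build falls straight through to the unchecked call.

```c
#include <stdio.h>
#include <string.h>

/* Count of warnings issued by the checked build (hypothetical). */
static int overflow_warnings = 0;

/* Checked copy: warns on stderr and truncates instead of overflowing. */
static char *checked_copy(char *dst, size_t dst_size, const char *src,
                          const char *file, int line)
{
    size_t n = strlen(src);

    if (n >= dst_size) {
        fprintf(stderr, "%s:%d: copying %zu bytes would overflow a "
                "%zu-byte array\n", file, line, n + 1, dst_size);
        overflow_warnings++;
        if (dst_size == 0)
            return dst;
        n = dst_size - 1;
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return dst;
}

/* dst must be a real array (not a pointer) for sizeof to give its size. */
#ifdef CHECKED_STRINGS              /* e.g. compile with -DCHECKED_STRINGS */
#define STR_COPY(dst, src) \
    checked_copy((dst), sizeof (dst), (src), __FILE__, __LINE__)
#else                               /* production: no checking at all */
#define STR_COPY(dst, src) strcpy((dst), (src))
#endif
```

Any warning printed during testing marks a call site that would have overflowed in the unchecked build.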
Posted on 2020-Nov-02 at 12:56:57 by Phil
Last update on 2020-Nov-02 at 13:03:58 by Phil
With buffer overflow attacks in the news almost every day, matters have come to a head. It is time for everyone to program defensively, whether they’re in C or in any other language with similar behavior in strcpy- and strcat-like functions, or functions calling these C library functions.
First of all, convert all strcpy and strcat (or like) calls to strncpy and strncat. You really ought to know what the maximum length is (including the terminating null) of any array you’re writing into! This is a major reason that object-oriented programming can be so nice, if used correctly: a string object already carries around its length as part of its internal data, and can do the length checking for you.
Second, you need to make sure that the target is always a proper string after the write, that is, with a terminating null byte. strncpy does not, by itself, write a null byte if it hits the limit. strncat always appends one, but it does so after copying up to the limit, so it can write one byte more than the limit you give it. You can handle this in one of three ways, where the buffer in question is STRINGLEN+1 bytes long:
1. Write '\0' to array[STRINGLEN] (or some earlier position) after each write (strncpy or strncat).
2. Use a wrapper function or macro that calls strncpy/strncat and writes the null for you.
3. Use a limit of STRINGLEN in the strncpy or strncat call, and write the terminating null at the last position up at the top of the program. There might be another null earlier in the string, but the one at the end is there as a safety stop, and should never be touched. Perhaps you can use a macro to declare the string and initialize it with null bytes at [0] and [STRINGLEN]?
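The safety-stop approach might be sketched with a pair of macros like these. The names DECLARE_STRING and COPY_STRING and the value of STRINGLEN are assumptions for illustration. Zero-filling the whole array is a slightly stronger choice than setting only [0] and [STRINGLEN], but it is the simplest initializer C offers.

```c
#include <string.h>

#define STRINGLEN 15  /* assumed maximum string length, terminator excluded */

/* Declare a buffer of STRINGLEN+1 bytes, zero-filled, so that both
 * name[0] and the safety stop at name[STRINGLEN] start out as null. */
#define DECLARE_STRING(name) char name[STRINGLEN + 1] = { 0 }

/* Copy at most STRINGLEN bytes: indexes 0..STRINGLEN-1 may be written,
 * the safety stop at index STRINGLEN is never touched. */
#define COPY_STRING(dst, src) strncpy((dst), (src), STRINGLEN)
```

For example, after DECLARE_STRING(name), a COPY_STRING from an over-long source leaves exactly STRINGLEN bytes of data followed by the untouched safety stop.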
Any way you choose to do this requires some discipline to not do the lazy thing and call strcpy or strcat because “this one time can’t hurt”. Until C library suppliers get their act together and remove strcpy and strcat from the library, you should consider making your own dummy stubs in a library that comes ahead of the standard libraries. They would simply print a message to stderr that informs you that you called strcpy or strcat, and then die. At least this would compel you to fix that oversight!
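A minimal sketch of such stubs is shown below. To keep the example from colliding with the real library, the stubs carry the hypothetical names stub_strcpy and stub_strcat; in an interposing library they would be given the names strcpy and strcat, and how that library gets linked ahead of the standard one depends on your toolchain. The helper names banned_message and die_banned are likewise assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Build the complaint in buf and return buf. Kept separate from the
 * dying part so the message can be inspected without killing the program. */
static const char *banned_message(char *buf, size_t buf_size,
                                  const char *name)
{
    snprintf(buf, buf_size,
             "fatal: %s() called; use a length-checked call instead", name);
    return buf;
}

/* Print the complaint on stderr and die. */
static void die_banned(const char *name)
{
    char msg[96];

    fprintf(stderr, "%s\n", banned_message(msg, sizeof msg, name));
    abort();
}

/* Hypothetical stub bodies: same signatures as the real functions. */
char *stub_strcpy(char *dst, const char *src)
{
    (void)dst;
    (void)src;
    die_banned("strcpy");
    return NULL;   /* not reached */
}

char *stub_strcat(char *dst, const char *src)
{
    (void)dst;
    (void)src;
    die_banned("strcat");
    return NULL;   /* not reached */
}
```

Using abort() rather than exit() has the side benefit, on many systems, of leaving a core dump or debugger stop that points at the offending call.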
Now comes the tricky part: the limit given to strncat applies only to the number of bytes taken from the incoming string (being added to the target string), and not to the current length of the target string (the one being added to)! Therefore, it is still possible to overflow the target string. You need to figure out the maximum length to copy (for concatenation) from the target string’s current and maximum length, and (ideally) the current length of the source string (being concatenated to the target). The idea is that you don’t want to overfill your target string, past its safety stop. Again, a macro or function might help with this.
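A function along those lines might look like this sketch. The name concat_bounded and its return convention (number of source bytes that did not fit) are assumptions. It works out the remaining room from the target’s current length and total size before calling strncat.

```c
#include <stddef.h>
#include <string.h>

/* Append src to the string in dst, where dst is an array of dst_size bytes.
 * Never writes past dst[dst_size - 1].
 * Returns the number of bytes of src that did not fit (0 if all fit). */
static size_t concat_bounded(char *dst, size_t dst_size, const char *src)
{
    const char *end = memchr(dst, '\0', dst_size);
    size_t want = strlen(src);
    size_t used, room;

    if (end == NULL)              /* target not terminated: leave it alone */
        return want;
    used = (size_t)(end - dst);   /* current length of the target */
    room = dst_size - used - 1;   /* free bytes, keeping one for the null */
    strncat(dst, src, room);      /* copies at most room bytes, adds a null */
    return want > room ? want - room : 0;
}
```

A nonzero return tells the caller that the result was truncated, which is often worth logging or treating as an error.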
Be aware that if your strcpy or strcat should overflow the target buffer string, the damage has been done: something valuable may already have been overwritten with malicious code! Inserting that null terminating byte at the very end of the buffer isn’t going to undo the damage. It will merely turn this string into a proper (null-terminated) string fitting within its array; it won’t restore the content beyond it. Therefore it is important to avoid writing past the end of the buffer in the first place!
Finally, there is the question about what to do (if anything) about a source string that lacks a terminating null byte. It is conceivable that in such a situation too much data can be copied, including possibly sensitive material beyond the proper end of the string. If you have been careful about your string lengths and using strncpy and strncat, you probably won’t run into a corrupted input string that has lost its terminating null byte for some reason (but never say never!).
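If you do know the size of the array holding the source, one defensive option is to look for its terminator before copying anything. The sketch below assumes hypothetical names (length_within, copy_from_untrusted) and chooses to refuse the copy rather than truncate; that policy is a design choice for this example only.

```c
#include <stddef.h>
#include <string.h>

/* Length of the string in src, examining no more than src_size bytes.
 * Returns src_size if no terminating null was found in that range. */
static size_t length_within(const char *src, size_t src_size)
{
    const char *end = memchr(src, '\0', src_size);

    return end != NULL ? (size_t)(end - src) : src_size;
}

/* Copy src (an array of src_size bytes) into dst (dst_size bytes).
 * Returns 0 on success, -1 if src has no terminator or does not fit;
 * on failure dst is left untouched. */
static int copy_from_untrusted(char *dst, size_t dst_size,
                               const char *src, size_t src_size)
{
    size_t n = length_within(src, src_size);

    if (n == src_size || n >= dst_size)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}
```

Because memchr never reads beyond src_size bytes, nothing past the end of the source array is ever copied.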
All content © copyright 2005 – 2024
by Catskill Technology Services, LLC.
All rights reserved.
Note that Third Party software (whether Open Source or proprietary) on this
site remains under the copyright and license of its owners.
Catskill Technology Services, LLC does not claim copyright over such software.
This page is https://www.catskilltech.com/utils/show.php?link=null-terminated-strings-in-c
Last updated Mon, 07 Oct 2024 at 2:18 PM