Redaction and watermarking

Posted on 2018-Aug-10 at 13:10:36 by Phil

I was recently reminded about attempts to watermark images (so they can’t be easily stolen from a website), which in turn reminded me of failed attempts to redact information from online documents. In both cases, the problem is that the original information (image content or document) is not destroyed, but merely covered over by another layer (e.g., watermark text or a black bar). In such cases, with a little effort, the overlay information layer can be removed and the original image or text seen.

When modifying an image (such as with a watermark or a censor’s mark/redaction), you need to be careful that the new information is not carried in a separate layer, but is thoroughly “baked in” to the image, completely replacing whatever the original content was. You don’t want to “mix” in the new information; you want to completely replace the old content at that point. Mixing in the new information (at anything short of full opacity) leaves residue that might be extractable with image-processing tools.

What happens to the image, and whether it carries the watermark or redaction in a separate layer, depends on the specific image format and image editor used. Some editors may not permit layers to be saved in an image format that most browsers can view, but be sure to test with a representative sample! At the least, understand whether your chosen image format (JPEG, GIF, PNG, TIFF, etc.) is capable of holding image data in separate layers. If your editor asks permission to merge all layers down to one when saving the image, it probably isn’t going to preserve layers, but you should never make assumptions without doing some research.
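To make the “residue” point concrete, here is a minimal sketch in plain Python. It assumes a row of 8-bit grey pixels, a solid black overlay, and an attacker who knows (or can guess) both the overlay and its opacity. The function names blend and unblend are made up for this illustration and are not taken from any image library.

```python
def blend(original, overlay, alpha):
    """Alpha-blend two equal-length rows of 8-bit grey values."""
    return [round(alpha * o + (1 - alpha) * p)
            for p, o in zip(original, overlay)]


def unblend(blended, overlay, alpha):
    """Estimate the original pixels from a blend, given overlay and opacity."""
    if alpha >= 1.0:
        raise ValueError("full opacity: nothing of the original remains")
    estimate = [(b - alpha * o) / (1 - alpha)
                for b, o in zip(blended, overlay)]
    return [min(255, max(0, round(e))) for e in estimate]


original = [12, 200, 97, 145, 30, 250]   # made-up pixel values
overlay = [0] * len(original)            # a black bar

# An 80% opaque bar looks nearly black, but the blend can be inverted.
mixed = blend(original, overlay, 0.8)
recovered = unblend(mixed, overlay, 0.8)
max_error = max(abs(a - b) for a, b in zip(original, recovered))

# At full opacity the result is just the overlay: there is nothing to invert.
replaced = blend(original, overlay, 1.0)
```

In this sketch the only loss in the 80% case comes from rounding to whole pixel values, so the recovered row is off by a few grey levels at most. Lossy formats such as JPEG would add further error, but not necessarily enough to hide the content.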

Redacting a text document is a bigger problem. In plain text (e.g., a .txt file), about the only thing you can do is to replace the offending text with X’s or spaces. In any format more complex than that, there is the possibility that new material, or material of a different type (graphics vs. text), is carried in a separate layer that could easily be stripped out. For example, say the original document to be redacted is a PDF. You go into some PDF editor, draw a filled box over the offending information, save it, and call it a day. Well, that box is probably drawn as a graphics object on top of the text object holding the text content. The box could be removed fairly easily, or the original text could even be read by opening the file in a text editor! The content stream holding the text is likely to be compressed, but there are plenty of tools to restore it to readable condition. This has been done enough times that everyone should be aware of the trick by now, but it seems that some people never get the memo.

The most foolproof method is to convert the redacted document to an image, checking that everything is merged down into a single layer and that no information is recoverable (such as through image processing). A close second is to print the document (either already redacted or not), physically redact it if necessary, and scan it back in to a new document of any format.
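The PDF case can be illustrated with nothing more than the Python standard library. The sketch below does not build or parse a real PDF file. It hand-writes a few bytes in the style of a page content stream (a text-showing operation followed by a filled black rectangle), compresses them with zlib the way a Flate-compressed stream would be stored, and then shows that the text comes straight back out. The sample text, the coordinates, and the helper names strings_shown and without_boxes are all invented for the example.

```python
import re
import zlib

# Hand-written page content: show a line of text, then paint a black box over it.
content = (
    b"BT /F1 12 Tf 72 700 Td (Account number 12345678) Tj ET\n"
    b"0 g 70 695 220 16 re f\n"
)
compressed = zlib.compress(content)   # roughly what a Flate-compressed stream holds


def strings_shown(stream_bytes):
    """Decompress a stream and list the literal strings it shows with Tj."""
    data = zlib.decompress(stream_bytes)
    found = re.findall(rb"\(([^()\\]*)\)\s*Tj", data)
    return [s.decode("latin-1") for s in found]


def without_boxes(stream_bytes):
    """Decompress a stream and drop the lines that draw filled rectangles."""
    lines = zlib.decompress(stream_bytes).splitlines()
    return b"\n".join(ln for ln in lines if not ln.rstrip().endswith(b"re f"))


hidden_text = strings_shown(compressed)
uncovered = without_boxes(compressed)
```

Real documents are messier (fonts with custom encodings, text split across several operations, and so on), so treat this only as a demonstration that drawing a box adds an object and removes nothing.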

This problem even shows up in the analog world. Think of “bleeping out” an obscenity or sensitive information from an audio track. This basically depends on mixing in a loud signal that overwhelms the original content, hopefully driving the recording into a very non-linear region. Since the “bleep” is typically a repeating waveform (even if not a pure sine wave), it is difficult, but in theory not impossible, to generate a negative of that waveform that cancels out the “bleep”, leaving whatever remains of the original signal. It may be very degraded due to non-linearities, but something usable may still be there. The only sure-fire way to remove audio is to first erase the original content (say, about 18-1/2 minutes’ worth :) ) and then (if you wish) record the bleep tone over that. This could be done in one step by using an A-B switch to record (on fresh media) either the original content or a bleep tone, eliminating the risk that some magnetic signature could remain from the erased content.
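Here is a small numeric sketch of the “negative waveform” idea, again in plain Python. It assumes the best case for the attacker: the bleep is a pure sine wave whose frequency, amplitude, and phase are known exactly. The sample rate, the tone frequencies, and the 220 Hz tone standing in for speech are arbitrary choices for the illustration.

```python
import math

RATE = 8000   # samples per second (arbitrary for this sketch)


def tone(freq, amp, n):
    """Return n samples of a sine wave at the given frequency and amplitude."""
    return [amp * math.sin(2 * math.pi * freq * i / RATE) for i in range(n)]


def clip(samples, limit=1.0):
    """Model a recorder that cannot represent values beyond +/- limit."""
    return [max(-limit, min(limit, s)) for s in samples]


def max_diff(a, b):
    """Largest sample-by-sample difference between two signals."""
    return max(abs(x - y) for x, y in zip(a, b))


n = 400
speech = tone(220, 0.3, n)        # stand-in for the original content

# A quiet bleep: the mix stays within range, so subtracting the bleep undoes it.
bleep = tone(1000, 0.6, n)
mixed = clip([s + b for s, b in zip(speech, bleep)])
recovered = [m - b for m, b in zip(mixed, bleep)]
error_linear = max_diff(recovered, speech)

# A very loud bleep: the recorder clips, and subtraction no longer recovers speech.
loud = tone(1000, 5.0, n)
overdriven = clip([s + b for s, b in zip(speech, loud)])
recovered_loud = [m - b for m, b in zip(overdriven, loud)]
error_clipped = max_diff(recovered_loud, speech)

# The A-B switch approach: record the bleep instead of the speech, not on top of it.
replaced = list(bleep)
```

In the first case the recovery error is down at the level of floating-point rounding. In the overdriven case the samples that clipped are lost, although the samples between the peaks of the bleep may still carry some of the original. In the replaced case the speech never reaches the recording at all.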

With printed documents or photos, if black ink or something like Wite-Out or Liquid Paper is applied, that will not destroy the content beneath it. With enough skill and technology (perhaps nothing more sophisticated than a razor blade!), the obscuring layer can be removed. In such a case, you should photocopy the document afterwards, and release that, so that there is no trace of the original (hidden) information. Just make sure the original information is thoroughly covered by the ink or other covering, so nothing shows through. Be on the lookout for something like black ink over blue handwriting — how thoroughly is that covered, or is it recoverable through image processing?

Posted on 2022-Mar-21 at 20:58:00 by Phil

There’s another media issue regarding redaction: rewritable magnetic or optical media. Even if you think you’ve overwritten or erased a file, there may still be faint traces of the original signal left that (with very sophisticated equipment) can be read. Doing so most likely takes CIA- or NSA-level spook gear, but never say never. It may be necessary to physically destroy the media so that it can’t be read. Even fragments of a disk might be read if someone’s determined enough, so burning or melting down the media might have to be done, not just drilling (or shooting) holes in it.
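For reference, this is roughly what “overwriting a file” amounts to at the software level: a minimal sketch using only the Python standard library, with a made-up helper name (overwrite_in_place). It only asks the operating system to write new bytes over the same file. It cannot promise that the bytes land on the same physical location (flash storage and some file systems commonly write elsewhere), and it does nothing about faint residual traces on the media, which is the whole point of the paragraph above.

```python
import os
import tempfile


def overwrite_in_place(path, passes=1):
    """Overwrite a file's contents with random bytes; return its size in bytes."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:          # open for update without truncating
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())          # ask the OS to push the data to the device
    return size


# Demonstration on a throwaway temporary file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"sensitive text " * 100)

written = overwrite_in_place(demo_path)
with open(demo_path, "rb") as f:
    after = f.read()
os.remove(demo_path)
```

Reading the file back through the file system shows only the random bytes, which is exactly why a software check like this says little about what sophisticated equipment might still find on the media.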

Needless to say, write-once media (particularly optical) is even more vulnerable, as there’s no easy way to overwrite or erase it. Of course, it should be even more obvious than usual that such media can’t simply be “erased”, so people are less likely to rely on erasure alone, and perhaps that makes it safer in practice.


All content © copyright 2005 – 2022 by Catskill Technology Services, LLC.
All rights reserved.
Note that Third Party software (whether Open Source or proprietary) on this site remains under the copyright and license of its owners. Catskill Technology Services, LLC does not claim copyright over such software.


This page is https://www.catskilltech.com/redaction-and-watermarking.html

Last updated Sat, 09 Apr 2022 at 4:46 PM
