
Redaction and Watermarking


Phil
« August 10, 2018, 01:10:36 PM »
I was recently reminded of attempts to watermark images (so they can't easily be stolen from a website), which in turn reminded me of failed attempts to redact information from online documents. In both cases, the problem is that the original information (image content or document text) is not destroyed, but merely covered by another layer (e.g., watermark text or a black bar). With a little effort, the overlay can be removed and the original image or text seen.

When modifying an image (such as adding a watermark or a censor's mark/redaction), you need to be careful that the new information is not carried in a separate layer, but is thoroughly "baked in" to the image, completely replacing whatever the original content was. You don't want to "mix" in the new information; you want to completely replace the old content at that point. Mixing in the new information (at anything short of full opacity) leaves residue that might be extractable with image-processing tools. What happens to the image, and whether it carries the watermark or redaction in a separate layer, depends on the specific image format and image editor used. Some editors may not permit layers to be saved in an image format that most browsers can view, but be sure to test the result in a representative sample of viewers! At the least, understand whether your chosen image format (JPEG, GIF, PNG, TIFF, etc.) is capable of holding image data in separate layers. If your editor asks permission to merge all layers down to one when saving the image, it probably isn't going to preserve layers, but you should never make assumptions without doing some research.
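To make the "mixing" point concrete, here is a minimal sketch of my own, assuming a simple linear alpha blend on a handful of grayscale pixel values and an attacker who knows (or can guess) the watermark pixels and its opacity. The function names `blend` and `unblend` are made up for illustration; a real editor may blend differently, and lossy compression would add further error.

```python
def blend(original, overlay, alpha):
    """Mix overlay into original at the given opacity (0.0 to 1.0).

    Both inputs are lists of 0-255 grayscale values of equal length.
    """
    return [round(alpha * o + (1 - alpha) * p) for p, o in zip(original, overlay)]


def unblend(blended, overlay, alpha):
    """Estimate the original pixels by inverting blend().

    Only possible when alpha < 1; at full opacity nothing of the
    original remains in the blended values.
    """
    if alpha >= 1:
        raise ValueError("fully opaque overlay: original content is gone")
    return [
        min(255, max(0, round((b - alpha * o) / (1 - alpha))))
        for b, o in zip(blended, overlay)
    ]


original = [12, 200, 75, 148, 33, 250]   # illustrative pixel values
overlay = [255] * len(original)          # a plain white watermark

# 60% opacity: the watermark dominates visually, but residue remains.
marked = blend(original, overlay, 0.6)
recovered = unblend(marked, overlay, 0.6)
max_error = max(abs(r - p) for r, p in zip(recovered, original))

# 100% opacity: the result is just the overlay, with no trace of the original.
opaque = blend(original, overlay, 1.0)

print("recovered:", recovered, "max error:", max_error)
print("opaque result equals overlay:", opaque == overlay)
```

The partially transparent case comes back to within rounding error of the original, while the fully opaque case leaves nothing to invert.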

Redacting a text document is a bigger problem. In plain text (e.g., a .txt file), about the only thing you can do is to replace the offending text with X's or spaces. In any format more complex than that, there is the possibility that new material, or material of a different type (graphics vs. text), may be carried in a separate layer that could easily be stripped out. For example, say your original document to be redacted is a PDF. You go into some PDF editor and draw a filled box over the offending information, save it, and call it a day. Well, that box is probably drawn as a graphics object over the text object holding the text content. The box could be fairly easily removed, or the original text could even be read by opening the file in a text editor! The stream holding the text is likely to be compressed, but there are plenty of tools to restore it to readable condition. This has been done enough times that everyone should be aware of the trick by now, but it seems that some people never get the memo. Anyway, the most foolproof method would be to convert the redacted document to an image, checking that everything is merged down into a single layer and that no information is recoverable (such as through image processing). A close second would be to print the document (either already redacted or not), physically redact if necessary, and scan it back in to a new document of any format.
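As a rough illustration of why the black box doesn't help, here is a sketch using a hand-made, heavily simplified stand-in for a PDF page content stream. The account number, coordinates, and helper names are all invented for the example, and a real PDF has far more structure (objects, cross-reference tables, fonts with their own encodings), so treat this as a toy model rather than a PDF parser. It only assumes that the stream is Flate (zlib) compressed, which is common.

```python
import re
import zlib

# Hypothetical page content: a text object, then a black filled rectangle
# drawn on top of it (the "redaction").
content = (
    b"BT /F1 12 Tf 72 700 Td (Account number: 12345678) Tj ET\n"
    b"0 0 0 rg 70 695 200 16 re f\n"
)

# Roughly what sits inside a FlateDecode stream in the saved file.
stored = zlib.compress(content)


def visible_strings(compressed):
    """Decompress the stream and pull out literal strings shown with Tj."""
    raw = zlib.decompress(compressed)
    return [m.decode("latin-1") for m in re.findall(rb"\(([^()]*)\)\s*Tj", raw)]


def strip_rectangles(compressed):
    """Drop lines that draw a filled rectangle, leaving the text untouched."""
    raw = zlib.decompress(compressed)
    kept = [ln for ln in raw.splitlines() if not ln.rstrip().endswith(b"re f")]
    return b"\n".join(kept)


found = visible_strings(stored)      # the "redacted" text, read straight out
uncovered = strip_rectangles(stored)  # the page with the box removed

print(found)
print(uncovered.decode("latin-1"))
```

Either route gets at the text: reading it directly out of the decompressed stream, or deleting the rectangle so the page displays it again.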

This problem even shows up in the analog world. Think of "bleeping out" an obscenity or sensitive information from an audio track. This basically depends on mixing in a loud signal that overwhelms the original content, hopefully driving the recording into a very non-linear region. Since the "bleep" is typically a repeating waveform (even if not a pure sine wave), it is difficult, but in theory not impossible, to generate a negative of that waveform that could cancel out the "bleep", leaving whatever remains of the original signal. It may be very degraded due to non-linearities, but something usable may still be there. The only sure-fire way to remove audio is to first erase the original content (say, about 18-1/2 minutes' worth :) ) and then (if you wish) record the bleep tone over that. This could be done in one step by using an A-B switch to record (on fresh media) either the original content or a bleep tone, eliminating the risk that some magnetic signature could remain from the erased content.
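Here is a toy numerical version of the cancellation idea, under some generous assumptions of mine: the bleep is a pure sine of exactly known frequency, amplitude, and phase, the "speech" is just two low tones standing in for real audio, and the sample rate and levels are arbitrary. A real attack would first have to estimate the bleep from the recording, which is the hard part.

```python
import math

RATE = 8000         # samples per second (arbitrary for this sketch)
BLEEP_HZ = 1000.0   # assumed bleep frequency
BLEEP_AMP = 5.0     # much louder than the speech stand-in


def tone(freq, amp, n):
    """Return n samples of a sine wave at the given frequency and amplitude."""
    return [amp * math.sin(2 * math.pi * freq * i / RATE) for i in range(n)]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


n = 800
# Stand-in for the original speech: two quiet low-frequency tones.
speech = [a + b for a, b in zip(tone(220.0, 0.3, n), tone(330.0, 0.2, n))]
bleep = tone(BLEEP_HZ, BLEEP_AMP, n)

# Case 1: bleep mixed in linearly. Subtracting the same waveform undoes it.
mixed = [s + b for s, b in zip(speech, bleep)]
restored = [m - b for m, b in zip(mixed, bleep)]
residual_linear = rms([r - s for r, s in zip(restored, speech)])

# Case 2: the mix is clipped at +/-1.0 (a crude non-linearity), so simple
# subtraction no longer gives back the speech cleanly.
clipped = [max(-1.0, min(1.0, m)) for m in mixed]
restored_clipped = [c - b for c, b in zip(clipped, bleep)]
residual_clipped = rms([r - s for r, s in zip(restored_clipped, speech)])

# Case 3: A-B switch. The tone is recorded instead of the speech, so the
# output contains no contribution from the speech at all.
replaced = list(bleep)

print("linear mix residual: ", residual_linear)
print("clipped mix residual:", residual_clipped)
```

In the linear case the residual is down at floating-point noise, in the clipped case plain subtraction leaves a large error, and in the replaced case there is simply nothing of the original to go looking for.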

With printed documents or photos, applying black ink or something like Wite-Out or Liquid Paper will not destroy the content beneath it. With enough skill and technology (perhaps nothing more sophisticated than a razor blade!), the obscuring layer can be removed. In such a case, you should photocopy the document afterwards, and release the copy, so that there is no trace of the original (hidden) information. Just make sure the original information is thoroughly covered by the ink or other covering, so nothing shows through. Be on the lookout for something like black ink over blue handwriting: how thoroughly is that covered, and is it recoverable through image processing?