CTS 7 - text and graphics objects

*

Phil
« April 10, 2017, 05:04:57 PM »
Note: discussion split out from RT 98576 bug

Regarding bullet 6 (and, to a limited extent, bullet 5), I have updated the POD for lib/PDF/API2/Page.pm to reflect what appears to be going on in the interaction between PDF::API2's O-O implementation and what PDF rendering does. Basically, PDF::API2 permits multiple graphics (gfx) and text (text) objects on one page, but there are a number of considerations to be aware of:

  • PDF::API2 groups all output for one graphics or text object (regardless of how calls on different objects are intermixed in your code) and outputs it in one PDF object and stream.
  • PDF::API2 objects (both graphics and text) are output in the order they are created on the page.
  • PDF does not appear to reset the graphics and text states at the beginning of an object + stream; the state left by one object + stream is the entry state of the next.
  • Text shares many attributes with graphics (strokecolor, fillcolor, linewidth, linedash, etc.), and changes in one mode affect the following stream(s).

Even if you have only one graphics and one text object, there can still be some unexpected interactions between the two. You may want to reset all your attributes at the beginning of each object, just so you are beginning from a known state.
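The grouping can be pictured with a toy model. This is a minimal sketch in plain Perl, not PDF::API2 code: new_object, set_color, draw, and render are hypothetical helpers, and a single colour stands in for the whole graphics state. It assumes, as described above, that each object's output is emitted as one block in creation order and that nothing resets the state between blocks.

```perl
use strict;
use warnings;

# Each hypothetical "object" just collects its own operations in call order.
sub new_object {
    my ($page, $name) = @_;
    my $obj = { name => $name, ops => [] };
    push @$page, $obj;                 # the page remembers creation order
    return $obj;
}
sub set_color { my ($obj, $color) = @_; push @{ $obj->{ops} }, [ set  => $color ] }
sub draw      { my ($obj, $label) = @_; push @{ $obj->{ops} }, [ draw => $label ] }

# "Render" the page: objects in creation order, one shared state throughout.
sub render {
    my ($page) = @_;
    my $color = 'black';               # default state at the start of the page
    my @out;
    for my $obj (@$page) {
        for my $op (@{ $obj->{ops} }) {
            my ($kind, $arg) = @$op;
            if ($kind eq 'set') { $color = $arg }
            else                { push @out, "$arg:$color" }
        }
    }
    return @out;
}

my @page;
my $gfx  = new_object(\@page, 'gfx');   # created first, so output first
my $text = new_object(\@page, 'text');

draw($text, 'heading');                 # called first, but emitted after all gfx output
set_color($gfx, 'red');
draw($gfx, 'box');
draw($text, 'caption');

print join(', ', render(\@page)), "\n"; # box:red, heading:red, caption:red
```

The heading comes out red even though the colour was set on a different object, and after the heading call in the source, which is the kind of surprise described above.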

This brings up a question: should PDF::API2 automatically insert PDF stream code to reset each (second and subsequent) stream to a known starting state, or should this be the responsibility of the programmer? Or, what about outputting a run of PDF stream output only until the object changes, and then creating a new PDF object and stream? Still, the naïve expectation would be that a setting in one object would not affect any other object, so the first suggestion might better match that. This could be an explicit $obj->reset() call, and not automatic. PDF::API2 might keep track of state changes in outputting a stream, and reset only those (rather than all possible ones).

Add: The save and restore methods apparently apply only to graphics objects. While they do not seem to produce any errors when called on a text object, nothing is inserted into the stream:
Code:
# text and gfx objects are output in the given order as objects/streams
$grfx1 = $page->gfx();
  $grfx1->save();           # start stream with q
$grfx2 = $page->gfx();
  $grfx2->restore();        # start stream with Q q
  $grfx2->save();
$text1 = $page->text();
  # $text1->restore();      # no-op on a text object
  # $text1->save();         # no-op on a text object
$text2 = $page->text();
  # $text2->restore();      # no-op on a text object

Would this be sufficient? It seems to help with getting expected results in some brief tests, by restoring the graphics settings to default at the beginning of the $grfx2 stream. Further testing is needed to see what effect it has on text.
« Last Edit: April 12, 2017, 08:51:43 AM by Phil »

*

sciurius
« Reply #1: April 11, 2017, 02:47:24 AM »
Any change in this area will be a big improvement, and break a lot of existing code.
Maybe it is worth considering new graphics and text objects that have better defined semantics?

*

Phil
« Reply #2: April 11, 2017, 10:35:21 AM »
I'm open to specific improvements that won't break existing code (i.e., we would need to create new methods or add switches for changes to behavior). What I was thinking of doing was recommending that programmers do a restore() (except first object) and save() (except last object), and add text save() and restore() to do a graphics context save and restore, even for text (since text apparently is using the graphics state, although I want to do some more testing to confirm that). There is a text context, but there doesn't appear to be a built-in save and restore for it — maybe some code could be added to text save() and restore() to reset the text context to default?

The burden on the programmer might be eased by building this into PDF::API2, under a global control switch (such as ...->page(-context => 'autosave') or the like). It would automatically do the appropriate graphics (and text) saves and restores at the beginning of an object. The default would be "off", so that existing code continues to behave as it does today unless the programmer opts in. Suggestions? Would it be useful at all to start a new PDF object/stream at every change of PDF::API2 object (again, switchable)?
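To make the "autosave" idea concrete, here is a rough sketch in plain Perl of what such a switch might emit. Nothing here is existing PDF::API2 API: autosave_wrap is a hypothetical helper and the streams are plain strings. Wrapping every stream in its own q ... Q pair is a simpler variant of the restore-at-start, save-at-end scheme discussed above, and a text object's BT ... ET block is enclosed as a whole.

```perl
use strict;
use warnings;

# Hypothetical helper: enclose each object's stream content in q ... Q.
# With the switch off (the proposed default), streams pass through untouched.
sub autosave_wrap {
    my ($autosave, @streams) = @_;
    return @streams unless $autosave;
    return map { "q\n$_\nQ" } @streams;
}

my @streams = (
    "1 0 0 RG 5 w 10 10 m 100 10 l S",        # graphics: red stroke, wide line
    "BT /F1 12 Tf 72 700 Td (Hello) Tj ET",   # text object (font name /F1 assumed)
);

print "$_\n\n" for autosave_wrap(1, @streams);
```

With the wrapper on, the red stroke colour and the line width set in the first stream are popped before the text stream begins, so each stream starts from the state the page had before it.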

Add: save() and restore() can be called on a text object, but they specifically do nothing. With a graphics object, 'q' or 'Q' is issued. The first thing to consider is issuing 'q' or 'Q' for text objects (to take care of the graphics part of text work). We then need to consider whether something should be done about either saving/restoring text settings, or just resetting them to standard defaults with a reset() method. Finally, should these things be done with explicit calls, or a global or page switch?
« Last Edit: April 11, 2017, 07:08:16 PM by Phil »

*

sciurius
« Reply #3: April 12, 2017, 07:55:01 AM »
Basically, text is a subset of graphics. As the PDF ref says: "The text state comprises those graphics state parameters that only affect text."

So the biggest confusion may come from the page->gfx and page->text methods, which suggest that text objects and graphics objects are distinct.

The only thing that distinguishes a text object from a graphics object is the BT and ET operators, which affect the transformation matrix only.

What about introducing a gfx->text method to make it clear that (these) text objects are also graphics objects, and that the rules for gfx->save and gfx->restore apply?
I would certainly not add q/Q automatically since only the programmer knows when/what to save/restore.

*

Phil
« Reply #4: April 12, 2017, 08:47:15 AM »
That's an interesting look at the problem — that text and graphics are too artificially separated, and that text is really a superset of graphics. Or is it a subset, as you said? Text has a lot of graphics in it, but I'm not sure one is a pure superset of the other, and so gfx()->text() may or may not work well. There are font-related operators and settings (e.g., leading), and its own transformation matrix.

Anyway, the case remains that when working with a text object, there are a lot of graphics (gfx) operators and settings in play, as well as some things unique to text, which are not saved and restored at this time. What should someone using multiple text or graphic objects expect (more or less intuitively) for how much effect they have on each other (settings in one bleeding into the other)? Should text get its own save() and restore()? What should be saved and restored (or just reset) when a new object/stream is started? How mandatory should this be? And while we're at it, let's not break existing code!

*

sciurius
« Reply #5: April 13, 2017, 03:13:52 AM »
PDF Reference version 1.7 ch 5 "Text" (p. 387):
Quote
Text state. A subset of the graphics state parameters pertain to text, including
parameters that select the font, scale the glyphs to an appropriate size, and
accomplish other graphical effects.

When text is a subclass of gfx, it will be clear that state is saved with gfx->save and it can be documented that text->save does nothing (or warns).

To not break existing code I think it is best to not touch the current page->text objects.

What matters is the behaviour of the transformation matrix. Translate/rotate/scale/skew currently do not accumulate for text, at least not in a way that I have been able to understand. It seems that every text operation (translate, scale, ...) starts with a fresh matrix. For example,

Code:
$text->translate( 100, 500 );
$text->text("text @ 100,500");
$text->translate( 100, 100 );
$text->text("text @ 100,100, not 200,600");

In particular, this loses the translation:

Code:
$text->translate(100, 100);
$text->rotate(30);
$text->text("rotated at 0,0");

while this works:

Code:
$text->transform( -translate => [ 100, 100 ],
  -rotate => 30 );
$text->text("rotated @ 100,100");
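One possible explanation, offered as an assumption rather than something checked against the PDF::API2 source: if the text methods write a text matrix (the PDF Tm operator, which replaces the previous text matrix), while the gfx methods write cm (which concatenates onto the current transformation matrix), then text transformations would not accumulate. The plain-Perl sketch below models only that difference; translate_matrix, mat_mul, apply_cm, and apply_tm are hypothetical names.

```perl
use strict;
use warnings;

# PDF matrices are written [a b c d e f]; e and f hold the translation.
sub translate_matrix { my ($tx, $ty) = @_; return [ 1, 0, 0, 1, $tx, $ty ] }

# Product M x N in the PDF row-vector convention.
sub mat_mul {
    my ($m, $n) = @_;
    my ($a1, $b1, $c1, $d1, $e1, $f1) = @$m;
    my ($a2, $b2, $c2, $d2, $e2, $f2) = @$n;
    return [
        $a1 * $a2 + $b1 * $c2,        $a1 * $b2 + $b1 * $d2,
        $c1 * $a2 + $d1 * $c2,        $c1 * $b2 + $d1 * $d2,
        $e1 * $a2 + $f1 * $c2 + $e2,  $e1 * $b2 + $f1 * $d2 + $f2,
    ];
}

# cm: the new matrix is concatenated onto the current one.
sub apply_cm { my ($current, $new) = @_; return mat_mul($new, $current) }

# Tm: the new matrix simply replaces the current text matrix.
sub apply_tm { my ($current, $new) = @_; return $new }

my $identity = [ 1, 0, 0, 1, 0, 0 ];
my @steps    = ( translate_matrix(100, 500), translate_matrix(100, 100) );

my ($ctm, $tm) = ($identity, $identity);
for my $step (@steps) {
    $ctm = apply_cm($ctm, $step);
    $tm  = apply_tm($tm,  $step);
}

printf "cm (concatenate): origin at %d,%d\n", @{$ctm}[4, 5];   # 200,600
printf "Tm (replace):     origin at %d,%d\n", @{$tm}[4, 5];    # 100,100
```

Under that assumption, a rotate issued after a translate would likewise write a fresh matrix and drop the translation, while a single transform call builds one combined matrix.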
« Last Edit: April 13, 2017, 05:22:43 AM by sciurius »

*

Phil
« Reply #6: April 13, 2017, 07:23:55 PM »
I'm trying myself to get a handle on the difference in transformation matrices between graphics and text. Other than that, it appears to me that "text" equals all "graphics" capabilities plus a handful of text-specific settings and operations. That would make text a superset of graphics, except for some reason, the coordinate transformations. If you drew a Venn diagram, you would probably see a great deal of overlap between the graphics and text sides, but not one entirely within the other.

Regarding whether save() and restore() should be enabled for "text" objects, that is, the output of 'q' and 'Q' PDF operators, I'm leaning towards doing that (removing the "graphics-only" test). My reasoning is that so much context is passed between graphics and text states. Color, line width, dash, etc. set in text show up in the next graphics object/stream, and vice-versa. It's difficult to control this if only "graphics" can do save and restore, because (keep in mind) all of a PDF::API2 graphics object's operators are consolidated (in order) into one PDF object/stream. Instead of graphics and text operations being intertwined as you go along, they are strictly separated, and which comes first depends on the order of creation of the graphics and text objects. If you could keep very close tabs on all calls, you might be able to force save() and restore() into the very beginning and end of a graphics object, but it would not be in a very logical order (i.e., hard to understand and easy to screw up).

I think that any call to textobj->save() in use (currently a no-op) would have been put there with the expectation that it would do something. If a programmer realized that it was a no-op, they wouldn't have put it in. Therefore, I don't think that enabling it to work in a text object is going to break any reasonably properly written code. I'm sure there is plenty of code written without understanding of the O-O interactions that I have just documented, complete with lots of "just in case" code, ad hoc settings, and Hail Mary passes, but in any case these ought to be cleaned up.

Some testing:

The following GRAPHICS/TEXT settings are saved and restored:
linewidth(), strokecolor(), fillcolor(), linecap(), linejoin(), miterlimit(), linedash(), flatness() [I think so, but I still can't see any difference between flatness settings, per RT 98539], current transformation matrix, and current clipping path

The following TEXT settings would be saved and restored if save() and restore() were enabled for TEXT:
charspace(), wordspace(), hscale(), lead(), font(), render(), rise()

The following have no PDF::API2 calls and were not tested (most, if not all, can probably be done with ExtGState):
rendering intent (graphics), stroke adjustment, blend mode, soft mask, alpha constant, alpha source, "current color space", "current color" (if different from stroke color and fill color), overprint, overprint mode, black generation, undercolor removal, transfer, halftone, smoothness, text knockout (TEXT setting)
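The save/restore behaviour tested above can be pictured as a stack of whole-state snapshots. The sketch below is a toy interpreter in plain Perl, not PDF::API2 code; run_ops and the three-parameter state are hypothetical simplifications. It assumes that text settings such as charspace are stored in the same graphics state that q pushes and Q pops, which is what the PDF Reference quote earlier in the thread suggests.

```perl
use strict;
use warnings;

# Run a list of [name, value] operations against one shared state.
# 'q' pushes a copy of the whole state; 'Q' pops it back.
sub run_ops {
    my (@ops) = @_;
    my %state = ( linewidth => 1, fillcolor => 'black', charspace => 0 );
    my @stack;
    for my $op (@ops) {
        my ($name, $value) = @$op;
        if    ($name eq 'q') { push @stack, +{ %state } }              # snapshot
        elsif ($name eq 'Q') { %state = %{ pop(@stack) } if @stack }   # restore
        else                 { $state{$name} = $value }
    }
    return \%state;
}

my @changes = ( [ linewidth => 5 ], [ charspace => 2 ], [ fillcolor => 'red' ] );

my $leaky   = run_ops(@changes);
my $guarded = run_ops([ 'q' ], @changes, [ 'Q' ]);

for my $case ( [ 'without q/Q', $leaky ], [ 'with q/Q', $guarded ] ) {
    my ($label, $s) = @$case;
    print "$label: ", join(', ', map { "$_=$s->{$_}" } sort keys %$s), "\n";
}
# without q/Q: charspace=2, fillcolor=red, linewidth=5
# with q/Q: charspace=0, fillcolor=black, linewidth=1
```

In this model the text-only parameter is restored along with the graphics ones, matching the second list above.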


« Last Edit: April 17, 2017, 10:57:45 PM by Phil »

*

Phil
« Reply #7: April 18, 2017, 10:06:06 AM »
Besides the various settings calls in Content.pm, ExtGState.pm can also be used to set an extended graphics state. There is some overlap between the two (linewidth, linejoin, linecap, linedash, miterlimit, and flatness), in that the same PDF graphics state can be changed, but there is no coordination between Content and ExtGState as to who last set what, and new settings made by one aren't known by the other (such as when returning the current state of a setting).

Also be careful about expectations that the returned value of a setting is correct at any given time. Remembering that all actions for a given PDF::API2 text or graphics object are gathered up into a single PDF object and stream, overall operations might not necessarily be performed in exactly the same order as they appear in the source code (particularly if intermixing operations on different PDF::API2 objects). A setting change (e.g., fillcolor) made to $text1 will be unchanged when later queried under $text1->fillcolor(), according to PDF::API2, but it's somewhat possible that something happened in-between due to other objects being processed, or (much more likely), the entry conditions to the PDF object/stream are not what you expected, given your ordering of operations and intermixing of various objects.

*

Phil
« Reply #8: April 18, 2017, 10:17:04 AM »
Should we consider offering a new object (class) for PDF::API2, which
  • Draws no distinction between text and graphics.
  • Permits only one instance in a run, either through setting a switch and checking it at new(), or simply having all instances of this object pool their settings into one place.
It can't replace the existing text and graphics objects (classes), as those are already in use. But, it might be closer to what PDF is actually doing when it processes the stream. Would this be more, or less, confusing than what is currently available?

It could be automatically created, since there will be only one per page, or it might even just be part of $page:
Code:
use PDF::API2;

my $pdf  = PDF::API2->new();
my $page = $pdf->page();
$page->strokecolor('blue');   # proposed call, not an existing page method
etc., unless this confusingly overloads the page object.
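As a rough illustration of that idea, here is a plain-Perl sketch of a single per-page content object that keeps text and graphics calls in one stream, in call order. The class name CombinedContent and its methods are hypothetical and not part of PDF::API2; the operators it writes are ordinary PDF content-stream operators, colours are given as RGB triples for simplicity, and the font resource name /F1 is assumed.

```perl
use strict;
use warnings;

package CombinedContent;   # hypothetical; not a PDF::API2 class

sub new { my ($class) = @_; return bless { ops => [] }, $class }

# Every call appends to the same list, so call order is stream order.
sub _add {
    my ($self, @tokens) = @_;
    push @{ $self->{ops} }, join(' ', @tokens);
    return $self;
}

sub strokecolor { my ($self, @rgb) = @_; return $self->_add(@rgb, 'RG') }

sub line {
    my ($self, $x1, $y1, $x2, $y2) = @_;
    return $self->_add($x1, $y1, 'm', $x2, $y2, 'l', 'S');
}

# Note: no escaping of parentheses in $string; this is only a sketch.
sub text {
    my ($self, $x, $y, $string) = @_;
    return $self->_add('BT', '/F1', 12, 'Tf', $x, $y, 'Td', "($string)", 'Tj', 'ET');
}

sub stream { my ($self) = @_; return join("\n", @{ $self->{ops} }) }

package main;

my $content = CombinedContent->new();
$content->strokecolor(0, 0, 1);          # blue
$content->line(72, 690, 300, 690);
$content->text(72, 700, 'Heading');
print $content->stream(), "\n";
```

Because there is only one operator list, the order in the source is the order in the stream, and there is no second object whose entry state could surprise you.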

If there are any size limits on how much can be in a PDF object/stream, it could automatically split up content into multiple PDF objects.

*

sciurius
« Reply #9: April 19, 2017, 04:09:31 PM »
Quote
A setting change (e.g., fillcolor) made to $text1 will be unchanged when later queried under $text1->fillcolor() ...

AFAIK, such a change is registered in the (internal) object so a subsequent query will return the updated value.

*

sciurius
« Reply #10: April 19, 2017, 04:20:46 PM »
Quote
If there are any size limits on how much can be in a PDF object/stream, it could automatically split up content into multiple PDF objects.

If you need to split a stream at a given point, can you transfer the actual graphics state to the second part of the stream?

*

Phil
« Reply #11: April 19, 2017, 09:03:02 PM »
Quote
AFAIK, such a change is registered in the (internal) object so a subsequent query will return the updated value.

Yes, it should, but I want to make sure we have our bases covered regarding something being executed in another PDF stream that might possibly change a PDF setting in-between PDF::API2 setting a value, and the actual execution of the later PDF command. Offhand, I can't think of any situation where this might occur, but still, I have this nagging feeling in the back of my head that there might be some odd edge case where something could appear to change asynchronously, especially if you mix Content methods and ExtGState methods.

*

Phil
« Reply #12: April 19, 2017, 09:08:33 PM »
Quote
If you need to split a stream at a given point, can you transfer the actual graphics state to the second part of the stream?

So long as you guarantee that the continuation object/stream comes immediately after its predecessor (no other objects between them), I don't see why the graphics state wouldn't automatically be the same. I'm not aware of PDF doing any graphics state reset at the beginning of a stream (the whole point of my discussion about object interactions with PDF processing).
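A small sketch of such a split, in plain Perl. split_ops is a hypothetical helper and the size limit is arbitrary; the only assumptions are the one stated above (the parts follow each other directly) plus cutting only between whole tokens. As far as I know, the PDF Reference treats a page's Contents array as one concatenated stream, which would be why no state needs to be copied.

```perl
use strict;
use warnings;

# Split a list of tokens into parts of at most $max characters each.
# A single token longer than $max gets a part to itself.
sub split_ops {
    my ($max, @tokens) = @_;
    my @parts = ('');
    for my $tok (@tokens) {
        my $candidate = length($parts[-1]) ? "$parts[-1] $tok" : $tok;
        if (length($candidate) > $max && length($parts[-1])) {
            push @parts, $tok;           # start a new part at a token boundary
        }
        else {
            $parts[-1] = $candidate;
        }
    }
    return @parts;
}

# The q at the start and the Q at the end land in different parts.
my @tokens = qw(q 1 0 0 RG 5 w 10 10 m 100 10 l S Q);
my @parts  = split_ops(16, @tokens);

print "part $_: $parts[$_]\n" for 0 .. $#parts;
print "rejoined parts match the original\n"
    if join(' ', @parts) eq join(' ', @tokens);
```

Since the parts rejoin to exactly the original token sequence, a reader that processes them back to back sees the same operators in the same order as in the unsplit stream.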

Does anyone know offhand if there is a size limit to a PDF stream? Possibly something to do with the integer size (e.g., 16 or 32 bit) used to hold the stream length?