Render tree¶
For historical reasons, librsvg’s code flow during rendering is as
follows.  The rendering code traverses the SVG tree of elements, and
for each one, its ::draw() method is called; its signature looks
like this (some arguments omitted):
pub fn draw(
    &self,
    ...
    draw_ctx: &mut DrawingCtx,
) -> Result<BoundingBox, RenderingError> { ... }
The draw() methods perform the actual rendering as side effects on the
draw_ctx, and return a BoundingBox.  That is, the bounding box of
an element is computed at the same time that it is rendered.  This is
suboptimal for several reasons:
Many things that happen during rendering depend on knowing the bounding box. For example, gradients, patterns, and filters with units set to objectBoundingBox need to know the bounds. The rendering code in drawing_ctx.rs is cluttered because it must resolve bounding boxes very late. This is especially problematic for filters, since a Cairo surface needs to be created before rendering, and that surface should have a size relative to the bounding box of the element being filtered! Bug #1 is precisely about this: librsvg instead creates a temporary surface as big as the document’s toplevel viewport and filters it, but this doesn’t work well for filters like Gaussian blur that should actually reference pixels outside of the document’s area (think of a shape that extends past the document’s area, which then gets blurred).
The way for an element to signal that it is not drawable (e.g. <defs>) is by returning an empty bounding box and not rendering anything. This is awkward.
When rendering to a temporary surface for filtering or masking, there is a set of affine transformations that needs to be maintained carefully: an affine for the clipping path outside the temporary surface, an affine for drawing inside the surface, an affine to composite the surface into the final result. This is hard to understand and hard to test.
These problems can be solved by having a render tree.
What is a render tree?¶
As of 2022/Oct/06, librsvg does not compute a render tree data
structure prior to rendering.  Instead, in a very 2000s fashion, it
walks the tree of elements and calls a .draw() method for each
one.  Each element then calls whatever methods it needs from
DrawingCtx to draw itself.  Elements which don’t produce graphical
output (e.g. <defs> or <marker>) simply have an empty
draw() method.
Over time we have been refactoring that in the direction of actually being able to produce a render tree. What would that look like? Consider an SVG document like this:
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <defs>
    <rect id="TheRect" x="10" y="10" width="20" height="20" fill="blue"/>
  </defs>
  <g>
    <use href="#TheRect" stroke="red" stroke-width="2"/>
    <circle cx="50" cy="50" r="20" fill="yellow"/>
  </g>
</svg>
A render tree would be a list of nested instructions like this:
group {                            # refers to the toplevel SVG
  width: 100
  height: 100
  establishes_viewport: true       # because it is an <svg> element
  children {
    group {                        # refers to the <g>
      establishes_viewport: false  # because it is a simple <g>
      children {
        shape {
          path="the <rect> above but resolved to path commands"
          # note how the following is the cascaded style and the <use> semantics
          fill: blue
          stroke: red
          stroke-width: 2
        }
        shape {
          path="the <circle> above but resolved to path commands"
          fill: yellow
        }
      }
    }
  }
}
That is, we take the high-level SVG instructions and “lower” them to a few possible drawing primitives like path-based shapes that can be grouped. All the primitives have everything that is needed to draw them, like their set of computed values for styles, and their coordinates resolved to their user-space coordinate system.
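As a rough illustration, the nested instructions above could be modeled with a small recursive enum. This is only a sketch: `RenderNode`, `Style`, `count_shapes`, and `example_tree` are hypothetical names made up for this example, not librsvg's actual types, and the path data is a placeholder string rather than real path commands.

```rust
// Hypothetical render-tree types for illustration; NOT librsvg's structs.
#[derive(Debug, Clone, PartialEq)]
pub struct Style {
    pub fill: Option<String>,
    pub stroke: Option<String>,
    pub stroke_width: f64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum RenderNode {
    Group {
        establishes_viewport: bool,
        children: Vec<RenderNode>,
    },
    Shape {
        // Placeholder standing in for resolved path commands.
        path: String,
        // Fully cascaded style; nothing is left to look up elsewhere.
        style: Style,
    },
}

/// Counts the shape primitives in a subtree.
pub fn count_shapes(node: &RenderNode) -> usize {
    match node {
        RenderNode::Group { children, .. } => children.iter().map(count_shapes).sum(),
        RenderNode::Shape { .. } => 1,
    }
}

/// Builds the tree for the example document above (width/height omitted).
pub fn example_tree() -> RenderNode {
    RenderNode::Group {
        establishes_viewport: true, // the toplevel <svg>
        children: vec![RenderNode::Group {
            establishes_viewport: false, // the plain <g>
            children: vec![
                RenderNode::Shape {
                    path: "rect resolved to path commands".to_string(),
                    style: Style {
                        fill: Some("blue".into()),  // from the <rect>
                        stroke: Some("red".into()), // from the <use>
                        stroke_width: 2.0,
                    },
                },
                RenderNode::Shape {
                    path: "circle resolved to path commands".to_string(),
                    style: Style {
                        fill: Some("yellow".into()),
                        stroke: None,
                        stroke_width: 1.0, // initial value of stroke-width
                    },
                },
            ],
        }],
    }
}
```

Note that the `<defs>` and `<use>` elements do not appear in the tree at all; only their effects on the resulting primitives do.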
Browser engines produce render trees more or less similar to the above (they don’t always call them that), and get various benefits:
The various recursively-nested subtrees can be rendered concurrently.
Having low-level primitives makes it easier to switch to another rendering engine in the future.
The tree can be re-rendered without recomputation, or subtrees can be recomputed efficiently if e.g. an animated element changes a few of its properties.
Why did librsvg not do that since the beginning?¶
Librsvg was originally written in the early 2000s, when several things were happening at the same time:
libxml2 (one of the early widely-available parsers for XML) had recently gotten a SAX API for parsing XML. This lets an application stream in the parsed XML elements and process them one by one, without having to build a tree of elements+attributes first. In those days, memory was at a premium and “not producing a tree” was seen as beneficial.
The SVG spec itself was being written, and it did not have all of the features we know now. In particular, maybe at some point it didn’t have elements that worked by referencing others, like <use> or <filter>. The CSS cascade could be done on the fly for the XML elements being streamed in, and one could emit rendering commands for each element to produce the final result.
That is, at that time, it was indeed feasible to do this: stream in parsed XML elements one by one as produced by libxml2, and for each element, compute its CSS cascade and render it.
This scheme probably stopped working at some point when SVG got
features that allowed referencing elements that have not been declared
yet (think of <use href="#foo"/> with the <defs> <path
id="foo" .../> </defs> not declared until later in the document), or
elements that reference others, like <rect filter="url(#blah)">.
In both cases, one needs to actually build an in-memory tree of parsed
elements, and then resolve the references between them.
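A minimal sketch of why the tree must exist first: references can only be resolved in a second pass, after every id in the document has been seen. The `Element` record and the `index_ids` / `resolve_refs` functions are hypothetical, made up for this example; librsvg's real node types and lookup code are different.

```rust
use std::collections::HashMap;

// Hypothetical flat element record; not librsvg's node type.
#[derive(Debug)]
pub struct Element {
    pub name: String,
    pub id: Option<String>,
    // Fragment identifier from an href, without the leading '#'.
    pub href: Option<String>,
}

/// First pass: map every id to the position of its element in the document.
pub fn index_ids(elements: &[Element]) -> HashMap<&str, usize> {
    elements
        .iter()
        .enumerate()
        .filter_map(|(i, e)| e.id.as_deref().map(|id| (id, i)))
        .collect()
}

/// Second pass: resolve each href to the position of its target.
/// Elements without an href, or with a dangling one, yield None.
pub fn resolve_refs(elements: &[Element]) -> Vec<Option<usize>> {
    let ids = index_ids(elements);
    elements
        .iter()
        .map(|e| e.href.as_deref().and_then(|h| ids.get(h).copied()))
        .collect()
}
```

With this shape, a `<use href="#foo"/>` that appears before the element with `id="foo"` resolves just as well as one that appears after it, which a purely streaming renderer cannot do.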
That is where much of the complexity of librsvg’s code flow comes from:
AcquiredNodes is the thing that resolves references when needed. It also detects reference cycles, which are an error.
ComputedValues often do not get resolved until pretty late, by passing the CascadedValues state down to children as they are drawn.
DrawingCtx was originally a giant ball of mutable state, but we have been whittling it down and moving part of that state elsewhere.
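To illustrate the idea of detecting reference cycles, here is a sketch that follows a chain of references and keeps track of the ids that are currently "in progress". It is not how AcquiredNodes is implemented; `follow_refs`, `AcquireError`, and the `links` map are hypothetical names used only for this example.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum AcquireError {
    /// The id does not exist in the document.
    NotFound(String),
    /// Following references led back to an id that is already in progress.
    Cycle(String),
}

/// Follows a chain of references starting at `start`.
/// `links` maps an element id to the id it references, if any.
/// Returns the ids visited in order, or the first error found.
pub fn follow_refs(
    links: &HashMap<String, Option<String>>,
    start: &str,
) -> Result<Vec<String>, AcquireError> {
    let mut in_progress: Vec<String> = Vec::new();
    let mut current = start.to_string();
    loop {
        // Seeing an id twice in the same chain means there is a cycle.
        if in_progress.contains(&current) {
            return Err(AcquireError::Cycle(current));
        }
        let next = links
            .get(&current)
            .ok_or_else(|| AcquireError::NotFound(current.clone()))?;
        in_progress.push(current.clone());
        match next {
            Some(target) => current = target.clone(),
            None => return Ok(in_progress),
        }
    }
}
```

A linear scan of `in_progress` is enough here because reference chains are normally short; a set would be the obvious alternative for long chains.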
Summary of the SVG rendering model¶
In the SVG2 spec, this has been offloaded to the “Order of graphical operations” section of the Compositing and Blending Level 1 spec. Once the render tree is resolved, each node is painted like this, conceptually to a transparent, temporary surface:
Paint the shape/text/etc.
Filters.
Clip paths.
Masks.
Blend/composite the temporary surface onto the result.
The most critical function in librsvg is probably
DrawingCtx::with_discrete_layer();
it implements this drawing model.
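The order of operations listed above can be sketched as a function that emits the steps for a single node. This only models the ordering; it does no drawing, and `LayerParams`, `Step`, and `steps_for` are hypothetical names, not part of DrawingCtx.

```rust
// Hypothetical per-node parameters; not librsvg's StackingContext.
#[derive(Debug)]
pub struct LayerParams {
    pub has_filter: bool,
    pub has_clip: bool,
    pub has_mask: bool,
}

#[derive(Debug, PartialEq)]
pub enum Step {
    PaintContents,
    ApplyFilter,
    ApplyClip,
    ApplyMask,
    Composite,
}

/// Lists the steps for one node, in the order given above.
/// Painting and compositing always happen; the rest are optional.
pub fn steps_for(params: &LayerParams) -> Vec<Step> {
    let mut steps = vec![Step::PaintContents];
    if params.has_filter {
        steps.push(Step::ApplyFilter);
    }
    if params.has_clip {
        steps.push(Step::ApplyClip);
    }
    if params.has_mask {
        steps.push(Step::ApplyMask);
    }
    steps.push(Step::Composite);
    steps
}
```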
Current state (2023/03/30)¶
layout.rs has the beginnings of the render tree.  It’s probably mis-named?  It contains this:
A LayerKind with primitives for path-based shapes, text, and images.
A stacking context, which indicates each layer’s opacity/clip/mask/filters.
A Layer which composes the previous two. The StackingContext provides the compositing/masking/filtering parameters, while the LayerKind determines the primitive contents of the layer.
Various ancillary structures that try to have only user-space coordinates (e.g. a number of CSS pixels instead of 5cm) and no references to other things.
The last point is not yet fully realized.  For example,
StackingContext.clip_in_user_space has a reference to an element,
which will be used as the clip path — that one needs to be normalized
to user-space coordinates in the end.  Also,
StackingContext.filter is a filter list as parsed from the SVG,
not a FilterSpec that has been resolved to user space.
It would be good to resolve everything as early as possible to allow lowering concepts to their final renderable form. Whenever we have done this via refactoring, it has simplified the code closer to the actual rendering via Cairo.
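As an example of what "resolved to user space" means for a length like 5cm, here is a sketch that converts absolute units to CSS pixels. It assumes the CSS definition of 96 pixels per inch; the `Length` enum and `to_px` method are hypothetical and do not mirror librsvg's own length types. Font-relative and percentage units are left out, since they need more context than a single number.

```rust
// Hypothetical length type for illustration; not librsvg's.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Length {
    Px(f64),
    In(f64),
    Cm(f64),
    Mm(f64),
    Pt(f64),
}

// Assumption: the CSS reference ratio of 96 px per inch.
const CSS_PX_PER_INCH: f64 = 96.0;

impl Length {
    /// Converts an absolute length to CSS pixels.
    pub fn to_px(self) -> f64 {
        match self {
            Length::Px(v) => v,
            Length::In(v) => v * CSS_PX_PER_INCH,
            Length::Cm(v) => v * CSS_PX_PER_INCH / 2.54, // 2.54 cm per inch
            Length::Mm(v) => v * CSS_PX_PER_INCH / 25.4, // 25.4 mm per inch
            Length::Pt(v) => v * CSS_PX_PER_INCH / 72.0, // 72 pt per inch
        }
    }
}
```

Once every length in a primitive has gone through a conversion like this, the rendering code no longer needs to know which unit the document author wrote.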
Major subprojects¶
Path-based shapes (layout::Shape) and text primitives
(layout::Text) are almost done.  The only missing thing for shapes
would be to “explode” their markers into the actual primitives that
would be rendered for them.  However…
There is no primitive for groups yet.  Every SVG element that allows
renderable children must produce a group primitive of some sort:
svg, g, use, marker, etc.  Among those, use and
marker are especially interesting since they must explode their
referenced subtree into a shadow DOM, which librsvg doesn’t support
yet for CSS cascading purposes (the referenced subtree gets rendered
properly, but the full semantics of shadow DOM are not implemented
yet).
Elements that establish a viewport (svg, symbol, image,
marker, pattern) need to carry information about this
viewport, which is a viewBox plus preserveAspectRatio and
overflow.  See #298 for a somewhat obsolete description
of the refactoring work needed to unify this logic.
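To show the kind of information a viewport has to carry, here is a sketch of how a viewBox is fitted into a viewport for the default preserveAspectRatio value, "xMidYMid meet": scale uniformly so the whole viewBox fits, then center it. The `Rect` and `Fit` types and the `fit_mid_meet` function are hypothetical; the other alignment and slice modes, and overflow, are not handled.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

/// Uniform scale plus translation: x' = x * scale + tx, y' = y * scale + ty.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Fit {
    pub scale: f64,
    pub tx: f64,
    pub ty: f64,
}

/// Maps `view_box` into `viewport` for preserveAspectRatio="xMidYMid meet".
/// Returns None for an empty viewBox, which has nothing to fit.
pub fn fit_mid_meet(view_box: Rect, viewport: Rect) -> Option<Fit> {
    if view_box.width <= 0.0 || view_box.height <= 0.0 {
        return None;
    }
    // "meet": use the smaller scale so the whole viewBox stays visible.
    let scale = (viewport.width / view_box.width).min(viewport.height / view_box.height);
    // "xMid" / "yMid": split the leftover space evenly on both sides.
    let tx = viewport.x + (viewport.width - view_box.width * scale) / 2.0 - view_box.x * scale;
    let ty = viewport.y + (viewport.height - view_box.height * scale) / 2.0 - view_box.y * scale;
    Some(Fit { scale, tx, ty })
}
```

For instance, a viewBox of 200×100 in a 100×100 viewport gets a scale of 0.5 and is pushed down by 25 units so that it ends up vertically centered.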
The layout::StackingContext struct should contain another field,
probably called layer, with something like this:
struct StackingContext {
    // ... all its current fields
    layer: Layer, // the thing this stacking context draws
}
enum Layer {
    Shape(Box<Shape>),
    Text(Box<Text>),
    // A nested stacking context, e.g. for a group.
    StackingContext(Box<StackingContext>),
}
That is, every stacking context should contain the thing that it will draw, and that thing may be a shape/text or another stacking context!
As of 2023/03/30, the “current viewport” is no longer part of
DrawingCtx’s mutable state.  Instead, a Viewport struct is passed
down the call chain via a function argument.  This is not complete
yet, since the code modifies the transform of the current Cairo context (cr) separately from
the current viewport’s transform.  The goal is to have the current
viewport actually have the full transform to be applied to the object
being rendered.  This should simplify gnarly code paths like the one
for rendering <pattern>.
Bounding boxes¶
SVG depends on the objectBoundingBox of an element in many places:
to resolve a gradient’s or pattern’s units, to determine the size of
masks and clips, to determine the size of the filter region.
The current big bug to solve is #778, which requires
knowing the objectBoundingBox of an element before rendering
it, so that a temporary surface of the appropriate size can be created
for rendering the element if it has isolated opacity or masks/filters.
Currently librsvg creates a temporary surface with the size and
position of the toplevel viewport, and this is wrong for shapes that
fall outside the viewport.
The problem is that librsvg computes bounding boxes at the time of
rendering, not before that.  However, now layout::Shape and
layout::Text already know their bounding box beforehand.  Work
needs to be done to do the same for a layout::Group or whatever
that primitive ends up being called (by taking the union of its
children’s bounding boxes so that, e.g., a group with a filter can
create a temporary surface to be able to render all of its children
and then filter the surface).
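The union mentioned above could look something like this sketch. It assumes all the children's boxes are already expressed in the group's coordinate system, and that children without graphical output simply have no box; `Rect`, `union`, and `group_bbox` are hypothetical names, not librsvg's BoundingBox API.

```rust
/// Axis-aligned rectangle given by its two corners.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x0: f64,
    pub y0: f64,
    pub x1: f64,
    pub y1: f64,
}

impl Rect {
    /// Smallest rectangle that contains both `self` and `other`.
    pub fn union(self, other: Rect) -> Rect {
        Rect {
            x0: self.x0.min(other.x0),
            y0: self.y0.min(other.y0),
            x1: self.x1.max(other.x1),
            y1: self.y1.max(other.y1),
        }
    }
}

/// Bounding box of a group: the union of its children's boxes.
/// Children with no box (None) are skipped; a group where no child
/// has a box has no box either.
pub fn group_bbox(children: &[Option<Rect>]) -> Option<Rect> {
    children.iter().flatten().copied().reduce(Rect::union)
}
```

Because this needs only the children's boxes, it can run before anything is drawn, which is what creating a correctly sized temporary surface requires.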
Being able to compute the objectBoundingBox of an element before
rendering it would open the door to fixing bug #1 (yeah, really):
currently, the temporary surface used for filtering has the size of
the toplevel viewport, but this doesn’t work well when one tries to
Gaussian-blur an element that lies partially outside that viewport.
The filter should apply to the element’s extents plus the filter
region, which takes into account the extra space needed for a Gaussian
blur to work around a shape.  Since librsvg cannot render the full
shape if it lies partially outside of the toplevel viewport, the
blurred result shows up with a halo near the image’s edge, since
transparent pixels get “blurred in” with the shape’s pixels.
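To make the size mismatch concrete, here is a sketch that computes the default filter region from an element's bounding box and checks whether it fits inside the toplevel viewport. It relies on the spec defaults for a filter with filterUnits="objectBoundingBox" (x and y of -10%, width and height of 120%); `Rect`, `default_filter_region`, and `contains` are hypothetical names for this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

/// Default filter region for a bounding box:
/// x = y = -10%, width = height = 120% of the box.
pub fn default_filter_region(bbox: Rect) -> Rect {
    Rect {
        x: bbox.x - 0.1 * bbox.width,
        y: bbox.y - 0.1 * bbox.height,
        width: 1.2 * bbox.width,
        height: 1.2 * bbox.height,
    }
}

/// True if `inner` lies entirely within `outer`.
pub fn contains(outer: Rect, inner: Rect) -> bool {
    inner.x >= outer.x
        && inner.y >= outer.y
        && inner.x + inner.width <= outer.x + outer.width
        && inner.y + inner.height <= outer.y + outer.height
}
```

For a 20×20 shape at (90, 10) in a 100×100 document, the region is 24×24 at (88, 8) and extends to x = 112, past the viewport's right edge. A temporary surface the size of the viewport cannot hold those pixels, which is the situation that produces the halo described above.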