DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation
Abstract
Scalable Vector Graphics (SVG) are ubiquitous in modern 2D interfaces due to their ability to scale to different resolutions. However, despite the success of deep learning-based models applied to rasterized images, the problem of vector graphics representation learning and generation remains largely unexplored. In this work, we propose a novel hierarchical generative network, called DeepSVG, for complex SVG icons generation and interpolation. Our architecture effectively disentangles high-level shapes from the low-level commands that encode the shape itself. The network directly predicts a set of shapes in a non-autoregressive fashion. We introduce the task of complex SVG icons generation by releasing a new large-scale dataset along with an open-source library for SVG manipulation. We demonstrate that our network learns to accurately reconstruct diverse vector graphics, and can serve as a powerful animation tool by performing interpolations and other latent space operations. Our code is available at https://github.com/alexandre01/deepsvg.
1 Introduction
Despite the recent success of rasterized image generation and content creation, little effort has been directed towards the generation of vector graphics. Yet, vector images, often in the form of Scalable Vector Graphics w3c_svg (SVG), have become a standard in digital graphics, publication-ready image assets, and web-animations. The main advantage over their rasterized counterpart is their scaling ability, making the same image file suitable for both tiny web-icons and billboard-scale graphics. Generative models for vector graphics could serve as powerful tools, allowing artists to generate, manipulate, and animate vector graphics, potentially enhancing their creativity and productivity.
Raster images are most often represented as a rectangular grid of pixels containing a shade or color value. The recent success of deep learning on these images owes much to the effectiveness of convolutional neural networks (CNNs) convolutional1998, which learn powerful representations by taking advantage of the inherent translational invariance. On the other hand, vector images are generally represented as lists of 2D shapes, each encoded as a sequence of 2D points connected by parametric curves. While this brings the task of learning SVG representations closer to that of sequence generation, there are fundamental differences with other applications, such as Natural Language Processing. For instance, similar to the translation invariance in raster images, an SVG image experiences permutation invariance, as the order of shapes in an SVG image is arbitrary. This brings important challenges in the design of both architectures and learning objectives.
We address the task of learning generative models of complex vector graphics. To this end, we propose a Hierarchical Transformer-based architecture that effectively disentangles high-level shapes from the low-level commands that encode the shape itself. Our encoder exploits the permutation invariance of its input by first encoding every shape separately, then producing the latent vector by reasoning about the relations between the encoded shapes. Our decoder mirrors this 2-stage approach by first predicting, in a single forward pass, a set of shape representations along with their associated attributes. These vectors are finally decoded into sequences of draw commands, which combined produce the output SVG image. A schematic overview of our architecture is given in Fig. 4.
Contributions. Our contributions are threefold:
1. We propose DeepSVG, a hierarchical transformer-based generative model for vector graphics. Our model is capable of both encoding and predicting the draw commands that constitute an SVG image.
2. We perform comprehensive experiments, demonstrating successful interpolation and manipulation of complex icons in vector-graphics format. Examples are presented in Fig. 1.
3. We introduce a large-scale dataset of SVG icons along with a framework for deep learning-based SVG manipulation, in order to facilitate further research in this area.
To the best of our knowledge, this is the first work to explore generative models of complex vector graphics, and to show successful interpolation and manipulation results for this task.
2 Related Work
Previous works alex2017logo; mino2018logan on icon and logo generation mainly address rasterized images, by building on Generative Adversarial Networks goodfellow2014gan. Unlike raster graphics, vector graphics generation has not received extensive attention yet, and has been mostly limited to the simplified task of sketch generation, using the 'Quick, Draw!' quickdraw dataset. SketchRNN ha2017sketchrnn was the first Long Short-Term Memory (LSTM) lstm based variational auto-encoder (VAE) kingma2013vae addressing the generation of sketches. More recently, Sketchformer ribeiro2020sketchformer has shown that a Transformer-based architecture enables more stable interpolations between sketches, without tackling the generation task. One reason for this success is the ability of transformers vaswani2017transformer to more effectively represent long temporal dependencies.
SVG-VAE lopes2019svgvae was one of the first deep learning-based works that generate full vector graphics outputs, composed of straight lines and Bézier curves. However, it only tackles glyph icons, without global attributes, using an LSTM-based model. In contrast, we consider the hierarchical nature of SVG images, crucial for representing and generating complex vector graphics, such as icons. Fig. 4 compares previous one-stage autoregressive approaches ha2017sketchrnn; lopes2019svgvae; ribeiro2020sketchformer to our hierarchical architecture. Our work is also related to the very recent PolyGen nash2020polygen, which generates 3D polygon meshes by sequentially predicting vertices and faces with a Transformer-based architecture.
3 DeepSVG
Here, we introduce our DeepSVG method. First, we propose a dataset of complex vector graphics and describe the SVG data representation in Sec. 3.1. We describe our learned embedding in Sec. 3.2. Finally, we present our architecture in Sec. 3.3 and training strategy in Sec. 3.4.
3.1 SVG Dataset and Representation
SVG-Icons8 Dataset. Existing vector graphics datasets either only contain straight lines quickdraw or are constrained to font generation lopes2019svgvae. These datasets therefore do not pose the challenges associated with the generation of complex vector graphics, addressed in this work. Thus, we first introduce a new dataset, called SVG-Icons8. It is composed of SVG icons obtained from the https://icons8.com website. In the compilation of the dataset, we carefully considered the consistency and diversity of the collected icons. This was mainly performed by ensuring that the vector graphics have similar scale, colors and style, while capturing diverse real-world graphics, so that meaningful and generalizable shape representations can be learned. In summary, our dataset consists of 100,000 high-quality icons in 56 different categories. Samples from the dataset are shown in Fig. 5. We believe that the SVG-Icons8 dataset, which is to be released upon acceptance, will serve as a challenging new benchmark for the growing task of vector graphics generation and representation learning.
Vector Graphics and SVG. In contrast to raster graphics, where the content is represented by a rectangular grid of pixels, vector graphics in essence employ mathematical formulas to encode different shapes. Importantly, this allows vector graphics to be scaled without any aliasing or loss in detail. Scalable Vector Graphics (SVG) is an XML-based format for vector graphics w3c_svg. In its simplest form, an SVG image is built up hierarchically as a set of shapes, called paths. A path is itself defined as a sequence of specific draw-commands (see Tab. 1) that constitute a closed or open curve.
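Command | Arguments | Description
<SOS> | ∅ | 'Start of SVG' token.
m (MoveTo) | $x_2$, $y_2$ | Move the cursor to the end-point $(x_2, y_2)$ without drawing anything.
l (LineTo) | $x_2$, $y_2$ | Draw a line to the point $(x_2, y_2)$.
c (Cubic Bézier) | $q_{x1}$, $q_{y1}$, $q_{x2}$, $q_{y2}$, $x_2$, $y_2$ | Draw a cubic Bézier curve with control points $(q_{x1}, q_{y1})$, $(q_{x2}, q_{y2})$ and end-point $(x_2, y_2)$.
z (ClosePath) | ∅ | Close the path by moving the cursor back to the path's starting position.
<EOS> | ∅ | 'End of SVG' token.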
Data structure. In order to learn deep neural networks capable of encoding and predicting vector graphics, we first need a well defined and simple representation of the data. This is obtained by adopting the SVG format with the following simplifications. We employ the commands listed in Tab. 1. In fact, this does not significantly reduce the expressivity since other basic shapes can be converted into a sequence of Bézier curves and lines. We consider a vector graphics image $V = \{P_1, \dots, P_{N_P}\}$ to be a set of $N_P$ paths $P_i$. Each path is itself defined as a triplet $P_i = (S_i, f_i, v_i)$, where $v_i \in \{0, 1\}$ indicates the visibility of the path and $f_i \in \{0, 1, 2\}$ determines the fill property. Each $S_i = (C_i^1, \dots, C_i^{N_C})$ contains a sequence of $N_C$ commands $C_i^j$. The command $C_i^j = (c_i^j, X_i^j)$ itself is defined by its type $c_i^j \in \{\texttt{<SOS>}, \texttt{m}, \texttt{l}, \texttt{c}, \texttt{z}, \texttt{<EOS>}\}$ and arguments, as listed in Tab. 1. To ensure efficient parallel processing, we use a fixed-length argument list $X_i^j = (q_{x1,i}^j, q_{y1,i}^j, q_{x2,i}^j, q_{y2,i}^j, x_{2,i}^j, y_{2,i}^j) \in \mathbb{R}^6$, where any unused argument is set to $-1$. Likewise, we use a fixed number of paths $N_P$ and commands $N_C$ by simply padding with invisible elements in each case. Further details are given in the appendix.
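To make the fixed-length representation concrete, the following is a minimal sketch of how a single path could be turned into padded (command index, argument list) pairs. The integer indices assigned to the commands, the sentinel value `PAD = -1`, and the choice to pad short paths with extra `<EOS>` tokens are assumptions made for illustration; the helper names are hypothetical and are not taken from the released library.

```python
from typing import List, Tuple

# Command vocabulary from Tab. 1; the integer indices are an assumption.
COMMANDS = ["<SOS>", "m", "l", "c", "z", "<EOS>"]
PAD = -1      # assumed sentinel for unused arguments
N_ARGS = 6    # (qx1, qy1, qx2, qy2, x2, y2)

Command = Tuple[str, List[float]]


def encode_command(cmd: Command) -> Tuple[int, List[float]]:
    """Map one (type, args) pair to a command index and a fixed-length argument list."""
    ctype, args = cmd
    if ctype in ("m", "l"):
        full = [PAD] * 4 + list(args)   # only the end-point (x2, y2) is used
    elif ctype == "c":
        full = list(args)               # all six arguments are used
    else:
        full = [PAD] * N_ARGS           # <SOS>, z and <EOS> take no arguments
    if len(full) != N_ARGS:
        raise ValueError(f"wrong number of arguments for {ctype!r}")
    return COMMANDS.index(ctype), full


def encode_path(cmds: List[Command], n_c: int) -> List[Tuple[int, List[float]]]:
    """Wrap a path in <SOS>/<EOS> and pad it to exactly n_c commands."""
    seq = [("<SOS>", [])] + list(cmds) + [("<EOS>", [])]
    if len(seq) > n_c:
        raise ValueError("path longer than n_c commands")
    seq += [("<EOS>", [])] * (n_c - len(seq))   # padding token is an assumption
    return [encode_command(c) for c in seq]


triangle = [("m", [0, 0]), ("l", [10, 0]), ("l", [10, 10]), ("z", [])]
encoded = encode_path(triangle, n_c=8)
```

Every path then has the same shape (`n_c` rows of one index plus six arguments), which is what allows a set of paths to be stacked into a single tensor.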
3.2 SVG Embedding
By the discrete nature of the data and in order to let the encoder reason between the different commands, every command $C_i^j$ is projected to a common continuous embedding space of dimension $d_E$, similarly to the de facto approach used in Natural Language Processing vaswani2017transformer. This enables the encoder to perform operations across embedded vectors and learn complex dependencies between argument types, coordinate values and relative order of commands in the sequence. We formulate the embedding of the SVG command in a fashion similar to child2019sparse_transformer. In particular, the command $C_i^j$ is embedded to a vector $e_i^j \in \mathbb{R}^{d_E}$ as the sum of three embeddings, $e_i^j = e_{\text{cmd},i}^j + e_{\text{coord},i}^j + e_{\text{ind},i}^j$. We describe each individual embedding next.
Command embedding. The command type (see Tab. 1) is converted to a vector of dimension $d_E$ using a learnable matrix $W_{\text{cmd}} \in \mathbb{R}^{d_E \times 6}$ as $e_{\text{cmd},i}^j = W_{\text{cmd}}\, \delta_{c_i^j}$, where $\delta_{c_i^j}$ designates the 6-dimensional one-hot vector containing a 1 at the command index $c_i^j$.
Coordinate embedding. Inspired by works such as PixelCNN oord2016pixelcnn and PolyGen nash2020polygen, which discretize continuous signals, we first quantize the input coordinates to 8 bits. We also include a case indicating that the coordinate argument is unused by the command, thus leading to an input dimension of $2^8 + 1 = 257$ for the embedding itself. Each coordinate is first embedded separately with the weight matrix $W_X \in \mathbb{R}^{d_E \times 257}$. The combined result of each coordinate is then projected to a $d_E$-dimensional vector using a linear layer $W_{\text{coord}} \in \mathbb{R}^{d_E \times 6 d_E}$,
$$e_{\text{coord},i}^j = W_{\text{coord}}\, \mathrm{vec}\big(W_X X_i^j\big), \tag{1}$$
where the six columns of $X_i^j \in \{0, 1\}^{257 \times 6}$ are the one-hot encodings of the quantized arguments. Here, $\mathrm{vec}(\cdot)$ denotes the vectorization of a matrix.
Index embedding. Similar to child2019sparse_transformer, we finally use a learned index embedding (known as positional embedding in the Natural Language Processing literature vaswani2017transformer) that indicates the index of the command in the given sequence using the weight $W_{\text{ind}} \in \mathbb{R}^{d_E \times N_C}$ as $e_{\text{ind},i}^j = W_{\text{ind}}\, \delta_j$, where $\delta_j$ is the one-hot vector of dimension $N_C$ filled with a 1 at index $j$.
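The coordinate embedding relies on mapping each argument to one of 257 discrete rows (256 quantization levels plus one "unused" case). A minimal sketch of that mapping is given below. Reserving row 0 for the unused case, rounding to the nearest level, and passing the coordinate range as an explicit `viewbox` parameter are all assumptions of this sketch, not details confirmed by the text.

```python
from typing import Optional


def quantize(value: float, viewbox: float, bits: int = 8) -> int:
    """Quantize a coordinate in [0, viewbox] to an integer in [0, 2**bits - 1]."""
    levels = 2 ** bits
    q = int(value / viewbox * (levels - 1) + 0.5)   # round to nearest level
    return max(0, min(levels - 1, q))               # clamp out-of-range inputs


def embedding_index(arg: Optional[float], viewbox: float, bits: int = 8) -> int:
    """Row index into a (2**bits + 1)-row embedding table.

    Row 0 is reserved for 'argument unused by this command' (assumed layout);
    rows 1..2**bits hold the quantized coordinate values.
    """
    if arg is None:
        return 0
    return 1 + quantize(arg, viewbox, bits)
```

With `bits=8` the largest index is 256, so an embedding table with 257 rows covers every case.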
3.3 Hierarchical Generative Network
In this section, we describe our Hierarchical Generative Network architecture for complex vector graphics interpolation and generation, called DeepSVG. A schematic representation of the model is shown in Fig. 6. Our network is a variational autoencoder (VAE) kingma2013vae, consisting of an encoder and a decoder network. Both networks are designed by considering the hierarchical representation of an SVG image, which consists of a set of paths, each path being a sequence of commands.
Feed-forward prediction. For every path, we propose to predict the $N_C$ commands $(\hat{c}_i^j, \hat{X}_i^j)$ in a purely feed-forward manner. Our generative model is thus factorized as,
$$p(\hat{V} \mid z, \theta) = \prod_{i=1}^{N_P} p(\hat{v}_i \mid z, \theta)\, p(\hat{f}_i \mid z, \theta) \prod_{j=1}^{N_C} p(\hat{c}_i^j \mid z, \theta)\, p(\hat{X}_i^j \mid z, \theta), \tag{2}$$
where $z$ is the latent vector and $p(\hat{X}_i^j \mid z, \theta)$ further factorizes into the individual arguments. Note that our approach is conceptually different from the autoregressive strategy used in previous works ha2017sketchrnn; lopes2019svgvae, which learns a model predicting the next command conditioned on the history. We found our approach to lead to significantly better reconstructions and smoother interpolations, as analyzed in Sec. 4. Intuitively, the feed-forward strategy forces the network to rely primarily on the latent encoding to reconstruct the input, without taking advantage of the additional information of previous commands and arguments. Importantly, a feed-forward model brings major advantages during training, since inference can be directly modeled during training. On the other hand, autoregressive methods graves2013seq_rnn; vaswani2017transformer condition on ground-truth to ensure efficient training through masking, while the inference stage conditions on the previously generated commands.
Encoder. To keep the permutation invariance property of the path set $\{P_i\}_{i=1}^{N_P}$, we first encode every path $P_i$ independently using the path encoder $E^{(1)}$. More specifically, $E^{(1)}$ takes the embeddings $(e_i^j)_{j=1}^{N_C}$ as input and outputs $N_C$ vectors of the same dimension. To retrieve the single $d_E$-dimensional path encoding $u_i$, we average-pool the output vectors along the sequential dimension. The $N_P$ path encodings $\{u_i\}_{i=1}^{N_P}$ are then input to the encoder $E^{(2)}$ which, after pooling along the set dimension, outputs the parameters of a Gaussian distribution $\hat{\mu}$ and $\hat{\sigma}$. Note how the index embedding in vector $e_i^j$ enables $E^{(1)}$ to reason about the sequential nature of its input while $E^{(2)}$ maintains the permutation invariance of the input paths. The latent vector is finally obtained using the reparametrization trick kingma2013vae as $\hat{z} = \hat{\mu} + \hat{\sigma} \cdot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$.
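Two ingredients of the encoder can be illustrated in isolation: pooling over the set of path encodings, which is what makes the result independent of path order, and the reparametrization trick. The sketch below uses plain Python lists and omits the Transformer blocks entirely, so it only demonstrates these two steps; the function names are hypothetical.

```python
import random
from typing import List


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def reparametrize(mu: List[float], sigma: List[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps drawn from a standard normal."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


# Three toy path encodings, and the same set presented in a different order.
path_encodings = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
shuffled = [path_encodings[2], path_encodings[0], path_encodings[1]]

pooled = mean_pool(path_encodings)
pooled_shuffled = mean_pool(shuffled)   # identical: pooling ignores set order
```

Because the pooled vector does not change when the paths are reordered, any function of it (here, the Gaussian parameters) inherits the permutation invariance.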
Decoder. The decoder mirrors the two-stage construction of the encoder. $D^{(2)}$ inputs the latent vector $z$ repeatedly, at each transformer block, and predicts a representation of each shape in the image. Unlike the corresponding encoder stage, permutation invariance is not a desired property for $D^{(2)}$, since its purpose is to generate the shapes in the image. We achieve this by using a learned index embedding as input to the decoder. The embeddings are thus distinct for each path, breaking the symmetry during generation. The decoder is followed by a Fully Connected Network (FCN) that outputs, for each index $1 \le i \le N_P$, the predicted path encoding $\hat{u}_i$, filling $\hat{f}_i$ and visibility $\hat{v}_i$ attributes. Symmetrically to the encoder, the vectors $\{\hat{u}_i\}_{i=1}^{N_P}$ are decoded by $D^{(1)}$ into the final output path representations $\{(\hat{C}_i^1, \dots, \hat{C}_i^{N_C})\}_{i=1}^{N_P}$. As for $D^{(2)}$, we use learned constant embeddings as input and an FCN to predict the command and argument logits. Detailed descriptions of the architectures are given in the appendix.
Transformer. Inspired by the success of transformer-based architectures for a variety of tasks ribeiro2020sketchformer; transformer_imgcaptioning; child2019sparse_transformer; transformer_questanswering, we also adopt the Transformer as the basic building block for our network. Both the Encoders and the Decoders are Transformer-based. Specifically, as in ribeiro2020sketchformer, we use $L$ layers, with a feed-forward dimension of 512 and an embedding dimension of $d_E$.
3.4 Training Objective
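Next, we present the training loss used by our DeepSVG. We first define the loss between a predicted path $(\hat{S}_i, \hat{f}_i, \hat{v}_i)$ and a ground-truth path $(S_j, f_j, v_j)$ as,
$$L_{i,j}(\theta) = w_{\text{vis}}\, \ell(v_j, \hat{v}_i) + v_j \cdot \Big( w_{\text{fill}}\, \ell(f_j, \hat{f}_i) + \sum_{k=1}^{N_C} \big( w_{\text{cmd}}\, \ell(c_j^k, \hat{c}_i^k) + w_{\text{args}}\, l_{\text{args},i,j}^k \big) \Big). \tag{3}$$
Here, $\ell$ denotes the Cross-Entropy loss. The impact of each term is controlled by its weight $w$. The losses for filling, commands and arguments are masked when the ground-truth path is not visible, which is expressed by the factor $v_j$. The loss $l_{\text{args},i,j}^k$ over the argument prediction is defined as,
$$l_{\text{args},i,j}^k = \mathbb{1}_{c_j^k \in \{\texttt{m}, \texttt{l}, \texttt{c}\}} \big( \ell(x_{2,j}^k, \hat{x}_{2,i}^k) + \ell(y_{2,j}^k, \hat{y}_{2,i}^k) \big) + \mathbb{1}_{c_j^k = \texttt{c}} \big( \ell(q_{x1,j}^k, \hat{q}_{x1,i}^k) + \ell(q_{y1,j}^k, \hat{q}_{y1,i}^k) + \ell(q_{x2,j}^k, \hat{q}_{x2,i}^k) + \ell(q_{y2,j}^k, \hat{q}_{y2,i}^k) \big). \tag{4}$$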
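The argument loss of Eq. (4) only counts the arguments that the ground-truth command actually uses. A minimal sketch of that masking logic follows, under the assumption that the six arguments are ordered (qx1, qy1, qx2, qy2, x2, y2) and that the per-argument losses (e.g. cross-entropies) have already been computed elsewhere; the helper names are hypothetical.

```python
from typing import List, Sequence

ARG_NAMES = ("qx1", "qy1", "qx2", "qy2", "x2", "y2")   # assumed argument order


def argument_mask(cmd_type: str) -> List[int]:
    """0/1 mask over ARG_NAMES selecting the arguments that contribute to the loss."""
    uses_endpoint = cmd_type in ("m", "l", "c")   # end-point (x2, y2)
    uses_controls = cmd_type == "c"               # Bézier control points
    return [int(uses_controls)] * 4 + [int(uses_endpoint)] * 2


def args_loss(cmd_type: str, per_arg_losses: Sequence[float]) -> float:
    """Sum the per-argument losses that survive the mask of the ground-truth command."""
    mask = argument_mask(cmd_type)
    return sum(m * loss for m, loss in zip(mask, per_arg_losses))
```

For a ground-truth `l` command, for instance, only the two end-point terms are summed, so whatever the network predicts for the unused control-point slots is not penalized.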
Having formulated the loss for a single path, the next question is how this can be used to obtain a loss on the entire prediction. Recall that the collection of paths in a vector image has no natural ordering, raising the question of how to assign ground-truth paths to each prediction. Formally, a ground-truth assignment is a permutation $\pi \in \mathcal{S}_{N_P}$, mapping the path index $i$ of the prediction to the corresponding ground-truth path index $\pi(i)$. We discuss two alternatives for solving the ground-truth assignment problem.
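Ordered assignment. One strategy is to define the assignment by sorting the ground-truth paths according to some specific criterion. This induces an ordering $\pi_{\text{ord}}$, which the network learns to reproduce. We found that defining the ground-truth assignment using the lexicographic order of the starting location of the paths yields good results. Given any sorting criterion, the loss is defined as,
$$L(\theta) = w_{\text{KL}}\, \mathrm{KL}\big(p_\theta(z) \,\|\, \mathcal{N}(0, I)\big) + \sum_{i=1}^{N_P} L_{i, \pi_{\text{ord}}(i)}(\theta), \tag{5}$$
where the first term corresponds to the latent space prior induced by the VAE learning and $L_{i,\pi_{\text{ord}}(i)}$ is the path loss of Eq. (3) between prediction $i$ and its assigned ground-truth path.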
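The ordered assignment amounts to sorting the ground-truth paths once, before training. A minimal sketch is shown below, assuming each path is a list of (x, y) points whose first element is the starting location, and assuming "lexicographic" means comparing x first and then y; the text does not say which coordinate takes precedence, so that choice is an assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def start_point(path: List[Point]) -> Point:
    """Starting location of a path given as a list of (x, y) points."""
    return path[0]


def ordered_assignment(paths: List[List[Point]]) -> List[int]:
    """Ground-truth path indices sorted by lexicographic order of start location.

    Element i of the result is the ground-truth index assigned to prediction i.
    Python compares tuples lexicographically, i.e. by x first, then by y.
    """
    return sorted(range(len(paths)), key=lambda i: start_point(paths[i]))


paths = [
    [(5.0, 1.0), (6.0, 2.0)],
    [(0.0, 3.0), (1.0, 1.0)],
    [(0.0, 1.0), (2.0, 2.0)],
]
order = ordered_assignment(paths)
```

Since the ordering depends only on the ground truth, it can be precomputed and adds no cost to the training loop.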
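Hungarian assignment. We also investigate a strategy that does not require defining a sorting criterion. For each prediction, we instead find the best possible assignment in terms of loss,
$$L(\theta) = w_{\text{KL}}\, \mathrm{KL}\big(p_\theta(z) \,\|\, \mathcal{N}(0, I)\big) + \min_{\pi \in \mathcal{S}_{N_P}} \sum_{i=1}^{N_P} L_{i, \pi(i)}(\theta), \tag{6}$$
where $L_{i,\pi(i)}$ is the path loss of Eq. (3) between prediction $i$ and ground-truth path $\pi(i)$. The best permutation is found through the Hungarian algorithm kuhn1955hungarian; munkres1957algorithms.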
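The minimization over permutations can be illustrated on a small cost matrix whose entry `cost[i][j]` plays the role of the loss between prediction `i` and ground-truth path `j`. Note that the sketch below is not the Hungarian algorithm: it is an exhaustive search over all permutations, used here only because it is short and obviously correct for a handful of paths. It returns the same optimum the Hungarian algorithm would, but in factorial rather than polynomial time.

```python
from itertools import permutations
from typing import List, Optional, Tuple


def best_assignment(cost: List[List[float]]) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Return (perm, total) minimizing sum_i cost[i][perm[i]] by exhaustive search."""
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total


# Toy 3x3 matrix of pairwise path losses (values are made up for illustration).
cost = [
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
]
perm, total = best_assignment(cost)
```

In this example the greedy choice for prediction 1 (cost 0.0 with path 1) is not part of the optimum, which illustrates why a global assignment is needed rather than a per-prediction minimum.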
Training details. We use the AdamW loshchilov2017adamW optimizer with initial learning rate , reduced by a factor of every epochs and a linear warmup period of initial steps. We use a dropout rate of in all transformer layers and gradient clipping of . We train our networks for 100 epochs with a total batchsize of 120 on two 1080Ti GPUs, which takes about one day.
4 Experiments
We validate the performance of our DeepSVG method on the introduced SVGIcons8 dataset. We also demonstrate results for glyph generation on the SVGFonts lopes2019svgvae dataset. Further experiments and interactive examples are presented in the supplementary material.
4.1 Ablation study
Model | Feed-forward | Hierarchical | Matching | 1st rank % | Average rank
Baseline |  |  |  | 9.7 | 
One-stage feed-forward | ✓ |  |  | 19.5 | 
Ours – Hungarian | ✓ | ✓ | Hungarian | 25.8 | 
Ours – Ordered | ✓ | ✓ | Ordered | 44.8 | 
We ablate our model by conducting a human study. As baseline, we use an autoregressive one-stage architecture by concatenating the set of (unpadded) input sequences, sorted using the Ordered criterion (Sec. 3.4). The number of paths therefore becomes $N_P = 1$ and only the path-level Encoder $E^{(1)}$ and Decoder $D^{(1)}$ are used; filling is ignored in that case. We analyze the effect of feed-forward prediction, and then our hierarchical DeepSVG architecture, using either the Ordered or Hungarian assignment loss (Sec. 3.4). The human study is conducted by randomly selecting 100 pairs of SVG icons, and showing the interpolations generated by the four models to 10 human participants, who rank them from best (1) to worst (4). In Tab. 2 we present the results of this study by reporting the percentage of 1st rank votes, as well as the average rank for each model. We also show qualitative results in Fig. 7, here ignoring the filling attribute since it is not supported by one-stage architectures. Compared to the autoregressive baseline, the use of feed-forward prediction brings substantial improvement in reconstruction and interpolation quality, as also confirmed by the qualitative results. In our human study, our hierarchical architecture with ordered assignment yields superior results. Besides providing notably better reconstruction quality, this version provides much more stable and meaningful interpolations compared to the other approaches.
The Hungarian assignment achieves notably worse results than the ordered assignment on average. Note that the latter is more closely related to the loss employed for the one-stage baselines, although there it acts on the command level. We hypothesize that the introduction of a sensible ordering during training helps the decoder learn by providing an explicit prior, which better breaks symmetries and reduces competition between the predicted paths. Fig. 8 further shows how the latent SVG representation translates to meaningful decodings by performing interpolation between 4 SVG icons.
4.2 Animation by interpolation
As visually demonstrated in the previous subsection, our model shows significantly better reconstruction capability than previous works. This property is crucial for real-world applications involving SVGs, since users should be able to perform various operations on vector graphics while keeping their original drawings unchanged. With this requirement in mind, we examine whether DeepSVG can be used to animate SVGs, by interpolating between two user-drawn ones. Fig. 9 shows the results on challenging scenes, after fine-tuning the model on both frames for about 1,000 steps. Notice how DeepSVG handles both translations and deformations well.
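The animation procedure can be sketched as generating a sequence of latent vectors between the encodings of the two key-frames and decoding each one. The snippet below covers only the latent-space part and assumes plain linear interpolation, which the text does not state explicitly; the encoder and decoder calls are omitted, and the function names are hypothetical.

```python
from typing import List


def lerp(z_a: List[float], z_b: List[float], t: float) -> List[float]:
    """Linear interpolation between two latent vectors, t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_frames(z_a: List[float], z_b: List[float], n_frames: int) -> List[List[float]]:
    """n_frames latent vectors evenly spaced from z_a to z_b (both included)."""
    if n_frames < 2:
        raise ValueError("need at least two frames")
    return [lerp(z_a, z_b, k / (n_frames - 1)) for k in range(n_frames)]


frames = interpolation_frames([0.0, 2.0], [4.0, 0.0], n_frames=5)
```

Each element of `frames` would then be passed through the decoder to obtain one SVG frame of the animation; the first and last frames correspond to the reconstructions of the two user drawings, which is why accurate reconstruction matters here.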
4.3 Latent space algebra
Given DeepSVG’s smooth latent space and accurate reconstruction ability, we next ask whether latent directions make it possible to manipulate SVGs globally in a semantically meaningful way. We present two experiments in Fig. 10. In both cases, we denote by $\Delta$ the difference between the encodings of two similar SVGs differing by some visual semantics. We show how this latent direction can be added to or subtracted from the latent vector of arbitrary SVG icons. More experiments are presented in the appendix. In particular, we examine whether DeepSVG’s hierarchical construction enables similar operations to be performed on single paths instead of globally.
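The latent-space operation described above reduces to vector arithmetic. A minimal sketch, with hypothetical helper names and a `scale` parameter added for illustration (the text only mentions adding or subtracting the direction, i.e. `scale` equal to 1 or -1):

```python
from typing import List


def latent_direction(z_with: List[float], z_without: List[float]) -> List[float]:
    """Difference between the encodings of two SVGs that differ by one visual attribute."""
    return [a - b for a, b in zip(z_with, z_without)]


def apply_direction(z: List[float], delta: List[float], scale: float = 1.0) -> List[float]:
    """Move a latent vector along a direction; scale=-1.0 subtracts it."""
    return [v + scale * d for v, d in zip(z, delta)]


delta = latent_direction([1.0, 3.0, 0.0], [1.0, 1.0, 0.5])
moved = apply_direction([0.0, 0.0, 1.0], delta)
restored = apply_direction(moved, delta, scale=-1.0)
```

Decoding `moved` would then yield the manipulated icon, under the assumption that the direction captures the intended semantics.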
4.4 Font generation
So far, our experiments have demonstrated reconstruction, interpolation and manipulation of vector graphics. In this section, we further show the generative capability of our method, by decoding random vectors sampled from the latent space. We train our model on the SVG-Fonts dataset, for the task of class-conditioned glyph generation. DeepSVG is extended by adding label embeddings at every layer of each Transformer block. Fig. 12 presents random samples from our model. More details on the architecture, results and comparisons are given in the appendix.
5 Conclusion
We have demonstrated how our hierarchical network can successfully perform SVG icon interpolation and manipulation. We hope that our architecture will serve as a strong baseline for future research in this, to date, little-explored field. Interesting applications of our architecture include raster-to-vector conversion or the more general task of XML generation by extending the two-level hierarchy used in this work. Furthermore, while DeepSVG was designed specifically for the natural representation of SVGs, our architecture can be used for any task involving data represented as a set of sequences. We therefore believe it can be used, with minimal modifications, in a wide variety of tasks, including multi-instrument audio generation and multi-human motion trajectory generation.
Broader Impact
DeepSVG can be used as an animation tool by performing interpolations and other latent space operations on user-drawn SVGs. Similarly to recent advances in rasterized content creation, we believe this work will serve as a potential way for creators and digital artists to enhance their creativity and productivity.
Appendix
In this appendix, we first present a visualization of the data structure used for SVGs in Sec. A. We provide the detailed instructions used to preprocess our data in Sec. B. Additional details on training and architectures are given in Sec. C and Sec. D. Sec. E goes through the procedure to predict filling along with SVG paths. Finally, additional results for font generation, icons generation, latent space algebra, animations and interpolations are presented in Sec. F, G, H, I and J respectively.
Appendix A SVG Representation visualization
For a visual depiction of the data structure described in Sec. 3.1, we present in Fig. 13 an example of an SVG image along with its tensor representation. The SVG image consists of 2 paths, $P_1$ and $P_2$. The former starts with a move m command from the top left corner. The arc is constructed from two Cubic Bézier curve c commands. This is followed by a line l and a close path z command. The <EOS> command indicates the end of path $P_1$. $P_2$ is constructed in a similar fashion using only a single Cubic Bézier curve.
Appendix B SVG Preprocessing
In Sec. 3.1, we consider that SVG images are given as a set of paths, restricted to the 6 commands described in Tab. 1. As mentioned, this does not reduce the expressivity of vector graphics since other basic shapes and commands can be converted to that format. We describe next the details of these conversions.
Path commands conversion. Lower-case letters in SVG path commands are used to specify that their corresponding arguments are relative to the preceding command’s end-position, as opposed to absolute for upper-case letters. We start by converting all commands to absolute. Other available commands (H: HorizontalLineTo, V: VerticalLineTo, S: SmoothBezier, Q: QuadraticBezier, T: SmoothQuadraticBezier) can be trivially converted to the commands subset of Tab. 1. The only missing command that needs further consideration is the elliptical-arc command A, described below.
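The conversions described above can be sketched for a subset of the commands. The function below handles M/L/H/V/Q in both relative and absolute form and emits absolute M, L and C commands; the smooth variants S and T, which additionally require reflecting the previous control point, are left out to keep the example short. Quadratic curves are converted with the standard degree-elevation identity, which is exact. The function name and the (letter, args) input format are assumptions of this sketch, not the library's API.

```python
from typing import List, Tuple

Cmd = Tuple[str, List[float]]


def to_absolute(commands: List[Cmd], start: Tuple[float, float] = (0.0, 0.0)) -> List[Cmd]:
    """Convert m/M, l/L, h/H, v/V, q/Q commands to absolute M, L and C commands."""
    x, y = start                       # current cursor position
    out: List[Cmd] = []
    for letter, args in commands:
        # Relative commands are offset by the cursor, absolute ones by the origin.
        ox, oy = (x, y) if letter.islower() else (0.0, 0.0)
        kind = letter.upper()
        if kind in ("M", "L"):
            nx, ny = ox + args[0], oy + args[1]
            out.append((kind, [nx, ny]))
        elif kind == "H":              # horizontal line: y is unchanged
            nx, ny = ox + args[0], y
            out.append(("L", [nx, ny]))
        elif kind == "V":              # vertical line: x is unchanged
            nx, ny = x, oy + args[0]
            out.append(("L", [nx, ny]))
        elif kind == "Q":              # quadratic Bézier -> exact cubic equivalent
            qx, qy = ox + args[0], oy + args[1]
            nx, ny = ox + args[2], oy + args[3]
            c1x, c1y = x + 2.0 / 3.0 * (qx - x), y + 2.0 / 3.0 * (qy - y)
            c2x, c2y = nx + 2.0 / 3.0 * (qx - nx), ny + 2.0 / 3.0 * (qy - ny)
            out.append(("C", [c1x, c1y, c2x, c2y, nx, ny]))
        else:
            raise ValueError(f"unsupported command {letter!r}")
        x, y = nx, ny                  # the end-point becomes the new cursor
    return out
```

For example, `to_absolute([("M", [1.0, 1.0]), ("h", [2.0]), ("V", [5.0])])` yields a MoveTo at (1, 1) followed by LineTo commands to (3, 1) and (3, 5).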
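Elliptical arc conversion. As illustrated in Fig. 14, command A $r_x, r_y\ \varphi\ f_A\ f_S\ x_2, y_2$ draws an elliptical arc with radii $r_x$ and $r_y$ (semi-major and semi-minor axes), rotated by angle $\varphi$ to the $x$-axis, and end-point $(x_2, y_2)$. The bit-flags $f_A$ and $f_S$ are used to uniquely determine which one of the four possible arcs is chosen: large-arc-flag $f_A$ is set to 1 if the arc spanning more than $180^\circ$ is chosen, 0 otherwise; and sweep-flag $f_S$ is set to 0 if the arc is oriented clockwise, 1 otherwise. We argue that this parametrization, while being intuitive from a user-perspective, adds unnecessary complexity to the commands argument space described in Sec. 3.1 and the bit-flags make shapes non-continuous w.r.t. their arguments, which would result in less smooth animations.
We therefore convert A commands to multiple Cubic Bézier curves. We first start by converting the endpoint parametrization $(x_1, y_1), (x_2, y_2)$ to a center parametrization $(c_x, c_y)$, where $(x_1, y_1)$ is the end-position of the preceding command. The center of the ellipse is computed using:
$$\begin{pmatrix} c_x \\ c_y \end{pmatrix} = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} \begin{pmatrix} c_x' \\ c_y' \end{pmatrix} + \begin{pmatrix} \frac{x_1 + x_2}{2} \\ \frac{y_1 + y_2}{2} \end{pmatrix}, \tag{7}$$
where,
$$\begin{pmatrix} c_x' \\ c_y' \end{pmatrix} = \pm \sqrt{\frac{r_x^2 r_y^2 - r_x^2 y_1'^2 - r_y^2 x_1'^2}{r_x^2 y_1'^2 + r_y^2 x_1'^2}} \begin{pmatrix} \frac{r_x y_1'}{r_y} \\ -\frac{r_y x_1'}{r_x} \end{pmatrix}, \tag{8}$$
$$\begin{pmatrix} x_1' \\ y_1' \end{pmatrix} = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix} \begin{pmatrix} \frac{x_1 - x_2}{2} \\ \frac{y_1 - y_2}{2} \end{pmatrix}. \tag{9}$$
In Eq. (8), the sign is positive if $f_A \neq f_S$ and negative otherwise. We then determine the start angle $\theta_1$ and angle range $\Delta\theta$, which are given by computing:
$$\theta_1 = \angle\left( \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} \frac{x_1' - c_x'}{r_x} \\ \frac{y_1' - c_y'}{r_y} \end{pmatrix} \right), \tag{10}$$
$$\Delta\theta = \angle\left( \begin{pmatrix} \frac{x_1' - c_x'}{r_x} \\ \frac{y_1' - c_y'}{r_y} \end{pmatrix}, \begin{pmatrix} \frac{-x_1' - c_x'}{r_x} \\ \frac{-y_1' - c_y'}{r_y} \end{pmatrix} \right) \bmod 360^\circ. \tag{11}$$
Using $(c_x, c_y)$, $\theta_1$ and $\Delta\theta$, we obtain the parametric elliptical arc equation as follows (for $\theta$ ranging from $\theta_1$ to $\theta_1 + \Delta\theta$):
$$E(\theta) = \begin{pmatrix} c_x + r_x \cos\varphi \cos\theta - r_y \sin\varphi \sin\theta \\ c_y + r_x \sin\varphi \cos\theta + r_y \cos\varphi \sin\theta \end{pmatrix}, \tag{12}$$
and the derivative of the parametric curve is:
$$E'(\theta) = \begin{pmatrix} -r_x \cos\varphi \sin\theta - r_y \sin\varphi \cos\theta \\ -r_x \sin\varphi \sin\theta + r_y \cos\varphi \cos\theta \end{pmatrix}. \tag{13}$$
Given both equations, maisonobe2003ellipticalarc shows that the section of elliptical arc between angles $\theta_1$ and $\theta_2$ can be approximated by a cubic Bézier curve whose control points are computed as follows:
$$P_1 = E(\theta_1), \quad Q_1 = P_1 + \alpha E'(\theta_1), \quad Q_2 = P_2 - \alpha E'(\theta_2), \quad P_2 = E(\theta_2), \tag{14}$$
where $\alpha = \sin(\theta_2 - \theta_1)\, \dfrac{\sqrt{4 + 3 \tan^2 \frac{\theta_2 - \theta_1}{2}} - 1}{3}$.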
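The arc conversion can be sketched end to end in a few dozen lines. The code below follows the endpoint-to-center conversion from the W3C SVG implementation notes and the cubic approximation attributed to maisonobe2003ellipticalarc. It makes several simplifying assumptions: angles are in radians, the radii are assumed large enough to reach both endpoints (the radius-scaling correction from the SVG specification is omitted), the two endpoints are assumed distinct, and the arc is split into pieces of at most a quarter turn (the `max_step` default is an arbitrary choice of this sketch).

```python
import math


def arc_to_center(x1, y1, x2, y2, rx, ry, phi, fa, fs):
    """Endpoint -> center parametrization. Returns (cx, cy, theta1, dtheta)."""
    cos_p, sin_p = math.cos(phi), math.sin(phi)
    dx, dy = (x1 - x2) / 2.0, (y1 - y2) / 2.0
    x1p = cos_p * dx + sin_p * dy
    y1p = -sin_p * dx + cos_p * dy
    num = rx * rx * ry * ry - rx * rx * y1p * y1p - ry * ry * x1p * x1p
    den = rx * rx * y1p * y1p + ry * ry * x1p * x1p
    coef = math.sqrt(max(0.0, num / den))
    if fa == fs:                       # sign is + if fa != fs, - otherwise
        coef = -coef
    cxp = coef * rx * y1p / ry
    cyp = -coef * ry * x1p / rx
    cx = cos_p * cxp - sin_p * cyp + (x1 + x2) / 2.0
    cy = sin_p * cxp + cos_p * cyp + (y1 + y2) / 2.0
    ux, uy = (x1p - cxp) / rx, (y1p - cyp) / ry
    vx, vy = (-x1p - cxp) / rx, (-y1p - cyp) / ry
    theta1 = math.atan2(uy, ux)
    dtheta = math.atan2(ux * vy - uy * vx, ux * vx + uy * vy)
    if fs == 0 and dtheta > 0:         # sweep flag fixes the sign of the range
        dtheta -= 2.0 * math.pi
    elif fs == 1 and dtheta < 0:
        dtheta += 2.0 * math.pi
    return cx, cy, theta1, dtheta


def arc_segment_to_cubic(cx, cy, rx, ry, phi, t1, t2):
    """Approximate the arc between angles t1 and t2 by one cubic Bézier (P1, Q1, Q2, P2)."""
    cos_p, sin_p = math.cos(phi), math.sin(phi)

    def point(t):
        return (cx + rx * cos_p * math.cos(t) - ry * sin_p * math.sin(t),
                cy + rx * sin_p * math.cos(t) + ry * cos_p * math.sin(t))

    def deriv(t):
        return (-rx * cos_p * math.sin(t) - ry * sin_p * math.cos(t),
                -rx * sin_p * math.sin(t) + ry * cos_p * math.cos(t))

    alpha = math.sin(t2 - t1) * (math.sqrt(4.0 + 3.0 * math.tan((t2 - t1) / 2.0) ** 2) - 1.0) / 3.0
    p1, p2 = point(t1), point(t2)
    d1, d2 = deriv(t1), deriv(t2)
    q1 = (p1[0] + alpha * d1[0], p1[1] + alpha * d1[1])
    q2 = (p2[0] - alpha * d2[0], p2[1] - alpha * d2[1])
    return p1, q1, q2, p2


def arc_to_cubics(x1, y1, rx, ry, phi, fa, fs, x2, y2, max_step=math.pi / 2.0):
    """Convert one A command starting at (x1, y1) into a list of cubic Bézier curves."""
    cx, cy, theta1, dtheta = arc_to_center(x1, y1, x2, y2, rx, ry, phi, fa, fs)
    n = max(1, math.ceil(abs(dtheta) / max_step - 1e-9))
    step = dtheta / n
    return [arc_segment_to_cubic(cx, cy, rx, ry, phi, theta1 + k * step, theta1 + (k + 1) * step)
            for k in range(n)]
```

As a sanity check, a quarter of the unit circle from (1, 0) to (0, 1) produces a single curve whose midpoint lies within about 0.2% of radius 1, which gives a feel for the accuracy of the approximation at this step size.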
Basic shape conversion. In addition to paths, SVG images can be built using 6 basic shapes: rectangles, lines, polylines, polygons, circles and ellipses. The first four can be converted to paths using Line commands, while the latter two are transformed to a path using four Elliptical Arc commands, which themselves are converted to Bézier curves as described in the previous paragraph. Table 3 below shows examples of these conversions.
Basic Shape | Path equivalent
<line x1="0" x2="1" y1="0" y2="1" /> | <path d="M0,0 L1,1" />
<polyline points="0, 0 1, 0 1, 1" /> | <path d="M0,0 L1,0 L1,1" />
<polygon points="0, 0 1, 0 1, 1" /> | <path d="M0,0 L1,0 L1,1 z" />
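The line, polyline and polygon conversions listed in the table are simple enough to express directly. The helpers below are a hypothetical sketch that builds the `d` attribute string of the equivalent path; they take already-parsed numbers rather than raw XML attributes, and rectangles, circles and ellipses are not covered.

```python
from typing import List, Tuple


def line_to_path(x1, y1, x2, y2) -> str:
    """Path data equivalent to <line x1=.. y1=.. x2=.. y2=.. />."""
    return f"M{x1},{y1} L{x2},{y2}"


def points_to_path(points: List[Tuple[float, float]], close: bool) -> str:
    """Path data for a <polyline> (close=False) or a <polygon> (close=True)."""
    head, *tail = points
    d = f"M{head[0]},{head[1]}" + "".join(f" L{x},{y}" for x, y in tail)
    return d + " z" if close else d
```

Calling `points_to_path([(0, 0), (1, 0), (1, 1)], close=True)` returns `"M0,0 L1,0 L1,1 z"`, matching the polygon row of the table.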
Path simplification. Similarly to SketchRNN ha2017sketchrnn, we preprocess our dataset in order to simplify the network’s task of representation learning. However, unlike the latter work, our input consists of both straight lines and parametric curves. Ideally, if shapes were completely smooth, one could reparametrize points on a curve so that they are placed equidistantly from one another. In practice though, SVG shapes contain sharp angles, at which location points should remain unchanged. We therefore first split paths at points that form a sharp angle (e.g. where the angle between the incoming and outgoing tangents is less than some threshold). We then apply either the Ramer-Douglas-Peucker douglas1973RDP algorithm to simplify line segments or the Philip J. Schneider algorithm schneider1990simplify for segments of cubic Bézier curves. Finally, we divide the resulting lines and Bézier curves into multiple subsegments when their length is larger than a maximum distance of 5. Examples of SVG simplifications are shown in Fig. 15. Notice how our algorithm both adds points when curve segments are too long and reduces the number of points when the curve resolution is too high.
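For the line-segment case, the Ramer-Douglas-Peucker step can be sketched as follows. This is a textbook recursive version operating on a list of (x, y) points, not the preprocessing code used for the dataset; it measures the distance to the chord as a segment rather than an infinite line, which is a common variant, and the tolerance `epsilon` is a free parameter whose value is not given in the text.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment [a, b]."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))          # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points: List[Point], epsilon: float) -> List[Point]:
    """Simplify a polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    dists = [_point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__)
    if dists[idx] <= epsilon:
        return [points[0], points[-1]]  # every interior point is close enough
    split = idx + 1                      # index of the farthest point in `points`
    left = rdp(points[:split + 1], epsilon)
    right = rdp(points[split:], epsilon)
    return left[:-1] + right             # drop the duplicated split point


polyline = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 3.0), (4.0, 0.0)]
simplified = rdp(polyline, epsilon=0.5)
```

In this example the nearly collinear point (1, 0.05) is removed while the sharp corner at (3, 3) is preserved, which mirrors the goal of reducing the point count without altering the shape.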
SVG normalization. All SVGs are scaled to a normalized viewbox of size , and paths are canonicalized, meaning that a shape’s starting position is chosen to be the topmost leftmost point, and commands are oriented clockwise.
Appendix C Additional Training details
We augment every SVG of the dataset using 20 random augmentations with the simple transformations described as follows.
Scaling. We scale the SVG by a random factor in the interval .
Translation. We translate the SVG by a random translation vector where and are sampled independently in the interval .
We believe further robustness in shape representation learning and interpolation stability can be obtained by simply implementing more complex data augmentation strategies.
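The two augmentations can be sketched as a single function acting on a list of points. The sampling ranges are passed in by the caller and are purely placeholders here. Scaling about the origin (rather than, say, the icon center) and applying the same transform to every point of the icon are further assumptions of this sketch.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def augment(points: List[Point],
            scale_range: Tuple[float, float],
            shift_range: Tuple[float, float],
            rng: random.Random) -> List[Point]:
    """Apply one random uniform scaling followed by one random translation."""
    s = rng.uniform(*scale_range)      # single scale factor for the whole icon
    tx = rng.uniform(*shift_range)     # horizontal and vertical shifts are
    ty = rng.uniform(*shift_range)     # sampled independently
    return [(s * x + tx, s * y + ty) for x, y in points]
```

Calling `augment` several times with the same icon and a shared `random.Random` instance yields the distinct augmented copies used to enlarge the training set.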
Appendix D Architectural details
Fig. 4 presents an overview illustration of our Hierarchical autoencoder architecture. In Fig. 16, we show a more detailed view of the four main components of DeepSVG, i.e. the two encoders $E^{(1)}$, $E^{(2)}$ and the two decoders $D^{(2)}$, $D^{(1)}$. Similarly to nash2020polygen, we use the improved Transformer variant described in child2019sparse_transformer; parisotto2019stabilizing_transformer as the building block in all our components. $E^{(1)}$ and $E^{(2)}$ employ a temporal pooling module to retrieve a single $d_E$-dimensional vector from the $N_C$ and $N_P$ outputs respectively. $D^{(2)}$ and $D^{(1)}$ use learned embeddings as input in order to generate all predictions in a single forward-pass (non-autoregressively) and break the symmetry. The decoders are conditioned on the latent vector $z$ or the path representation $u_i$ by applying a linear transformation and adding it to the intermediate transformer representation in every block.
Appendix E Filling procedure visualization
Thanks to its hierarchical construction, DeepSVG can predict any number of global path-level attributes, which could be e.g. color, dash size, stroke-width or opacity. As a first step towards a network modeling all path attributes supported by the SVG format, we demonstrate support for filling. When using the default non-zero fill-rule in the SVG specification, a point in an SVG path is considered inside or outside the path based on the draw orientations (clockwise or counterclockwise) of the shapes surrounding it. In particular, the insideness of a point in the shape is determined by drawing a ray from that point to infinity in any direction, and then examining the places where a segment of the shape crosses the ray. Starting with a count of zero, add one each time a path segment crosses the ray from left to right and subtract one each time a path segment crosses the ray from right to left. We argue that this parametrization is not optimal for neural networks to encode filling/erasing. Therefore, we simply let the network output a fill-attribute that can take one of three values: outline, fill or erase. This attribute is trained in a supervised way along with the other losses and is then used to export the actual SVG file. In particular, overlapping fill and erase shapes are grouped together in the same path and oriented in a clockwise/counterclockwise fashion respectively, while outlined shapes remain unchanged.
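The non-zero fill rule described above can be illustrated with a winding-number computation on polygons. The sketch below treats each sub-path as a closed polygon (curves would first have to be flattened to line segments) and casts the ray horizontally; it shows why giving an inner shape the opposite orientation to its enclosing shape erases the region, and why the same orientation keeps it filled.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def winding_number(point: Point, polygon: List[Point]) -> int:
    """Signed number of times a closed polygon winds around the point."""
    px, py = point
    wn = 0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # cross > 0 means the point lies to the left of the directed edge.
        cross = (x1 - x0) * (py - y0) - (px - x0) * (y1 - y0)
        if y0 <= py < y1 and cross > 0:
            wn += 1                    # edge crosses the ray going upward
        elif y1 <= py < y0 and cross < 0:
            wn -= 1                    # edge crosses the ray going downward
    return wn


def is_inside_nonzero(point: Point, subpaths: List[List[Point]]) -> bool:
    """Non-zero fill rule: inside iff the winding numbers of all sub-paths sum to non-zero."""
    return sum(winding_number(point, s) for s in subpaths) != 0


outer = [(0, 0), (4, 0), (4, 4), (0, 4)]
hole_opposite = [(1, 1), (1, 3), (3, 3), (3, 1)]   # opposite orientation to `outer`
hole_same = [(1, 1), (3, 1), (3, 3), (1, 3)]       # same orientation as `outer`
```

With `hole_opposite`, the point (2, 2) has total winding number 0 and is therefore outside (erased), whereas with `hole_same` the total is 2 and the point stays filled.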
Appendix F Font generation
In this section, we provide details and additional results for font generation, presented in Sec. 4.4.
Experimental setup. We train our models on the SVG-Fonts dataset lopes2019svgvae for 5 epochs using the same training hyper-parameters as described in Sec. 3.4, reducing the learning rate by a factor of 0.9 every quarter epoch. Furthermore, all encoder and decoder Transformer blocks are extended to be class-conditioned. Similarly to how the latent vector $z$ is fed into the decoder $D^{(2)}$, we add the learned label embedding to the intermediate transformer representation, after a linear transformation. This is done in $E^{(1)}$, $E^{(2)}$, $D^{(2)}$ and $D^{(1)}$ and applies to both our final model and the one-stage baselines.
Results. We compare the generative capability of our final model with the same baselines as in Sec. 4.1. In addition, we show random samples from SVG-VAE lopes2019svgvae. Results are shown in Fig. 18. Notice how the non-autoregressive settings consistently generate visually more precise font characters, without having to pick the best example from a larger set or use any post-processing. We also note that due to the simplicity of the SVG-Fonts dataset, no significant visual improvement from our hierarchical architecture can be observed here. To validate that our model generates diverse font samples, we also present in Fig. 19 different samples for every glyph. Note how the latent vector is decoded into a style-consistent set of font characters. Diversity here includes different levels of boldness and more or less italic glyphs.
Appendix G Random samples of icons
In this section, we show random samples of icons generated by our model. Fig. 20 presents a set of icons generated by DeepSVG, each obtained by sampling a random latent vector. These results show diverse icons that look visually reasonable. Note that the problem of generic icon generation is much more challenging than font generation. These results are promising, but much scope for improvement remains.
Appendix H Additional results on latent space algebra
As mentioned in Sec. 4.3, operations on vectors in the latent space lead to semantically meaningful SVG manipulations. Thanks to the hierarchical nature of our architecture, we here demonstrate that such operations can also be performed at the path-level, using the path encodings $\hat{u}_i$. In Fig. 21 we consider the difference $\Delta$ between path encodings of similar shapes that differ by a horizontal or vertical translation. Adding $\Delta$ to or subtracting it from a path encoding in arbitrary SVG images applies the same translation to it.
Appendix I Additional animations by interpolation
We here show three additional animations, generated by DeepSVG from two user-created drawings. DeepSVG handles deformation, scaling and rotation of shapes well, see Fig. 22.
Appendix J Additional interpolations
Finally, we present additional interpolation results in Fig. 23 using our DeepSVG – ordered model, showing successful interpolations between challenging pairs of icons, along with some failure cases.