With over 500 Snaps available on the SnapLogic platform, structuring a data-intensive pipeline for best performance can be an art. As our recent release notes show, we’ve been working internally to optimize performance, and along the way we have found techniques that pipeline builders can apply themselves.
In this blog post, we focus on tips for performing as few operations as possible to achieve your goals. Minimizing the operations performed may seem obvious, but even experienced pipeline builders here at SnapLogic have been known to get this wrong.
What follows are some of those tips for optimizing pipeline performance, roughly arranged in increasing order of sophistication:
- Filter before join, and not after, if possible. Joining is one of the more expensive operations, so less data to join means faster joins.
- Use a Sort Snap immediately before a Join Snap – this will help memory usage and performance.
- If you’re going to delete fields from documents, do so as soon as possible – you’ll save memory from that point forward.
- For expressions, favor simpler expressions over more complex ones (for instance, if you have a choice between a simple string split() or a regex, do the split).
- If possible, use Snaps instead of the Expression Language. For instance, if you’re branching the handling of documents, try using a Router instead of a Mapper with a bunch of ternaries. It’ll be easier to follow the flow, and you may save yourself a bunch of repeated code. For example, you could do a transform to handle numbers via a Mapper with the expression: $value <= 0 ? Math.abs($value) : ($value % 2 == 0 ? $value * 2 : ($value % 2 == 1 ? $value + 2 : null)). Or you could use the pipeline below, which most people think is clearer and more maintainable. You also get four threads doing the computation (one per Mapper), though you do pick up overhead owing to the Union and Sort. Still, using Snaps instead of the Expression Language is a good rule to follow when possible.
- Some advanced users use a technique popular in JavaScript to pick values: expr1 || expr2 || … || exprn
This is just the OR’ing of a bunch of expressions. Generally, you see this with a function like Date.parse(). Since a date may be represented in multiple ways, you may need to try several different format strings to find the one that parses. The SnapLogic expression language, like JavaScript, has a short-circuit OR (||). That means the first expression that evaluates to a truthy value (reading left to right) is taken, and no further expressions are evaluated. So if you use this pattern, put the expressions in order from most to least likely to succeed, so that the common case is resolved on the first attempt.
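To make the ordering advice concrete, here is a minimal sketch of the short-circuit pattern in Python, whose `or` operator short-circuits the same way `||` does. It is not SnapLogic expression code: the helper names `try_parse` and `parse_date`, the `calls` counter, and the three format strings are all assumptions chosen for illustration.

```python
from datetime import datetime

# Hypothetical bookkeeping: records every format that was actually attempted.
calls = []


def try_parse(text, fmt):
    """Return a datetime if text matches fmt, otherwise None (falsy)."""
    calls.append(fmt)
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


def parse_date(text):
    """Try formats left to right; `or` stops at the first truthy result."""
    # Assumed ordering: the format expected most often comes first.
    return (
        try_parse(text, "%Y-%m-%d")
        or try_parse(text, "%m/%d/%Y")
        or try_parse(text, "%Y%m%d")
    )


result = parse_date("03/05/2024")
print(result, "found after", len(calls), "attempt(s)")
```

If most incoming dates looked like 03/05/2024, moving that format to the front of the chain would save one failed parse per document.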
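The readability argument for routing over nested ternaries can also be sketched outside of SnapLogic. The Python below is only an analogy: `transform_nested` mirrors the nested ternary from the Mapper example above, while `transform_routed` gives each case its own branch, roughly as a Router would send each case to its own Mapper. Both function names are hypothetical.

```python
def transform_nested(value):
    """Direct translation of the nested ternary expression."""
    return (
        abs(value)
        if value <= 0
        else (
            value * 2
            if value % 2 == 0
            else (value + 2 if value % 2 == 1 else None)
        )
    )


def transform_routed(value):
    """Same logic, one branch per case, like one Router output per case."""
    if value <= 0:  # zero and negative numbers
        return abs(value)
    if value % 2 == 0:  # positive even numbers
        return value * 2
    if value % 2 == 1:  # positive odd numbers
        return value + 2
    return None  # anything else, such as a non-integer


for sample in [-3, 0, 2, 5, 1.5]:
    print(sample, transform_nested(sample), transform_routed(sample))
```

The two functions return the same results; the routed form simply makes each case visible on its own line, which is the same benefit the Router-based pipeline offers.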
Please join the SnapLogic Community forum, where SnapLogic users and employees regularly share tips and suggestions. There’s an excellent chance someone in the SnapLogic community has already solved the challenge(s) you are facing. Additionally, the Product and Engineering teams are always interested in understanding what our customers are facing and are happy to assist via the SnapLogic Community.