PowerSync is a sync engine that automatically keeps data in sync between different kinds of backend databases and SQLite. This is enabled by a custom protocol designed to correctly capture source database state, with checks at critical steps of the sync process enabling our consistency model.
PowerSync offers SDKs for popular programming languages, and historically each of our SDKs implemented most of this protocol itself. That initially worked quite well for us, and it’s nice that we’re able to integrate with networking libraries available on the target stack (allowing e.g. the PowerSync Kotlin SDK to reuse a ktor client configured by the app developer).
However, the SDK implementations were not without issues:
- We have features on our roadmap requiring protocol additions, and we’d like to avoid updating four different client implementations each time.
- While we support both a text (JSON) and a binary (BSON) protocol, we prefer BSON for efficiency. Not all of our SDK targets have a good BSON library available though.
- In particular for React Native, the JavaScript implementation wasn’t as performant as we’d like. Profiling React Native is also quite challenging, with the official suggestions not yielding anything more helpful than a “yup, your JavaScript app is indeed busy running JavaScript”.
It was clear that we’d have to do something about this. While a shared client is the obvious solution, the exact details are less clear:
- We want to continue being good citizens of the platforms we build on, instead of e.g. linking a full HTTP client as part of our SDK and ignoring the underlying ecosystem.
- A key part of PowerSync is that it’s “just SQLite”, and it needs to work with a wide range of SQLite libraries. If we write a native client, we’d have to write bindings for each target SDK using e.g. JNI, %%dart:ffi%% or JSI.
While the first point prevents us from sharing anything that requires IO, it doesn’t mean that we can’t share any logic between our SDKs. We’re already using a custom SQLite extension written in Rust to define a few virtual tables and SQL helper functions, so our idea was to expand that extension to include a client state machine for the PowerSync protocol. The extension would be responsible for reacting to data received from the network and writing changes to the database, while SDKs open the HTTP stream and forward data into the client.
Architecture
Our SQLite extension is a Rust crate. To make it as easy as possible to load it on different platforms (we need to, for instance, support WebAssembly without Emscripten or any other JS glue code), we keep external dependencies to a minimum. The crate is %%no_std%%, and we rely on SQLite’s allocator, mutex, and randomness implementations where necessary.
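To illustrate what relying on SQLite’s allocator can look like in a %%no_std%% crate, here is a minimal sketch (not our exact code) of a global allocator that forwards to %%sqlite3_malloc64%% and %%sqlite3_free%%:

```rust
// Sketch only: a global allocator for a no_std crate, backed by SQLite's
// allocator. Assumes the standard C signatures for sqlite3_malloc64 and
// sqlite3_free; the real extension may handle alignment and OOM differently.
use core::alloc::{GlobalAlloc, Layout};
use core::ffi::c_void;

extern "C" {
    fn sqlite3_malloc64(size: u64) -> *mut c_void;
    fn sqlite3_free(ptr: *mut c_void);
}

struct SqliteAllocator;

unsafe impl GlobalAlloc for SqliteAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // sqlite3_malloc64 returns 8-byte aligned memory, which covers common
        // allocations; anything needing larger alignment is rejected here.
        if layout.align() > 8 {
            return core::ptr::null_mut();
        }
        sqlite3_malloc64(layout.size() as u64) as *mut u8
    }

    unsafe fn dealloc(&self, ptr: *mut u8, _layout: Layout) {
        sqlite3_free(ptr as *mut c_void);
    }
}

#[global_allocator]
static ALLOCATOR: SqliteAllocator = SqliteAllocator;
```

A nice side effect of delegating to SQLite is that the extension’s allocations show up in SQLite’s own memory statistics (e.g. %%sqlite3_memory_used()%%).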
We keep the client’s state in-memory in Rust, while relying on our SDKs to forward external events (like a new chunk of sync data received from the network). To avoid new custom binding code, the state machine is driven by a SQLite function we call %%powersync_control%%:
- It is invoked with parameters encoding an event, e.g. %%powersync_control('binary_line', <blob>)%% indicating new binary data from the network.
- The function returns a JSON array of instructions for the client SDK (see the sketch after this list), such as:
- Persisting changes and updating user tables based on received data.
- Updating user-visible progress counters or other sync state information.
- Validating checksums received from the protocol.
- Emitting errors and reconnecting after a delay.
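As a rough sketch of how an SDK drives this (using rusqlite purely for illustration; our SDKs use their platform’s own SQLite libraries, and the instruction handling shown here is simplified):

```rust
// Illustrative sketch of the SDK side: forward a network chunk to the
// extension and apply the instructions it returns.
use rusqlite::Connection;

fn handle_binary_sync_line(db: &Connection, line: &[u8]) -> Result<(), Box<dyn std::error::Error>> {
    // The state machine lives inside the extension; the SDK just forwards
    // the raw bytes it received from the HTTP stream.
    let instructions: String = db.query_row(
        "SELECT powersync_control('binary_line', ?1)",
        rusqlite::params![line],
        |row| row.get(0),
    )?;

    // powersync_control returns a JSON array of instructions: update status
    // listeners, emit an error, schedule a reconnect, and so on.
    for instruction in serde_json::from_str::<Vec<serde_json::Value>>(&instructions)? {
        apply_instruction(&instruction);
    }
    Ok(())
}

fn apply_instruction(_instruction: &serde_json::Value) {
    // SDK-specific: progress counters, error reporting, reconnect delays, ...
}
```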
Most of the Rust client was a fairly straightforward port of the other clients. Decoding BSON was a fun issue: Since the %%bson%% crate does not support %%no_std%% environments, we wrote our own parser as a %%serde%% deserializer.
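For a flavor of what that involves, here is a simplified, %%no_std%%-friendly sketch of walking the byte-level BSON format. The real parser plugs a reader like this into %%serde%%; the types and the limited set of supported elements here are illustrative only:

```rust
// Simplified sketch: iterate over the top-level elements of a BSON document.
// Only int32 (0x10) and string (0x02) values are handled; the real
// deserializer supports the full set of types the sync protocol uses.

pub enum Value<'a> {
    Int32(i32),
    /// Raw UTF-8 bytes of a BSON string.
    Bytes(&'a [u8]),
}

fn read_i32(input: &[u8]) -> Option<(i32, &[u8])> {
    let bytes: [u8; 4] = input.get(..4)?.try_into().ok()?;
    Some((i32::from_le_bytes(bytes), &input[4..]))
}

/// Reads a NUL-terminated element name.
fn read_cstring(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let end = input.iter().position(|&b| b == 0)?;
    Some((&input[..end], &input[end + 1..]))
}

pub fn walk_document(doc: &[u8], mut visit: impl FnMut(&[u8], Value<'_>)) -> Option<()> {
    // A document is its total length (little-endian, including the prefix and
    // the trailing 0x00), followed by `type byte, name, value` elements.
    let (len, rest) = read_i32(doc)?;
    let mut rest = rest.get(..(len as usize).checked_sub(4)?)?;
    loop {
        let (&element_type, after_type) = rest.split_first()?;
        if element_type == 0 {
            return Some(()); // end-of-document marker
        }
        let (name, after_name) = read_cstring(after_type)?;
        match element_type {
            0x10 => {
                let (value, tail) = read_i32(after_name)?;
                visit(name, Value::Int32(value));
                rest = tail;
            }
            0x02 => {
                // Strings carry their own length (which includes a trailing NUL).
                let (str_len, tail) = read_i32(after_name)?;
                visit(name, Value::Bytes(tail.get(..(str_len as usize).checked_sub(1)?)?));
                rest = tail.get(str_len as usize..)?;
            }
            _ => return None, // unsupported element type in this sketch
        }
    }
}
```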
We have adopted the Rust client as an option in our public SDKs, with relatively few changes required for that. After all, we had a client in there already and just had to rewire things to forward lines to the extension instead of parsing them in the SDK. Importantly, we didn’t have to change any part of the build logic or configure new FFI/interop bridges.
Since the public API of our Rust client is a SQL function, we get to keep using our existing SQLite libraries.
Results
One of the most immediate benefits we noticed with the Rust client was a substantial performance increase on React Native: depending on the exact scenario, the new client is around 5 times faster than the JavaScript-based implementation! While that may not sound like much to developers used to JavaScript generally being slow, a more convincing metric may be that, in isolated benchmarks, we now spend almost all of the time actually inserting data. So the performance increase comes from removing parsing overhead, and we’re now about as fast as SQLite can write to disk.
After a deeper profiling analysis of the original JS client, we learned that we spent almost all of the time decoding UTF-8 (using a polyfill, because Hermes doesn’t implement %%TextDecoder%%). This is terrible of course, but it also validates the idea of moving that logic to Rust - there simply isn’t a way to fix that in JS without introducing native dependencies.
Our other SDKs outside of React Native still see a considerable performance increase, though.
Another benefit of the core extension being written in IO-free Rust is that it’s much easier to test. Testing the sync process in our SDKs requires mocking the HTTP client delivering data. Also, since the SDKs react to new data asynchronously, tests require an “add line”, “wait for the next observable event from the SDK”, “assert state” pattern that is more complex to write assertions for than the synchronous %%powersync_control%% function. So while we still have SDK tests to ensure that they’re using the Rust client correctly, this style of testing allows us to be more thorough and gives us more confidence in our sync client.
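To sketch the difference (a hedged example: the %%'start'%% event and the %%EstablishSyncStream%% instruction name are illustrative placeholders, and rusqlite again stands in for the real test harness), a test against the extension is just a synchronous function call:

```rust
// Sketch of a synchronous test against the extension. The caller provides a
// connection that already has the PowerSync extension registered; the event
// and instruction names below are placeholders.
use rusqlite::Connection;

fn assert_start_requests_a_stream(db: &Connection) -> Result<(), Box<dyn std::error::Error>> {
    // Feed an event, inspect the returned instructions, assert. No HTTP
    // mocks and no waiting for asynchronous SDK events in between.
    let instructions: String = db.query_row(
        "SELECT powersync_control('start', NULL)",
        [],
        |row| row.get(0),
    )?;
    assert!(instructions.contains("EstablishSyncStream"));
    Ok(())
}
```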
Some regressions
The Rust client has been available as an experimental option on all our SDKs for a while now, and we’re working towards making it stable and the default option (which will also make our SDKs a bit smaller, something that is especially relevant on the web). During testing, we encountered one interesting issue on the web: sometimes, sync sessions would crash for seemingly no reason. Everything works up to a certain point, and then our SQLite extension suddenly thinks there’s no active session to use. A silent and platform-specific error in deterministic Rust code that only appears in some cases. What’s up with that?
Eventually, we realized this is a consequence of how we bind to SQLite on the web. We use the amazing WA-sqlite package to be able to ship different storage implementations depending on what the current browser supports. One problem solved by that library is that, while most storage APIs on the web are asynchronous, SQLite, being a C library, expects a synchronous file system. While there are tools to “asyncify” existing C code, those generally cause painful slowdowns.
Instead, WA-sqlite uses a clever hack that relies on the fact that prepared statements are, in a way, asynchronous state machines already: every time you call %%sqlite3_step()%%, the statement’s internal cursors are updated to find a new row. So when WA-sqlite finds that it needs to do asynchronous filesystem work in the middle of a statement, it returns a retryable error code from the low-level filesystem shim, which SQLite bubbles up without permanently disrupting the statement. At the higher levels in JavaScript, that error is recognized, the internal promise is awaited, and the step call is transparently retried.
While this works really well with regular SQL, our trick of exposing stateful Rust code as SQLite functions has messed the whole thing up! There are plenty of places in the client where we’re stepping through statements in Rust. When those encountered an error, we’d reset our session and forward the error to the client. While this error handling strategy works great in normal circumstances, it breaks for non-fatal errors where we should be retrying instead. But of course, we can’t retry in Rust because we’re writing synchronous code that is completely unaware of the outer JavaScript promise. Instead, we have to behave just like SQLite: allow our operations to get interrupted in the middle and let the outer WA-sqlite library retry. Luckily, most of the sync client has the natural form of a state machine already. So what we ended up doing is:
- When we receive new data, we first persist it to the database.
- Only when that has worked do we update the in-memory state.
In most languages, it would be tricky to be vigilant about this. In Rust, it’s almost trivial! The first step receives an immutable reference to the in-memory state, so it literally can’t get this wrong. It then returns a description of the state machine transition, which the second step applies if everything looks good. This ensures that, even if we hit errors internally, we don’t lose track of the overall state and can retry the exact operation again. With that refactored, our web client was working just as well as the other platforms.
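Here is a rough sketch of that shape (all names and types are illustrative, not our actual client): the handler only borrows the state immutably and returns a transition description, and a second, infallible step applies it.

```rust
// Illustrative only: the real client has far more state and transitions.

struct SyncState {
    rows_applied: u64,
}

struct SyncLine {
    rows: Vec<String>,
}

struct RetryableError; // e.g. the retryable error surfaced by WA-sqlite

/// A description of how the in-memory state should change.
enum Transition {
    RowsApplied { total: u64 },
    // ...more variants in the real client
}

/// Stand-in for the actual database writes, which may fail mid-way with a
/// retryable error on the web.
fn persist(_line: &SyncLine) -> Result<(), RetryableError> {
    Ok(())
}

/// Step 1: persist first, then describe the transition. The immutable
/// reference makes it impossible to touch in-memory state before the
/// database write has succeeded.
fn handle_line(state: &SyncState, line: &SyncLine) -> Result<Transition, RetryableError> {
    persist(line)?;
    Ok(Transition::RowsApplied {
        total: state.rows_applied + line.rows.len() as u64,
    })
}

/// Step 2: apply the transition; this part cannot fail.
fn apply(state: &mut SyncState, transition: Transition) {
    match transition {
        Transition::RowsApplied { total } => state.rows_applied = total,
    }
}

fn control(state: &mut SyncState, line: &SyncLine) -> Result<(), RetryableError> {
    // If handle_line bails out with a retryable error, `state` is untouched
    // and the exact same call can simply be made again.
    let transition = handle_line(state, line)?;
    apply(state, transition);
    Ok(())
}
```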