May 30, 2024
May 31, 2024

What PowerSync Open Edition Means for Local-First

Conrad Hofmeyr

As we wrote about before, we believe that local-first is poised to eventually become the default architecture for the majority of apps, given its significant benefits for developers and end-users. To bring about this future state faster, our goal is to accelerate local-first adoption.

In 2024 so far, local-first has continued to gain momentum. More and more developers are adopting local-first architecture and evangelizing it. There is now a local-first conference and a dedicated podcast, and the community is growing.

A promising movement in pursuit of more mature tooling

There is plenty of early-stage innovation in tooling that gives developers leverage in implementing local-first architecture, but tooling in the local-first space has not yet reached widespread maturity.

Mature tooling is valuable regardless of the type of team/organization using and deploying the tooling: For enterprises, it’s important because reliability is mission-critical and the stakes are high. For startups, it’s important because they can’t afford to waste cycles on struggling with tooling.

In light of this, we believe that PowerSync has an important role to play. Some of the hardest challenges of local-first are the distributed systems challenges, such as data consistency, concurrency and fault tolerance. With the significant R&D efforts and production battle-hardening that have gone into the PowerSync system over more than a decade, we can bring the strengths of our system to bear on some of these key challenges, and thereby help to accelerate the nascent local-first movement.

As we talked about in our v1.0 introduction blog post, the team behind PowerSync has been building and refining iterative generations of our sync technology since 2009. The stand-alone PowerSync product is a spin-off from a full-stack app platform product where the sync architecture has been in production for more than a decade, and is in daily use around the world by tens of thousands of enterprise users at Fortune 500 companies in industries such as energy, manufacturing and mining. The system has been proven to provide high performance and reliability in real-world production use cases in hostile environments, with large data volumes.

That being said, while we believe we have a valuable contribution to make to the local-first community, PowerSync had two noteworthy structural shortcomings up till now: We had not yet fully opened the source code of the system, and it was not yet self-hostable (it was only available as a cloud service).

That has now changed with the launch of our Open Edition: a source-available and self-hosted version of PowerSync. We believe that PowerSync is now better positioned to be a catalyst for the growth of local-first. This is not to say that PowerSync is the right solution for all local-first use cases, but our aim is to make a significant impact.

PowerSync design & architecture: 15 years in the making

In our PowerSync v1.0 introduction blog post, we covered some of the architectural decisions and engineering challenges behind the PowerSync system. In this post, we aim to expand on that with more color regarding the evolution of the system. 

We have learned significant lessons from the original implementation of the system and its evolution and production use over more than a decade, and have incorporated these lessons into the design and architecture of the stand-alone PowerSync product.

During this journey, we have strived to achieve a simple and robust architecture, avoiding nonessential complexity and brittleness, while making the system stable, reliable, performant and capable of handling a wide range of production workloads.

Dynamic partial replication, deduplicated for high scalability

Dynamic partial replication is known to be a thorny problem to solve: given a large primary database, how do you sync varying subsets of data with different distributed clients/replicas? Solving this in a highly scalable way compounds the challenge further. 

In the original generation of our sync system, we implemented support for dynamic partial replication and have refined it through several iterations to arrive at our current production-grade capabilities that have been battle-tested through years of real-world use.

The core scalability problem is that if you have n rows each synced to m clients, the naive approach to sync incremental changes to clients would effectively treat that as n*m individual sync operations, which is awful for scalability.

By contrast, the PowerSync architecture involves grouping data into buckets, which in simple terms allows sharing data between different users. Buckets allow us to group together changes to data in order to deduplicate work, so that the aforementioned equation becomes n*b sync operations (b per row), where b is the average number of buckets per row. Since b is typically far smaller than m, this is a large reduction in work.
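To make the difference concrete, here is a back-of-the-envelope comparison in Python. The row, client and bucket counts are made-up numbers chosen purely for illustration, and the two helper functions are hypothetical, not part of PowerSync.

```python
def naive_ops(rows: int, clients_per_row: int) -> int:
    """One sync operation per (changed row, client) pair: n * m."""
    return rows * clients_per_row


def bucketed_ops(rows: int, buckets_per_row: int) -> int:
    """One sync operation per (changed row, bucket) pair: n * b."""
    return rows * buckets_per_row


# Hypothetical workload, for illustration only.
n = 1_000_000  # changed rows
m = 10_000     # clients each row is synced to
b = 2          # average number of buckets each row belongs to

print(naive_ops(n, m))     # 10000000000
print(bucketed_ops(n, b))  # 2000000
```

With these assumed numbers, the work scales with the number of buckets a row belongs to rather than with the number of clients that end up receiving it.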

The bucket approach also allows for a very simple sync protocol. The client keeps track of the latest ‘operation ID’ (op_id) of each bucket, and the protocol just streams new changes (operation history) for each bucket from that point.
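The idea can be sketched in a few lines of Python. This is a toy model under our own assumptions, not the actual PowerSync protocol or wire format: the `Op`, `Server` and `Client` classes, their fields, and the bucket name are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Op:
    op_id: int             # increasing position in the bucket's history
    row_id: str
    data: Optional[dict]   # None stands for "row removed" in this sketch


@dataclass
class Server:
    # bucket name -> operation history, ordered by op_id
    buckets: dict[str, list[Op]] = field(default_factory=dict)

    def ops_after(self, bucket: str, last_op_id: int) -> list[Op]:
        """Return the bucket's history after the given op_id.

        Note that the server keeps no per-client state here: the client
        supplies its own position.
        """
        return [op for op in self.buckets.get(bucket, []) if op.op_id > last_op_id]


@dataclass
class Client:
    last_op_id: dict[str, int] = field(default_factory=dict)
    rows: dict[tuple[str, str], dict] = field(default_factory=dict)

    def sync(self, server: Server, bucket: str) -> int:
        """Fetch and apply new ops for one bucket; return how many were applied."""
        new_ops = server.ops_after(bucket, self.last_op_id.get(bucket, 0))
        for op in new_ops:
            key = (bucket, op.row_id)
            if op.data is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = op.data
            self.last_op_id[bucket] = op.op_id
        return len(new_ops)


server = Server({"list_1": [Op(1, "a", {"title": "x"}), Op(2, "b", {"title": "y"}), Op(3, "a", None)]})
client = Client()
client.sync(server, "list_1")                             # applies ops 1 to 3
server.buckets["list_1"].append(Op(4, "c", {"title": "z"}))
client.sync(server, "list_1")                             # applies only op 4
```

Because the client resumes from its own stored position, a second sync only transfers what was added since the first one.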

As a result of this architecture, PowerSync does not have to keep track of individual client-side state on the server. The system can scale to high volumes of data and users with an efficient memory footprint. PowerSync stores bucket state and operation history (indexed for efficient querying) in persistent storage (pluggable storage used by the PowerSync Service) and keeps only limited information in memory: the bucket parameters for each bucket for each active connection, and data actively being sent to clients.

Another result of this architecture is that PowerSync is able to handle clients that go offline for longer periods of time (e.g. days, weeks, etc.) without adversely affecting the user experience of those clients. This is because the change data capture stream from the backend database (i.e. the WAL / logical replication stream in the case of Postgres) is preprocessed based on the bucket configuration, and arbitrary ranges of the operation history for buckets can be retrieved efficiently by the PowerSync Service.

Checksum system for data integrity

Each generation of our sync system has had integrity checks using checksums, providing an additional safety net to ensure that data is correctly in sync between the client and the server. During normal operation of the system, a checksum mismatch should never occur, but if any data corruption creeps in unexpectedly for any reason, the integrity check protects against that.

The original generations of our sync system used a somewhat complex approach: in response to a checksum failure, the system would recursively break down a range of data into smaller ranges in order to find the precise location of the checksum failure. This meant that the system could theoretically recover incrementally from checksum failures: if there was one row of data with a checksum failure, the system could locate that row and re-download only that part of the data.
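A rough Python sketch of that kind of recursive narrowing is shown here. It is our own simplified reconstruction for illustration, not the original code: it assumes both sides hold the same number of rows in the same order, and it uses CRC32 values summed modulo 2^32 as a stand-in for whatever checksum the system actually used. All function names are hypothetical.

```python
import zlib

MASK = 0xFFFFFFFF  # keep summed checksums within 32 bits


def range_checksum(rows: list, lo: int, hi: int) -> int:
    """Sum of per-row CRC32 values over rows[lo:hi]. Iterates every row in the range."""
    return sum(zlib.crc32(row) for row in rows[lo:hi]) & MASK


def find_mismatches(client: list, server: list, lo: int = 0, hi: int = None) -> list:
    """Recursively halve a range until the rows whose checksums differ are isolated."""
    if hi is None:
        hi = len(server)
    if range_checksum(client, lo, hi) == range_checksum(server, lo, hi):
        return []          # this whole range is in sync
    if hi - lo == 1:
        return [lo]        # narrowed down to a single bad row
    mid = (lo + hi) // 2
    return find_mismatches(client, server, lo, mid) + find_mismatches(client, server, mid, hi)


server_rows = [b"row-%d" % i for i in range(8)]
client_rows = list(server_rows)
client_rows[5] = b"corrupt"
bad = find_mismatches(client_rows, server_rows)  # indexes of rows to re-download
```

Note that every level of the recursion recomputes checksums over whole ranges, so the cost grows with the amount of data as well as with how scattered the mismatches are.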

In practice, this was not particularly performant. This was because operations such as adding or removing a bucket are spread over many individual operations, which effectively "fragments" the checksum failures, making the recursive checksum recovery very ineffective. Moreover, calculating checksums for different ranges of data was slow and imposed a scalability limit on the system, since it had to iterate through all data and calculate checksums for many records and sum them together. This started having a significant performance impact once we crossed the threshold of 100k synced records. 

The PowerSync system is significantly simpler: Checksums are calculated on a bucket level. When a bucket checksum fails, the system re-downloads the whole bucket. It makes the protocol simpler and in practice it’s not meaningfully slower. The checksum calculation is easier to optimize and the system scales better. For example, we can easily cache a checksum calculation without causing consistency issues.
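The bucket-level approach is simple enough to sketch in a few lines of Python. As before, this is an illustration under our own assumptions rather than the actual implementation: CRC32 summed modulo 2^32 is a stand-in checksum, and `bucket_checksum`, `extend_checksum` and `verify_or_redownload` are hypothetical names.

```python
import zlib
from typing import Callable

MASK = 0xFFFFFFFF  # keep summed checksums within 32 bits


def bucket_checksum(ops: list) -> int:
    """One checksum for the whole bucket: sum of per-operation CRC32 values."""
    return sum(zlib.crc32(op) for op in ops) & MASK


def extend_checksum(cached: int, new_ops: list) -> int:
    """Reuse a cached checksum and only account for operations added since."""
    return (cached + bucket_checksum(new_ops)) & MASK


def verify_or_redownload(local_ops: list, expected: int, download_bucket: Callable[[], list]) -> list:
    """Keep local data if the checksum matches; otherwise fetch the whole bucket again."""
    if bucket_checksum(local_ops) == expected:
        return local_ops
    return download_bucket()


server_ops = [b"op-1", b"op-2", b"op-3"]
expected = bucket_checksum(server_ops)
corrupted = [b"op-1", b"oops", b"op-3"]
repaired = verify_or_redownload(corrupted, expected, lambda: list(server_ops))
```

In this sketch, caching is straightforward because an additive checksum over an append-only history can be extended from a cached value without rescanning older operations.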

Checkpoint system for consistency guarantees

When we spun off PowerSync as a stand-alone product, we introduced improved consistency guarantees through a checkpoint system. On the client side, local writes are stored in an upload queue and applied on top of the last consistent checkpoint received from the server. Once the client’s upload queue is empty and the server checkpoint has been fully downloaded, the client-side state is updated to the latest consistent server-authoritative checkpoint. This means that the PowerSync system provides causal consistency guarantees, and the client side is never in an inconsistent state.
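The client-side rule can be modeled roughly as follows. This Python sketch is a deliberately simplified illustration, not PowerSync's SDK code: the `CheckpointClient` class and its methods are hypothetical, state is reduced to a key-value dictionary, and it assumes that any checkpoint arriving after the upload queue has drained already reflects the uploaded writes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CheckpointClient:
    checkpoint: dict = field(default_factory=dict)    # last applied consistent server state
    local_writes: dict = field(default_factory=dict)  # local changes layered on top
    upload_queue: list = field(default_factory=list)  # writes not yet sent to the backend

    def write(self, key: str, value: str) -> None:
        """Apply a local write immediately and queue it for upload."""
        self.local_writes[key] = value
        self.upload_queue.append((key, value))

    def view(self) -> dict:
        """What the app sees: the last checkpoint with local writes on top."""
        return {**self.checkpoint, **self.local_writes}

    def upload(self, send: Callable[[tuple], None]) -> None:
        """Send queued writes in order; a write leaves the queue only after it is sent."""
        while self.upload_queue:
            send(self.upload_queue[0])
            self.upload_queue.pop(0)

    def receive_checkpoint(self, checkpoint: dict) -> bool:
        """Apply a fully downloaded checkpoint only when nothing is waiting to upload."""
        if self.upload_queue:
            return False  # keep showing the old checkpoint plus local writes
        self.checkpoint = dict(checkpoint)
        self.local_writes.clear()
        return True


client = CheckpointClient(checkpoint={"a": "1"})
client.write("b", "2")
client.receive_checkpoint({"a": "9"})            # not applied: a write is still queued
backend = []
client.upload(backend.append)
client.receive_checkpoint({"a": "9", "b": "2"})  # applied: queue is empty
```

The point of the sketch is that the visible state only ever moves from one complete checkpoint to another, so the app never observes a half-applied server state mixed with stale local data.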

Providing wide client platform/framework support

Another strength of PowerSync is its relatively wide set of client SDKs. At the time of publishing this blog post, we support Flutter, React Native, web, Kotlin Multiplatform (public alpha) and Swift (private alpha).

This was made possible in part by our Rust-based SQLite extension architecture, which allowed us to implement support for more client frameworks relatively quickly. This architecture was inspired by CR-SQLite, which did it before us. Using Rust also gives us some performance advantages: We can have low-level code running alongside the core SQLite, rather than building our client SDKs on top of higher-level SQLite libraries.

Looking ahead

We believe that PowerSync can play a valuable role in the local-first space by bringing a unique combination of attributes to the table: It has received substantial R&D investment to iteratively improve its architecture and design, while undergoing significant production battle-hardening. It works with any Postgres database that supports logical replication, in a non-invasive way. It supports a relatively wide range of client-side frameworks. And it is now source-available and self-hostable (while also being available as a managed cloud service for those who prefer that deployment option).

We are excited to continue to help accelerate local-first adoption. 

Feedback or questions? Join us on Discord to discuss.