bendy: Enforcing Correctness for Bencode Serialization

Today marks the day of our first Open Source release: bendy for Rust.

Bencode is a simple but very effective encoding scheme, originating with the BitTorrent peer-to-peer system. We’re not the first to implement Bencode. In fact there’s several implementations already: Arjan Topolovec’s rust-bencode, Murarth’s bencode, and Jonas Hermsmeier’s rust-bencode to name just a few of the Rust implementations.

Why Another Bencode Library?

So why the extra work adding yet-another-version of a thing that already exists, you might ask?

Enforce Correctness

The most important aspect to us is correctness. At P3KI we’re very careful, both in writing our own implementations as well as with selecting third party libraries because we’ve actually learned this the hard way:

In a very early proof of concept version of our software (long since retired), we’ve encountered a subtle incorrectness at a low level that didn’t show itself unless you triggered specific behavior in a specific order, all because of tiny ambiguities that added up. All would go exceptionally well for smaller scenarios that we used to explain the basic concepts of our solution to our audience at a conference. The demo worked well despite the fact it had as many moving parts as it had (all in all five seperate computers connected over network). Until we arrived at the last and most juicy piece of the demo: scaling up! This amplified effects that at first were not visible and it increased the probability of running into a minor data inconsitency which ultimately would present us and everyone in the audience with the thing most dreaded by all Java programmers:

NullPointerException and stack trace. Live. On stage.

Mind you, the software still presented valid results most of the time. But everyone knows what a stack trace looks like and that’s the only thing everyone sees.

It was a most humbling experience. Hunting this issue down and the related post mortem ate up serious resources but we’ve learned a lot and it directly informed our approach to developing software going forward. On the one hand we ditched Java for a language offering better guarantees where we needed them (hey there, Rust), on the other hand we’ve established principles that strongly direct us to arrive at software that’s correct from the lowest levels up.

But storytime aside, back to bendy the Bencode library: Other Bencode libraries either rely on the user to hand the library data that’s already in the correct order or simply and quietly sort it themselves. They are either teetering on the edge between correctness and unexpected behavior, or wasting memory and processing time. Where actually they should enforce a singular, correct form to not allow ambiguities.

P3KI’s bendy offers both: more confidence in correct behavior of the library and also defined behavior regarding memory usage.

Reduce Chances of Signature Invalidations

Bendy ensures that any de-serialize / serialize roundtrip produces the exact same and correct binary representation. This is relevant if you’re dealing with sets or map-structured data where theoretically the order is not relevant, but in practice it is. Especially if you want to ensure that signatures related to the data structure do not get invalidated accidentially.

Bendy saves us enough cycles on skipping unnecessary serialization and signature operations that it’s more than worth investing into ensuring a correct canonical representation.

Reduce Network Churn

At P3KI we’re handling a lot of signed distributed data structures which we’re able to partially de-serialize if we only need to inspect a subset of their content. If you were to partially de-serialize and then re-serialize a data structure and its representation changes despite the contents staying the same, you would not just ruin a perfectly good signature, you’d also need to re-publish your whole data structure.

Bendy ensures you that if you did not change the contents, the representation stays the same. Therefore bendy allows us to safely re-publish partially changed subsets of our data structures without compromising the correctness of the overall structure which greatly reduces network churn.

The How

Implementing a canonical encoding form is straight forward. It comes down to defining a proper way of handling unordered data.

In our case, bendy sorts data before encoding it using the regular Bencode rules. If your data is already sorted bendy will of course skip the extra sorting step to gain efficiency.

But bendy goes a step further to ensure correctness: If you hand the library data that you say is already sorted, bendy still does an in-place verification to ensure that your data actually is sorted and complains if it isn’t.

In the end, once bendy serialized your data, it’s Bencode through and through. So it’s perfectly compatible with every other Bencode library. Just remember: only bendy enforces the correctness of the canonical format if you read it back in.

To give you a headstart with the library, we’ve also included examples for serialization and de-serialization. And since Bencode originated with BitTorrent, we only thought it proper to use torrent files as examples.

Road Ahead

Bendy’s very strict enforcement of correctness might prohibit some more generic use-cases that are okay with relaxed parsing rules. The next version of bendy will include parsing relaxation options.

Conclusion

If you’re dealing with similar data structures as we are or if your OCD demands all data to be nice and ordered, bendy is for you!