# Bipack: compact binary serialization

## Why?

Bipack was designed with the following main goals:

### Be as compact as possible

For this reason it is a binary format: it stores floating-point (decimal) numbers in binary form and can use a variety of
encodings for integers:

#### Varint

A variable-length compact encoding used internally in some cases. It uses the 0x80 bit of every byte to mark
continuation, i.e. that more bytes follow. See `object Varint`.

#### Smartint

A variable-length compact encoding for signed and unsigned integers that uses as few bytes as possible. It is used
automatically when serializing integers and is slightly more sophisticated than plain `Varint`.

### Do not reveal information about stored data

Many extensible formats, such as JSON, BSON, BOSS and many others, keep data in key-value pairs. While this is good in
many respects, it has clear disadvantages: it uses more space, and it reveals the inner data structure to the world.
Such formats can be unpacked with zero knowledge of the inner structure.

Bipack does not store field names, so it is not possible to unpack or interpret the data without knowing its
structure; only probabilistic analysis is possible. Let's not make the attacker's life easier :)

### Allow upgrading data structures with backward compatibility

The downside of serialization formats of this kind is that you can't change the structures without either losing
backward compatibility with already serialized data or writing voluminous boilerplate code to implement some sort of
versioning.

To avoid wasting space and revealing more information than needed, Bipack allows classes marked with `@Extendable` to be
extended with more data, _appended to the end of the field list, with required default values_. For such classes Bipack
stores the number of actually serialized fields and, when unpacking old data, automatically uses default values for the
fields that were not serialized.

### Protect data with framing and CRC

When needed, the serialization library can store and check a CRC32 tag of the structure name with `@Framed` (the name
can be overridden as usual with `@SerialName`), or append a CRC32 of the serialized binary data, which is checked on
deserialization, with `@CrcProtected`. This lets you check data consistency out of the box, and only where needed.

# Usage

Use kotlinx serialization as usual. The following Bipack-specific annotations are at your service. All class
annotations can be combined.

## @Extendable

Classes marked this way store the number of their fields. This allows adding more fields to the end of the field list,
with default initializers, while keeping backward compatibility. For example, if you have serialized:

```kotlin
import kotlinx.serialization.Serializable
// @Extendable comes from the Bipack library

@Serializable
@Extendable
data class Foo(val i: Int)
```

and then decide to add a field:

```kotlin
@Serializable
@Extendable
data class Foo(val i: Int, val bar: String = "buzz") // the new field must have a default value
```

This adds 1 or more bytes to the serialized data (the field count, in `Varint` format).

Bipack will properly deserialize data serialized with the old version; the missing `bar` field gets its default value
`"buzz"`.

## @CrcProtected

Bipack calculates the CRC32 of the serialized data, stores it at the end, and automatically checks it on
deserialization, throwing `InvalidFrameCRCException` if it does not match.

It adds 4 bytes to the serialized data.

## @Framed

Bipack stores the CRC32 of the serialized class name (which `@SerialName` can change as usual) and checks it on
deserialization, throwing `InvalidFrameHeaderException` if it does not match.

It adds 4 bytes to the serialized data.

## @Unsigned

This __field annotation__ stores __integer fields__ of any size more compactly by not saving the sign. It can be
applied to both signed and unsigned integers of any size.
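The `Varint` encoding mentioned under *Be as compact as possible* relies on the 0x80 continuation bit. The sketch below assumes the common LEB128-style layout: 7 payload bits per byte, least-significant group first, with 0x80 set on every byte except the last. The functions `encodeVarint` and `decodeVarint` are hypothetical illustrations, not Bipack's actual `object Varint`.

```kotlin
import java.io.ByteArrayOutputStream

// Hypothetical sketch of a 0x80-continuation varint (LEB128-style layout assumed);
// not Bipack's own `object Varint`.
fun encodeVarint(value: ULong): ByteArray {
    val out = ByteArrayOutputStream()
    var v = value
    while (v >= 0x80uL) {
        // Emit the low 7 bits with the continuation flag set: more bytes follow.
        out.write(((v and 0x7FuL) or 0x80uL).toInt())
        v = v shr 7
    }
    // Last byte: fewer than 8 significant bits left, continuation flag clear.
    out.write(v.toInt())
    return out.toByteArray()
}

fun decodeVarint(bytes: ByteArray): ULong {
    var result = 0uL
    var shift = 0
    for (b in bytes) {
        val byte = b.toInt() and 0xFF
        // Put the 7 payload bits in place, least-significant group first.
        result = result or ((byte and 0x7F).toULong() shl shift)
        // A clear 0x80 bit marks the final byte.
        if ((byte and 0x80) == 0) return result
        shift += 7
    }
    throw IllegalArgumentException("truncated varint")
}
```

Under these assumptions any value below 128 takes a single byte, and 300 encodes to the two bytes `0xAC 0x02`.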
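The ideas behind `@Framed` (a CRC32 tag of the class name) and `@CrcProtected` (a CRC32 of the serialized data) can be sketched with the JDK's `java.util.zip.CRC32`. The helpers `crc32`, `frameTag`, `appendCrc` and `stripCrc` are hypothetical. The standard CRC-32 variant, UTF-8 encoding of the name, and big-endian byte order are assumptions, not Bipack's documented wire format.

```kotlin
import java.nio.ByteBuffer
import java.util.zip.CRC32

// Hypothetical helpers illustrating @Framed / @CrcProtected.
// Assumed: standard CRC-32 (java.util.zip.CRC32), UTF-8 names, big-endian 4-byte values.
fun crc32(bytes: ByteArray): Int {
    val crc = CRC32()
    crc.update(bytes)
    return crc.value.toInt() // CRC32.value is a Long; keep the low 32 bits
}

// @Framed idea: a 4-byte tag derived from the serial name of the class.
fun frameTag(serialName: String): ByteArray =
    ByteBuffer.allocate(4).putInt(crc32(serialName.toByteArray(Charsets.UTF_8))).array()

// @CrcProtected idea: append the CRC32 of the payload after it (4 extra bytes).
fun appendCrc(payload: ByteArray): ByteArray =
    payload + ByteBuffer.allocate(4).putInt(crc32(payload)).array()

// Verify and strip the trailing CRC32. This sketch throws IllegalArgumentException;
// Bipack itself reports a mismatch as InvalidFrameCRCException.
fun stripCrc(packet: ByteArray): ByteArray {
    require(packet.size >= 4) { "packet too short to contain a CRC32" }
    val payload = packet.copyOfRange(0, packet.size - 4)
    val stored = ByteBuffer.wrap(packet, packet.size - 4, 4).int
    if (stored != crc32(payload)) throw IllegalArgumentException("CRC32 mismatch")
    return payload
}
```

Either way the overhead is the 4 bytes mentioned above. Flipping any byte of a protected packet makes `stripCrc` fail instead of returning corrupted data.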