docs for bipack format
parent
569d62faa9
commit
2cd68276d4
89
docs/bipack.md
Normal file
@ -0,0 +1,89 @@
# Bipack: compact binary serialization

## Why?

Bipack was designed with the following main goals:

### Be as compact as possible

For this reason it is a binary format: it stores decimal numbers in binary form and can use a variety of encodings for
integers:

#### Varint

A variable-length compact encoding is used internally in some cases. It uses the 0x80 bit of every byte to mark
continuation. See `object Varint`.
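To make the continuation-bit idea concrete, here is a minimal sketch of a varint of this kind. It assumes the least significant 7-bit group is written first (as in LEB128); the names `encodeVarint` and `decodeVarint` are hypothetical, and the actual layout used by Bipack's `Varint` object may differ.

```kotlin
// Illustrative sketch only, not Bipack's `Varint` implementation.
// Assumption: least significant 7-bit group comes first (LEB128-style).
fun encodeVarint(value: ULong): ByteArray {
    val out = ArrayList<Byte>()
    var rest = value
    do {
        var b = (rest and 0x7FuL).toInt()   // take the low 7 bits
        rest = rest shr 7
        if (rest != 0uL) b = b or 0x80      // 0x80 set: more bytes follow
        out.add(b.toByte())
    } while (rest != 0uL)
    return out.toByteArray()
}

fun decodeVarint(bytes: ByteArray): ULong {
    var result = 0uL
    var shift = 0
    for (byte in bytes) {
        val b = byte.toInt() and 0xFF
        result = result or ((b and 0x7F).toULong() shl shift)
        if ((b and 0x80) == 0) break        // 0x80 clear: this was the last byte
        shift += 7
    }
    return result
}
```

Under these assumptions, values up to 127 take a single byte, and 300 is encoded as the two bytes `0xAC 0x02`.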

#### Smartint

A variable-length compact encoding for signed and unsigned integers that uses as few bytes as possible. It is used
automatically when serializing integers and is slightly more sophisticated than plain `Varint`.

### Do not reveal information about stored data

Many extendable formats, like JSON, BSON, BOSS and many others, keep data in key-value pairs. While this is good in
many respects, it has clear disadvantages: it uses more space and it reveals the inner data structure to the world. Such
formats can be unpacked with zero prior knowledge of the inner structure.

Bipack does not store field names, so it is not possible to unpack or interpret the data without knowing its
structure; only probabilistic analysis remains. Let's not make the attacker's life easier :)
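The difference between the two approaches can be illustrated with a toy comparison. Everything here is hypothetical: the `Login` class, the JSON-like text and the positional layout (one length byte, the UTF-8 name, one byte for the counter) are made up for the example and are not Bipack's wire format.

```kotlin
// Hypothetical record used only for this comparison.
data class Login(val user: String, val attempts: Int)

// Key-value form: field names travel with every record.
fun toKeyValue(l: Login): ByteArray =
    """{"user":"${l.user}","attempts":${l.attempts}}""".encodeToByteArray()

// Positional form: values only, in declaration order.
// Assumed toy layout: length byte, UTF-8 name, one byte for the counter.
fun toPositional(l: Login): ByteArray {
    val name = l.user.encodeToByteArray()
    return byteArrayOf(name.size.toByte()) + name + byteArrayOf(l.attempts.toByte())
}
```

For `Login("bob", 3)` the key-value form takes 27 bytes and spells out both field names, while the positional form takes 5 bytes and contains no names at all.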

### Allow upgrading data structures with backward compatibility

The dark side of serialization formats of this kind is that you can't change the structures without either losing
backward compatibility with already serialized data or writing voluminous boilerplate code to implement some sort of
versioning.

To avoid wasting space and revealing more information than needed, Bipack allows classes marked as [@Extendable] to be
extended with more data _appended to the end of the list of fields, with required default values_. For such classes Bipack
stores the number of actually serialized fields and automatically uses default values for the non-serialized ones when
unpacking old data.
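The mechanism can be sketched with a hand-rolled toy format. The `Point` class, the helper names and the layout (one count byte followed by one byte per field) are assumptions made for illustration; Bipack's real encoding is different, but the role of the field count is the same.

```kotlin
// Hypothetical class: `z` was added in a later version, with a default value.
data class Point(val x: Int, val y: Int, val z: Int = 7)

// What an old writer, which knew only x and y, would have produced.
// Assumed toy layout: field count, then one byte per field.
fun packV1(x: Int, y: Int): ByteArray = byteArrayOf(2, x.toByte(), y.toByte())

// A new reader: trusts the stored count and falls back to defaults.
fun unpack(data: ByteArray): Point {
    val count = data[0].toInt()
    val x = data[1].toInt()
    val y = data[2].toInt()
    // Old data carries only two fields, so `z` takes its default.
    return if (count >= 3) Point(x, y, data[3].toInt()) else Point(x, y)
}
```

Here `unpack(packV1(1, 2))` yields `Point(1, 2, 7)`: the old two-field record is still readable after the class has grown.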

### Protect data with framing and CRC

When needed, the serialization library can store and check a CRC32 tag of the structure name with `@Framed` (the name
can be overridden as usual with `@SerialName`), or append a CRC32 of the serialized binary data, which is checked on
deserialization, with `@CrcProtected`. This makes it possible to check data consistency out of the box, and only where needed.

# Usage

Use kotlinx serialization as usual. The following Bipack-specific annotations are at your service. All class annotations can be combined.

## @Extendable

Classes marked this way store their number of fields. This allows adding more fields to the class, at the end of the
field list and with default initializers, while keeping backward compatibility. For example, if you have serialized:

```kotlin
@Serializable
@Extendable
data class Foo(val i: Int)
```

and then decided to add a field:

```kotlin
@Serializable
@Extendable
// New fields go last and must have default values.
data class Foo(val i: Int, val bar: String = "buzz")
```

Bipack will properly deserialize data serialized by the old version, using the default value for `bar`.

`@Extendable` adds one or more bytes to the serialized data (the field count, in `Varint` format).

## @CrcProtected

Bipack calculates the CRC32 of the serialized data and stores it at the end, then automatically checks it on deserialization, throwing `InvalidFrameCRCException` if it does not match.

It adds 4 bytes to the serialized data.
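As a rough illustration of what such a trailer involves, the sketch below appends a CRC32 to a payload and verifies it on reading. It uses the JVM's `java.util.zip.CRC32`; the helper names, the big-endian byte order and the use of `require` instead of `InvalidFrameCRCException` are assumptions, not Bipack's actual implementation.

```kotlin
import java.util.zip.CRC32

// Illustrative helpers only; names and byte order are assumptions.
fun crc32Of(data: ByteArray): Int = CRC32().apply { update(data) }.value.toInt()

// Append the 4-byte checksum (big-endian here) after the payload.
fun protect(payload: ByteArray): ByteArray {
    val crc = crc32Of(payload)
    val tail = ByteArray(4) { i -> (crc ushr (24 - 8 * i)).toByte() }
    return payload + tail
}

// Split off the trailer, recompute the checksum and compare.
fun unprotect(packed: ByteArray): ByteArray {
    require(packed.size >= 4) { "too short to contain a CRC32" }
    val payload = packed.copyOfRange(0, packed.size - 4)
    require(protect(payload).contentEquals(packed)) { "CRC32 mismatch" }
    return payload
}
```

Any change to a payload byte makes `unprotect` fail, which is the kind of consistency check the annotation provides without extra code at the call site.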

## @Framed

Stores the CRC32 of the serialized class name (`@SerialName` changes it as usual) and checks it on deserialization. Throws `InvalidFrameHeaderException` if it does not match.

It adds 4 bytes to the serialized data.
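A name-based frame header can be sketched in a similar way. The helper names, the position of the tag (before the payload), the big-endian byte order and the use of `require` instead of `InvalidFrameHeaderException` are assumptions made for this example only.

```kotlin
import java.util.zip.CRC32

// Illustrative only: a 4-byte tag derived from the (serial) name of the class.
fun nameTag(name: String): Int =
    CRC32().apply { update(name.encodeToByteArray()) }.value.toInt()

// Assumed layout: the tag goes first, big-endian, followed by the payload.
fun frame(name: String, payload: ByteArray): ByteArray {
    val tag = nameTag(name)
    val header = ByteArray(4) { i -> (tag ushr (24 - 8 * i)).toByte() }
    return header + payload
}

// The reader recomputes the tag from the name it expects and compares.
fun unframe(expectedName: String, packed: ByteArray): ByteArray {
    require(packed.size >= 4) { "too short to contain a frame header" }
    val payload = packed.copyOfRange(4, packed.size)
    require(frame(expectedName, payload).contentEquals(packed)) { "frame header mismatch" }
    return payload
}
```

Data framed as `"Foo"` unpacks with `unframe("Foo", ...)` and is rejected by `unframe("Bar", ...)`, so feeding bytes of one structure to the decoder of another is detected early.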

## @Unsigned

This __field annotation__ allows storing __integer fields__ of any size more compactly by not saving the sign. It can be applied to both signed and unsigned integers of any size.
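Why dropping the sign saves space can be shown with a small size calculation. The sketch assumes a 7-bits-per-byte variable-length encoding and a zigzag mapping for signed values; both are illustrative assumptions, and Bipack's `Smartint` may encode the sign differently.

```kotlin
// Number of bytes a 7-bits-per-byte varint needs for a value (assumed encoding).
fun varintSize(value: ULong): Int {
    var n = 1
    var rest = value shr 7
    while (rest != 0uL) {
        n++
        rest = rest shr 7
    }
    return n
}

// Zigzag mapping (assumed): interleaves negative and positive values so that
// small magnitudes stay small, at the cost of one bit for the sign.
fun zigzag(value: Long): ULong = ((value shl 1) xor (value shr 63)).toULong()
```

With these assumptions the value 100 needs 1 byte when stored without a sign, but 2 bytes when the sign bit is kept, because `zigzag(100)` is 200 and no longer fits in 7 bits.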