# Binary tools and BiPack serializer

A multiplatform collection of binary tools. It includes portable serialization in the compact and fast [Bipack] format
and many useful tools for working with binary data, such as CRC-family checksums, dumps, etc. It also works well in the
browser and on native targets.

# Recent changes

- 0.1.6: added many useful features and support for wasmJS and all other platforms. Note on wasmJS: there appears to be a bug in the wasm compiler, so `BipackDecoder` could cause wasm loading problems.

- 0.1.1: added serialized `KVStorage` with handy implementations on the JVM and JS platforms, plus some required synchronization
  tools.

- 0.1.0: uses modern Kotlin 1.9.*, fixes a problem with singleton or empty/object serialization.

The last 1.8-based version is 0.0.8. Some fixes are not yet backported to it; please open an issue if you need them.

# Usage

Add our Maven repository:

```kotlin
repositories {
    // ...
    maven("https://gitea.sergeych.net/api/packages/SergeychWorks/maven")
}
```

Then add the dependency in the proper place in your project, like this:

```kotlin
dependencies {
    // ...
    implementation("net.sergeych:mp_bintools:0.1.0")
}
```

## Calculating CRCs:

~~~kotlin
CRC.crc32("Hello".encodeToByteArray())
CRC.crc16("happy".encodeToByteArray())
CRC.crc8("world".encodeToByteArray())
~~~

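
For reference, here is a minimal standalone sketch of what a CRC32 checksum computes, using the common reflected polynomial `0xEDB88320` (the zlib/PNG variant). It is only an illustration: `crc32Sketch` is a hypothetical name, and whether `CRC.crc32` uses exactly this variant or this return type is an assumption not confirmed here.

~~~kotlin
// Illustrative bitwise CRC-32 (reflected polynomial 0xEDB88320, as in zlib/PNG).
// `crc32Sketch` is a hypothetical helper, not part of this library.
fun crc32Sketch(data: ByteArray): UInt {
    var crc = -1 // initial value 0xFFFFFFFF
    for (b in data) {
        crc = crc xor (b.toInt() and 0xFF)
        repeat(8) {
            // shift right; apply the polynomial when the dropped bit was set
            crc = if ((crc and 1) != 0) (crc ushr 1) xor 0xEDB88320.toInt() else crc ushr 1
        }
    }
    return crc.inv().toUInt() // final XOR with 0xFFFFFFFF
}
~~~

The standard check value for this variant is `0xCBF43926` for the ASCII string `123456789`, which is a handy way to compare implementations.
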
## Efficient binary serialization with Bipack:

~~~kotlin
@Serializable
data class Foo(val bar: String, val buzz: Int)

val foo = Foo("bar", 42)
val bytes = BipackEncoder.encode(foo)
val bar: Foo = BipackDecoder.decode(bytes)
assertEquals(foo, bar)
~~~

## Bipack-based auto-serializing storage:

It allows you to easily store any `@Serializable` data type using delegates,
and more:

~~~kotlin
val storage = defaultNamedStorage("test_mp_bintools")

var foo by storage("unknown") // default value makes it a String
foo = "bar"

// nullable:
var answer: Int? by storage.optStored()
answer = 42

storage.delete("foo")
~~~

## MotherPacker

This concept allows switching the encoding on the fly. Create a `MotherPacker` instance
and pass it to your encoding/decoding code:

~~~kotlin
@Serializable
data class FB1(val foo: Int, val bar: String)

// This is the JSON implementation of MotherPacker:
val mp = JsonPacker()
// it packs and unpacks to JSON:
println(mp.pack(mapOf("foo" to 42)).decodeToString())
assertEquals("""{"foo":42}""", mp.pack(mapOf("foo" to 42)).decodeToString())
val x = mp.unpack<FB1>("""{"foo":42, "bar": "foo"}""".encodeToByteArray())
assertEquals(42, x.foo)
assertEquals("foo", x.bar)
~~~

There is also [MotherBipack], a `MotherPacker` implementation that uses Bipack. You can easily add more formats
by implementing the `MotherPacker` interface.

# Bipack

## Why?

Bipack is a compact and efficient binary serialization library (and format) designed with the following main goals:

### Allow easy unpacking of existing binary structures

You describe your structure as `@Serializable` classes and, voilà, Bipack decodes and encodes it for you! We aim to make it really easy to convert data from other binary formats by adding more format annotations.

### Be as compact as possible

For this reason it is a binary notation: it uses binary form for decimal numbers and can use a variety of encodings for
integers:

#### Varint

A variable-length compact encoding that is used internally in some cases. It uses the 0x80 bit in every byte to mark continuation.
See `object Varint`.

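
The continuation-bit idea can be sketched as follows. This is a standalone illustration, not the library's `Varint` code: the function names are hypothetical, and the byte order (least significant 7-bit group first, as in LEB128) is an assumption.

~~~kotlin
// Illustrative varint codec: 7 payload bits per byte, 0x80 marks "more bytes follow".
// Assumption: least significant group first (LEB128 style); the library's
// `Varint` object may order or bound values differently.
fun encodeVarintSketch(value: ULong): ByteArray {
    val out = ArrayList<Byte>()
    var rest = value
    do {
        var b = (rest and 0x7FuL).toInt()
        rest = rest shr 7
        if (rest != 0uL) b = b or 0x80 // continuation bit
        out.add(b.toByte())
    } while (rest != 0uL)
    return out.toByteArray()
}

fun decodeVarintSketch(bytes: ByteArray): ULong {
    var result = 0uL
    var shift = 0
    for (b in bytes) {
        val v = b.toInt() and 0xFF
        result = result or ((v and 0x7F).toULong() shl shift)
        if ((v and 0x80) == 0) break // last byte of the number
        shift += 7
    }
    return result
}
~~~

With this layout, values below 128 take a single byte, and 300 becomes the two bytes `AC 02`.
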
#### Smartint

A variable-length compact encoding for signed and unsigned integers that uses as few bytes as possible. It is used automatically when serializing integers. It is slightly more sophisticated than plain `Varint`.

### Do not reveal information about stored data

Many extendable formats, like JSON, BSON, BOSS and many others, keep data in key-value pairs. While this is good in many respects, it has some disadvantages: it uses more space, and it reveals the inner data structure to the world. Such formats can be unpacked with zero prior knowledge of the inner structure.

Bipack does not store field names, so it is not possible to unpack or interpret it without knowledge of the data structure; only probabilistic analysis remains. Let's not make the attacker's life easier :)

### Allow upgrading data structures with backward compatibility

Serialization formats of this kind have a dark side: you can't change the structures without either losing backward compatibility with already serialized data or using voluminous boilerplate code to implement some sort of versioning.

To avoid wasting space and revealing more information than needed, Bipack allows classes marked
as [@Extendable] to be extended with more data _appended to the end of the field list with required default values_.
For such classes, Bipack stores the number of actually serialized fields
and automatically uses default values for the non-serialized ones when unpacking old data.

### Protect data with framing and CRC

When needed, the serialization library can store and check a CRC32 tag of the structure name with
`@Framed` (the name can be overridden as usual with `@SerialName`), or append a CRC32 of the serialized binary data, which is checked on deserialization, with `@CrcProtected`. This allows checking data consistency out of the box, and only where needed.

# Usage

Use `kotlinx.serialization` as usual. The following Bipack-specific annotations are at your disposal (they can be
combined):

## @Extendable

Classes marked this way store the number of fields. This allows adding more fields to the class, at the end of the list and with
default initializers, while keeping backward compatibility. For example, if you have serialized:

```kotlin
@Serializable
@Extendable
data class foo(val i: Int)
```

and then decided to add a field:

```kotlin
@Serializable
@Extendable
data class foo(val i: Int, val bar: String = "buzz")
```

Bipack will properly deserialize the data serialized for the old version.

It adds one or more bytes to the serialized data (the field count in `Varint` format).

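
The mechanism can be pictured with a small conceptual model. This is not Bipack's wire format: the record is modeled as a plain list of integers, and `FooV2`, `encodeV1`, `encodeV2` and `decodeV2` are hypothetical names used only for this sketch.

```kotlin
// Conceptual model only: a record is stored as [fieldCount, field1, field2, ...].
// Real Bipack writes the count as a Varint and encodes each field by its type.
data class FooV2(val i: Int, val extra: Int = 7) // `extra` was added later, with a default

fun encodeV1(i: Int): List<Int> = listOf(1, i)                    // old writer: one field
fun encodeV2(foo: FooV2): List<Int> = listOf(2, foo.i, foo.extra) // new writer: two fields

fun decodeV2(data: List<Int>): FooV2 {
    val count = data[0]
    val i = data[1]
    // Fields beyond the stored count fall back to their defaults.
    return if (count >= 2) FooV2(i, data[2]) else FooV2(i)
}
```

Here `decodeV2(encodeV1(42))` yields `FooV2(42, 7)`: the old data carries a count of one, so the new field takes its default.
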
## @CrcProtected

Bipack will calculate and store the CRC32 of the serialized data at the end, and automatically check it on deserialization,
throwing `InvalidFrameCRCException` if it does not match.

It adds four bytes to the serialized data.

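
A rough, JVM-only sketch of the same idea, independent of Bipack, is shown below. It relies on `java.util.zip.CRC32`; the helper names are hypothetical, the big-endian layout of the four trailing bytes is an assumption, and a plain `IllegalStateException` stands in for `InvalidFrameCRCException`.

~~~kotlin
import java.util.zip.CRC32

// JVM-only illustration of a CRC32 trailer; the byte order of the trailer is an
// assumption, and the real library throws InvalidFrameCRCException instead.
fun crcOf(data: ByteArray): Int = CRC32().apply { update(data) }.value.toInt()

fun protect(payload: ByteArray): ByteArray {
    val crc = crcOf(payload)
    // append the checksum as 4 big-endian bytes
    val tail = ByteArray(4) { i -> (crc ushr (24 - 8 * i)).toByte() }
    return payload + tail
}

fun unprotect(packed: ByteArray): ByteArray {
    require(packed.size >= 4) { "too short to contain a CRC32" }
    val payload = packed.copyOfRange(0, packed.size - 4)
    // recompute the trailer and compare it with what was stored
    check(protect(payload).contentEquals(packed)) { "CRC32 mismatch" }
    return payload
}
~~~

Flipping any single bit of the packed array makes `unprotect` fail, which is the kind of consistency check the annotation provides.
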
## @Framed

Stores the CRC32 of the serialized class name (`@SerialName` allows changing it as usual) and checks it on deserialization.
Throws `InvalidFrameHeaderException` if it does not match.

It adds four bytes to the serialized data.

## @Unsigned

This __field annotation__ allows storing __integer fields__ of any size more compactly by not saving the sign. It can be applied to both signed and unsigned integers of any size.

## @FixedSize(size)

Use it with fixed-size collections (like hashes, keys, etc.) to avoid keeping the collection size in the packed binary. It saves
at least one byte.

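
Where the saved byte comes from can be seen in a small conceptual comparison. This is not library code: both function names are hypothetical, and the one-byte length prefix assumes a size below 128 stored in a varint-style form.

~~~kotlin
// Conceptual comparison; the single length byte assumes sizes below 128,
// where a varint-style length prefix takes exactly one byte.
fun packWithLength(data: ByteArray): ByteArray {
    require(data.size < 128) { "sketch handles only one-byte length prefixes" }
    return byteArrayOf(data.size.toByte()) + data
}

fun packFixedSize(data: ByteArray, size: Int): ByteArray {
    require(data.size == size) { "expected exactly $size bytes" }
    return data.copyOf() // the reader already knows the size, so no prefix is stored
}
~~~

For a 32-byte hash the first form takes 33 bytes and the second takes 32.
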
## @Fixed

Can be used with any integer type to store/restore it as is, fixed-size, big-endian:

- Short, UShort: 2 bytes
- Int, UInt: 4 bytes
- Long, ULong: 8 bytes

Note that without this modifier all integers are serialized in a variable-length compressed format; see class [Smartint]
from this library.

Example:

~~~kotlin
@Serializable
class Foo(
    @Fixed
    val eightBytesLongInt: Long
)

// so:
assertEquals("00 00 00 01 00 00 00 02", BipackEncoder.encode(Foo(0x100000002)).encodeToHex())
~~~

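
To see why that hex string is expected, here is a standalone sketch of the big-endian layout of a `Long`. The helpers `toBigEndianBytes` and `toHexSketch` are hypothetical and written only for this illustration; they are not the library's `encodeToHex`.

~~~kotlin
// Illustrative helpers, not library code: lay out a Long as 8 big-endian bytes
// and print them as space-separated hex, mirroring the expected string above.
fun toBigEndianBytes(value: Long): ByteArray =
    ByteArray(8) { i -> (value ushr (56 - 8 * i)).toByte() }

fun toHexSketch(bytes: ByteArray): String =
    bytes.joinToString(" ") { b -> (b.toInt() and 0xFF).toString(16).padStart(2, '0').uppercase() }
~~~

The most significant byte comes first, so `0x100000002` is laid out as `00 00 00 01` followed by `00 00 00 02`.
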