# FB2 Import/Export Specification
## Scope
Toread supports FictionBook 2.0 files as plain XML (`.fb2`) and as a standard ZIP archive containing one FB2 XML file (`.fb2.zip`).
The vendored schema files live in `shared/src/commonMain/resources/fb2/`:
- `FictionBook.xsd`
- `FictionBookGenres.xsd`
- `FictionBookLang.xsd`
- `FictionBookLinks.xsd`
These files were copied from `https://github.com/gribuser/fb2` so builds and validation references do not depend on the upstream repository remaining available.
## Import
The common API is `Fb2Format.parse(input: ByteArray, fileName: String? = null)`.
Import detection:
- A file is treated as ZIP when its bytes start with the ZIP local-file signature `PK\003\004` (hex `50 4B 03 04`) or the provided filename ends with `.zip`.
- Otherwise the bytes are decoded using the encoding named in the XML declaration, when that encoding is supported. A missing or unsupported encoding falls back to UTF-8. `windows-1251` is supported for legacy FB2 files.
- In ZIP archives, the first entry ending with `.fb2` is used. If no such entry exists, the first non-directory entry is used.
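The detection rules above can be sketched in common Kotlin roughly as follows. This is an illustration only: `looksLikeZip`, `resolveEncoding`, and `pickEntry` are hypothetical names, not Toread's actual internals. The sketch assumes the supported encodings are exactly UTF-8 and `windows-1251`, and that filename matching is case-insensitive.

```kotlin
// Hypothetical helpers for illustration; not the actual Fb2Format internals.

/** True when the bytes start with "PK\003\004" or the file name ends with ".zip". */
fun looksLikeZip(input: ByteArray, fileName: String? = null): Boolean {
    val hasSignature = input.size >= 4 &&
        input[0] == 0x50.toByte() && // 'P'
        input[1] == 0x4B.toByte() && // 'K'
        input[2] == 0x03.toByte() &&
        input[3] == 0x04.toByte()
    // Case-insensitive matching is an assumption of this sketch.
    return hasSignature || fileName?.endsWith(".zip", ignoreCase = true) == true
}

// Assumption: only these two encodings are supported.
val supportedEncodings = setOf("utf-8", "windows-1251")

/** Returns the declared encoding when supported, otherwise "utf-8". */
fun resolveEncoding(input: ByteArray): String {
    // The XML declaration is plain ASCII in both supported encodings, so decoding
    // a short prefix as UTF-8 is enough to read it (assumes it fits in 200 bytes).
    val head = input.copyOf(minOf(input.size, 200)).decodeToString()
    val pattern = Regex("""<\?xml[^>]*encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")
    val declared = pattern.find(head)?.groupValues?.get(1)?.lowercase() ?: return "utf-8"
    return if (declared in supportedEncodings) declared else "utf-8"
}

/** Picks the first ".fb2" entry, else the first non-directory entry (directories end with "/"). */
fun pickEntry(entryNames: List<String>): String? =
    entryNames.firstOrNull { it.endsWith(".fb2") }
        ?: entryNames.firstOrNull { !it.endsWith("/") }
```

For example, `pickEntry(listOf("META-INF/", "readme.txt", "book.fb2"))` selects `book.fb2`, while an archive containing only `dir/` and `a.xml` yields `a.xml`.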
ZIP support:
- Stored ZIP entries are supported on every multiplatform target.
- Deflated ZIP entries are supported on JVM and Android through `java.util.zip.Inflater`.
- On JS and Wasm targets, deflated ZIP entries currently fail with `Fb2ParseException`, pending a common or browser inflater.
- ZIP64 and encrypted archives are not supported.
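As a rough illustration of the JVM and Android path, a deflated entry payload can be expanded with `java.util.zip.Inflater` in raw ("nowrap") mode, since ZIP entries carry deflate data without a zlib header. The function name `inflateRaw` and its parameters are assumptions of this sketch. The real implementation may differ, for example by reporting failures as `Fb2ParseException`.

```kotlin
import java.util.zip.Inflater

/**
 * Inflates the raw-deflate payload of a ZIP entry (JVM/Android only).
 * Hypothetical helper; `uncompressedSize` is taken from the entry header.
 */
fun inflateRaw(compressed: ByteArray, uncompressedSize: Int): ByteArray {
    // nowrap = true: the payload has no zlib header or checksum.
    val inflater = Inflater(true)
    try {
        inflater.setInput(compressed)
        val out = ByteArray(uncompressedSize)
        var written = 0
        while (written < out.size && !inflater.finished()) {
            val n = inflater.inflate(out, written, out.size - written)
            // No progress and no more input means the stream is truncated.
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) break
            written += n
        }
        require(written == uncompressedSize) { "Truncated or corrupt deflate stream" }
        return out
    } finally {
        inflater.end() // release native resources
    }
}
```

Stored entries need no such step, which is why they work on every target: their payload bytes are the file content as-is.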
XML mapping:
- `description/title-info/book-title` maps to `Fb2Book.title`.
- `description/title-info/author` maps to `Fb2Book.authors`.
- `genre`, `lang`, `keywords`, `date`, `annotation`, and `sequence` are imported from `title-info`.
- `src-lang`, `translator`, and `coverpage/image` are imported from `title-info`.
- `description/document-info` maps to `Fb2DocumentInfo`.
- The first non-notes `body` is imported as the readable body.
- Direct `body/image`, `body/title`, `section/title`, direct `section/image`, direct `section/p`, and nested `section` elements are preserved.
- `binary` elements are imported with `id`, `content-type`, and whitespace-normalized Base64 content.
- Image references keep their `href`; `Fb2ImageRef.binaryId` resolves `#cover.jpg` to `cover.jpg`, and `Fb2Book.binaryFor(image)` returns the corresponding embedded binary when present.
The importer is intentionally structural, not a full XSD validator.
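The image reference resolution described in the mapping list can be pictured with simplified stand-in types. The classes below (`ImageRefSketch`, `BinarySketch`, `BookSketch`) are hypothetical and much smaller than the real `Fb2ImageRef`, `Fb2Book`, and binary model. The sketch also assumes that whitespace normalization means removing all whitespace from the Base64 text.

```kotlin
// Simplified stand-ins for illustration; the real model classes carry more fields.

data class ImageRefSketch(val href: String) {
    /** Local binary id for same-document references such as "#cover.jpg"; null otherwise. */
    val binaryId: String?
        get() = if (href.startsWith("#") && href.length > 1) href.substring(1) else null
}

data class BinarySketch(val id: String, val contentType: String, val base64: String)

data class BookSketch(val binaries: List<BinarySketch>) {
    /** Returns the embedded binary the image points at, or null when none matches. */
    fun binaryFor(image: ImageRefSketch): BinarySketch? =
        image.binaryId?.let { id -> binaries.firstOrNull { it.id == id } }
}

/** Assumption: normalization strips all whitespace, since FB2 files often wrap Base64 across lines. */
fun normalizeBase64(raw: String): String = raw.filterNot { it.isWhitespace() }
```

With these stand-ins, `ImageRefSketch("#cover.jpg").binaryId` is `cover.jpg`, and a reference to an id with no matching `binary` element resolves to `null` rather than failing.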
## Export
The common API is:
- `Fb2Format.exportXml(book: Fb2Book)` for plain `.fb2` XML.
- `Fb2Format.exportZip(book: Fb2Book, entryName: String = "book.fb2")` for `.fb2.zip`.
Export behavior:
- XML is emitted as UTF-8 FictionBook 2.0 with the FB2 and XLink namespaces.
- Required FB2 description fields are emitted from the `Fb2Book` model.
- Missing `document-info` fields are filled with deterministic defaults: date `1970-01-01`, id `toread-generated`, version `1.0`.
- ZIP export uses a standard stored ZIP entry, so it is portable without requiring a common deflater.
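To show why a stored entry needs no deflater, here is a minimal sketch of a single-entry stored ZIP writer in common Kotlin. It is not the actual `exportZip` implementation. `storedZip`, `crc32`, and the little-endian helpers are hypothetical names, and the fixed timestamp and UTF-8 name flag are choices made for this sketch.

```kotlin
/** Standard CRC-32 (reflected polynomial 0xEDB88320), computed bit by bit for brevity. */
fun crc32(data: ByteArray): Int {
    var crc = -1
    for (b in data) {
        crc = crc xor (b.toInt() and 0xFF)
        repeat(8) {
            crc = if ((crc and 1) != 0) (crc ushr 1) xor 0xEDB88320.toInt() else crc ushr 1
        }
    }
    return crc.inv()
}

// Little-endian writers; a boxed byte list is slow but keeps the sketch short.
fun MutableList<Byte>.u16(v: Int) { add(v.toByte()); add((v ushr 8).toByte()) }
fun MutableList<Byte>.u32(v: Int) { u16(v and 0xFFFF); u16(v ushr 16) }

/** Builds a ZIP archive holding one uncompressed (stored) entry. */
fun storedZip(entryName: String, content: ByteArray): ByteArray {
    val name = entryName.encodeToByteArray()
    val crc = crc32(content)
    val out = mutableListOf<Byte>()

    // Fields shared by the local header and the central directory record.
    fun commonFields() {
        out.u16(20)           // version needed to extract (2.0)
        out.u16(0x0800)       // flags: bit 11 marks the name as UTF-8
        out.u16(0)            // compression method 0 = stored
        out.u16(0)            // modification time, fixed for determinism
        out.u16(0x21)         // modification date 1980-01-01
        out.u32(crc)
        out.u32(content.size) // compressed size equals uncompressed size
        out.u32(content.size)
        out.u16(name.size)
        out.u16(0)            // extra field length
    }

    out.u32(0x04034b50)       // local file header signature "PK\003\004"
    commonFields()
    out.addAll(name.asList())
    out.addAll(content.asList())

    val centralOffset = out.size
    out.u32(0x02014b50)       // central directory record signature
    out.u16(20)               // version made by
    commonFields()
    out.u16(0)                // file comment length
    out.u16(0)                // disk number start
    out.u16(0)                // internal attributes
    out.u32(0)                // external attributes
    out.u32(0)                // offset of the local header
    out.addAll(name.asList())
    val centralSize = out.size - centralOffset

    out.u32(0x06054b50)       // end of central directory signature
    out.u16(0); out.u16(0)    // this disk, disk holding the central directory
    out.u16(1); out.u16(1)    // entries on this disk, entries in total
    out.u32(centralSize)
    out.u32(centralOffset)
    out.u16(0)                // archive comment length
    return out.toByteArray()
}
```

Because the payload is copied verbatim and only a CRC-32 is computed, the same code runs on every multiplatform target. The trade-off is that the archive is no smaller than the XML it contains.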
Round-trip guarantees:
- Imported title, authors, language, source language, translators, genres, document info, cover images, body title/images, sections, paragraph text, and binaries are represented in the model.
- Formatting, comments, stylesheets, tables, inline style markup, cover references, and unknown FB2 extension elements are not preserved by the current model.