FB2 Import/Export Specification

Scope

Toread supports FictionBook 2.0 files as plain XML (.fb2) and as a standard ZIP archive containing one FB2 XML file (.fb2.zip).

The vendored schema files live in shared/src/commonMain/resources/fb2/:

  • FictionBook.xsd
  • FictionBookGenres.xsd
  • FictionBookLang.xsd
  • FictionBookLinks.xsd

These files were copied from https://github.com/gribuser/fb2 so that builds and validation references do not depend on the upstream repository remaining available.

Import

The common API is Fb2Format.parse(input: ByteArray, fileName: String? = null).

Import detection:

  • A file is treated as ZIP when its bytes start with the ZIP local-file signature PK\003\004 (hex 50 4B 03 04) or when the provided filename ends with .zip.
  • Otherwise, the bytes are decoded using the encoding named in the XML declaration, when that encoding is supported. UTF-8 is the default, and windows-1251 is supported for legacy FB2 files. A missing or unsupported encoding falls back to UTF-8.
  • In ZIP archives, the first entry ending with .fb2 is used. If no such entry exists, the first non-directory entry is used.
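
The detection rules above can be sketched in common Kotlin roughly as follows. This is a minimal illustration, not Toread's actual code: the helper names looksLikeZip, declaredEncoding, and pickEntry are hypothetical, and the sketch assumes the filename check is case-sensitive and that directory entries are the ones whose names end with "/".

```kotlin
// Hypothetical helpers illustrating the detection rules; not Toread's actual code.

/** True when the bytes start with PK\003\004 or the file name ends with ".zip". */
fun looksLikeZip(input: ByteArray, fileName: String? = null): Boolean {
    val hasSignature = input.size >= 4 &&
        input[0] == 0x50.toByte() && input[1] == 0x4B.toByte() &&
        input[2] == 0x03.toByte() && input[3] == 0x04.toByte()
    // Assumption: the suffix check is case-sensitive; the spec does not say.
    return hasSignature || fileName?.endsWith(".zip") == true
}

val encodingPattern = Regex("""encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")

/** Returns "windows-1251" when declared, otherwise "utf-8" (missing or unsupported). */
fun declaredEncoding(input: ByteArray): String {
    // The XML declaration is ASCII in both supported encodings,
    // so the first bytes can be widened to chars one by one.
    val head = buildString {
        for (i in 0 until minOf(input.size, 200)) append((input[i].toInt() and 0xFF).toChar())
    }
    val declaration = head.substringBefore("?>", missingDelimiterValue = "")
    val name = encodingPattern.find(declaration)?.groupValues?.get(1)?.lowercase()
    return if (name == "windows-1251") "windows-1251" else "utf-8"
}

/** First ".fb2" entry, else the first entry that is not a directory, else null. */
fun pickEntry(entryNames: List<String>): String? =
    entryNames.firstOrNull { it.endsWith(".fb2") }
        ?: entryNames.firstOrNull { !it.endsWith("/") }
```

For example, under these assumptions pickEntry(listOf("meta/", "readme.txt", "book.fb2")) selects book.fb2, while an archive holding only readme.txt falls back to that entry.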

ZIP support:

  • Stored ZIP entries are supported on every multiplatform target.
  • Deflated ZIP entries are supported on JVM and Android through java.util.zip.Inflater.
  • On JS and Wasm targets, deflated ZIP entries currently fail with Fb2ParseException. This will remain the case until a common or browser inflater is added.
  • ZIP64 and encrypted archives are not supported.
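
On JVM and Android, inflating a deflated entry with java.util.zip.Inflater might look like the sketch below. The function name inflateRaw is hypothetical, and the sketch throws IllegalStateException where the real importer presumably reports Fb2ParseException. ZIP entries carry raw deflate data without a zlib header, which is why the Inflater is created with nowrap = true.

```kotlin
import java.io.ByteArrayOutputStream
import java.util.zip.Inflater

/** Inflates the raw-deflate payload of one ZIP entry (JVM/Android only). Hypothetical helper. */
fun inflateRaw(compressed: ByteArray): ByteArray {
    // nowrap = true: expect raw deflate data, with no zlib header or checksum.
    val inflater = Inflater(true)
    try {
        inflater.setInput(compressed)
        val out = ByteArrayOutputStream()
        val buffer = ByteArray(8192)
        while (!inflater.finished()) {
            val n = inflater.inflate(buffer)
            // No progress and no way to make any: the stream is truncated or unsupported.
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                throw IllegalStateException("Truncated or unsupported deflate stream")
            }
            out.write(buffer, 0, n)
        }
        return out.toByteArray()
    } finally {
        inflater.end() // release native zlib resources
    }
}
```

Because Inflater is a JVM class, a helper like this cannot live in common code, which is consistent with the JS and Wasm limitation described above.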

XML mapping:

  • description/title-info/book-title maps to Fb2Book.title.
  • description/title-info/author maps to Fb2Book.authors.
  • genre, lang, src-lang, keywords, date, annotation, sequence, translator, and coverpage/image are imported from title-info.
  • description/document-info maps to Fb2DocumentInfo.
  • The first non-notes body is imported as the readable body.
  • Direct body/image, body/title, section/title, direct section/image, direct section/p, and nested section elements are preserved.
  • binary elements are imported with id, content-type, and whitespace-normalized Base64 content.
  • Image references keep their href; Fb2ImageRef.binaryId resolves #cover.jpg to cover.jpg, and Fb2Book.binaryFor(image) returns the corresponding embedded binary when present.
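
The image-to-binary lookup described in the last two items can be illustrated with stand-in types. The classes below are simplified, hypothetical versions of Fb2Binary, Fb2ImageRef, and Fb2Book (the real model has more fields), and the sketch assumes that "whitespace-normalized" means all whitespace is removed from the Base64 text.

```kotlin
// Hypothetical stand-ins for the model types named above; the real Toread classes may differ.
data class Fb2Binary(val id: String, val contentType: String, val base64: String)

data class Fb2ImageRef(val href: String) {
    /** Local binary id for "#id" references, or null for any other href. */
    val binaryId: String?
        get() = if (href.startsWith("#") && href.length > 1) href.substring(1) else null
}

data class Fb2Book(val binaries: List<Fb2Binary>) {
    /** Embedded binary that the image points at, or null when absent. */
    fun binaryFor(image: Fb2ImageRef): Fb2Binary? =
        image.binaryId?.let { id -> binaries.firstOrNull { it.id == id } }
}

/** Assumed normalization: drop all whitespace so line-wrapped Base64 becomes contiguous. */
fun normalizeBase64(raw: String): String = raw.filterNot { it.isWhitespace() }
```

With these stand-ins, Fb2ImageRef("#cover.jpg").binaryId yields cover.jpg, and binaryFor returns null when no binary with that id is embedded.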

The importer is intentionally structural, not a full XSD validator.

Export

The common API is:

  • Fb2Format.exportXml(book: Fb2Book) for plain .fb2 XML.
  • Fb2Format.exportZip(book: Fb2Book, entryName: String = "book.fb2") for .fb2.zip.

Export behavior:

  • XML is emitted as UTF-8 FictionBook 2.0 with the FB2 and XLink namespaces.
  • Required FB2 description fields are emitted from the Fb2Book model.
  • Missing document-info fields are filled with deterministic defaults: date 1970-01-01, id toread-generated, version 1.0.
  • ZIP export uses a standard stored ZIP entry, so it is portable without requiring a common deflater.
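
To show why a stored entry needs no deflater, here is a minimal sketch of a single-entry stored ZIP writer in common Kotlin. It is an illustration under stated assumptions, not the exportZip implementation: the names LeWriter, crc32, and storedZip are hypothetical, the timestamp is fixed at 1980-01-01 for determinism, and ZIP64 is not handled.

```kotlin
/** CRC-32 (reflected polynomial 0xEDB88320), computed bit by bit for brevity. */
fun crc32(data: ByteArray): Int {
    var crc = -1 // 0xFFFFFFFF
    for (b in data) {
        crc = crc xor (b.toInt() and 0xFF)
        repeat(8) {
            crc = if ((crc and 1) != 0) (crc ushr 1) xor 0xEDB88320.toInt() else crc ushr 1
        }
    }
    return crc.inv()
}

/** Tiny little-endian byte sink (hypothetical helper). */
class LeWriter {
    private val out = ArrayList<Byte>()
    val size: Int get() = out.size
    fun u16(v: Int) { out.add(v.toByte()); out.add((v ushr 8).toByte()) }
    fun u32(v: Int) { u16(v and 0xFFFF); u16(v ushr 16) }
    fun bytes(b: ByteArray) { b.forEach { out.add(it) } }
    fun toByteArray(): ByteArray = out.toByteArray()
}

/** Builds a ZIP archive with one stored (uncompressed) entry. */
fun storedZip(entryName: String, content: ByteArray): ByteArray {
    val name = entryName.encodeToByteArray()
    val crc = crc32(content)
    val w = LeWriter()

    // Fields shared by the local header and the central directory record.
    fun commonFields() {
        w.u16(10)            // version needed to extract: 1.0
        w.u16(0x0800)        // flags: bit 11 marks a UTF-8 file name
        w.u16(0)             // compression method 0 = stored
        w.u16(0)             // modification time 00:00:00
        w.u16(0x0021)        // modification date 1980-01-01
        w.u32(crc)
        w.u32(content.size)  // compressed size
        w.u32(content.size)  // uncompressed size (identical for stored data)
        w.u16(name.size)
        w.u16(0)             // extra field length
    }

    w.u32(0x04034b50)        // local file header signature, PK\003\004
    commonFields()
    w.bytes(name)
    w.bytes(content)

    val centralOffset = w.size
    w.u32(0x02014b50)        // central directory header signature
    w.u16(10)                // version made by
    commonFields()
    w.u16(0)                 // file comment length
    w.u16(0)                 // disk number start
    w.u16(0)                 // internal attributes
    w.u32(0)                 // external attributes
    w.u32(0)                 // offset of the local header
    w.bytes(name)
    val centralSize = w.size - centralOffset

    w.u32(0x06054b50)        // end of central directory signature
    w.u16(0); w.u16(0)       // this disk, disk holding the central directory
    w.u16(1); w.u16(1)       // entries on this disk, entries in total
    w.u32(centralSize)
    w.u32(centralOffset)
    w.u16(0)                 // archive comment length
    return w.toByteArray()
}
```

Everything here is integer arithmetic and byte copying, so a writer of this shape works on every multiplatform target. The trade-off is that the archive is no smaller than the XML it contains.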

Round-trip guarantees:

  • Imported title, authors, language, source language, translators, genres, document info, cover images, body title/images, sections, paragraph text, and binaries are represented in the model.
  • Formatting, comments, stylesheets, tables, inline style markup, cover references, and unknown FB2 extension elements are not preserved by the current model.