lyng/notes/fast_ops_optimizations_plan.md

Fast Ops Optimizations Plan (Draft)

Baseline

  • See notes/nested_range_baseline.md

Candidates (items marked "done" are complete; the rest are not started)

  1. Primitive comparisons (done)
    • Emit fast CMP variants for known ObjString/ObjInt/ObjReal using temp/stable slots.
    • MixedCompareBenchmarkTest: 374 ms -> 347 ms.
  2. Mixed numeric ops (done)
    • Allow INT+REAL arithmetic to use primitive REAL ops (no obj fallback).
    • MixedCompareBenchmarkTest: 347 ms -> 275 ms.
  3. Boolean conversion (done; do not revert without review)
    • Skip redundant OBJ_TO_BOOL in logical AND/OR when compiler already emits BOOL.
    • MixedCompareBenchmarkTest: 275 ms -> 249 ms.
  4. Range/loop hot path (done)
    • Reuse a cached ObjVoid slot for if-statements in statement context (avoids per-iteration CONST_OBJ).
    • MixedCompareBenchmarkTest: 249 ms -> 247 ms.
  5. String ops (done)
    • Mark GET_INDEX results as stable only for closed ObjString elements to enable fast compares.
    • MixedCompareBenchmarkTest: 247 ms -> 240 ms.
  6. Box/unbox audit (done)
    • Unbox ObjInt/ObjReal in assign-op when target is INT/REAL to avoid boxing + obj ops.
    • MixedCompareBenchmarkTest: 240 ms -> 234 ms.
  7. Primitive list fill with capacity (done)
    • Extended the compiler/runtime fast path from List.fill(size) { intExpr } to List.fill(size, capacity) { intExpr }.
    • Added LIST_NEW_INT_CAP and LIST_FILL_INT_CAP so the 3-arg form keeps primitive-int storage instead of falling back to generic stdlib code.
    • OptTest.testAddToArray2: List.fill(n, n + 10) { ... } dropped from the prior anomaly (~10x slower than 2-arg fill) to the same range as List.fill(n) { ... }, roughly 56-67 ms vs 46-75 ms after warmup.
  8. Primitive list append preservation (done)
    • Fixed ObjList.add(...) to append through the primitive-aware fast path instead of forcing .list and boxing the backing storage.
    • OptTest.testAddToArray2: appending to the pre-extended list dropped from the prior anomaly (~10x slower) to sub-millisecond / low-millisecond timings (~0.05-0.16 ms for the extended list path, ~1.6-4.3 ms for the baseline path, depending on warmup).
  9. Mixed compare coverage
    • Emit CMP_*_REAL when one operand is known ObjReal in more expression forms (not just assign-op).
    • Verify with disassembly that the fast CMP opcodes are emitted.
  10. Range-loop invariant hoist
    • Cache range end/step into temps once per loop; avoid repeated slot reads/boxing in body.
    • Confirm no extra CONST_OBJ in hot path.
  11. Boxing elision pass
    • Remove redundant BOX_OBJ when value feeds only primitive ops afterward (local liveness).
    • Ensure no impact on closures/escaping values.
  12. Closed-type fast paths expansion
    • Apply closed-type trust for ObjBool/ObjInt/ObjReal/ObjString in ternaries and conditional chains.
    • Guard with exact non-null temp/slot checks only.
  13. VM hot op micro-optimizations
    • Reduce frame reads/writes in ADD_INT, MUL_REAL, CMP_*_INT/REAL when operands are temps.
    • Compare against baseline; revert if the 10-run median shows a regression.
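
The "compare against baseline, revert on a regressed 10-run median" rule in the last item can be made mechanical. The following is a minimal sketch in Python, not the project's actual benchmark harness: the function names `time_runs_ms` and `is_regression`, the warmup count, and the 2% tolerance are all assumptions introduced here for illustration.

```python
import statistics
import time
from typing import Callable, List


def time_runs_ms(workload: Callable[[], object], runs: int = 10, warmup: int = 3) -> List[float]:
    """Call `workload` `warmup` times untimed, then time `runs` calls in milliseconds."""
    for _ in range(warmup):
        workload()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def is_regression(baseline_ms: List[float], candidate_ms: List[float], tolerance: float = 0.02) -> bool:
    """Return True when the candidate median exceeds the baseline median by more than `tolerance`.

    `tolerance` (2% by default) is a hypothetical noise allowance, not a value from the plan.
    """
    base = statistics.median(baseline_ms)
    cand = statistics.median(candidate_ms)
    return cand > base * (1.0 + tolerance)
```

Using the median rather than the mean keeps a single slow run (GC pause, scheduler hiccup) from triggering a revert. A check might look like `is_regression(time_runs_ms(old_build), time_runs_ms(new_build))`, where `old_build` and `new_build` are hypothetical callables that run the same benchmark against each build.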