Non-Suspending Call Optimization Plan
Current state
Completed in the current phase:
- Higher-order lambda inlining is now metadata-driven through CallSignature.
- Built-in member methods (let, also, apply, run, forEach, map, mapNotNull, associateBy, getOrPut) publish inline metadata at declaration sites.
- Lyng extension wrappers now preserve and expose callSignature, so extension methods such as Iterable.filter use the same inlining path as built-in members.
- BytecodeCompiler no longer relies on a backend hardcoded name table for these higher-order inlining cases.
- JVM tests are green after the metadata move.
Primary motivation remains unchanged: suspend call overhead is still significant, and lambda inlining only removes part of it.
Constraints
- Keep source positions, stack traces, and throw-site reporting correct.
- Do not reintroduce one-off special cases tied to specific stdlib method names.
- Prefer declaration metadata and reusable compiler/runtime mechanisms.
- Preserve Kotlin Multiplatform compatibility in commonMain.
- Avoid changing public language semantics just to optimize the runtime path.
Why this is the next step
Lambda inlining helps when the callee body is directly available at the call site. The next large remaining cost is calling compiled functions through suspend entry points even when the generated body never suspends.
That suggests a second optimization track:
- detect bytecode callables that are safe to execute through a non-suspending fast path;
- route direct calls to that path when the caller can prove it is safe;
- keep the suspend path as the fallback for correctness.
Proposed phases
Phase 1: Define "non-suspending compiled callable"
Add explicit metadata on compiled functions / lambdas indicating whether their bytecode body may suspend.
Requirements:
- Computed once during bytecode generation.
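- Conservative: marking a body that never suspends as possibly suspending is acceptable (it only loses the fast path); marking a body that may suspend as non-suspending is not.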
- Must account for:
  - direct suspend-capable call opcodes;
  - flow / coroutine constructs;
  - delegated runtime helpers that may suspend;
  - nested lambda creation if invocation may suspend.
Likely implementation direction:
- store maySuspend or fastOnly-adjacent metadata on CmdFunction or the callable wrapper;
- derive it from emitted bytecode opcodes and embedded lambda constants.
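A minimal sketch of the conservative analysis described above, under stated assumptions: the opcode model (Op, PlainOp, SuspendCapableCall, RuntimeHelperCall, MakeLambda), the CompiledBody class, and the mayBeInvokedHere flag are all hypothetical names for illustration and do not reflect the actual Lyng bytecode or CmdFunction layout. It only shows the shape of "compute once, default to may-suspend for anything not proven safe".

```kotlin
// Hypothetical opcode model; the real bytecode instruction set differs.
sealed interface Op
object PlainOp : Op              // arithmetic, loads, stores, jumps
object SuspendCapableCall : Op   // call that may go through a suspend entry point
object RuntimeHelperCall : Op    // delegated helper not proven non-suspending

// A lambda constant embedded in the body. mayBeInvokedHere is an assumed flag
// meaning "this body may itself invoke the lambda it creates".
data class MakeLambda(val body: CompiledBody, val mayBeInvokedHere: Boolean) : Op

class CompiledBody(val ops: List<Op>) {
    // Computed at most once, on first access, then cached.
    val maySuspend: Boolean by lazy { ops.any(::opMaySuspend) }
}

// Conservative classification: anything not proven safe counts as suspending.
fun opMaySuspend(op: Op): Boolean = when (op) {
    is PlainOp -> false
    is SuspendCapableCall, is RuntimeHelperCall -> true
    is MakeLambda -> op.mayBeInvokedHere && op.body.maySuspend
}
```

In this sketch an unclassified helper call is treated as suspending, so the flag can only err toward keeping the existing suspend path.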
Phase 2: Add a direct non-suspending invoke path
For bytecode callables proven non-suspending, add an execution entry point that avoids suspend machinery for ordinary direct calls.
Requirements:
- Reuse as much of the existing fast frame setup as possible.
- Keep exception translation and source mapping identical to the suspend path.
- Do not depend on JVM-only tricks.
Potential direction:
- extend BytecodeCallable with a capability query or richer fast-call API;
- let call sites choose among:
  - inline body
  - non-suspending compiled call
  - existing suspend call
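The three-way choice above could be sketched roughly as follows. CallStrategy, CalleeInfo, and chooseStrategy are hypothetical names introduced here for illustration; this is not the actual BytecodeCallable API, and the ordering of the checks (inline first, then direct call, then suspend fallback) is an assumption about the intended preference.

```kotlin
enum class CallStrategy { INLINE_BODY, NON_SUSPENDING_CALL, SUSPEND_CALL }

// Hypothetical capability view of a callee as seen from one call site.
data class CalleeInfo(
    val inlinableBody: Boolean, // body is available and safe to inline here
    val maySuspend: Boolean,    // conservative flag from bytecode generation
)

// Prefer the cheapest path that is proven safe; the suspend call stays
// as the fallback whenever nothing better can be proven.
fun chooseStrategy(callee: CalleeInfo): CallStrategy = when {
    callee.inlinableBody -> CallStrategy.INLINE_BODY
    !callee.maySuspend -> CallStrategy.NON_SUSPENDING_CALL
    else -> CallStrategy.SUSPEND_CALL
}
```

Keeping the decision in one pure function would let the compiler and the runtime share it, whichever side ends up owning the dispatch.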
Phase 3: Teach bytecode call sites to use it
Apply the new path only where the callee is known precisely.
Initial targets:
- direct lambda invocation where exact lambda ref is known but inlining is not possible;
- direct local function calls where the binding resolves to a compiled callable;
- extension wrapper calls where wrapper binding is known and non-suspending.
Do not start with dynamic dispatch or reflective calls.
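One way to express "only where the callee is known precisely" is a closed classification of call targets. The sketch below is an assumption about how that could look; CallTarget and its subtypes are hypothetical names, not existing compiler types.

```kotlin
// Hypothetical classification of what the compiler knows about a call target.
sealed interface CallTarget
data class ExactLambda(val maySuspend: Boolean) : CallTarget
data class LocalFunction(val maySuspend: Boolean) : CallTarget
data class ExtensionWrapper(val maySuspend: Boolean) : CallTarget
object DynamicDispatch : CallTarget
object ReflectiveCall : CallTarget

// Only precisely resolved, non-suspending targets qualify for the direct path.
fun canUseDirectCall(target: CallTarget): Boolean = when (target) {
    is ExactLambda -> !target.maySuspend
    is LocalFunction -> !target.maySuspend
    is ExtensionWrapper -> !target.maySuspend
    is DynamicDispatch, is ReflectiveCall -> false
}
```

Because the hierarchy is sealed, adding a new kind of target forces an explicit decision in canUseDirectCall rather than silently opting in.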
Phase 4: Validate behavioral fidelity
Must explicitly verify:
- thrown exceptions still report the same Lyng source positions;
- stack traces remain useful enough for debugging;
- optional calls / null propagation are unchanged;
- captures and implicit this still bind correctly.
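A fidelity check for the first item could run the same body through both entry points and compare the reported position. The sketch below uses only the Kotlin standard library; ScriptError, ErrorSite, and the helper names are hypothetical stand-ins, not the real Lyng exception types or test utilities.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// Hypothetical stand-in for a runtime error carrying a source position.
class ScriptError(val line: Int, val column: Int, message: String) : Exception(message)

data class ErrorSite(val line: Int, val column: Int, val message: String?)

// Runs a suspend block that is expected to finish without actually suspending.
fun <T> runToCompletion(block: suspend () -> T): Result<T> {
    var outcome: Result<T>? = null
    block.startCoroutine(Continuation(EmptyCoroutineContext) { outcome = it })
    return outcome ?: error("block suspended; it does not qualify for the fast path")
}

// Error position observed through the non-suspending entry point.
fun siteFromDirect(call: () -> Any?): ErrorSite? =
    try {
        call()
        null
    } catch (e: ScriptError) {
        ErrorSite(e.line, e.column, e.message)
    }

// Error position observed through the suspend entry point.
fun siteFromSuspend(call: suspend () -> Any?): ErrorSite? =
    (runToCompletion(call).exceptionOrNull() as? ScriptError)
        ?.let { ErrorSite(it.line, it.column, it.message) }

// The same failing body, reachable from both entry points.
fun failingBody(): Any? = throw ScriptError(line = 3, column = 7, message = "index out of range")
```

A regression test would then assert that siteFromDirect { failingBody() } and siteFromSuspend { failingBody() } yield equal ErrorSite values.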
Phase 5: Measure before broadening
Benchmark after each widening step, especially:
- OptTest.testAddToArray
- iterable pipeline samples using filter/map
- direct lambda call microbenchmarks
- closure-heavy samples with captures
Open technical questions
- Where should non-suspending capability live?
  - CmdFunction
  - BytecodeStatement
  - callable wrapper object
  - CallSignature-adjacent metadata
- Should the compiler emit a separate opcode for known non-suspending compiled calls, or should runtime dispatch pick the fast path from a normal call opcode?
- Can we preserve the current error/stack behavior if we bypass suspend wrappers entirely, or do we need a thin compatibility layer?
- Should capture-free and capture-heavy compiled lambdas share the same direct-call mechanism, or should captured callables stay on the safer path initially?
Suggested order of execution
- Add conservative maySuspend analysis for compiled bytecode functions.
- Expose a non-suspending direct-call capability on compiled callables.
- Use it for exact direct lambda calls first.
- Extend to exact local function calls.
- Re-measure.
- Only then consider broader dispatch sites.
Validation checklist
- ./gradlew :lynglib:compileKotlinJvm --console=plain
- ./gradlew :lynglib:jvmTest --tests net.sergeych.lyng.OptTest.testAddToArray --console=plain
- ./gradlew :lynglib:jvmTest --tests StdlibTest.testIterableFilter --tests CompilerVmReviewRegressionTest --console=plain
- ./gradlew :lynglib:jvmTest --console=plain
Notes from the completed phase
Relevant current commits before this follow-up work:
- 3be2892 Use fast compiled callbacks in dynamic and flow helpers
- 1d5caaa Broaden lambda method inlining with captures
- 0c3242c Generalize higher-order lambda inlining
- f4ab2eb Extend lambda inlining to getOrPut and implicit it calls
Current working tree phase adds:
- metadata-driven higher-order inlining through member and extension signatures;
- extension wrapper signature propagation;
- removal of the compiler-side higher-order name table fallback.