# Non-Suspending Call Optimization Plan

## Current state

Completed in the current phase:

- Higher-order lambda inlining is now metadata-driven through `CallSignature`.
- Built-in member methods (`let`, `also`, `apply`, `run`, `forEach`, `map`, `mapNotNull`, `associateBy`, `getOrPut`) publish inline metadata at declaration sites.
- Lyng extension wrappers now preserve and expose `callSignature`, so extension methods such as `Iterable.filter` use the same inlining path as built-in members.
- `BytecodeCompiler` no longer relies on a backend hardcoded name table for these higher-order inlining cases.
- JVM tests are green after the metadata move.

Primary motivation remains unchanged: suspend call overhead is still significant, and lambda inlining only removes part of it.

## Constraints

- Keep source positions, stack traces, and throw-site reporting correct.
- Do not reintroduce one-off special cases tied to specific stdlib method names.
- Prefer declaration metadata and reusable compiler/runtime mechanisms.
- Preserve Kotlin Multiplatform compatibility in `commonMain`.
- Avoid changing public language semantics just to optimize the runtime path.

## Why this is the next step

Lambda inlining helps when the callee body is directly available at the call site.
The next large remaining cost is calling compiled functions through suspend entry points even when the generated body never suspends.

That suggests a second optimization track:

1. detect bytecode callables that are safe to execute through a non-suspending fast path;
2. route direct calls to that path when the caller can prove it is safe;
3. keep the suspend path as the fallback for correctness.
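
The three steps above can be sketched roughly as follows. This is an illustration only, not Lyng's runtime: `CompiledFn`, `callFast`, `callSuspend`, and `callCompiled` are hypothetical names, and a plain Kotlin lambda stands in for a bytecode body.

```kotlin
// Hypothetical model of a compiled callable; not Lyng's actual runtime types.
class CompiledFn(
    val maySuspend: Boolean,           // assumed result of the step 1 analysis
    private val body: (Int) -> Int,    // stands in for the bytecode body
) {
    // Non-suspending entry point: valid only for bodies proven not to suspend.
    fun callFast(arg: Int): Int {
        check(!maySuspend) { "fast path used on a callable that may suspend" }
        return body(arg)
    }

    // Existing-style suspend entry point: always correct, kept as the fallback.
    suspend fun callSuspend(arg: Int): Int = body(arg)
}

// Steps 2 and 3: take the fast path when it is provably safe, else fall back.
suspend fun callCompiled(fn: CompiledFn, arg: Int): Int =
    if (!fn.maySuspend) fn.callFast(arg) else fn.callSuspend(arg)
```

The guard inside `callFast` is a design choice of this sketch: a wrongly routed call fails loudly instead of silently skipping suspension.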

## Proposed phases

### Phase 1: Define "non-suspending compiled callable"

Add explicit metadata on compiled functions / lambdas indicating whether their bytecode body may suspend.

Requirements:

- Computed once during bytecode generation.
- Conservative: treating a non-suspending body as possibly suspending is acceptable (it only misses the optimization); treating a suspending body as non-suspending is not.
- Must account for:
  - direct suspend-capable call opcodes;
  - flow / coroutine constructs;
  - delegated runtime helpers that may suspend;
  - nested lambda creation if invocation may suspend.

Likely implementation direction:

- store `maySuspend` or `fastOnly`-adjacent metadata on `CmdFunction` or the callable wrapper;
- derive it from emitted bytecode opcodes and embedded lambda constants.
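
A minimal sketch of such an analysis is shown below. The opcode names, `Instr`, and `Fn` are invented for illustration (this plan does not list the real Lyng opcodes or the layout of `CmdFunction`); only the conservative rule itself comes from the requirements above.

```kotlin
// Hypothetical opcode set; the real Lyng opcodes are not shown in this plan.
enum class Op(val suspendCapable: Boolean) {
    LOAD_CONST(false),
    ADD(false),
    RETURN(false),
    MAKE_LAMBDA(false),   // creates a lambda from an embedded constant
    CALL_SUSPEND(true),   // direct suspend-capable call
    FLOW_EMIT(true),      // flow / coroutine construct
    CALL_HELPER(true)     // delegated runtime helper that may suspend
}

// One instruction; `lambda` is set only for MAKE_LAMBDA in this sketch.
class Instr(val op: Op, val lambda: Fn? = null)

// Hypothetical compiled function: nested lambdas are built before their parent,
// so their flag is already available when the parent is constructed.
class Fn(val code: List<Instr>) {
    // Computed once, when the function object is created after code generation.
    val maySuspend: Boolean = code.any { instr ->
        instr.op.suspendCapable ||
            // Conservative: a nested lambda that may suspend taints its creator,
            // because this sketch does not track whether it is invoked here.
            (instr.lambda?.maySuspend ?: false)
    }
}
```

Tainting the creator of a suspending lambda is stricter than necessary, but it errs in the permitted direction: such a function simply stays on the suspend path.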

### Phase 2: Add a direct non-suspending invoke path

For bytecode callables proven non-suspending, add an execution entry point that avoids suspend machinery for ordinary direct calls.

Requirements:

- Reuse as much of the existing fast frame setup as possible.
- Keep exception translation and source mapping identical to the suspend path.
- Do not depend on JVM-only tricks.

Potential direction:

- extend `BytecodeCallable` with a capability query or richer fast-call API;
- let call sites choose among:
  - inline body
  - non-suspending compiled call
  - existing suspend call
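
The three-way choice could look roughly like this. `CallStrategy`, `CalleeInfo`, and `chooseStrategy` are hypothetical names introduced for this sketch; how `BytecodeCallable` would actually expose these facts is still an open question.

```kotlin
// Hypothetical strategies a call site can pick from, cheapest first.
enum class CallStrategy { INLINE_BODY, NON_SUSPENDING_CALL, SUSPEND_CALL }

// Hypothetical facts the compiler knows about the callee at one call site.
data class CalleeInfo(
    val bodyAvailableForInlining: Boolean,
    val exactTargetKnown: Boolean,
    val maySuspend: Boolean,
)

// Prefer inlining, then the direct non-suspending call, and otherwise
// keep the existing suspend call as the always-correct fallback.
fun chooseStrategy(callee: CalleeInfo): CallStrategy = when {
    callee.bodyAvailableForInlining -> CallStrategy.INLINE_BODY
    callee.exactTargetKnown && !callee.maySuspend -> CallStrategy.NON_SUSPENDING_CALL
    else -> CallStrategy.SUSPEND_CALL
}
```

Any case the chooser cannot classify falls through to `SUSPEND_CALL`, so adding the new path cannot make an existing call site less correct.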

### Phase 3: Teach bytecode call sites to use it

Apply the new path only where the callee is known precisely.

Initial targets:

- direct lambda invocation where exact lambda ref is known but inlining is not possible;
- direct local function calls where the binding resolves to a compiled callable;
- extension wrapper calls where wrapper binding is known and non-suspending.

Do not start with dynamic dispatch or reflective calls.
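
One way to encode "known precisely" is to classify how each call site resolved its callee. The sketch below assumes a hypothetical `CallTarget` hierarchy; none of these types exist in the plan, and the real compiler may represent bindings quite differently.

```kotlin
// Hypothetical classification of how a call site resolved its callee.
sealed class CallTarget {
    abstract val maySuspend: Boolean
}

data class ExactLambda(override val maySuspend: Boolean) : CallTarget()
data class LocalFunction(override val maySuspend: Boolean) : CallTarget()
data class ExtensionWrapper(override val maySuspend: Boolean) : CallTarget()

// Dynamic or reflective: the callee is only known at run time,
// so it is conservatively treated as possibly suspending.
object DynamicDispatch : CallTarget() {
    override val maySuspend: Boolean = true
}

// Only the three initial target kinds are eligible, and only when
// their body was proven non-suspending.
fun usesDirectPath(target: CallTarget): Boolean = when (target) {
    is ExactLambda, is LocalFunction, is ExtensionWrapper -> !target.maySuspend
    is DynamicDispatch -> false
}
```

Because the `when` is exhaustive over a sealed hierarchy, adding a new target kind later forces an explicit decision about its eligibility.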

### Phase 4: Validate behavioral fidelity

Must explicitly verify:

- thrown exceptions still report the same Lyng source positions;
- stack traces remain useful enough for debugging;
- optional calls / null propagation are unchanged;
- captures and implicit `this` still bind correctly.
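
The first check, identical source positions on both paths, can be sketched as a differential test. Everything here is an assumption for illustration: `ScriptError` stands in for whatever Lyng error type carries the position, and `runNow` drives a suspend block with the standard library's `startCoroutine`, which only works for bodies that never actually suspend.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// Hypothetical script error carrying a source position.
class ScriptError(val line: Int, val column: Int, message: String) : Exception(message)

// Runs a suspend block that never actually suspends and returns its result,
// rethrowing any exception the block completed with.
fun <T> runNow(block: suspend () -> T): T {
    var outcome: Result<T>? = null
    block.startCoroutine(Continuation(EmptyCoroutineContext) { outcome = it })
    return (outcome ?: error("block suspended before completing")).getOrThrow()
}

// Returns "line:column" of the ScriptError thrown by `block`, or null if none.
fun errorPosition(block: () -> Unit): String? =
    try {
        block()
        null
    } catch (e: ScriptError) {
        "${e.line}:${e.column}"
    }

// True when both entry points report the same position (or both succeed).
fun samePosition(fast: () -> Unit, slow: suspend () -> Unit): Boolean =
    errorPosition(fast) == errorPosition { runNow(slow) }
```

A real regression test would feed the same Lyng snippet through both entry points rather than two hand-written lambdas; the comparison logic would stay the same.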

### Phase 5: Measure before broadening

Benchmark after each widening step, especially:

- `OptTest.testAddToArray`
- iterable pipeline samples using `filter` / `map`
- direct lambda call microbenchmarks
- closure-heavy samples with captures
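
For the microbenchmark items, a rough timing helper might look like the sketch below. `bestOf` is a hypothetical helper, not part of the existing test suite, and it relies on `kotlin.time.measureTime`, which needs a recent Kotlin standard library. It gives only coarse numbers and is no substitute for a proper benchmark harness.

```kotlin
import kotlin.time.Duration
import kotlin.time.measureTime

// Runs `block` `iterations` times per run, after one warm-up run,
// and returns the fastest of `runs` measured runs.
fun bestOf(runs: Int, iterations: Int, block: () -> Unit): Duration {
    require(runs > 0) { "need at least one measured run" }
    repeat(iterations) { block() } // warm-up so a JIT can compile the hot path
    return (1..runs).minOf { measureTime { repeat(iterations) { block() } } }
}
```

Taking the minimum rather than the mean reduces noise from GC pauses and scheduling, which matters when comparing two call paths that differ by a small constant per call.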

## Open technical questions

1. Where should non-suspending capability live?
   - `CmdFunction`
   - `BytecodeStatement`
   - callable wrapper object
   - `CallSignature`-adjacent metadata

2. Should the compiler emit a separate opcode for known non-suspending compiled calls, or should runtime dispatch pick the fast path from a normal call opcode?

3. Can we preserve the current error/stack behavior if we bypass suspend wrappers entirely, or do we need a thin compatibility layer?

4. Should capture-free and capture-heavy compiled lambdas share the same direct-call mechanism, or should captured callables stay on the safer path initially?

## Suggested order of execution

1. Add conservative `maySuspend` analysis for compiled bytecode functions.
2. Expose a non-suspending direct-call capability on compiled callables.
3. Use it for exact direct lambda calls first.
4. Extend to exact local function calls.
5. Re-measure.
6. Only then consider broader dispatch sites.

## Validation checklist

- `./gradlew :lynglib:compileKotlinJvm --console=plain`
- `./gradlew :lynglib:jvmTest --tests net.sergeych.lyng.OptTest.testAddToArray --console=plain`
- `./gradlew :lynglib:jvmTest --tests StdlibTest.testIterableFilter --tests CompilerVmReviewRegressionTest --console=plain`
- `./gradlew :lynglib:jvmTest --console=plain`

## Notes from the completed phase

Relevant current commits before this follow-up work:

- `3be2892` Use fast compiled callbacks in dynamic and flow helpers
- `1d5caaa` Broaden lambda method inlining with captures
- `0c3242c` Generalize higher-order lambda inlining
- `f4ab2eb` Extend lambda inlining to getOrPut and implicit it calls

Current working tree phase adds:

- metadata-driven higher-order inlining through member and extension signatures;
- extension wrapper signature propagation;
- removal of the compiler-side higher-order name table fallback.