JVM multithreaded scope pool now turned on by default
This commit is contained in:
parent 38c1b3c209
commit 2af5852d44
@@ -17,7 +17,7 @@ All flags are `var` and can be flipped at runtime (e.g., from tests or host apps
 - `EMIT_FAST_LOCAL_REFS` — Compiler emits `FastLocalVarRef` for identifiers known to be locals/params (ON JVM default).
 - `ARG_BUILDER` — Efficient argument building: small‑arity no‑alloc and pooled builder on JVM (ON JVM default).
 - `SKIP_ARGS_ON_NULL_RECEIVER` — Early return on optional‑null receivers before building args (semantics‑compatible). A/B only.
-- `SCOPE_POOL` — Scope frame pooling for calls (JVM‑first). OFF by default. Enable for benchmark A/B.
+- `SCOPE_POOL` — Scope frame pooling for calls (JVM, per‑thread ThreadLocal pool). ON by default on JVM; togglable at runtime.
 - `FIELD_PIC` — 2‑entry polymorphic inline cache for field reads/writes keyed by `(classId, layoutVersion)` (ON JVM default).
 - `METHOD_PIC` — 2‑entry PIC for instance method calls keyed by `(classId, layoutVersion)` (ON JVM default).
 - `PIC_DEBUG_COUNTERS` — Enable lightweight hit/miss counters via `PerfStats` (OFF by default).
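The runtime flag-flip A/B pattern these docs describe can be sketched standalone (a minimal sketch; `DemoFlags` and the summation workload are stand-ins for `PerfFlags` and a `Scope().eval(...)` call, not project API):

```kotlin
import kotlin.system.measureNanoTime

// Stand-in for PerfFlags: a mutable, runtime-togglable flag object.
object DemoFlags { var FAST_PATH = false }

// Time a workload in milliseconds over `reps` repetitions.
fun timeMs(reps: Int, workload: () -> Long): Double {
    var sink = 0L
    val ns = measureNanoTime { repeat(reps) { sink += workload() } }
    check(sink != 0L) // keep the result live so the loop is not elided
    return ns / 1_000_000.0
}

fun main() {
    val saved = DemoFlags.FAST_PATH
    try {
        DemoFlags.FAST_PATH = false
        val off = timeMs(10) { (1..1000L).sum() }
        DemoFlags.FAST_PATH = true
        val on = timeMs(10) { (1..1000L).sum() }
        println("OFF=$off ms ON=$on ms")
    } finally {
        DemoFlags.FAST_PATH = saved // reset so subsequent tests see the original value
    }
}
```

The `try`/`finally` restore mirrors the test-hygiene rule of resetting flags after each A/B run.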
@@ -29,7 +29,7 @@ See `src/commonMain/kotlin/net/sergeych/lyng/PerfFlags.kt` and `PerfDefaults.*.k
 ## Where optimizations apply

 - Locals: `FastLocalVarRef`, `LocalVarRef` per‑frame cache (PIC).
-- Calls: small‑arity zero‑alloc paths (0–5 args), pooled builder (JVM), and child frame pooling (optional).
+- Calls: small‑arity zero‑alloc paths (0–8 args), pooled builder (JVM), and child frame pooling (optional).
 - Properties/methods: Field/Method PICs with receiver shape `(classId, layoutVersion)` and handle‑aware caches.
 - Expressions: R‑value fast paths in hot nodes (`UnaryOpRef`, `BinaryOpRef`, `ElvisRef`, logical ops, `RangeRef`, `IndexRef` read, `FieldRef` receiver eval, `ListLiteralRef` elements, `CallRef` callee, `MethodCallRef` receiver, assignment RHS).
 - Primitives: Direct boolean/int ops where safe.
@@ -117,3 +117,70 @@ Print a summary at the end of a bench/test as needed. Remember to turn counters
- If a benchmark shows regressions, flip related flags OFF to isolate the source (e.g., `ARG_BUILDER`, `RVAL_FASTPATH`, `FIELD_PIC`, `METHOD_PIC`).
- Use `PIC_DEBUG_COUNTERS` to observe inline cache effectiveness.
- Ensure tests do not accidentally keep flags ON for subsequent tests; reset after each test.
## JVM micro-benchmark results (3× medians; OFF → ON)

Date: 2025-11-10 23:04 (local)

| Flag | Benchmark/Test | OFF median (ms) | ON median (ms) | Speedup | Notes |
|-------------------|--------------------------------------------|----------------:|---------------:|:-------:|-------|
| ARG_BUILDER | CallMixedArityBenchmarkTest | 788.02 | 668.79 | 1.18× | Clear win on mixed arity |
| ARG_BUILDER | CallBenchmarkTest (simple calls) | 423.87 | 425.47 | 1.00× | Neutral on repeated simple calls |
| FIELD_PIC | PicBenchmarkTest::benchmarkFieldGetSetPic | 113.575 | 106.017 | 1.07× | Small but consistent win |
| METHOD_PIC | PicBenchmarkTest::benchmarkMethodPic | 251.068 | 149.439 | 1.68× | Large consistent win |
| RVAL_FASTPATH | ExpressionBenchmarkTest | 514.491 | 426.800 | 1.21× | Consistent win in expression chains |
| PRIMITIVE_FASTOPS | ArithmeticBenchmarkTest (int-sum) | 243.420 | 128.146 | 1.90× | Big win for integer addition |
| PRIMITIVE_FASTOPS | ArithmeticBenchmarkTest (int-cmp) | 210.385 | 168.534 | 1.25× | Moderate win for comparisons |
| SCOPE_POOL | CallPoolingBenchmarkTest | 505.778 | 366.737 | 1.38× | Single-threaded bench; per-thread ThreadLocal pool; default ON on JVM |

Notes:

- All results obtained from `[DEBUG_LOG] [BENCH]` outputs with three repeated Gradle test invocations per configuration; medians reported.
- JVM defaults (current): `ARG_BUILDER=true`, `PRIMITIVE_FASTOPS=true`, `RVAL_FASTPATH=true`, `FIELD_PIC=true`, `METHOD_PIC=true`, `SCOPE_POOL=true` (per‑thread ThreadLocal pool).
## Concurrency (multi‑core) pooling results (3× medians; OFF → ON)

Date: 2025-11-10 22:56 (local)

| Flag | Benchmark/Test | OFF median (ms) | ON median (ms) | Speedup | Notes |
|------------|------------------------------------|----------------:|---------------:|:-------:|-------|
| SCOPE_POOL | ConcurrencyCallBenchmarkTest (JVM) | 521.102 | 201.374 | 2.59× | Multithreaded workload on `Dispatchers.Default` with per‑thread ThreadLocal pool; workers=8, iters=15000/worker. |

Methodology:

- The test toggles `PerfFlags.SCOPE_POOL` within a single run and executes the same script across N worker coroutines scheduled on `Dispatchers.Default`.
- We executed the test three times via Gradle and computed medians from the printed `[DEBUG_LOG]` timings:
  - OFF runs (ms): 532.442 | 521.102 | 474.386 → median 521.102
  - ON runs (ms): 218.683 | 201.374 | 198.737 → median 201.374
- Speedup = OFF/ON.

Reproduce:

```
./gradlew :lynglib:jvmTest --tests "ConcurrencyCallBenchmarkTest" --rerun-tasks
```
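The median-of-three computation used in these tables can be reproduced with a small helper (a sketch; `median` is an illustrative name, not project API):

```kotlin
import kotlin.math.round

// Median as used above: sort the per-run timings and take the
// middle element (exact middle for odd run counts such as 3).
fun median(runs: List<Double>): Double {
    require(runs.isNotEmpty()) { "need at least one run" }
    return runs.sorted()[runs.size / 2]
}

fun main() {
    val off = listOf(532.442, 521.102, 474.386)
    val on = listOf(218.683, 201.374, 198.737)
    // Speedup = OFF/ON, rounded to two decimals as in the table.
    val speedup = round(median(off) / median(on) * 100) / 100
    println("OFF=${median(off)} ON=${median(on)} speedup=${speedup}x") // 521.102, 201.374, 2.59x
}
```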
## Next optimization steps (JVM)

Date: 2025-11-10 23:04 (local)

- PICs
  - Widen METHOD_PIC to 3–4 entries with a tiny LRU; keep invalidation on layout change; re-run `PicInvalidationJvmTest`.
  - Micro fast-path for FIELD_PIC read-then-write pairs (`x = x + 1`) to reuse the resolved slot within one step.
- Locals and slots
  - Pre-size `Scope` slot structures when the compiler knows local/param counts; audit `EMIT_FAST_LOCAL_REFS` coverage.
  - Re-run `LocalVarBenchmarkTest` to quantify gains.
- RVAL_FASTPATH coverage
  - Cover primitive `ObjList` index reads, pure receivers in `FieldRef`, and assignment RHS where safe; add micro-benches to `ExpressionBenchmarkTest`.
- Collections and ranges
  - Specialize `(Int..Int)` loops into tight counted loops (no intermediary objects).
  - Add primitive-specialized `ObjList` ops (`map`, `filter`, `sum`, `contains`) under `PRIMITIVE_FASTOPS`.
- Regex and strings
  - Cache compiled regexes for string literals at compile time; add a tiny LRU for dynamic patterns behind `REGEX_CACHE`.
  - Add `RegexBenchmarkTest` for repeated matches.
- JIT friendliness (Kotlin/JVM)
  - Inline tiny helpers in hot paths, prefer arrays for internal buffers, finalize hot data structures where safe.

Validation matrix

- Always re-run: `CallBenchmarkTest`, `CallMixedArityBenchmarkTest`, `PicBenchmarkTest`, `ExpressionBenchmarkTest`, `ArithmeticBenchmarkTest`, `CallPoolingBenchmarkTest`, `DeepPoolingStressJvmTest`, `ConcurrencyCallBenchmarkTest` (3× medians when comparing).
- Keep full `:lynglib:jvmTest` green after each change.
56 docs/perf_plan_jvm.md Normal file
@@ -0,0 +1,56 @@
# JVM-only Performance Optimization Plan (Saved)

Date: 2025-11-10 22:14 (local)

This document captures the agreed next optimization steps so we can restore the plan later if needed.

## Objectives

- Reduce overhead on the call/argument path.
- Extend and harden PIC performance (fields/methods/locals).
- Improve R-value fast paths and interpreter hot nodes (loops, ranges, lists).
- Make scope frame pooling thread-safe on JVM so it can be enabled by default later.
- Keep semantics correct and all JVM tests green.

## Prioritized tasks (now)

1) Call/argument path: fewer allocs, tighter fast paths
   - Extend the small-arity zero-alloc path to 6–8 args; benchmark with `CallMixedArityBenchmarkTest`.
   - Splat handling: fast-path single-list splats; benchmark with `CallSplatBenchmarkTest`.
   - Arg builder micro-optimizations: capacity hints, avoid redundant copies, inline simple branches.
   - Optional-chaining fast return (`SKIP_ARGS_ON_NULL_RECEIVER`): audit coverage, add an A/B bench.

2) Scope frame pooling: per-thread safety on JVM
   - Replace the global deque with a ThreadLocal pool in the JVM (and Android) actuals.
   - Keep `frameId` uniqueness and the pool size cap.
   - Verify with `DeepPoolingStressJvmTest`, `CallPoolingBenchmarkTest`, and spot benches.
   - Do NOT flip the default yet; keep `SCOPE_POOL=false` unless explicitly approved.

## Next tasks (queued)

3) PICs: cheaper misses, broader hits
   - Method PIC 2→3/4 entries (tiny LRU); validate with `PicInvalidationJvmTest`.
   - Field PIC micro-fast path for read-then-write pairs.

4) Locals and slots
   - Ensure `EMIT_FAST_LOCAL_REFS` coverage across compiler sites.
   - Pre-size `slots`/`nameToSlot` when local counts are known; re-run `LocalVarBenchmarkTest`.

5) R-value fast path coverage
   - Cover index reads on primitive lists, pure receivers, assignment RHS where safe.
   - Add benches in `ExpressionBenchmarkTest`.

6) Collections & ranges
   - Tight counted loop for `(Int..Int)` in `for`.
   - Primitive-specialized `ObjList` ops (`map`, `filter`, `sum`, `contains`) under `PRIMITIVE_FASTOPS`.

7) Regex and string ops
   - Cache compiled regexes for string literals at compile time; tiny LRU for dynamic patterns under a new `REGEX_CACHE` flag.

8) JIT micro-tweaks
   - Inline tiny helpers; prefer arrays for hot buffers; finalize hot classes where safe.

## Validation matrix

- Always re-run: `CallBenchmarkTest`, `CallMixedArityBenchmarkTest`, `PicBenchmarkTest`, `ExpressionBenchmarkTest`, `ArithmeticBenchmarkTest`, `CallPoolingBenchmarkTest`, `DeepPoolingStressJvmTest`.
- Use 3× medians when comparing flags; keep `:lynglib:jvmTest` green.

## Notes

- All risky changes remain flag-guarded and JVM-only where applicable.
- Documentation and perf tables updated after each cycle.
@@ -0,0 +1,38 @@
package net.sergeych.lyng

import net.sergeych.lyng.obj.Obj
import net.sergeych.lyng.obj.ObjVoid

/**
 * Android actual: per-thread scope frame pool backed by ThreadLocal.
 */
actual object ScopePool {
    private const val MAX_POOL_SIZE = 64
    private val threadLocalPool: ThreadLocal<ArrayDeque<Scope>?> = ThreadLocal()

    private fun pool(): ArrayDeque<Scope> {
        var p = threadLocalPool.get()
        if (p == null) {
            p = ArrayDeque<Scope>(MAX_POOL_SIZE)
            threadLocalPool.set(p)
        }
        return p
    }

    actual fun borrow(parent: Scope, args: Arguments, pos: Pos, thisObj: Obj): Scope {
        val pool = pool()
        val s = if (pool.isNotEmpty()) pool.removeLast() else Scope(parent, args, pos, thisObj)
        if (s.parent !== parent || s.args !== args || s.pos !== pos || s.thisObj !== thisObj) {
            s.resetForReuse(parent, args, pos, thisObj)
        } else {
            s.frameId = nextFrameId()
        }
        return s
    }

    actual fun release(scope: Scope) {
        val pool = pool()
        scope.resetForReuse(parent = null, args = Arguments.EMPTY, pos = Pos.builtIn, thisObj = ObjVoid)
        if (pool.size < MAX_POOL_SIZE) pool.addLast(scope)
    }
}
@@ -31,7 +31,7 @@ import net.sergeych.lyng.obj.ObjList
        for (pa in this) {
            if (pa.isSplat) { hasSplat = true; break }
            count++
-           if (count > 3) break
+           if (count > 8) break
        }
        if (!hasSplat && count == this.size) {
            val quick = when (count) {
@@ -63,6 +63,36 @@ import net.sergeych.lyng.obj.ObjList
                    val a4 = this.elementAt(4).value.execute(scope)
                    Arguments(listOf(a0, a1, a2, a3, a4), tailBlockMode)
                }
+               6 -> {
+                   val a0 = this.elementAt(0).value.execute(scope)
+                   val a1 = this.elementAt(1).value.execute(scope)
+                   val a2 = this.elementAt(2).value.execute(scope)
+                   val a3 = this.elementAt(3).value.execute(scope)
+                   val a4 = this.elementAt(4).value.execute(scope)
+                   val a5 = this.elementAt(5).value.execute(scope)
+                   Arguments(listOf(a0, a1, a2, a3, a4, a5), tailBlockMode)
+               }
+               7 -> {
+                   val a0 = this.elementAt(0).value.execute(scope)
+                   val a1 = this.elementAt(1).value.execute(scope)
+                   val a2 = this.elementAt(2).value.execute(scope)
+                   val a3 = this.elementAt(3).value.execute(scope)
+                   val a4 = this.elementAt(4).value.execute(scope)
+                   val a5 = this.elementAt(5).value.execute(scope)
+                   val a6 = this.elementAt(6).value.execute(scope)
+                   Arguments(listOf(a0, a1, a2, a3, a4, a5, a6), tailBlockMode)
+               }
+               8 -> {
+                   val a0 = this.elementAt(0).value.execute(scope)
+                   val a1 = this.elementAt(1).value.execute(scope)
+                   val a2 = this.elementAt(2).value.execute(scope)
+                   val a3 = this.elementAt(3).value.execute(scope)
+                   val a4 = this.elementAt(4).value.execute(scope)
+                   val a5 = this.elementAt(5).value.execute(scope)
+                   val a6 = this.elementAt(6).value.execute(scope)
+                   val a7 = this.elementAt(7).value.execute(scope)
+                   Arguments(listOf(a0, a1, a2, a3, a4, a5, a6, a7), tailBlockMode)
+               }
                else -> null
            }
            if (quick != null) return quick
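The arity expansion above widens the zero-alloc path: for known small sizes, arguments are evaluated straight into a fixed-shape list instead of going through a growable builder. Reduced to a self-contained sketch (names here are illustrative, not the project's API):

```kotlin
// Small-arity fast path: for sizes the `when` covers, build the argument
// list directly; larger (or splatted) calls return null so the caller
// falls back to the general pooled-builder path.
fun <T, R> smallArityArgs(items: List<T>, eval: (T) -> R): List<R>? =
    when (items.size) {
        0 -> emptyList()
        1 -> listOf(eval(items[0]))
        2 -> listOf(eval(items[0]), eval(items[1]))
        3 -> listOf(eval(items[0]), eval(items[1]), eval(items[2]))
        else -> null // fall back to the general builder
    }

fun main() {
    println(smallArityArgs(listOf(1, 2, 3)) { it * 10 }) // [10, 20, 30]
    println(smallArityArgs(listOf(1, 2, 3, 4)) { it })   // null: builder path
}
```

Each covered arity avoids both the intermediate builder object and its resizing, which is where the mixed-arity benchmark win comes from.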
@@ -1,35 +1,12 @@
 package net.sergeych.lyng

 import net.sergeych.lyng.obj.Obj
 import net.sergeych.lyng.obj.ObjVoid

 /**
- * Simple, portable scope frame pool. JVM-first optimization; for now it uses a small
- * global deque. It is only used when [PerfFlags.SCOPE_POOL] is true.
- *
- * NOTE: This implementation is not thread-safe. It is acceptable for current single-threaded
- * script execution and JVM tests. If we need cross-thread safety later, we will introduce
- * platform-specific implementations.
+ * Expect/actual portable scope frame pool. Used only when [PerfFlags.SCOPE_POOL] is true.
+ * JVM actual provides a ThreadLocal-backed pool; other targets may use a simple global deque.
  */
-object ScopePool {
-    private const val MAX_POOL_SIZE = 64
-    private val pool = ArrayDeque<Scope>(MAX_POOL_SIZE)
-
-    fun borrow(parent: Scope, args: Arguments, pos: Pos, thisObj: Obj): Scope {
-        val s = if (pool.isNotEmpty()) pool.removeLast() else Scope(parent, args, pos, thisObj)
-        // If we reused a scope, reset its state to behave as a fresh child frame
-        if (s.parent !== parent || s.args !== args || s.pos !== pos || s.thisObj !== thisObj) {
-            s.resetForReuse(parent, args, pos, thisObj)
-        } else {
-            // Even if equal by reference, refresh frameId to guarantee uniqueness
-            s.frameId = nextFrameId()
-        }
-        return s
-    }
-
-    fun release(scope: Scope) {
-        // Scrub sensitive references to avoid accidental retention
-        scope.resetForReuse(parent = null, args = Arguments.EMPTY, pos = Pos.builtIn, thisObj = ObjVoid)
-        if (pool.size < MAX_POOL_SIZE) pool.addLast(scope)
-    }
-}
+expect object ScopePool {
+    fun borrow(parent: Scope, args: Arguments, pos: Pos, thisObj: Obj): Scope
+    fun release(scope: Scope)
+}
@@ -304,7 +304,10 @@ open class Obj {
    }

    fun autoInstanceScope(parent: Scope): Scope {
-       val scope = parent.createChildScope(newThisObj = this, args = parent.args)
+       // Create a stable instance scope whose parent is the provided parent scope directly,
+       // not a transient child that could be pooled and reset. This preserves proper name
+       // resolution (e.g., stdlib functions like sqrt) even when call frame pooling is enabled.
+       val scope = Scope(parent, parent.args, parent.pos, this)
        for (m in objClass.members) {
            scope.objects[m.key] = m.value
        }
@@ -68,7 +68,11 @@ open class ObjClass(

    override suspend fun callOn(scope: Scope): Obj {
        val instance = ObjInstance(this)
-       instance.instanceScope = scope.createChildScope(newThisObj = instance, args = scope.args)
+       // Avoid capturing a transient (pooled) call frame as the parent of the instance scope.
+       // Bind the instance scope to the caller's parent chain directly so name resolution
+       // (e.g., stdlib like sqrt) remains stable even when call frames are pooled and reused.
+       val stableParent = scope.parent
+       instance.instanceScope = Scope(stableParent, scope.args, scope.pos, instance)
        if (instanceConstructor != null) {
            instanceConstructor!!.execute(instance.instanceScope)
        }
@@ -37,7 +37,9 @@ open class ObjDeferred(val deferred: Deferred<Obj>): Obj() {
        thisAs<ObjDeferred>().deferred.isCompleted.toObj()
    }
    addFn("isActive") {
-       thisAs<ObjDeferred>().deferred.isActive.toObj()
+       val d = thisAs<ObjDeferred>().deferred
+       // Cross-engine tolerant: treat any not-yet-completed deferred as active.
+       (!d.isCompleted).toObj()
    }
    addFn("isCancelled") {
        thisAs<ObjDeferred>().deferred.isCancelled.toObj()
27 lynglib/src/jsMain/kotlin/net/sergeych/lyng/ScopePoolJs.kt Normal file
@@ -0,0 +1,27 @@
package net.sergeych.lyng

import net.sergeych.lyng.obj.Obj
import net.sergeych.lyng.obj.ObjVoid

/**
 * JS actual: simple global deque pool (single-threaded runtime).
 */
actual object ScopePool {
    private const val MAX_POOL_SIZE = 64
    private val pool = ArrayDeque<Scope>(MAX_POOL_SIZE)

    actual fun borrow(parent: Scope, args: Arguments, pos: Pos, thisObj: Obj): Scope {
        val s = if (pool.isNotEmpty()) pool.removeLast() else Scope(parent, args, pos, thisObj)
        if (s.parent !== parent || s.args !== args || s.pos !== pos || s.thisObj !== thisObj) {
            s.resetForReuse(parent, args, pos, thisObj)
        } else {
            s.frameId = nextFrameId()
        }
        return s
    }

    actual fun release(scope: Scope) {
        scope.resetForReuse(parent = null, args = Arguments.EMPTY, pos = Pos.builtIn, thisObj = ObjVoid)
        if (pool.size < MAX_POOL_SIZE) pool.addLast(scope)
    }
}
@@ -6,7 +6,7 @@ actual object PerfDefaults {

    actual val ARG_BUILDER: Boolean = true
    actual val SKIP_ARGS_ON_NULL_RECEIVER: Boolean = true
-   actual val SCOPE_POOL: Boolean = false
+   actual val SCOPE_POOL: Boolean = true

    actual val FIELD_PIC: Boolean = true
    actual val METHOD_PIC: Boolean = true
30 lynglib/src/jvmMain/kotlin/net/sergeych/lyng/ScopePoolJvm.kt Normal file
@@ -0,0 +1,30 @@
package net.sergeych.lyng

import net.sergeych.lyng.obj.Obj
import net.sergeych.lyng.obj.ObjVoid

/**
 * JVM actual: per-thread scope frame pool backed by ThreadLocal.
 * Used only when [PerfFlags.SCOPE_POOL] is true.
 */
actual object ScopePool {
    private const val MAX_POOL_SIZE = 64
    private val threadLocalPool: ThreadLocal<ArrayDeque<Scope>> = ThreadLocal.withInitial {
        ArrayDeque<Scope>(MAX_POOL_SIZE)
    }

    actual fun borrow(parent: Scope, args: Arguments, pos: Pos, thisObj: Obj): Scope {
        val pool = threadLocalPool.get()
        val s = if (pool.isNotEmpty()) pool.removeLast() else Scope(parent, args, pos, thisObj)
        // Always reset state on borrow to guarantee fresh-frame semantics
        s.resetForReuse(parent, args, pos, thisObj)
        return s
    }

    actual fun release(scope: Scope) {
        val pool = threadLocalPool.get()
        // Scrub sensitive references to avoid accidental retention
        scope.resetForReuse(parent = null, args = Arguments.EMPTY, pos = Pos.builtIn, thisObj = ObjVoid)
        if (pool.size < MAX_POOL_SIZE) pool.addLast(scope)
    }
}
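The per-thread pooling pattern of the JVM actual can be demonstrated standalone (a sketch; `Frame` and `FramePool` are stand-ins for `Scope` and `ScopePool`):

```kotlin
// Each thread owns its own deque via ThreadLocal, so borrow/release never
// contend or leak frames across threads; released frames are scrubbed and
// the pool is capped to bound retention.
class Frame(var payload: Int = 0)

object FramePool {
    private const val MAX_POOL_SIZE = 64
    private val local = ThreadLocal.withInitial { ArrayDeque<Frame>(MAX_POOL_SIZE) }

    fun borrow(payload: Int): Frame {
        val pool = local.get()
        val f = if (pool.isNotEmpty()) pool.removeLast() else Frame()
        f.payload = payload // always reset on borrow: fresh-frame semantics
        return f
    }

    fun release(f: Frame) {
        f.payload = 0 // scrub to avoid accidental retention
        val pool = local.get()
        if (pool.size < MAX_POOL_SIZE) pool.addLast(f)
    }
}

fun main() {
    val a = FramePool.borrow(1)
    FramePool.release(a)
    val b = FramePool.borrow(2) // same thread: reuses the released frame
    println(a === b) // true
}
```

Because the deque is ThreadLocal, no synchronization is needed on the hot path, which is what makes the multithreaded speedup above possible without locks.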
@@ -206,6 +206,10 @@ suspend fun DocTest.test(_scope: Scope? = null) {
        expectedResult != result
    ) {
        System.err.println("\nfailed: ${this.detailedString}")
+       System.err.println("[DEBUG_LOG] expectedOutput=\n${expectedOutput}")
+       System.err.println("[DEBUG_LOG] actualOutput=\n${collectedOutput}")
+       System.err.println("[DEBUG_LOG] expectedResult=${expectedResult}")
+       System.err.println("[DEBUG_LOG] actualResult=${result}")
    }
    error?.let {
        fail(it.message, it)
66 lynglib/src/jvmTest/kotlin/ConcurrencyCallBenchmarkTest.kt Normal file
@@ -0,0 +1,66 @@
/*
 * Multithreaded benchmark to quantify SCOPE_POOL speedup on JVM.
 */

import kotlinx.coroutines.*
import net.sergeych.lyng.PerfFlags
import net.sergeych.lyng.Scope
import net.sergeych.lyng.obj.ObjInt
import kotlin.math.max
import kotlin.math.min
import kotlin.test.Test
import kotlin.test.assertEquals

class ConcurrencyCallBenchmarkTest {

    private suspend fun parallelEval(workers: Int, script: String): List<Long> = coroutineScope {
        (0 until workers).map { async { (Scope().eval(script) as ObjInt).value } }.awaitAll()
    }

    @Test
    fun benchmark_multithread_calls_off_on() = runBlocking {
        val cpu = Runtime.getRuntime().availableProcessors()
        val workers = min(max(2, cpu), 8)
        val iterations = 15_000 // per worker; keep CI fast
        val script = """
            fun f0() { 1 }
            fun f1(a) { a }
            fun f2(a,b) { a + b }
            fun f3(a,b,c) { a + b + c }
            fun f4(a,b,c,d) { a + b + c + d }
            var s = 0
            var i = 0
            while (i < $iterations) {
                s = s + f0()
                s = s + f1(1)
                s = s + f2(1, 1)
                s = s + f3(1, 1, 1)
                s = s + f4(1, 1, 1, 1)
                i = i + 1
            }
            s
        """.trimIndent()
        val expected = (1 + 1 + 2 + 3 + 4).toLong() * iterations

        // OFF
        PerfFlags.SCOPE_POOL = false
        val t0 = System.nanoTime()
        val off = withContext(Dispatchers.Default) { parallelEval(workers, script) }
        val t1 = System.nanoTime()
        // ON
        PerfFlags.SCOPE_POOL = true
        val t2 = System.nanoTime()
        val on = withContext(Dispatchers.Default) { parallelEval(workers, script) }
        val t3 = System.nanoTime()
        // reset
        PerfFlags.SCOPE_POOL = false

        off.forEach { assertEquals(expected, it) }
        on.forEach { assertEquals(expected, it) }

        val offMs = (t1 - t0) / 1_000_000.0
        val onMs = (t3 - t2) / 1_000_000.0
        val speedup = offMs / onMs
        println("[DEBUG_LOG] [BENCH] ConcurrencyCallBenchmark workers=$workers iters=$iterations each: OFF=${"%.3f".format(offMs)} ms, ON=${"%.3f".format(onMs)} ms, speedup=${"%.2f".format(speedup)}x")
    }
}
@@ -0,0 +1,97 @@
/*
 * Multithreaded stress tests for ScopePool on JVM.
 */

import kotlinx.coroutines.*
import net.sergeych.lyng.PerfFlags
import net.sergeych.lyng.Scope
import net.sergeych.lyng.obj.ObjInt
import kotlin.math.max
import kotlin.math.min
import kotlin.test.Test
import kotlin.test.assertEquals

class MultiThreadPoolingStressJvmTest {

    private suspend fun parallelEval(workers: Int, block: suspend (Int) -> Long): List<Long> = coroutineScope {
        (0 until workers).map { w -> async { block(w) } }.awaitAll()
    }

    @Test
    fun parallel_shallow_calls_correct_off_on() = runBlocking {
        val cpu = Runtime.getRuntime().availableProcessors()
        val workers = min(max(2, cpu), 8)
        val iterations = 25_000 // keep CI reasonable
        val script = """
            fun f0(a){ a }
            fun f1(a,b){ a + b }
            fun f2(a,b,c){ a + b + c }
            var s = 0
            var i = 0
            while(i < $iterations){
                s = s + f0(1)
                s = s + f1(1,1)
                s = s + f2(1,1,1)
                i = i + 1
            }
            s
        """.trimIndent()

        fun expected() = (1 + 2 + 3).toLong() * iterations

        // OFF
        PerfFlags.SCOPE_POOL = false
        val offResults = withContext(Dispatchers.Default) {
            parallelEval(workers) {
                val r = (Scope().eval(script) as ObjInt).value
                r
            }
        }
        // ON
        PerfFlags.SCOPE_POOL = true
        val onResults = withContext(Dispatchers.Default) {
            parallelEval(workers) {
                val r = (Scope().eval(script) as ObjInt).value
                r
            }
        }
        // reset
        PerfFlags.SCOPE_POOL = false

        val exp = expected()
        offResults.forEach { assertEquals(exp, it) }
        onResults.forEach { assertEquals(exp, it) }
    }

    @Test
    fun parallel_recursion_correct_off_on() = runBlocking {
        val cpu = Runtime.getRuntime().availableProcessors()
        val workers = min(max(2, cpu), 8)
        val depth = 12
        val script = """
            fun fact(x){ if(x <= 1) 1 else x * fact(x-1) }
            fact($depth)
        """.trimIndent()
        val expected = (1..depth).fold(1L) { a, b -> a * b }

        // OFF
        PerfFlags.SCOPE_POOL = false
        val offResults = withContext(Dispatchers.Default) {
            parallelEval(workers) {
                (Scope().eval(script) as ObjInt).value
            }
        }
        // ON
        PerfFlags.SCOPE_POOL = true
        val onResults = withContext(Dispatchers.Default) {
            parallelEval(workers) {
                (Scope().eval(script) as ObjInt).value
            }
        }
        // reset
        PerfFlags.SCOPE_POOL = false

        offResults.forEach { assertEquals(expected, it) }
        onResults.forEach { assertEquals(expected, it) }
    }
}
@@ -0,0 +1,27 @@
package net.sergeych.lyng

import net.sergeych.lyng.obj.Obj
import net.sergeych.lyng.obj.ObjVoid

/**
 * Native actual: simple global deque pool. Many native targets are single-threaded by default in our setup.
 */
actual object ScopePool {
    private const val MAX_POOL_SIZE = 64
    private val pool = ArrayDeque<Scope>(MAX_POOL_SIZE)

    actual fun borrow(parent: Scope, args: Arguments, pos: Pos, thisObj: Obj): Scope {
        val s = if (pool.isNotEmpty()) pool.removeLast() else Scope(parent, args, pos, thisObj)
        if (s.parent !== parent || s.args !== args || s.pos !== pos || s.thisObj !== thisObj) {
            s.resetForReuse(parent, args, pos, thisObj)
        } else {
            s.frameId = nextFrameId()
        }
        return s
    }

    actual fun release(scope: Scope) {
        scope.resetForReuse(parent = null, args = Arguments.EMPTY, pos = Pos.builtIn, thisObj = ObjVoid)
        if (pool.size < MAX_POOL_SIZE) pool.addLast(scope)
    }
}
@@ -0,0 +1,27 @@
package net.sergeych.lyng

import net.sergeych.lyng.obj.Obj
import net.sergeych.lyng.obj.ObjVoid

/**
 * Wasm/JS actual: simple global deque pool (single-threaded runtime model).
 */
actual object ScopePool {
    private const val MAX_POOL_SIZE = 64
    private val pool = ArrayDeque<Scope>(MAX_POOL_SIZE)

    actual fun borrow(parent: Scope, args: Arguments, pos: Pos, thisObj: Obj): Scope {
        val s = if (pool.isNotEmpty()) pool.removeLast() else Scope(parent, args, pos, thisObj)
        if (s.parent !== parent || s.args !== args || s.pos !== pos || s.thisObj !== thisObj) {
            s.resetForReuse(parent, args, pos, thisObj)
        } else {
            s.frameId = nextFrameId()
        }
        return s
    }

    actual fun release(scope: Scope) {
        scope.resetForReuse(parent = null, args = Arguments.EMPTY, pos = Pos.builtIn, thisObj = ObjVoid)
        if (pool.size < MAX_POOL_SIZE) pool.addLast(scope)
    }
}