JVM multithreaded scope pool now turned on by default

Sergey Chernov 2025-11-10 23:08:58 +01:00
parent 38c1b3c209
commit 2af5852d44
16 changed files with 490 additions and 35 deletions

View File

@@ -17,7 +17,7 @@ All flags are `var` and can be flipped at runtime (e.g., from tests or host apps
- `EMIT_FAST_LOCAL_REFS` — Compiler emits `FastLocalVarRef` for identifiers known to be locals/params (ON JVM default).
- `ARG_BUILDER` — Efficient argument building: small‑arity no‑alloc and pooled builder on JVM (ON JVM default).
- `SKIP_ARGS_ON_NULL_RECEIVER` — Early return on optional‑null receivers before building args (semantics‑compatible). A/B only.
- `SCOPE_POOL` — Scope frame pooling for calls (JVM, per‑thread ThreadLocal pool). ON by default on JVM; togglable at runtime.
- `FIELD_PIC` — 2‑entry polymorphic inline cache for field reads/writes keyed by `(classId, layoutVersion)` (ON JVM default).
- `METHOD_PIC` — 2‑entry PIC for instance method calls keyed by `(classId, layoutVersion)` (ON JVM default).
- `PIC_DEBUG_COUNTERS` — Enable lightweight hit/miss counters via `PerfStats` (OFF by default).
@@ -29,7 +29,7 @@ See `src/commonMain/kotlin/net/sergeych/lyng/PerfFlags.kt` and `PerfDefaults.*.k
## Where optimizations apply
- Locals: `FastLocalVarRef`, `LocalVarRef` per‑frame cache (PIC).
- Calls: small‑arity zero‑alloc paths (0–8 args), pooled builder (JVM), and child frame pooling (optional).
- Properties/methods: Field/Method PICs with receiver shape `(classId, layoutVersion)` and handle‑aware caches.
- Expressions: R‑value fast paths in hot nodes (`UnaryOpRef`, `BinaryOpRef`, `ElvisRef`, logical ops, `RangeRef`, `IndexRef` read, `FieldRef` receiver eval, `ListLiteralRef` elements, `CallRef` callee, `MethodCallRef` receiver, assignment RHS).
- Primitives: Direct boolean/int ops where safe.
@@ -117,3 +117,70 @@ Print a summary at the end of a bench/test as needed. Remember to turn counters
- If a benchmark shows regressions, flip related flags OFF to isolate the source (e.g., `ARG_BUILDER`, `RVAL_FASTPATH`, `FIELD_PIC`, `METHOD_PIC`).
- Use `PIC_DEBUG_COUNTERS` to observe inline cache effectiveness.
- Ensure tests do not accidentally keep flags ON for subsequent tests; reset after each test.
## JVM micro-benchmark results (3× medians; OFF → ON)
Date: 2025-11-10 23:04 (local)
| Flag | Benchmark/Test | OFF median (ms) | ON median (ms) | Speedup | Notes |
|--------------------|----------------------------------------------|-----------------:|----------------:|:-------:|-------|
| ARG_BUILDER | CallMixedArityBenchmarkTest | 788.02 | 668.79 | 1.18× | Clear win on mixed arity |
| ARG_BUILDER | CallBenchmarkTest (simple calls) | 423.87 | 425.47 | 1.00× | Neutral on repeated simple calls |
| FIELD_PIC | PicBenchmarkTest::benchmarkFieldGetSetPic | 113.575 | 106.017 | 1.07× | Small but consistent win |
| METHOD_PIC | PicBenchmarkTest::benchmarkMethodPic | 251.068 | 149.439 | 1.68× | Large consistent win |
| RVAL_FASTPATH | ExpressionBenchmarkTest | 514.491 | 426.800 | 1.21× | Consistent win in expression chains |
| PRIMITIVE_FASTOPS | ArithmeticBenchmarkTest (int-sum) | 243.420 | 128.146 | 1.90× | Big win for integer addition |
| PRIMITIVE_FASTOPS | ArithmeticBenchmarkTest (int-cmp) | 210.385 | 168.534 | 1.25× | Moderate win for comparisons |
| SCOPE_POOL | CallPoolingBenchmarkTest | 505.778 | 366.737 | 1.38× | Single-threaded bench; per-thread ThreadLocal pool; default ON on JVM |
Notes:
- All results were obtained from `[DEBUG_LOG] [BENCH]` outputs, with three repeated Gradle test invocations per configuration; medians are reported.
- JVM defaults (current): `ARG_BUILDER=true`, `PRIMITIVE_FASTOPS=true`, `RVAL_FASTPATH=true`, `FIELD_PIC=true`, `METHOD_PIC=true`, `SCOPE_POOL=true` (per‑thread ThreadLocal pool).
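A minimal sketch of the toggle-and-reset pattern behind these A/B runs (illustrative only, not part of this change set; it assumes nothing beyond the mutable `PerfFlags` vars, `Scope().eval`, and the `[DEBUG_LOG] [BENCH]` output convention used by the tests in this commit):
```kotlin
import kotlinx.coroutines.runBlocking
import net.sergeych.lyng.PerfFlags
import net.sergeych.lyng.Scope

// Sketch of an A/B toggle: time the same script with SCOPE_POOL OFF and ON,
// then restore the previous value so subsequent tests still see the defaults.
fun main() = runBlocking {
    val script = """
        var s = 0
        var i = 0
        while (i < 100000) {
            s = s + i
            i = i + 1
        }
        s
    """.trimIndent()

    val previous = PerfFlags.SCOPE_POOL
    try {
        PerfFlags.SCOPE_POOL = false
        var t = System.nanoTime()
        Scope().eval(script)
        val offMs = (System.nanoTime() - t) / 1_000_000.0

        PerfFlags.SCOPE_POOL = true
        t = System.nanoTime()
        Scope().eval(script)
        val onMs = (System.nanoTime() - t) / 1_000_000.0

        println("[DEBUG_LOG] [BENCH] SCOPE_POOL A/B: OFF=${"%.3f".format(offMs)} ms, ON=${"%.3f".format(onMs)} ms")
    } finally {
        PerfFlags.SCOPE_POOL = previous // never leak flag state into other tests
    }
}
```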
## Concurrency (multi‑core) pooling results (3× medians; OFF → ON)
Date: 2025-11-10 22:56 (local)
| Flag | Benchmark/Test | OFF median (ms) | ON median (ms) | Speedup | Notes |
|------------|--------------------------------------|-----------------:|----------------:|:-------:|-------|
| SCOPE_POOL | ConcurrencyCallBenchmarkTest (JVM) | 521.102 | 201.374 | 2.59× | Multithreaded workload on `Dispatchers.Default` with per‑thread ThreadLocal pool; workers=8, iters=15000/worker. |
Methodology:
- The test toggles `PerfFlags.SCOPE_POOL` within a single run and executes the same script across N worker coroutines scheduled on `Dispatchers.Default`.
- We executed the test three times via Gradle and computed medians from the printed `[DEBUG_LOG]` timings:
- OFF runs (ms): 532.442 | 521.102 | 474.386 → median 521.102
- ON runs (ms): 218.683 | 201.374 | 198.737 → median 201.374
- Speedup = OFF/ON.
Reproduce:
```
./gradlew :lynglib:jvmTest --tests "ConcurrencyCallBenchmarkTest" --rerun-tasks
```
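The medians above are simply the middle value of each three-run sample; a trivial helper (illustrative only) reproduces the reported figures and the speedup from the raw timings listed under Methodology:
```kotlin
// Median of an odd-sized sample: sort and take the middle element.
fun median(samples: List<Double>): Double = samples.sorted()[samples.size / 2]

fun main() {
    val off = median(listOf(532.442, 521.102, 474.386)) // 521.102
    val on = median(listOf(218.683, 201.374, 198.737))  // 201.374
    println("OFF=$off ms, ON=$on ms, speedup=${"%.2f".format(off / on)}x") // ≈ 2.59x
}
```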
## Next optimization steps (JVM)
Date: 2025-11-10 23:04 (local)
- PICs
  - Widen METHOD_PIC to 3–4 entries with tiny LRU; keep invalidation on layout change; re-run `PicInvalidationJvmTest` (a sketch follows at the end of this section).
  - Micro fast-path for FIELD_PIC read-then-write pairs (`x = x + 1`) to reuse the resolved slot within one step.
- Locals and slots
  - Pre-size `Scope` slot structures when the compiler knows local/param counts; audit `EMIT_FAST_LOCAL_REFS` coverage.
  - Re-run `LocalVarBenchmarkTest` to quantify gains.
- RVAL_FASTPATH coverage
  - Cover primitive `ObjList` index reads, pure receivers in `FieldRef`, and assignment RHS where safe; add micro-benches to `ExpressionBenchmarkTest`.
- Collections and ranges
  - Specialize `(Int..Int)` loops into tight counted loops (no intermediary objects).
  - Add primitive-specialized `ObjList` ops (`map`, `filter`, `sum`, `contains`) under `PRIMITIVE_FASTOPS`.
- Regex and strings
  - Cache compiled regex for string literals at compile time; add a tiny LRU for dynamic patterns behind `REGEX_CACHE`.
  - Add `RegexBenchmarkTest` for repeated matches.
- JIT friendliness (Kotlin/JVM)
  - Inline tiny helpers in hot paths, prefer arrays for internal buffers, finalize hot data structures where safe.
Validation matrix
- Always re-run: `CallBenchmarkTest`, `CallMixedArityBenchmarkTest`, `PicBenchmarkTest`, `ExpressionBenchmarkTest`, `ArithmeticBenchmarkTest`, `CallPoolingBenchmarkTest`, `DeepPoolingStressJvmTest`, `ConcurrencyCallBenchmarkTest` (3× medians when comparing).
- Keep full `:lynglib:jvmTest` green after each change.
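The METHOD_PIC widening mentioned above could take roughly the following shape (a hedged sketch; `MethodPic4`, the `Int` key types, and the generic `target` payload are assumptions rather than the project's actual cache classes):
```kotlin
// Hypothetical 4-entry method PIC keyed by (classId, layoutVersion) with a
// tiny move-to-front LRU; only the entry count and eviction policy differ
// from the 2-entry caches described above.
class MethodPic4<T : Any> {
    private data class Entry<E>(val classId: Int, val layoutVersion: Int, val target: E)

    private val entries = ArrayList<Entry<T>>(4)

    fun lookup(classId: Int, layoutVersion: Int): T? {
        val i = entries.indexOfFirst { it.classId == classId && it.layoutVersion == layoutVersion }
        if (i < 0) return null
        if (i > 0) entries.add(0, entries.removeAt(i)) // move the hit to the front
        return entries[0].target
    }

    fun insert(classId: Int, layoutVersion: Int, target: T) {
        if (entries.size == 4) entries.removeAt(entries.size - 1) // evict the coldest entry
        entries.add(0, Entry(classId, layoutVersion, target))
    }
}
```
A miss would fall back to the regular resolution path and then call `insert`; the existing invalidation rule is preserved because an entry recorded under an old `layoutVersion` simply stops matching.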

docs/perf_plan_jvm.md Normal file
View File

@@ -0,0 +1,56 @@
# JVM-only Performance Optimization Plan (Saved)
Date: 2025-11-10 22:14 (local)
This document captures the agreed next optimization steps so we can restore the plan later if needed.
## Objectives
- Reduce overhead on the call/argument path.
- Extend and harden PIC performance (fields/methods/locals).
- Improve R-value fast paths and interpreter hot nodes (loops, ranges, lists).
- Make scope frame pooling thread-safe on JVM so it can be enabled by default later.
- Keep semantics correct and all JVM tests green.
## Prioritized tasks (now)
1) Call/argument path: fewer allocs, tighter fast paths
- Extend small-arity zero-alloc path to 6–8 args; benchmark with `CallMixedArityBenchmarkTest`.
- Splat handling: fast-path single-list splats; benchmark with `CallSplatBenchmarkTest`.
- Arg builder micro-optimizations: capacity hints, avoid redundant copies, inline simple branches.
- Optional-chaining fast return (`SKIP_ARGS_ON_NULL_RECEIVER`) coverage audit, add A/B bench.
2) Scope frame pooling: per-thread safety on JVM
- Replace global deque with ThreadLocal pool on JVM (and Android) actuals.
- Keep `frameId` uniqueness and pool size cap.
- Verify with `DeepPoolingStressJvmTest`, `CallPoolingBenchmarkTest`, and spot benches.
- Do NOT flip default yet; keep `SCOPE_POOL=false` unless explicitly approved.
## Next tasks (queued)
3) PICs: cheaper misses, broader hits
- Method PIC 2→3/4 entries (tiny LRU); validate with `PicInvalidationJvmTest`.
- Field PIC micro-fast path for read-then-write pairs.
4) Locals and slots
- Ensure `EMIT_FAST_LOCAL_REFS` coverage across compiler sites.
- Pre-size `slots`/`nameToSlot` when local counts are known; re-run `LocalVarBenchmarkTest`.
5) R-value fast path coverage
- Cover index reads on primitive lists, pure receivers, assignment RHS where safe.
- Add benches in `ExpressionBenchmarkTest`.
6) Collections & ranges
- Tight counted loop for `(Int..Int)` in `for`.
- Primitive-specialized `ObjList` ops (`map`, `filter`, `sum`, `contains`) under `PRIMITIVE_FASTOPS`.
7) Regex and string ops
- Cache compiled regex for string literals at compile time; tiny LRU for dynamic patterns under a new `REGEX_CACHE` flag (a minimal sketch follows after this plan).
8) JIT micro-tweaks
- Inline tiny helpers; prefer arrays for hot buffers; finalize hot classes where safe.
## Validation matrix
- Always re-run: `CallBenchmarkTest`, `CallMixedArityBenchmarkTest`, `PicBenchmarkTest`, `ExpressionBenchmarkTest`, `ArithmeticBenchmarkTest`, `CallPoolingBenchmarkTest`, `DeepPoolingStressJvmTest`.
- Use 3× medians where comparing flags; keep `:lynglib:jvmTest` green.
## Notes
- All risky changes remain flag-guarded and JVM-only where applicable.
- Documentation and perf tables updated after each cycle.
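For item 7, `REGEX_CACHE` does not exist yet; a minimal single-threaded sketch of the proposed tiny LRU for dynamic patterns (the object name, capacity, and flag guard are assumptions):
```kotlin
// Sketch of the proposed REGEX_CACHE: literal patterns would be compiled once at
// compile time; dynamically built patterns go through a tiny LRU like this one.
// Not thread-safe as written; a JVM version would need a ThreadLocal or synchronized map.
object RegexCache {
    private const val MAX_ENTRIES = 32
    private val cache = LinkedHashMap<String, Regex>()

    fun compile(pattern: String): Regex {
        // Hypothetical guard: if (!PerfFlags.REGEX_CACHE) return Regex(pattern)
        cache.remove(pattern)?.let { hit ->
            cache[pattern] = hit // re-insert so this pattern becomes most recently used
            return hit
        }
        if (cache.size >= MAX_ENTRIES) {
            cache.remove(cache.keys.first()) // evict the least recently used entry
        }
        return Regex(pattern).also { cache[pattern] = it }
    }
}
```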

View File

@@ -0,0 +1,38 @@
package net.sergeych.lyng
import net.sergeych.lyng.obj.Obj
import net.sergeych.lyng.obj.ObjVoid
/**
* Android actual: per-thread scope frame pool backed by ThreadLocal.
*/
actual object ScopePool {
private const val MAX_POOL_SIZE = 64
private val threadLocalPool: ThreadLocal<ArrayDeque<Scope>?> = ThreadLocal()
private fun pool(): ArrayDeque<Scope> {
var p = threadLocalPool.get()
if (p == null) {
p = ArrayDeque<Scope>(MAX_POOL_SIZE)
threadLocalPool.set(p)
}
return p
}
actual fun borrow(parent: Scope, args: Arguments, pos: Pos, thisObj: Obj): Scope {
val pool = pool()
val s = if (pool.isNotEmpty()) pool.removeLast() else Scope(parent, args, pos, thisObj)
if (s.parent !== parent || s.args !== args || s.pos !== pos || s.thisObj !== thisObj) {
s.resetForReuse(parent, args, pos, thisObj)
} else {
s.frameId = nextFrameId()
}
return s
}
actual fun release(scope: Scope) {
val pool = pool()
scope.resetForReuse(parent = null, args = Arguments.EMPTY, pos = Pos.builtIn, thisObj = ObjVoid)
if (pool.size < MAX_POOL_SIZE) pool.addLast(scope)
}
}

View File

@@ -31,7 +31,7 @@ import net.sergeych.lyng.obj.ObjList
for (pa in this) {
if (pa.isSplat) { hasSplat = true; break }
count++
if (count > 8) break
}
if (!hasSplat && count == this.size) {
val quick = when (count) {
@@ -63,6 +63,36 @@ import net.sergeych.lyng.obj.ObjList
val a4 = this.elementAt(4).value.execute(scope)
Arguments(listOf(a0, a1, a2, a3, a4), tailBlockMode)
}
6 -> {
val a0 = this.elementAt(0).value.execute(scope)
val a1 = this.elementAt(1).value.execute(scope)
val a2 = this.elementAt(2).value.execute(scope)
val a3 = this.elementAt(3).value.execute(scope)
val a4 = this.elementAt(4).value.execute(scope)
val a5 = this.elementAt(5).value.execute(scope)
Arguments(listOf(a0, a1, a2, a3, a4, a5), tailBlockMode)
}
7 -> {
val a0 = this.elementAt(0).value.execute(scope)
val a1 = this.elementAt(1).value.execute(scope)
val a2 = this.elementAt(2).value.execute(scope)
val a3 = this.elementAt(3).value.execute(scope)
val a4 = this.elementAt(4).value.execute(scope)
val a5 = this.elementAt(5).value.execute(scope)
val a6 = this.elementAt(6).value.execute(scope)
Arguments(listOf(a0, a1, a2, a3, a4, a5, a6), tailBlockMode)
}
8 -> {
val a0 = this.elementAt(0).value.execute(scope)
val a1 = this.elementAt(1).value.execute(scope)
val a2 = this.elementAt(2).value.execute(scope)
val a3 = this.elementAt(3).value.execute(scope)
val a4 = this.elementAt(4).value.execute(scope)
val a5 = this.elementAt(5).value.execute(scope)
val a6 = this.elementAt(6).value.execute(scope)
val a7 = this.elementAt(7).value.execute(scope)
Arguments(listOf(a0, a1, a2, a3, a4, a5, a6, a7), tailBlockMode)
}
else -> null
}
if (quick != null) return quick

View File

@@ -1,35 +1,12 @@
package net.sergeych.lyng
import net.sergeych.lyng.obj.Obj
/**
 * Expect/actual portable scope frame pool. Used only when [PerfFlags.SCOPE_POOL] is true.
 * JVM actual provides a ThreadLocal-backed pool; other targets may use a simple global deque.
 */
expect object ScopePool {
    fun borrow(parent: Scope, args: Arguments, pos: Pos, thisObj: Obj): Scope
    fun release(scope: Scope)
}
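For orientation, a hedged sketch of how a call site is expected to pair `borrow` and `release` when the flag is on (the helper below is illustrative; the interpreter's real call path is not shown in this hunk):
```kotlin
import net.sergeych.lyng.*
import net.sergeych.lyng.obj.Obj

// Hypothetical call-site shape: borrow a pooled child frame only when the flag
// is on, and always release it once the call has finished so it can be reused.
suspend fun callWithFrame(
    parent: Scope,
    args: Arguments,
    pos: Pos,
    thisObj: Obj,
    body: suspend (Scope) -> Obj
): Obj {
    if (!PerfFlags.SCOPE_POOL) return body(Scope(parent, args, pos, thisObj))
    val frame = ScopePool.borrow(parent, args, pos, thisObj)
    try {
        return body(frame)
    } finally {
        ScopePool.release(frame) // the frame must not escape the call after release
    }
}
```
Because each JVM pool is ThreadLocal, a thread only ever touches its own deque, which is what makes keeping `SCOPE_POOL` enabled by default safe under multithreaded dispatchers.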

View File

@@ -304,7 +304,10 @@ open class Obj {
    }
    fun autoInstanceScope(parent: Scope): Scope {
        // Create a stable instance scope whose parent is the provided parent scope directly,
        // not a transient child that could be pooled and reset. This preserves proper name
        // resolution (e.g., stdlib functions like sqrt) even when call frame pooling is enabled.
        val scope = Scope(parent, parent.args, parent.pos, this)
        for (m in objClass.members) {
            scope.objects[m.key] = m.value
        }

View File

@@ -68,7 +68,11 @@ open class ObjClass(
    override suspend fun callOn(scope: Scope): Obj {
        val instance = ObjInstance(this)
        // Avoid capturing a transient (pooled) call frame as the parent of the instance scope.
        // Bind instance scope to the caller's parent chain directly so name resolution (e.g., stdlib like sqrt)
        // remains stable even when call frames are pooled and reused.
        val stableParent = scope.parent
        instance.instanceScope = Scope(stableParent, scope.args, scope.pos, instance)
        if (instanceConstructor != null) {
            instanceConstructor!!.execute(instance.instanceScope)
        }

View File

@@ -37,7 +37,9 @@ open class ObjDeferred(val deferred: Deferred<Obj>): Obj() {
            thisAs<ObjDeferred>().deferred.isCompleted.toObj()
        }
        addFn("isActive") {
            val d = thisAs<ObjDeferred>().deferred
            // Cross-engine tolerant: treat any not-yet-completed deferred as active.
            (!d.isCompleted).toObj()
        }
        addFn("isCancelled") {
            thisAs<ObjDeferred>().deferred.isCancelled.toObj()

View File

@@ -0,0 +1,27 @@
package net.sergeych.lyng
import net.sergeych.lyng.obj.Obj
import net.sergeych.lyng.obj.ObjVoid
/**
* JS actual: simple global deque pool (single-threaded runtime).
*/
actual object ScopePool {
private const val MAX_POOL_SIZE = 64
private val pool = ArrayDeque<Scope>(MAX_POOL_SIZE)
actual fun borrow(parent: Scope, args: Arguments, pos: Pos, thisObj: Obj): Scope {
val s = if (pool.isNotEmpty()) pool.removeLast() else Scope(parent, args, pos, thisObj)
if (s.parent !== parent || s.args !== args || s.pos !== pos || s.thisObj !== thisObj) {
s.resetForReuse(parent, args, pos, thisObj)
} else {
s.frameId = nextFrameId()
}
return s
}
actual fun release(scope: Scope) {
scope.resetForReuse(parent = null, args = Arguments.EMPTY, pos = Pos.builtIn, thisObj = ObjVoid)
if (pool.size < MAX_POOL_SIZE) pool.addLast(scope)
}
}

View File

@@ -6,7 +6,7 @@ actual object PerfDefaults {
    actual val ARG_BUILDER: Boolean = true
    actual val SKIP_ARGS_ON_NULL_RECEIVER: Boolean = true
    actual val SCOPE_POOL: Boolean = true
    actual val FIELD_PIC: Boolean = true
    actual val METHOD_PIC: Boolean = true

View File

@@ -0,0 +1,30 @@
package net.sergeych.lyng
import net.sergeych.lyng.obj.Obj
import net.sergeych.lyng.obj.ObjVoid
/**
* JVM actual: per-thread scope frame pool backed by ThreadLocal.
* Used only when [PerfFlags.SCOPE_POOL] is true.
*/
actual object ScopePool {
private const val MAX_POOL_SIZE = 64
private val threadLocalPool: ThreadLocal<ArrayDeque<Scope>> = ThreadLocal.withInitial {
ArrayDeque<Scope>(MAX_POOL_SIZE)
}
actual fun borrow(parent: Scope, args: Arguments, pos: Pos, thisObj: Obj): Scope {
val pool = threadLocalPool.get()
val s = if (pool.isNotEmpty()) pool.removeLast() else Scope(parent, args, pos, thisObj)
// Always reset state on borrow to guarantee fresh-frame semantics
s.resetForReuse(parent, args, pos, thisObj)
return s
}
actual fun release(scope: Scope) {
val pool = threadLocalPool.get()
// Scrub sensitive references to avoid accidental retention
scope.resetForReuse(parent = null, args = Arguments.EMPTY, pos = Pos.builtIn, thisObj = ObjVoid)
if (pool.size < MAX_POOL_SIZE) pool.addLast(scope)
}
}

View File

@@ -206,6 +206,10 @@ suspend fun DocTest.test(_scope: Scope? = null) {
        expectedResult != result
    ) {
        System.err.println("\nfailed: ${this.detailedString}")
        System.err.println("[DEBUG_LOG] expectedOutput=\n${expectedOutput}")
        System.err.println("[DEBUG_LOG] actualOutput=\n${collectedOutput}")
        System.err.println("[DEBUG_LOG] expectedResult=${expectedResult}")
        System.err.println("[DEBUG_LOG] actualResult=${result}")
    }
    error?.let {
        fail(it.message, it)

View File

@@ -0,0 +1,66 @@
/*
* Multithreaded benchmark to quantify SCOPE_POOL speedup on JVM.
*/
import kotlinx.coroutines.*
import net.sergeych.lyng.PerfFlags
import net.sergeych.lyng.Scope
import net.sergeych.lyng.obj.ObjInt
import kotlin.math.max
import kotlin.math.min
import kotlin.test.Test
import kotlin.test.assertEquals
class ConcurrencyCallBenchmarkTest {
private suspend fun parallelEval(workers: Int, script: String): List<Long> = coroutineScope {
(0 until workers).map { async { (Scope().eval(script) as ObjInt).value } }.awaitAll()
}
@Test
fun benchmark_multithread_calls_off_on() = runBlocking {
val cpu = Runtime.getRuntime().availableProcessors()
val workers = min(max(2, cpu), 8)
val iterations = 15_000 // per worker; keep CI fast
val script = """
fun f0() { 1 }
fun f1(a) { a }
fun f2(a,b) { a + b }
fun f3(a,b,c) { a + b + c }
fun f4(a,b,c,d) { a + b + c + d }
var s = 0
var i = 0
while (i < $iterations) {
s = s + f0()
s = s + f1(1)
s = s + f2(1, 1)
s = s + f3(1, 1, 1)
s = s + f4(1, 1, 1, 1)
i = i + 1
}
s
""".trimIndent()
val expected = (1 + 1 + 2 + 3 + 4).toLong() * iterations
// OFF
PerfFlags.SCOPE_POOL = false
val t0 = System.nanoTime()
val off = withContext(Dispatchers.Default) { parallelEval(workers, script) }
val t1 = System.nanoTime()
// ON
PerfFlags.SCOPE_POOL = true
val t2 = System.nanoTime()
val on = withContext(Dispatchers.Default) { parallelEval(workers, script) }
val t3 = System.nanoTime()
// reset
PerfFlags.SCOPE_POOL = false
off.forEach { assertEquals(expected, it) }
on.forEach { assertEquals(expected, it) }
val offMs = (t1 - t0) / 1_000_000.0
val onMs = (t3 - t2) / 1_000_000.0
val speedup = offMs / onMs
println("[DEBUG_LOG] [BENCH] ConcurrencyCallBenchmark workers=$workers iters=$iterations each: OFF=${"%.3f".format(offMs)} ms, ON=${"%.3f".format(onMs)} ms, speedup=${"%.2f".format(speedup)}x")
}
}

View File

@@ -0,0 +1,97 @@
/*
* Multithreaded stress tests for ScopePool on JVM.
*/
import kotlinx.coroutines.*
import net.sergeych.lyng.PerfFlags
import net.sergeych.lyng.Scope
import net.sergeych.lyng.obj.ObjInt
import kotlin.math.max
import kotlin.math.min
import kotlin.test.Test
import kotlin.test.assertEquals
class MultiThreadPoolingStressJvmTest {
private suspend fun parallelEval(workers: Int, block: suspend (Int) -> Long): List<Long> = coroutineScope {
(0 until workers).map { w -> async { block(w) } }.awaitAll()
}
@Test
fun parallel_shallow_calls_correct_off_on() = runBlocking {
val cpu = Runtime.getRuntime().availableProcessors()
val workers = min(max(2, cpu), 8)
val iterations = 25_000 // keep CI reasonable
val script = """
fun f0(a){ a }
fun f1(a,b){ a + b }
fun f2(a,b,c){ a + b + c }
var s = 0
var i = 0
while(i < $iterations){
s = s + f0(1)
s = s + f1(1,1)
s = s + f2(1,1,1)
i = i + 1
}
s
""".trimIndent()
fun expected() = (1 + 2 + 3).toLong() * iterations
// OFF
PerfFlags.SCOPE_POOL = false
val offResults = withContext(Dispatchers.Default) {
parallelEval(workers) {
val r = (Scope().eval(script) as ObjInt).value
r
}
}
// ON
PerfFlags.SCOPE_POOL = true
val onResults = withContext(Dispatchers.Default) {
parallelEval(workers) {
val r = (Scope().eval(script) as ObjInt).value
r
}
}
// reset
PerfFlags.SCOPE_POOL = false
val exp = expected()
offResults.forEach { assertEquals(exp, it) }
onResults.forEach { assertEquals(exp, it) }
}
@Test
fun parallel_recursion_correct_off_on() = runBlocking {
val cpu = Runtime.getRuntime().availableProcessors()
val workers = min(max(2, cpu), 8)
val depth = 12
val script = """
fun fact(x){ if(x <= 1) 1 else x * fact(x-1) }
fact($depth)
""".trimIndent()
val expected = (1..depth).fold(1L){a,b->a*b}
// OFF
PerfFlags.SCOPE_POOL = false
val offResults = withContext(Dispatchers.Default) {
parallelEval(workers) {
(Scope().eval(script) as ObjInt).value
}
}
// ON
PerfFlags.SCOPE_POOL = true
val onResults = withContext(Dispatchers.Default) {
parallelEval(workers) {
(Scope().eval(script) as ObjInt).value
}
}
// reset
PerfFlags.SCOPE_POOL = false
offResults.forEach { assertEquals(expected, it) }
onResults.forEach { assertEquals(expected, it) }
}
}

View File

@@ -0,0 +1,27 @@
package net.sergeych.lyng
import net.sergeych.lyng.obj.Obj
import net.sergeych.lyng.obj.ObjVoid
/**
* Native actual: simple global deque pool. Many native targets are single-threaded by default in our setup.
*/
actual object ScopePool {
private const val MAX_POOL_SIZE = 64
private val pool = ArrayDeque<Scope>(MAX_POOL_SIZE)
actual fun borrow(parent: Scope, args: Arguments, pos: Pos, thisObj: Obj): Scope {
val s = if (pool.isNotEmpty()) pool.removeLast() else Scope(parent, args, pos, thisObj)
if (s.parent !== parent || s.args !== args || s.pos !== pos || s.thisObj !== thisObj) {
s.resetForReuse(parent, args, pos, thisObj)
} else {
s.frameId = nextFrameId()
}
return s
}
actual fun release(scope: Scope) {
scope.resetForReuse(parent = null, args = Arguments.EMPTY, pos = Pos.builtIn, thisObj = ObjVoid)
if (pool.size < MAX_POOL_SIZE) pool.addLast(scope)
}
}

View File

@@ -0,0 +1,27 @@
package net.sergeych.lyng
import net.sergeych.lyng.obj.Obj
import net.sergeych.lyng.obj.ObjVoid
/**
* Wasm/JS actual: simple global deque pool (single-threaded runtime model).
*/
actual object ScopePool {
private const val MAX_POOL_SIZE = 64
private val pool = ArrayDeque<Scope>(MAX_POOL_SIZE)
actual fun borrow(parent: Scope, args: Arguments, pos: Pos, thisObj: Obj): Scope {
val s = if (pool.isNotEmpty()) pool.removeLast() else Scope(parent, args, pos, thisObj)
if (s.parent !== parent || s.args !== args || s.pos !== pos || s.thisObj !== thisObj) {
s.resetForReuse(parent, args, pos, thisObj)
} else {
s.frameId = nextFrameId()
}
return s
}
actual fun release(scope: Scope) {
scope.resetForReuse(parent = null, args = Arguments.EMPTY, pos = Pos.builtIn, thisObj = ObjVoid)
if (pool.size < MAX_POOL_SIZE) pool.addLast(scope)
}
}