further optimizations
@@ -421,3 +421,77 @@ Results (representative runs; OFF → ON):
Summary: All three areas improved with optimizations ON; no regressions observed in these runs. For publication‑grade stability, run each test 3× and report medians (see sections below for methodology and previous median tables).

## Additional tweaks — verification snapshot (Index write fast‑path, List literal pre‑size, Regex LRU)

Date: 2025-11-11 21:31 (local)

Scope: Implemented three semantics‑neutral optimizations and verified they are green across targeted and broader JVM benches.

What changed (guarded by flags where applicable):
- RVAL_FASTPATH: Index write fast‑path
  - `IndexRef.setAt`: direct path for `ObjList` + `ObjInt` (`list[i] = value`) mirrors the read fast‑path. Optional chaining semantics preserved; bounds exceptions propagate unchanged.
- RVAL_FASTPATH: List literal pre‑sizing
  - `ListLiteralRef.get`: pre‑counts element entries and uses `ArrayList` with capacity hint; for spreads of `ObjList`, uses `ensureCapacity` before bulk add. Evaluation order unchanged.
- REGEX_CACHE: LRU‑like behavior
  - `RegexCache`: emulates access‑order LRU within a tiny bounded map (`MAX=64`) by moving accessed entries to the tail; improves alternating‑pattern scenarios. Only active when `PerfFlags.REGEX_CACHE` is true; a self‑contained sketch of this cache follows the list.
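
For context at this point (the commit's `RegexCache` hunk appears further below), here is a self‑contained sketch of the bounded cache pattern described above. The object name is illustrative and synchronization is omitted; `MAX = 64` and the `LinkedHashMap` remove+put trick follow the description.

```
// Minimal, single-threaded sketch of the bounded, access-order-emulating regex cache.
object BoundedRegexCacheSketch {
    private const val MAX = 64
    private val map = LinkedHashMap<String, Regex>()

    fun get(pattern: String): Regex {
        map[pattern]?.let {
            // LinkedHashMap preserves insertion order, so remove + put moves the hit
            // to the tail, emulating access-order LRU without JVM-only constructors.
            map.remove(pattern)
            map[pattern] = it
            return it
        }
        // Compile and insert, evicting the oldest entry once the bound is reached.
        val re = pattern.toRegex()
        if (map.size >= MAX) {
            val oldest = map.keys.iterator()
            if (oldest.hasNext()) { oldest.next(); oldest.remove() }
        }
        map[pattern] = re
        return re
    }
}
```

The remove+put move relies only on insertion order, so it behaves identically on all Kotlin targets; java.util's access-order `LinkedHashMap` constructor would be JVM-only.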

Reproduce quick verification (1× runs):
```
./gradlew :lynglib:jvmTest --tests ExpressionBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests ListOpsBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests RegexBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests PicBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests PicInvalidationJvmTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests LocalVarBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests ConcurrencyCallBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests DeepPoolingStressJvmTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests MultiThreadPoolingStressJvmTest --rerun-tasks
```

Observation: All listed tests green in this cycle; no behavioral regressions observed. For the new paths (index write, list literal), performance was neutral‑to‑positive in smoke runs; Regex benches remained positive or neutral with the LRU behavior. For publication‑grade medians, extend to 3× per test as in earlier sections.

## Sanity matrix (JVM) — quick OFF→ON runs

Date: 2025-11-11 21:59 (local)

Scope: Final Round 1 sanity sweep across JVM micro‑benches and stress tests to confirm that optimizations ON do not regress performance vs OFF in representative scenarios. Each benchmark prints `[DEBUG_LOG] [BENCH]` timings for OFF → ON within a single run. This section records a quick pass confirmation (not 3× medians) and reproduction commands.

Environment:
- Gradle: 8.7 (stdout enabled, maxParallelForks=1; see the configuration sketch after this list)
- JVM: as configured by the project toolchain
- OS/Arch: macOS 14.x (aarch64)
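
The stdout and single-fork settings above are standard Gradle test-task options; a minimal Gradle Kotlin DSL sketch of how such a configuration typically looks is below. This is an assumed shape, not necessarily the exact block in this project's build scripts.

```
// build.gradle.kts (sketch): surface benchmark stdout and avoid parallel forks.
tasks.withType<Test>().configureEach {
    // One fork at a time so OFF → ON timings from different benches do not interleave.
    maxParallelForks = 1
    testLogging {
        // Print the [DEBUG_LOG] [BENCH] lines emitted by the benchmarks.
        showStandardStreams = true
        events("passed", "failed", "skipped")
    }
}
```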

Benches covered (all green; no regressions observed in these runs; a minimal OFF → ON harness sketch follows this list):
- Calls/Args: `CallBenchmarkTest`, `CallMixedArityBenchmarkTest` (ARG_BUILDER)
- PICs: `PicBenchmarkTest` (field/method); `PicInvalidationJvmTest` correctness reconfirmed
- Expressions/Arithmetic: `ExpressionBenchmarkTest`, `ArithmeticBenchmarkTest` (RVAL_FASTPATH, PRIMITIVE_FASTOPS)
- Ranges: `RangeBenchmarkTest` (PRIMITIVE_FASTOPS counted loop)
- List ops: `ListOpsBenchmarkTest` (PRIMITIVE_FASTOPS specializations)
- Regex: `RegexBenchmarkTest` (REGEX_CACHE with LRU behavior)
- Locals: `LocalVarBenchmarkTest` (LOCAL_SLOT_PIC + FAST_LOCAL)
- Concurrency/Pooling: `ConcurrencyCallBenchmarkTest`, `DeepPoolingStressJvmTest`, `MultiThreadPoolingStressJvmTest` (SCOPE_POOL per‑thread)
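
The benches share one pattern: run the same workload with a flag OFF, flip it ON, and print both timings on a single `[DEBUG_LOG] [BENCH]` line so the captured stdout shows the pair together. A minimal sketch of that pattern follows; the flag object and workload here are illustrative stand-ins, not the real `PerfFlags` or benchmark code.

```
import kotlin.system.measureTimeMillis

// Stand-in for net.sergeych.lyng.PerfFlags; the real benches flip the actual flags.
object FlagsSketch {
    var optimizationsOn = false
}

// Stand-in workload; the real benches execute Lyng snippets through the interpreter.
fun workload(): Long {
    var acc = 0L
    for (i in 0 until 2_000_000) acc += i
    return acc
}

fun benchOffOn(name: String) {
    FlagsSketch.optimizationsOn = false
    val off = measureTimeMillis { repeat(5) { workload() } }
    FlagsSketch.optimizationsOn = true
    val on = measureTimeMillis { repeat(5) { workload() } }
    // Same prefix the real benches print, so it survives Gradle's test output capture.
    println("[DEBUG_LOG] [BENCH] $name: OFF=${off} ms, ON=${on} ms")
}

fun main() = benchOffOn("sanity-sketch")
```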

Reproduce (examples):
```
./gradlew :lynglib:jvmTest --tests CallBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests CallMixedArityBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests PicBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests PicInvalidationJvmTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests ExpressionBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests ArithmeticBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests RangeBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests ListOpsBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests RegexBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests LocalVarBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests ConcurrencyCallBenchmarkTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests DeepPoolingStressJvmTest --rerun-tasks
./gradlew :lynglib:jvmTest --tests MultiThreadPoolingStressJvmTest --rerun-tasks
```

Summary:
- All listed tests passed in this sanity sweep.
- For each benchmark’s OFF → ON printouts examined during this pass, ON was equal to or faster than OFF; no regressions (ON slower than OFF) were observed.
- For publication‑grade numbers, use the 3× medians methodology outlined earlier in this document (a small median helper sketch follows this list). The existing median tables in previous sections remain representative, and the additional tweaks (Index write, List literal pre‑size, Regex LRU, Field PIC 4‑way + read→write reuse, mixed Int/Real fast‑ops) remained neutral‑to‑positive.
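
For the 3× protocol, the per-benchmark median can be computed with a few lines of Kotlin; the sketch below uses made-up numbers purely to show the shape of the calculation.

```
// Median of an odd or even number of timings (ms); values here are illustrative only.
fun median(values: List<Double>): Double {
    val s = values.sorted()
    val mid = s.size / 2
    return if (s.size % 2 == 1) s[mid] else (s[mid - 1] + s[mid]) / 2.0
}

fun main() {
    val offRuns = listOf(120.0, 118.0, 125.0)  // three OFF timings for one bench
    val onRuns = listOf(95.0, 97.0, 96.0)      // three ON timings for the same bench
    println("OFF median = ${median(offRuns)} ms, ON median = ${median(onRuns)} ms")
}
```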
@@ -11,17 +11,20 @@ object RegexCache {

    fun get(pattern: String): Regex {
        // Fast path: return cached instance if present
        map[pattern]?.let { return it }
        map[pattern]?.let {
            // Emulate access-order LRU on all targets by moving the entry to the tail
            // (LinkedHashMap preserves insertion order; remove+put moves it to the end)
            map.remove(pattern)
            map[pattern] = it
            return it
        }
        // Compile new pattern
        val re = pattern.toRegex()
        // Keep the cache size bounded
        if (map.size >= MAX) {
            // Remove the oldest inserted entry (first key in iteration order)
            val it = map.keys.iterator()
            if (it.hasNext()) {
                val k = it.next()
                it.remove()
            }
            if (it.hasNext()) { val k = it.next(); it.remove() }
        }
        map[pattern] = re
        return re
@@ -110,6 +110,29 @@ class BinaryOpRef(private val op: BinOp, private val left: ObjRef, private val r
                    return r.asReadonly
                }
            }
            // Fast numeric mixed ops for Int/Real combinations by promoting to double
            if ((a is ObjInt || a is ObjReal) && (b is ObjInt || b is ObjReal)) {
                val ad: Double = if (a is ObjInt) a.doubleValue else (a as ObjReal).value
                val bd: Double = if (b is ObjInt) b.doubleValue else (b as ObjReal).value
                val rNum: Obj? = when (op) {
                    BinOp.PLUS -> ObjReal(ad + bd)
                    BinOp.MINUS -> ObjReal(ad - bd)
                    BinOp.STAR -> ObjReal(ad * bd)
                    BinOp.SLASH -> ObjReal(ad / bd)
                    BinOp.PERCENT -> ObjReal(ad % bd)
                    BinOp.LT -> if (ad < bd) ObjTrue else ObjFalse
                    BinOp.LTE -> if (ad <= bd) ObjTrue else ObjFalse
                    BinOp.GT -> if (ad > bd) ObjTrue else ObjFalse
                    BinOp.GTE -> if (ad >= bd) ObjTrue else ObjFalse
                    BinOp.EQ -> if (ad == bd) ObjTrue else ObjFalse
                    BinOp.NEQ -> if (ad != bd) ObjTrue else ObjFalse
                    else -> null
                }
                if (rNum != null) {
                    if (net.sergeych.lyng.PerfFlags.PIC_DEBUG_COUNTERS) net.sergeych.lyng.PerfStats.primitiveFastOpsHit++
                    return rNum.asReadonly
                }
            }
        }

        val r: Obj = when (op) {
@@ -261,12 +284,18 @@ class FieldRef(
    private val name: String,
    private val isOptional: Boolean,
) : ObjRef {
    // 2-entry PIC for reads/writes (guarded by PerfFlags.FIELD_PIC)
    // 4-entry PIC for reads/writes (guarded by PerfFlags.FIELD_PIC)
    // Reads
    private var rKey1: Long = 0L; private var rVer1: Int = -1; private var rGetter1: (suspend (Obj, Scope) -> ObjRecord)? = null
    private var rKey2: Long = 0L; private var rVer2: Int = -1; private var rGetter2: (suspend (Obj, Scope) -> ObjRecord)? = null
    private var rKey3: Long = 0L; private var rVer3: Int = -1; private var rGetter3: (suspend (Obj, Scope) -> ObjRecord)? = null
    private var rKey4: Long = 0L; private var rVer4: Int = -1; private var rGetter4: (suspend (Obj, Scope) -> ObjRecord)? = null

    // Writes
    private var wKey1: Long = 0L; private var wVer1: Int = -1; private var wSetter1: (suspend (Obj, Scope, Obj) -> Unit)? = null
    private var wKey2: Long = 0L; private var wVer2: Int = -1; private var wSetter2: (suspend (Obj, Scope, Obj) -> Unit)? = null
    private var wKey3: Long = 0L; private var wVer3: Int = -1; private var wSetter3: (suspend (Obj, Scope, Obj) -> Unit)? = null
    private var wKey4: Long = 0L; private var wVer4: Int = -1; private var wSetter4: (suspend (Obj, Scope, Obj) -> Unit)? = null

    // Transient per-step cache to optimize read-then-write sequences within the same frame
    private var tKey: Long = 0L; private var tVer: Int = -1; private var tFrameId: Long = -1L; private var tRecord: ObjRecord? = null
@@ -290,6 +319,39 @@ class FieldRef(
            } }
            rGetter2?.let { g -> if (key == rKey2 && ver == rVer2) {
                if (picCounters) net.sergeych.lyng.PerfStats.fieldPicHit++
                // move-to-front: promote 2→1
                val tK = rKey2; val tV = rVer2; val tG = rGetter2
                rKey2 = rKey1; rVer2 = rVer1; rGetter2 = rGetter1
                rKey1 = tK; rVer1 = tV; rGetter1 = tG
                val rec0 = g(base, scope)
                if (base is ObjClass) {
                    val idx0 = base.classScope?.getSlotIndexOf(name)
                    if (idx0 != null) { tKey = key; tVer = ver; tFrameId = scope.frameId; tRecord = rec0 } else { tRecord = null }
                } else { tRecord = null }
                return rec0
            } }
            rGetter3?.let { g -> if (key == rKey3 && ver == rVer3) {
                if (picCounters) net.sergeych.lyng.PerfStats.fieldPicHit++
                // move-to-front: promote 3→1
                val tK = rKey3; val tV = rVer3; val tG = rGetter3
                rKey3 = rKey2; rVer3 = rVer2; rGetter3 = rGetter2
                rKey2 = rKey1; rVer2 = rVer1; rGetter2 = rGetter1
                rKey1 = tK; rVer1 = tV; rGetter1 = tG
                val rec0 = g(base, scope)
                if (base is ObjClass) {
                    val idx0 = base.classScope?.getSlotIndexOf(name)
                    if (idx0 != null) { tKey = key; tVer = ver; tFrameId = scope.frameId; tRecord = rec0 } else { tRecord = null }
                } else { tRecord = null }
                return rec0
            } }
            rGetter4?.let { g -> if (key == rKey4 && ver == rVer4) {
                if (picCounters) net.sergeych.lyng.PerfStats.fieldPicHit++
                // move-to-front: promote 4→1
                val tK = rKey4; val tV = rVer4; val tG = rGetter4
                rKey4 = rKey3; rVer4 = rVer3; rGetter4 = rGetter3
                rKey3 = rKey2; rVer3 = rVer2; rGetter3 = rGetter2
                rKey2 = rKey1; rVer2 = rVer1; rGetter2 = rGetter1
                rKey1 = tK; rVer1 = tV; rGetter1 = tG
                val rec0 = g(base, scope)
                if (base is ObjClass) {
                    val idx0 = base.classScope?.getSlotIndexOf(name)
@@ -300,7 +362,9 @@ class FieldRef(
            // Slow path
            if (picCounters) net.sergeych.lyng.PerfStats.fieldPicMiss++
            val rec = base.readField(scope, name)
            // Install move-to-front with a handle-aware getter. Where safe, capture resolved handles.
            // Install move-to-front with a handle-aware getter (shift 1→2→3→4; put new at 1)
            rKey4 = rKey3; rVer4 = rVer3; rGetter4 = rGetter3
            rKey3 = rKey2; rVer3 = rVer2; rGetter3 = rGetter2
            rKey2 = rKey1; rVer2 = rVer1; rGetter2 = rGetter1
            when (base) {
                is ObjClass -> {
@@ -336,6 +400,19 @@ class FieldRef(
            // no-op on null receiver for optional chaining assignment
            return
        }
        // Read→write micro fast-path: reuse transient record captured by get()
        if (fieldPic) {
            val (k, v) = receiverKeyAndVersion(base)
            val rec = tRecord
            if (rec != null && tKey == k && tVer == v && tFrameId == scope.frameId) {
                // visibility/mutability checks
                if (!rec.isMutable) scope.raiseError(ObjIllegalAssignmentException(scope, "can't reassign val $name"))
                if (!rec.visibility.isPublic)
                    scope.raiseError(ObjAccessException(scope, "can't access non-public field $name"))
                if (rec.value.assign(scope, newValue) == null) rec.value = newValue
                return
            }
        }
        if (fieldPic) {
            val (key, ver) = receiverKeyAndVersion(base)
            wSetter1?.let { s -> if (key == wKey1 && ver == wVer1) {
@@ -344,12 +421,37 @@ class FieldRef(
            } }
            wSetter2?.let { s -> if (key == wKey2 && ver == wVer2) {
                if (picCounters) net.sergeych.lyng.PerfStats.fieldPicSetHit++
                // move-to-front: promote 2→1
                val tK = wKey2; val tV = wVer2; val tS = wSetter2
                wKey2 = wKey1; wVer2 = wVer1; wSetter2 = wSetter1
                wKey1 = tK; wVer1 = tV; wSetter1 = tS
                return s(base, scope, newValue)
            } }
            wSetter3?.let { s -> if (key == wKey3 && ver == wVer3) {
                if (picCounters) net.sergeych.lyng.PerfStats.fieldPicSetHit++
                // move-to-front: promote 3→1
                val tK = wKey3; val tV = wVer3; val tS = wSetter3
                wKey3 = wKey2; wVer3 = wVer2; wSetter3 = wSetter2
                wKey2 = wKey1; wVer2 = wVer1; wSetter2 = wSetter1
                wKey1 = tK; wVer1 = tV; wSetter1 = tS
                return s(base, scope, newValue)
            } }
            wSetter4?.let { s -> if (key == wKey4 && ver == wVer4) {
                if (picCounters) net.sergeych.lyng.PerfStats.fieldPicSetHit++
                // move-to-front: promote 4→1
                val tK = wKey4; val tV = wVer4; val tS = wSetter4
                wKey4 = wKey3; wVer4 = wVer3; wSetter4 = wSetter3
                wKey3 = wKey2; wVer3 = wVer2; wSetter3 = wSetter2
                wKey2 = wKey1; wVer2 = wVer1; wSetter2 = wSetter1
                wKey1 = tK; wVer1 = tV; wSetter1 = tS
                return s(base, scope, newValue)
            } }
            // Slow path
            if (picCounters) net.sergeych.lyng.PerfStats.fieldPicSetMiss++
            base.writeField(scope, name, newValue)
            // Install move-to-front with a handle-aware setter
            // Install move-to-front with a handle-aware setter (shift 1→2→3→4; put new at 1)
            wKey4 = wKey3; wVer4 = wVer3; wSetter4 = wSetter3
            wKey3 = wKey2; wVer3 = wVer2; wSetter3 = wSetter2
            wKey2 = wKey1; wVer2 = wVer1; wSetter2 = wSetter1
            when (base) {
                is ObjClass -> {
@@ -409,12 +511,21 @@ class IndexRef(
    }

    override suspend fun setAt(pos: Pos, scope: Scope, newValue: Obj) {
        val base = target.get(scope).value
        val fastRval = net.sergeych.lyng.PerfFlags.RVAL_FASTPATH
        val base = if (fastRval) target.evalValue(scope) else target.get(scope).value
        if (base == ObjNull && isOptional) {
            // no-op on null receiver for optional chaining assignment
            return
        }
        val idx = index.get(scope).value
        val idx = if (fastRval) index.evalValue(scope) else index.get(scope).value
        if (fastRval) {
            // Mirror read fast-path with direct write for ObjList + ObjInt index
            if (base is ObjList && idx is ObjInt) {
                val i = idx.toInt()
                base.list[i] = newValue
                return
            }
        }
        base.putAt(scope, idx, newValue)
    }
}
@@ -710,7 +821,9 @@ class FastLocalVarRef(

class ListLiteralRef(private val entries: List<ListEntry>) : ObjRef {
    override suspend fun get(scope: Scope): ObjRecord {
        val list = mutableListOf<Obj>()
        // Heuristic capacity hint: count element entries; spreads handled opportunistically
        val elemCount = entries.count { it is ListEntry.Element }
        val list = ArrayList<Obj>(elemCount)
        for (e in entries) {
            when (e) {
                is ListEntry.Element -> {
@@ -720,7 +833,11 @@ class ListLiteralRef(private val entries: List<ListEntry>) : ObjRef {
                is ListEntry.Spread -> {
                    val elements = if (net.sergeych.lyng.PerfFlags.RVAL_FASTPATH) e.ref.evalValue(scope) else e.ref.get(scope).value
                    when (elements) {
                        is ObjList -> list.addAll(elements.list)
                        is ObjList -> {
                            // Grow underlying array once when possible
                            if (list is ArrayList) list.ensureCapacity(list.size + elements.list.size)
                            list.addAll(elements.list)
                        }
                        else -> scope.raiseError("Spread element must be list")
                    }
                }