Contents
- Overview
- Symptoms / How to Recognize the Issue
- Reproduced Environment
- Root Cause (Engineering Analysis)
- Troubleshooting / What to Check
- Mitigation Options
- Cache/Address-Space Metrics Request
- Frequently Asked Questions
Overview
On ObjectStore for Windows, “small” read-only transactions can become dramatically slower (reported as up to ~1000× slower) after a large read fills the ObjectStore cache. Engineering analysis determined the slowdown occurs primarily at transaction boundaries (commit/abort), where ObjectStore must re-apply memory protection/invalidation across cached pages using Windows memory protection APIs (for example, VirtualProtect).
The cost scales mainly with cache occupancy / number of cached regions, not the size of the current transaction. This is considered an architectural behavior on Windows rather than a defect addressed by a specific patch in this case.
Symptoms / How to Recognize the Issue
You may be experiencing this behavior if you see the following sequence:
- Small read-only transactions are fast immediately after startup or with an empty/light cache.
- A large read (or sequence of reads) populates/fills the ObjectStore cache and becomes faster on subsequent runs (expected caching benefit).
- After the cache is full, even very small transactions become orders of magnitude slower (e.g., “up to ~1000× slower”).
- Clearing the cache restores the original small-transaction performance.
Reproduced Environment
- Platform: Windows (reported reproduction on Windows 11, local database and local test)
- ObjectStore version: ObjectStore Cumulative Update Release 2025.1 Update 0
- Example configuration used in testing:
OS_AS_SIZE=0x400000000
OS_CACHE_SIZE=0x78000000
Root Cause (Engineering Analysis)
At the end of every transaction (commit or abort), ObjectStore must preserve consistency by invalidating/protecting cached page mappings. On Windows, this is implemented using OS memory protection mechanisms (for example, VirtualProtect) applied over the cached address space.
Key point: the transaction-end cost scales primarily with cache occupancy (and the number of cached memory regions/pages), not with the amount of data touched by the current “small” transaction. After a large read fills the cache, every subsequent transaction boundary can incur the high scan/protect cost, making small transactions appear extremely slow.
This behavior is considered architectural on Windows and was not treated as a defect requiring a patch in this case.
Troubleshooting / What to Check
- Confirm the workload pattern
  - Measure “N small transactions” with an empty/cold cache (baseline).
  - Run a “large read” that populates the cache.
  - Measure the same “N small transactions” again after the cache is full.
  - If performance returns to baseline after clearing the cache, this supports the diagnosis.
- Validate that the slowdown is at transaction boundaries
  - If your profiling tools allow, look for time concentrated around commit/abort or transaction teardown when the cache is full.
- Confirm configuration and system context
  - Record current values for OS_CACHE_SIZE and OS_AS_SIZE.
  - Record CPU core count and whether the workload runs across many cores (relevant to Windows TLB flush / IPI overhead during protection changes).
Mitigation Options (Choose Based on Workload Constraints)
1) Reduce cache size (OS_CACHE_SIZE) — configuration change
Why it helps: fewer cached pages/regions means less transaction-end scan/protect work.
Trade-off: large reads benefit less from caching.
Example: set OS_CACHE_SIZE smaller, such as:
OS_CACHE_SIZE=0x10000000 // example value (256 MB)
Validate: rerun the “small transactions after full-cache” scenario and compare end-to-end time.
2) Reduce address space size (OS_AS_SIZE) — configuration change
Why it helps: reduces the range involved in scan/protect operations.
Trade-off: must remain large enough for your dataset plus overhead.
Example:
OS_AS_SIZE=0x100000000 // example value (4 GB)
Validate: repeat the benchmark and confirm both improved timings and stability (no address-space allocation/mapping failures).
3) Batch multiple reads into fewer transactions — code change
Why it helps: the expensive work occurs once per transaction boundary; fewer transactions means fewer times you pay the cost.
Trade-off: longer transactions may hold locks longer and reduce concurrency.
Example pattern:
// Pay the transaction-end cache sweep once for the whole batch,
// instead of once per read.
OS_BEGIN_TXN(txn, 0, os_transaction::read_only)
for (int i = 0; i < count; i++) {   // count = number of small reads to batch
    processSmallRead(i);            // application-specific read logic
}
OS_END_TXN(txn)
Validate: compare total time for N reads as N transactions versus fewer batched transactions.
4) Clear cache after large/batch operations — code change
Why it helps: removing accumulated cached regions after a large read prevents subsequent small transactions from paying “full-cache” transaction-end overhead.
Trade-off: subsequent accesses will fetch from server/disk again.
Example:
objectstore::return_all_pages();
Validate: confirm that clearing pages returns small-transaction timing to baseline.
5) Co-locate related objects in the same segment — design/code change
Why it helps: more contiguous allocation can reduce the number of distinct memory regions tracked in cache, lowering transaction-end overhead.
Trade-off: requires allocation strategy changes and may affect load-time performance.
Example pattern:
new(os_segment::of(this), ...) DataElement(...);
Validate: measure performance before/after and confirm application concurrency behavior remains acceptable.
6) Reduce active CPU cores / set CPU affinity — system-level change
Why it helps: on Windows, memory protection changes can trigger cross-core TLB flush behavior; fewer cores can reduce that overhead.
Trade-off: reduces parallelism.
Example approach: set CPU affinity for key ObjectStore processes using Windows Task Manager → Details → Set affinity, or programmatically.
Validate: compare the same benchmark with default affinity versus restricted affinity.
Cache/Address-Space Metrics Request
A request to expose “cache usage” and “address space usage” metrics (API/tooling) was reviewed and accepted into the product roadmap. No delivery commitment or ETA is available.
Frequently Asked Questions
- 1. How do I know I’m hitting this specific behavior and not a general performance regression?
- Look for the distinctive sequence: small transactions are fast with an empty cache, become extremely slow only after a large read fills the cache, and then return to baseline after clearing the cache (or after reducing cache/address space).
- 2. Is this a defect fixed in a later ObjectStore patch?
- In this scenario, it was identified as architectural behavior related to transaction-end cache mapping/protection work on Windows (for example, VirtualProtect), not a specific defect with a patch referenced in the investigation.
- 3. Which mitigation is safest to try first if I cannot change application code?
- Start with configuration-only changes: reduce OS_CACHE_SIZE and/or OS_AS_SIZE (ensuring it remains large enough for your dataset). Then validate by rerunning the same small/large/small benchmark pattern.
- 4. What verification steps should I use after changing OS_CACHE_SIZE or OS_AS_SIZE?
- Re-run a controlled benchmark: (1) small transactions with a cold cache, (2) a large read to fill the cache, (3) the same small transactions again. Confirm that step (3) no longer shows extreme slowdown and that the system remains stable (no mapping/allocation failures).
- 5. If my app is interactive and cannot batch transactions, what should I do?
- Favor tuning OS_CACHE_SIZE and OS_AS_SIZE to “as small as possible while still meeting workload needs,” and consider targeted cache clearing after known large operations (if your workflow includes any), plus segment co-location strategies to reduce region fragmentation.
- 6. Is there an API/tool in the current ObjectStore version to show cache usage and address space usage?
- No such API/tool was confirmed as available in the investigated scenario. A request to add these metrics was accepted into the product roadmap, but no ETA is available.
Priyanka Bhotika