Contents
- Overview
- Symptoms / How to Recognize the Issue
- Reproduced Environment
- Root Cause (Engineering Analysis)
- Troubleshooting / What to Check
- Mitigation Options
- Cache/Address-Space Metrics Request
- Frequently Asked Questions
Overview
On ObjectStore for Windows, “small” read-only transactions can become dramatically slower (reported as up to ~1000× slower) after a large read fills the ObjectStore cache. Engineering analysis determined the slowdown occurs primarily at transaction boundaries (commit/abort), where ObjectStore must re-apply memory protection/invalidation across cached pages using Windows memory protection APIs (for example, VirtualProtect).
The cost scales mainly with cache occupancy / number of cached regions, not the size of the current transaction. This is considered an architectural behavior on Windows rather than a defect addressed by a specific patch in this case.
Symptoms / How to Recognize the Issue
You may be experiencing this behavior if you see the following sequence:
- Small read-only transactions are fast immediately after startup or with an empty/light cache.
- A large read (or sequence of reads) populates/fills the ObjectStore cache and becomes faster on subsequent runs (expected caching benefit).
- After the cache is full, even very small transactions become orders of magnitude slower (e.g., “up to ~1000× slower”).
- Clearing the cache restores the original small-transaction performance.
Reproduced Environment
- Platform: Windows (reported reproduction on Windows 11, local database and local test)
- ObjectStore version: ObjectStore Cumulative Update Release 2025.1 Update 0
- Example configuration used in testing:
OS_AS_SIZE=0x400000000
OS_CACHE_SIZE=0x78000000
Root Cause (Engineering Analysis)
At the end of every transaction (commit or abort), ObjectStore must preserve consistency by invalidating/protecting cached page mappings. On Windows, this is implemented using OS memory protection mechanisms (for example, VirtualProtect) applied over the cached address space.
Key point: the transaction-end cost scales primarily with cache occupancy (and the number of cached memory regions/pages), not with the amount of data touched by the current “small” transaction. After a large read fills the cache, every subsequent transaction boundary can incur the high scan/protect cost, making small transactions appear extremely slow.
This behavior is considered architectural on Windows and was not treated as a defect requiring a patch in this case.
Troubleshooting / What to Check
- Confirm the workload pattern
  - Measure “N small transactions” with an empty/cold cache (baseline).
  - Run a “large read” that populates the cache.
  - Measure the same “N small transactions” again after the cache is full.
  - If performance returns to baseline after clearing the cache, this supports the diagnosis.
- Validate that the slowdown is at transaction boundaries
  - If your profiling tools allow, look for time concentrated around commit/abort or transaction teardown when the cache is full.
- Confirm configuration and system context
  - Record current values for OS_CACHE_SIZE and OS_AS_SIZE.
  - Record CPU core count and whether the workload runs across many cores (relevant to Windows TLB flush / IPI overhead during protection changes).
Mitigation Options (Choose Based on Workload Constraints)
1) Reduce cache size (OS_CACHE_SIZE) — configuration change
Why it helps: fewer cached pages/regions means less transaction-end scan/protect work.
Trade-off: large reads benefit less from caching.
Example: set OS_CACHE_SIZE smaller, such as:
OS_CACHE_SIZE=0x10000000 // example value (256 MB)
Validate: rerun the “small transactions after full-cache” scenario and compare end-to-end time.
2) Reduce address space size (OS_AS_SIZE) — configuration change
Why it helps: reduces the range involved in scan/protect operations.
Trade-off: must remain large enough for your dataset plus overhead.
Example:
OS_AS_SIZE=0x100000000 // example value (4 GB)
Validate: repeat the benchmark and confirm both improved timings and stability (no address-space allocation/mapping failures).
3) Batch multiple reads into fewer transactions — code change
Why it helps: the expensive work occurs once per transaction boundary; fewer transactions means fewer times you pay the cost.
Trade-off: longer transactions may hold locks longer and reduce concurrency.
Example pattern:
// Pay the transaction-end cache sweep once for the whole batch,
// instead of once per read.
OS_BEGIN_TXN(txn, 0, os_transaction::read_only)
for (int i = 0; i < count; i++) {   // count = number of small reads to batch
    processSmallRead(i);            // application-specific read logic
}
OS_END_TXN(txn)
Validate: compare total time for N reads as N transactions versus fewer batched transactions.
4) Clear cache after large/batch operations — code change
Why it helps: removing accumulated cached regions after a large read prevents subsequent small transactions from paying “full-cache” transaction-end overhead.
Trade-off: subsequent accesses will fetch from server/disk again.
Example:
objectstore::return_all_pages();
Validate: confirm that clearing pages returns small-transaction timing to baseline.
5) Co-locate related objects in the same segment — design/code change
Why it helps: more contiguous allocation can reduce the number of distinct memory regions tracked in cache, lowering transaction-end overhead.
Trade-off: requires allocation strategy changes and may affect load-time performance.
Example pattern:
new(os_segment::of(this), ...) DataElement(...);
Validate: measure performance before/after and confirm application concurrency behavior remains acceptable.
6) Reduce active CPU cores / set CPU affinity — system-level change
Why it helps: on Windows, memory protection changes can trigger cross-core TLB flush behavior; fewer cores can reduce that overhead.
Trade-off: reduces parallelism.
Example approach: set CPU affinity for key ObjectStore processes using Windows Task Manager → Details → Set affinity, or programmatically.
Validate: compare the same benchmark with default affinity versus restricted affinity.
Cache/Address-Space Metrics Request
A request to expose “cache usage” and “address space usage” metrics (API/tooling) was reviewed and accepted into the product roadmap. No delivery commitment or ETA is available.
Frequently Asked Questions
- 1. How do I know I’m hitting this specific behavior and not a general performance regression?
- Look for the distinctive sequence: small transactions are fast with an empty cache, become extremely slow only after a large read fills the cache, and then return to baseline after clearing the cache (or after reducing cache/address space).
- 2. Is this a defect fixed in a later ObjectStore patch?
- In this scenario, it was identified as architectural behavior related to transaction-end cache mapping/protection work on Windows (for example, VirtualProtect), not a specific defect with a patch referenced in the investigation.
- 3. Which mitigation is safest to try first if I cannot change application code?
- Start with configuration-only changes: reduce OS_CACHE_SIZE and/or OS_AS_SIZE (ensuring it remains large enough for your dataset). Then validate by rerunning the same small/large/small benchmark pattern.
- 4. What verification steps should I use after changing OS_CACHE_SIZE or OS_AS_SIZE?
- Re-run a controlled benchmark: (1) small transactions with a cold cache, (2) a large read to fill the cache, (3) the same small transactions again. Confirm that step (3) no longer shows extreme slowdown and that the system remains stable (no mapping/allocation failures).
- 5. If my app is interactive and cannot batch transactions, what should I do?
- Favor tuning OS_CACHE_SIZE and OS_AS_SIZE to “as small as possible while still meeting workload needs,” and consider targeted cache clearing after known large operations (if your workflow includes any), plus segment co-location strategies to reduce region fragmentation.
- 6. Is there an API/tool in the current ObjectStore version to show cache usage and address space usage?
- No such API/tool was confirmed as available in the investigated scenario. A request to add these metrics was accepted into the product roadmap, but no ETA is available.
Priyanka Bhotika