Higress AI-Statistics Plugin: Resolving Excessive Memory Allocation In Streaming Requests
This article dives deep into a memory allocation issue encountered with the ai-statistics plugin in Higress, specifically when handling streaming requests. We'll explore the root cause, the observed symptoms, and the proposed solution to optimize memory usage. If you're experiencing similar memory spikes or are working with Higress and AI-driven statistics, this guide is for you!
I. Issue Description: The Case of the Growing Heap
The core issue lies in the ai-statistics plugin's handling of `streamingBodyBuffer`. The current implementation repeatedly requests memory while caching this buffer, which drives significant growth in `HeapSys` and `HeapReleased`, two indicators of how much memory the Go runtime has obtained from, and handed back to, the operating system. The problem is that the memory allocated by the WebAssembly (WASM) component isn't effectively returned to the OS, leading to sustained high memory consumption, performance degradation, and potentially even application crashes.
The memory footprint remains elevated even after garbage collection (GC) and explicit attempts to release memory back to the OS. This suggests a deeper issue than just uncollected garbage; the allocated memory isn't being properly managed and returned.
II. Unpacking the Problem: What's Happening Under the Hood
The code snippet below shows the problematic area. The plugin uses `append` to add each chunk of streaming data to the `streamingBodyBuffer`. This looks straightforward, but it has significant implications for memory management.
```go
if config.shouldBufferStreamingBody {
    streamingBodyBuffer, ok := ctx.GetContext(CtxStreamingBodyBuffer).([]byte)
    if !ok {
        streamingBodyBuffer = data
    } else {
        streamingBodyBuffer = append(streamingBodyBuffer, data...)
    }
    ctx.SetContext(CtxStreamingBodyBuffer, streamingBodyBuffer)
}
```
When `append` is called on a slice in Go and the underlying array's capacity is too small for the new data, the runtime allocates a new, larger array, copies the existing contents, and then appends the new data. Repeated reallocation and copying cause memory fragmentation and increased memory pressure. The old, smaller arrays become garbage, but they may not be collected promptly, especially when they are large. This is exactly what happens to the `streamingBodyBuffer` as response chunks arrive.
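To see why this matters, here is a small standalone illustration (not plugin code): it appends 4 KB chunks to a slice and prints every capacity change, making the repeated reallocations visible.

```go
package main

import "fmt"

func main() {
	var buf []byte
	chunk := make([]byte, 4*1024) // simulate a 4 KB streaming chunk
	lastCap := 0
	for i := 0; i < 1000; i++ {
		buf = append(buf, chunk...) // may reallocate and copy once capacity is exceeded
		if cap(buf) != lastCap {
			fmt.Printf("chunk %4d: len=%8d cap=%8d (reallocated)\n", i, len(buf), cap(buf))
			lastCap = cap(buf)
		}
	}
}
```

Each capacity jump abandons the previous backing array as garbage; under sustained streaming load this is what inflates `HeapSys`.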
During pressure testing, `HeapSys` rose to approximately 160 MB and `HeapReleased` climbed to 140 MB, demonstrating the extent of the allocation problem. `HeapAlloc` and `HeapInuse` remained comparatively low, showing that most of the reserved memory was not actively in use, yet the process kept holding a large amount of memory it could not give back. Manual calls to `runtime.GC()` and `debug.FreeOSMemory()` did not bring the usage down, underscoring that the problem is the allocation pattern rather than uncollected garbage.
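For reference, the manual reclamation attempt mentioned above boils down to something like the sketch below (the function name and call site are illustrative, not the plugin's actual code):

```go
package memdebug

import (
	"runtime"
	"runtime/debug"
)

// forceReclaim triggers a GC cycle and asks the Go runtime to return as much
// memory as possible to the OS. In this plugin's case it had no visible effect.
func forceReclaim() {
	runtime.GC()
	debug.FreeOSMemory()
}
```

Because WebAssembly linear memory can only grow, even memory the Go runtime marks as released stays reserved inside the Wasm instance.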
III. The Expected Outcome: Reducing Memory Footprint
The desired outcome is to minimize allocation frequency and overall memory consumption. The proposed solution is to use a fixed-size `streamingBodyBuffer`. This reduces the overhead of dynamic allocation and copying, leading to a more stable and predictable memory footprint.
By pre-allocating a buffer of a known size, the plugin avoids the repeated allocation and copying that `append` performs when capacity runs out. This reduces memory pressure and also improves performance by cutting memory-management overhead.
The key benefit of a fixed-size buffer is that the memory is allocated once up front and subsequent operations reuse the same space. The garbage collector no longer has to sweep up a stream of discarded backing arrays, which lowers GC overhead, and `HeapSys` and `HeapReleased` stop climbing the way they did in the test results.
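A minimal sketch of the fixed-size approach is shown below. The capacity constant, the helper name, and the drop-on-overflow policy are assumptions for illustration, not the plugin's actual implementation:

```go
// maxStreamingBodyBytes is an assumed cap on how much streaming body is retained.
const maxStreamingBodyBytes = 1 << 20 // 1 MiB

// appendStreamingBody copies data into a buffer that was allocated once with a
// fixed capacity, so no reallocation or copying of earlier chunks occurs.
// Data beyond the cap is dropped here (one possible policy; truncating the tail
// or flushing early are alternatives).
func appendStreamingBody(buf []byte, data []byte) []byte {
	if buf == nil {
		buf = make([]byte, 0, maxStreamingBodyBytes) // single up-front allocation
	}
	remaining := cap(buf) - len(buf)
	if len(data) > remaining {
		data = data[:remaining]
	}
	return append(buf, data...) // stays within the pre-allocated capacity
}
```

In the streaming callback, the returned slice would still be stored via `ctx.SetContext(CtxStreamingBodyBuffer, ...)`; only the allocation strategy changes.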
IV. Replicating the Issue: A Step-by-Step Guide
To reproduce the issue, follow these steps:
- Configuration: Include the following configuration in your setup:
```yaml
- apply_to_log: false
  key: "question"
  value: "[email protected]"
  value_source: "request_body"
- apply_to_log: false
  key: "answer"
  rule: "append"
  value: "choices.0.delta.content"
  value_source: "response_streaming_body"
- apply_to_log: false
  key: "answer"
  value: "choices.0.message.content"
  value_source: "response_body"
```
This configuration instructs the ai-statistics plugin to capture specific fields from the request and response bodies. The attribute sourced from `response_streaming_body` is what makes the plugin buffer the streaming response (`shouldBufferStreamingBody` in the code above), so it is the part that matters for reproducing the issue.
- Load Testing with Streaming Requests: Initiate pressure testing using streaming requests that return large volumes of data. Tools like `k6` or `wrk` can generate the necessary load; the key is to push a high volume of streaming data through the plugin to trigger the allocation issue. A minimal Go client sketch is included after the monitoring function below.
- Monitor Memory Growth: Use the provided `RecordMemory` function (see code below) to track memory usage. Pay close attention to `HeapSys`, `HeapReleased`, `HeapAlloc`, and `HeapInuse`. The function captures key memory metrics and logs them, allowing you to observe the growth pattern.
```go
func RecordMemory() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// Compute key memory metrics (in KB)
	heapAllocKB := m.HeapAlloc / 1024
	heapSysKB := m.HeapSys / 1024
	heapInuseKB := m.HeapInuse / 1024
	heapIdleKB := m.HeapIdle / 1024
	heapReleasedKB := m.HeapReleased / 1024
	sysKB := m.Sys / 1024

	// Calculate memory usage and fragmentation rate
	heapUsagePercent := float64(m.HeapInuse) / float64(m.HeapSys) * 100
	heapFragmentationPercent := float64(m.HeapIdle-m.HeapReleased) / float64(m.HeapIdle) * 100

	// Calculate GC-related indicators
	totalGCPauseMs := m.PauseTotalNs / 1000000 // convert nanoseconds to milliseconds
	avgGCPauseMs := float64(0)
	if m.NumGC > 0 {
		avgGCPauseMs = float64(totalGCPauseMs) / float64(m.NumGC)
	}

	memstats := fmt.Sprintf(`{"heap_alloc_kb": %d, "heap_sys_kb": %d, "heap_inuse_kb": %d, "heap_idle_kb": %d, "heap_released_kb": %d, "sys_kb": %d, "heap_usage_percent": %.2f, "heap_fragmentation_percent": %.2f, "num_gc": %d, "total_gc_pause_ms": %d, "avg_gc_pause_ms": %.2f, "num_goroutines": %d, "heap_objects": %d}`,
		heapAllocKB, heapSysKB, heapInuseKB, heapIdleKB, heapReleasedKB, sysKB,
		heapUsagePercent, heapFragmentationPercent, m.NumGC, totalGCPauseMs, avgGCPauseMs,
		runtime.NumGoroutine(), m.HeapObjects)
	log.Infof("Memory stats: %s", memstats)
}
```
This function provides a comprehensive view of the memory landscape, including heap allocation, system memory, memory usage percentages, garbage collection statistics, and the number of goroutines. By monitoring these metrics, you can clearly observe the memory growth pattern and confirm the issue.
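For the load-testing step, the client sketch below fires concurrent streaming chat-completion requests and drains the response bodies. The URL, model, payload, and concurrency numbers are assumptions; substitute whatever your route and backend expect:

```go
package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
	"sync"
)

func main() {
	// Assumed gateway route and payload; adjust to your environment.
	url := "http://127.0.0.1:8080/v1/chat/completions"
	payload := []byte(`{"model":"gpt-4o","stream":true,"messages":[{"role":"user","content":"Write a long essay about gateways."}]}`)

	const concurrency = 50
	const requestsPerWorker = 100

	var wg sync.WaitGroup
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < requestsPerWorker; j++ {
				resp, err := http.Post(url, "application/json", bytes.NewReader(payload))
				if err != nil {
					log.Printf("request failed: %v", err)
					continue
				}
				// Drain the streamed body so the plugin buffers the full response.
				io.Copy(io.Discard, resp.Body)
				resp.Body.Close()
			}
		}()
	}
	wg.Wait()
}
```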
V. Additional Insights
Understanding the application's specific requirements and data characteristics is crucial for optimizing memory usage. For example, if the streaming data has a predictable maximum size, the fixed-size buffer can be tuned accordingly. If the data size varies significantly, a more sophisticated approach, such as a memory pool or a hybrid strategy, might be necessary.
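As one example of such a hybrid strategy, the sketch below pools pre-sized buffers with `sync.Pool` so they are reused across requests; the 1 MiB starting capacity is an assumption, and whether pooling pays off inside a Wasm plugin VM should be verified in your environment:

```go
package bufpool

import "sync"

// bodyBufferPool hands out byte slices with pre-allocated capacity so that
// buffering a streaming body does not trigger repeated append reallocations.
var bodyBufferPool = sync.Pool{
	New: func() any {
		buf := make([]byte, 0, 1<<20) // assumed 1 MiB starting capacity
		return &buf
	},
}

// AcquireBodyBuffer returns an empty buffer with capacity already reserved.
func AcquireBodyBuffer() *[]byte {
	return bodyBufferPool.Get().(*[]byte)
}

// ReleaseBodyBuffer resets the buffer and returns it to the pool for reuse
// by a later request.
func ReleaseBodyBuffer(buf *[]byte) {
	*buf = (*buf)[:0]
	bodyBufferPool.Put(buf)
}
```

A buffer acquired when the streaming response starts and released after the statistics are flushed keeps per-request allocations close to zero even when body sizes vary.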
VI. Environment Details
- Higress version: 2.1.6
- OS: (Please specify the operating system used)
- Others: (Include any other relevant environment details, such as Go version, WASM runtime, etc.)
Conclusion: Optimizing Memory for AI Statistics
This article has explored a significant memory allocation issue in the ai-statistics plugin when handling streaming requests. By understanding the root cause, the repeated allocations triggered by `append`, and adopting a fixed-size buffer instead, we can substantially reduce memory consumption and improve the stability and performance of Higress. Tailor the buffer strategy to your application's data characteristics so that AI statistics collection runs smoothly without excessive memory usage.