Higress AI-Statistics Plugin: Resolving Excessive Memory Allocation In Streaming Requests


This article dives deep into a memory allocation issue encountered with the ai-statistics plugin in Higress, specifically when handling streaming requests. We'll explore the root cause, the observed symptoms, and the proposed solution to optimize memory usage. If you're experiencing similar memory spikes or are working with Higress and AI-driven statistics, this guide is for you!

I. Issue Description: The Case of the Growing Heap

The core issue lies in how the ai-statistics plugin handles streamingBodyBuffer. The current implementation allocates new memory on nearly every chunk it caches into this buffer. That churn drives significant growth in HeapSys and HeapReleased, two key indicators of memory usage in Go, and the memory claimed by the WebAssembly (WASM) module isn't being returned to the operating system (OS) effectively, leading to sustained high memory consumption. This can cause performance degradation and even application crashes.

The memory footprint remains elevated even after garbage collection (GC) and explicit attempts to release memory back to the OS. This suggests a deeper issue than just uncollected garbage; the allocated memory isn't being properly managed and returned.

II. Unpacking the Problem: What's Happening Under the Hood

The code snippet below reveals the problematic area. The plugin uses append to add data to the streamingBodyBuffer. This might seem straightforward, but it has significant implications for memory management.

	if config.shouldBufferStreamingBody {
		streamingBodyBuffer, ok := ctx.GetContext(CtxStreamingBodyBuffer).([]byte)
		if !ok {
			streamingBodyBuffer = data
		} else {
			streamingBodyBuffer = append(streamingBodyBuffer, data...)
		}
		ctx.SetContext(CtxStreamingBodyBuffer, streamingBodyBuffer)
	}

When append is called on a slice whose backing array lacks the capacity for the new data, Go allocates a new, larger array, copies the existing bytes over, and then appends the new data. Repeated once per streaming chunk, this reallocate-and-copy cycle fragments the heap and increases memory pressure: each superseded array becomes garbage, and large ones may linger before collection. This is exactly what happens to the streamingBodyBuffer.
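To make the reallocation behavior concrete, here is a small standalone sketch (not plugin code) that grows a buffer one chunk at a time, the way streamingBodyBuffer grows, and counts how many times append replaced the backing array:

```go
package main

import "fmt"

func main() {
	// Grow a buffer chunk by chunk and count backing-array reallocations.
	buf := make([]byte, 0, 4)
	prevCap := cap(buf)
	reallocs := 0
	for i := 0; i < 4096; i++ {
		buf = append(buf, make([]byte, 64)...) // one 64-byte "chunk" per iteration
		if cap(buf) != prevCap {
			reallocs++ // append allocated a larger array and copied everything over
			prevCap = cap(buf)
		}
	}
	fmt.Printf("final size: %d KB, reallocations: %d\n", len(buf)/1024, reallocs)
}
```

The exact reallocation count depends on the Go runtime's growth policy, but it happens repeatedly, and every reallocation leaves the previous array behind as garbage while the heap footprint ratchets upward.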

During pressure testing, HeapSys rose to approximately 160MB and HeapReleased climbed to 140MB, demonstrating the extent of the over-allocation. While HeapAlloc and HeapInuse remained relatively low, meaning most of the allocated memory was no longer actively used, the process was still holding onto a significant amount of memory it could not return. Manual calls to runtime.GC() and debug.FreeOSMemory() were ineffective, which is consistent with the pages being retained by the WASM module's linear memory: WebAssembly linear memory can grow but never shrinks, so once a burst of allocations expands it, that footprint persists for the life of the instance.

III. The Expected Outcome: Reducing Memory Footprint

The desired outcome is to minimize memory allocation frequency and overall memory consumption. The proposed solution involves using a fixed-size streamingBodyBuffer. This approach aims to reduce the overhead associated with dynamic memory allocation and copying, leading to a more stable and predictable memory footprint.

By pre-allocating a buffer of a known size, the plugin can avoid the repeated allocation and copying operations caused by append. This not only reduces memory pressure but also improves performance by eliminating the overhead of memory management.

The key benefit of using a fixed-size buffer is that the memory is allocated upfront, and subsequent operations can reuse the same memory space. This eliminates the need for the garbage collector to clean up numerous small allocations, leading to more efficient memory usage and reduced GC overhead. Furthermore, using a fixed size helps to prevent HeapSys and HeapReleased from constantly increasing, which was observed in the test results, demonstrating the effectiveness of this solution.
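As a rough illustration of the fixed-size approach (a sketch under assumptions, not the actual patch: the reserved capacity and the truncate-on-overflow policy here are hypothetical), chunks are appended into a buffer whose capacity is reserved once, so append never reallocates:

```go
package main

import "fmt"

// appendBounded copies chunk into buf without growing past buf's pre-allocated
// capacity. Overflow is truncated in this sketch; a real implementation might
// instead flush or stop buffering. (Hypothetical helper, not plugin code.)
func appendBounded(buf, chunk []byte) []byte {
	if room := cap(buf) - len(buf); room < len(chunk) {
		chunk = chunk[:room]
	}
	return append(buf, chunk...) // stays within capacity: no reallocation
}

func main() {
	buf := make([]byte, 0, 64*1024) // reserve the full capacity once
	initialCap := cap(buf)
	for i := 0; i < 10000; i++ {
		buf = appendBounded(buf, []byte("streaming chunk "))
	}
	fmt.Println(len(buf), cap(buf) == initialCap) // prints "65536 true"
}
```

Because every append stays within the reserved capacity, the backing array is allocated exactly once per stream, which is what keeps HeapSys and HeapReleased flat under load.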

IV. Replicating the Issue: A Step-by-Step Guide

To reproduce the issue, follow these steps:

  1. Configuration: Include the following configuration in your setup:
- apply_to_log: false
  key: "question"
  value: "messages.@reverse.0.content"
  value_source: "request_body"
- apply_to_log: false
  key: "answer"
  rule: "append"
  value: "choices.0.delta.content"
  value_source: "response_streaming_body"
- apply_to_log: false
  key: "answer"
  value: "choices.0.message.content"
  value_source: "response_body"

This configuration instructs the ai-statistics plugin to capture and log specific data from the request and response bodies, particularly from streaming responses.

  2. Load Testing with Streaming Requests: Initiate pressure testing using streaming requests that return large volumes of data. Tools like k6 or wrk can be used to generate the necessary load. The key is to simulate a high volume of streaming data to trigger the memory allocation issue.

  3. Monitor Memory Growth: Use the provided RecordMemory function (see code below) to track memory usage. Pay close attention to HeapSys, HeapReleased, HeapAlloc, and HeapInuse. The function captures key memory metrics and logs them, allowing you to observe the memory growth pattern.

// RecordMemory logs a snapshot of the Go runtime's memory statistics.
// (Requires the runtime and fmt packages plus the plugin's log wrapper.)
func RecordMemory() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// Compute key memory metrics in KB
	heapAllocKB := m.HeapAlloc / 1024
	heapSysKB := m.HeapSys / 1024
	heapInuseKB := m.HeapInuse / 1024
	heapIdleKB := m.HeapIdle / 1024
	heapReleasedKB := m.HeapReleased / 1024
	sysKB := m.Sys / 1024

	// Calculate memory usage and fragmentation rate (guard against division by zero)
	heapUsagePercent := float64(0)
	if m.HeapSys > 0 {
		heapUsagePercent = float64(m.HeapInuse) / float64(m.HeapSys) * 100
	}
	heapFragmentationPercent := float64(0)
	if m.HeapIdle > 0 {
		heapFragmentationPercent = float64(m.HeapIdle-m.HeapReleased) / float64(m.HeapIdle) * 100
	}

	// Calculate GC-related indicators
	totalGCPauseMs := m.PauseTotalNs / 1000000 // convert nanoseconds to milliseconds
	avgGCPauseMs := float64(0)
	if m.NumGC > 0 {
		avgGCPauseMs = float64(totalGCPauseMs) / float64(m.NumGC)
	}

	memstats := fmt.Sprintf(`{"heap_alloc_kb": %d, "heap_sys_kb": %d, "heap_inuse_kb": %d, "heap_idle_kb": %d, "heap_released_kb": %d, "sys_kb": %d, "heap_usage_percent": %.2f, "heap_fragmentation_percent": %.2f, "num_gc": %d, "total_gc_pause_ms": %d, "avg_gc_pause_ms": %.2f, "num_goroutines": %d, "heap_objects": %d}`, heapAllocKB, heapSysKB, heapInuseKB, heapIdleKB, heapReleasedKB, sysKB, heapUsagePercent, heapFragmentationPercent, m.NumGC, totalGCPauseMs, avgGCPauseMs, runtime.NumGoroutine(), m.HeapObjects)

	log.Infof("Memory stats: %s", memstats)
}

This function provides a comprehensive view of the memory landscape, including heap allocation, system memory, memory usage percentages, garbage collection statistics, and the number of goroutines. By monitoring these metrics, you can clearly observe the memory growth pattern and confirm the issue.

V. Additional Insights

Understanding the application's specific requirements and data characteristics is crucial for optimizing memory usage. For example, if the streaming data has a predictable maximum size, the fixed-size buffer can be tuned accordingly. If the data size varies significantly, a more sophisticated approach, such as a memory pool or a hybrid strategy, might be necessary.

VI. Environment Details

  • Higress version: 2.1.6
  • OS: (Please specify the operating system used)
  • Others: (Include any other relevant environment details, such as Go version, WASM runtime, etc.)

Conclusion: Optimizing Memory for AI Statistics

This article has explored a significant memory allocation issue in the ai-statistics plugin's handling of streaming requests. Understanding the root cause, the repeated reallocation driven by append, and adopting the proposed fixed-size buffer can substantially reduce memory consumption and improve the overall stability and performance of Higress. Tailor the buffer strategy to your application's data characteristics so that AI statistics collection runs smoothly without excessive memory usage.