Fragment Shader migration (chosen)
Pros
- Async dependency removed, pipeline simplified, RT format freedom
Cons
- Requires full replacement

Migrated Compute Shader-based Bloom to Fragment Shader
Why Fragment Shader migration: Removing the frame latency to fix the bug erases the Compute Shader's performance advantage, so the migration was also used to clean up the History RT, GC allocations, and other accumulated tech debt.
Replaced Bloom Passes 4 (Prefilter), 5 (DownSample), and 6 (UpSample) with Fragment-Shader-based equivalents.
- ComputePass → RasterRenderPass chain
- DispatchCompute → DrawProcedural blit
- enableRandomWrite = false

Removed the permanent COSBloomHistoryFrameRT RT system. The Bloom result is now handed off directly through the bloomResultTexture field.
→ Frees 4.15 MB always-resident memory and removes the 1-frame latency.
Replaced per-frame new arrays with fixed-size field caches:
PreMiscPass: fully skipped when miscActivated == false.
(Once Distortion/RadialBlur are confirmed unused, the pass will be removed.)
Tools: Memory Profiler / AGI · Build: Dev · Scene:
| Device | GPU | API | RenderTexture (Before) | RenderTexture (After) | COSPostProcessing GPU time (Before) | COSPostProcessing GPU time (After) |
|---|---|---|---|---|---|---|
| Galaxy S21 | Mali-G78 | Vulkan 1.1.0 | 95.0 MB | 79.9 MB (−15.1 MB) | 8.368 ms | 1.263 ms (−85%) |

RT memory
| RT | Before | After | Notes |
|---|---|---|---|
| HistoryTexture × 2 | 10.0 MB | 0 MB | System removed |
| TempRTBloom0 | 5.0 MB | 1.0 MB | Format + resolution |
| TempRTBloom1 | 1.3 MB | 395.5 KB | |
| TempRTBloom2 | 342.9 KB | 52.9 KB | |
| Bloom RT total | 16.6 MB | 1.4 MB (−91%) | |
| Total RenderTexture | 95.0 MB | 79.9 MB (−15.1 MB) | |
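
The Bloom totals in the table can be cross-checked with a few lines of arithmetic. The sketch below is plain Python, not project code: the dictionary only restates the table rows, and it assumes 1 MB = 1024 KB when converting the KB entries.

```python
# Bloom RT sizes copied from the table above, expressed in MB.
# Assumption: 1 MB = 1024 KB for the rows reported in KB.
KB = 1 / 1024

bloom_rts = {
    "HistoryTexture x 2": (10.0, 0.0),
    "TempRTBloom0": (5.0, 1.0),
    "TempRTBloom1": (1.3, 395.5 * KB),
    "TempRTBloom2": (342.9 * KB, 52.9 * KB),
}

# Sum the "Before" and "After" columns separately.
before_mb = sum(before for before, _ in bloom_rts.values())
after_mb = sum(after for _, after in bloom_rts.values())

# Relative reduction of the Bloom RT footprint, in percent.
reduction_pct = (before_mb - after_mb) / before_mb * 100

print(f"Bloom RT total: {before_mb:.1f} MB -> {after_mb:.1f} MB "
      f"({reduction_pct:.0f}% smaller)")
```

The Bloom saving computed this way (about 15.2 MB) is close to the 15.1 MB drop in total RenderTexture memory, which suggests the Bloom RTs account for essentially all of the change; the small gap is consistent with rounding in the reported figures.
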
GPU performance (Android GPU Inspector)
| Item | Before | After |
|---|---|---|
| COSPostProcessing GPU time | 8.368 ms | 1.263 ms (−85%) |
| Base resolution | 1/2 | 1/4 |
| DownSample samples | 5/pixel | 4/pixel |
| UpSample samples | 8/pixel | 4/pixel |
| UpSample Interpolator | 18 floats | 8 floats |
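
As a rough illustration of why the GPU time drops so much, the table values can be combined into a first-order estimate of texture fetches per pass. This is a sketch in plain Python under stated assumptions: "Base resolution" is read as a per-axis scale (so pixel count scales with its square), the helper name `relative_fetches` is hypothetical, and bandwidth, RT format, and interpolator cost are ignored.

```python
# Values copied from the GPU performance table above.
gpu_ms_before, gpu_ms_after = 8.368, 1.263
gpu_reduction_pct = (gpu_ms_before - gpu_ms_after) / gpu_ms_before * 100


def relative_fetches(axis_scale: float, taps_per_pixel: int) -> float:
    """Texture fetches for one pass, relative to the full-resolution pixel count.

    Assumption: axis_scale applies to both width and height, so the number of
    shaded pixels scales with axis_scale squared.
    """
    return axis_scale ** 2 * taps_per_pixel


# DownSample at base resolution: 1/2 with 5 taps -> 1/4 with 4 taps.
down_ratio = relative_fetches(1 / 4, 4) / relative_fetches(1 / 2, 5)

# UpSample at base resolution: 1/2 with 8 taps -> 1/4 with 4 taps.
up_ratio = relative_fetches(1 / 4, 4) / relative_fetches(1 / 2, 8)

print(f"GPU time reduction: {gpu_reduction_pct:.0f}%")
print(f"DownSample fetches after/before: {down_ratio:.3f}")
print(f"UpSample fetches after/before: {up_ratio:.3f}")
```

Under these assumptions the base-level passes issue a small fraction of the original fetches, which is in the same direction as the measured −85%. The estimate does not isolate how much of the gain comes from resolution versus sample count versus the pipeline change itself.
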
Tradeoffs
The Async buffer-overlap structure of the Compute Shader was the root cause of buffer collisions in dialog / screen-overlay environments.
Removing the frame latency to fix the bug erased Compute's performance advantage. The fix was therefore used as an opportunity to migrate to a Fragment Shader and clean up accumulated structural debt (History RT, GC allocations, dead code).
Result: COSPostProcessing GPU time −85% (8.37 ms → 1.26 ms), Bloom RT memory −91% (16.6 MB → 1.4 MB), and 3 rendering bugs fixed.
Visual quality remained within the acceptable range per QA review.
This was a case where a bug fix triggered a refactor: digging into the root cause often opens up far more room for improvement than a localized fix.