Add D3D12 GPU backend. #1437
…GPU enum conversions.
… vendor build config for Windows.
…th create and import methods in D3D12GPU.
…12_backend # Conflicts: # CMakeLists.txt # include/tgfx/gpu/Backend.h # src/gpu/Backend.cpp # vendor.json
…rt and command queue insert.
…abandoned-session reclaim path.
… DXBC for use by D3D12 pipelines.
…oses binding metadata for the upcoming render pass.
…ss descriptor heaps and inflight submission tracking enabling 199 of 435 tests to pass.
…ock and submission paths once the device is lost.
… readback copies through a transient staging buffer to satisfy D3D12 row pitch requirements.
…et sampled textures to COMMON when a render pass ends to keep our resource state tracker aligned with D3D12 implicit decay rules.
…he auto-breadcrumb history and any page-fault info next to the failure that triggered them.
…nel and dispatch it from D3D12CommandEncoder::generateMipmapsForTexture for textures created with mip levels.
…d-code stripping so D3D12 PSO creation accepts the cross-compiled HLSL input signature.
…via ResolveSubresource at render-pass end matching Vulkan's pResolveAttachments behaviour.
…ce::MakeFrom on a sampling-only backend handle no longer triggers device-removal.
…enderable so backend-agnostic createTexture gating no longer rejects depth attachment textures.
…a process-wide CBV/SRV/UAV ring and an append-only Sampler heap so command list bind-once and mipmap dispatches reuse the same descriptor budget.
…ncoders and transient upload paths so reclaimed submissions Reset existing objects instead of paying the CreateCommandAllocator and CreateCommandList cost on every frame.
…4MB UPLOAD ring with per-call fallback so the steady-state pixel upload path skips CreateCommittedResource and Map entirely.
…d-of-pass cleanup, and texture-to-texture copy so each phase issues a single ResourceBarrier call instead of one barrier per resource.
… sharing the same uniform-block visibility plus sampler-count layout reuse one ID3D12RootSignature instead of paying SerializeRootSignature and CreateRootSignature per pipeline build.
…rams sharing the same vertex or fragment GLSL skip a redundant GLSL to SPIR-V to HLSL to DXBC compile chain.
… shaders compile and render again.
…rs and include sampler visibility in the root signature shape key.
…th serve both adopted and non-adopted backend imports because COM reference counting already keeps the resource alive for either contract.
…er so multiple UBOs per shader stage get distinct b0, b1 slots and the root signature shape cache stays correct.
…ore single-argument constructors explicit to match the OpenGL, Metal, and Vulkan overloads.
…d bill the wrap-around case in both ring commit paths so a capacity-spanning allocation cannot be misread as an empty ring on the next allocate.
…aging path needed to put static vertex and index data on a default heap to match the Metal TODO.
…PU branch stops dropping command lists while the GPU may still be reading them.
… coded and link the unblock to extending PrimitiveDescriptor with primitive and index format fields.
…s in setIndexBuffer so a size_t underflow cannot publish a four gigabyte buffer view to the GPU.
…AV flag the mipmap generator needs cannot coexist with SampleDesc.Count > 1, matching the existing GL multisample texture guard.
…3D12 sampler heap stops retrying CreateSampler on every draw call.
…resource all target subresource zero so anyone adding array or per-mip variants knows every backend has to move together.
…d rings instead of paying for a CreateDescriptorHeap on every render pass.
…copyable and standard layout, matching the existing VulkanTypes guard, so a future field cannot quietly break the BackendTexture union.
… so an abandoned encoder rolls _currentState back instead of leaving it ahead of the GPU and tripping Before state mismatch on the next pass.
…is lost so inflight slots that will never be retired stop saturating outstanding counters.
…calls releaseFrame without static casting from the base RenderTargetProxy.
…r and call out the cache key update needed if SamplerDescriptor ever exposes a borderColor field across all backends.
shlzxjp
left a comment
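D3D12 backend second-round review: fixes confirmed for all 18 original issues, plus 3 new findings
Thank you very much for the thorough follow-up. The 14 substantive fixes plus 4 documentation notes cover all 18 issues from the original review. Of these, P3 / P13 / P17 / P18 are architecture-level refactors rather than local patches, and the comments are well written.
This second round found 3 🟢 low-priority cleanup items (N1 / N2 / N3). All are small leftovers from the fixes, none of them blocks merging, and they are suggested as follow-up cleanup.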
// would convince the next allocate() that the ring is empty and hand back slots the GPU is
// still reading.
uint32_t head = 0;
uint32_t tail = 0;
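[N1 Low] The tail field became dead code after the P16 refactor
After P16 introduced outstandingSlots, the safety check in allocate() no longer needs tail:
# All occurrences of tail in DescriptorRing:
D3D12DescriptorRing.cpp:48 tail = 0; // init
D3D12DescriptorRing.cpp:121 tail = ...; // retire (write only)
D3D12DescriptorRing.cpp:136 tail = 0; // resetForContextLost
allocate() now computes free space entirely from _capacity - outstandingSlots and no longer reads tail; commit() likewise uses only head / committedHead. The same dead code also exists at D3D12UploadHeap.h:122 and D3D12UploadHeap.cpp:73, 148, 169, 179.
Impact: only field noise, plus comments that no longer match the actual semantics (for example, D3D12DescriptorRing.h:114-115 still says "(head, tail) pair alone cannot disambiguate", but tail no longer takes part in the check).
Suggestion: remove the tail field, all writes to it in init/retire/resetForContextLost, and the header comments that mention tail (keep the explanation of the outstanding counter). Clean up both rings together.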
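To illustrate why tail is no longer needed, here is a minimal sketch of a ring whose allocate() is gated only by an outstanding-slot counter. SlotRing, Allocation, and billedSlots are hypothetical names for illustration, not the PR's classes, and the sketch assumes submissions retire in FIFO order.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a descriptor ring; not the PR's D3D12DescriptorRing.
struct Allocation {
  uint32_t start;        // first slot of the contiguous run
  uint32_t billedSlots;  // slots charged to the submission, including wrap padding
};

struct SlotRing {
  uint32_t capacity = 0;
  uint32_t head = 0;              // next slot to hand out
  uint32_t outstandingSlots = 0;  // slots the GPU may still be reading

  std::optional<Allocation> allocate(uint32_t count) {
    if (count == 0 || count > capacity) {
      return std::nullopt;
    }
    // If the run would cross the end of the ring, skip the remaining slots and
    // bill them too, so a capacity-spanning allocation cannot look like an empty ring.
    const uint32_t padding = (head + count > capacity) ? capacity - head : 0;
    const uint32_t billed = padding + count;
    if (billed > capacity - outstandingSlots) {
      return std::nullopt;  // not enough free slots; no tail needed to decide this
    }
    const uint32_t start = (padding != 0) ? 0 : head;
    head = (start + count) % capacity;
    outstandingSlots += billed;
    return Allocation{start, billed};
  }

  // Called once the fence for the owning submission has signalled.
  void retire(uint32_t billedSlots) {
    assert(billedSlots <= outstandingSlots);
    outstandingSlots -= billedSlots;
  }
};
```

The only state allocate() reads is head, capacity, and outstandingSlots, which matches the observation above that tail is write-only after P16.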
head = 0;
tail = 0;
committedHead = 0;
outstandingSlots = 0;
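[N2 Low] init() does not clear the inflight queue, so a re-init would leave mismatched state
bool D3D12DescriptorRing::init(...) {
...
head = 0;
tail = 0;
committedHead = 0;
outstandingSlots = 0;
return true; // inflight is not cleared here
}
D3D12UploadHeap::init (D3D12UploadHeap.cpp:67-76) has the same problem.
Impact: the current codebase has no re-init path (D3D12GPU::releaseAll goes through clear() rather than calling init() a second time), so this cannot trigger today. But init() is meant to "reset the ring to its freshly created state", and not clearing inflight is a defensive oversight: if a recovery path that rebuilds after device lost ever appears, the leftover inflight entries would point at an old heap that has already been released.
Suggestion: add inflight.clear(); at the end of both init() functions to keep the invariant that the ring is in a fresh state after init.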
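A minimal sketch of the suggested invariant, under the assumption that inflight is a queue of per-submission fence records. RingState and InflightRecord are hypothetical names, and heap creation is omitted.

```cpp
#include <cstdint>
#include <deque>

// Hypothetical record of one submitted batch; field names are assumptions.
struct InflightRecord {
  uint64_t fenceValue;
  uint32_t billedSlots;
};

// Hypothetical stand-in for the ring's bookkeeping state.
struct RingState {
  uint32_t capacity = 0;
  uint32_t head = 0;
  uint32_t committedHead = 0;
  uint32_t outstandingSlots = 0;
  std::deque<InflightRecord> inflight;

  bool init(uint32_t newCapacity) {
    if (newCapacity == 0) {
      return false;
    }
    capacity = newCapacity;
    head = 0;
    committedHead = 0;
    outstandingSlots = 0;
    // Drop records that refer to the previous heap so a second init() starts fresh.
    inflight.clear();
    return true;
  }
};
```

With the clear() in place, calling init() twice leaves no record that could later be retired against a heap it never belonged to.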
if (vsReg != 0xFF && fsReg != 0xFF && vsReg != fsReg) {
LOGE(
"D3D12RenderPipeline: VertexFragment-visible UBO binding %u maps to mismatched "
"stage-local registers (vs=b%u, ps=b%u). Reorder uniformBlocks so shared UBOs share "
"their stage-local index in both stages.",
entry.binding, static_cast<unsigned>(vsReg), static_cast<unsigned>(fsReg));
return false;
}
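[N3 Low] The LOGE message misleads about how to fix the problem
if (vsReg != 0xFF && fsReg != 0xFF && vsReg != fsReg) {
LOGE(
"D3D12RenderPipeline: VertexFragment-visible UBO binding %u maps to mismatched "
"stage-local registers (vs=b%u, ps=b%u). Reorder uniformBlocks so shared UBOs share "
"their stage-local index in both stages.",
entry.binding, static_cast<unsigned>(vsReg), static_cast<unsigned>(fsReg));
return false;
}
Problem: the message suggests "Reorder uniformBlocks so shared UBOs share their stage-local index in both stages", but reordering alone cannot solve it. For example, in the sequence [Vertex-only(b0), VertexFragment], the VertexFragment block must be b1 in the vertex stage (the vertex-only block already occupies b0) and b0 in the fragment stage. After reordering to [VertexFragment, Vertex-only(b0)], the vertex stage has b0, b1 and the fragment stage has b0, yet the indices VertexFragment receives in the two stages may still differ (depending on its fragment-only peers). The real fixes are:
- split the VertexFragment-visible UBO into two entries, one vertex-only and one fragment-only; or
- give this UBO two independent root parameters (accepting a larger root signature).
The chance of actually hitting this is very low (TGFX's built-in UBOs are all stage-only), but since this LOGE is a hint to upper-layer callers, the message should point at a solution that actually works.
Suggested replacement: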
"D3D12RenderPipeline: VertexFragment-visible UBO binding %u cannot share a single CBV root "
"parameter when its vertex-stage register (b%u) and fragment-stage register (b%u) differ. "
"Either split it into vertex-only and fragment-only entries, or extend the root signature to "
"emit two CBV root parameters for this binding."
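The mismatch in N3 can be reproduced with a small sketch. It assumes each stage numbers the uniform blocks visible to it as b0, b1, ... in declaration order, which is how the example above reads. Visibility, StageRegisters, assignRegisters, and hasMismatch are hypothetical names; 0xFF is the "not visible in this stage" sentinel used by the check in the PR.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical visibility flags: bit 0 = vertex stage, bit 1 = fragment stage.
enum class Visibility : uint8_t { Vertex = 1, Fragment = 2, VertexFragment = 3 };

struct StageRegisters {
  uint8_t vs = 0xFF;  // 0xFF means "not visible in this stage"
  uint8_t fs = 0xFF;
};

// Assumption: every stage hands out b0, b1, ... to its visible blocks in order.
std::vector<StageRegisters> assignRegisters(const std::vector<Visibility>& blocks) {
  std::vector<StageRegisters> result;
  uint8_t nextVs = 0;
  uint8_t nextFs = 0;
  for (Visibility visibility : blocks) {
    const auto bits = static_cast<uint8_t>(visibility);
    StageRegisters regs;
    if (bits & 1) {
      regs.vs = nextVs++;
    }
    if (bits & 2) {
      regs.fs = nextFs++;
    }
    result.push_back(regs);
  }
  return result;
}

// Same condition as the check that guards the LOGE.
bool hasMismatch(const StageRegisters& regs) {
  return regs.vs != 0xFF && regs.fs != 0xFF && regs.vs != regs.fs;
}
```

Under these assumptions, [Vertex, VertexFragment] assigns the shared block vs=b1 and ps=b0, which trips the check. Splitting the shared block into a vertex-only and a fragment-only entry leaves no entry with both registers set, so the check cannot fire.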
… dead code once outstandingSlots and outstandingBytes started gating allocations.
…future re-init path cannot inherit stale fence records pointing at a just-released heap.
…(split or two root parameters) instead of suggesting an impossible reorder.
…dless CI runners can exercise the D3D12 backend on the WARP software rasterizer.
 * Creates a new D3D12Device from an existing ID3D12Device. The device parameter is a pointer to
 * an ID3D12Device object. Returns nullptr if the device is invalid.
 */
static std::shared_ptr<D3D12Device> MakeFrom(void* device);
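The comment does not state the concrete type of the device parameter (ID3D12Device*) or its ownership semantics. Callers cannot tell: 1) what type the void* actually points to; 2) whether tgfx calls AddRef on the object, and when or whether it calls Release. Suggest completing it, for example: @param device A pointer to an existing ID3D12Device. The caller retains ownership; tgfx internally calls QueryInterface (AddRef) and releases its own reference when the D3D12Device is destroyed.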
// bit-for-bit on even-divided mip levels and follows GPU-driver edge handling on odd ones. The
// older quincunx (four 0.25-texel offsets) effectively did a 16-tap blur and produced softer
// mips than the other backends.
static constexpr const char* kHLSLSource = R"(
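Naming violation: the project convention requires static constexpr names to be all uppercase with underscores and forbids the k prefix. Suggest renaming to HLSL_SOURCE.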
auto& src = colorAttachments[i];
auto& resolveDst = resolveTextures[i];
if (src == nullptr) {
continue;
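Dead code: colorAttachments[i] has already passed a non-null check before this loop is entered (in the beginRenderPass stage). The if (src == nullptr) here is never true and can be deleted to reduce noise.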
// pipeline's root signature that still places it at b0.
shapeKey.push_back(static_cast<uint8_t>(entry.visibility & 0xFF));
shapeKey.push_back(static_cast<uint8_t>((entry.visibility >> 8) & 0xFF));
shapeKey.push_back(ubVertexRegister[i]);
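The shapeKey here encodes the raw entry.visibility enum value (possibly Vertex/Fragment/VertexFragment), but ToD3D12ShaderVisibility() folds VertexFragment into D3D12_SHADER_VISIBILITY_ALL, which ultimately yields a different root signature from Vertex-only or Fragment-only. If other visibility combinations are introduced in the future (such as All = 0xFF), different combinations may fold to the same D3D12 visibility yet fail to share the root signature cache because their shapeKeys differ; in other words, two different values that fold to the same result get different keys and waste cache entries. Suggest converting visibility through ToD3D12ShaderVisibility() before writing it into shapeKey, so that cache keys correspond one-to-one with the actual D3D12 objects.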
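A rough sketch of the suggested key normalization follows. To stay self-contained it uses stand-in enums instead of the d3d12.h types: ShaderStage, FoldedVisibility, foldVisibility, and appendShapeKey are hypothetical names, and foldVisibility only approximates what the review says ToD3D12ShaderVisibility() does.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stage flags as seen by the backend-agnostic layer.
enum class ShaderStage : uint32_t { Vertex = 1, Fragment = 2, VertexFragment = 3 };

// Stand-in for D3D12_SHADER_VISIBILITY; the real enum lives in d3d12.h.
enum class FoldedVisibility : uint8_t { All = 0, Vertex = 1, Pixel = 5 };

// Assumption: single-stage values map to that stage, everything else folds to All.
FoldedVisibility foldVisibility(uint32_t rawVisibility) {
  if (rawVisibility == static_cast<uint32_t>(ShaderStage::Vertex)) {
    return FoldedVisibility::Vertex;
  }
  if (rawVisibility == static_cast<uint32_t>(ShaderStage::Fragment)) {
    return FoldedVisibility::Pixel;
  }
  return FoldedVisibility::All;
}

// Writes the folded visibility, not the raw value, so two entries that build the
// same root parameter also produce the same key bytes.
void appendShapeKey(std::vector<uint8_t>& shapeKey, uint32_t rawVisibility,
                    uint8_t vertexRegister, uint8_t fragmentRegister) {
  shapeKey.push_back(static_cast<uint8_t>(foldVisibility(rawVisibility)));
  shapeKey.push_back(vertexRegister);
  shapeKey.push_back(fragmentRegister);
}
```

With this shape, a hypothetical future raw value such as 0xFF and today's VertexFragment both fold to All and hit the same cache entry, while Vertex-only and Fragment-only keep distinct keys.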
case CullMode::Back:
return D3D12_CULL_MODE_BACK;
}
return D3D12_CULL_MODE_NONE;
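The switch lacks a default branch, which is inconsistent with the other conversion functions in the same file (such as ToD3D12CompareFunction). Although the function ends with a fallback return, the compiler will not warn about a missing case if the enum gains a new value in the future. Suggest adding a default: branch, or at least a DEBUG_ASSERT(false) to flag unexpected values. The same applies to ToD3D12FrontCounterClockwise.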
// MakeSamplerKey must include it in the cache key (otherwise two samplers differing only in
// border colour would collide in samplerCache). Keep the three backends in sync.
samplerDesc.BorderColor[0] = 0.0f;
samplerDesc.BorderColor[1] = 0.0f;
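D3D12GPU::initFeatures() declares clampToBorder = true, which implies support for arbitrary border colors, but the actual implementation is fixed to transparent black (0,0,0,0). D3D12 allows three border colors to be configured (transparent black / opaque black / opaque white); if upper-layer code tries to use another border color based on the clampToBorder capability bit, the request will silently have no effect. Suggest noting in the comment that this implementation supports only transparent black, or narrowing the capability declaration so that it covers only the ClampToBorder address mode rather than arbitrary colors.
Adds a D3D12 GPU backend, giving TGFX a native rendering path on Windows parallel to Vulkan and Metal.
Main changes: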