<fix>[storage]: zbs vhost coderabbit review hardening#4320
<fix>[storage]: zbs vhost coderabbit review hardening#4320zstack-robot-2 wants to merge 2 commits into
Conversation
CR4 docs fenced-code language tags; CR6 fold empty node-health to Disconnected (not Connected); CR8 atomic upsert for per-protocol host connectivity row; CR9 roll back the output-protocol when most hosts fail to prepare it. CR3/CR7 are ZStack conventions, kept as-is. Resolves: ZSTAC-85515 Change-Id: I1a2b57c6e700ab82aa27780e24288f26e972d851
Guards ExternalPrimaryStorage.doAddProtocol: when most hosts fail to prepare a protocol, the API must fail and the output-protocol registration must roll back so an unusable protocol is not exposed. Resolves: ZSTAC-85515 Change-Id: If1f61defa16a7a4445121767d05d37e7c56f48e1
|
Warning
|
| Layer / File(s) | Summary |
|---|---|
updateHostProtocolStatus 改为原子 upsert storage/.../ExternalPrimaryStorageFactory.java |
新增 javax.persistence.Query 导入;将原"先查询再 persist/update"流程替换为 INSERT ... ON DUPLICATE KEY UPDATE 原生 SQL upsert,消除并发重复行竞争,lastOpDate 仅在 status 变化时更新,日志改为 "upserted protocol connectivity"。 |
checkHostStatus 空健康信息处理 plugin/.../ExternalPrimaryStorageKvmFactory.java |
新增 VolumeProtocol 导入;success(NodeHealthy) 回调中对 healthy == null 或空列表场景显式设置 Disconnected 并记录错误,非空时仍执行协议逐项更新与 allMatch 判定。 |
doAddProtocol 多数主机失败回滚 storage/.../ExternalPrimaryStorage.java |
host 准备失败时将错误加入 errorCodeList;WhileDoneCompletion 中若失败数超过主机总数一半则调用新增的 unregisterOutputProtocol 删除 PrimaryStorageOutputProtocolRefVO 记录并失败回调,否则保留注册继续。 |
多数主机失败回滚回归测试 test/.../ZbsVhostVolumeCase.groovy |
新增 testAddProtocolRollsBackWhenMajorityHostsFail():注入 DEPLOY_CLIENT_PATH 模拟多数失败,执行 addStorageProtocol 并断言 API 失败且协议引用行已回滚,最后恢复注入并解绑主存储。test() 中新增对该方法的调用。 |
API 文档代码围栏格式修正
| Layer / File(s) | Summary |
|---|---|
API 文档示例代码围栏补全 docs/zbs-vhost/40-frontend-protocol-api.md |
在协议枚举(text)及 §3.3、§4、§5、§5.1、§6 的请求示例(http)代码块开头补全缺失的 Markdown 围栏语言标识,无描述性内容变更。 |
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
🐰 小兔子翻看代码树,
upsert 一击消竞争迷,
空健康信息不再忽视,
多数失败则优雅撤离,
文档围栏也整整齐齐~
代码清洁,世界美丽! 🌸
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | 标题准确地反映了主要变更内容:针对 ZBS Vhost 功能进行代码审查相关的修复和硬化改进,符合本 PR 实现 CodeRabbit 评论建议的核心目标。 |
| Description check | ✅ Passed | 描述充分说明了 PR 背景、具体变更项(CR4、CR6、CR8、CR9)及其对应代码位置和改进意义,完全相关于代码改动内容,并包含测试验证说明。 |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
📝 Generate docstrings
- Create stacked PR
- Commit on current branch
🧪 Generate unit tests (beta)
- Create PR with unit tests
- Commit unit tests in branch
sync/jin.ma/fix/zbs-vhost-coderabbit
Comment @coderabbitai help to get the list of available commands.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugin/externalStorage/src/main/java/org/zstack/externalStorage/primary/kvm/ExternalPrimaryStorageKvmFactory.java (1)
69-79: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win修正
hostStatus的键维度,否则状态比较会长期失真。Line 71 当前把
hostUuid作为 map key,但 Line 271 用的是extPs.getUuid()(psUuid)取值,导致比较条件几乎总是失配,updateHostStatus(...)会被反复触发。🔧 建议修复
- private Map<String, PrimaryStorageHostStatus> getHostStatus(List<ExternalPrimaryStorageVO> extPss) { + private Map<String, PrimaryStorageHostStatus> getHostStatus(String hostUuid, List<ExternalPrimaryStorageVO> extPss) { return Q.New(PrimaryStorageHostRefVO.class) - .select(PrimaryStorageHostRefVO_.hostUuid, PrimaryStorageHostRefVO_.status) + .select(PrimaryStorageHostRefVO_.primaryStorageUuid, PrimaryStorageHostRefVO_.status) + .eq(PrimaryStorageHostRefVO_.hostUuid, hostUuid) .in(PrimaryStorageHostRefVO_.primaryStorageUuid, extPss.stream().map(ExternalPrimaryStorageVO::getUuid).collect(Collectors.toList())) .listTuple().stream() .collect(Collectors.toMap( t -> t.get(0, String.class), t -> t.get(1, PrimaryStorageHostStatus.class), (o, n) -> n )); } @@ - Map<String, PrimaryStorageHostStatus> hostStatus = getHostStatus(extPss); + Map<String, PrimaryStorageHostStatus> hostStatus = getHostStatus(host.getUuid(), extPss);Also applies to: 239-273
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin/externalStorage/src/main/java/org/zstack/externalStorage/primary/kvm/ExternalPrimaryStorageKvmFactory.java` around lines 69 - 79, The getHostStatus method currently uses hostUuid as the map key when building the result from PrimaryStorageHostRefVO, but the code that consumes this map at line 271 looks up values using primaryStorageUuid (from extPs.getUuid()). This mismatch causes the status comparison to consistently fail and triggers repeated calls to updateHostStatus. Change the map key in the Collectors.toMap call from hostUuid (t.get(0, String.class)) to primaryStorageUuid, and update the select clause in the query to include or replace hostUuid with primaryStorageUuid so that the correct field is available as the map key.
🧹 Nitpick comments (2)
test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsVhostVolumeCase.groovy (1)
670-732: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win把恢复与解绑放入
finally,避免失败路径污染测试环境。Line 727-731 的清理逻辑当前依赖前置断言全部通过。建议用
try/finally固化恢复路径,避免失败后残留挂载状态或模拟器状态影响后续清理。🧪 建议修复
- deployClientShouldFail.set(true) - expectApiFailure({ - addStorageProtocol { - uuid = ps.uuid - outputProtocol = VolumeProtocol.Vhost.toString() - } - }) { - assert delegate != null : "addStorageProtocol must fail when the only connected host cannot be prepared" - } - - assert !Q.New(PrimaryStorageOutputProtocolRefVO.class) - .eq(PrimaryStorageOutputProtocolRefVO_.primaryStorageUuid, ps.uuid) - .eq(PrimaryStorageOutputProtocolRefVO_.outputProtocol, VolumeProtocol.Vhost.toString()) - .isExists() : \ - "Vhost output protocol left exposed after a majority of hosts failed to prepare it; " + - "doAddProtocol must roll back PrimaryStorageOutputProtocolRefVO on majority-failure (CR9)" - - // restore so a later add of Vhost (and env teardown) behaves normally - deployClientShouldFail.set(false) - detachPrimaryStorageFromCluster { - primaryStorageUuid = ps.uuid - clusterUuid = cluster.uuid - } + deployClientShouldFail.set(true) + try { + expectApiFailure({ + addStorageProtocol { + uuid = ps.uuid + outputProtocol = VolumeProtocol.Vhost.toString() + } + }) { + assert delegate != null : "addStorageProtocol must fail when the only connected host cannot be prepared" + } + + assert !Q.New(PrimaryStorageOutputProtocolRefVO.class) + .eq(PrimaryStorageOutputProtocolRefVO_.primaryStorageUuid, ps.uuid) + .eq(PrimaryStorageOutputProtocolRefVO_.outputProtocol, VolumeProtocol.Vhost.toString()) + .isExists() : \ + "Vhost output protocol left exposed after a majority of hosts failed to prepare it; " + + "doAddProtocol must roll back PrimaryStorageOutputProtocolRefVO on majority-failure (CR9)" + } finally { + deployClientShouldFail.set(false) + detachPrimaryStorageFromCluster { + primaryStorageUuid = ps.uuid + clusterUuid = cluster.uuid + } + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsVhostVolumeCase.groovy` around lines 670 - 732, The cleanup logic at the end of the testAddProtocolRollsBackWhenMajorityHostsFail method (resetting deployClientShouldFail and detaching the primary storage) currently executes only if all preceding assertions pass. Wrap the main test logic in a try block and move the cleanup statements (deployClientShouldFail.set(false) and detachPrimaryStorageFromCluster) into a finally block to ensure the recovery and detachment always occur regardless of whether assertions fail, preventing test environment pollution and state leakage to subsequent tests.storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java (1)
2430-2449: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win回滚时建议同步清理 host-protocol 状态行,避免遗留“幽灵协议状态”。
当前 Line 2443-2449 只删除
PrimaryStorageOutputProtocolRefVO,但在此前失败分支已经写入过ExternalPrimaryStorageHostProtocolRefVO。多数失败回滚后,这些状态行可能与“协议未暴露”状态不一致。♻️ 建议补齐回滚清理
private void unregisterOutputProtocol(String psUuid, String protocol) { SQL.New("delete from PrimaryStorageOutputProtocolRefVO ref" + " where ref.primaryStorageUuid = :psUuid and ref.outputProtocol = :protocol") .param("psUuid", psUuid) .param("protocol", protocol) .execute(); + + SQL.New("delete from ExternalPrimaryStorageHostProtocolRefVO ref" + + " where ref.primaryStorageUuid = :psUuid and ref.protocol = :protocol") + .param("psUuid", psUuid) + .param("protocol", protocol) + .execute(); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java` around lines 2430 - 2449, The unregisterOutputProtocol method currently only deletes PrimaryStorageOutputProtocolRefVO records but leaves orphaned ExternalPrimaryStorageHostProtocolRefVO records that were created during the failed initialization attempts. This creates inconsistent state where host-protocol associations remain even though the protocol is unregistered. Extend the SQL delete operation in the unregisterOutputProtocol method to also clean up the ExternalPrimaryStorageHostProtocolRefVO records matching the same primaryStorageUuid and outputProtocol parameters, ensuring complete rollback consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In
`@plugin/externalStorage/src/main/java/org/zstack/externalStorage/primary/kvm/ExternalPrimaryStorageKvmFactory.java`:
- Around line 69-79: The getHostStatus method currently uses hostUuid as the map
key when building the result from PrimaryStorageHostRefVO, but the code that
consumes this map at line 271 looks up values using primaryStorageUuid (from
extPs.getUuid()). This mismatch causes the status comparison to consistently
fail and triggers repeated calls to updateHostStatus. Change the map key in the
Collectors.toMap call from hostUuid (t.get(0, String.class)) to
primaryStorageUuid, and update the select clause in the query to include or
replace hostUuid with primaryStorageUuid so that the correct field is available
as the map key.
---
Nitpick comments:
In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java`:
- Around line 2430-2449: The unregisterOutputProtocol method currently only
deletes PrimaryStorageOutputProtocolRefVO records but leaves orphaned
ExternalPrimaryStorageHostProtocolRefVO records that were created during the
failed initialization attempts. This creates inconsistent state where
host-protocol associations remain even though the protocol is unregistered.
Extend the SQL delete operation in the unregisterOutputProtocol method to also
clean up the ExternalPrimaryStorageHostProtocolRefVO records matching the same
primaryStorageUuid and outputProtocol parameters, ensuring complete rollback
consistency.
In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsVhostVolumeCase.groovy`:
- Around line 670-732: The cleanup logic at the end of the
testAddProtocolRollsBackWhenMajorityHostsFail method (resetting
deployClientShouldFail and detaching the primary storage) currently executes
only if all preceding assertions pass. Wrap the main test logic in a try block
and move the cleanup statements (deployClientShouldFail.set(false) and
detachPrimaryStorageFromCluster) into a finally block to ensure the recovery and
detachment always occur regardless of whether assertions fail, preventing test
environment pollution and state leakage to subsequent tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 4be603f5-9689-468e-b4eb-6cc8079b7e69
📒 Files selected for processing (5)
docs/zbs-vhost/40-frontend-protocol-api.mdplugin/externalStorage/src/main/java/org/zstack/externalStorage/primary/kvm/ExternalPrimaryStorageKvmFactory.javastorage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.javastorage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.javatest/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsVhostVolumeCase.groovy
背景
!10172 合并后复核其 coderabbit 评论,发现多条是 force-push 后被 GitLab 判 outdated 的假 resolved,并未真正落地。本 MR 补齐真缺口 + 加回归用例。
变更(逐条对真实代码核过)
40-frontend-protocol-api.md:6 处 fenced code block 补语言标识(MD040)。ExternalPrimaryStorageKvmFactory.checkHostStatus:node-health 空集合先判,折叠为Disconnected,避免allMatch空集合返回 true 误写Connected(且防 null NPE)。ExternalPrimaryStorageFactory.updateHostProtocolStatus:find()+persist/update两步式改原子INSERT ... ON DUPLICATE KEY UPDATE(@Transactional),消除并发建重复行窗口;lastOpDate仅状态变化时 bump 以保「最近变化时间」语义(唯一键 schema 已有)。ExternalPrimaryStorage.doAddProtocol:per-host 准备失败时收集错误,超过半数主机失败则completion.fail并回滚PrimaryStorageOutputProtocolRefVO(不暴露多数不可用的协议);≤半数仍注册走 ping 自愈。驳回项(ZStack 既定惯例,保留不改):CR3 升级脚本
0000-00-00默认值(178 脚本中 73 处惯例);CR7 SDKQuery*Result.inventoriesraw List(生成器统一约定)。CR1/CR2/CR5 在 !10172 已落地/重构后失效。测试
ZbsVhostVolumeCase.testAddProtocolRollsBackWhenMajorityHostsFail:注入DEPLOY_CLIENT_PATH失败让唯一连通主机准备失败 → 断言 addStorageProtocol API 失败 + 协议行被回滚(CR9)。ZbsVhostVolumeCase全过(Tests run: 1, Failures: 0, Errors: 0),CR6/CR8 改动对既有用例零回归。sync from gitlab !10280