Skip to content

<fix>[storage]: zbs vhost coderabbit review hardening#4320

Open
zstack-robot-2 wants to merge 2 commits into
feature-zbs-vhostfrom
sync/jin.ma/fix/zbs-vhost-coderabbit
Open

<fix>[storage]: zbs vhost coderabbit review hardening#4320
zstack-robot-2 wants to merge 2 commits into
feature-zbs-vhostfrom
sync/jin.ma/fix/zbs-vhost-coderabbit

Conversation

@zstack-robot-2

Copy link
Copy Markdown
Collaborator

背景

!10172 合并后复核其 coderabbit 评论,发现多条是 force-push 后被 GitLab 判 outdated 的假 resolved,并未真正落地。本 MR 补齐真缺口 + 加回归用例。

变更(逐条对真实代码核过)

  • CR4 40-frontend-protocol-api.md:6 处 fenced code block 补语言标识(MD040)。
  • CR6 ExternalPrimaryStorageKvmFactory.checkHostStatus:node-health 空集合先判,折叠为 Disconnected,避免 allMatch 空集合返回 true 误写 Connected(且防 null NPE)。
  • CR8 ExternalPrimaryStorageFactory.updateHostProtocolStatusfind()+persist/update 两步式改原子 INSERT ... ON DUPLICATE KEY UPDATE@Transactional),消除并发建重复行窗口;lastOpDate 仅状态变化时 bump 以保「最近变化时间」语义(唯一键 schema 已有)。
  • CR9 ExternalPrimaryStorage.doAddProtocol:per-host 准备失败时收集错误,超过半数主机失败则 completion.fail 并回滚 PrimaryStorageOutputProtocolRefVO(不暴露多数不可用的协议);≤半数仍注册走 ping 自愈。

驳回项(ZStack 既定惯例,保留不改):CR3 升级脚本 0000-00-00 默认值(178 脚本中 73 处惯例);CR7 SDK Query*Result.inventories raw List(生成器统一约定)。CR1/CR2/CR5 在 !10172 已落地/重构后失效。

测试

  • 新增 ZbsVhostVolumeCase.testAddProtocolRollsBackWhenMajorityHostsFail:注入 DEPLOY_CLIENT_PATH 失败让唯一连通主机准备失败 → 断言 addStorageProtocol API 失败 + 协议行被回滚(CR9)。
  • 全量 premium 编译 BUILD SUCCESS;本地 IT ZbsVhostVolumeCase 全过(Tests run: 1, Failures: 0, Errors: 0),CR6/CR8 改动对既有用例零回归。

sync from gitlab !10280

CR4 docs fenced-code language tags; CR6 fold empty node-health to
Disconnected (not Connected); CR8 atomic upsert for per-protocol host
connectivity row; CR9 roll back the output-protocol when most hosts fail
to prepare it. CR3/CR7 are ZStack conventions, kept as-is.

Resolves: ZSTAC-85515

Change-Id: I1a2b57c6e700ab82aa27780e24288f26e972d851
Guards ExternalPrimaryStorage.doAddProtocol: when most hosts fail
to prepare a protocol, the API must fail and the output-protocol
registration must roll back so an unusable protocol is not exposed.

Resolves: ZSTAC-85515

Change-Id: If1f61defa16a7a4445121767d05d37e7c56f48e1
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

.coderabbit.yaml has a parsing error

The CodeRabbit configuration file in this repository has a parsing error and default settings were used instead. Please fix the error(s) in the configuration file. You can initialize chat with CodeRabbit to get help with the configuration file.

💥 Parsing errors (1)
Could not fetch remote config from http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml: TimeoutError: The operation was aborted due to timeout
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Walkthrough

重写 updateHostProtocolStatus 为原生 SQL upsert 以消除并发竞争;修复 checkHostStatus 对空健康响应未处理的缺陷;在 doAddProtocol 中新增多数主机失败时回滚协议注册的逻辑及辅助方法;增加对应回归测试;文档示例代码块补全 Markdown 代码围栏。

Changes

协议状态持久化与注册回滚

Layer / File(s) Summary
updateHostProtocolStatus 改为原子 upsert
storage/.../ExternalPrimaryStorageFactory.java
新增 javax.persistence.Query 导入;将原"先查询再 persist/update"流程替换为 INSERT ... ON DUPLICATE KEY UPDATE 原生 SQL upsert,消除并发重复行竞争,lastOpDate 仅在 status 变化时更新,日志改为 "upserted protocol connectivity"。
checkHostStatus 空健康信息处理
plugin/.../ExternalPrimaryStorageKvmFactory.java
新增 VolumeProtocol 导入;success(NodeHealthy) 回调中对 healthy == null 或空列表场景显式设置 Disconnected 并记录错误,非空时仍执行协议逐项更新与 allMatch 判定。
doAddProtocol 多数主机失败回滚
storage/.../ExternalPrimaryStorage.java
host 准备失败时将错误加入 errorCodeListWhileDoneCompletion 中若失败数超过主机总数一半则调用新增的 unregisterOutputProtocol 删除 PrimaryStorageOutputProtocolRefVO 记录并失败回调,否则保留注册继续。
多数主机失败回滚回归测试
test/.../ZbsVhostVolumeCase.groovy
新增 testAddProtocolRollsBackWhenMajorityHostsFail():注入 DEPLOY_CLIENT_PATH 模拟多数失败,执行 addStorageProtocol 并断言 API 失败且协议引用行已回滚,最后恢复注入并解绑主存储。test() 中新增对该方法的调用。

API 文档代码围栏格式修正

Layer / File(s) Summary
API 文档示例代码围栏补全
docs/zbs-vhost/40-frontend-protocol-api.md
在协议枚举(text)及 §3.3、§4、§5、§5.1、§6 的请求示例(http)代码块开头补全缺失的 Markdown 围栏语言标识,无描述性内容变更。

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 小兔子翻看代码树,
upsert 一击消竞争迷,
空健康信息不再忽视,
多数失败则优雅撤离,
文档围栏也整整齐齐~
代码清洁,世界美丽! 🌸

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed 标题准确地反映了主要变更内容:针对 ZBS Vhost 功能进行代码审查相关的修复和硬化改进,符合本 PR 实现 CodeRabbit 评论建议的核心目标。
Description check ✅ Passed 描述充分说明了 PR 背景、具体变更项(CR4、CR6、CR8、CR9)及其对应代码位置和改进意义,完全相关于代码改动内容,并包含测试验证说明。
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/jin.ma/fix/zbs-vhost-coderabbit

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugin/externalStorage/src/main/java/org/zstack/externalStorage/primary/kvm/ExternalPrimaryStorageKvmFactory.java (1)

69-79: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

修正 hostStatus 的键维度,否则状态比较会长期失真。

Line 71 当前把 hostUuid 作为 map key,但 Line 271 用的是 extPs.getUuid()psUuid)取值,导致比较条件几乎总是失配,updateHostStatus(...) 会被反复触发。

🔧 建议修复
-    private Map<String, PrimaryStorageHostStatus> getHostStatus(List<ExternalPrimaryStorageVO> extPss) {
+    private Map<String, PrimaryStorageHostStatus> getHostStatus(String hostUuid, List<ExternalPrimaryStorageVO> extPss) {
         return Q.New(PrimaryStorageHostRefVO.class)
-                .select(PrimaryStorageHostRefVO_.hostUuid, PrimaryStorageHostRefVO_.status)
+                .select(PrimaryStorageHostRefVO_.primaryStorageUuid, PrimaryStorageHostRefVO_.status)
+                .eq(PrimaryStorageHostRefVO_.hostUuid, hostUuid)
                 .in(PrimaryStorageHostRefVO_.primaryStorageUuid,
                         extPss.stream().map(ExternalPrimaryStorageVO::getUuid).collect(Collectors.toList()))
                 .listTuple().stream()
                 .collect(Collectors.toMap(
                         t -> t.get(0, String.class),
                         t -> t.get(1, PrimaryStorageHostStatus.class),
                         (o, n) -> n
                 ));
     }
@@
-        Map<String, PrimaryStorageHostStatus> hostStatus = getHostStatus(extPss);
+        Map<String, PrimaryStorageHostStatus> hostStatus = getHostStatus(host.getUuid(), extPss);

Also applies to: 239-273

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@plugin/externalStorage/src/main/java/org/zstack/externalStorage/primary/kvm/ExternalPrimaryStorageKvmFactory.java`
around lines 69 - 79, The getHostStatus method currently uses hostUuid as the
map key when building the result from PrimaryStorageHostRefVO, but the code that
consumes this map at line 271 looks up values using primaryStorageUuid (from
extPs.getUuid()). This mismatch causes the status comparison to consistently
fail and triggers repeated calls to updateHostStatus. Change the map key in the
Collectors.toMap call from hostUuid (t.get(0, String.class)) to
primaryStorageUuid, and update the select clause in the query to include or
replace hostUuid with primaryStorageUuid so that the correct field is available
as the map key.
🧹 Nitpick comments (2)
test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsVhostVolumeCase.groovy (1)

670-732: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

把恢复与解绑放入 finally,避免失败路径污染测试环境。

Line 727-731 的清理逻辑当前依赖前置断言全部通过。建议用 try/finally 固化恢复路径,避免失败后残留挂载状态或模拟器状态影响后续清理。

🧪 建议修复
-        deployClientShouldFail.set(true)
-        expectApiFailure({
-            addStorageProtocol {
-                uuid = ps.uuid
-                outputProtocol = VolumeProtocol.Vhost.toString()
-            }
-        }) {
-            assert delegate != null : "addStorageProtocol must fail when the only connected host cannot be prepared"
-        }
-
-        assert !Q.New(PrimaryStorageOutputProtocolRefVO.class)
-                .eq(PrimaryStorageOutputProtocolRefVO_.primaryStorageUuid, ps.uuid)
-                .eq(PrimaryStorageOutputProtocolRefVO_.outputProtocol, VolumeProtocol.Vhost.toString())
-                .isExists() : \
-                "Vhost output protocol left exposed after a majority of hosts failed to prepare it; " +
-                "doAddProtocol must roll back PrimaryStorageOutputProtocolRefVO on majority-failure (CR9)"
-
-        // restore so a later add of Vhost (and env teardown) behaves normally
-        deployClientShouldFail.set(false)
-        detachPrimaryStorageFromCluster {
-            primaryStorageUuid = ps.uuid
-            clusterUuid = cluster.uuid
-        }
+        deployClientShouldFail.set(true)
+        try {
+            expectApiFailure({
+                addStorageProtocol {
+                    uuid = ps.uuid
+                    outputProtocol = VolumeProtocol.Vhost.toString()
+                }
+            }) {
+                assert delegate != null : "addStorageProtocol must fail when the only connected host cannot be prepared"
+            }
+
+            assert !Q.New(PrimaryStorageOutputProtocolRefVO.class)
+                    .eq(PrimaryStorageOutputProtocolRefVO_.primaryStorageUuid, ps.uuid)
+                    .eq(PrimaryStorageOutputProtocolRefVO_.outputProtocol, VolumeProtocol.Vhost.toString())
+                    .isExists() : \
+                    "Vhost output protocol left exposed after a majority of hosts failed to prepare it; " +
+                    "doAddProtocol must roll back PrimaryStorageOutputProtocolRefVO on majority-failure (CR9)"
+        } finally {
+            deployClientShouldFail.set(false)
+            detachPrimaryStorageFromCluster {
+                primaryStorageUuid = ps.uuid
+                clusterUuid = cluster.uuid
+            }
+        }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsVhostVolumeCase.groovy`
around lines 670 - 732, The cleanup logic at the end of the
testAddProtocolRollsBackWhenMajorityHostsFail method (resetting
deployClientShouldFail and detaching the primary storage) currently executes
only if all preceding assertions pass. Wrap the main test logic in a try block
and move the cleanup statements (deployClientShouldFail.set(false) and
detachPrimaryStorageFromCluster) into a finally block to ensure the recovery and
detachment always occur regardless of whether assertions fail, preventing test
environment pollution and state leakage to subsequent tests.
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java (1)

2430-2449: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win

回滚时建议同步清理 host-protocol 状态行,避免遗留“幽灵协议状态”。

当前 Line 2443-2449 只删除 PrimaryStorageOutputProtocolRefVO,但在此前失败分支已经写入过 ExternalPrimaryStorageHostProtocolRefVO。多数失败回滚后,这些状态行可能与“协议未暴露”状态不一致。

♻️ 建议补齐回滚清理
 private void unregisterOutputProtocol(String psUuid, String protocol) {
     SQL.New("delete from PrimaryStorageOutputProtocolRefVO ref" +
             " where ref.primaryStorageUuid = :psUuid and ref.outputProtocol = :protocol")
             .param("psUuid", psUuid)
             .param("protocol", protocol)
             .execute();
+
+    SQL.New("delete from ExternalPrimaryStorageHostProtocolRefVO ref" +
+            " where ref.primaryStorageUuid = :psUuid and ref.protocol = :protocol")
+            .param("psUuid", psUuid)
+            .param("protocol", protocol)
+            .execute();
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java`
around lines 2430 - 2449, The unregisterOutputProtocol method currently only
deletes PrimaryStorageOutputProtocolRefVO records but leaves orphaned
ExternalPrimaryStorageHostProtocolRefVO records that were created during the
failed initialization attempts. This creates inconsistent state where
host-protocol associations remain even though the protocol is unregistered.
Extend the SQL delete operation in the unregisterOutputProtocol method to also
clean up the ExternalPrimaryStorageHostProtocolRefVO records matching the same
primaryStorageUuid and outputProtocol parameters, ensuring complete rollback
consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In
`@plugin/externalStorage/src/main/java/org/zstack/externalStorage/primary/kvm/ExternalPrimaryStorageKvmFactory.java`:
- Around line 69-79: The getHostStatus method currently uses hostUuid as the map
key when building the result from PrimaryStorageHostRefVO, but the code that
consumes this map at line 271 looks up values using primaryStorageUuid (from
extPs.getUuid()). This mismatch causes the status comparison to consistently
fail and triggers repeated calls to updateHostStatus. Change the map key in the
Collectors.toMap call from hostUuid (t.get(0, String.class)) to
primaryStorageUuid, and update the select clause in the query to include or
replace hostUuid with primaryStorageUuid so that the correct field is available
as the map key.

---

Nitpick comments:
In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java`:
- Around line 2430-2449: The unregisterOutputProtocol method currently only
deletes PrimaryStorageOutputProtocolRefVO records but leaves orphaned
ExternalPrimaryStorageHostProtocolRefVO records that were created during the
failed initialization attempts. This creates inconsistent state where
host-protocol associations remain even though the protocol is unregistered.
Extend the SQL delete operation in the unregisterOutputProtocol method to also
clean up the ExternalPrimaryStorageHostProtocolRefVO records matching the same
primaryStorageUuid and outputProtocol parameters, ensuring complete rollback
consistency.

In
`@test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsVhostVolumeCase.groovy`:
- Around line 670-732: The cleanup logic at the end of the
testAddProtocolRollsBackWhenMajorityHostsFail method (resetting
deployClientShouldFail and detaching the primary storage) currently executes
only if all preceding assertions pass. Wrap the main test logic in a try block
and move the cleanup statements (deployClientShouldFail.set(false) and
detachPrimaryStorageFromCluster) into a finally block to ensure the recovery and
detachment always occur regardless of whether assertions fail, preventing test
environment pollution and state leakage to subsequent tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4be603f5-9689-468e-b4eb-6cc8079b7e69

📥 Commits

Reviewing files that changed from the base of the PR and between 1443509 and f87b7fc.

📒 Files selected for processing (5)
  • docs/zbs-vhost/40-frontend-protocol-api.md
  • plugin/externalStorage/src/main/java/org/zstack/externalStorage/primary/kvm/ExternalPrimaryStorageKvmFactory.java
  • storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java
  • storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorageFactory.java
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsVhostVolumeCase.groovy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants