Agent skill extraction can dead-letter when SkillClusterUpdated fires before agent_case is visible in LanceDB

## Summary

`extract_agent_skill` can enter `dead_letter` when `SkillClusterUpdated` is emitted before the newly written `agent_case` is visible through LanceDB.

In an isolated Linux/Docker run with a synthetic agent trajectory, EverOS successfully generated and indexed an `agent_case`. The downstream `extract_agent_skill` strategy, however, ran before the case was visible in LanceDB, kept failing with `_CaseNotYetIndexedError`, and exhausted its retries.

## Why this matters

Agent memory is one of EverOS's differentiating capabilities. In this case:

- `extract_agent_case` succeeded.
- `trigger_skill_clustering` succeeded.
- `agent_case` Markdown existed.
- `agent_case` was later visible in LanceDB.
- `/search` could retrieve the `agent_case`.
- But `extract_agent_skill` still ended in `dead_letter` because it checked LanceDB before the case was visible.

That means the case-to-skill chain can fail even though the case itself is valid and eventually searchable.

## Reproduction shape

Synthetic agent trajectory:

1. User reports a focused test failure.
2. Assistant/agent runs a focused test.
3. Tool returns a trace.
4. Assistant identifies root cause.
5. Assistant patches the minimal code path.
6. Tool reports focused test pass.
7. Assistant runs the broader test module.
8. Assistant summarizes the reusable debugging lesson.

The sample was sufficient to generate an `agent_case` with quality score `0.98`.

## Observed evidence

`agent_case` result:

```text
agent_case: 1
agent_skill: 0
agent_case search: keyword/vector/hybrid all retrieved the case
```

OME records:

```text
extract_agent_case: success
trigger_skill_clustering: success
extract_agent_skill: dead_letter
```

Failure message:

```text
_CaseNotYetIndexedError: AgentCase entry_id=ac_20260609_00000001 not in LanceDB yet; retrying
```

Relevant code locations:

```text
src/everos/memory/strategies/trigger_skill_clustering.py
src/everos/memory/strategies/extract_agent_skill.py
```

`trigger_skill_clustering` emits `SkillClusterUpdated`, and `extract_agent_skill` then calls `_load_target_case`, which raises `_CaseNotYetIndexedError` if the case is not yet visible in LanceDB.
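The timing problem can be illustrated with a small simulation. This is only a sketch: `FakeIndex`, `run_with_retries`, the 5-second visibility delay, and the retry counts and intervals are all hypothetical stand-ins chosen for illustration, not EverOS or LanceDB APIs and not the values measured in this run. It uses a simulated clock so it runs instantly.

```python
from dataclasses import dataclass, field


class CaseNotYetIndexedError(Exception):
    """Stand-in for the `_CaseNotYetIndexedError` named in this report."""


@dataclass
class FakeIndex:
    """Hypothetical index whose writes become readable only after a delay."""

    visible_after: float  # assumed seconds until a written row is readable
    now: float = 0.0  # simulated clock, advanced by the retry loop
    written_at: dict = field(default_factory=dict)

    def write(self, entry_id: str) -> None:
        self.written_at[entry_id] = self.now

    def get(self, entry_id: str) -> str:
        written = self.written_at.get(entry_id)
        if written is None or self.now - written < self.visible_after:
            raise CaseNotYetIndexedError(entry_id)
        return entry_id


def run_with_retries(index: FakeIndex, entry_id: str, max_attempts: int, delay: float) -> str:
    """Retry a lookup at a fixed interval and return the final task status."""
    for _ in range(max_attempts):
        try:
            index.get(entry_id)
            return "success"
        except CaseNotYetIndexedError:
            index.now += delay  # simulate waiting before the next attempt
    return "dead_letter"


# The case is written and the event fires immediately afterwards.
fast = FakeIndex(visible_after=5.0)
fast.write("ac_example")
status_short = run_with_retries(fast, "ac_example", max_attempts=3, delay=1.0)

# Same write, but the retry window now outlasts the visibility delay.
slow = FakeIndex(visible_after=5.0)
slow.write("ac_example")
status_long = run_with_retries(slow, "ac_example", max_attempts=3, delay=3.0)
```

With these illustrative numbers, the first schedule gives up after 3 simulated seconds and dead-letters, while the second reaches the point where the row is visible. The case is valid in both runs; only the retry window differs.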

## Expected behavior

One of the following would make the chain more reliable:

- Ensure `SkillClusterUpdated` is emitted only after `agent_case` is indexed and visible.
- Let `extract_agent_skill` load the target case from Markdown or the event payload when LanceDB is not yet caught up.
- Increase or rework the retry schedule so that the retry window actually covers the time LanceDB takes to make the case visible.
- Treat `_CaseNotYetIndexedError` as a delayed dependency rather than a normal retry that can quickly exhaust into `dead_letter`.
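Two of the options above, falling back to the payload or Markdown and backing off on a late dependency, could look roughly like the following. This is a sketch under stated assumptions: the `load_target_case` signature, the `payload["case"]` key, the `<entry_id>.md` file layout, and the backoff parameters are hypothetical and are not taken from the EverOS source.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


class CaseNotYetIndexedError(Exception):
    """Stand-in for the `_CaseNotYetIndexedError` named in this report."""


def load_target_case(
    entry_id: str,
    index_lookup: Callable[[str], dict],
    markdown_dir: Path,
    payload: Optional[dict] = None,
) -> dict:
    """Prefer the index, then the event payload, then the Markdown file."""
    try:
        return index_lookup(entry_id)
    except CaseNotYetIndexedError:
        pass  # index has not caught up; try sources that already exist
    if payload is not None and "case" in payload:  # hypothetical payload key
        return payload["case"]
    path = markdown_dir / f"{entry_id}.md"  # hypothetical file layout
    if path.exists():
        return {"entry_id": entry_id, "body": path.read_text(encoding="utf-8")}
    raise CaseNotYetIndexedError(entry_id)


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff in seconds for a dependency that is merely late."""
    return min(cap, base * 2 ** attempt)


def index_not_ready(entry_id: str) -> dict:
    """Simulates an index that has not yet exposed the new row."""
    raise CaseNotYetIndexedError(entry_id)


# Usage: the index lags, but the Markdown file is already on disk.
with tempfile.TemporaryDirectory() as tmp:
    case_dir = Path(tmp)
    (case_dir / "ac_example.md").write_text("# example case\n", encoding="utf-8")
    case = load_target_case("ac_example", index_not_ready, case_dir)

delays = [backoff_delay(n) for n in range(4)]
```

The fallback lets skill extraction proceed without waiting on index visibility. The capped backoff lengthens the retry window without adding many attempts, for the case where neither fallback source is available.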

## Environment

```text
EverOS: 1.0.0 source checkout
Runtime: Docker Linux runtime
Python: 3.12
Data: synthetic agent trajectory only
```

No real user memory or secrets were used.
