[vector] Support raw fallback for vector search #8302
Open
JingsongLi wants to merge 1 commit into
0221d5c to c3f0b4f
leaves12138
requested changes
Jun 20, 2026
leaves12138
left a comment
Contributor
Thanks for the PR. I found two correctness issues in the raw-fallback path that should be fixed before merging:
- Scalar pre-filter can drop valid rows when the scalar index only partially overlaps a vector-indexed range.
`VectorScanImpl` attaches any scalar index whose row range intersects the vector range, and `AbstractVectorRead.preFilter` then turns the scalar-index result into a global `includeRowIds` bitmap. Rows inside the vector-indexed range but outside the scalar-index coverage are therefore excluded from vector search even though they still need to be evaluated by the residual table filter. I reproduced this with a vector index covering `[0, 9]`, a btree index on `id` covering only `[3, 7]`, and filter `id >= 8`; the query should return row `8`, but returns an empty result. The test command was:
mvn -pl paimon-core -am -DskipITs -Dcheckstyle.skip -Drat.skip=true -Dspotless.check.skip=true -DfailIfNoTests=false -Dtest=VectorSearchBuilderTest#testPartialScalarPreFilterMustNotDropUnindexedScalarRows test
- Raw-only fallback uses the wrong metric when there is no vector index split.
In `FULL`/`DETAIL` mode, a table can have raw ranges but no vector index files yet. In that case `globalIndexer` is null and `rawSearchMetric` falls back to `l2`, ignoring the configured vector-index metric. For a cosine table with vectors `[100, 0]` and `[0.9, 0.1]`, querying `[1, 0]` should return row `0`, but raw-only search returns row `1` because it ranks by L2. I reproduced this with:
mvn -pl paimon-core -am -DskipITs -Dcheckstyle.skip -Drat.skip=true -Dspotless.check.skip=true -DfailIfNoTests=false -Dtest=VectorSearchBuilderTest#testFullModeRawOnlyUsesConfiguredMetric test
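The ranking flip described in the second item can be checked by hand. The sketch below is not Paimon code: the class and method names are hypothetical, and it only reproduces the arithmetic for the vectors quoted above under the two metrics.

```java
// Hypothetical, standalone illustration; none of these names come from Paimon.
class MetricRankingDemo {

    /** Euclidean (L2) distance; smaller means closer. */
    static double l2Distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity; larger means closer. */
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Index of the row with the smallest L2 distance to the query. */
    static int nearestByL2(double[][] rows, double[] query) {
        int best = 0;
        for (int i = 1; i < rows.length; i++) {
            if (l2Distance(rows[i], query) < l2Distance(rows[best], query)) {
                best = i;
            }
        }
        return best;
    }

    /** Index of the row with the largest cosine similarity to the query. */
    static int nearestByCosine(double[][] rows, double[] query) {
        int best = 0;
        for (int i = 1; i < rows.length; i++) {
            if (cosineSimilarity(rows[i], query) > cosineSimilarity(rows[best], query)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] rows = {{100, 0}, {0.9, 0.1}};
        double[] query = {1, 0};
        System.out.println("cosine picks row " + nearestByCosine(rows, query)); // row 0
        System.out.println("l2 picks row " + nearestByL2(rows, query));         // row 1
    }
}
```

Row `0` is parallel to the query (cosine similarity 1.0) but 99 units away in L2, while row `1` is about 0.14 away in L2 with cosine similarity of roughly 0.994, so the two metrics disagree on the top result.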
I think the first issue needs a conservative scalar pre-filter: either only use scalar indexes when their coverage is complete for the vector split, or add the scalar-index-uncovered portions of the vector split back to the candidate bitmap so the residual filter can still be applied. For the second issue, raw search needs to derive the metric from the configured vector index type/options even when no index file exists, rather than defaulting to L2.
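A minimal sketch of the second option for the first issue (adding the uncovered portion back to the candidate bitmap) follows. It uses `java.util.BitSet` as a stand-in for the real row-id bitmap, assumes for simplicity that row ids equal the `id` values, and all class and method names are hypothetical rather than taken from the PR.

```java
import java.util.BitSet;

// Hypothetical illustration of a conservative scalar pre-filter; not Paimon code.
class ConservativePreFilterSketch {

    /** Row ids in the inclusive range [from, toInclusive]. */
    static BitSet range(int from, int toInclusive) {
        BitSet rows = new BitSet();
        rows.set(from, toInclusive + 1); // BitSet.set treats the upper bound as exclusive
        return rows;
    }

    /** The behaviour described in the review: scalar matches become the whole candidate set. */
    static BitSet naiveCandidates(BitSet scalarMatches) {
        return (BitSet) scalarMatches.clone();
    }

    /**
     * Conservative variant: keep scalar matches, and also keep every row of the
     * vector range that the scalar index does not cover, so the residual table
     * filter can still evaluate those rows later.
     */
    static BitSet conservativeCandidates(
            BitSet vectorRange, BitSet scalarCoverage, BitSet scalarMatches) {
        BitSet uncovered = (BitSet) vectorRange.clone();
        uncovered.andNot(scalarCoverage);

        BitSet candidates = (BitSet) scalarMatches.clone();
        candidates.and(vectorRange);
        candidates.or(uncovered);
        return candidates;
    }

    public static void main(String[] args) {
        BitSet vectorRange = range(0, 9);
        BitSet scalarCoverage = range(3, 7);
        BitSet scalarMatches = new BitSet(); // id >= 8 matches nothing inside [3, 7]

        System.out.println(naiveCandidates(scalarMatches)); // {}
        System.out.println(
                conservativeCandidates(vectorRange, scalarCoverage, scalarMatches)); // {0, 1, 2, 8, 9}
    }
}
```

With the conservative set, rows `0`, `1`, `2`, `8`, and `9` all reach the vector search, and the residual filter `id >= 8` is what removes rows `0` to `2`. The trade-off is a larger candidate set whenever scalar coverage is partial.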
c3f0b4f to b3a9cc1
b3a9cc1 to 7a5af13
Summary
Support vector search over unindexed raw rows when `global-index.search-mode` is configured as `full` or `detail`, while keeping the default `fast` mode index-only. This improves vector search freshness when new data has been written but the vector global index has not yet been rebuilt.
Changes
- `VectorGlobalIndexer` interface so vector index implementations can expose their metric for raw score computation.
- `full`/`detail` modes.
Testing
- `mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=VectorSearchBuilderTest test`
- `mvn -pl paimon-common,paimon-core,paimon-lumina,paimon-vector,paimon-spark/paimon-spark-common -am -Pfast-build -DskipTests -DfailIfNoTests=false compile`
- `mvn -pl paimon-common,paimon-core,paimon-lumina,paimon-vector,paimon-spark/paimon-spark-common -DskipTests spotless:check`
- `git diff --check origin/master..HEAD`
Notes
- `mvnflink` is not available in this local environment (`mvnflink` not found), so verification used Maven directly.