Summary
The MariaDB and pgvector embedding stores build metadata-filter SQL by string-concatenating
filter keys (and, in MariaDB, string values) directly into the query without adequate
escaping. A crafted metadata key in EmbeddingSearchRequest.filter() can break out of its SQL
context and inject arbitrary SQL into the statements executed by the stores' search and
removeAll(Filter) operations.
Details
pgvector — JSON mode (default, COMBINED_JSON / COMBINED_JSONB). JSONFilterMapper
places the key inside a single-quoted SQL literal (the JSON key of the ->> operator) with no
escaping:
(metadata->>'<key>')::text
A key containing a single quote breaks out, e.g.
metadataKey("')::text IS NOT NULL OR pg_sleep(1) IS NOT NULL --") injects a live pg_sleep(1)
(observable as a delay; exploitable for blind data extraction).
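The break-out can be reproduced with plain string concatenation. The following is a minimal sketch, not the library's code: the class and the `unsafeKeyExpression` helper are hypothetical and only mimic the `(metadata->>'<key>')::text` template described above.

```java
// Illustrative only: mimics the vulnerable concatenation pattern,
// it is NOT the actual JSONFilterMapper implementation.
final class JsonKeyBreakoutDemo {

    // Hypothetical helper: embeds the key in the single-quoted literal unescaped.
    static String unsafeKeyExpression(String key) {
        return "(metadata->>'" + key + "')::text";
    }

    public static void main(String[] args) {
        String malicious = "')::text IS NOT NULL OR pg_sleep(1) IS NOT NULL --";
        // The leading quote of the key closes the literal early; everything after it
        // is parsed as SQL, and the trailing "--" comments out the leftover "')::text".
        System.out.println(unsafeKeyExpression(malicious));
        // prints: (metadata->>'')::text IS NOT NULL OR pg_sleep(1) IS NOT NULL --')::text
    }
}
```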
pgvector — column mode (COLUMN_PER_KEY). ColumnFilterMapper used the key as a bare,
unquoted, unvalidated SQL identifier (<key>::<type>), so a key such as 1=1 OR true --
injects directly.
MariaDB — JSON mode (default). JSONFilterMapper placed the key inside the JSON path literal
'$.<key>' unescaped (same break-out mechanism). Additionally, MariaDbFilterMapper.formatValue()
escaped ' but not \; because MariaDB treats backslash as an escape character by default, a
string value ending in a backslash could also break out of its literal.
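To illustrate the trailing-backslash case, here is a small sketch assuming a formatter that doubles single quotes but leaves backslashes untouched. The `quoteOnlyFormat` helper and the surrounding `WHERE` clause are hypothetical stand-ins, not the real `formatValue()` or the SQL the store generates.

```java
// Illustrative only: a stand-in for a value formatter that escapes ' but not \.
final class BackslashValueDemo {

    // Hypothetical helper: doubles single quotes, ignores backslashes.
    static String quoteOnlyFormat(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        String value = "abc\\"; // the characters a, b, c followed by ONE backslash
        // Hypothetical clause, for illustration only.
        String sql = "WHERE name = " + quoteOnlyFormat(value) + " AND other = 'x'";
        System.out.println(sql);
        // prints: WHERE name = 'abc\' AND other = 'x'
        // With backslash escapes enabled (the MariaDB default), \' is an escaped quote,
        // so the literal does not end after abc\ but runs on to the quote before x,
        // and whatever follows that quote is parsed as SQL.
    }
}
```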
MariaDB — column mode (COLUMN_PER_KEY). ColumnFilterMapper fell back to the raw,
unescaped key when the driver could not quote it as an identifier (e.g. a
character).
The filter key is the runtime injection surface; both stores' search() (including pgvector's
HYBRID mode) and removeAll(Filter) are affected. Add/upsert operations are
parameterized and not affected.
Impact
Applications that allow attacker-influenced metadata filter keys (e.g. use
LLM-generated filters) to reach these stores are exposed to SQL injection: blind data
exfiltration, denial of service via sleep functions, and, through removeAll(Filter),
deletion of arbitrary rows. Applications using only hard-coded, developer-defined filter keys
are not exposed.
Patches
Fixed in langchain4j-mariadb and langchain4j-pgvector 1.16.3-beta26:
- JSON filter keys are escaped before being embedded in the SQL string literal (pgvector: single
quotes doubled, correct for PostgreSQL standard_conforming_strings = on; MariaDB: backslash
and single quote).
- MariaDB string values escape both \ and '.
- Column-mode keys are validated/quoted as identifiers and rejected when u
concatenated as raw SQL.
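The escaping rules listed above can be sketched roughly as follows. This is an approximation under stated assumptions, not the patched code: all class and method names are hypothetical, doubling is assumed as the escape form for both backslash and single quote on MariaDB, and the conservative identifier pattern is an illustrative choice rather than the validation the fix actually applies.

```java
import java.util.regex.Pattern;

// Illustrative sketch of the escaping ideas; NOT the patched library code.
final class FilterEscapingSketch {

    // Assumption: a deliberately strict identifier shape for column-mode keys.
    private static final Pattern SAFE_IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // PostgreSQL with standard_conforming_strings = on: inside '...' only the
    // single quote is special, so doubling it is sufficient.
    static String escapePostgresLiteral(String s) {
        return s.replace("'", "''");
    }

    // MariaDB with backslash escapes enabled (the default): both the backslash
    // and the single quote must be neutralised; here both are doubled.
    static String escapeMariaDbLiteral(String s) {
        return s.replace("\\", "\\\\").replace("'", "''");
    }

    // Column mode: reject anything that is not a plain identifier instead of
    // concatenating it into the statement.
    static String requireSafeIdentifier(String key) {
        if (key == null || !SAFE_IDENTIFIER.matcher(key).matches()) {
            throw new IllegalArgumentException("Unsafe metadata key: " + key);
        }
        return key;
    }

    public static void main(String[] args) {
        System.out.println(escapePostgresLiteral("a'b"));  // prints: a''b
        System.out.println(escapeMariaDbLiteral("abc\\")); // prints: abc\\
        System.out.println(requireSafeIdentifier("author")); // prints: author
    }
}
```

Binding values as JDBC parameters avoids literal escaping entirely, but identifiers and JSON keys embedded in the statement text cannot be bound that way, which is why they need escaping or validation.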
Workarounds
- Do not pass untrusted input as metadata filter keys.
- Restrict filter keys to a known allow-list at the application layer.
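An application-layer allow-list might look like the sketch below. The class name and the key set (`author`, `category`, `year`) are made up for illustration; substitute the metadata keys your application actually defines.

```java
import java.util.Set;

// Illustrative application-side guard; names and keys are hypothetical.
final class FilterKeyAllowList {

    // Assumption: the only metadata keys this application ever filters on.
    private static final Set<String> ALLOWED_KEYS = Set.of("author", "category", "year");

    static boolean isAllowed(String key) {
        return key != null && ALLOWED_KEYS.contains(key);
    }

    // Returns the key unchanged if it is on the allow-list, otherwise throws.
    static String requireAllowed(String key) {
        if (!isAllowed(key)) {
            throw new IllegalArgumentException("Metadata filter key not allowed: " + key);
        }
        return key;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("author"));          // prints: true
        System.out.println(isAllowed("1=1 OR true --"));  // prints: false
    }
}
```

Calling `requireAllowed(key)` on every externally supplied key before it is passed to `metadataKey(...)` keeps untrusted strings from reaching the filter mappers at all.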
References
- pgvector: JSONFilterMapper, ColumnFilterMapper
- MariaDB: JSONFilterMapper, MariaDbFilterMapper, ColumnFilterMapper