Embeddings normalization fixes (#14284)

* Use cosine distance metric for vec tables

* Only apply normalization to multi modal searches

* Catch possible edge case in stddev calc

* Use sigmoid function for normalization for multi modal searches only

* Ensure we get model state on initial page load

* Only save stats for multi modal searches and only use cosine similarity for image -> image search
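
For illustration only, here is a minimal sketch of the ideas named in the bullets above. It is not the code from this commit. It shows cosine distance between two embeddings, a standard deviation with a guard for degenerate input, and a sigmoid that maps z-scored distances to scores in (0, 1). The names `cosine_distance`, `population_stddev`, and `sigmoid_normalize` are hypothetical, and returning a neutral 0.5 when the standard deviation is zero is an assumed way to handle the edge case.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def population_stddev(values):
    """Population standard deviation; 0.0 when fewer than two values are given."""
    n = len(values)
    if n < 2:
        return 0.0
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def sigmoid_normalize(distances):
    """Map raw distances to scores in (0, 1); a smaller distance scores higher."""
    if not distances:
        return []
    mean = sum(distances) / len(distances)
    sd = population_stddev(distances)
    if sd == 0.0:
        # Assumed edge-case handling: no spread means no z-score can be
        # computed, so every result gets the same neutral score.
        return [0.5] * len(distances)
    # sigmoid(-z) == 1 / (1 + exp(z)), where z = (d - mean) / sd
    return [1.0 / (1.0 + math.exp((d - mean) / sd)) for d in distances]
```

Cosine distance ignores vector magnitude, so `cosine_distance([1.0, 0.0], [2.0, 0.0])` is zero even though the two vectors are one unit apart under L2. The sigmoid keeps every score strictly between 0 and 1 without clipping outliers, and the distance equal to the mean maps to 0.5.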
Josh Hawkins
2024-10-11 13:11:11 -05:00
committed by GitHub
parent d4b9b5a7dd
commit 8a8a0c7dec
5 changed files with 41 additions and 26 deletions


@@ -42,12 +42,12 @@ class SqliteVecQueueDatabase(SqliteQueueDatabase):
         self.execute_sql("""
             CREATE VIRTUAL TABLE IF NOT EXISTS vec_thumbnails USING vec0(
                 id TEXT PRIMARY KEY,
-                thumbnail_embedding FLOAT[768]
+                thumbnail_embedding FLOAT[768] distance_metric=cosine
             );
         """)
         self.execute_sql("""
             CREATE VIRTUAL TABLE IF NOT EXISTS vec_descriptions USING vec0(
                 id TEXT PRIMARY KEY,
-                description_embedding FLOAT[768]
+                description_embedding FLOAT[768] distance_metric=cosine
             );
         """)