Description
It looks like pgvector needs to be built for the host machine inside the container on first boot. Otherwise, "Illegal instruction" errors crash the database.
I can reproduce the error with the following statement:
UPDATE document_chunks
SET embedding = pgml.embed('intfloat/e5-large-v2', 'passage: ' || chunk_text),
embedding_model = 'intfloat/e5-large-v2'
WHERE chunk_id = 1;
error communicating with database: unexpected end of file
SELECT pgml.embed('intfloat/e5-large-v2', 'passage: ' || chunk_text)
FROM document_chunks
WHERE chunk_id = 1;
The SELECT above, which runs the same pgml.embed call without the UPDATE, generates the embedding as expected.
I pulled /var/log/postgresql/postgresql-15-main.log and found the following entries:
2024-06-06 23:21:44.980 UTC [22] LOG: server process (PID 3633) was terminated by signal 4: Illegal instruction
2024-06-06 23:21:44.980 UTC [22] DETAIL: Failed process was running:
UPDATE document_chunks
SET
embedding = pgml.embed('intfloat/e5-large-v2', 'passage: ' || chunk_text),
embedding_model = 'intfloat/e5-large-v2'
WHERE
chunk_id = 1;
I'm running a brand-new database on Ubuntu, started with the following command:
docker run -d --name postgresml -v /storage/data/postgresml_data/_data2/:/var/lib/postgresql/ --gpus "device=1" -p 5433:5432 -p 8000:8000 ghcr.io/postgresml/postgresml:2.8.2 bash -c "sudo -u postgresml bash -c 'while true; do sleep 1000; done'"
After some discussion with Lev on Discord, I captured a stack trace, which showed the following error in the postgres process:
Thread 1 "postgres" received signal SIGILL, Illegal instruction.
0x00007cb54708e8ae in array_to_vector (fcinfo=<optimized out>) at src/vector.c:501
501 result->x[i] = DatumGetFloat4(elemsp[i]);
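A SIGILL like the one above usually means the binary contains an instruction the CPU does not implement. The following is a minimal sketch for checking which instruction-set flags the host advertises, assuming a Linux host that exposes a "flags" line in /proc/cpuinfo. The function name cpu_has_flag and the CPUINFO override are hypothetical, and the flag names in the example are illustrative only; the trace does not identify which instruction was rejected.

```shell
#!/usr/bin/env bash
# Sketch: report whether the CPU advertises a given instruction-set flag.
# Reads the Linux "flags" line from /proc/cpuinfo by default; CPUINFO is a
# hypothetical override for pointing the check at another file.
cpu_has_flag() {
    local flag="$1"
    local cpuinfo="${CPUINFO:-/proc/cpuinfo}"
    # Take the first "flags" line, split it into one flag per line,
    # and look for an exact match.
    grep -m1 '^flags' "$cpuinfo" | cut -d: -f2 | tr ' ' '\n' | grep -qx -- "$flag"
}

# Example (flag names are illustrative, not taken from the report):
#   for f in avx avx2 avx512f; do
#       if cpu_has_flag "$f"; then echo "$f: yes"; else echo "$f: no"; fi
#   done
```

Comparing the output on the failing host with the output on a machine where the image works may help narrow down which instruction set the prebuilt pgvector binary assumes.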
Solution / workaround: Compile pgvector inside the Docker container. Rebuilding it from source on the host machine fixed the issue.
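For anyone hitting the same crash, the rebuild can be sketched roughly as follows. This is not the exact command sequence I ran; it assumes git, make, a C compiler, and the PostgreSQL server headers are available in the container, and that the caller may write to PostgreSQL's extension directories. The function name rebuild_pgvector, the DRY_RUN switch, and the /tmp/pgvector checkout path are hypothetical, and the git ref is left to the caller because the pgvector version shipped in the image is not stated here.

```shell
#!/usr/bin/env bash
# Sketch: rebuild pgvector from source inside the running container so the
# binary matches the host CPU.
#
# Usage: rebuild_pgvector <git-ref>   (for example, a pgvector release tag)
# Set DRY_RUN=1 to print the commands instead of running them.
rebuild_pgvector() {
    local ref="$1"
    local src="${PGVECTOR_SRC:-/tmp/pgvector}"   # hypothetical checkout path
    local run=""
    if [ -z "$ref" ]; then
        echo "usage: rebuild_pgvector <git-ref>" >&2
        return 2
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        run="echo"
    fi
    # Clone the requested ref, then build and install with the system compiler.
    $run git clone --branch "$ref" https://github.com/pgvector/pgvector.git "$src" &&
    $run make -C "$src" &&
    $run make -C "$src" install
}
```

Running it with DRY_RUN=1 first shows the commands without touching the system. After the install, reconnect (or restart PostgreSQL) so the rebuilt library is loaded. For image maintainers, pgvector's installation notes also describe building with make OPTFLAGS="" to produce a portable binary rather than one tuned to the build machine.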
Recommended action: Please update the Docker image so that it performs a fresh compile of pgvector, ensuring the binary is compatible with the host processor. (I'm running an 8-year-old i7 Extreme processor, which presumably does not support an instruction used by the prebuilt binary.)