Issues with michaelf34/infinity:latest-cpu + Qwen3-Embedding-8B

Posted by Patentsmatter@reddit | LocalLLaMA | 2 comments

I tried building a docker container to have infinity use the Qwen3-Embedding-8B model in a CPU-only setting. But once the docker container starts, the CPU (Ryzen 9950X, 128GB DDR5) is fully busy even without any embedding requests. Is that normal, or did I configure something wrong?

Here's the Dockerfile:

FROM michaelf34/infinity:latest-cpu
RUN pip install --upgrade transformers accelerate

Here's the docker-compose:

version: '3.8'
services:
  infinity:
    build: .
    ports:
      - "7997:7997"
    environment:
      - DISABLE_TELEMETRY=true
      - DO_NOT_TRACK=1
      - TOKENIZERS_PARALLELISM=false
      - TRANSFORMERS_CACHE=.cache
    volumes:
      - ./models:/models:ro
      - ./cache:/.cache
    restart: unless-stopped
    command: infinity-emb v2 --model-id /models/Qwen3-Embedding-8B

Startup command was:

docker run -d -p 7997:7997 --name qwembed-cpu -v $PWD/models:/models:ro -v ./cache:/app/.cache qwen-infinity-cpu v2 --model-id /models/Qwen3-Embedding-8B --engine torch
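To tell idle load apart from request load, it may help to send a single embedding request and compare CPU usage before and after. Below is a minimal sketch, not a verified client: it assumes the server exposes an OpenAI-style POST /embeddings route on the published port 7997 and that the model is addressed by the same path passed to --model-id. The names BASE_URL, MODEL_ID, build_embedding_request and embed are made up for illustration.

```python
import json
import urllib.request

# Assumptions (not from the original post): an OpenAI-style POST /embeddings
# route on port 7997, and a model name equal to the --model-id path.
BASE_URL = "http://localhost:7997"
MODEL_ID = "/models/Qwen3-Embedding-8B"


def build_embedding_request(texts, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for a batch of texts."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, timeout=600):
    """Send the request and return one embedding vector per input text."""
    request = build_embedding_request(texts)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"data": [{"embedding": [...]}, ...]}
    return [item["embedding"] for item in payload["data"]]


if __name__ == "__main__":
    # Requires the container to be running; CPU inference on an 8B model
    # can be slow, hence the generous timeout above.
    vectors = embed(["hello world"])
    print(len(vectors), "vector(s) of dimension", len(vectors[0]))
```

If the request returns and the CPU is still pegged afterwards, the load is probably not tied to request handling.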


Silentoplayz@reddit

I had a similar issue earlier, but with the latest-rocm tagged image of infinity. Mind you, this was my first attempt at using Infinity. While pulling the image's layers, Docker somehow used up the remaining 50GB on my OS drive, down to 0 bytes free. I had to remove the docker image manually using the CLI to reclaim the storage.

Emergency_Fuel_2988@reddit

I get this error, with n. How did you get past it?

free(): double free detected in tcache 2