Qwen3-Coder-Next UD-Q8_K_XL F16 filling up the two-Orin RPC mesh!

Posted by braydon125@reddit | LocalLLaMA | 10 comments

Running great, and as you can see here, llama.cpp's -fit is doing a great job of splitting this evenly. The largest burst of traffic between the two nodes during the initial tensor transfer was <5 Gbps.
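For a rough sense of what link speed means for that initial tensor transfer, here is a back-of-the-envelope sketch. The 40 GB figure is a hypothetical stand-in for the share of the model sent to the worker (the post does not state it), and the helper name transfer_seconds is made up for illustration. It ignores protocol overhead and disk read speed, so treat the results as a lower bound.

```shell
# transfer_seconds SIZE_GB LINK_GBPS
# Prints the seconds needed to move SIZE_GB gigabytes over a link that
# sustains LINK_GBPS gigabits per second (8 bits per byte, no overhead).
transfer_seconds() {
  awk -v gb="$1" -v gbps="$2" 'BEGIN { printf "%.0f\n", gb * 8 / gbps }'
}

transfer_seconds 40 5   # hypothetical 40 GB at the ~5 Gbps peak seen above
transfer_seconds 40 1   # the same 40 GB over a saturated 1 Gbps link
```

This only covers the one-time load; it says nothing about traffic during token generation.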

10 Comments

[-]

Artistic_Okra7288@reddit

I've had a ton of issues with llama.cpp rpc-server lately. Are you not crashing all the time (maybe it's my hardware)? I have a few nodes on a 10G network and when it works, it works well, but it's been crashing on recent releases.

[-]

braydon125@reddit (OP)

Nvitop is the truth bro

[-]

braydon125@reddit (OP)

Go on about your crashes and I'll be able to tell you more. But honestly, if it's not an OOM, I'm pretty rock solid.

[-]

ManufacturerWeird161@reddit

I ran that same model split across dual 4090s and also saw surprisingly low inter-card traffic, around 4-5Gbps during the initial load. The llama.cpp tensor splitting is impressively efficient.

[-]

braydon125@reddit (OP)

It is more a limit of single TCP packet traffic, not anything to do with the NIC or model really.

[-]

braydon125@reddit (OP)

Shameless plug for my NIC monitoring tool, which I used to monitor the tensor traffic: https://github.com/CCSLdirector/Netwave-network-monitor

[-]

ClimateBoss@reddit

What's the llama.cpp command for RPC on 2 computers?

[-]

braydon125@reddit (OP)

On the worker node I run ./rpc-server -p 50052 -H 0.0.0.0 (the host flag is a capital -H; lowercase -h prints help). On the host I run my launch CLI like usual, but with the --rpc flag set to the worker's IP:50052.
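The two commands described above could be sketched as follows. The IP address, model filename, -ngl value, and the function names start_worker and start_host are placeholders for illustration, not the poster's actual setup. The commands are wrapped in functions so that sourcing the snippet starts nothing by itself.

```shell
# Hypothetical values: adjust to your own network and model.
WORKER_IP="192.168.1.20"
RPC_PORT=50052
MODEL="model.gguf"

# Run on the worker node: expose its backend over RPC.
# -H is the bind address, -p the port. 0.0.0.0 listens on every interface,
# so only use it on a network you trust.
start_worker() {
  ./rpc-server -H 0.0.0.0 -p "$RPC_PORT"
}

# Run on the host: the usual launch plus --rpc pointing at the worker.
# -ngl 99 asks for all layers to be offloaded (an assumed setting).
start_host() {
  ./llama-server -m "$MODEL" -ngl 99 --rpc "$WORKER_IP:$RPC_PORT"
}
```

Call start_worker on the worker first, then start_host on the host, so the host finds the RPC endpoint when it loads the model.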

[-]

ClimateBoss@reddit

Is 1 Gb Ethernet good enough, or do I need more? Is rpc-server compiled along with llama-server, or is it a separate installation?

[-]

braydon125@reddit (OP)

You have to build llama.cpp with the RPC flag, and uh, I can't say for sure, but I imagine 1GbE would not be sufficient.
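A minimal sketch of such a build, assuming a fresh checkout of llama.cpp and a CUDA device (as on an Orin). GGML_RPC and GGML_CUDA are the CMake options I know for this; the function name build_llama_rpc is hypothetical, and the poster's exact build line is not given in the thread.

```shell
# Run from the root of a llama.cpp checkout.
# -DGGML_RPC=ON enables the RPC backend and builds rpc-server;
# -DGGML_CUDA=ON is an assumption for NVIDIA hardware, drop it otherwise.
build_llama_rpc() {
  cmake -B build -DGGML_RPC=ON -DGGML_CUDA=ON &&
  cmake --build build --config Release
}
```

With this layout, rpc-server and llama-server both end up under build/bin, so they come from the same build rather than a separate installation.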