[Dr. Ian Cutress] Jim Keller's Big Quiet Box of AI

[-]

auradragon1@reddit

No price, no specs, no performance figures vs competition.

[-]

RetdThx2AMD@reddit

Here is a comment I made about this product 9 months ago when they "launched" it. Not sure why Ian is hyping it as new now, unless the "launch" 9 months ago was not really a launch.

"The n300 with two wormhole chips on a PCIe board, uses 300W, and costs $1400 ( https://tenstorrent.com/hardware/wormhole ). The performance is not great at 466 (FP8) and 131 (FP16) TFLOPs. It makes way more sense to just buy a 4090 or even a 7900XTX. They are shoving this wormhole processor out because they have to in order to keep a business case alive, but I have no doubt they will lose money on it. On the performance side a single MI300X or H100 beats the whole TT-Box thing they are offering. I'll be surprised if they can get anywhere near the value proposition of GeoHot's tinybox, while losing money on every unit sold."

[-]

auradragon1@reddit

I don't understand who the target audience is for something like Wormhole.

It's clearly not able to do any meaningful training. It's an inference chip only. If you're going to use it for local inference, MacBooks offer more value and portability, and generally make a much better computer. Lots of engineers code on Macs, and use the Apple Silicon GPU to do inference as well. So an M4 Max 128GB would be great for local LLMs and coding.

Then if you're looking for a desktop local LLM machine, the M3 Ultra 512GB for $9500 is a far better value than this $15k Quiet Box machine. The M3 Ultra has a faster CPU, is 6x more power efficient, and can run DeepSeek R1 671B Q4 at 19 tokens/s, a much better model than anything Quiet Box can run.

[-]

ghenriks@reddit

It’s about getting hardware out so developers can start developing software that runs on the hardware

The limited production runs mean price/performance aren’t optimal but that is the trade off when introducing new hardware to the market

It’s no different than the currently available RISC-V dev boards that are poor performance for too much money for the mass market but are the only way to get the porting and testing and debugging done for RISC-V versions of operating systems and software

[-]

auradragon1@reddit

It’s about getting hardware out so developers can start developing software that runs on the hardware

You have to give people a reason to want to develop software for a platform.

What's in it for developers? It's clearly not price/performance. So why would developers care? Why would CUDA developers suddenly switch to some small shop, low value hardware when they're making a load of money writing CUDA code?

[-]

xternocleidomastoide@reddit

As is usual for most HW startups, the initial HW is for evaluation purposes by developers. You can get performance estimates from there, to see if it makes sense for the intended audience.

The first generations of their stuff are done on older nodes. So they are not representative of any specific price/performance.

Usually the goal is to show that the team can execute the HW, and that there is some validity to the roadmap, which is usually what the customers are looking at.

[-]

auradragon1@reddit

Customers look for value. You have to offer something even in first gen.

[-]

xternocleidomastoide@reddit

First generation from a startup is a bit different than first generation from an established outfit.

The first generation tends to be geared heavily towards validation for the investors.

[-]

auradragon1@reddit

Novelty hardware still needs an advantage. What is the advantage?

[-]

xternocleidomastoide@reddit

I have no idea, I haven't followed this specific startup. I am simply relaying just how HW startups tend to operate.

The first generations are mainly to keep the investors happy and secure further funding rounds, by showing that the team can execute and that the concept/product has merit, trying to gather developer interest, increasing confidence in the roadmap, etc.

HW startups are extremely risky, and require lots of capital investment, so there is a lot of pressure in terms of validation, exit strategies, etc.

I don't have any opinion on Tenstorrent's value proposition, since, as I said, they are off my radar.

[-]

auradragon1@reddit

The reason investors invest in a hardware startup is because they think it can do something different than the incumbents. Usually, it's speed, performance, price, power, SDK, or a combination of them. So what is Tenstorrent's? That's my point.

[-]

Noble00_@reddit (OP)

It's all there, unless I've misunderstood your statement.

https://tenstorrent.com/en/hardware/tt-quietbox

$15,000 USD, and the specs are there: EPYC 8124P, 4x TT-Wormhole n300 (4x 24GB), aggregate 2.3 TB/s memory bandwidth

There is a performance demo in the video: Llama2 70b, 32 batch, 10.4 t/s per user. They also stated in the video you can find more on the GitHub with more performance figures and this is the one I believe:

https://github.com/tenstorrent/tt-metal2

[-]

auradragon1@reddit

I meant in the video.

The video doesn't talk about any of that stuff.

[-]

Noble00_@reddit (OP)

Timestamp around 11:28 or you can just watch it starting from 5:08 which has it running in the background.

Not entirely sure about the quantization, nor am I too knowledgeable on LMs. But it seems to load the full model weights; I forgot to mention there are 512GB of DDR5-4800 RDIMMs. Again, this is 32 concurrent batches running. That said, I won't argue on the topic of this vs a Mac as I'm not knowledgeable on the topic, but I feel like there is more to it than just t/s.

Yeah, messed up the links sorry.

https://github.com/tenstorrent/tt-metal

For what it's worth, Llama3.1 70B runs at 486.4 t/s (in total, 32 batch?).
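
A quick sanity check on that number, assuming the 486.4 t/s really is an aggregate over 32 concurrent users and is split evenly between them (both are my assumptions, and the helper name is just for illustration):

```python
# Rough per-user throughput, assuming the quoted total is shared evenly
# across every user in the batch (an assumption, not a measured figure).
def per_user_tokens_per_s(total_tokens_per_s: float, batch_size: int) -> float:
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return total_tokens_per_s / batch_size


llama31_per_user = per_user_tokens_per_s(486.4, 32)
print(f"{llama31_per_user:.1f} t/s per user")  # 15.2 t/s per user
```

That is only the batched per-user rate under those assumptions; it says nothing about what a single batch-1 run would do.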

[-]

auradragon1@reddit

Timestamp around 11:28

Am I crazy? Or did she not mention anything about tokens/s at 11:28 and after? She only mentioned some sort of link.

or you can just watch it starting from 5:08

Hardly can see what's going on.

My point is that Ian should have done a better job with the video.

For what it's worth, Llama3.1 70B runs at 486.4 t/s (in total, 32 batch?).

Any numbers for single run?

[-]

ResponsibleJudge3172@reddit

Ian is a huge Keller fanboy. Look how many times he features Keller

[-]

ghenriks@reddit

Tenstorrent announced the next generation of their hardware today with the release of Dev hardware again

Blackhole PCIe cards at this link https://tenstorrent.com/hardware/blackhole

They also released a bunch of developer oriented videos on YouTube

[-]

justgord@reddit

Some nice innovations:

fast Ethernet links between adjacent boards
shared-exponent BFxx data formats: an 8-bit exponent shared across 16 mantissas, and the mantissa can be 2, 4, 6, or 16 bits.
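
For anyone who hasn't seen shared-exponent (block floating point) formats before, here is a minimal Python sketch of the general idea. It is not Tenstorrent's actual encoding: the function names, the rounding and clamping, and counting the sign bit as part of the mantissa width are all my own assumptions for illustration.

```python
import math


def bfp_quantize(block, mantissa_bits):
    """Quantize floats to one shared exponent plus small signed integer mantissas."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # Shared exponent e chosen so that every |x| in the block is below 2**e.
    # (A real format would store this biased in 8 bits; not enforced here.)
    shared_exp = math.frexp(max_abs)[1]
    # Assumption: mantissa_bits includes the sign, so magnitudes get one bit fewer.
    levels = 2 ** (mantissa_bits - 1)
    scale = levels / 2.0 ** shared_exp
    mantissas = [
        max(-(levels - 1), min(levels - 1, round(x * scale))) for x in block
    ]
    return shared_exp, mantissas


def bfp_dequantize(shared_exp, mantissas, mantissa_bits):
    """Rebuild approximate floats from the shared exponent and mantissas."""
    levels = 2 ** (mantissa_bits - 1)
    return [m * 2.0 ** shared_exp / levels for m in mantissas]


def bits_per_value(mantissa_bits, block_size=16, exponent_bits=8):
    """Average storage cost once the exponent is amortized over the block."""
    return mantissa_bits + exponent_bits / block_size


block = [1.0, 0.5, -0.25] + [0.0] * 13      # one block of 16 values
exp, mants = bfp_quantize(block, mantissa_bits=8)
restored = bfp_dequantize(exp, mants, mantissa_bits=8)
```

The appeal is the storage cost: with a block of 16 the shared exponent adds only half a bit per value, so under these assumptions a 4-bit mantissa comes to 4.5 bits per value. The trade-off is that small values sharing a block with a large one lose precision.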

[-]

wfd@reddit

Tenstorrent bet against HBM and thought LLMs had already reached a size limit at GPT-3.5.

Now their products struggle to find customers.

[-]

auradragon1@reddit

Their bandwidth is really slow. The ASICs seem to have the raw TFLOPs but the memory bandwidth is abysmal.

[-]

Glittering_Power6257@reddit

Wait. Is this not an April Fools video?