Deepsilicon claims to run neural nets with ~5x less RAM and ~20x faster. They are building software and custom silicon for it

Posted by hamada0001@reddit | LocalLLaMA | 43 comments

Apparently "representing transformer models as ternary values (-1, 0, 1) eliminates the need for computationally expensive floating-point math".
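This isn't their actual implementation (which isn't public), but the general idea behind the quoted claim can be sketched in a few lines. Ternary quantization schemes along these lines (e.g. the absmean scaling used in the BitNet b1.58 paper) round each weight to -1, 0, or 1 with a single per-tensor scale, so a matrix-vector product reduces to additions and subtractions of activations with no floating-point multiplies in the inner loop. The function names below (`ternarize`, `ternary_matvec`) are illustrative, not from Deepsilicon:

```python
import numpy as np

def ternarize(w, eps=1e-8):
    """Round weights to {-1, 0, 1} after scaling by the mean |w|
    (absmean-style quantization, as in BitNet b1.58)."""
    scale = np.mean(np.abs(w)) + eps
    w_t = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return w_t, scale

def ternary_matvec(w_t, x):
    """Matrix-vector product where w_t holds only -1/0/+1.
    Each output element is a sum of +x[j] and -x[j] terms,
    so the inner loop needs no multiplications at all."""
    out = np.zeros(w_t.shape[0])
    for i in range(w_t.shape[0]):
        out[i] = x[w_t[i] == 1].sum() - x[w_t[i] == -1].sum()
    return out

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
x = rng.normal(size=8)

W_t, s = ternarize(W)
approx = s * ternary_matvec(W_t, x)  # one rescale per layer output
exact = W @ x                        # full-precision reference
```

The catch, of course, is the gap between `approx` and `exact`: the whole question is whether models trained (or fine-tuned) under this constraint keep their accuracy, and how much custom hardware can exploit the multiply-free structure.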

Seems a bit too easy, so I'm skeptical. Thoughts on this?