Advice to Tenstorrent

If you want to get acquired / become a scam IP licensing co...I can't help you. If you want to win AI compute, read on.

===

This is your 7th stack? Plz bro, one more stack, this stack will be good i promise bro, bro bro plz, one more, make it all back one trade.

You can't build a castle on a shit swamp. LLK is the wrong approach.

===

Tenstorrent's advantage is in more programmability wrt GPUs. Hardware shapes model arch. If you don't expose that programmability, you are guaranteed to lose. sfpi_elu is a problem.

You aren't going to get better deals on tapeouts/IP than NVIDIA/AMD. You need some advantage. But but but it's all open source.

===

If you want a dataflow graph compiler, build a dataflow graph compiler. This is not 6 layers of abstraction, it's 3 (and only 2 you have to build).

1. frontend <PyTorch, ONNX, tensor.py>
2. compiler
3. runtime/driver

===

Start with 3. The driver is fine. The runtime should JUST BE A RUNTIME. I better never see mention of an elu. Make the runtime expose hardware in an application-agnostic way. Compilation, dispatch, queuing, etc...

As long as LLK sits under tt-metalium, you aren't doing this. CUDA is a simple C API for this. I advise doing the same. (A sketch of that shape is at the bottom.)

===

Now for 2. tinygrad is this, but you don't have to use it. MLIR/LLVM is probably fine. ELU still should not be here!!!! This should deal with memory placement, op scheduling, kernel fusion. Not ELU.

This is not easy. But importing 6 abstraction layers of cruft doesn't fix that!!!!

===

Now for 1. self.elu() needs to have the same perf as self.relu() - alpha*(1-self.exp()).relu()

If it doesn't, you messed up. Only once it does are you ready to write elu.

HINT for how to write ELU:

def elu(self, alpha=1.0): return self.relu() - alpha*(1-self.exp()).relu()

HINT is not a hint, it's the actual code.
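A quick check of the HINT, as a minimal sketch assuming tinygrad's Tensor API (relu, exp, and elu all exist there; alpha=1.0 is the standard ELU default). The composed version should agree with the built-in elementwise, and with a real graph compiler underneath it should compile to the same kernels:

from tinygrad import Tensor

def elu_from_primitives(x: Tensor, alpha: float = 1.0) -> Tensor:
    # ELU(x) = x for x > 0, alpha*(exp(x)-1) for x <= 0, built only from relu/exp/sub/mul
    return x.relu() - alpha * (1 - x.exp()).relu()

x = Tensor.randn(4096)
# composed version and built-in should match to float precision
assert ((elu_from_primitives(x) - x.elu()).abs().max() < 1e-5).item()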
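And to make "the runtime should JUST BE A RUNTIME" concrete, a hypothetical sketch of the shape of that interface. These names are made up, not real tt-metalium or CUDA calls, but the split mirrors the CUDA driver API (cuMemAlloc / cuModuleLoad / cuLaunchKernel / streams): memory, program loading, dispatch, queues. No elu, no ops at all.

from typing import Protocol

class Runtime(Protocol):
    def alloc(self, nbytes: int) -> int: ...                    # returns a device buffer handle
    def copy_in(self, buf: int, data: bytes) -> None: ...       # host -> device
    def copy_out(self, buf: int, nbytes: int) -> bytes: ...     # device -> host
    def load_program(self, binary: bytes) -> int: ...           # compiled kernel from layer 2
    def launch(self, prog: int, bufs: list[int], queue: int) -> None: ...  # enqueue a dispatch
    def synchronize(self, queue: int) -> None: ...               # block until the queue drains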