@TheAhmadOsman

Communities: x/LocalLLaMA

#	Tweet	Community	Topic	Views	Ratio	Engagement	Posted
1	[image] You don’t “run a model” You run Kernels The model is just a graph The Inference Engine is scheduler / optimizer / executor But the actual work? That happens in the Kernels - MatMul Kernels - Attention Kernels - RMSNorm Kernels - KV cache Kernels - Quantized linear Kernels -	x/LocalLLaMA	—	126.9K	2.3x	1.1K	Apr 26
2	[image] Just some numbers so you don’t get misled. RTX 3090 (7 years old): 24GB VRAM; bandwidth 936.2 GB/s; bi-directional NVLink 112GB/s. RTX PRO 4000: 24GB VRAM; bandwidth 672 GB/s; no bi-directional NVLink, need 32 Gen. 5 PCIe lanes to pool 2 at 64GB/s	x/LocalLLaMA	—	42.1K	0.8x	286	Apr 6
3	[image] How am I connecting the DGX Spark cluster > Mikrotik CRS804-4DDQ 1.6Tbps switch > (4) 400G QSFP-DD to 2x 200G QSFP56 Each DGX Spark has a ConnectX-7 supporting 200Gbps Each cable out of the switch goes into 2 DGX Sparks This allows 8x DGX Sparks cluster at the full 1.6Tbps	x/LocalLLaMA	—	33.0K	0.8x	214	Feb 19
4	[image] X is killing x/LocalLLaMA Community tonight We built the new permanent home - links DOT theahmadosman DOT com/discord-server That's where you go now	x/LocalLLaMA	—	29.6K	0.5x	105	May 30
5	[text] Most people think VRAM = model size and that’s why their runs crash GPU memory math is complex and so are the implications Here’s how it actually works in a nutshell ↓	x/LocalLLaMA	—	23.7K	0.5x	239	Apr 3
6	[image] ~2,000 miles from home - SSH into the mothership - WireGuard tunnel like I’m on LAN - Resolve everything over private DNS - Routes via reverse proxies - Dispatch jobs to GPUs - Sync state across agents - Search conversations, traces, artifacts, logs Read how below	x/LocalLLaMA	—	21.9K	0.4x	238	May 1
7	[text] What am I working on? Condensing everything I do into one place: - local AI / LLMs - inference + benchmarking - hardware + cluster builds - LLM research + notes - agent workflows - real-world perf (tokens/sec, concurrency, thermals) All into a single, searchable, indexable	x/LocalLLaMA	—	17.4K	0.3x	256	Apr 14
8	[image] Bay Area folks, I’m putting together a casual get-together in San Francisco this Saturday, May 9th Pull up to talk local AI, GPUs, infra, opensource, agents, homelabs, and whatever else the GPU-poisoned brain wants to discuss Limited capacity, first come first served	x/LocalLLaMA	—	17.1K	0.3x	115	May 6
9	[image] which one of you is this?	x/LocalLLaMA	—	12.1K	0.2x	235	Apr 2
10	[video] New quick video, more coming up Goal isn't just to show how large models perform but also to show how small and medium models scale up as you multiply # of nodes and how Tensor Parallelism performs on Unified Memory hardware	x/LocalLLaMA	—	10.8K	0.2x	57	Apr 9
11	[image] RTX PRO 6000 (96GB VRAM, ~$15K) GIVEAWAY FAQ Q: Cost to enter? A: $0. Free. Q: Do I have to register for GTC? A: Yes, virtual attendance is COMPLETELY FREE Q: Where do I enter? A: Tap the link in my bio, there’s a clear button on the page Q: How do I increase my chances? A:	x/LocalLLaMA	—	9.5K	0.2x	89	Mar 7
12	[text] Tip of the day Always tell your agent to think through the plan comprehensively covering any and all relevant things, and using best practices and up-to-date information as of April 13th, 2026 (replace with today's date)	x/LocalLLaMA	—	9.1K	0.2x	109	Apr 14
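
The port arithmetic behind row 3 (the DGX Spark cluster post) can be checked with a few lines of Python. This is only a rough sketch using the figures quoted in that post (4 ports at 400G, each broken out into 2x 200G links, one 200Gbps ConnectX-7 per node). The function and variable names are illustrative assumptions, not anything from the original post, and it ignores protocol overhead and real-world throughput.

```python
# Figures quoted in row 3; all names here are hypothetical, for illustration only.
SWITCH_PORTS = 4              # 400G QSFP-DD ports on the switch
PORT_GBPS = 400               # line rate per switch port
BREAKOUT_LINKS_PER_PORT = 2   # each cable splits into 2x 200G QSFP56
NODE_NIC_GBPS = 200           # ConnectX-7 rate per DGX Spark, as stated in the post


def cluster_summary(ports, port_gbps, links_per_port, nic_gbps):
    """Return node count and aggregate line rates for a breakout topology."""
    link_gbps = port_gbps // links_per_port      # rate of each breakout leg
    nodes = ports * links_per_port               # one node per breakout leg
    return {
        "nodes": nodes,
        "link_gbps": link_gbps,
        "switch_gbps": ports * port_gbps,
        # A node cannot use more than its NIC or its link allows.
        "node_total_gbps": nodes * min(link_gbps, nic_gbps),
    }


summary = cluster_summary(
    SWITCH_PORTS, PORT_GBPS, BREAKOUT_LINKS_PER_PORT, NODE_NIC_GBPS
)
print(summary)
```

With the quoted numbers, the eight 200G legs add up to the same 1.6Tbps as the four 400G switch ports, which is the sense in which the post describes the cluster as running "at the full 1.6Tbps".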

Ahmad ✓