TheAhmadOsman

Ahmad ✓

60.1K followers
5 tweets
Communities: x/LocalLLaMA
| # | Tweet | Community | Topic | Views | Ratio | Engagement | Posted |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | [image] You don’t “run a model” You run Kernels The model is just a graph The Inference Engine is scheduler / optimizer / executor But the actual work? That happens in the Kernels - MatMul Kernels - Attention Kernels - RMSNorm Kernels - KV cache Kernels - Quantized linear Kernels - | x/LocalLLaMA | | 126.9K | 2.3x | 1.1K | Apr 26 |
| 2 | [image] Just some numbers so you don’t get misled RTX 3090 (7 years old) > 24GB VRAM > Bandwidth: 936.2 GB/s > Bi-directional NVLink 112GB/s RTX PRO 4000 > 24GB VRAM > Bandwidth: 672 GB/s > No Bi-directional NVLink, > need 32 Gen. 5 PCIe Lanes to pool 2 at 64GB/s | x/LocalLLaMA | | 42.1K | 0.8x | 286 | Apr 6 |
| 3 | [image] How am I connecting the DGX Spark cluster > Mikrotik CRS804-4DDQ 1.6Tbps switch > (4) 400G QSFP-DD to 2x 200G QSFP56 Each DGX Spark has a ConnectX-7 supporting 200Gbps Each cable out of the switch goes into 2 DGX Sparks This allows 8x DGX Sparks cluster at the full 1.6Tbps | x/LocalLLaMA | | 33.0K | 0.8x | 214 | Feb 19 |
| 4 | [image] X is killing x/LocalLLaMA Community tonight We built the new permanent home - links DOT theahmadosman DOT com/discord-server That's where you go now | x/LocalLLaMA | | 29.6K | 0.5x | 105 | May 30 |
| 5 | [text] Most people think VRAM = model size and that’s why their runs crash GPU memory math is complex and so are the implications Here’s how it actually works in a nutshell ↓ | x/LocalLLaMA | | 23.7K | 0.5x | 239 | Apr 3 |
| 6 | [image] ~2,000 miles from home - SSH into the mothership - WireGuard tunnel like I’m on LAN - Resolve everything over private DNS - Routes via reverse proxies - Dispatch jobs to GPUs - Sync state across agents - Search conversations, traces, artifacts, logs Read how below | x/LocalLLaMA | | 21.9K | 0.4x | 238 | May 1 |
| 7 | [text] What am I working on? Condensing everything I do into one place: - local AI / LLMs - inference + benchmarking - hardware + cluster builds - LLM research + notes - agent workflows - real-world perf (tokens/sec, concurrency, thermals) All into a single, searchable, indexable | x/LocalLLaMA | | 17.4K | 0.3x | 256 | Apr 14 |
| 8 | [image] Bay Area folks, I’m putting together a casual get-together in San Francisco this Saturday, May 9th Pull up to talk local AI, GPUs, infra, opensource, agents, homelabs, and whatever else the GPU-poisoned brain wants to discuss Limited capacity, first come first served | x/LocalLLaMA | | 17.1K | 0.3x | 115 | May 6 |
| 9 | [image] which one of you is this? | x/LocalLLaMA | | 12.1K | 0.2x | 235 | Apr 2 |
| 10 | [video] New quick video, more coming up Goal isn't just to show how large models perform but also to show how small and medium models scale up as you multiply # of nodes and how Tensor Parallelism performs on Unified Memory hardware | x/LocalLLaMA | | 10.8K | 0.2x | 57 | Apr 9 |
| 11 | [image] RTX PRO 6000 (96GB VRAM, ~$15K) GIVEAWAY FAQ Q: Cost to enter? A: $0. Free. Q: Do I have to register for GTC? A: Yes, virtual attendance is COMPLETELY FREE Q: Where do I enter? A: Tap the link in my bio, there’s a clear button on the page Q: How do I increase my chances? A: | x/LocalLLaMA | | 9.5K | 0.2x | 89 | Mar 7 |
| 12 | [text] Tip of the day Always tell your agent to think through the plan comprehensively covering any and all relevant things, and using best practices and up-to-date information as of April 13th, 2026 (replace with today's date) | x/LocalLLaMA | | 9.1K | 0.2x | 109 | Apr 14 |
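
The port arithmetic in tweet 3 (DGX Spark cluster) can be sanity-checked with a few lines of Python. This is a minimal sketch: the port count, port speed, breakout factor, and NIC speed are taken from the tweet, while the function name `cluster_summary` and the assumption that each breakout leg gets exactly half the port rate are illustrative, not the author's tooling.

```python
# Figures quoted in tweet 3; everything else here is an illustrative assumption.
SWITCH_PORTS = 4        # 400G QSFP-DD ports on the switch
PORT_GBPS = 400         # line rate per switch port
SPLITS_PER_PORT = 2     # each cable breaks out into 2x 200G QSFP56
NIC_GBPS = 200          # ConnectX-7 rate per DGX Spark, as stated in the tweet


def cluster_summary(ports=SWITCH_PORTS, port_gbps=PORT_GBPS,
                    splits=SPLITS_PER_PORT, nic_gbps=NIC_GBPS):
    """Return node count and bandwidth totals for a breakout-cabled cluster."""
    nodes = ports * splits
    # Assumption: a breakout leg gets an equal share of the port,
    # and a node can never exceed its own NIC rate.
    per_node_gbps = min(port_gbps / splits, nic_gbps)
    return {
        "nodes": nodes,
        "per_node_gbps": per_node_gbps,
        "aggregate_tbps": nodes * per_node_gbps / 1000,
        "switch_tbps": ports * port_gbps / 1000,
    }


summary = cluster_summary()
print(summary)
# {'nodes': 8, 'per_node_gbps': 200.0, 'aggregate_tbps': 1.6, 'switch_tbps': 1.6}
```

Under these assumptions the eight 200 Gbps links add up to the 1.6 Tbps the tweet quotes for the switch, so no node is oversubscribed at the port level.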