Communities: x/LocalLLaMA

| # | Tweet | Community | Topic | Views ▼ | Ratio | Engagement | Posted |
|---|---|---|---|---|---|---|---|
| 1 | [image] You don’t “run a model”<br>You run Kernels<br>The model is just a graph<br>The Inference Engine is scheduler / optimizer / executor<br>But the actual work? That happens in the Kernels<br>- MatMul Kernels<br>- Attention Kernels<br>- RMSNorm Kernels<br>- KV cache Kernels<br>- Quantized linear Kernels<br>- | x/LocalLLaMA | — | 126.9K | 2.3x | 1.1K | Apr 26 |
| 2 | [image] Just some numbers so you don’t get misled<br>RTX 3090 (7 years old)<br>> 24GB VRAM<br>> Bandwidth: 936.2 GB/s<br>> Bi-directional NVLink 112GB/s<br>RTX PRO 4000<br>> 24GB VRAM<br>> Bandwidth: 672 GB/s<br>> No Bi-directional NVLink,<br>> need 32 Gen. 5 PCIe Lanes to pool 2 at 64GB/s | x/LocalLLaMA | — | 42.1K | 0.8x | 286 | Apr 6 |
| 3 | [image] How am I connecting the DGX Spark cluster<br>> Mikrotik CRS804-4DDQ 1.6Tbps switch<br>> (4) 400G QSFP-DD to 2x 200G QSFP56<br>Each DGX Spark has a ConnectX-7 supporting 200Gbps<br>Each cable out of the switch goes into 2 DGX Sparks<br>This allows 8x DGX Sparks cluster at the full 1.6Tbps | x/LocalLLaMA | — | 33.0K | 0.8x | 214 | Feb 19 |
| 4 | [image] X is killing x/LocalLLaMA Community tonight<br>We built the new permanent home<br>- links DOT theahmadosman DOT com/discord-server<br>That's where you go now | x/LocalLLaMA | — | 29.6K | 0.5x | 105 | May 30 |
| 5 | [text] Most people think VRAM = model size<br>and that’s why their runs crash<br>GPU memory math is complex<br>and so are the implications<br>Here’s how it actually works in a nutshell ↓ | x/LocalLLaMA | — | 23.7K | 0.5x | 239 | Apr 3 |
| 6 | [image] ~2,000 miles from home<br>- SSH into the mothership<br>- WireGuard tunnel like I’m on LAN<br>- Resolve everything over private DNS<br>- Routes via reverse proxies<br>- Dispatch jobs to GPUs<br>- Sync state across agents<br>- Search conversations, traces, artifacts, logs<br>Read how below | x/LocalLLaMA | — | 21.9K | 0.4x | 238 | May 1 |
| 7 | [text] What am I working on?<br>Condensing everything I do into one place:<br>- local AI / LLMs<br>- inference + benchmarking<br>- hardware + cluster builds<br>- LLM research + notes<br>- agent workflows<br>- real-world perf (tokens/sec, concurrency, thermals)<br>All into a single, searchable, indexable | x/LocalLLaMA | — | 17.4K | 0.3x | 256 | Apr 14 |
| 8 | [image] Bay Area folks, I’m putting together a casual get-together in San Francisco this Saturday, May 9th<br>Pull up to talk local AI, GPUs, infra, opensource, agents, homelabs, and whatever else the GPU-poisoned brain wants to discuss<br>Limited capacity, first come first served | x/LocalLLaMA | — | 17.1K | 0.3x | 115 | May 6 |
| 9 | [image] which one of you is this? | x/LocalLLaMA | — | 12.1K | 0.2x | 235 | Apr 2 |
| 10 | [video] New quick video, more coming up<br>Goal isn't just to show how large models perform but also to show how small and medium models scale up as you multiply # of nodes and how Tensor Parallelism performs on Unified Memory hardware | x/LocalLLaMA | — | 10.8K | 0.2x | 57 | Apr 9 |
| 11 | [image] RTX PRO 6000 (96GB VRAM, ~$15K) GIVEAWAY FAQ<br>Q: Cost to enter?<br>A: $0. Free.<br>Q: Do I have to register for GTC?<br>A: Yes, virtual attendance is COMPLETELY FREE<br>Q: Where do I enter?<br>A: Tap the link in my bio, there’s a clear button on the page<br>Q: How do I increase my chances?<br>A: | x/LocalLLaMA | — | 9.5K | 0.2x | 89 | Mar 7 |
| 12 | [text] Tip of the day<br>Always tell your agent to think through the plan comprehensively covering any and all relevant things, and using best practices and up-to-date information as of April 13th, 2026 (replace with today's date) | x/LocalLLaMA | — | 9.1K | 0.2x | 109 | Apr 14 |
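
Tweets 2 and 5 in the table point at two back-of-envelope calculations: VRAM use is more than the weight file, and memory bandwidth caps decode speed. The sketch below is one rough way to estimate both and is not taken from the tweets themselves. The function names, the 10% overhead factor, and the 8B-parameter model shape (32 layers, 8 KV heads, head dimension 128) are hypothetical values chosen for illustration. Only the 936.2 GB/s and 672 GB/s bandwidth figures come from tweet 2.

```python
GIB = 1024 ** 3


def estimate_vram(n_params, bytes_per_param, n_layers, n_kv_heads, head_dim,
                  context_len, batch_size=1, kv_bytes=2, overhead_frac=0.10):
    """Rough VRAM estimate: weights + KV cache, padded by an assumed overhead."""
    weights = n_params * bytes_per_param
    # Each token stores one K and one V vector per layer per KV head.
    kv_per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes
    kv_cache = kv_per_token * context_len * batch_size
    # overhead_frac is an assumed allowance for activations, buffers, fragmentation.
    total = (weights + kv_cache) * (1 + overhead_frac)
    return {
        "weights_gib": weights / GIB,
        "kv_cache_gib": kv_cache / GIB,
        "total_gib": total / GIB,
    }


def decode_ceiling_tok_s(bandwidth_gb_s, weight_bytes):
    """Upper bound on single-stream decode speed if every token reads all weights once."""
    return bandwidth_gb_s * 1e9 / weight_bytes


if __name__ == "__main__":
    # Hypothetical 8B-parameter model in FP16 with an 8192-token context.
    est = estimate_vram(n_params=8e9, bytes_per_param=2, n_layers=32,
                        n_kv_heads=8, head_dim=128, context_len=8192)
    print({name: round(gib, 2) for name, gib in est.items()})

    weight_bytes = 8e9 * 2
    for name, bandwidth in [("RTX 3090", 936.2), ("RTX PRO 4000", 672.0)]:
        print(name, round(decode_ceiling_tok_s(bandwidth, weight_bytes), 1), "tok/s ceiling")
```

Under these assumptions the weights take about 14.9 GiB and the KV cache adds exactly 1 GiB at 8192 tokens, so the total lands near 17.5 GiB on a 24GB card. The bandwidth ceiling works out to roughly 58.5 tok/s for the RTX 3090 and 42 tok/s for the RTX PRO 4000. Real throughput will be lower, and batching or quantization shifts both numbers.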