x/LocalLLaMA

Columns:

#	Tweet	User	Followers	Views ▼	Ratio	Engagement	Posted
1	[image] You don’t “run a model” You run Kernels The model is just a graph The Inference Engine is scheduler / optimizer / executor But the actual work? That happens in the Kernels - MatMul Kernels - Attention Kernels - RMSNorm Kernels - KV cache Kernels - Quantized linear Kernels -	@TheAhmadOsman ✓	56.3K	126.9K	2.3x	1.1K	Apr 26
2	[text] 1/ 🧵 I just cracked open the Claude Code source — and what I found isn’t “just a smarter terminal chat.” It’s a full-blown behavioral observatory running in your machine. 1. Keyword sniffers. 2. Hesitation trackers. 3. Hidden trigger words. 4. Telemetry that fingerprints	@UsmanReads ✓	357	53.3K	149.3x	541	Mar 31
3	[image] Just some numbers so you don’t get misled RTX 3090 (7 years old) > 24GB VRAM > Bandwidth: 936.2 GB/s > Bi-directional NVLink 112GB/s RTX PRO 4000 > 24GB VRAM > Bandwidth: 672 GB/s > No Bi-directional NVLink, > need 32 Gen. 5 PCIe Lanes to pool 2 at 64GB/s	@TheAhmadOsman ✓	52.2K	42.1K	0.8x	286	Apr 6
4	[image] How am I connecting the DGX Spark cluster > Mikrotik CRS804-4DDQ 1.6Tbps switch > (4) 400G QSFP-DD to 2x 200G QSFP56 Each DGX Spark has a ConnectX-7 supporting 200Gbps Each cable out of the switch goes into 2 DGX Sparks This allows 8x DGX Sparks cluster at the full 1.6Tbps	@TheAhmadOsman ✓	43.3K	33.0K	0.8x	214	Feb 19
5	[image] X is killing x/LocalLLaMA Community tonight We built the new permanent home - links DOT theahmadosman DOT com/discord-server That's where you go now	@TheAhmadOsman ✓	60.1K	29.6K	0.5x	105	May 30
6	[image] okay let me say this out loud again. if you want to run local models on a single RTX 3090, your best option right now is qwen 3.5 27B dense Q4_K_M. 35 tok/s, flat from 4K to 300K+ context, zero speed degradation. thinking mode works. 262K native context on 24GB. slower than MoE	@sudoingX ✓	20.2K	27.9K	1.4x	650	Mar 28
7	[text] Most people think VRAM = model size and that’s why their runs crash GPU memory math is complex and so are the implications Here’s how it actually works in a nutshell ↓	@TheAhmadOsman ✓	51.8K	23.7K	0.5x	239	Apr 3
8	[image] ~2,000 miles from home - SSH into the mothership - WireGuard tunnel like I’m on LAN - Resolve everything over private DNS - Routes via reverse proxies - Dispatch jobs to GPUs - Sync state across agents - Search conversations, traces, artifacts, logs Read how below	@TheAhmadOsman ✓	56.7K	21.9K	0.4x	238	May 1
9	[text] What am I working on? Condensing everything I do into one place: - local AI / LLMs - inference + benchmarking - hardware + cluster builds - LLM research + notes - agent workflows - real-world perf (tokens/sec, concurrency, thermals) All into a single, searchable, indexable	@TheAhmadOsman ✓	54.4K	17.4K	0.3x	256	Apr 14
10	[image] Bay Area folks, I’m putting together a casual get-together in San Francisco this Saturday, May 9th Pull up to talk local AI, GPUs, infra, opensource, agents, homelabs, and whatever else the GPU-poisoned brain wants to discuss Limited capacity, first come first served	@TheAhmadOsman ✓	57.5K	17.1K	0.3x	115	May 6
11	[text] Help me spread this, I am on a roll and need to squeeze every last bit out of it.	@0xSero ✓	32.3K	16.7K	0.5x	333	Mar 20
12	[image] which one of you is this?	@TheAhmadOsman ✓	51.5K	12.1K	0.2x	235	Apr 2
13	[video] New quick video, more coming up Goal isn't just to show how large models perform but also to show how small and medium models scale up as you multiply # of nodes and how Tensor Parallelism performs on Unified Memory hardware	@TheAhmadOsman ✓	53.2K	10.8K	0.2x	57	Apr 9
14	[image] RTX PRO 6000 (96GB VRAM, ~$15K) GIVEAWAY FAQ Q: Cost to enter? A: $0. Free. Q: Do I have to register for GTC? A: Yes, virtual attendance is COMPLETELY FREE Q: Where do I enter? A: Tap the link in my bio, there’s a clear button on the page Q: How do I increase my chances? A:	@TheAhmadOsman ✓	46.0K	9.5K	0.2x	89	Mar 7
15	[text] Tip of the day Always tell your agent to think through the plan comprehensively covering any and all relevant things, and using best practices and up-to-date information as of April 13th, 2026 (replace with today's date)	@TheAhmadOsman ✓	54.3K	9.1K	0.2x	109	Apr 14
16	[text] I meant to post this in here, I will be posting weekly models meant for limited hardware budgets, RN I am learning how to deal with the 30b~ class	@0xSero ✓	37.1K	6.0K	0.2x	94	Mar 28
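
The port arithmetic quoted in row 4 (DGX Spark cluster) can be checked with a short sketch. The figures are copied from that post as shown in the table. The function and variable names are hypothetical, and the check only covers nominal link rates, not measured throughput.

```python
# Nominal figures as quoted in row 4 of the table above.
SWITCH_PORTS = 4      # 400G QSFP-DD ports used on the switch
PORT_GBPS = 400       # nominal rate of each switch port
BREAKOUT_WAYS = 2     # each 400G port breaks out into 2x 200G QSFP56
NIC_GBPS = 200        # ConnectX-7 rate per DGX Spark, as stated in the post


def cluster_summary(ports, port_gbps, breakout_ways, nic_gbps):
    """Summarize node count and nominal bandwidth for a breakout topology."""
    nodes = ports * breakout_ways              # one node per breakout leg
    per_leg_gbps = port_gbps // breakout_ways  # rate available on each leg
    switch_total_gbps = ports * port_gbps      # aggregate switch-side rate
    node_total_gbps = nodes * nic_gbps         # aggregate NIC-side rate
    return {
        "nodes": nodes,
        "per_leg_gbps": per_leg_gbps,
        "switch_total_gbps": switch_total_gbps,
        "node_total_gbps": node_total_gbps,
        # True when every NIC can run at its full nominal rate.
        "full_rate": per_leg_gbps >= nic_gbps,
    }


if __name__ == "__main__":
    summary = cluster_summary(SWITCH_PORTS, PORT_GBPS, BREAKOUT_WAYS, NIC_GBPS)
    for key, value in summary.items():
        print(f"{key}: {value}")
```

With the quoted numbers this gives 8 nodes at 200 Gbps each, and the aggregate on both sides is 1600 Gbps, which matches the 1.6Tbps figure in the post.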