x/LocalLLaMA

members
15 tweets
Columns:
# Tweet User Followers Views Ratio Engagement Posted
1
[image] You don’t “run a model” You run Kernels The model is just a graph The Inference Engine is scheduler / optimizer / executor But the actual work? That happens in the Kernels - MatMul Kernels - Attention Kernels - RMSNorm Kernels - KV cache Kernels - Quantized linear Kernels -
@TheAhmadOsman 56.3K 126.9K 2.3x 1.1K Apr 26
2
[text] 1/ 🧵 I just cracked open the Claude Code source — and what I found isn’t “just a smarter terminal chat.” It’s a full-blown behavioral observatory running in your machine. 1. Keyword sniffers. 2. Hesitation trackers. 3. Hidden trigger words. 4. Telemetry that fingerprints
@UsmanReads 357 53.3K 149.3x 541 Mar 31
3
[image] Just some numbers so you don’t get misled RTX 3090 (7 years old) > 24GB VRAM > Bandwidth: 936.2 GB/s > Bi-directional NVLink 112GB/s RTX PRO 4000 > 24GB VRAM > Bandwidth: 672 GB/s > No Bi-directional NVLink, > need 32 Gen. 5 PCIe Lanes to pool 2 at 64GB/s
@TheAhmadOsman 52.2K 42.1K 0.8x 286 Apr 6
4
[image] How am I connecting the DGX Spark cluster > Mikrotik CRS804-4DDQ 1.6Tbps switch > (4) 400G QSFP-DD to 2x 200G QSFP56 Each DGX Spark has a ConnectX-7 supporting 200Gbps Each cable out of the switch goes into 2 DGX Sparks This allows 8x DGX Sparks cluster at the full 1.6Tbps
@TheAhmadOsman 43.3K 33.0K 0.8x 214 Feb 19
5
[image] X is killing x/LocalLLaMA Community tonight We built the new permanent home - links DOT theahmadosman DOT com/discord-server That's where you go now
@TheAhmadOsman 60.1K 29.6K 0.5x 105 May 30
6
[image] okay let me say this out loud again. if you want to run local models on a single RTX 3090, your best option right now is qwen 3.5 27B dense Q4_K_M. 35 tok/s, flat from 4K to 300K+ context, zero speed degradation. thinking mode works. 262K native context on 24GB. slower than MoE
@sudoingX 20.2K 27.9K 1.4x 650 Mar 28
7
[text] Most people think VRAM = model size and that’s why their runs crash GPU memory math is complex and so are the implications Here’s how it actually works in a nutshell ↓
@TheAhmadOsman 51.8K 23.7K 0.5x 239 Apr 3
8
[image] ~2,000 miles from home - SSH into the mothership - WireGuard tunnel like I’m on LAN - Resolve everything over private DNS - Routes via reverse proxies - Dispatch jobs to GPUs - Sync state across agents - Search conversations, traces, artifacts, logs Read how below
@TheAhmadOsman 56.7K 21.9K 0.4x 238 May 1
9
[text] What am I working on? Condensing everything I do into one place: - local AI / LLMs - inference + benchmarking - hardware + cluster builds - LLM research + notes - agent workflows - real-world perf (tokens/sec, concurrency, thermals) All into a single, searchable, indexable
@TheAhmadOsman 54.4K 17.4K 0.3x 256 Apr 14
10
[image] Bay Area folks, I’m putting together a casual get-together in San Francisco this Saturday, May 9th Pull up to talk local AI, GPUs, infra, opensource, agents, homelabs, and whatever else the GPU-poisoned brain wants to discuss Limited capacity, first come first served
@TheAhmadOsman 57.5K 17.1K 0.3x 115 May 6
11
[text] Help me spread this, I am on a roll and need to squeeze every last bit out of it.
@0xSero 32.3K 16.7K 0.5x 333 Mar 20
12
[image] which one of you is this?
@TheAhmadOsman 51.5K 12.1K 0.2x 235 Apr 2
13
[video] New quick video, more coming up Goal isn't just to show how large models perform but also to show how small and medium models scale up as you multiply # of nodes and how Tensor Parallelism performs on Unified Memory hardware
@TheAhmadOsman 53.2K 10.8K 0.2x 57 Apr 9
14
[image] RTX PRO 6000 (96GB VRAM, ~$15K) GIVEAWAY FAQ Q: Cost to enter? A: $0. Free. Q: Do I have to register for GTC? A: Yes, virtual attendance is COMPLETELY FREE Q: Where do I enter? A: Tap the link in my bio, there’s a clear button on the page Q: How do I increase my chances? A:
@TheAhmadOsman 46.0K 9.5K 0.2x 89 Mar 7
15
[text] Tip of the day Always tell your agent to think through the plan comprehensively covering any and all relevant things, and using best practices and up-to-date information as of April 13th, 2026 (replace with today's date)
@TheAhmadOsman 54.3K 9.1K 0.2x 109 Apr 14
16
[text] I meant to post this in here. I will be posting weekly models meant for limited hardware budgets; right now I am learning how to deal with the ~30B class
@0xSero 37.1K 6.0K 0.2x 94 Mar 28
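
The port arithmetic in tweet 4 (the DGX Spark cluster) can be checked with a short sketch. It uses only the figures quoted in that tweet: four 400G switch ports, each broken out into two 200G links, and a 200Gbps ConnectX-7 per node. The function name and dictionary keys are hypothetical, chosen for illustration, and the calculation covers raw link rates only, not measured throughput.

```python
def cluster_fabric(ports, port_gbps, splits_per_port, nic_gbps):
    """Estimate node count and aggregate link rate for a breakout-cabled switch.

    All inputs are nominal link rates in Gb/s; protocol overhead is ignored.
    """
    link_gbps = port_gbps // splits_per_port   # rate of each breakout leg
    nodes = ports * splits_per_port            # one node per breakout leg
    per_node = min(link_gbps, nic_gbps)        # the slower side sets the rate
    return {
        "nodes": nodes,
        "per_node_gbps": per_node,
        "aggregate_gbps": nodes * per_node,
        "switch_gbps": ports * port_gbps,
    }


# Figures as quoted in the tweet: 4 x 400G ports, 2 x 200G per port, 200G NICs.
result = cluster_fabric(ports=4, port_gbps=400, splits_per_port=2, nic_gbps=200)
print(result)
```

With the tweet's figures this gives 8 nodes at 200 Gb/s each, for 1600 Gb/s in aggregate, which matches the switch's quoted 1.6Tbps.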