Local AI

Columns:

#	Tweet	User	Followers	Views	Ratio	Engagement	Posted
1	[text] Breakthroughs: 1. Turboquant merged into vLLM 75% VRAM reduction for KV cache, near lossless 2. Someone merged M2.7 & M2.5 & got it to perform better than M2.7 3. 40% faster prefill on AMD Strix Halo (128GB for MoEs w 10B active params) 4. Megatrain 100B model trained on 1 GPU	@0xSero ✓	43.7K	54.2K	1.2x	1.3K	Apr 16
2	[text] Here’s what I’d recommend if you’re just getting started in AI, local or otherwise. 1. Work with the compute you have, even the dumbest LLMs can be useful if you treat them as a node in your system. Some basic problems of what could be useful to get you started - tag all	@0xSero ✓	40.7K	23.1K	0.6x	753	Apr 3
3	[image] Best harnesses for local models: 1. Droid: - Very good performance, forces the models to behave, you can wire in all your local LLMs very easily w BYOK - Allows you to use your local models as orchestrators/subagents so you can benefit from Cloud as models as well - Practically	@0xSero ✓	41.6K	19.5K	0.5x	478	Apr 4
4	[image] Guide to running BIG B0Is on your small hardware. 1. Use REAPs: up to 50% savings 2. Use quantisations: 75% savings - AWQ / GPTQ / W4A16 / FP8 = FAST inference - GGUF / EXL3 = Slow but just works - MLX = Best for Apple 3. Use 8-bit KV cache: 50-75% savings	@0xSero ✓	43.4K	14.6K	0.3x	353	Apr 14
5	[text] Local AI is a human right, our children, families, neighbours, friends, and fellow humans deserve privacy and freedom.	@0xSero ✓	42.2K	6.2K	0.1x	202	Apr 7