| # | Tweet | Community | Topic | Views | Ratio | Engagement | Posted |
|---|---|---|---|---|---|---|---|
| 1 | [image] okay let me say this out loud again. if you want to run local models on a single RTX 3090, your best option right now is qwen 3.5 27B dense Q4_K_M. 35 tok/s, flat from 4K to 300K+ context, zero speed degradation. thinking mode works. 262K native context on 24GB. slower than MoE | x/LocalLLaMA | — | 27.9K | 1.4x | 650 | Mar 28 |
| 2 | [text] hermes agent is already the best on local models. but i'm working on more edges to make it fly even harder. before that, if your agent keeps crashing on local inference here's what to check: > max_turns: default is tuned for fast frontier models. bump from 30 to 50. slow local | Hermes Agent | — | 21.0K | 0.7x | 420 | May 16 |
| 3 | [image] are you on v0.5.0 too? | Hermes Agent | — | 13.0K | 0.6x | 287 | Mar 29 |
| 4 | [image] teknium just shipped 7 pluggable memory providers. this is massive. your agent can now remember you across sessions with the backend YOU choose. run 'hermes update' right now and then 'hermes memory setup' to pick your provider. if you're on local only, holographic uses SQLite | Hermes Agent | — | 11.3K | 0.5x | 109 | Apr 3 |