From-scratch C++/CUDA inference engine for the NVIDIA RTX 5090 (sm_120a). The best single-GPU backend for agentic AI: tool calling, long-context loops, reasoning, and concurrent sub-agents, built on the fastest single-stream decode on the 5090 (beats llama.cpp; at or ahead of vLLM on NVFP4). 100% written by Claude Code.
Marketplace: Independent
Category: engineering
More like this
Refrax
Command-Line Agentic Refactoring of Java Code
Free
engineering
Opencode Plan Manager
A simple collection of tools for better plan management by AI agents on OpenCode.
Free
engineering
Tabnine
Privacy-first AI code completion for enterprise teams
$12/mo
engineering
Kitwork
Automate kit workflows effortlessly with a lightweight, high-performance, fast, and flexible engine for cloud or self-hosted environments.
Free