vLLM
First-class integration through vLLM's load_format extension: snapshot a live vLLM engine, restore it into a fresh process, and rebuild the prefix-cache hash table on the way back. RFC #34303, covering sleep-mode integration, is in flight upstream.
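Why does the prefix-cache hash table need rebuilding rather than copying? The sketch below shows one way to do it. It is loosely modeled on vLLM's automatic prefix caching, where each full KV-cache block is keyed by a hash that chains the parent block's hash with the block's own token IDs. The data layout (a list of `(token_ids, physical_block_ids)` pairs per restored sequence), `block_hash`, `rebuild_prefix_index`, the SHA-256 choice, and the block size of 16 are all illustrative assumptions, not thaw's actual implementation.

```python
import hashlib
from typing import Dict, List, Optional, Sequence, Tuple

BLOCK_SIZE = 16  # tokens per KV-cache block (assumed for this sketch)


def block_hash(parent: Optional[bytes], tokens: Sequence[int]) -> bytes:
    """Hypothetical chained hash: a block's key covers its tokens and every block before it."""
    h = hashlib.sha256()
    h.update(parent or b"")  # the root block has no parent
    h.update(b"|")
    h.update(",".join(map(str, tokens)).encode())
    return h.digest()


def rebuild_prefix_index(
    restored: List[Tuple[List[int], List[int]]], block_size: int = BLOCK_SIZE
) -> Dict[bytes, int]:
    """Map chained block hash -> physical block id for every full block of every restored sequence."""
    index: Dict[bytes, int] = {}
    for token_ids, physical_blocks in restored:
        parent: Optional[bytes] = None
        # Only full blocks are hashed; a trailing partial block is not shareable.
        for i in range(len(token_ids) // block_size):
            chunk = token_ids[i * block_size:(i + 1) * block_size]
            parent = block_hash(parent, chunk)
            # Sequences that share a prefix resolve to the same key; keep the first block seen.
            index.setdefault(parent, physical_blocks[i])
    return index


if __name__ == "__main__":
    a = list(range(32))                              # two full blocks
    b = list(range(16)) + list(range(100, 116))      # shares block 0 with `a`
    idx = rebuild_prefix_index([(a, [0, 1]), (b, [0, 2])])
    print(len(idx))  # 3 distinct blocks: the shared prefix plus one tail each
```

Because each key depends on the whole prefix, the table can be recomputed from the restored token IDs and block assignments alone, which is why rebuilding it on restore is cheap compared with moving the KV data itself.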
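```python
from vllm import LLM
from thaw_vllm import fork

# load_format="thaw" routes weight loading through thaw's vLLM integration
parent = LLM(model="meta-llama/Llama-3.1-8B", load_format="thaw")

# snapshot the live parent engine and restore it into n=4 fresh children
children = fork(parent, n=4)  # PCIe Gen5 line-rate
```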
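To put "PCIe Gen5 line-rate" in concrete terms, here is a back-of-envelope estimate of how long copying Llama-3.1-8B's weights would take at that rate. It assumes roughly 8.03B parameters stored in bf16 (2 bytes each), a single PCIe Gen5 x16 link (32 GT/s per lane, 128b/130b encoding), and counts weights only, with no KV cache. It also ignores packet and protocol overhead, so real transfers land somewhat below this ceiling. The helper `transfer_seconds` is hypothetical and not part of thaw.

```python
def transfer_seconds(
    n_params: float,
    bytes_per_param: int = 2,   # bf16
    gts_per_lane: float = 32.0, # PCIe Gen5 raw signaling rate, GT/s
    lanes: int = 16,
) -> float:
    """Lower bound on host-to-GPU copy time at PCIe line rate (encoding overhead only)."""
    bits_per_sec = gts_per_lane * 1e9 * lanes * (128 / 130)  # 128b/130b line encoding
    bytes_per_sec = bits_per_sec / 8                          # about 63 GB/s for Gen5 x16
    return n_params * bytes_per_param / bytes_per_sec


if __name__ == "__main__":
    print(f"{transfer_seconds(8.03e9):.3f} s")  # prints 0.255 s
```

Under these assumptions a single 8B-parameter copy is on the order of a quarter of a second; whether n=4 children share one link or each get their own depends on the machine's topology.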