LLM Inference Optimization
vLLM's PagedAttention:
vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention | vLLM Blog
S-LoRA:
S-LoRA: Serving Thousands of Concurrent LoRA Adapters (arxiv.org)