Global Trend Radar
Web: vllm.ai US web_search 2026-05-05 11:36

vLLM


Analysis Results

Category
AI
Importance
66
Trend Score
30
Summary
vLLM is a high-throughput, memory-efficient inference and serving engine for LLMs (large language models). It aims to make LLM serving easy, fast, and cost-efficient for everyone.
Keywords
vLLM, PagedAttention, continuous batching, OpenAI-compatible API, LLM inference, LLM serving, GPU utilization, open-source models

Source page content (cleaned)
vLLM: the high-throughput and memory-efficient inference and serving engine for LLMs. Easy, fast, and cost-efficient LLM serving for everyone.

Easy: deploy the widest range of open-source models on any hardware, with a drop-in OpenAI-compatible API for instant integration.
Fast: maximize throughput with PagedAttention; advanced scheduling and continuous batching ensure peak GPU utilization.
Cost efficient: slash inference costs by maximizing hardware efficiency, making high-performance LLMs affordable and accessible to everyone.

Quick start: requires Python 3.10+ (3.12+ recommended); uv is recommended for faster, more reliable installation. Install command for the stable CUDA build: uv pip install vllm --torch-backend auto (compatible with all CUDA 13.x versions, 13.0-13.1). For other platforms, see docs.vllm.ai.

Sponsors: vLLM is a community project. Cash donations: a16z, Sequoia Capital, Skywork AI, ZhenFund. Compute resources: Alibaba Cloud, AMD, Anyscale, AWS, Crusoe Cloud, Google Cloud, IBM, Intel, Lambda Lab, Nebius, Novita AI, NVIDIA, Red Hat, Roblox, RunPod, UC Berkeley. Slack sponsor: Inferact. Donations are collected through GitHub and OpenCollective to support the development, maintenance, and adoption of vLLM. Part of the PyTorch Foundation.

Supported hardware (unified API across platforms): NVIDIA CUDA GPU, AMD ROCm GPU, Huawei Ascend NPU, AWS Neuron Accelerator, Google Cloud TPU, IBM Spyre Accelerator, Intel Gaudi, XPU, CPU, Apple Silicon.

Supported open models include: DeepSeek (V4, V3.2, R1), Google Gemma (4, 3), Meta Llama 4 (Scout, Maverick), MiniMax (M2.7, M2.5, M2.1), Mistral AI (Small 4, Large 3), MoonshotAI Kimi (K2.6, K2.5, K2), NVIDIA Nemotron 3 (Super, Nano), Qwen (3.6, 3.5, 3), StepFun Step-3.5-Flash, Z-AI GLM (5.1, 5, 4.7).

Community and resources: Slack (real-time help and discussions), Forum (searchable Q&A knowledge base), GitHub Issues (bug reports and feature requests), recipes.vllm.ai (example notebooks and tutorials), perf.vllm.ai (performance benchmarks and comparisons), roadmap.vllm.ai (project roadmap and milestones).
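The quick-start and OpenAI-compatible API mentioned above can be sketched end to end as follows. This is a minimal sketch, not the page's own example; the install command is the one shown on the page, while the model name is a placeholder assumption chosen for illustration:

```shell
# Install vLLM with uv (install command from the page; stable CUDA build)
uv pip install vllm --torch-backend auto

# Start the OpenAI-compatible server (model name is an assumed placeholder)
vllm serve Qwen/Qwen2.5-0.5B-Instruct

# Query the OpenAI-style chat completions endpoint (vLLM's default port is 8000)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2.5-0.5B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

Because the server speaks the OpenAI API, existing OpenAI client libraries can point at the local endpoint without code changes, which is what "drop-in" integration refers to.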

Similar Articles (vector nearest neighbors)