🌵 Speculative Speculative Decoding: What if your draft model could speculate while the target model is still verifying? That's the idea behind Speculative Speculative Decoding (SSD). I've been… | Maxime Labonne | 15 comments
15 views
2 months ago
linkedin.com
Measuring Qwen3.6-27B NVFP4 MTP on vLLM: ~190 tok/s TG on Dual RTX PRO 6000 Blackwell Max-Q
1 week ago
loftllc.dev
Vienna vLLM Meetup Live Stream - March 11, 2026 | Ajit Joshi
15.3K views
2 months ago
linkedin.com
Speculative Decoding — Think Fast⚡, Then Think Right✅
Apr 13, 2025
substack.com
How to Quadruple LLM Decoding Performance with Speculative Decoding (SpD) and Microscaling (MX) Formats on Qualcomm® Cloud AI 100
Aug 1, 2024
qualcomm.com
Faster LLMs: Accelerate Inference with Speculative Decoding
11 months ago
ibm.com
17:15
Multi-Token Prediction (MTP): Accelerating Local Models with no Quality Loss
1.4K views
1 week ago
YouTube
Onchain AI Garage
6:13
Speculative Decoding: Make AI 2-3x Faster for Free | Tech Decoded
3 views
1 month ago
YouTube
Toc am
8:37
Multi-Token Prediction: Why Your GPU Runs LLMs 3x Faster
4 views
1 week ago
YouTube
Devsplainers
3:08
What is Speculative Decoding ?
38 views
2 weeks ago
YouTube
DeepManim
7:09
Don't use speculative decoding until you watch this
7 views
3 weeks ago
YouTube
DigitalOcean
9:05
DFlash Just Hit Google TPUs — 3x Faster LLM Inference is Now Real
3K views
2 weeks ago
YouTube
Fahd Mirza
8:43
DFlash Drafter for Gemma 4 26B - Official Speculative Decoding is Here: Run Locally
4.8K views
2 weeks ago
YouTube
Fahd Mirza
1:31
ContextForge — AMD AI Hackathon 2026
1 week ago
YouTube
Pablo Manuel Suarez
40:19
Speculation is all you need: Intro to Speculative Decoding for High Performance Inference
753 views
2 months ago
YouTube
Modal
8:27
600 Toks/Second Gemma4-26B —The Setting That Actually Wins (vLLM + Dflash Speculative Decoding)
3.4K views
1 week ago
YouTube
Tech-Practice
5:04
Speculative Decoding: 2-3x Faster LLMs for Free
1 views
1 month ago
YouTube
The AI Century
0:36
The regex trick that beats structured output #ai #coding #performance
1.3K views
1 month ago
YouTube
Jimi V. (Bitswired)
0:38
3.5K+ Stars • AI/ML | DFlash — Faster LLM Inference via Block Diffusion #shorts
1.1K views
1 week ago
YouTube
neural-nexus
0:31
Speculative Decoding • LLM Acceleration Patterns
1 views
1 month ago
YouTube
Technical Interview Essentials A–Z
0:48
5 AI Terms Devs Are Quietly Searching More — April 2026
194 views
3 weeks ago
YouTube
Colony-AI
9:32
Qwen3.6-27B NVFP4+MTP vLLM Benchmark TG 190tok/s — RTX PRO 6000 Blackwell Max-Q x 2
303 views
2 weeks ago
YouTube
ksh3
12:45
Speculative Decoding & Inference Speed — 2-3x Faster LLMs With Zero Quality Loss
1 week ago
YouTube
Jeff Heidelberger
2:35
Don't follow blindly! SPEED-Bench tests whether Speculative Decoding in vLLM is worth it
4 views
2 months ago
YouTube
AI 决策内参
4:56
NeMo RL: Speculative Decoding speeds up 8B rollouts to 1.8×, with an estimated 2.5× for 235B
5 views
2 weeks ago
YouTube
智用
2:08
How ChatGPT Serves 100M Users in Real Time ⚡ (LLM Inference, Explained)
4 views
2 weeks ago
YouTube
Priya Bansal
3:54
2026-04-30 | AI inference engineering choices for backend engineers: from batching to workload-specific runtime
1 week ago
YouTube
TodayShip
0:26
Researchers found a way to make LLMs 8.5x faster! (without compromising accuracy) Speculative decoding is quite an effective way to address the single-token bottleneck in traditional LLM inference. A small "draft" model first generates the next several tokens, then the large model verifies all of them at once in a single forward pass. If a token at any position is wrong, you keep everything before it and restart from there. This never does worse than normal decoding. But current drafters in Speculati
10K views
1 week ago
x.com
Avi Chawla
1:09:25
Lecture 22 - Hacker's Guide to Speculative Decoding in vLLM
1 views
3 months ago
bilibili
安得广厦千万间678
0:10
Google just made Gemma 4 up to 3x faster. Zero quality loss. Multi-Token Prediction (MTP) drafters use speculative decoding: → A lightweight drafter predicts several tokens at once → The main model verifies them all in one pass → Same output quality, up to 3x less wait time. Where it matters: → 26B MoE and 31B Dense on consumer GPUs → E2B and E4B on edge/mobile devices → Coding assistants, agents, voice apps. Apache 2.0. Available now on Hugging Face, Kaggle, Ollama, vLLM, SGLang. Gemma 4 hit 60M downloads
68 views
2 weeks ago
x.com
Ramesh Dontha 🦉
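Several of the results above describe the same draft-and-verify loop: a small drafter proposes a few tokens, the large model checks them, and everything before the first mismatch is kept. The following is only a minimal sketch of that idea under greedy decoding, not how vLLM or any listed project implements it. The functions `draft_model` and `target_model` are hypothetical toy stand-ins (plain arithmetic on token ids), and the verification loop calls the target once per position where a real engine would score all drafted positions in one batched forward pass.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def target_model(tokens: List[int]) -> int:
    """Toy stand-in for the large model (assumption, not a real LLM)."""
    return (sum(tokens) * 31 + len(tokens)) % 50


def draft_model(tokens: List[int]) -> int:
    """Toy stand-in for the drafter: agrees with the target except
    when the prefix length is a multiple of 4."""
    tok = target_model(tokens)
    return tok if len(tokens) % 4 else (tok + 1) % 50


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """One draft-and-verify round; returns the tokens accepted this round."""
    # 1. The drafter proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))

    # 2. The target checks each drafted position in order.
    accepted: List[int] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: keep everything before it, substitute the
            # target's own token, and discard the rest of the draft.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every drafted token matched: the target's pass yields one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


def speculative_generate(prefix: List[int], draft: NextToken,
                         target: NextToken, k: int, max_new: int) -> List[int]:
    """Repeat draft-and-verify rounds until max_new tokens are produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + max_new]


def greedy_generate(prefix: List[int], target: NextToken,
                    max_new: int) -> List[int]:
    """Plain one-token-at-a-time decoding with the target, for comparison."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(target(out))
    return out


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec = speculative_generate(prompt, draft_model, target_model, k=4, max_new=20)
    base = greedy_generate(prompt, target_model, max_new=20)
    print(spec == base)
```

Because every accepted token is either confirmed or supplied by the target, the sketch produces the same sequence as plain greedy decoding with the target alone. Any speedup comes only from how many drafted tokens are accepted per round, so each round yields at least one token and at most `k + 1`.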