lord
The technical terms there are explained and diagrammed later on, and the recommendations are derived from something close to first principles (e.g. roofline analysis).
Do you have benchmarks for the SGLang vs vLLM latency and throughput question? Not to challenge your point, but I’d like to reproduce these results and fiddle with the configs a bit, including on different model and hardware combos.
(happy Modal user btw)
OCD-driven fix: The correct Latin quote is "Gallia est omnis divisa in partes tres".