Joshua V. Dillon

I am a machine learning researcher based in Mountain View, CA.

My work spans generative AI and information theory and includes training foundational models at Google and Luma as well as academically notable research in probabilistic modeling, variational inference, and uncertainty estimation. I created TensorFlow Probability (2017; 4.4k GitHub stars, still on the order of 1M downloads per month). I co-created the prototype that would become Veo (2024). I contributed to Gemini (2024) and VideoPoet (2023; ICML Best Paper). I designed and wrote the auction mechanism used by ContentAds (2013–2019). My best-known paper is Deep Variational Information Bottleneck (2017), for which I co-developed the idea and math with my dear friend Alex Alemi.

I was a Staff Research Scientist at Google Research and Google DeepMind for a combined total of 13 years. Most recently, I led the foundational model pre-training team at Luma.

I received my Ph.D. from the Georgia Institute of Technology, advised by Professor Guy Lebanon. My thesis, Stochastic m-Estimators for Controlling Accuracy-Cost Tradeoffs in Machine Learning, proposed and proved statistical properties of what would much later become known as the BERT loss. I was awarded the DHS Fellowship in Data Analysis and Visual Analytics (2010–2012). I was also awarded the Marshall Sherfield Fellowship for American researchers visiting the UK (accepted to Cambridge, UK under Zoubin Ghahramani) but ultimately chose sunny California instead.


Selected Publications

My research focuses on probabilistic machine learning, variational inference, and uncertainty estimation. For a complete list, see my Google Scholar profile (11,600+ citations, h-index 25).

Deep VIB
Deep Variational Information Bottleneck
AA Alemi, I. Fischer, JV Dillon, K. Murphy
ICLR, 2017  ·  2,704 citations

A variational approach to the information bottleneck principle, providing a tractable, deep-learning-compatible framework for learning compressed, relevant representations.

Likelihood Ratios OOD
Likelihood Ratios for Out-of-Distribution Detection
J. Ren, PJ Liu, E. Fertig, J. Snoek, R. Poplin, M. Depristo, JV Dillon, B. Lakshminarayanan
NeurIPS, 2019  ·  985 citations

Using likelihood ratios to correct for background statistics, enabling reliable out-of-distribution detection with deep generative models.

TF Distributions
TensorFlow Distributions
JV Dillon, I. Langmore, D. Tran, E. Brevdo, S. Vasudevan, D. Moore, B. Patton, et al.
arXiv, 2017  ·  724 citations

A library of probability distributions and bijectors for TensorFlow, forming the foundation of TensorFlow Probability. Enables efficient, composable probabilistic computation on accelerators.

Fixing ELBO
Fixing a Broken ELBO
A. Alemi, B. Poole, I. Fischer, JV Dillon, RA Saurous, K. Murphy
ICML, 2018  ·  709 citations

An information-theoretic analysis revealing that the standard ELBO objective is broken, and proposing principled fixes via rate-distortion theory.

Sequential Doc Viz
Sequential Document Visualization
G. Lebanon, Y. Mao, JV Dillon
IEEE TVCG, 2007  ·  55 citations

Methods for visualizing the sequential structure of documents using techniques from information geometry.

Stochastic CL
Stochastic Composite Likelihood
JV Dillon, G. Lebanon
JMLR, 2010  ·  33 citations

A family of point estimators resolving the computation-accuracy tradeoff in maximum likelihood, with consistency proofs and asymptotic variance formulas.

Weighted Ensemble SSL
Weighted Ensemble Self-Supervised Learning
Y. Ruan, S. Singh, W. Morningstar, AA Alemi, S. Ioffe, I. Fischer, JV Dillon
arXiv, 2022  ·  32 citations

Combining ensemble methods with self-supervised learning through weighted aggregation.

Joint Distributions
Joint Distributions for TensorFlow Probability
D. Piponi, D. Moore, JV Dillon
arXiv, 2020  ·  20 citations

A flexible API for specifying joint distributions in TensorFlow Probability, enabling compositional probabilistic modeling.

Sample What You Can't Compress
V. Birodkar, G. Barcik, J. Lyon, S. Ioffe, D. Minnen, JV Dillon
arXiv, 2024  ·  10 citations

A framework connecting sampling and compression in generative modeling.

VIB Half Bayes
VIB is Half Bayes
AA Alemi, W. Morningstar, B. Poole, I. Fischer, JV Dillon
arXiv, 2021  ·  4 citations

Showing the variational information bottleneck is equivalent to a half-Bayesian treatment.

Speed is Confidence
JV Dillon
arXiv, 2025

Exploring the relationship between speed and confidence in machine learning systems.