Joshua V. Dillon

I am a machine learning researcher based in Mountain View, CA.

My work spans generative AI and information theory and includes training foundational models at Google and Luma as well as academically notable research in probabilistic modeling, variational inference, and uncertainty estimation. I created TensorFlow Probability (2017; 4.4k GitHub stars, still O(1M) downloads/month). I co-created the prototype that would become Veo (2024). I contributed to Gemini (2024) and VideoPoet (2023; ICML Best Paper). I designed and wrote the auction mechanism used by ContentAds (2013–2019). My best-known paper is the Deep Variational Information Bottleneck (2017), for which I co-developed the idea and math with my dear friend Alex Alemi.

I was a Staff Research Scientist in Google Research and Google DeepMind for a combined total of 13 years. Most recently, I led the foundational model pre-training team at Luma.

I received my Ph.D. from the Georgia Institute of Technology, advised by Professor Guy Lebanon. My thesis, Stochastic m-Estimators: Controlling Accuracy-Cost Tradeoffs in Machine Learning, proposed and proved statistical properties of what would much later become known as the BERT loss. I was awarded the DHS Fellowship in Data Analysis and Visual Analytics (2010–2012). I was also awarded the Marshall Sherfield Fellowship for American researchers visiting the UK (accepted to Cambridge, UK under Zoubin Ghahramani) but ultimately chose sunny California instead.

Selected Publications

My research focuses on probabilistic machine learning, variational inference, and uncertainty estimation. For a complete list, see my Google Scholar (11,600+ citations, h-index 25).

Deep Variational Information Bottleneck

AA Alemi, I. Fischer, JV Dillon, K. Murphy

ICLR, 2017 · 2,704 citations

A variational approach to the information bottleneck principle, providing a tractable, deep-learning-compatible framework for learning compressed, relevant representations.

Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift

Y. Ovadia, E. Fertig, J. Ren, Z. Nado, D. Sculley, S. Nowozin, JV Dillon, B. Lakshminarayanan, J. Snoek

NeurIPS, 2019 · 2,583 citations

A large-scale empirical study of predictive uncertainty methods under dataset shift, finding that deep ensembles were the most reliable of the methods evaluated.

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

G. Comanici, E. Bieber, M. Schaekermann, I. Pasupat, N. Sachdeva, I. Dhillon, ..., JV Dillon, et al.

Technical Report, 2025 · 2,040 citations

Google DeepMind's frontier multimodal model with advanced reasoning, long context, and agentic capabilities.

Likelihood Ratios for Out-of-Distribution Detection

J. Ren, PJ Liu, E. Fertig, J. Snoek, R. Poplin, M. Depristo, JV Dillon, B. Lakshminarayanan

NeurIPS, 2019 · 985 citations

Using likelihood ratios to correct for background statistics, enabling reliable out-of-distribution detection with deep generative models.

TensorFlow Distributions

JV Dillon, I. Langmore, D. Tran, E. Brevdo, S. Vasudevan, D. Moore, B. Patton, et al.

arXiv, 2017 · 724 citations

A library of probability distributions and bijectors for TensorFlow, forming the foundation of TensorFlow Probability. Enables efficient, composable probabilistic computation on accelerators.

Fixing a Broken ELBO

A. Alemi, B. Poole, I. Fischer, JV Dillon, RA Saurous, K. Murphy

ICML, 2018 · 709 citations

An information-theoretic analysis revealing that the standard ELBO objective is broken, and proposing principled fixes via rate-distortion theory.

VideoPoet: A Large Language Model for Zero-Shot Video Generation

D. Kondratyuk, L. Yu, X. Gu, J. Lezama, J. Huang, G. Schindler, R. Hornung, ..., JV Dillon, et al.

arXiv, 2023 · 448 citations · ICML Best Paper

A large language model capable of zero-shot video generation across a variety of video generation tasks.

NeuTra-lizing Bad Geometry in Hamiltonian Monte Carlo Using Neural Transport

M. Hoffman, P. Sountsov, JV Dillon, I. Langmore, D. Tran, S. Vasudevan

arXiv, 2019 · 158 citations

Using normalizing flows to reparameterize Hamiltonian Monte Carlo, neutralizing pathological posterior geometries.

Density of States Estimation for Out-of-Distribution Detection

W. Morningstar, C. Ham, A. Gallagher, B. Lakshminarayanan, A. Alemi, JV Dillon

AISTATS, 2021 · 126 citations

Using density of states estimation from statistical physics for reliable out-of-distribution detection.

Uncertainty in the Variational Information Bottleneck

AA Alemi, I. Fischer, JV Dillon

arXiv, 2018 · 123 citations

Exploring the uncertainty properties that naturally arise from the variational information bottleneck framework.

The Locally Weighted Bag of Words Framework for Documents

G. Lebanon, Y. Mao, JV Dillon

JMLR, 2007 · 102 citations

A framework for representing sequential text using locally weighted bag of words, applied to classification, segmentation, summarization, and visualization.

The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks

J. Swiatkowski, K. Roth, BS Veeling, L. Tran, JV Dillon, J. Snoek, S. Mandt, et al.

ICML, 2020 · 76 citations

A compact parameterization for Gaussian mean field posteriors that scales Bayesian neural networks efficiently.

Hydra: Preserving Ensemble Diversity for Model Distillation

L. Tran, BS Veeling, K. Roth, J. Swiatkowski, JV Dillon, J. Snoek, S. Mandt, et al.

arXiv, 2020 · 73 citations

A method for distilling ensembles into a single model while preserving the diversity of the ensemble.

Sequential Document Visualization

G. Lebanon, Y. Mao, JV Dillon

IEEE TVCG, 2007 · 55 citations

Methods for visualizing the sequential structure of documents using techniques from information geometry.

tfp.mcmc: Modern Markov Chain Monte Carlo Tools Built for Modern Hardware

J. Lao, C. Suter, I. Langmore, C. Chimisov, A. Saxena, P. Sountsov, D. Moore, RA Saurous, MD Hoffman, JV Dillon

arXiv, 2020 · 51 citations

The MCMC library within TensorFlow Probability, designed for modern hardware with automatic batching and XLA support.

Automatic Differentiation Variational Inference with Mixtures

W. Morningstar, S. Vikram, C. Ham, A. Gallagher, JV Dillon

AISTATS, 2021 · 47 citations

Extending automatic differentiation variational inference to mixture posteriors for improved approximation.

PACm-Bayes: Narrowing the Empirical Risk Gap in the Misspecified Bayesian Regime

WR Morningstar, A. Alemi, JV Dillon

AISTATS, 2022 · 43 citations

A PAC-Bayes framework that narrows the empirical risk gap when the Bayesian model is misspecified.

Stochastic Composite Likelihood

JV Dillon, G. Lebanon

JMLR, 2010 · 33 citations

A family of point estimators resolving the computation-accuracy tradeoff in maximum likelihood, with consistency proofs and asymptotic variance formulas.

Weighted Ensemble Self-Supervised Learning

Y. Ruan, S. Singh, W. Morningstar, AA Alemi, S. Ioffe, I. Fischer, JV Dillon

arXiv, 2022 · 32 citations

Combining ensemble methods with self-supervised learning through weighted aggregation.

A Unified Optimization Framework for Robust Pseudo-Relevance Feedback Models

JV Dillon, K. Collins-Thompson

CIKM, 2010 · 29 citations

A flexible optimization framework for constraining probabilistic models with imprecise domain knowledge, applied to robust pseudo-relevance feedback for information retrieval.

Statistical Translation, Heat Kernels, and Expected Distances

JV Dillon, Y. Mao, G. Lebanon, J. Zhang

UAI, 2007 · 26 citations

Machine translation and diffusion kernels for unsupervised metric learning of text.

Joint Distributions for TensorFlow Probability

D. Piponi, D. Moore, JV Dillon

arXiv, 2020 · 20 citations

A flexible API for specifying joint distributions in TensorFlow Probability, enabling compositional probabilistic modeling.

Asymptotic Analysis of Generative Semi-Supervised Learning

JV Dillon, K. Balasubramanian, G. Lebanon

ICML, 2010 · 19 citations

Quantifying the asymptotic accuracy of generative semi-supervised learning based on stochastic composite likelihood.

Automatically Bounding the Taylor Remainder Series: Tighter Bounds and New Applications

M. Streeter, JV Dillon

arXiv, 2022 · 17 citations

Automatic methods for computing tighter bounds on the Taylor remainder series, with new applications.

Statistical and Computational Tradeoffs in Stochastic Composite Likelihood

JV Dillon, G. Lebanon

AISTATS, 2009 · 17 citations

Examining statistical and computational tradeoffs in stochastic composite likelihood estimation.

Sample What You Can't Compress

V. Birodkar, G. Barcik, J. Lyon, S. Ioffe, D. Minnen, JV Dillon

arXiv, 2024 · 10 citations

A framework connecting sampling and compression in generative modeling.

VIB is Half Bayes

AA Alemi, W. Morningstar, B. Poole, I. Fischer, JV Dillon

arXiv, 2021 · 4 citations

Showing the variational information bottleneck is equivalent to a half-Bayesian treatment.

Sharp Taylor Polynomial Enclosures in One Dimension

M. Streeter, JV Dillon

arXiv, 2023 · 3 citations

Sharp polynomial enclosures for bounding univariate functions via Taylor series.

Speed is Confidence

JV Dillon

arXiv, 2025

Exploring the relationship between speed and confidence in machine learning systems.

Stochastic m-Estimators: Controlling Accuracy-Cost Tradeoffs in Machine Learning

JV Dillon

Ph.D. Dissertation, Georgia Institute of Technology, 2011

Proposes and proves statistical properties of what would later become known as the BERT loss.