蔣兆凱 / Chao-Kai Chiang

Machine learning for imperfect supervision and adaptive decision making.

I develop principled machine learning methods for imperfect supervision, noisy labels, online learning, and bandit feedback.

My work connects theory, algorithms, and practical systems for learning when labels are weak or unreliable, distributions shift, or feedback is partial and sequential.

Google Scholar | DBLP

Chao-Kai Chiang, Assistant Professor, Department of Computer Science, National Yang Ming Chiao Tung University

Recruiting

Prospective students

I welcome motivated students interested in principled machine learning, especially weak supervision, noisy labels, distribution shift, online learning, bandit algorithms, and adaptive learning systems. Prospective students are encouraged to email me with a CV and a brief description of their research interests.

Research Vision

Learning reliably from imperfect supervision and adaptive feedback.

Modern machine learning is increasingly deployed in settings where clean labels, stable distributions, and complete feedback are unavailable. My research aims to build a principled foundation for learning under such realistic conditions: weak or noisy supervision, distribution shift, sequential feedback, and adaptive decision making. The long-term goal is to understand the weakest yet learnable forms of supervision and to design algorithms that remain reliable beyond idealized data assumptions.

3,647+ Google Scholar citations

2,979 citations for Federated Multi-Task Learning

327 citations for Online Optimization with Gradual Variations

Research

Research agenda

My current research develops theory and algorithms for machine learning systems that must learn from imperfect supervision, changing environments, and partial feedback. I focus on problems where statistical reliability, optimization behavior, and practical applicability must be studied together.

Weak and noisy supervision

I study weakly supervised learning and learning with label noise, including unified risk analysis, reliable validation under noisy labels, robust imitation learning from vague feedback, and learning under complex supervision transitions.

Online learning and bandits

I develop algorithms and regret analyses for online convex optimization and multi-armed bandits, with interests in best-of-both-worlds guarantees, Thompson sampling, Pareto front identification, and contextual or dueling feedback.

Adaptive learning systems

I connect theoretical insights with practical systems, including federated multi-task learning, budgeted hyper-parameter tuning, LLM routing, and adaptive decision making with preference or bandit feedback.

17 weakly supervised learning (WSL) and learning with label noise (LLN) settings unified in recent risk-analysis work

NeurIPS, ICML, COLT: publications spanning both theory and applied machine learning

2012 Mark Fulk Best Student Paper Award at COLT

Weakly supervised learning, learning with label noise, multi-armed bandits, online convex optimization, robust machine learning, foundation model feedback

Selected Publications

Representative papers

Papers are selected first from recent top-conference publications, then by citation impact within the main research areas.

TMLR 2025 | Weak supervision | 11 citations

Unified Risk Analysis for Weakly Supervised Learning

Chao-Kai Chiang and Masashi Sugiyama

A unified contamination-decontamination view of weak supervision and risk rewriting across WSL/LLN settings.

AISTATS 2025 | Domain adaptation | 6 citations

Domain Adaptation and Entanglement: an Optimal Transport Perspective

Okan Koc, Alexander Soen, Chao-Kai Chiang, and Masashi Sugiyama

Provides an optimal-transport view of domain adaptation and introduces entanglement as a term explaining when alignment fails.

AAAI 2024 | Bandits | 1 citation

The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models

Jongyeong Lee, Chao-Kai Chiang, and Masashi Sugiyama

Studies how prior choices affect Thompson sampling in multiparameter bandit models.

NeurIPS 2023 | Weak supervision / imitation learning | 13 citations

Imitation Learning from Vague Feedback

Xin-Qiang Cai, Yu-Jie Zhang, Chao-Kai Chiang, and Masashi Sugiyama

Applies weakly supervised learning ideas to recover useful imitation signals from vague feedback.

ICML 2023 | Thompson sampling | 4 citations

Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits

Jongyeong Lee, Junya Honda, Chao-Kai Chiang, and Masashi Sugiyama

Analyzes when Thompson sampling is optimal for heavy-tailed Pareto bandits and how truncation restores optimality.

NeurIPS 2017 | Federated learning | 2,979 citations

Federated Multi-Task Learning

Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, and Ameet Talwalkar

A systems-aware federated multi-task learning framework for heterogeneous distributed data.

COLT 2012 | Online learning | 327 citations | Best Student Paper

Online Optimization with Gradual Variations

Chao-Kai Chiang, Tianbao Yang, Chia-Jung Lee, Mehrdad Mahdavi, Chi-Jen Lu, Rong Jin, and Shenghuo Zhu

Introduced variation-aware regret analysis for online optimization in gradually changing environments.

COLT 2016 | Bandits | 152 citations

An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits

Peter Auer and Chao-Kai Chiang

A best-of-both-worlds bandit algorithm with nearly optimal pseudo-regret across stochastic and adversarial settings.

Publications

Full publication list

The list below is compiled from the uploaded publication list and CV.


2025 | Journal | TMLR

Unified Risk Analysis for Weakly Supervised Learning

Chao-Kai Chiang and Masashi Sugiyama

Transactions on Machine Learning Research, 2025.

2017 | Journal | JIHMSP

Online Learning Problems against Dynamic Strategies in Gradually Evolving Worlds

Chia-Jung Lee, Chao-Kai Chiang, and Mu-En Wu

Journal of Information Hiding and Multimedia Signal Processing 8(4): 869-879.

2025 | Conference | AISTATS

Domain Adaptation and Entanglement: an Optimal Transport Perspective

Okan Koc, Alexander Soen, Chao-Kai Chiang, and Masashi Sugiyama

International Conference on Artificial Intelligence and Statistics, pp. 3034-3042.

2024 | Conference | AAAI

The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models

Jongyeong Lee, Chao-Kai Chiang, and Masashi Sugiyama

AAAI Conference on Artificial Intelligence, pp. 13383-13390.

2023 | Conference | NeurIPS

Imitation Learning from Vague Feedback

Xin-Qiang Cai, Yu-Jie Zhang, Chao-Kai Chiang, and Masashi Sugiyama

Advances in Neural Information Processing Systems, pp. 48275-48292.

2023 | Conference | ICML

Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits

Jongyeong Lee, Junya Honda, Chao-Kai Chiang, and Masashi Sugiyama

International Conference on Machine Learning, pp. 18810-18851.

2019 | Conference | IJCAI

Hyper-parameter Tuning under a Budget Constraint

Zhiyun Lu, Liyu Chen, Chao-Kai Chiang, and Fei Sha

International Joint Conference on Artificial Intelligence, pp. 5744-5750.

2017 | Conference | NeurIPS

Federated Multi-Task Learning

Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, and Ameet Talwalkar

Advances in Neural Information Processing Systems, pp. 4424-4434.

2016 | Conference | COLT

An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits

Peter Auer and Chao-Kai Chiang

Annual Conference on Learning Theory, pp. 116-120. Authors listed in alphabetical order; full version available as a preprint.

2016 | Conference | AISTATS

Pareto Front Identification from Stochastic Bandit Feedback

Peter Auer, Chao-Kai Chiang, Ronald Ortner, and Madalina M. Drugan

International Conference on Artificial Intelligence and Statistics, pp. 939-947.

2015 | Conference | RVSP

Resisting Dynamic Strategies in Gradually Evolving Worlds

Chia-Jung Lee, Chao-Kai Chiang, and Mu-En Wu

International Conference on Robot, Vision and Signal Processing, pp. 191-194. Short conference paper.

2014 | Conference | ACML

Pseudo-reward Algorithms for Contextual Bandits with Linear Payoff Functions

Ku-Chun Chou, Hsuan-Tien Lin, Chao-Kai Chiang, and Chi-Jen Lu

Asian Conference on Machine Learning, pp. 344-359.

2013 | Conference | COLT

Beating Bandits in Gradually Evolving Worlds

Chao-Kai Chiang, Chia-Jung Lee, and Chi-Jen Lu

Annual Conference on Learning Theory, pp. 210-227. Authors listed in alphabetical order.

2012 | Conference | COLT

Online Optimization with Gradual Variations

Chao-Kai Chiang, Tianbao Yang, Chia-Jung Lee, Mehrdad Mahdavi, Chi-Jen Lu, Rong Jin, and Shenghuo Zhu

Annual Conference on Learning Theory, pp. 6.1-6.20. Mark Fulk Best Student Paper Award.

2010 | Conference | SODA

Online Learning with Queries

Chao-Kai Chiang and Chi-Jen Lu

Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 616-629.

2025 | Preprint | arXiv

LLM Routing with Dueling Feedback

Chao-Kai Chiang, Takashi Ishida, and Masashi Sugiyama

arXiv:2510.00841. Listed as under double-blind review in the uploaded publication list.

2023 | Preprint | arXiv

Asymptotically Optimal Thompson Sampling Based Policy for the Uniform Bandits and the Gaussian Bandits

Jongyeong Lee, Chao-Kai Chiang, and Masashi Sugiyama

arXiv:2302.14407v1. Preliminary version of the AAAI 2024 paper.

2019 | Preprint | arXiv

Hyper-parameter Tuning under a Budget Constraint

Zhiyun Lu, Chao-Kai Chiang, and Fei Sha

arXiv:1902.00532. Preliminary version of the IJCAI 2019 paper.

2016 | Preprint | arXiv

An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits

Peter Auer and Chao-Kai Chiang

arXiv:1605.08722. Full version of the COLT 2016 paper.

Contact

Get in touch

Department of Computer Science
National Yang Ming Chiao Tung University

chaokai.utokyo@gmail.com