I am a 5th-year Ph.D. candidate at Shanghai Jiao Tong University (since 2021), advised by Prof. Jianxun Li. I also work closely with Prof. Hao Wang from Stevens Institute of Technology. Currently, I am a research intern at the System Group of Microsoft Research Asia (since Jun 2025).
My research interests lie in Serverless Computing, Efficient LLM Systems, and Cloud Systems.
Research
My research focuses on re-architecting the serverless paradigm for state-of-the-art AI workflows. Serverless computing is critical for AI because it scales resources on demand, eliminating idle GPU waste and reducing costs. However, current serverless platforms are not yet well-suited for AI workloads. My work addresses this gap across the AI stack, from efficient model inference to LLM serving to agentic workflows.
Selected Publications
See Google Scholar for the full list.
- Act While Thinking: Accelerating LLM Agent Serving via Speculative Tool Execution
  Under Submission, 2026
  [paper]
- xLoRA: Faster and Cheaper LoRA LLM Serving with Serverless Computing
  Under Submission, 2025
  [paper] [arXiv]
- Accelerating ML Inference via Opportunistic Pre-Loading on Serverless Clusters
  IEEE Transactions on Parallel and Distributed Systems (TPDS), 2025
  [paper]
- Pre-Warming Is Not Enough: Accelerating Serverless Inference With Opportunistic Pre-Loading
  Best Paper Award
  ACM Symposium on Cloud Computing (SoCC), 2024
  [paper]
Education
- Ph.D. Candidate, Shanghai Jiao Tong University, 2021 - Present
- B.S., Beijing University of Posts and Telecommunications, 2017 - 2021