I work at the intersection of Machine Learning and Computer Systems. I'm employed by Google as a Software Engineer. I live in San Francisco with my wife and son.
I started my career on the App Engine team, building large-scale serving systems and data processing pipelines for observability and real-time billing. I then spent a few years at Google Research building SmartChoices, a system for making learned decisions using reinforcement learning. We deployed it in Google's ML compilers, YouTube's CDN, Google Commerce and various ad serving systems. Cumulatively, these efforts had an impact of over $x B.
I am currently leading an effort to build useful applications using Large Language Models in Google Labs.
I also have wonderful research collaborations with Lin Zhong, Anurag Khandelwal and Jeff Bilmes.
I obtained an MS in Computer Science at Columbia University, NY under the supervision of Gail Kaiser. I did my undergraduate studies in Maths and Computing at IIT Kharagpur.
Lilly Kumari, Shengjie Wang, Tianyi Zhou, Nikhil Sarda, Anthony Rowe, and Jeff Bilmes, "BumbleBee: Dynamic KV-Cache Streaming Submodular Summarization for Infinite-Context Transformers," The 1st Conference on Language Modeling (COLM), Oct 2024.
In Gim, Guojun Chen, Seung-Seob Lee, Nikhil Sarda, Anurag Khandelwal, and Lin Zhong, "Prompt Cache: Modular Attention Reuse for Low-Latency Inference," Proc. Conf. Machine Learning and Systems (MLSys), May 2024.
Kumari, L., Shafqat, U., Sarda, N. (2023). Retrieval Augmented Generation for Dialog Modeling. The 3rd NeurIPS Workshop on Efficient Natural Language and Speech Processing (ENLSP 2023).
Golovin, D., Bartok, G., Chen, E., Donahue, E., Huang, T.-K., Kokiopoulou, E., Qin, R., Sarda, N., Sybrandt, J., & Tjeng, V. (2023). SmartChoices: Augmenting Software with Learned Implementations. ArXiv Preprint ArXiv:2304.13033.
Song, Z., Chen, K., Sarda, N., Altınbüken, D., Brevdo, E., Coleman, J., Ju, X., Jurczyk, P., Schooler, R., & Gummadi, R. (2023). HALP: Heuristic Aided Learned Preference Eviction Policy for YouTube Content Delivery Network. 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), 1149–1163.
Altınbüken, Deniz; Chen, Kevin; Gummadi, Ramki; Ju, Xiao; Sarda, Nikhil; and Song, Zhenyu, Cache Management Using Machine Learning, Technical Disclosure Commons, (February 25, 2022).
Phothilimthana, P. M., Sabne, A., Sarda, N., Murthy, K. S., Zhou, Y., Angermueller, C., Burrows, M., Roy, S., Mandke, K., Farahani, R., & others. (2021). A flexible approach to autotuning multi-pass machine learning compilers. 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), 1–16.
Carbune, V., Coppey, T., Daryin, A., Deselaers, T., Sarda, N., & Yagnik, J. (2019). SmartChoices: hybridizing programming and machine learning. Reinforcement Learning for Real Life, ICML 2019 Workshop.
Bell, J., Sarda, N., & Kaiser, G. (2013). Chronicler: Lightweight recording to reproduce field failures. 2013 35th International Conference on Software Engineering (ICSE), 362–371.
Tang, Y., Ames, P., Bhamidipati, S., Bijlani, A., Geambasu, R., & Sarda, N. (2012). CleanOS: Limiting Mobile Data Exposure with Idle Eviction. 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), 77–91.
Mallik, R., Sarda, N., Kargupta, H., & Bandyopadhyay, S. (2011). Distributed data mining for sustainable smart grids. Proceedings of ACM SustKDD, 11, 1-6.
Program Committee for EdgeSys 24, EuroMLSys 24, AIOps 24
nikhilsarda [dot] iitkgp [at] gmail [dot] com