Weijia Shi

Pronouns: she/her

My name in Chinese: 施惟佳

<aside> <img src="https://prod-files-secure.s3.us-west-2.amazonaws.com/c48c1754-22f6-4247-b5fe-2f8fc6074fd7/cd361d2a-9856-49d3-bfa3-abc0d900c2fa/25231.png" alt="https://prod-files-secure.s3.us-west-2.amazonaws.com/c48c1754-22f6-4247-b5fe-2f8fc6074fd7/cd361d2a-9856-49d3-bfa3-abc0d900c2fa/25231.png" width="40px" /> Github

</aside>

<aside> <img src="https://prod-files-secure.s3.us-west-2.amazonaws.com/c48c1754-22f6-4247-b5fe-2f8fc6074fd7/f30b1dd3-7384-4ae2-b912-11f36f7e174f/Picture2.png" alt="https://prod-files-secure.s3.us-west-2.amazonaws.com/c48c1754-22f6-4247-b5fe-2f8fc6074fd7/f30b1dd3-7384-4ae2-b912-11f36f7e174f/Picture2.png" width="40px" /> Google Scholar

</aside>

<aside> <img src="https://prod-files-secure.s3.us-west-2.amazonaws.com/c48c1754-22f6-4247-b5fe-2f8fc6074fd7/f3ef8b9f-d4d4-4dd3-8fe4-b19b251c7eec/Picture1.png" alt="https://prod-files-secure.s3.us-west-2.amazonaws.com/c48c1754-22f6-4247-b5fe-2f8fc6074fd7/f3ef8b9f-d4d4-4dd3-8fe4-b19b251c7eec/Picture1.png" width="40px" /> Semantic Scholar

</aside>

<aside> <img src="https://prod-files-secure.s3.us-west-2.amazonaws.com/c48c1754-22f6-4247-b5fe-2f8fc6074fd7/4a681ff9-9560-425e-a09b-83cefa5ab4e8/twitter-3.png" alt="https://prod-files-secure.s3.us-west-2.amazonaws.com/c48c1754-22f6-4247-b5fe-2f8fc6074fd7/4a681ff9-9560-425e-a09b-83cefa5ab4e8/twitter-3.png" width="40px" /> Twitter

</aside>

👋 Hi!

I am Weijia Shi, a PhD student in Computer Science at the University of Washington, advised by Prof. Luke Zettlemoyer and Prof. Noah A. Smith. I have been a visiting researcher at Meta AI, working with Scott Yih. Prior to UW, I graduated from UCLA with a B.S. in Computer Science and a minor in Math.

🌋 Research Interests

My main research focuses on natural language processing and machine learning. I am particularly interested in retrieval-augmented LMs and trustworthy AI. My goal is to build LMs that are able to communicate with external knowledge and personal data securely and robustly.

<aside> 🌱 What’s NEW

☑️ Office hours: Starting November 2023, I will be holding office hours (1-2 hours a week) dedicated to offering mentorship and advice to undergraduate and master's students. If you want to chat about research and grad school applications, please fill out the form.

☑️ Honored to be selected as a 2023 Machine Learning Rising Star

☑️ Two workshops accepted to *CL conferences. Stay tuned!

The 3rd Workshop on Knowledge Augmented Methods for NLP (ACL 2024)
Workshop on Customizable NLP (EMNLP 2024)

☑️ Organized the 2nd Workshop on Knowledge Augmented Methods for NLP (KDD 2023)

</aside>

📜 Selected Publications

Please see my Google Scholar or Semantic Scholar profiles for the full list.

(*: equal contribution)

Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models

Shangbin Feng, Weijia Shi, Yuyang Bai, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov.

ICLR Oral. 2024. [paper][code]

In-Context Pretraining: Language Modeling Beyond Document Boundaries

Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis

ICLR Spotlight. 2024. [paper][code]

Detecting Pretraining Data from Large Language Models

Weijia Shi*, Anirudh Ajith*, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, Luke Zettlemoyer

ICLR. 2024. [paper] [website][code]

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

Sewon Min*, Suchin Gururangan*, Eric Wallace, Weijia Shi, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer.

ICLR Spotlight. 2024. [paper][code]

Trusting Your Evidence: Hallucinate Less with Context-aware Decoding.

Weijia Shi*, Xiaochuang Han*, Mike Lewis, Yulia Tsvetkov, Luke Zettlemoyer, Scott Wen-tau Yih.

NAACL. 2024. [paper][code]

REPLUG: Retrieval-Augmented Black-Box Language Models

Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih

NAACL. 2024. [paper][code]

One Embedder, Any Task: Instruction-Finetuned Text Embeddings

Hongjin Su*, Weijia Shi*, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Scott Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu

ACL, 2023. [paper] [website][model (🌟 3M downloads on HuggingFace)]

Toward Human Readable Prompt Tuning: Kubrick’s The Shining is a good movie, and a good prompt too?

Weijia Shi*, Xiaochuang Han*, Hila Gonen, Ari Holtzman, Yulia Tsvetkov, Luke Zettlemoyer

EMNLP, 2023. [paper]

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Zeqiu Wu*, Yushi Hu*, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi

NeurIPS Spotlight, 2023. [paper] [website][code]

kNN-Prompt: Nearest Neighbor Zero-Shot Inference

Weijia Shi, Julian Michael, Suchin Gururangan, Luke Zettlemoyer

EMNLP, 2022. [paper] [code]

💬 Invited Talks

2024/03: Meta AI, AI reading group

Title: In-Context Pretraining: Language Modeling Beyond Document Boundaries

2024/02: Google Research

Title: Detecting Pretraining Data from Large Language Models

2024/01: Google, NLP reading group

Title: In-Context Pretraining: Language Modeling Beyond Document Boundaries

2024/02: Cohere

Title: In-Context Pretraining: Language Modeling Beyond Document Boundaries

2023/12: KAIST, IBS Data Science Group

Title: Detecting Pretraining Data from Large Language Models

2023/03: Microsoft Cognitive Service Research Group

Title: REPLUG: Retrieval-Augmented Black-Box Language Models

🔬 Research Experience

University of Washington, 09/2020–Present

Ph.D. student, supervised by Luke Zettlemoyer and Noah A. Smith

Meta AI, 06/2022–Present

Visiting Researcher, supervised by Scott Yih

University of Pennsylvania, 05/2019–09/2019

Research Intern, supervised by Dan Roth

UCLA, 04/2018–06/2020

Research Assistant, supervised by Kai-Wei Chang and Adnan Darwiche