About

Hi! I'm Ziche Liu, a junior undergrad at the Chinese University of Hong Kong, Shenzhen. I'm supervised by Prof. Haizhou Li and work closely with Dr. Feng Jiang, focusing on data selection techniques for LLM fine-tuning.

Previously, I was supervised by Prof. Benyou Wang and worked on some exciting projects on low-resource language grounding and bias analysis in LLM-as-a-judge.

Currently, I'm a visiting student at UC Berkeley. [Cal Vibe Check]

Research

Generally, I have a broad interest in large language and vision models, old-school CV and NLP tricks, and reinforcement learning. In practice, I've been exploring efficient training methods for fine-tuning LLMs. Lately, I'm especially fascinated by LLM reasoning capabilities and model-based RL.

I am also intrigued by any topic related to human consciousness, memory systems, and intelligence. Feel free to catch me for a chat!

Publications

Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models

Ziche Liu*, Rui Ke*, Feng Jiang, Haizhou Li (*=Equal Contribution)

NAACL 2025

Paper / Code / Demo

TL;DR: We propose a three-stage scheme to standardize data selection methods and develop two metrics (efficiency and flexibility) to evaluate the effectiveness of a data selector.

Humans or LLMs as the Judge? A Study on Judgement Biases

Guiming Chen*, Shunian Chen*, Ziche Liu, Feng Jiang, Benyou Wang (*=Equal Contribution)

EMNLP 2024

Paper / Data

TL;DR: We investigate judgment biases in human and LLM judges, demonstrating their vulnerability through experiments with a curated dataset based on Bloom's Taxonomy.

AceGPT, Localizing Large Language Models in Arabic

Huang Huang*, Fei Yu*, Jianqing Zhu*, Xuening Sun, Hao Cheng, Song Dingjie, Zhihong Chen, Mosen Alharthi, Bang An, Juncai He, Ziche Liu, Junying Chen, Jianquan Li, Benyou Wang, Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu (*=Equal Contribution)

NAACL 2024

Paper / Code / Demo

TL;DR: We leverage high-quality SFT data and culturally aware RLAIF to effectively localize LLMs for Arabic, addressing cultural sensitivity and local values.

Education & Honors

2024.08 - now: Visiting Student at UC Berkeley

  • BGA Scholarship (awarded to 10 of 600 Berkeley Global Access participants)

2022.09 - now: Undergraduate Student at the Chinese University of Hong Kong, Shenzhen

  • Undergraduate Research Awards (22nd, 23rd, 24th)
  • Dean's List, Academic Performance Scholarship (AY2022, AY2023)
  • Undergraduate Student Teaching Fellow for PHY1001: Mechanics (AY2024)

Fun

When I'm not coding, you'll probably find me:

  • photographing outdoors (insects are truly tiny wonders!)
  • getting lost in sci-fi (Time Debt is my excuse for pulling all-nighters)
  • locked in super cool modern origami (Hold Infinity in the palm of your hand)