SEM: One or more comma-separated values (e.g. 1.2, 1.4).
BLM: Either a single value (e.g. 1.0) or exactly as many values as SEM (e.g. 1.0, 1.0).
PIR: If provided directly, only one value is allowed (e.g. 0.95).
SDF: Must be exactly one value (e.g. 0.125).
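The four input rules above can be checked mechanically. The following is a minimal sketch, assuming each field arrives as a raw string; the function names `parse_values` and `validate_inputs` are hypothetical and are not part of the leaderboard's actual code.

```python
from typing import List, Optional


def parse_values(raw: str) -> List[float]:
    """Split a comma-separated string into floats, ignoring surrounding spaces."""
    parts = [p.strip() for p in raw.split(",")]
    if any(p == "" for p in parts):
        raise ValueError(f"empty value in {raw!r}")
    return [float(p) for p in parts]


def validate_inputs(sem: str, blm: str, sdf: str, pir: Optional[str] = None) -> dict:
    """Apply the SEM / BLM / PIR / SDF count rules (hypothetical helper)."""
    # SEM: one or more values.
    sem_vals = parse_values(sem)

    # BLM: a single value, or exactly as many values as SEM.
    blm_vals = parse_values(blm)
    if len(blm_vals) not in (1, len(sem_vals)):
        raise ValueError("BLM must have one value or as many values as SEM")

    # SDF: exactly one value.
    sdf_vals = parse_values(sdf)
    if len(sdf_vals) != 1:
        raise ValueError("SDF must be exactly one value")

    result = {"SEM": sem_vals, "BLM": blm_vals, "SDF": sdf_vals[0]}

    # PIR: optional, but only one value when provided directly.
    if pir is not None:
        pir_vals = parse_values(pir)
        if len(pir_vals) != 1:
            raise ValueError("PIR must be a single value when provided directly")
        result["PIR"] = pir_vals[0]
    return result


if __name__ == "__main__":
    print(validate_inputs("1.2, 1.4", "1.0", "0.125", pir="0.95"))
```

The sketch only checks how many values each field holds; it makes no assumption about how PIR is derived when it is not provided directly.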
Efficiency Rank | Method | Performance Improvement Ratio (PIR) | Selected Dataset Fraction (SDF) | Actions
Feasibility Indicators
Input Rules:
# Trained LLMs: The number of LLMs trained while constructing your data selector.
Steps: The number of algorithm steps in your data selection process. For the counting method, please refer to Appendix A.5 of the paper. (Only the number before the bracket is used for ranking.)
Rep: Whether your work has open-source code. Select true or false.
Model Free: If the selector model is changed, will the performance be influenced?
Dataset Free: If the candidate dataset is changed, do you need to re-train the selector?
ChatGPT Free: Does your selection rely on proprietary models such as ChatGPT or GPT-4?
Ranking Algorithm:
Feasibility Rank is obtained by ranking each work's sum of Simplicity Rank + Flexibility Rank.
Simplicity Rank is obtained by ranking each work's Simplicity Penalty = 2*LLMCount + 1*Steps + 1*NotReproducible.
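As a worked illustration of the two ranking rules above, here is a minimal sketch. It assumes that LLMCount is the "# Trained LLMs" field, that NotReproducible is 1 when Rep is false and 0 otherwise, that a lower penalty earns a better (smaller) rank, and that ties share a rank. Flexibility Rank is taken as a given input because its computation is not described here. The names `Work`, `rank_ascending`, and `feasibility_ranks` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Work:
    name: str
    llm_count: int      # "# Trained LLMs"
    steps: int          # "Steps" (the number before the bracket)
    reproducible: bool  # "Rep"


def simplicity_penalty(w: Work) -> int:
    """Simplicity Penalty = 2*LLMCount + 1*Steps + 1*NotReproducible."""
    not_reproducible = 0 if w.reproducible else 1  # assumption: 1 when Rep is false
    return 2 * w.llm_count + 1 * w.steps + 1 * not_reproducible


def rank_ascending(scores: Dict[str, float]) -> Dict[str, int]:
    """Lowest score gets rank 1; tied scores share a rank (assumed tie rule)."""
    ordered = sorted(scores.values())
    return {name: ordered.index(s) + 1 for name, s in scores.items()}


def feasibility_ranks(works: List[Work], flexibility_rank: Dict[str, int]) -> Dict[str, int]:
    """Rank each work by Simplicity Rank + Flexibility Rank."""
    simplicity = rank_ascending({w.name: simplicity_penalty(w) for w in works})
    combined = {n: simplicity[n] + flexibility_rank[n] for n in simplicity}
    return rank_ascending(combined)


if __name__ == "__main__":
    # Made-up entries, for illustration only.
    works = [
        Work("A", llm_count=0, steps=1, reproducible=True),   # penalty 1
        Work("B", llm_count=1, steps=3, reproducible=False),  # penalty 6
        Work("C", llm_count=1, steps=2, reproducible=True),   # penalty 4
    ]
    flexibility = {"A": 2, "B": 1, "C": 3}  # hypothetical Flexibility Ranks
    print(feasibility_ranks(works, flexibility))
```

With these made-up entries the Simplicity Ranks are A=1, C=2, B=3, so the combined sums are A=3, B=4, C=5 and the resulting Feasibility Ranks are A=1, B=2, C=3.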