WBL (World Best LLM)
Led the data team for large-scale LLM training, focusing on query clarity evaluation and response filtering.
- Established a query clarity evaluation framework using engineered GPT-5.2 prompts, and trained a Qwen3-4B model as a proprietary tagger to selectively curate datasets spanning diverse clarity levels
- Established a robust response filtering pipeline for off-policy SFT data by ensembling three distinct reward models
- Applied score fusion to combine reward signals and reliably retain high-quality responses
- Curated and refined large-scale alignment samples by combining the score fusion pipeline with rigorous LLM-as-a-judge and code-execution checks
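
The ensemble filtering described above can be sketched as follows. This is a minimal illustration, not the actual pipeline: it assumes each reward model's scores are z-normalized before averaging (one common fusion choice), and the function names (`zscore`, `fuse_and_filter`) and the fixed keep ratio are hypothetical.

```python
import statistics

def zscore(scores):
    """Normalize one reward model's scores to zero mean, unit variance."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores) or 1.0  # avoid division by zero
    return [(s - mu) / sigma for s in scores]

def fuse_and_filter(rm_scores, responses, keep_ratio=0.5):
    """Fuse scores from several reward models and keep the top responses.

    rm_scores: list of score lists, one per reward model (here, three),
               each aligned with `responses`.
    """
    normalized = [zscore(s) for s in rm_scores]
    # Score fusion: average the z-normalized scores per response.
    fused = [statistics.mean(col) for col in zip(*normalized)]
    ranked = sorted(zip(fused, responses), key=lambda p: p[0], reverse=True)
    k = max(1, int(len(responses) * keep_ratio))
    return [resp for _, resp in ranked[:k]]
```

Normalizing each model's scores before averaging keeps one miscalibrated reward model from dominating the fused ranking.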