Youngjoon Jang
M.S. Student at NLP&AI Lab, Korea University
yjoonjang34@gmail.com
I am a Master’s student in the NLP&AI Lab, advised by Prof. Heuiseok Lim, in the Department of Computer Science and Engineering at Korea University. I received my B.S. from Hongik University with majors in Mechanical Engineering and Computer Science.
My research focuses on Information Retrieval (IR) and Retrieval-Augmented Generation (RAG) systems. I aim to make AI models more beneficial to humans and society.
I actively contribute to open-source projects including Sentence-Transformers, MTEB, InstructKR, and FlagEmbedding.
News
| Date | News |
|---|---|
| Apr 02, 2026 | Paper accepted at SIGIR 2026: “Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval” |
| Mar 02, 2026 | Paper accepted at ICLR 2026 Data-FM Workshop: “LEGALMIDM: Use-Case-Driven Legal Domain Specialization for Korean LLM” |
Selected Publications
- In Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2026), 2026
- In Proceedings of the International Conference on Learning Representations (ICLR 2026), 2026
- In ICLR 2026 Data-FM Workshop, 2026
- In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), 2026
- In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics: Student Research Workshop (ACL 2025 SRW), 2025
- In Annual Conference on Human & Cognitive Language Technology (HCLT 2025), 2025
- In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), 2024
Projects
WBL: World Best LLM Project
Led the data team for large-scale LLM training, building a query-clarity evaluation and response-filtering pipeline.
KURE Project
Trained a SOTA Korean retrieval embedding model and built an evaluation framework for Korean embedding models.
Korean ColBERT & Sparse Retrievers
Trained and open-sourced Korean ColBERT and SPLADE variants achieving SOTA on the Korean Retrieval Benchmark.
KT-Korea University Collaborative Research
Trained a Korean legal-domain LLM on Korean legal alignment data.
Pre:Ranker Project
Trained a reranker that narrows the set of candidate tools for a given query.
URACLE-Korea University Collaborative Research
Trained an English-Korean cross-lingual retrieval embedding model.
Open Source Contributions
Sentence Transformers
- Added support for ListMLE, PListMLE, and RankNet losses for rerankers.
- Introduced hardness-weighted contrastive learning for hard negatives.
- Implemented CachedSpladeLoss for memory-efficient SPLADE training.
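The hardness-weighted contrastive learning idea above can be sketched as a weighted InfoNCE loss: each negative's term is scaled by a weight that grows with its similarity to the query (its "hardness"), so harder negatives contribute more gradient. The exponential weighting scheme and hyperparameters below are illustrative assumptions, not the exact open-sourced implementation.

```python
import math

def hardness_weighted_infonce(pos_sim, neg_sims, tau=0.05, alpha=1.0):
    """InfoNCE loss for one query, with per-negative hardness weights.

    pos_sim: similarity between query and its positive passage.
    neg_sims: similarities between query and negative passages.
    tau: temperature; alpha: hardness sharpness (both assumed values).
    """
    # Weight each negative by exp(alpha * similarity): more similar
    # (harder) negatives get larger weights. Normalize to mean 1 so the
    # overall loss scale stays comparable to unweighted InfoNCE.
    weights = [math.exp(alpha * s) for s in neg_sims]
    mean_w = sum(weights) / len(weights)
    weights = [w / mean_w for w in weights]

    pos = math.exp(pos_sim / tau)
    neg = sum(w * math.exp(s / tau) for w, s in zip(weights, neg_sims))
    return -math.log(pos / (pos + neg))
```

With this weighting, a batch dominated by hard negatives (similarities close to the positive's) yields a larger loss than one with easy negatives, which is the intended training signal.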
Massive Text Embedding Benchmark (MTEB)
- Added Korean retrieval evaluation datasets.
- Added long context support for OpenAI embedding models.
- Fixed NaN embeddings for Jasper models (float16 → bfloat16).
- Led the Korean Reranker evaluation and leaderboard project.
- Fixed a training bug related to knowledge distillation.
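One common way to add long-context support on top of a fixed-window embedding API, sketched here under assumptions rather than as the exact MTEB implementation, is to split over-long text into chunks, embed each chunk, then pool the chunk embeddings with a length-weighted mean and re-normalize. `embed_fn` is a hypothetical stand-in for the actual embedding call, and chunking by characters (rather than tokens) is a simplification.

```python
import math

def embed_long_text(text, embed_fn, max_chars=8000):
    """Chunk-and-pool embedding for text longer than the model's window.

    embed_fn: hypothetical callable mapping a chunk (str) to a vector.
    max_chars: assumed per-chunk budget standing in for a token limit.
    """
    # Split the text into consecutive fixed-size chunks.
    chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    vecs = [embed_fn(c) for c in chunks]

    # Length-weighted mean pooling: longer chunks contribute more.
    dim = len(vecs[0])
    weights = [len(c) for c in chunks]
    total = sum(weights)
    pooled = [sum(w * v[d] for w, v in zip(weights, vecs)) / total
              for d in range(dim)]

    # L2-normalize so the pooled vector is usable for cosine similarity.
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]
```

Length-weighted pooling keeps a short trailing chunk from skewing the document vector; other pooling choices (plain mean, first-chunk only) are possible trade-offs.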