Youngjoon Jang

M.S. Student at NLP&AI Lab, Korea University


yjoonjang34@gmail.com

I am a Master’s student in the NLP&AI Lab, advised by Prof. Heuiseok Lim, in the Department of Computer Science and Engineering at Korea University. I received my B.S. from Hongik University, double-majoring in Mechanical Engineering and Computer Science.

My research focuses on Information Retrieval (IR) and Retrieval-Augmented Generation (RAG) systems. I aim to make AI models more beneficial to humans and society.

I actively contribute to open-source projects including Sentence-Transformers, MTEB, InstructKR, and FlagEmbedding.

News

Apr 02, 2026 Paper accepted at SIGIR 2026: “Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval”
Mar 02, 2026 Paper accepted at ICLR 2026 Data-FM Workshop: “LEGALMIDM: Use-Case-Driven Legal Domain Specialization for Korean LLM”

Selected Publications

  1. Youngjoon Jang, Seongtae Hong, Hyeonseok Moon, and Heuiseok Lim
    In Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2026), 2026
  2. Seongtae Hong, Youngjoon Jang, Jungseob Lee, Hyeonseok Moon, and Heuiseok Lim
    In Proceedings of the International Conference on Learning Representations (ICLR 2026), 2026
  3. Youngjoon Jang, Chanhee Park, Hyeonseok Moon, Young-kyoung Ham, Jiwon Moon, Jinhyeon Kim, JuKyung Jung, and Heuiseok Lim
    In ICLR 2026 Data-FM Workshop, 2026
  4. Seungyoon Lee, Minhyuk Kim, Seongtae Hong, Youngjoon Jang, Dongsuk Oh, and Heuiseok Lim
    In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), 2026
  5. Youngjoon Jang, Junyoung Son, Taemin Lee, Seongtae Hong, Hyeonseok Moon, Andrew Matteson, and Heuiseok Lim
    ArXiv Preprint, 2025
  6. Youngjoon Jang, Seongtae Hong, Junyoung Son, Sungjin Park, Chanjun Park, and Heuiseok Lim
    In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics: Student Research Workshop (ACL 2025 SRW), 2025
  7. Youngjoon Jang, Junyoung Son, Taemin Lee, Seongtae Hong, Jungbae Park, and Heuiseok Lim
    In Annual Conference on Human & Cognitive Language Technology (HCLT 2025), 2025
  8. Seonmin Koo, Jinsung Kim, Youngjoon Jang, Chanjun Park, and Heuiseok Lim
    In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), 2024

Projects

WBL: World Best LLM Project
Led the data team for large-scale LLM training, building a pipeline for query clarity evaluation and response filtering.

KURE Project
Trained a SOTA Korean retrieval embedding model and built an evaluation framework for Korean embedding models.

Korean ColBERT & Sparse Retrievers
Trained and open-sourced Korean ColBERT and SPLADE variants achieving SOTA on the Korean Retrieval Benchmark.

KT-Korea University Collaborative Research
Trained a legal-domain LLM using Korean legal alignment data.

Pre:Ranker Project
Trained a reranker that narrows the set of candidate tools for a given query.

URACLE-Korea University Collaborative Research
Trained an English-Korean cross-lingual retrieval embedding model.


Open Source Contributions

Sentence-Transformers

Massive Text Embedding Benchmark (MTEB)

InstructKR

FlagEmbedding