Youngjoon Jang

M.S. Student at NLP&AI Lab, Korea University


yjoonjang34@gmail.com

Hi, I’m Youngjoon. I’m a Master’s student in the NLP&AI Lab at Korea University, advised by Prof. Heuiseok Lim. Before this, I double-majored in Computer Engineering and Mechanical & System Design Engineering at Hongik University.

I’m drawn to a deceptively simple question: how can I help people find the right information? That curiosity drives my work in Information Retrieval (dense, sparse, and late-interaction retrieval), Multilingual Information Retrieval, and Retrieval-Augmented Generation (RAG). My research has been published at SIGIR, ICLR, ACL, and EMNLP, while the Korean retrieval models and benchmarks I led have grown to 200+ GitHub stars and 1.3M+ downloads on Hugging Face.

I love building in the open, and I actively contribute to projects including Sentence-Transformers, MTEB, and InstructKR.


News

Apr 02, 2026 Our paper “Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval” has been accepted to SIGIR 2026 🎉
Mar 02, 2026 Our paper “Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment” has been accepted to ICLR 2026 🎉

Education

Korea University Mar. 2025 – Present
M.S. in Computer Science and Engineering (Advisor: Prof. Heuiseok Lim, NLP&AI Lab)
Hongik University Mar. 2020 – Feb. 2025
B.S. in Computer Engineering & Mechanical and System Design Engineering (Double Major)

Projects

WBL: World Best LLM Project (HF)
Led the data team: built a query-clarity evaluation framework (GPT-5.2 prompts + a Qwen3-4B tagger) and a reward-model-ensemble response-filtering pipeline for large-scale alignment data.

KURE: Korea University Retrieval Embedding Model (GitHub · HF)
Flagship Korean retrieval project — SOTA dense retriever (1st on MTEB-ko-retrieval), 200+ GitHub stars and 1.3M+ cumulative Hugging Face downloads. Best Oral Presentation, HCLT 2025.

Korean ColBERT & Sparse Retrievers (colbert-ko-v1 · splade-ko-v1 · inference-free-splade-ko-v1)
Trained and open-sourced Korean ColBERT and SPLADE variants that achieve state-of-the-art results among models of the same architecture on the Korean Retrieval Benchmark.

KT–Korea University Collaborative Research (Korean Legal LLM) (News)
End-to-end training recipe for a Korean legal-domain LLM, which directly contributed to KT securing a $10.42M contract for the Supreme Court of Korea's AI platform. Published as LEGALMIDM (ICLR 2026 Data-FM Workshop).

PreRanker (GitHub · HF)
Lightweight reranker that narrows the pool of candidate tools, reducing the tool-call search space for LLM agents.

URACLE–Korea University Collaborative Research
Korean–English cross-lingual retrieval model; used model merging to recover monolingual retrieval performance while retaining cross-lingual information retrieval (CLIR) gains.


Open Source Contributions

Sentence-Transformers

MTEB (Massive Text Embedding Benchmark)

InstructKR


Publications [Conference]

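  1. Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval
    Youngjoon Jang, Seongtae Hong, Hyeonseok Moon, and 1 more author
    In Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2026), 2026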
  2. Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
    Seongtae Hong, Youngjoon Jang, Jungseob Lee, and 2 more authors
    In Proceedings of the International Conference on Learning Representations (ICLR 2026), 2026
  3. Youngjoon Jang, Chanhee Park, Hyeonseok Moon, and 5 more authors
    In the International Conference on Learning Representations Workshop on Addressing Data Problems for Foundation Models (ICLR 2026 Data-FM Workshop), 2026
  4. Seungyoon Lee, Minhyuk Kim, Seongtae Hong, and 3 more authors
    In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), 2026
  5. Youngjoon Jang, Seongtae Hong, Junyoung Son, and 3 more authors
    In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics: Student Research Workshop (ACL 2025 SRW), 2025
  6. Seonmin Koo, Jinsung Kim, Youngjoon Jang, and 2 more authors
    In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), 2024

Publications [Domestic Conference]

  1. Youngjoon Jang, Junyoung Son, Taemin Lee, and 3 more authors
    In Annual Conference on Human & Cognitive Language Technology (HCLT 2025), 2025
  2. Youngjoon Jang, Junyoung Son, Taemin Lee, and 3 more authors
    In Annual Conference on Human & Cognitive Language Technology (HCLT 2024), 2024

Preprint

  1. Youngjoon Jang, Seongtae Hong, Hyeonseok Moon, and 1 more author
    Under Review, 2026
  2. MIMO: Multilingual Information Retrieval from Monolingual Oracles
    Youngjoon Jang, Seongtae Hong, and Heuiseok Lim
    Under Review, 2026
  3. SHIFT: Semantic Harmonization via Index-side Feature Transformation for Multilingual Information Retrieval
    Youngjoon Jang, Seongtae Hong, Hyeonseok Moon, and 1 more author
    Under Review, 2026
  4. Youngjoon Jang, Junyoung Son, Taemin Lee, and 5 more authors
    arXiv preprint, 2025
  5. NC-AI Consortium
    arXiv preprint, 2025