About
Hello, I am a Master's student in the NLP & AI Lab, advised by Prof. Heuiseok Lim, in the Department of Computer Science and Engineering at Korea University. I received my B.S. degree from Hongik University with a double major in Mechanical Engineering and Computer Science. My research centers on Natural Language Processing, with a focus on Information Retrieval (IR) and Retrieval-Augmented Generation (RAG) systems. I aim to make AI models more beneficial to humans and society.
Education
M.S. in Computer Science and Engineering, Korea University
Mar. 2025 - Present
B.S. in Mechanical Engineering and Computer Science, Hongik University
Mar. 2020 - Feb. 2025
Papers
Improving Korean-English Cross-Lingual Retrieval: A Data-Centric Study of Language Composition and Model Merging
Youngjoon Jang, Junyoung Son, Taemin Lee, Seongtae Hong, Heuiseok Lim
arXiv
From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on Retrieval-Augmented Generation Systems
Youngjoon Jang, Seongtae Hong, Junyoung Son, Sungjin Park, Chanjun Park, Heuiseok Lim
ACL 2025 SRW
Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts
Seonmin Koo, Jinsung Kim, Youngjoon Jang, Chanjun Park, Heuiseok Lim
EMNLP 2024
Building Korean Embedding Benchmarks with Large Language Models
Junyoung Son, Youngjoon Jang, Soonwoo Choi, Byeonggoo Lee, Taemin Lee, Heuiseok Lim
Annual Conference on Human & Cognitive Language Technology (HCLT) 2024
KoE5: A New Dataset and Model for Improving Korean Embedding Performance
Youngjoon Jang, Junyoung Son, Chanjun Park, Soonwoo Choi, Byeonggoo Lee, Taemin Lee, Heuiseok Lim
Annual Conference on Human & Cognitive Language Technology (HCLT) 2024
Projects
KULLM DeepResearch Project
Built a search pipeline for an open DeepResearch project.
2025 - Present
KURE Project
Trained a state-of-the-art (SOTA) Korean retrieval embedding model and built an evaluation framework for Korean embedding models.
2024 - Present
KT-Korea University Collaborative Research
Trained a Korean legal-domain LLM using Korean legal alignment data.
2024 - 2025
Pre:Ranker Project
Trained a reranker that narrows the set of candidate tools for a given query.
2024 - 2025
URACLE-Korea University Collaborative Research
Trained an English-Korean cross-lingual retrieval embedding model. [Paper]
2024 - 2025
Open-source Contributions
sentence-transformers
- Added support for the ListMLE and PListMLE losses for reranker training. [Link]
Massive Text Embedding Benchmark (MTEB)
- Added Korean retrieval evaluation datasets. [Link]
- Added long-context support for OpenAI embedding models. [Link]
- Added support for loading the jasper model in bf16 precision. [Link]
FlagEmbedding
- Fixed a bug in knowledge distillation during training. [Link]