About
Hello, I am a Master's student in the NLP & AI Lab, advised by Prof. Heuiseok Lim, in the Department of Computer Science and Engineering at Korea University. I received my B.S. degree from Hongik University with a double major in Mechanical Engineering and Computer Science. My research centers on Natural Language Processing, with a focus on Information Retrieval (IR) and Retrieval-Augmented Generation (RAG) systems. I aim to make AI models more beneficial to humans and society.
Education
M.S. in Computer Science and Engineering, Korea University
Mar. 2025 - Present
B.S. in Mechanical Engineering and Computer Science, Hongik University
Mar. 2020 - Feb. 2025
Papers
Improving Korean-English Cross-Lingual Retrieval: A Data-Centric Study of Language Composition and Model Merging
Youngjoon Jang, Junyoung Son, Taemin Lee, Seongtae Hong, Heuiseok Lim
arXiv
From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on Retrieval-Augmented Generation Systems
Youngjoon Jang, Seongtae Hong, Junyoung Son, Sungjin Park, Chanjun Park, Heuiseok Lim
ACL 2025 SRW
Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts
Seonmin Koo, Jinsung Kim, Youngjoon Jang, Chanjun Park, Heuiseok Lim
EMNLP 2024
Building Korean Embedding Benchmarks with Large Language Models
Junyoung Son, Youngjoon Jang, Soonwoo Choi, Byeonggoo Lee, Taemin Lee, Heuiseok Lim
Annual Conference on Human & Cognitive Language Technology (HCLT) 2024
KoE5: A New Dataset and Model for Improving Korean Embedding Performance
Youngjoon Jang, Junyoung Son, Chanjun Park, Soonwoo Choi, Byeonggoo Lee, Taemin Lee, Heuiseok Lim
Annual Conference on Human & Cognitive Language Technology (HCLT) 2024
Projects
KULLM DeepResearch Project
Built a search pipeline for an open DeepResearch project.
2025 - Present
KURE Project
Trained a state-of-the-art (SOTA) Korean retrieval embedding model and built an evaluation framework for Korean embedding models.
2024 - Present
KT-Korea University Collaborative Research
Trained a Korean legal-domain LLM using Korean legal alignment data.
2024 - 2025
Pre:Ranker Project
Trained a reranker that narrows the set of candidate tools for a given query.
2024 - 2025
URACLE-Korea University Collaborative Research
Trained an English-Korean cross-lingual retrieval embedding model. [Paper]
2024 - 2025
Open-source Contributions
sentence-transformers
- Added support for the ListMLE and PListMLE losses for reranker training. [Link]
Massive Text Embedding Benchmark (MTEB)
- Added Korean retrieval evaluation datasets. [Link]
- Added long-context support for OpenAI embedding models. [Link]
- Added support for loading the jasper model in bf16 precision. [Link]
FlagEmbedding
- Fixed a bug in knowledge distillation during training. [Link]