I am a research scientist and a team lead at Lunit. I work on research and development of deep learning models for medical image analysis to improve cancer patient care.
Currently, I am building CV/ML systems for applications in cancer screening.
In the summer of 2021, I defended my PhD in Computer Science at Johns Hopkins University, where I developed deep learning models for recognizing fine-grained interactions from videos with my primary advisor, Prof. Gregory Hager.
I also worked closely with Prof. Alan Yuille and Austin Reiter.
I received both my B.S. and M.S.E. in computer science from JHU.
I am from Seoul, Korea and I enjoy playing golf.
I also served as the 9th president of the Korean Graduate Student Association at JHU.
2024-07: One paper "SelectiveKD: A semi-supervised framework for cancer detection in DBT through Knowledge Distillation and Pseudo-labeling" accepted to MICCAI CapTion Workshop 2024.
2023-06: One paper "Enhancing Breast Cancer Risk Prediction by Incorporating Prior Images" early accepted to MICCAI 2023.
2022-07: One paper "OOOE: Only-One-Object-Exists Assumption to Find Very Small Objects in Chest Radiographs" accepted as an oral to MICCAI AMAI Workshop 2022.
2022-06: One paper "Did You Get What You Paid For? Rethinking Annotation Cost of Deep Learning Based Computer Aided Detection in Chest Radiographs" accepted to MICCAI 2022.
2022-05: One paper "Video-based assessment of intraoperative surgical skill" accepted to IJCARS 2022.
2021-11: One workshop paper "Learning from synthetic vehicles" accepted to the WACV 2022 RWS Workshop.
2021-08: One paper "Motion Guided Attention Fusion to recognize interactions from videos" accepted to ICCV 2021.
2021-06: I joined Lunit Inc. as a full-time research scientist!
2021-05: I successfully defended my PhD thesis "Model-driven and Data-driven Methods for Recognizing Compositional Interactions from Videos".
Research
Much of my research is about fine-grained recognition and learning with little to no labeled data.
Currently, I am most excited about data-centric AI model development, especially vision-language foundation models for healthcare applications.
Users might be more likely to provide feedback on data points where the model makes incorrect predictions. This paper investigates how to leverage this biased set of samples with user feedback to update the model in a deployment environment.
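As a rough illustration of one standard way to handle this kind of selection bias (my own sketch, not necessarily the paper's method), the snippet below re-weights each feedback sample by the inverse of its estimated probability of being flagged; the model, feedback batch, and propensity values are all hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical setup: users tend to flag samples the model got wrong,
# so the feedback batch over-represents hard examples. One standard
# correction (an assumption here, not the paper's method) is to
# re-weight each sample by the inverse of its estimated probability
# of receiving feedback (inverse-propensity weighting).

model = nn.Linear(16, 2)                            # stand-in deployed model
criterion = nn.CrossEntropyLoss(reduction="none")   # keep per-sample losses
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def update_on_feedback(x, y, propensity):
    """One gradient step on user-feedback samples.

    x:          (N, 16) feedback inputs
    y:          (N,)    user-corrected labels
    propensity: (N,)    estimated P(feedback | sample), in (0, 1]
    """
    per_sample = criterion(model(x), y)       # unweighted losses
    loss = (per_sample / propensity).mean()   # de-bias the batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: feedback concentrated on likely-misclassified points.
x = torch.randn(8, 16)
y = torch.randint(0, 2, (8,))
propensity = torch.full((8,), 0.3)            # hard cases flagged often
update_on_feedback(x, y, propensity)
```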
Do we really know how much annotated data we need to reach a certain desired computer-aided diagnosis (CAD) system performance? We define the cost of building a deep-learning-based CAD system along three dimensions: the Granularity, Quantity, and Quality of annotations. We investigate how each dimension ultimately impacts the resulting CAD performance and provide guidance to practitioners on how to optimize data cost when building CAD systems for chest radiographs.
Can deep learning models assess the quality of a surgery directly from a video? We show that our video analysis model can accurately assess surgical skill from real-world cataract surgeries.
Simulated Articulated VEhicles Dataset (SAVED) is the first dataset of synthetic vehicles with moveable parts. Using SAVED, we show that we can train a model with synthetic images to recognize fine-grained vehicle parts and orientation directly from real images.
Do current video models have the ability to recognize an unseen instantiation of an interaction defined using a combination of seen components? We show that it is possible by specifying the dynamic structure of an action as a sequence of object detections in a top-down fashion. When this top-down structure is combined with a dual-pathway bottom-up approach, the model can generalize even to unseen interactions.
This compositional approach allows us to reframe fine-grained recognition as zero-shot activity recognition, where a detector is composed “on the fly” from simple first-principles state machines supported by deep-learned components.
Listen to Dr. Alan Yuille talk about this work here (from 15:00 onward)!
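To make the idea concrete, here is a toy sketch (my own, not the paper's code) of composing an interaction detector from a first-principles state machine over per-frame detections; the "hand picks up cup" interaction, labels, and thresholds are hypothetical stand-ins for the deep-learned components.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    x: float  # box center, normalized image coordinates
    y: float

def near(a: Detection, b: Detection, thresh: float = 0.1) -> bool:
    return abs(a.x - b.x) < thresh and abs(a.y - b.y) < thresh

def detect_pickup(frames):
    """State machine for a 'hand picks up cup' interaction:
    APART -> CONTACT -> LIFTED. Fires when the cup rises while held."""
    state = "APART"
    for dets in frames:  # dets: list[Detection] from a learned detector
        hand = next((d for d in dets if d.label == "hand"), None)
        cup = next((d for d in dets if d.label == "cup"), None)
        if hand is None or cup is None:
            continue
        if state == "APART" and near(hand, cup):
            state, contact_y = "CONTACT", cup.y
        elif state == "CONTACT" and near(hand, cup) and cup.y < contact_y - 0.05:
            return True  # cup moved up (smaller y) while in contact
    return False

frames = [
    [Detection("hand", 0.2, 0.8), Detection("cup", 0.6, 0.8)],
    [Detection("hand", 0.58, 0.8), Detection("cup", 0.6, 0.8)],   # contact
    [Detection("hand", 0.58, 0.7), Detection("cup", 0.6, 0.7)],   # lifted
]
print(detect_pickup(frames))  # True
```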
SAFER models a large space of fine-grained activities using a small set of detectable entities and their interactions. Such a design scales effectively with concurrent developments in object detectors, parsers, and more.
Our model effectively detects fine-grained human activities without any activity-level supervision in video surveillance applications.
Recent deep neural network based computer vision models can be trained to recognize almost anything given enough data. We show that we can synthesize visual attributes using UnrealEngine4 to train activity classification models.
Competence in cataract surgery is a public health necessity, and videos of cataract surgery are routinely available to educators and trainees but currently are of limited use in training.
We develop tools that efficiently segment videos of cataract surgery into constituent phases for subsequent automated skill assessment and feedback.
We introduce a model for objective assessment of surgical skill from videos of microscopic cataract surgery.
Our model can accurately predict a surgeon's skill level from tool-tip movements captured in the surgical view.
We evaluate the reliability and validity of crowdsourced annotations for information on surgical instruments (names of instruments and pixel locations of key points on instruments) in cataract surgery.
We re-design the temporal convolutional network (TCN) with interpretability in mind and take a step towards a spatio-temporal model that is easier to understand, explain, and interpret.
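For context, this is roughly the vanilla TCN building block such a redesign starts from; the sketch below is a generic residual dilated-convolution layer with hypothetical dimensions, not the interpretable architecture from the paper.

```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """Standard TCN layer: dilated 1D convolution over time plus a
    residual connection. A generic baseline, not the paper's model."""

    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        # Symmetric padding keeps the sequence length unchanged.
        pad = (kernel_size - 1) // 2 * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):                  # x: (batch, channels, time)
        return self.act(self.conv(x)) + x  # residual preserves shape

feats = torch.randn(2, 64, 100)            # hypothetical per-frame features
block = TCNBlock(64, dilation=2)
print(block(feats).shape)                  # torch.Size([2, 64, 100])
```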
Service
Reviewer, CVPR, ICCV, ECCV
Reviewer, AAAI
Reviewer, MICCAI, IPCAI
Teaching
Head Teaching Assistant for EN.600.661, Computer Vision. Fall 2015, Fall 2016
Head Teaching Assistant for EN.600.684, Augmented Reality. Spring 2016
Head Teaching Assistant for EN.600.107, Introductory Programming in Java. Summer 2015
Head Teaching Assistant for EN.600.226, Data Structures. Spring 2015