Review

[Paper Review] Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills

This study addresses the problem that LLM-based software engineering agents are limited in training and generalization performance by a shortage of high-quality task data. Existing synthetic data generation methods rely on fixed rules or random bug injection, so they fail to reflect an agent's actual weaknesses or its learning progress.

#Review #Software Engineering #Large Language Models #Reinforcement Learning #Self-Evolution #Agent Skills #Trace-Driven Learning #Code Repair

June 7, 2026

[Paper Review] SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations

This paper starts from the problem that there is no systematic methodology for reliably evaluating LLM-based mediators in complex conflict situations that change in real time. Prior work either relies on a few limited domains or evaluates mediator performance over the entire conversation context, which introduces noise from irrelevant dialogue content.

#Review #LLM Mediation #Automated Evaluation #Socio-cognitive Adaptation #Agentic Pipeline #Topic-localized Evaluation

June 7, 2026

[Paper Review] SPACENUM: Revisiting Spatial Numerical Understanding in VLMs

This paper questions whether the numerical outputs that VLMs generate in embodied environments (e.g., action magnitude, spatial coordinate) are actually grounded in real spatial information.

#Review #Vision-Language Models #Spatial Numerical Understanding #Spatial Exploration #Spatial Reasoning #Metric Grounding #Num2Space #Space2Num

June 7, 2026

[Paper Review] SIA: Self Improving AI with Harness & Weight Updates

This paper aims to overcome the limitation that existing research on AI self-improvement is split into two isolated silos: harness (scaffold) improvement and test-time training (weight updates).

#Review #Self-Improving Agents #Test-Time Training #Reinforcement Learning #Harness Engineering #Scaffold Generation #LoRA

June 7, 2026

[Paper Review] Robots Need More than VLA and World Models

This paper argues that the robot learning field currently relies too heavily on scaling VLA models alone, and that scaling by itself cannot achieve generalist robot intelligence.

#Review #Robotics #Vision-Language-Action Models #Physical Intelligence #Embodied AI #Grounding #Robot Learning #Data Engines

June 7, 2026

[Paper Review] Reinforcement Learning from Rich Feedback with Distributional DAgger

This study aims to address the extreme sparse-reward problem of the existing RLVR paradigm and the poor credit assignment that results from it.

#Review #Reinforcement Learning #Rich Feedback #Self-Distillation #DAgger #Policy Optimization #Credit Assignment

June 7, 2026

[Paper Review] Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

This paper addresses the core question of why modern Image-to-Video (I2V) generation models frequently violate basic physical laws despite their excellent visual quality.

#Review #Video Generation #Diffusion Models #Physical Consistency #Phase Erosion #Latent Delta Guidance #Spectral Analysis #Training-Free

June 7, 2026

[Paper Review] Parametric Social Identity Injection and Diversification in Public Opinion Simulation

This paper aims to address the severe lack of diversity in existing LLM-based public opinion simulation methods. The authors identify a Diversity Collapse phenomenon in which existing prompt-based persona methods fail to reproduce the distribution of real human responses and identity information is lost during hierarchical information propagation.

#Review #Agent-based Modeling #Public Opinion Simulation #Social Diversity #Large Language Models #Hidden State Manipulation

June 7, 2026

[Paper Review] PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams

This paper points out that existing paper recommendation systems mostly restrict their framing to a Static Ranking problem over a fixed candidate set.

#Review #Scientific Paper Recommendation #User Profiling #Interest Drift #Longitudinal Benchmark #Multi-signal Aggregation #LLM-based Recommendation

June 7, 2026

[Paper Review] OpenSkill: Open-World Self-Evolution for LLM Agents

This paper aims to address the uncertainty in 'Open-World Self-Evolution' settings, in which an LLM agent learns on its own after deployment without external ground truth or guidance.

#Review #Open-World Self-Evolution #LLM Agents #Supervision-Free #Skill Evolution #Virtual Verifier #Knowledge Acquisition #Model Transferability

June 7, 2026

[Paper Review] Measuring Model Robustness via Fisher Information: Spectral Bounds, Theoretical Guarantees, and Practical Algorithms

This paper aims to address the fact that robustness evaluation of deep learning models depends too heavily on specific attacks (attack-dependent) and lacks theoretical grounding. Existing metrics such as the Lipschitz constant and the CLEVER score suffer from low scalability or limited probabilistic interpretability.

#Review #Model Robustness #Fisher Information Matrix #Spectral Norm #Adversarial Vulnerability #Interpretability #Deep Learning

June 7, 2026

[Paper Review] MMAE: A Massive Multitask Audio Editing Benchmark

This study addresses the lack of an integrated infrastructure for systematically evaluating instruction-based audio editing, despite the field's rapid progress.

#Review #Audio Editing #Benchmark #Multitask Learning #Rubric-based Evaluation #Instruction Following #Consistency

June 7, 2026

[Paper Review] LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

This paper addresses the inefficiency of spending the same amount of computation on every step in agent systems, even though those systems perform tasks of differing complexity such as tool calling and planning. Existing LLM inference systems use a fixed transformer layer structure and incur the same compute cost for every token.

#Review #Layer Skipping #Agentic LLM #LoRA #Adaptive Inference #Straight-Through Estimator #Model Efficiency

June 7, 2026

[Paper Review] LLM Explainability with Counterfactual Chains and Causal Graphs

This paper addresses the problem that the opacity of LLM reasoning makes it difficult to establish trust in high-stakes domains. Existing attention analysis and feature attribution methods are inherently correlational, which limits their ability to clearly explain the complex reasoning mechanisms of LLMs.

#Review #LLM Explainability #Causal Graphs #Counterfactual Chains #Concept Discovery #MCMC #Predictive Fidelity

June 7, 2026

[Paper Review] LIMMT: Less is More for Motion Tracking

This paper points out that indiscriminate data scaling in humanoid motion tracking training actually leads to degraded performance.

#Review #Motion Tracking #Humanoid Robot #Data-Centric AI #Physics-based Simulation #Imitation Learning #Data Curation

June 7, 2026

[Paper Review] How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling

This paper aims to determine how much of a real musical genre's identity chord-symbol time-series data can capture, and where the limits of its expressive power lie.

#Review #Chord-symbol modeling #Genre identity #PEFT #LoRA #Music Transformer #Representation boundary

June 7, 2026

[Paper Review] HarnessForge: Joint Harness and Policy Evolution for Adaptive Agent Systems

This paper addresses the 'lack of Executable Compatibility' problem that arises when performing meta-adaptation of LLM agent systems.

#Review #LLM Agents #Meta-Adaptation #Harness-Policy Co-evolution #Agent System Design #Reasoning Policy Alignment

June 7, 2026

[Paper Review] GENEB: Why Genomic Models Are Hard to Compare

This paper points out that the genomic machine learning field currently cannot compare models fairly because of fragmented benchmarks and mutually incompatible evaluation protocols.

#Review #Genomic Foundation Models #Benchmark #Probing #Cross-Model Evaluation #Architecture #Pretraining #Genomics

June 7, 2026

[Paper Review] Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

This paper challenges the conventional belief that confidence-based gradient weighting during diffusion model training can amplify the model's errors, and demonstrates that such weighting can instead yield structural benefits.

#Review #Diffusion Models #Belief Space #Music Generation #LoRA #Implicit Curriculum #Entropy #Log-Barrier

June 7, 2026

[Paper Review] Direct 3D-Aware Object Insertion via Decomposed Visual Proxies

This study aims to overcome the limitation that existing object insertion techniques are confined to the 2D image plane and therefore cannot precisely control the 3D pose of the object the user wants to insert.

#Review #Object Insertion #Pose-Controllable #Decomposed Visual Proxies #3D-Aware #Diffusion Model #Image Synthesis

June 7, 2026