Latest Posts

[Paper Review] LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering

This paper proposes LoCoBench-Agent, a comprehensive benchmark for evaluating the real-world capabilities that large language model (LLM) agents need when carrying out complex software development tasks.

#Review #LLM Agents #Software Engineering #Long-Context #Interactive Benchmark #Tool Usage #Memory Management #Bias-Free Evaluation #Multi-Turn

November 17, 2025

[Paper Review] Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?

This paper addresses the limitation that existing LLM-based software agents, because of their fixed designs and costly offline training, perform suboptimally and remain tied to specific benchmarks.

#Review #Software Engineering Agents #LLM Agents #Self-Evolution #On-the-Fly Learning #Tool Creation #SWE-bench #Autonomous Systems #Code Generation

November 17, 2025

[Paper Review] Genomic Next-Token Predictors are In-Context Learners

This study explores the fundamental question of whether in-context learning (ICL) is a phenomenon unique to human language or whether it can emerge organically in other sequence domains through large-scale predictive training. In particular, it aims to test whether ICL can emerge in genomic sequences, an alternative symbolic domain with rich statistical structure.

#Review #In-Context Learning (ICL) #Genomic Sequences #Next-Token Prediction #Large Language Models (LLMs) #Modality-Agnostic AI #Meta-Learning #Bitstring Program Synthesis #Evo2

November 17, 2025

[Paper Review] Assessing LLMs for Serendipity Discovery in Knowledge Graphs: A Case for Drug Repurposing

This paper aims to evaluate the ability of large language models (LLMs) to go beyond predictable, highly relevant answers over knowledge graphs (KGs) and discover unexpected yet valuable ("serendipitous") insights.

#Review #Serendipity Discovery #Knowledge Graphs #Drug Repurposing #LLMs #KGQA #RNS Metric #Biomedical AI

November 17, 2025

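[Paper Review] AI-Salesman: Towards Reliable Large Language Model Driven Telemarketing

This paper aims to improve the reliability of goal-oriented persuasive dialogue (e.g., telemarketing), where large language models (LLMs) struggle because of strategic brittleness, factual hallucination, and a lack of personalization. In particular, it seeks to overcome the limitations of existing LLMs and develop an AI agent that is effective in real sales scenarios.

#Review #Telemarketing #Large Language Models #Persuasive Dialogue #Reinforcement Learning #Bayesian Optimization #Dynamic Prompting #Dialogue Systems

November 17, 2025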

[Paper Review] A Decentralized Retrieval Augmented Generation System with Source Reliabilities Secured on Blockchain

This paper seeks to address the high data management costs and privacy concerns of existing centralized RAG (Retrieval Augmented Generation) systems.

#Review #Decentralized RAG #Blockchain #Smart Contracts #Source Reliability #Large Language Models #Retrieval Augmented Generation #Trustworthy AI

November 17, 2025

[Loki] Faster Pod Startup with fsGroupChangePolicy=OnRootMismatch

Analyzes an optimization in the Grafana Loki Helm chart that sets fsGroupChangePolicy to OnRootMismatch to prevent an unnecessary recursive chown at Pod startup, shortening Pod startup time.

#Grafana Loki #Kubernetes #Helm #Performance #Pod Startup

November 17, 2025

[Triton] async_copy Multicast Support on gfx1250

Adds cluster-load-based multicast to async_copy_global_to_local for the AMD gfx1250 target, enabling data sharing across CTAs.

#Triton #AMD #Multicast #Async Copy #gfx1250

November 16, 2025

[Paper Review] miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward

This study aims to analyze and resolve the discrepancies and errors between the informal and formal statements of the miniF2F benchmark, in the setting where an AI system takes part in a math Olympiad.

#Review #Automated Theorem Proving #Autoformalization #Benchmark Dataset #miniF2F #Lean Language #Large Language Models #Mathematical Reasoning #Formal Verification

November 16, 2025

[Paper Review] Workload Schedulers -- Genesis, Algorithms and Differences

This paper classifies modern workload schedulers into three categories (operating system process schedulers, cluster system job schedulers, and big data schedulers) and aims to analyze each class's evolution, the algorithms it uses, and its main features and differences.

#Review #Workload Scheduling #Process Scheduling #Job Scheduling #Big Data Processing #Resource Management #Distributed Systems #Scheduling Algorithms #Performance Optimization

November 16, 2025

[Paper Review] Virtual Width Networks

This paper aims to gain the benefits of wider representations while avoiding the quadratic compute cost incurred when increasing the hidden dimension of Transformer models.

#Review #Virtual Width Networks #Transformer #Mixture-of-Experts (MoE) #Scaling Laws #Representation Learning #Model Efficiency #Multi-Token Prediction #Hyper-Connections

November 16, 2025

[Paper Review] UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation

This paper seeks to overcome the limited multimodal coding ability of existing visual language models (VLMs) in UI (user interface) coding, along with the limits of the single-turn generation paradigm.

#Review #Visual Language Model #UI-to-Code Generation #Interactive UI #UI Editing #UI Polishing #Reinforcement Learning #Multimodal Coding #Test-Time Scaling

November 16, 2025

[Paper Review] Simulating the Visual World with Artificial Intelligence: A Roadmap

This paper aims to systematically survey how video generation models evolve into comprehensive physical world models and to present a roadmap.

#Review #World Models #Video Generation #AI Simulation #Generative AI #Physical Plausibility #Interactive AI #Planning #Roadmap

November 16, 2025

[Paper Review] MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism

The goal is to solve the problem that multi-agent reasoning systems based on large language models (LLMs) are hard to generalize to open-source models because of reward noise and training inefficiency.

#Review #Multi-Agent Systems #Reinforcement Learning #LLMs #Pipeline Parallelism #Reasoning #Reward Shaping #Agentic AI

November 16, 2025

[Paper Review] LiteAttention: A Temporal Sparse Attention for Diffusion Transformers

This paper seeks to address the excessive latency caused by the quadratic attention complexity of video-generation Diffusion Transformers (DiTs).

#Review #Diffusion Transformers #Sparse Attention #Temporal Coherence #Video Generation #Computational Efficiency #FlashAttention #CUDA Kernels

November 16, 2025

[Paper Review] Large Language Models for Scientific Idea Generation: A Creativity-Centered Survey

This survey addresses the unique challenges of scientific idea generation with large language models (LLMs), exploring in particular how to balance creativity with scientific validity.

#Review #Large Language Models #Scientific Discovery #Idea Generation #Creativity #Survey #AI in Science #Prompt Engineering #Multi-agent Systems #Evaluation Metrics

November 16, 2025

[Paper Review] HI-TransPA: Hearing Impairments Translation Personal Assistant

This paper seeks to address the difficulties that people with hearing impairments face in everyday communication, particularly those caused by unclear speech.

#Review #Multimodal AI #Hearing Impairment #Audio-Visual Speech Recognition #Curriculum Learning #Omni-Models #Assistive Technology #Lip Reading #Speech Translation

November 16, 2025

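[Paper Review] GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

This paper aims to develop a benchmark for evaluating the generative reasoning ability of unified multimodal models (UMMs). It seeks to overcome the limitation that existing benchmarks assess only discriminative understanding or unconstrained generation, and to comprehensively measure geometric generative reasoning, which fuses language understanding with precise visual generation.

#Review #Multimodal AI #Generative Reasoning #Geometric Construction #Benchmark #GeoGebra #Code-based Evaluation #Unified Models

November 16, 2025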

[Paper Review] From Proof to Program: Characterizing Tool-Induced Reasoning Hallucinations in Large Language Models

This study aims to identify and characterize Tool-Induced Myopia (TIM), a new type of reasoning hallucination that arises when tool-augmented language models (TaLMs) use external tools.

#Review #Tool-augmented LLMs #Reasoning Hallucinations #Tool-Induced Myopia (TIM) #Code Interpreter #Mathematical Reasoning #LLM Evaluation #Preference Optimization

November 16, 2025

[Paper Review] Experience-Guided Adaptation of Inference-Time Reasoning Strategies

This paper seeks to address the fundamental challenge of having agentic AI systems adapt their problem-solving approach based on interactions at inference time, after training.

#Review #Adaptive AI #Inference-Time Adaptation #Reasoning Strategies #Meta-Learning #LLM-based Agents #Dynamic Strategy Generation #Continual Learning #Computational Efficiency

November 16, 2025