#Sycophancy

2 posts

[Paper Review] Are LLMs Vulnerable to Preference-Undermining Attacks (PUA)? A Factorial Analysis Methodology for Diagnosing the Trade-off between Preference Alignment and Real-World Validity

A detailed review of the paper 'Are LLMs Vulnerable to Preference-Undermining Attacks (PUA)? A Factorial Analysis Methodology for Diagnosing the Trade-off between Preference Alignment and Real-World Validity', posted to arXiv by Chi Zhang.

#Review #Large Language Models #Preference Alignment #Preference-Undermining Attacks #Factorial Analysis #Sycophancy #Prompt Engineering #Truth-Deference Trade-off

January 14, 2026

[Paper Review] Behavioral Fingerprinting of Large Language Models

A detailed review of the paper 'Behavioral Fingerprinting of Large Language Models', posted to arXiv by Xing Li.

#Review #Large Language Models #Behavioral Evaluation #Model Alignment #Sycophancy #World Model Brittleness #Metacognition #Personality Profiling

September 8, 2025