#Bias Mitigation

7 posts

[Paper Review] SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models

A detailed review of the paper 'SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models', posted to arXiv by Elisa Ricci.

#Review #Vision-Language Models #CLIP #Debiasing #Sparse Autoencoder #Post-Hoc #Zero-Shot #Feature Disentanglement #Bias Mitigation

March 23, 2026

[Paper Review] Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models

A detailed review of the paper 'Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models', posted to arXiv.

#Review #Text-to-Image #LVLM #Social Bias #System Prompts #Bias Mitigation #Meta-Prompting #Fairness #Generative AI

December 4, 2025

[Paper Review] Benchmark Designers Should 'Train on the Test Set' to Expose Exploitable Non-Visual Shortcuts

A detailed review of the paper 'Benchmark Designers Should 'Train on the Test Set' to Expose Exploitable Non-Visual Shortcuts', posted to arXiv.

#Review #Multimodal LLMs #Benchmark Design #Non-Visual Shortcuts #Test-Set Stress-Test #Bias Mitigation #Model Evaluation #Benchmark Robustness

November 9, 2025

[Paper Review] SteeringControl: Holistic Evaluation of Alignment Steering in LLMs

A detailed review of the paper 'SteeringControl: Holistic Evaluation of Alignment Steering in LLMs', posted to arXiv by Zhun Wang.

#Review #LLM Alignment #Representation Steering #Benchmark #Behavioral Entanglement #Bias Mitigation #Harmful Generation #Hallucination Control #Modular Framework

September 18, 2025

[Paper Review] AMBEDKAR-A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models

A detailed review of the paper 'AMBEDKAR-A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models', posted to arXiv by Rahul Karthikeyan.

#Review #Bias Mitigation #Large Language Models #Speculative Decoding #Constitutional AI #Fairness #Inference-Time Control #Indian Sociocultural Context

September 3, 2025

[Paper Review] CorrSteer: Steering Improves Task Performance and Safety in LLMs through Correlation-based Sparse Autoencoder Feature Selection

A detailed review of the paper 'CorrSteer: Steering Improves Task Performance and Safety in LLMs through Correlation-based Sparse Autoencoder Feature Selection', posted to arXiv by Adriano Koshiyama.

#Review #Sparse Autoencoders #LLM Steering #Feature Selection #Correlation Analysis #AI Safety #Bias Mitigation #Mechanistic Interpretability

August 20, 2025

[Paper Review] BiasGym: Fantastic Biases and How to Find (and Remove) Them

A detailed review of the paper 'BiasGym: Fantastic Biases and How to Find (and Remove) Them', posted to arXiv by Arnav Arora.

#Review #Bias Mitigation #LLMs #Mechanistic Interpretability #Fine-tuning #Attention Steering #Stereotype Analysis #Safety Alignment

August 13, 2025