Note
LLMs exhibit systematic over-correction bias when verifying code. GPT-4o accuracy drops from 52.4% to 11.0% under complex prompts.
Details
- Book/Proceedings
- ASE 2025: 40th IEEE/ACM International Conference on Automated Software Engineering
Citation Key
jin2025overcorrection