NuclearQAv2: A Structured Benchmark for Evaluating Domain-Science Competence in Large Language Models

Large language models (LLMs) have demonstrated strong performance across a wide range of tasks, but ensuring their reliability in highly technical domains remains a significant challenge. In nuclear engineering, problem solving often requires not only factual knowledge but also quantitative reasonin...

Source

http://arxiv.org/abs/2606.27047v1