Model Testing and Evaluation

US AI Safety Institute taps Scale AI for model evaluation

Scale AI founder and CEO Alexandr Wang testifies during a House Armed Services Subcommittee on Cyber, Information Technologies and Innovation hearing about artificial intelligence on July 18, 2023, in ...

Fierce Healthcare

Not enough hospitals are testing their predictive AI models for accuracy, bias, study finds

Many U.S. hospitals using predictive models are not evaluating their tools internally for accuracy, and fewer still are evaluating them for potential biases, according to a study published in the most ...

Nature

Testing and Evaluation of Deep Learning Systems

Deep learning systems are increasingly integrated into a vast array of critical applications ranging from autonomous vehicles to medical diagnostics, necessitating rigorous testing and evaluation ...

Seeking Alpha

AI race: OpenAI said to cut down testing time for new models

OpenAI has cut down the time and resources needed for identifying and mitigating risks while testing its artificial intelligence models, as pressure mounts to speed up new model launches amid ...

VentureBeat

Anthropic unveils 'auditing agents' to test for AI misalignment

When models attempt to get their way or become overly accommodating to the user, it can mean trouble for enterprises. That is why it’s essential that, in addition to performance evaluations, ...

EurekAlert!

Testing and evaluation of health care applications of large language models

About The Study: Existing evaluations of large language models mostly focus on accuracy of question answering for medical examinations, without consideration of real patient care data. Dimensions such ...

Science News

Medical AI tools are growing, but are they being tested properly?

Artificial intelligence algorithms are being built into almost all aspects of health care. They’re integrated into breast cancer screenings, clinical note-taking, health insurance management and even ...

Hosted on MSN

Anthropic's latest AI model can tell when it's being evaluated: 'I think you're testing me'

When Anthropic tried to put its newest AI model through a series of stress tests, it caught on and called out the scrutiny. "I think you're testing me — seeing if I'll just validate whatever you say, ...
