Thoughts, insights, and deep dives into AI security, differential privacy, and cutting-edge research
Our red-teaming of 9 frontier models with 320 unique adversarial prompts across multiple attack methods shows that text-safe models suffer >75% attack success rates when harmful content shifts to images or audio.
tl;dr --- A survey of current AI agent safety benchmarks reveals a three-layered risk landscape and shows we're failing at all three layers simultaneously.
Blog #4 in the Inception of Differential Privacy series
Blog #3 in the Inception of Differential Privacy series
Blog #1 in the Inception of Differential Privacy series
Explore how InterrogateLLM takes a straightforward approach to detecting AI hallucinations.