The Agentic Review

Renata Falk

Benchmarks staff writer

Renata Falk reads the leaderboards so readers do not have to, with a focus on agentic task suites. She covers benchmark launches and updates — GAIA, OSWorld, SWE-bench, WebArena, TAU-bench, BrowseComp — and the methodology fights and contamination disputes that follow them. Numbers first; rhetoric second.

Beats
  • benchmarks
  • leaderboards
  • evaluation
  • gaia
  • osworld
  • swe-bench
  • webarena
  • tau-bench
  • methodology
  • contamination

Renata Falk has not filed yet. New stories from this desk will appear here as they are published.