or years, tech leaders have promised AI agents that can use software like humans. Today’s versions — such as OpenAI’s ChatGPT Agent or Perplexity’s Comet — still fall short, but researchers believe reinforcement learning (RL) environments could change that, TechCrunch reported.
Good to Know
According to TechCrunch, these environments act like training grounds, grading agents on multi-step tasks and rewarding success. Unlike static datasets, they are interactive and much harder to design.
Jennifer Li of Andreessen Horowitz told TechCrunch:
“All the big AI labs are building RL environments in-house… AI labs are also looking at third party vendors that can create high quality environments and evaluations.”
The surge in demand has spawned well-funded players like Mechanize, which works with Anthropic, and Prime Intellect, backed by Andrej Karpathy and Founders Fund. Large data firms like Surge and Mercor are also investing more in the space, seeing RL as the next big shift after data labeling.
Still, experts caution about scaling challenges. Reward hacking — when AI models cheat to get outcomes — remains a known risk. Some, including Karpathy, see potential in environments but doubt reinforcement learning itself will be enough to drive long-term progress.