OpenAI has added a further layer of evaluation for its large language models by testing an adversarially fine-tuned version of gpt-oss-120b under its Preparedness Framework, the company's system for assessing and tracking risky capabilities in frontier models. The move comes as OpenAI faces criticism over its increasingly proprietary approach to AI development, with many questioning whether it can maintain its commitment to openness given its substantial commercial backing.