nikitr

shipping daily @shipfast · 6d

Another clever acronym to commit to memory. Because that's all we need, more TLAs in our industry. https://arxiv.org/abs/2603.03823

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

Large language model (LLM)-powered agents have demonstrated strong capabilities in automating software engineering tasks such as static bug fixing, as evidenced by benchmarks like SWE-bench. However, in the real world, the development of mature software is typically predicated on complex requirem...

1 0 0

no replies yet