• brucethemoose@lemmy.world
    5 months ago

    I wonder what the errors were?

    This seems like a much better task for a finetune with constrained decoding (which is not possible with OpenAI), though I’m not sure how much ‘thinking’ about the function needs to be done here.
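    To illustrate the idea of constrained decoding mentioned above: at each step the decoder masks out every token the output grammar disallows before choosing among the rest, so the model can only emit well-formed annotations. The sketch below is a toy stand-in (the grammar, vocabulary, and scores are all illustrative, not from the paper or any real SPARK tool; real implementations such as GBNF grammars apply the mask to the model's logits):

    ```python
    # Toy constrained decoding: a fake "model" scores tokens, but we only
    # ever pick tokens that keep the output inside a tiny grammar.

    # Illustrative grammar: the set of well-formed outputs.
    GRAMMAR = {
        ("pragma", "Assert", "(", "X", ">", "0", ")", ";"),
        ("pragma", "Loop_Invariant", "(", "X", ">", "0", ")", ";"),
    }

    def allowed_next(prefix):
        """Tokens that extend `prefix` toward at least one grammatical output."""
        n = len(prefix)
        return {seq[n] for seq in GRAMMAR
                if len(seq) > n and seq[:n] == tuple(prefix)}

    def fake_model_scores(prefix):
        """Stand-in for LLM logits; note it prefers 'banana', which the
        grammar forbids, so the constraint visibly does its job."""
        return {"banana": 5.0, "pragma": 1.0, "Assert": 2.0,
                "Loop_Invariant": 1.5, "(": 1.0, ")": 1.0,
                ";": 1.0, "X": 1.0, ">": 1.0, "0": 1.0}

    def constrained_decode():
        out = []
        while tuple(out) not in GRAMMAR:
            scores = fake_model_scores(out)
            candidates = allowed_next(out)
            # Mask step: only grammatical continuations are eligible;
            # among those, take the highest-scoring token.
            out.append(max(candidates, key=lambda t: scores.get(t, float("-inf"))))
        return " ".join(out)

    print(constrained_decode())  # → pragma Assert ( X > 0 ) ;
    ```

    Even though the unconstrained model would happily emit "banana", the mask guarantees every output parses, which is exactly why a finetune plus grammar constraints fits annotation generation well.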

  • gnufuu@infosec.pub
    5 months ago

    The performance of Marmaragan with GPT-4o on the benchmark is promising, with correct annotations having been generated for 50.7% of the benchmark cases. The results establish a foundation for future work on combining the power of LLMs with the reliability of formal software verification.

    That’s basically a coin flip. If it were any less successful, a user would have better odds just guessing whether an annotation is true or false.