Starting December 15th, we are launching the development Builder phase with updated metrics and rules. First, we added three new metrics: Completion rate (F1), Completion rate (Precision), and Completion rate (Recall). Precision and recall are reported at the timestep where F1 is highest (the argmax timestep for F1). We also added several strategies for reporting F1/precision/recall; these are shown in the output of the scoring program.
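To make the "argmax timestep for F1" convention concrete, here is a minimal Python sketch. It is not the official scoring program: the block encoding `(x, y, z, color)`, the exact-match comparison against the target, and the function names `prf` and `report_at_best_f1` are all assumptions made for illustration. The actual scorer may match structures differently and supports additional reporting strategies.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical block encoding (x, y, z, color); not taken from the scoring program.
Block = Tuple[int, int, int, int]


def prf(built: Set[Block], target: Set[Block]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of a built structure against the target.

    Assumption: a block counts as correct only on an exact match.
    """
    tp = len(built & target)
    precision = tp / len(built) if built else 0.0
    recall = tp / len(target) if target else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


def report_at_best_f1(episode: List[Set[Block]], target: Set[Block]) -> Dict[str, float]:
    """Score every timestep, then report precision and recall where F1 peaks."""
    if not episode:
        raise ValueError("episode must contain at least one timestep")
    scores = [prf(built, target) for built in episode]
    # Index of the timestep with the highest F1 (first one wins on ties).
    best_t = max(range(len(scores)), key=lambda t: scores[t][2])
    precision, recall, f1 = scores[best_t]
    return {"timestep": best_t, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    target = {(0, 0, 0, 1), (1, 0, 0, 1), (2, 0, 0, 1), (3, 0, 0, 1)}
    episode = [
        {(0, 0, 0, 1)},
        {(0, 0, 0, 1), (1, 0, 0, 1)},
        {(0, 0, 0, 1), (1, 0, 0, 1), (5, 5, 5, 2)},  # last block is misplaced
    ]
    print(report_at_best_f1(episode, target))
```

In this toy episode the misplaced block at the last step lowers precision, so the reported precision and recall come from the earlier step where F1 peaked, not from the final state.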
Another update concerns the competition sub-tracks. There are three of them: the Main challenge, where the rules are unchanged; the Main challenge (full state), where your agent is provided with the current grid and its position; and the Open test challenge, where, as the name suggests, the evaluation tasks are revealed to the participants. (They are listed in the Terms&Conds section, but for your convenience here is the list: C34, C9, C12, C4, C22, C43, C11, C33, C1, C26, C20.)
This is done to test certain variations in our benchmark: Main vs. full state shows how much agents can benefit from complete information, and Main vs. open test shows how hard our generalization problem is compared to a multitask problem.
For questions and official announcements, join the IGLU Slack workspace.