Evaluating a Submission

Judge0 Payload

To send a submission to Judge0, we include two pieces of information: the source code (a single string that combines all of the pieces described above) and the language ID (which specifies the compiler to use and is taken from the judge_0_id field on the Compiler).
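Below is a minimal Python sketch of this request. It assumes Judge0's standard REST API (`POST /submissions`, with `base64_encoded=false` so the source is sent as plain text). The `JUDGE0_URL` value and the `build_payload` and `submit` helpers are hypothetical and are not taken from Codespec's code.

```python
import json
import urllib.request

# Hypothetical Judge0 base URL; substitute the instance Codespec actually uses.
JUDGE0_URL = "http://localhost:2358"


def build_payload(source_code: str, judge_0_id: int) -> dict:
    """Build the request body: the combined source string plus the Compiler's judge_0_id."""
    return {"source_code": source_code, "language_id": judge_0_id}


def submit(payload: dict, base_url: str = JUDGE0_URL) -> str:
    """POST the submission to Judge0 and return the token it assigns."""
    request = urllib.request.Request(
        f"{base_url}/submissions?base64_encoded=false&wait=false",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # With wait=false, Judge0 responds right away with {"token": ...}.
        return json.load(response)["token"]
```

For example, `submit(build_payload(source, compiler.judge_0_id))` would return the token that the once-per-second check relies on.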

After making the initial request, Codespec checks once per second whether the submission has finished running. Once it has, we receive the result from Judge0.
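The once-per-second check could look like the sketch below. It assumes Judge0's `GET /submissions/{token}` endpoint and its status IDs (1 for "In Queue" and 2 for "Processing"). The `max_polls` cap is a hypothetical safeguard that the text above does not mention. `fetch` is passed in as a parameter so the loop can be exercised without a live server.

```python
import json
import time
import urllib.request
from typing import Callable

# Judge0 status ids 1 ("In Queue") and 2 ("Processing") mean the run has not
# finished; any other id is a final verdict.
PENDING_STATUS_IDS = {1, 2}


def judge0_fetcher(base_url: str, token: str) -> Callable[[], dict]:
    """Return a function that fetches the current state of one submission."""
    url = f"{base_url}/submissions/{token}?base64_encoded=false"

    def fetch() -> dict:
        with urllib.request.urlopen(url) as response:
            return json.load(response)

    return fetch


def wait_for_result(
    fetch: Callable[[], dict], interval: float = 1.0, max_polls: int = 60
) -> dict:
    """Call fetch() once per interval until Judge0 reports a final status."""
    for _ in range(max_polls):
        result = fetch()
        if result["status"]["id"] not in PENDING_STATUS_IDS:
            return result
        time.sleep(interval)
    # Hypothetical safeguard: give up rather than poll forever.
    raise TimeoutError(f"submission still pending after {max_polls} checks")
```

In practice this would be called as `wait_for_result(judge0_fetcher(base_url, token))`.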

Judge0 Response

The Judge0 response gets converted into a JSON object with the following pieces of information:

Stdout – a string containing any output the learner produced in their submission (through print or log statements)
Stderr – an object of standard error data to be saved as a Standard Error object in Codespec, blank if no standard error was produced
Compile output – the string Judge0 returns in its compile_output field, which holds the compiler's output (for example, compilation error messages); how Codespec should use it beyond setting the compilation error flag is not yet clear
Time – the number of milliseconds the submission took to run
Memory – the amount of memory the submission consumed while running (Judge0 reports this in kilobytes)
Runtime error – an object of runtime error data to be saved as a Runtime Error object in Codespec, blank if no runtime error occurred
Compilation error – Boolean that is True if compile_output is present, False otherwise
Unit Test Results – an array of objects to be turned into Unit Test Results in Codespec
Token – the unique ID of the submission in Judge0 (to be stored on the Snapshot Feedback)
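One way to build this object is sketched below. It assumes the raw Judge0 fields `stdout`, `compile_output`, `time` (seconds, as a string), `memory`, and `token`. The `Judge0Result` and `to_result` names are hypothetical. This section does not say how the standard error, runtime error, and unit test data are extracted, so the sketch takes them as already-parsed arguments.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Judge0Result:
    """Hypothetical container mirroring the fields listed above."""
    stdout: str
    stderr: Optional[dict]
    compile_output: str
    time_ms: Optional[float]
    memory: Optional[int]
    runtime_error: Optional[dict]
    compilation_error: bool
    unit_test_results: list = field(default_factory=list)
    token: str = ""


def to_result(
    raw: dict,
    stderr: Optional[dict] = None,
    runtime_error: Optional[dict] = None,
    unit_test_results: Optional[list] = None,
) -> Judge0Result:
    """Map a raw Judge0 submission response onto Judge0Result.

    stderr, runtime_error and unit_test_results are assumed to be parsed
    elsewhere, since their extraction is not described here.
    """
    compile_output = raw.get("compile_output") or ""
    # Judge0 reports run time in seconds as a string such as "0.012".
    seconds = raw.get("time")
    return Judge0Result(
        stdout=raw.get("stdout") or "",
        stderr=stderr,
        compile_output=compile_output,
        time_ms=float(seconds) * 1000 if seconds is not None else None,
        memory=raw.get("memory"),  # kilobytes, as reported by Judge0
        runtime_error=runtime_error,
        # True whenever Judge0 returned any compiler output.
        compilation_error=bool(compile_output),
        unit_test_results=unit_test_results or [],
        token=raw.get("token", ""),
    )
```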

Snapshot Feedback

Most of the Judge0 response is stored on a Snapshot Feedback object, which has a one-to-one relationship with the Snapshot that contains the learner's submission. Any runtime error or standard error data is stored in its own object (a Runtime Error or a Standard Error), each related one-to-one to the Snapshot Feedback. The Snapshot Feedback is also the record to which all Unit Test Results point.
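The relationships can be pictured with plain Python dataclasses standing in for the persisted records. All class and field names here are hypothetical. In an ORM, the one-to-one links would typically become one-to-one fields, and the Unit Test Result link would become a foreign key.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StandardErrorRecord:
    data: dict


@dataclass
class RuntimeErrorRecord:
    data: dict


@dataclass
class UnitTestResult:
    name: str
    passed: bool


@dataclass
class SnapshotFeedback:
    token: str  # Judge0 submission token
    stdout: str = ""
    compile_output: str = ""
    # One-to-one: at most one of each error record per feedback.
    runtime_error: Optional[RuntimeErrorRecord] = None
    standard_error: Optional[StandardErrorRecord] = None
    # One-to-many: all Unit Test Results belong to this feedback record.
    unit_test_results: List[UnitTestResult] = field(default_factory=list)


@dataclass
class Snapshot:
    code: str
    # One-to-one: each Snapshot has at most one Snapshot Feedback.
    feedback: Optional[SnapshotFeedback] = None
```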

Given the Snapshot Feedback’s position between the Snapshot (which is the gateway to the learner’s submission/code) and the various validation pieces (runtime error, standard error, and unit test result records), this object is the primary source of truth for determining whether or not a submission was correct.

However, re-evaluating a Snapshot Feedback's related objects every time someone asks whether a submission was correct can be computationally expensive. Codespec therefore uses a shortcut: the is_correct field on the Snapshot itself, which is updated as the final step once all data from Judge0 has been persisted.
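The sketch below illustrates the shortcut under stated assumptions. The correctness rule in `evaluate` is a hypothetical reading of "correct": no compilation error, no runtime or standard error, and at least one unit test with every test passing. The class and function names are illustrative only. What matters is the ordering. The expensive check runs once, in the final persistence step, and later readers consult the cached `is_correct` flag.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SnapshotFeedback:
    compilation_error: bool = False
    runtime_error: Optional[dict] = None
    standard_error: Optional[dict] = None
    unit_test_passed: List[bool] = field(default_factory=list)


@dataclass
class Snapshot:
    feedback: Optional[SnapshotFeedback] = None
    is_correct: bool = False  # cached verdict


def evaluate(feedback: SnapshotFeedback) -> bool:
    """Full (expensive) check over the feedback's related records.

    Hypothetical rule: no errors of any kind and every unit test passed.
    """
    return (
        not feedback.compilation_error
        and feedback.runtime_error is None
        and feedback.standard_error is None
        and len(feedback.unit_test_passed) > 0
        and all(feedback.unit_test_passed)
    )


def finish_persisting(snapshot: Snapshot, feedback: SnapshotFeedback) -> None:
    """Final persistence step: attach the feedback, then cache the verdict."""
    snapshot.feedback = feedback
    snapshot.is_correct = evaluate(feedback)
```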

Unit Test Results

Documentation coming soon.

Runtime and Standard Errors

Documentation coming soon.

Line-based Feedback

For Pseudocode, Parsons, and Faded Parsons problem type submissions, Codespec provides line-based feedback in addition to running the learner’s code through unit tests. Line-based feedback tells the learner how many blocks in their solution were:

1. Incorrect vs correct (distractor vs non-distractor blocks)

2. Correctly ordered

3. Correctly indented

4. Correctly populated with faded text (if they submitted a Faded Parsons solution)
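The four counts could be computed roughly as follows. This is a hypothetical sketch. It treats any submitted block whose `block_id` is absent from the solution as a distractor. It also counts a block as correctly ordered when its position among the submitted non-distractor blocks matches its position in the solution, which is only one possible definition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    block_id: str
    indent: int
    filled_text: Optional[str] = None  # faded-slot text (learner's or expected)


@dataclass
class LineFeedback:
    correct: int
    incorrect: int
    correctly_ordered: int
    correctly_indented: int
    correctly_filled: Optional[int]  # None unless the problem is Faded Parsons


def line_feedback(
    submitted: List[Block], solution: List[Block], faded: bool = False
) -> LineFeedback:
    expected = {b.block_id: (pos, b) for pos, b in enumerate(solution)}
    # Blocks that do not appear in the solution are treated as distractors.
    kept = [b for b in submitted if b.block_id in expected]

    ordered = sum(1 for i, b in enumerate(kept) if expected[b.block_id][0] == i)
    indented = sum(1 for b in kept if b.indent == expected[b.block_id][1].indent)

    filled: Optional[int] = None
    if faded:
        # Only blocks with an expected faded answer can be correctly filled.
        filled = sum(
            1
            for b in kept
            if expected[b.block_id][1].filled_text is not None
            and (b.filled_text or "").strip()
            == expected[b.block_id][1].filled_text.strip()
        )

    return LineFeedback(
        correct=len(kept),
        incorrect=len(submitted) - len(kept),
        correctly_ordered=ordered,
        correctly_indented=indented,
        correctly_filled=filled,
    )
```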

If a Parsons problem has multiple solutions, the learner's submission is validated against every solution, and the solution that most closely matches it is used as the basis for evaluation.
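Choosing the closest solution requires some similarity measure, and this section does not name one. The sketch below substitutes Python's standard `difflib.SequenceMatcher` and compares the submission and each solution as sequences of (block text, indentation) pairs. Both the measure and the `closest_solution` name are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple

Line = Tuple[str, int]  # (block text, indentation level)


def closest_solution(submission: Sequence[Line], solutions: List[Sequence[Line]]) -> int:
    """Return the index of the solution most similar to the submission.

    Similarity is SequenceMatcher.ratio() over whole lines, so a line only
    matches when both its text and its indentation agree. Ties go to the
    earliest solution.
    """

    def similarity(solution: Sequence[Line]) -> float:
        return SequenceMatcher(None, list(submission), list(solution)).ratio()

    return max(range(len(solutions)), key=lambda i: similarity(solutions[i]))
```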

If a learner adds their own custom blocks to the submission and uses them successfully (meaning all unit tests pass), the line-based feedback changes significantly. We can no longer assume that any of the intended solution blocks will be used in the same order or at the same level of indentation, if they are used at all. Instead of traditional line-based feedback, the learner receives descriptive statistics that compare their solution to the original solution:

Deviation from original solution – A percentage that describes how different their solution is from the original solution. Greater use of custom blocks with distinct text creates a higher degree of deviation. Using fewer custom blocks but changing the order or indentation of original blocks creates a moderate degree of deviation.

Correct original blocks – Of the original blocks used, how many were non-distractor blocks.

Correct original indentation – Of the original blocks used, how many were at the expected level of indentation.

Correct original order – Of the original blocks used, how many were in the expected order.

Correct custom block indentation – Of the custom blocks used, how many were correctly indented. This will always be either n/n or n-1/n, as an IndentationError from the unit tests will indicate which line caused the code to fail.

Valid custom block text – Of the custom blocks used, how many contained text free of syntax errors. This will always be either n/n or n-1/n, as a SyntaxError from the unit tests will indicate which line caused the code to fail.
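The original-block statistics and the deviation percentage might be computed as in the sketch below. Every detail here is an assumption. The weights (a custom block counts as a full deviation, a misordered or misindented original block as half of one), the relative-order rule, and the requirement that block text be unique within the solution are illustrative choices, not Codespec's documented formula. The custom-block rows are omitted because they depend on error line numbers from the unit tests.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical weights, chosen only to reflect the ranking described above:
# a custom block is a full deviation, a displaced original block half of one.
CUSTOM_WEIGHT = 1.0
DISPLACED_WEIGHT = 0.5


@dataclass
class SubmittedBlock:
    text: str
    indent: int
    is_custom: bool = False


def custom_block_stats(
    submitted: List[SubmittedBlock], solution: List[Tuple[str, int]]
) -> Dict[str, Union[str, float]]:
    # Assumes each block's text is unique within the solution.
    expected = {text: (pos, indent) for pos, (text, indent) in enumerate(solution)}
    originals = [b for b in submitted if not b.is_custom]
    customs = [b for b in submitted if b.is_custom]
    # Original blocks missing from the solution are distractors.
    non_distractors = [b for b in originals if b.text in expected]

    # Relative order: a block is in order if it comes later in the solution
    # than every non-distractor block placed before it.
    order_ok: List[bool] = []
    furthest = -1
    for b in non_distractors:
        pos = expected[b.text][0]
        order_ok.append(pos > furthest)
        furthest = max(furthest, pos)

    indent_ok = [b.indent == expected[b.text][1] for b in non_distractors]
    displaced = sum(1 for o, i in zip(order_ok, indent_ok) if not (o and i))

    n = len(originals)
    weighted = CUSTOM_WEIGHT * len(customs) + DISPLACED_WEIGHT * displaced
    deviation = 100 * weighted / len(submitted) if submitted else 0.0
    return {
        "correct_original_blocks": f"{len(non_distractors)}/{n}",
        "correct_original_indentation": f"{sum(indent_ok)}/{n}",
        "correct_original_order": f"{sum(order_ok)}/{n}",
        "deviation_percent": round(deviation, 1),
    }
```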

If a learner adds custom blocks and one or more unit tests fail, all blocks are marked as incorrect and no descriptive information is given about their submission in comparison to the original solution.
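The three outcomes described in this section reduce to a small decision, sketched below with hypothetical names. The sketch assumes at least one unit test ran, so an empty result list counts as a failure.

```python
from enum import Enum
from typing import List


class FeedbackMode(Enum):
    LINE_BASED = "traditional line-based feedback"
    DESCRIPTIVE_STATS = "descriptive statistics vs. the original solution"
    ALL_INCORRECT = "all blocks marked incorrect"


def choose_feedback_mode(uses_custom_blocks: bool, test_passed: List[bool]) -> FeedbackMode:
    """Pick the kind of line-based feedback a submission receives."""
    if not uses_custom_blocks:
        return FeedbackMode.LINE_BASED
    # Custom blocks: statistics only when every unit test passed
    # (an empty result list is treated as a failure).
    if test_passed and all(test_passed):
        return FeedbackMode.DESCRIPTIVE_STATS
    return FeedbackMode.ALL_INCORRECT
```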