Testing Processing Code/Program

edited January 2017 in Using Processing

Is there a way to test whether two Processing programs are functionally identical?

Background: I am teaching a Processing course and was looking for an automated way of scoring the students' projects. I am looking for a way to automatically compare each student's submitted code to my correct solution.

Is there a way to make a script that checks whether a student's program contains specified elements? Also, is there a way to simulate input and automatically check the program's output state?

Currently I am manually running each student's submission (one at a time) and reviewing the source code to test/score it. I would greatly appreciate any advice on how to improve and streamline my grading pipeline.


  • If you want to test whether they are completely identical, you can copy your code and their code into documents and load them into strings (you can iterate through these strings and remove spaces). Then you can test whether the strings are identical. For those that are not, you can write some code to tell you where their code differs from yours. Let me know if this helps.
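    In plain Java (the language underneath Processing), the whitespace-stripping comparison described above might look like the sketch below. Note the caveat: this only detects *textually* identical code after normalisation, not functionally identical code -- a renamed variable will still show up as a difference. The class and method names here are my own, not anything standard:

    ```java
    // Minimal sketch: compare two sketch sources ignoring whitespace.
    public class CodeCompare {
        // Remove all whitespace so formatting differences don't matter
        static String normalize(String source) {
            return source.replaceAll("\\s+", "");
        }

        static boolean sameIgnoringWhitespace(String a, String b) {
            return normalize(a).equals(normalize(b));
        }

        // Index of the first differing character in the normalized
        // strings, or -1 if they are identical
        static int firstDifference(String a, String b) {
            String na = normalize(a), nb = normalize(b);
            int len = Math.min(na.length(), nb.length());
            for (int i = 0; i < len; i++)
                if (na.charAt(i) != nb.charAt(i)) return i;
            return na.length() == nb.length() ? -1 : len;
        }

        public static void main(String[] args) {
            String mine   = "rect(10, 10, 50, 50);";
            String theirs = "rect(10,10,50,50);";
            System.out.println(sameIgnoringWhitespace(mine, theirs)); // prints true
        }
    }
    ```

    The `firstDifference` index refers to the normalized strings, so you would still need to map it back to a line in the original file to report a useful location.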

  • Many years ago I created an application to grade simple procedural C++ programs (programming 101 level). Although it worked, I didn't use it as a full replacement for teacher assessment; rather, I used it to identify problem areas in the student source code.

    It worked like this: the tutor created an exemplar solution to the assignment. The software would analyse it to measure several quality indicators. Examples include:

    1) The number of functions used.
    2) The number of variables used.
    3) Average length of variable / function identifiers.
    4) Number of comments (% of source code).
    5) Number of program statements.
    6) Can't remember the rest :)

    The tutor used this information to generate a marking scheme. For instance, if the average length of variable / function identifiers was 5.7, then the tutor would award 100% for identifier lengths in the range 4.2 to 7.0 and zero for lengths below 3.5 or above 7.5. The ranges 3.5 to 4.2 and 7.0 to 7.5 would be scored by linear interpolation between 0% and 100%.

    The first line of a student's program was a comment with their name and student ID. Each student submission was analysed in the same way as the exemplar and marks awarded according to the marking scheme. As I said, it was a useful tool to give an overview of the source code, but I still tweaked the marks after visual examination of the code.
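    The interpolated marking scheme above can be sketched in plain Java like this. The band boundaries are the example values from the post (full marks 4.2-7.0, zero below 3.5 or above 7.5); the class and method names are my own invention:

    ```java
    // Sketch of a trapezoid-shaped marking scheme: full marks inside
    // [lowFull, highFull], zero outside [lowZero, highZero], and
    // linear interpolation in the two ramps in between.
    public class MetricScore {
        static double score(double value,
                            double lowZero, double lowFull,
                            double highFull, double highZero) {
            if (value <= lowZero || value >= highZero) return 0.0;
            if (value >= lowFull && value <= highFull) return 100.0;
            if (value < lowFull) {
                // Rising ramp between lowZero and lowFull
                return 100.0 * (value - lowZero) / (lowFull - lowZero);
            }
            // Falling ramp between highFull and highZero
            return 100.0 * (highZero - value) / (highZero - highFull);
        }

        public static void main(String[] args) {
            // Average identifier length 5.7 falls in the full-marks band
            System.out.println(score(5.7, 3.5, 4.2, 7.0, 7.5)); // prints 100.0
            // Length 3.0 is below the zero cutoff
            System.out.println(score(3.0, 3.5, 4.2, 7.0, 7.5)); // prints 0.0
        }
    }
    ```

    The same function works for any of the listed metrics (comment percentage, statement count, etc.) once the tutor has chosen the four band values from the exemplar.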

  • For a functional comparison of the graphical output of a sketch I suggest hiring a student.

  • // number of comments
    // is a metric
    // that is very 
    // easily
    // abused
  • @strauss --

    I've looked at some approaches to this in the related problem of personalized assignments: for a given exercise, each student is automatically assigned a different but related goal, and there is an auto-checker to help them check their own answers (program output) before turning it in. The limitation is that the checkers are domain-specific, so I would need to know a lot more about what kind of programs you are assigning. Many of my examples had visual output and computed the differences between two images.

    This isn't a direct substitute for inspection of and personalized feedback on code -- instead, it was developed as a way of focusing students on concepts rather than giving each the same exercise with the same answer. However, the checkers could be batch-run to give initial indications of meeting requirements and correctness.

    Here is an example:

    1. Exercise1: The exercise is to understand coordinates and the use of rect() for rectangles. Student must draw 3 different RGB rectangles on a 5*5 grid. Each student receives a different image to create.
    2. Generator1.pde: This meta-program is a parameterized version of the correct assignment -- it takes a class list of student names/ids, uses each for randomSeed(), and then generates a target output image of 3 personalized "random" rectangles.
    3. Template1.pde: a partial sketch with some common standardized variable names / empty functions / etc. is given to the students as a starting point. In this case the template already contains standard code for running once and saving an output image on run -- students will focus on the draw loop, and not on learning how to save an image file.
    4. Optional code checker -- this reads the student sketch in as a string and checks for a list of required and forbidden terms -- e.g. in this exercise rect() is required and image() is forbidden. You could make this available to students to self-check their own work.
    5. Optional output checker. In the rectangle assignment it loads two image files (the student target image and the output image from their code) and reports on how pixel-identical they are. A correct assignment should be identical. This is domain-specific -- e.g. an image output comparator, a JSON or text file comparator, etc. You could make this checker available to students to self-check their own work.
    6. If given self-checking, students may be asked to report on their check results when they submit.
    7. The "pipeline" is to run the code and output checkers on a directory of sketches and get a report that a) confirms each sketch contains correct keywords and b) rates the difference between the required output and the generated output.
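    As a rough illustration, steps 4 and 5 (the code checker and the pixel-comparison output checker) might be sketched in plain Java as below, so they can be batch-run over a directory outside the PDE. The term lists, method names, and exact matching rules are my assumptions for this rectangle exercise, not the actual tools described above:

    ```java
    import java.awt.image.BufferedImage;
    import java.util.List;

    public class SketchChecker {
        // Code checker: the sketch source must contain every required term
        // and none of the forbidden ones (simple substring matching).
        static boolean checkTerms(String source,
                                  List<String> required, List<String> forbidden) {
            for (String term : required)  if (!source.contains(term)) return false;
            for (String term : forbidden) if (source.contains(term))  return false;
            return true;
        }

        // Output checker: fraction of pixels identical between the target
        // image and the image the student's sketch produced (1.0 = correct).
        static double pixelMatch(BufferedImage target, BufferedImage output) {
            if (target.getWidth() != output.getWidth()
                    || target.getHeight() != output.getHeight()) return 0.0;
            long same = 0;
            long total = (long) target.getWidth() * target.getHeight();
            for (int y = 0; y < target.getHeight(); y++)
                for (int x = 0; x < target.getWidth(); x++)
                    if (target.getRGB(x, y) == output.getRGB(x, y)) same++;
            return (double) same / total;
        }

        public static void main(String[] args) {
            String sketch = "size(500, 500);\nrect(100, 100, 100, 100);";
            // For this exercise, rect() is required and image() is forbidden
            System.out.println(checkTerms(sketch,
                    List.of("rect("), List.of("image("))); // prints true
        }
    }
    ```

    In a real pipeline you would read each student's .pde file into the source string (e.g. with `Files.readString`) and load the two images with `ImageIO.read` before calling `pixelMatch`.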
  • For really simple questions, a process like those detailed by @jeremydouglass or @quark will be more than enough. We need to know what level you're teaching to provide more specific answers.

    I suggest that you make sure that your students don't know what method you use for grading them, because, as @koogs illustrates, it will be incredibly easy for students to fool automated systems once they know about it.

  • For some mouse/keyboard-based questions, you can use the java.awt.Robot API to automatically check whether the program works, but the fact remains that working/not working cannot be the only criterion used for grading programs.

  • @strauss --

    One of the nice things about visual output-oriented assignments is that it is generally very hard work (especially for a novice programmer) to fake a correct program without loading the previously provided correct output image as data from a URL or file -- and it is trivial to check whether a program loads image data. Turning in a fake solution (e.g. something that tries to hide a call that loads an image rather than rendering the solution) would also be egregious cheating -- a high risk of getting caught and no plausible deniability -- which most students avoid. Also, consulting other students' answers (or solutions from past quarters) can be encouraged: the concepts are there, but they won't satisfy a given student's specific requirements, which are unique. Cheating is less tempting if it is significantly more work than completing the assignment!

    I tend to take a slightly different approach from the point @Lord_of_the_Galaxy made about @koogs example of fake comments, and the lesson to keep tests secret -- instead I think the idea of an output test as a requirement is an opportunity to introduce the Test Driven Development paradigm (even if intro coders or art coders don't call it such and won't become computer scientists or software engineers). In that scenario, the checker is a requirement / a test / a contract -- it doesn't need to be a secret, as it is part of the assignment itself. For attempting to measure code qualitatively (e.g. how many 'good' comments) automated checking works really poorly. But for measures of what code does (what tests it passes) it works extremely well, and the tests can (and should) be public! If your code passes the test, you (probably / usually) completed the assignment.

    You may care about students learning to code in a pleasing, idiomatic form (comments, camelCase, line breaks, capitalizing CONSTANTS, etc.). This isn't as high a priority for me as concepts, but if you want to make it part of your grading criteria and are looking to do first-pass automated checking, then I would suggest requiring them to use a linter program. Tell them that programs with high lint warning counts may be subject to losing points. They can work in the main Processing PDE and do their lint checking elsewhere -- for example, see Atom (free, cross-platform) and the linter-processing plugin: https://atom.io/packages/linter-processing . If you need to, you can batch-lint a set of .pde files after an assignment is turned in and generate a report (I have not done this or used this plugin, but it is possible).

  • So it seems that we have multiple opinions about students and how learning should occur.

    None of our opinions are completely right, and yet, none can be discarded as wrong. It is now up to you, the OP, to decide which approach is the best.

    Also note that you still haven't told us the level of the questions you're setting, so we can't help you much beyond offering opinions.

  • @jeremydouglass Your answer got me thinking - there is another aspect of the code that no one here (as yet) has considered.

    How reusable the code is (and, relatedly, how uncomplicated it is). This is very important in any program, because then your students don't need to hard-code solutions (which I strongly oppose) and can instead produce code that changes its output with just a few trivial changes (which I strongly recommend).

    Also, there may be overly complicated ways of achieving the same results when it comes to more complex problems, and this should be discouraged. Instead, the simplest methods should be awarded extra points as long as they're fairly reusable.

    EDIT: Spellings checked
