Is there a way to test if two Processing programs are functionally identical?
Background: I am teaching a Processing course and am looking for an automated way of scoring the students' projects, specifically a way to automatically compare the students' submitted code to my correct solutions.
Is there a way to make a script that checks whether a student's program contains specified elements? Also, is there a way to simulate input and automatically check the program's output state?
Currently I am manually running each student's submission (one at a time) and reviewing the source code to test and score it. I would greatly appreciate any advice on how to improve and streamline my grading pipeline.
Answers
If you want to test whether they are completely identical, you can copy your code and their code into documents and load them into strings (you can iterate through these strings and remove whitespace). Then you can test whether the strings are identical. For those that are not, you can write some code to tell you where their code differs from yours. Let me know if this helps.
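A minimal sketch of that idea, assuming the exemplar and the submission have been copied into the sketch's data folder as reference.pde and submission.pde (both file names are placeholders):

// Load both files, strip all whitespace, then compare and report the first
// character position where the two normalized strings diverge.
String normalize(String[] lines) {
  return join(lines, "\n").replaceAll("\\s+", "");
}

void setup() {
  String ref = normalize(loadStrings("reference.pde"));
  String sub = normalize(loadStrings("submission.pde"));
  if (ref.equals(sub)) {
    println("Identical (ignoring whitespace)");
  } else {
    int n = min(ref.length(), sub.length());
    int i = 0;
    while (i < n && ref.charAt(i) == sub.charAt(i)) i++;
    println("First difference at character " + i + " of the normalized code");
  }
  exit();
}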
Many years ago I created an application to grade simple procedural C++ programs (programming 101 level). Although it worked, I didn't use it as a full replacement for teacher assessment; rather, I used it to identify problem areas in the student source code.
It worked like this: the tutor created an exemplar solution to the assignment. The software would analyse this to measure several quality indicators. Examples include:
1) The number of functions used
2) The number of variables used
3) Average length of variable / function identifiers
4) Number of comments (% of source code)
5) Number of program statements
6) Can't remember the rest :)
The tutor used this information to generate a marking scheme. For instance, if the exemplar's average length of variable / function identifiers was 5.7, the tutor might award 100% for identifier lengths in the range 4.2 to 7.0 and zero for lengths below 3.5 or above 7.5. The ranges 3.5 to 4.2 and 7.0 to 7.5 would be a linear interpolation between 0 and 100%.
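A sketch of that piecewise-linear scoring, using Processing's map() for the interpolation and the example thresholds above (the function name and cut-offs are just illustrative):

// Returns 0-100 for a measured value, given the zero and full-marks cut-offs.
float score(float value, float zeroLow, float fullLow, float fullHigh, float zeroHigh) {
  if (value <= zeroLow || value >= zeroHigh) return 0;
  if (value >= fullLow && value <= fullHigh) return 100;
  if (value < fullLow) return map(value, zeroLow, fullLow, 0, 100);
  return map(value, fullHigh, zeroHigh, 100, 0);
}

void setup() {
  println(score(5.7, 3.5, 4.2, 7.0, 7.5));  // exemplar average -> 100.0
  println(score(3.9, 3.5, 4.2, 7.0, 7.5));  // below the full-marks band -> ~57
  println(score(8.0, 3.5, 4.2, 7.0, 7.5));  // outside the range -> 0.0
}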
The first line of a student's program was a comment with their name and student ID. Each student submission was analysed in the same way as the exemplar and marks were awarded according to the marking scheme. As I said, it was a useful tool for getting an overview of the source code, but I still tweaked the marks after a visual examination of their code.
For a functional comparison of the graphical output of a sketch I suggest hiring a student.
@strauss --
I've looked at some approaches to this in the related problem of personalized assignments: for a given exercise, every student is automatically assigned a different but related goal, and there is an auto-checker to help them check their own answers (program output) before turning them in. The problem / limitation is that the checkers are domain-specific, so I would need to know a lot more about what kind of programs you are assigning. Many of my examples had visual output and computed the differences between two images.
This isn't a direct substitute for inspection of and personalized feedback on code -- instead, it was developed as a way of focusing students on concepts rather than giving each the same exercise with the same answer. However, the checkers could be batch-run to give initial indications of meeting requirements and correctness.
Here is an example: an assignment on rect() for rectangles. Each student must draw 3 different RGB rectangles on a 5x5 grid, and each student receives a different image to create. The checker calls randomSeed() and then generates a target output image of 3 personalized "random" rectangles. rect() is required and image() is forbidden. You could make this available to students to self-check their own work.
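A rough sketch of what such a checker might look like, assuming a 500x500 canvas, 100-pixel grid cells, a saved student output called submission.png, and a numeric student ID used as the seed (all of these are assumptions):

int studentID = 12345;  // placeholder; each student gets their own seed

void setup() {
  size(500, 500);
  // Regenerate this student's personalized target: 3 "random" rectangles
  // (cell collisions are not handled here; a real checker would prevent them)
  randomSeed(studentID);
  background(255);
  noStroke();
  int cell = width / 5;
  for (int i = 0; i < 3; i++) {
    fill(random(255), random(255), random(255));
    rect(int(random(5)) * cell, int(random(5)) * cell, cell, cell);
  }
  // Compare the regenerated target against the student's saved output
  PImage sub = loadImage("submission.png");
  loadPixels();
  sub.loadPixels();
  int mismatched = 0;
  for (int i = 0; i < pixels.length; i++) {
    if (pixels[i] != sub.pixels[i]) mismatched++;
  }
  println(mismatched + " of " + pixels.length + " pixels differ from the target");
}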
For really simple questions, a process as detailed by either @jeremydouglass or @quark will be more than enough. We need to know which level you're going to teach to provide more specific answers.
I suggest that you make sure that your students don't know what method you use for grading them, because, as @koogs illustrates, it will be incredibly easy for students to fool automated systems once they know about it.
For some mouse/keyboard-based questions, you can use the Robot API to check automatically whether the program works or not, but the fact remains that working/not working cannot be the only criterion used for grading programs.
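For reference, a minimal sketch of using java.awt.Robot from inside a Processing sketch to inject a key press and confirm that keyPressed() reacted; the one-second wait for window focus and the choice of the 'a' key are arbitrary:

import java.awt.Robot;
import java.awt.AWTException;
import java.awt.event.KeyEvent;

boolean injected = false;
boolean reacted = false;

void setup() {
  size(200, 200);
}

void draw() {
  if (frameCount == 60 && !injected) {  // wait ~1 second for the window to gain focus
    injected = true;
    try {
      Robot bot = new Robot();
      bot.keyPress(KeyEvent.VK_A);      // simulate pressing and releasing 'a'
      bot.keyRelease(KeyEvent.VK_A);
    } catch (AWTException e) {
      println("Robot not available: " + e);
    }
  }
  if (reacted) {
    println("keyPressed() fired as expected");
    exit();
  }
}

void keyPressed() {
  // In a real test this would be the student's handler under inspection
  if (key == 'a') reacted = true;
}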
@strauss --
One of the nice things about visual output-oriented assignments is that it is generally very hard work (especially for a novice programmer) to fake a correct program without loading the previously provided correct output image as data from a URL or file -- and it is trivial to check whether a program loads image data. Turning in a fake solution (e.g. something that tries to hide a call that loads an image rather than rendering the solution) would also be egregious cheating -- high risk of getting caught and no plausible deniability -- which most students avoid. Also, consulting other students' answers (or solutions from past quarters) can be encouraged: the concepts are there, but they won't satisfy one student's specific requirements, which are unique. Cheating is less tempting when it is significantly more work than completing the assignment!
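A check of that kind might just scan the submission's source for image-loading calls, along these lines (the file name and the list of flagged calls are assumptions):

// Flag submissions that appear to load image data instead of drawing it.
String[] flagged = { "loadImage", "requestImage" };

void setup() {
  String src = join(loadStrings("submission.pde"), "\n");
  for (String call : flagged) {
    if (src.contains(call)) {
      println("Submission contains a call to " + call + "() -- review by hand");
    }
  }
  exit();
}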
I tend to take a slightly different approach from the point @Lord_of_the_Galaxy made about @koogs's example of fake comments and the lesson to keep tests secret -- instead, I think the idea of an output test as a requirement is an opportunity to introduce the Test Driven Development paradigm (even if intro coders or art coders don't call it that and won't become computer scientists or software engineers). In that scenario, the checker is a requirement / a test / a contract -- it doesn't need to be a secret, as it is part of the assignment itself. For attempting to measure code qualitatively (e.g. how many 'good' comments), automated checking works really poorly. But for measuring what code does (what tests it passes), it works extremely well, and the tests can (and should) be public! If your code passes the test, you (probably / usually) completed the assignment.
You may care about students learning to code in a pleasing, idiomatic form (comments, camelCase, line breaks, capitalizing CONSTANTS, etc.). This isn't as high a priority for me as concepts, but if you want to make it part of the grading criteria and are looking to do first-pass automated checking, then I would suggest requiring them to use a linter. Tell them that programs with high lint-warning counts may lose points. They can work in the main Processing PDE and do their lint checking elsewhere -- for example, see Atom (free, cross-platform) and the linter-processing plugin: https://atom.io/packages/linter-processing . If you need to, you can batch-lint a set of .pde files after an assignment is turned in and generate a report (I have not done this or used this plugin, but it is possible).
So it seems that we have multiple opinions about students and how learning should occur.
None of our opinions are completely right, and yet, none can be discarded as wrong. It is now up to you, the OP, to decide which approach is the best.
Also note that you still haven't told us the level of the questions you're assigning, so we can't help you much further than offering opinions.
@jeremydouglass Your answer got me thinking -- there is another aspect of the code that no one here has considered yet.
How reusable the code is (and, relatedly, how uncomplicated it is). This is very important in any program, because then your students don't need to hard-code solutions (which I strongly oppose) and can instead produce code that changes its output with just a few trivial edits (which I strongly recommend).
Also, for more complex problems there may be overly complicated ways of achieving the same results, and this should be discouraged. Instead, the simplest methods should be awarded extra points, as long as they're fairly reusable.
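As a small, hypothetical illustration of the difference: a grid drawn with one hard-coded rect() call per cell has to be edited line by line to change, while a parameterized version changes its whole output by editing a single constant:

int cols = 5;  // change this one value to get a different grid

void setup() {
  size(500, 500);
  float cell = width / float(cols);
  for (int i = 0; i < cols; i++) {
    for (int j = 0; j < cols; j++) {
      rect(i * cell, j * cell, cell, cell);
    }
  }
}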
EDIT: Spellings checked