Short-Term Memory: Grading Multi-Part Questions Fairly

Anyone who has designed a serious assessment knows the problem. Task two depends on task one. Task three depends on task two. The candidate gets task one slightly wrong, carries the error forward, and now their task two answer is technically incorrect even though their reasoning was perfectly sound given the input they had. Do you mark them down twice? Do you construct elaborate manual workarounds? Or do you pretend the dependency does not exist and let the unfairness slide?

Most platforms pretend. Assess for Learning does not. Short-term memory is the feature that handles task dependencies properly, passing the right context forward so that each task is graded on the candidate’s own reasoning, not on whether an upstream task happened to produce the expected numerical output.

“The candidate should be assessed on their own thinking, not punished recursively for a single upstream error.”

The problem with independent grading of dependent tasks

Treating each task as an isolated unit is the default in most assessment platforms because it is the easiest thing to implement. The grader reads task one, scores it against the expected answer, moves on. The grader reads task two, scores it against the expected answer that assumes task one was correct. If the candidate’s task one answer was wrong, their task two answer will also be wrong against the reference, regardless of whether their method was flawless.

This is not a theoretical concern. It is the daily reality of grading quantitative assessments, case studies, applied problem sets, and any assessment where later tasks build on earlier ones. The candidate who makes a small arithmetic slip on task one and propagates it correctly through the rest of the paper is, in most platforms, penalised multiple times for a single mistake. The candidate who gets task one right by guessing and then produces nonsense on task two is, in most platforms, credited for the one lucky answer.

Neither outcome is defensible. Both are common. For decades, the profession has worked around the problem with manual grading conventions: marking “consequential errors”, applying “error carried forward” adjustments, and the like. What has been missing is a platform that handles it automatically and consistently.

How short-term memory works architecturally

Short-term memory, or STM, is a mechanism inside the Assess for Learning rules engine for passing information between tasks during grading. When task two is configured as dependent on task one, the evaluation criteria for task two can reference an STM slot that holds the candidate’s actual output from task one, not the reference output. The grading process for task two then evaluates the candidate’s reasoning relative to their own earlier answer, not relative to a hypothetical correct answer they never produced.
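To make the mechanism concrete, here is a minimal Python sketch of the idea, not the actual Assess for Learning API: the `stm` store, the slot key, and both grading functions are invented names for illustration.

```python
# Hypothetical sketch of an STM slot; not the real rules engine API.

stm: dict[str, object] = {}  # slots keyed by name, holding candidate outputs


def evaluate_task_one(candidate_answer: float, reference_answer: float) -> float:
    """Grade task one against the reference, but record what the
    candidate actually produced so downstream tasks can build on it."""
    stm["task1.output"] = candidate_answer  # the candidate's value, right or wrong
    return 1.0 if candidate_answer == reference_answer else 0.0


def evaluate_task_two(candidate_answer: float, derive_expected) -> float:
    """Grade task two relative to the candidate's own task-one output.

    The expected answer is derived from the STM slot, not from the
    reference, so a sound method scores even after an upstream slip.
    """
    expected = derive_expected(stm["task1.output"])
    return 1.0 if abs(candidate_answer - expected) < 1e-9 else 0.0
```

The key point is the slot write in task one and the slot read in task two: the reference answer never enters the task two comparison.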

The effect is significant. If the candidate computed the wrong discount rate in task one but then applied their wrong rate correctly through the rest of the NPV calculation in task two, the STM mechanism passes their wrong rate forward and the task two grading sees that the application was flawless. Task one gets marked down for the error. Task two gets credited for the correct application. The candidate is penalised once for one mistake, which is the only fair outcome.
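To put invented numbers on that: suppose the reference discount rate is 10%, the candidate computed 12% in task one, and task two asks for the NPV of a simple cash flow series. The figures and tolerance below are illustrative, not drawn from any real assessment.

```python
# Worked example with invented numbers; the tolerance is illustrative.

def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value of cash flows at t = 0, 1, 2, ..."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

cash_flows = [-1000.0, 400.0, 400.0, 400.0]
reference_rate = 0.10          # the correct task-one answer
candidate_rate = 0.12          # the candidate's (wrong) task-one answer
candidate_npv = -39.27         # the task-two value the candidate submitted

# Independent grading compares against the reference chain and fails
# the candidate a second time for the same task-one slip.
naive_expected = npv(reference_rate, cash_flows)   # ≈ -5.26  -> marked wrong

# STM-aware grading recomputes the expected NPV from the rate the
# candidate actually carried forward, and awards full credit.
stm_expected = npv(candidate_rate, cash_flows)     # ≈ -39.27 -> full marks
task_two_score = 1.0 if abs(candidate_npv - stm_expected) < 0.01 else 0.0
```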

STM slots are not limited to numerical outputs. They can carry textual reasoning, classification decisions, intermediate hypotheses, or any other structured artefact the assessment designer wants to preserve. The implementation is generic: a slot holds whatever the evaluation criteria put into it, and downstream tasks pull from that slot when they need context.
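As a sketch of that generality (again with hypothetical slot names), a slot is simply a keyed container for whatever artefact the designer decides to carry forward:

```python
# Hypothetical slot contents; the slot is agnostic to what it holds.
stm: dict[str, object] = {}

stm["task1.classification"] = "going_concern_risk"        # a decision
stm["task2.hypothesis"] = "margin decline driven by FX"   # textual reasoning
stm["task3.shortlist"] = ["option_b", "option_d"]         # an intermediate selection

# A downstream evaluation pulls whichever slot its criteria reference.
upstream_context = stm["task1.classification"]
```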

Why this is more than a convenience feature

From a pure mechanics standpoint, STM is a small technical detail inside the rules engine. From a fairness standpoint, it is one of the most consequential design decisions in the platform. It embodies a principle that professional grading has always tried to honour and always struggled to operationalise at scale: the candidate should be assessed on their own thinking, not punished recursively for a single upstream error.

For credentialing bodies, this matters at the level of defensibility. When a candidate appeals a grading decision and asks why they received a low score on task two when their task two reasoning was sound, “our platform could not handle the dependency” is not an acceptable answer. “The dependency was modelled in our grading logic and your reasoning was evaluated on your own work” is. STM is how that answer becomes true.

For assessment designers, STM unlocks a category of assessment that was previously impractical. Multi-part applied problems with genuine dependencies between steps can now be graded cleanly without elaborate manual adjustment. Case studies can have multiple phases that build on each other. Modelling tasks can have sequential components that compound. The assessment can mirror the structure of real professional work instead of being flattened into independent atomic questions for the sake of the platform.

How it fits into the rules engine

STM sits inside the wider Assess for Learning rules engine, which is where the detailed evaluation logic for every assessment lives. The rules engine executes the grading process layer by layer, starting from the atomic evaluations and building up through task-level and assessment-level aggregations. STM slots are populated as lower-layer evaluations complete and are read by higher-layer evaluations that need upstream context.
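In rough pseudocode terms, that layered flow might look like the sketch below. The layer structure, the `Evaluation` fields, and the slot-threading are assumptions made for illustration, not the engine’s actual internals.

```python
# Illustrative only: a layer-ordered grading pass in which STM slots are
# written by lower layers and read by higher ones. Structure is assumed.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Outcome:
    score: float
    writes: dict = field(default_factory=dict)   # STM slots this evaluation populates

@dataclass
class Evaluation:
    name: str
    reads: list                                  # STM slot keys this evaluation needs
    grade: Callable                              # (submission, context) -> Outcome

def run_grading(layers, submission):
    """Execute evaluations layer by layer (atomic -> task -> assessment),
    threading STM slots from lower layers into higher ones."""
    stm, results = {}, {}
    for layer in layers:
        for ev in layer:
            context = {k: stm[k] for k in ev.reads if k in stm}
            outcome = ev.grade(submission, context)
            results[ev.name] = outcome.score
            stm.update(outcome.writes)           # expose outputs downstream
    return results
```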

This architecture means STM is a native part of the grading flow, not a bolt-on. The assessment designer configures dependencies as part of the normal evaluation criteria. The evaluation copilot can suggest STM usage when it detects task dependencies during rule generation. The precision report captures the STM-aware grading outcomes alongside the rest of the psychometric evidence, so the fairness principle is not just implemented; it is documented and auditable.

For technical teams evaluating the platform, this is one of the places where the depth of the engineering shows. STM is the kind of feature that is invisible to most buyers and decisive for assessment designers who know what to look for.

From implicit unfairness to explicit fairness

Most assessment platforms handle task dependencies badly because they were designed for independent questions and retrofitted for anything more complex. Assess for Learning was designed from the start on the assumption that real assessment involves dependencies, sequences, and context. STM is the mechanism that makes that assumption operational.

If your credentialing programme runs multi-part assessments and you are relying on manual workarounds or accepting the unfairness as an unavoidable cost, the platform you are using is shaping the limits of your assessment design. Short-term memory removes that limit, and with it the quiet unfairness that most programmes have been living with.

Ready to grade dependent tasks the way they deserve to be graded?

Talk to us about how short-term memory and the Assess for Learning rules engine can handle the complexity your current platform cannot.

Explore Assess for Learning
