Want Reliable Crew? Create Reliable Assessments.

By Murray Goldberg, CEO, Marine Learning Systems

To repeat the opening words of last month’s Training Tips for Ships, assessments are critical. Without valid and reliable assessments, we cannot determine whether our trainees have absorbed the knowledge and learned the skills necessary for safe and efficient performance. Just as importantly, without valid and reliable assessments we are missing the most important metrics upon which we can measure and improve our training program. And as the saying goes, if we cannot measure it, we cannot manage it. Thus, our assessments require every bit as much thought as the training itself.

There are two measures of assessment quality: validity and reliability. A valid exam is one that tests what we actually want to know. Validity was described in last month’s article. In this article, we complete the picture by discussing exam reliability.

Reliability in assessments means consistent outcomes when the inputs are consistent. That is, if two people with the same knowledge or skills were assessed, a reliable assessment would produce the same result for both. This sounds easy, but in fact it requires some thought. Let’s look at some of the pitfalls of assessment practices that can impair reliability.

First, assessment questions or scenarios need to be written or explained in clear language that every trainee will understand equally well. This can be hard to achieve when your trainees have different native languages, or when their ability to absorb the question itself varies. If the question is not uniformly understood, then part of what is inadvertently being tested is the understanding of the question rather than the knowledge or skill in question. Similarly, the testing environment should be comfortable, and the test interface should be easy to navigate. If not, the results may reflect the poor interface or conditions rather than the trainee's competence, and your organization will end up relying on incorrect assessment metrics.

Second, the assessment should ideally present several questions or scenarios for every knowledge or skill being assessed. Every response or demonstration given by a trainee in an assessment is a product of their knowledge or abilities, plus a small amount of chance or randomness (especially in multiple-choice exams). Assessing a competency multiple times smooths out randomness in much the same way that flipping a coin 100 times is much more likely to give you a result close to 50/50 than flipping it only 4 times.

Third, the scoring of assessments must be objective. Subjective scoring is one of the most likely sources of unreliability in assessments. The exam type plays a key role in achieving objectivity. For example, multiple-choice questions are always scored objectively. There are answers determined to be correct and answers determined to be incorrect, and everyone is scored by the same standard.
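
The coin-flip comparison in the second point can be checked with a short calculation. The sketch below is illustrative only: the function name and the choice of "within 10 percentage points of 50%" as the meaning of "close to 50/50" are assumptions made for this example, not part of any assessment standard.

```python
from math import comb

def prob_near_half(n_flips, tolerance_pct=10):
    """Exact probability that the share of heads in n_flips fair coin flips
    lies within tolerance_pct percentage points of 50%.

    tolerance_pct is an illustrative assumption, not a standard threshold.
    """
    total = 0.0
    for heads in range(n_flips + 1):
        # Integer form of |heads / n_flips - 0.5| <= tolerance_pct / 100,
        # written this way to avoid floating-point comparison issues.
        if abs(100 * heads - 50 * n_flips) <= tolerance_pct * n_flips:
            total += comb(n_flips, heads) * 0.5 ** n_flips
    return total

few = prob_near_half(4)     # only exactly 2 heads out of 4 qualifies
many = prob_near_half(100)  # anywhere from 40 to 60 heads qualifies
print(f"4 flips:   {few:.3f}")
print(f"100 flips: {many:.3f}")
```

With 4 flips the share of heads lands in that band only 37.5% of the time, while with 100 flips it does so more than 95% of the time. The same logic applies to asking several questions per competency: each extra question dilutes the effect of a lucky or unlucky guess.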

But what about long-answer or essay-type exams? And similarly, what about assessments of skills such as navigation or bridge resource management (BRM)? These kinds of assessments require a highly expert human to observe and judge the writing or performance of the trainees and then produce an assessment result. However, humans, even expert humans, vary tremendously in their beliefs, experience, knowledge, attention to detail, etc. Even a single assessor will vary over time. As such, the assessment scores produced are inherently subjective. This makes it difficult to compare trainees against one another, or even to compare the same trainee across multiple performances.

One way to reduce the subjectivity of “expert” human assessments is to construct and adhere to a detailed rubric. A rubric is an explanation of the various possible outcomes from the assessment, along with a guide indicating how each outcome should be scored. The more detailed and complete the rubric, the more consistent the assessment scoring (assuming the assessor adheres closely to the rubric).
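
To make the idea concrete, here is a minimal sketch of a rubric expressed as data, with a function that turns an assessor's recorded observations into a score. Everything in it is hypothetical: the drill, the criteria, the outcome labels, and the point values are invented for illustration and are not taken from any real rubric or product.

```python
# Hypothetical rubric for an abandon-ship drill. Each criterion lists the
# possible observed outcomes and the points each outcome earns.
RUBRIC = {
    "muster_time": {"under 5 min": 3, "5 to 8 min": 2, "over 8 min": 0},
    "lifejacket_donning": {"all correct": 2, "minor errors": 1, "major errors": 0},
    "communication": {"closed-loop throughout": 3, "partly closed-loop": 1, "none": 0},
}

def score_observations(observations):
    """Return the total score for one set of recorded observations.

    The assessor only records which listed outcome occurred for each
    criterion; the points come from the rubric, not from the assessor.
    """
    total = 0
    for criterion, outcomes in RUBRIC.items():
        observed = observations[criterion]
        if observed not in outcomes:
            raise ValueError(f"{observed!r} is not a listed outcome for {criterion}")
        total += outcomes[observed]
    return total

def max_score():
    """Highest total the rubric can award."""
    return sum(max(outcomes.values()) for outcomes in RUBRIC.values())

# Two assessors who record the same observations get the same score.
assessor_a = {"muster_time": "5 to 8 min", "lifejacket_donning": "all correct",
              "communication": "partly closed-loop"}
assessor_b = dict(assessor_a)
print(score_observations(assessor_a), "out of", max_score())
```

Because the assessor chooses among predefined outcomes rather than assigning points directly, any two assessors who observe the same thing arrive at the same score. The remaining subjectivity lies in deciding which outcome was observed, which is why detailed, unambiguous outcome descriptions matter.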

Another approach to improving reliability in observational skill assessments such as drills or simulator exercises is to employ technologies that automatically grade the performance based on indicators entered by an observer. This is a new and exciting area that promises to remove subjectivity from assessments of skill and is an approach my company is involved in (and yes - I used the words “exciting” and “assessments” in the same sentence).

Assessment is a critically important area of training that is often not given the thought it deserves. But when well designed, assessments not only ensure your employees are ready to perform, but also contribute important data on which the overall performance of your training can be measured and improved.

Until next time, keep well and sail safely.

The Author:

Murray Goldberg is CEO of Marine Learning Systems, maker of MarineLMS. He is a researcher and developer of learning management systems whose software has been used by millions of people and companies worldwide. Contact Murray here: murray@marinels.com