Performance Based Tests: A Case Study. In J. R. Issac and K. Batra (Ed.) Cognitive Systems: Reviews & Preview, New Delhi: Phoenix Publishing, 2000, pp.793-805.
Performance Based Tests—A Case Study
By G. Bhatnagar and M. Kandan Centre for Research in Cognitive Systems, NIIT Ltd. Synergy Building, IIT Delhi, New Delhi 110016 Gauravb@niit.com kandanm@niit.com
Paper Presented at International Conference on Cognitive systems 15-17 December 1999
Abstract
Performance based testing has been suggested as an alternative to multiple choice tests. However, it is not clear whether performance based tests can be used for large scale, automated testing. We study the design and automation of one type of performance based test. The performance task we choose is a problem from the subject of Computer Algorithms. In our case-study, we find that by tracking changes in the state of the test software, we can design a test that collects evidence of a student’s competency in performing a task, as well as the student’s competency in applying common problem-solving strategies. This indicates that two crucial aspects of a performance based test—namely, examining the processes as well as the end-results of the problem solving activity of the student—may be automated.
Keywords: Performance Based Assessment; Performance Based Tests; Standardized tests; Problem solving; Rubrics.
1. Introduction
Due to a general dissatisfaction with traditional standardized tests, there has been a concerted effort directed towards finding alternative assessment techniques (see Linn, Baker and Dunbar [7]). Multiple choice tests have been criticized for their inability to measure complex problem solving skills [8], to relate to daily classroom activities [16], to gauge the processes involved in accomplishing a task, and to examine learners’ application skills rather than superficial learning of the material [17]. Thus the main focus of alternative assessment has been on addressing these drawbacks. Performance based assessment is one alternative assessment technique that has been proposed. One of the primary motivations for performance based testing is the belief that a user’s competency is best demonstrated in a live, in-the-application setting. Further, it seems to be compatible with the shift in instructional methodology toward the constructivist paradigm. A performance based test is designed to assess students on what they know, what they are able to do and the learning strategies they employ in the process of demonstrating it.
However, it is not clear whether performance based tests can be used for large scale, automated testing. We study the design and automation of one type of performance based test. The performance task we choose is a fairly complex problem from the subject of Computer Algorithms. In our case-study, we find that by tracking changes in the state of the test software, we can design a test that collects evidence of a student’s competency in performing a task, as well as the student’s competency in applying common problem-solving strategies. This indicates that two crucial aspects of a performance based test, namely examining the processes as well as the end-results of the problem solving activity of the student, may be automated.
Many authors have noted two serious limitations of performance based tests: their vulnerability to subjectivity in scoring, and the difficulty of providing an assessment environment that is real, or at least close to the real task environment (see Bryant [4], and Lukhele, Thissen and Wainer [9]). The concern about subjectivity may be addressed simply by automating the test. The second issue is obviously a bigger problem, and there is no guarantee that ideas from one domain will apply to another. However, our case-study indicates that performance based testing may be applicable in a fairly complicated problem solving environment.
This paper is organized as follows. In section 2 we discuss the concept of performance based testing, and identify the key issues to bear in mind as we design performance based tests. In section 3, we summarize this information and also note a further requirement that comes from the problem of automating the test. Finally, before concluding we present our case study, where we study one type of performance based test, by designing an appropriate set of tasks that can be automated.
2. Performance Based Tests
A wide range of approaches have been labeled performance tests. Although there are no agreed-upon defining characteristics, the term generally refers to assessment tasks that require students to perform an activity or solve problems. In this paper, we have focused on a problem solving task. But there are other assessment tasks that may be included in a performance based test. For example, Bryant [4] suggests assessing portfolios of a student’s work over time, students’ demonstrations, hands-on execution of experiments by students, and a student’s work in simulated environments. In this section, we discuss the concept of performance assessment in order to identify the key issues to keep in mind as we design automated performance based tests.
Mehrens [10] notes that performance testing is not new. Various types of performance based tests were used before the introduction of multiple-choice testing. Performance testing includes:
· Performance tasks,
· Rubrics or scoring guides, and
· Exemplars of performance.
Performance Tasks
· reflect the intended outcomes of the curriculum and the goals of instruction; and
· require students to demonstrate what they know and how they know it.
In addition, performance based tests require that tasks involve examining the processes as well as the products of student learning.
Rubrics and Exemplars
· performance dimensions that are critical to successful task completion;
· criteria that reflect all the important outcomes of the performance task;
· a rating scale that provides a usable, easily-interpreted score;
· criteria that reflect concrete references, in clear language understandable to students, parents, and other teachers; and
· a current understanding of what constitutes "excellence" in the subject matter or skill area.
Even though we do not talk about scoring criteria in this paper, we have to ensure that the performance test we design allows the design of scoring procedures satisfying the above criteria. Of course, without well-laid out criteria to judge performance, assessment is pointless.
While there is some variance in the understanding of the term “performance based testing”, most authors agree on one key factor involved in performance based testing, namely the evaluation of the processes used in accomplishing the tasks. By processes we mean the various steps involved in solving the problem, such as understanding the problem/performance requirement, the problem-solving strategies employed at every stage and the efficiency of the strategy itself. Evaluating the end result—without evaluating how the student arrived at the result—does not provide any clue on how the student will fare in the real world. Further, just the end result leaves little scope for providing any meaningful feedback to the learner. Thus, while it is important to evaluate the end result of each task, it is also important to evaluate the processes involved in accomplishing performance tasks or in problem solving activity.
Assessing problem solving strategies is also a contentious area, with little agreement on techniques or even the strategies used for problem solving. This is even more so when the problem in consideration has more than one equally good solution or more than one efficient way of solving it. In the area of mathematical problem solving, Schoenfeld [11] has studied many strategies, and also the factors that affect problem solving such as cognitive resources, heuristics, control or meta-cognition, and beliefs of the problem solver. (See Seldon and Seldon [12] for a summary of findings in this area.) While many of these cannot be tested, we demonstrate how common problem-solving heuristics may be tested using performance based testing.
As we will demonstrate, it is not difficult to track these processes and problem-solving heuristics in a complex environment. Even in a usual classroom-based test environment, these processes can be tracked by designing the test items appropriately and formulating rubrics for scoring that satisfy the above criteria.
Even though performance based tests are widely used in classrooms, automated performance based examination systems have not been implemented. There are two major concerns in the use of performance based testing. Many authors [5, 6, 7, 10, 13] have pointed out that since the ratings of performance assessments depend upon judgments of people, these tests are quite subjective. In fact, even in the early 1900s, Starch and Elliot [14, 15] supported objective tests for this reason. Moreover, since the tests seemed to require manual grading, they were not amenable to large-scale assessment. It is clear that by automating performance-based tests, we will overcome concerns of both subjectivity and scalability. Nevertheless, it is also apparent that the design of automated performance based testing systems will have to take these concerns into account.
The concern of subjectivity may be addressed by carefully designing the scoring rubrics. With well-designed rubrics, the magnitude of the variance due to raters can be kept at levels substantially smaller than other sources of error variance (see, for example [1, 2, 5, 13]).
In summary, we can say that to design performance based tests, we have to ensure that both processes and end-results are tested. The tests should be designed carefully enough to ensure that proper scoring rubrics can be designed, so that the concerns about subjectivity in performance based tests are addressed. Indeed, this needs to be done anyway in order to automate the test, so that performance based testing can be used widely.
3. Automating Performance Based Tests
Going by the complexity of the issues that need to be addressed in designing performance based tests, it is clear that automating the procedure is no easy task. The sets of tasks that comprise a performance based test have to be chosen carefully in order to tackle the issues mentioned in section 2. Moreover, automating the procedure imposes another stringent requirement for the design of the test. In this section, we summarize what we need to keep in mind while designing an automated performance based test.
We have seen that in order to automate a performance based test, we need to identify a set of tasks which all lead to the solution of a fairly complex problem. For the testing software to be able to determine whether a student has completed any particular task, the end of the task should be accompanied by a definite change in the system. The testing software can track this change in the system, to determine whether the student has completed the task. Indeed, a similar condition applies to every aspect of the problem solving activity that we wish to test. In this case, a set of changes in the system can indicate that the student has the desired competency.
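As a rough illustration of this idea, the following Python sketch shows one way a testing program might watch for state changes. The class name `MilestoneTracker`, the milestone names and the representation of the system state as a grid of 0/1 lights are all assumptions made for this example; the paper does not prescribe an implementation.

```python
class MilestoneTracker:
    """Records which milestones have been reached as the system state changes.

    `milestones` maps a (hypothetical) milestone name to a predicate that
    inspects a snapshot of the system state and returns True or False.
    """

    def __init__(self, milestones):
        self.milestones = milestones
        self.reached = []  # milestone names, in the order they were reached

    def observe(self, state):
        """Check a new state snapshot and return all milestones reached so far."""
        for name, check in self.milestones.items():
            if name not in self.reached and check(state):
                self.reached.append(name)
        return list(self.reached)


# Example: the state is a grid of lights (1 = on, 0 = off).
tracker = MilestoneTracker({
    "first_light_on": lambda grid: any(any(row) for row in grid),
    "all_lights_on": lambda grid: all(all(row) for row in grid),
})
tracker.observe([[0, 0], [0, 0]])
tracker.observe([[1, 0], [0, 0]])
print(tracker.observe([[1, 1], [1, 1]]))
```

Keeping the predicates separate from the tracker reflects the point made above: each aspect we wish to test only needs to be expressed as a detectable change in the system.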
Such tracking is used widely by computer game manufacturers, where the evidence of a game player’s competency is tracked by the system, and the game player is taken to the next ‘level’ of the game.
In summary, the following should be kept in mind as we design a performance based test.
· Each performance task/problem that is used in the test should be clearly defined in terms of performance standards not only for the end result but also for the strategies used in various stages of the process.
· A user need not always end up accomplishing the task; hence it is important to identify important milestones that the test taker reaches while solving the problem.
· Having defined the possible strategies, the process and milestones, the selection of tasks that comprise a test should allow the design of good rubrics for scoring.
· Every aspect of the problem-solving activity that we wish to test has to lead to a set of changes in the system, so that the testing software can collect evidence of the student’s competency.
4. The Game of Lights: A Case Study
In this section we study a particular problem and design a performance based test. The problem that we wish to solve is fairly complicated, and would be appropriate as a take-home problem for a course on Computer Algorithms at the graduate student level. Nevertheless, we will see that most of the testing with this problem may be automated. To demonstrate this, we choose two smaller pieces of the problem. One of these pieces demonstrates a test of one aspect of the student’s problem solving ability. The other piece tests whether the student arrives at a critical observation required to solve the problem.
The Problem or performance task may be stated as follows:
The Game of Lights consists of a board of d by d square lights. Each light can take one of two states, on or off. When it is on, the square changes color from gray to yellow. When a light is clicked, it changes its state, and so do all the lights on the board which share a side with the clicked square. At the beginning of the game, all the lights are off. The objective of the game is to switch on all the lights.
Find a “good” algorithm to be able to tell whether a solution exists for the Game of Lights and to find the solution(s), if they exist.
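The rules of the game can be made concrete with a small simulation. The sketch below is only one possible encoding: the function names `new_board`, `click` and `solved`, and the use of nested lists of 0/1 values, are choices made for this illustration and are not taken from the original Java program.

```python
def new_board(d):
    """Return a d x d board with every light off (0)."""
    return [[0] * d for _ in range(d)]


def click(board, r, c):
    """Toggle the light at (r, c) and every light sharing a side with it."""
    d = len(board)
    for rr, cc in ((r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= rr < d and 0 <= cc < d:
            board[rr][cc] ^= 1


def solved(board):
    """The game is won when every light is on."""
    return all(all(row) for row in board)


# On the 2x2 board, clicking every square once switches on all the lights,
# because each light is toggled three times (itself and its two neighbours).
board = new_board(2)
for r in range(2):
    for c in range(2):
        click(board, r, c)
print(solved(board))  # prints True
```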
The Game of Lights was originally written with d=5 by Yasunari Hiramatsu in 1996, and made available at many web-sites listing examples of Java code. A solution is given by Bhatnagar and Sharma [3]. Before moving on to the design of the test, we indicate some of the ways a student may find a solution. Note further that the problem has been stated somewhat ambiguously. Which algorithm is “good” depends to a large extent on the computers available for execution of the algorithms, the size of the d we wish to be able to handle and so on. However, the subject of computer algorithms is all about decisions about what is a “good” algorithm.
How to solve the Game of Lights
A simple algorithm to find a solution is to do an exhaustive search of all the possible solutions. Assume that we are dealing with a dxd grid of squares. We can represent a solution by putting a 1 in a square if that square has to be clicked, and 0 otherwise. For example, one of the solutions for the 4x4 grid is represented as in Figure 1.
Figure 1. The simplest solution for the 4x4 Game of Lights
Note that to make an exhaustive search of all possibilities when d=4, we will have to check 2^16 = 65,536 possibilities. Some of these can be eliminated by symmetry considerations, which reduces the number of possibilities a bit, but the number of configurations to be checked still remains very large, and the algorithm becomes impractical when d becomes big.
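The exhaustive search can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the names `lights_after` and `exhaustive_solutions` are made up here, and no symmetry reduction is attempted. It enumerates all 2^(d*d) click patterns, so it is only practical for very small d.

```python
from itertools import product


def lights_after(clicks, d):
    """Return the light states after applying a d x d pattern of 0/1 clicks."""
    lights = [[0] * d for _ in range(d)]
    for r in range(d):
        for c in range(d):
            if clicks[r][c]:
                for rr, cc in ((r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= rr < d and 0 <= cc < d:
                        lights[rr][cc] ^= 1
    return lights


def exhaustive_solutions(d):
    """Try every one of the 2^(d*d) click patterns and keep those that work."""
    solutions = []
    for bits in product((0, 1), repeat=d * d):
        clicks = [list(bits[r * d:(r + 1) * d]) for r in range(d)]
        if all(all(row) for row in lights_after(clicks, d)):
            solutions.append(clicks)
    return solutions


for row in exhaustive_solutions(3)[0]:
    print(row)
```

For d=3 only 512 patterns need to be checked; each additional row or column multiplies the work by a large factor, which is the impracticality referred to above.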
A faster algorithm can be designed if we realize that once we know the squares to be clicked in the first row of the grid, we can determine the rest. More precisely, the following idea works:
· Start by clicking the squares in the first row of the solution.
· In Row 2, click on a square only if the light above it is off. This ensures that all the lights of Row 1 are switched on.
· Continue in this fashion from Row 3 to Row d.
Magically, if we started with the first row of a solution, then all the lights are on when we finish with the last row, and we don’t need to click anywhere else.
In short, the critical observation is the following.
Observation: Suppose the top rows of the solution are already clicked. In the next row in the grid, we need to click on a square only if the light above that square is off.
A little thought should convince us why this is true. Note that we are going row by row, and a click does not change the state of any light two or more rows above the square where the mouse is clicked. Thus, once we finish with any row, we have to ensure that the row above is completely lit. And this can be accomplished by clicking on a square only if the light above it is off.
In view of this observation, the problem reduces to finding which lights in the first row have to be switched on. Now an algorithm can be easily written that does an exhaustive search for finding all possible first rows, and then checking which ones lead to a solution.
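The first-row search described above might be sketched as follows. The function names `chase` and `solutions_by_first_row` are hypothetical and chosen only for this illustration. Every first row is completed row by row using the observation, and a candidate is accepted if the last row ends up fully lit (the rows above it are lit by construction).

```python
from itertools import product


def chase(first_row):
    """Complete a click pattern from its first row using the observation.

    Returns (clicks, lights): the full d x d click pattern and the light
    states after all of those clicks have been applied.
    """
    d = len(first_row)
    clicks = [list(first_row)] + [[0] * d for _ in range(d - 1)]
    lights = [[0] * d for _ in range(d)]

    def press(r, c):
        for rr, cc in ((r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < d and 0 <= cc < d:
                lights[rr][cc] ^= 1

    for c in range(d):
        if first_row[c]:
            press(0, c)
    for r in range(1, d):
        for c in range(d):
            if not lights[r - 1][c]:  # light above is off, so click here
                clicks[r][c] = 1
                press(r, c)
    return clicks, lights


def solutions_by_first_row(d):
    """Search over the 2^d possible first rows instead of 2^(d*d) patterns."""
    found = []
    for first_row in product((0, 1), repeat=d):
        clicks, lights = chase(first_row)
        if all(lights[d - 1]):
            found.append(clicks)
    return found


print(len(solutions_by_first_row(5)))
```

The search space shrinks from 2^(d*d) to 2^d candidates, each of which takes on the order of d*d steps to complete and check.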
An even faster algorithm can be developed, in which the problem is reduced to solving a set of linear equations modulo 2, for which standard polynomial time algorithms are available.
Let F(n,m) denote the number to be placed in the box at the position (n,m), where n is the row index and m is the column index of the grid, for n, m = 1, 2, …, d. Define the variables x_1, x_2, …, x_d as
F(1,m) := x_m, for m = 1, 2, …, d.
In other words we start by placing x_1 to x_d in the first row. Our job is to determine all the x_m’s. Imagine for a moment that the grid has one extra row. Note that if we start with the first row of the solution, we should not have to click anywhere in the (d+1)-th row of the grid, since all the lights in the d-th row should already be on.
More briefly, the algorithm may now be described as follows:
Step 1. Define the variables x_1, x_2, …, x_d as F(1,m) := x_m, for m = 1, 2, …, d.
Step 2. Calculate F(d+1,m), by using the recursion F(n,m) = F(n-1,m-1) + F(n-1,m) + F(n-1,m+1) + F(n-2,m) + 1 (mod 2), for n = 2, 3, …, d+1, where F(n,m) is taken to be 0 whenever n = 0, m = 0 or m = d+1.
Step 3. Solve the system of equations F(d+1,m) = 0 (mod 2), for m = 1, 2, 3, …, d, for the variables x_1, x_2, …, x_d, or if no solutions exist, indicate that no solutions exist.
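Steps 1 to 3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name `solve_first_row` is hypothetical, each F(n,m) is stored as an integer bitmask (bit m-1 is the coefficient of x_m and bit d is the constant term), positions outside the grid are treated as 0, and the linear system is solved by ordinary Gaussian elimination modulo 2.

```python
def solve_first_row(d):
    """Return (first_row, number_of_solutions) for the d x d Game of Lights.

    first_row is one valid choice of x_1..x_d (free variables set to 0),
    or None if the system F(d+1, m) = 0 (mod 2) has no solution.
    """
    const = 1 << d  # bit d of a mask holds the constant term

    # Step 1: F(1, m) = x_m. Columns 0 and d+1 are zero padding.
    prev2 = [0] * (d + 2)                             # row n-2 (row 0 is all zeros)
    prev1 = [0] + [1 << m for m in range(d)] + [0]    # row n-1

    # Step 2: apply the recursion for n = 2, ..., d+1 (XOR is addition mod 2).
    for _ in range(2, d + 2):
        cur = [0] * (d + 2)
        for m in range(1, d + 1):
            cur[m] = prev1[m - 1] ^ prev1[m] ^ prev1[m + 1] ^ prev2[m] ^ const
        prev2, prev1 = prev1, cur

    # Step 3: solve F(d+1, m) = 0 by Gaussian elimination over GF(2).
    rows = prev1[1:d + 1]
    pivot_cols = []
    rank = 0
    for col in range(d):
        pivot = next((i for i in range(rank, d) if rows[i] >> col & 1), None)
        if pivot is None:
            continue  # x_(col+1) is a free variable
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(d):
            if i != rank and rows[i] >> col & 1:
                rows[i] ^= rows[rank]
        pivot_cols.append(col)
        rank += 1

    # A leftover row reading "0 = 1" means the system is inconsistent.
    if any(row == const for row in rows[rank:]):
        return None, 0

    x = [0] * d
    for i, col in enumerate(pivot_cols):
        x[col] = rows[i] >> d & 1
    return x, 2 ** (d - rank)


print(solve_first_row(3))
```

Once a first row is known, the remaining rows follow from the observation, so the whole computation takes time polynomial in d.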
These are all the algorithms that we describe for solving the problem. It is possible to speed up the algorithm still further by finding clever or parallel methods to do Step 2. But if we do so we will be straying from the purpose of this paper. We now return to the design of a test.
The Test
It is clear that the problem stated involves a lot of thinking by the student, and may consist of many steps before the student can arrive at a suitably “good” algorithm. Further, there are many ways of approaching the problem, and the student may go about the problem in many different ways. Finally, as we mentioned earlier, the problem is stated somewhat ambiguously. This has been done deliberately, since most real-life problems are ambiguous. Further, the extent to which the student pursues the goal of developing a “good” algorithm indicates her maturity in the subject.
We have mentioned earlier that we are interested in checking whether the student has demonstrated problem-solving skills. We also wish to find out whether the student has arrived at the critical ideas or observations required to solve the problem. Finally, there should be clearly defined milestones in solving the problems so that we know whether the student has reached them or not, in whichever way.
The test itself will be divided into a set containing one or more tasks. In what follows, we refer to a “testing software” that is somehow removed from the software that contains the test items, and which tracks various states in the actual test. We found it convenient to separate the two conceptually, since test items will be designed in various contexts and a testing software should be flexible enough to link up with many tests. We proceed with the first set of tasks.
The First Set
We start the test by explaining the overall problem, as stated above, to the student, and mention that the test consists of a series of tasks that the student has to do in order to solve the problem.
The first task that the student has to do is to solve the 5x5 game. The Game of Lights with a few options (as shown in Figure 2) is presented to the student on the computer.
Figure 2. The Game of Lights
One of the options in this game (shown in Figure 2 as a drop down menu) allows a student to select the value of d. Note that d=5 is shown as the default in the game. This is because the Task as stated asked for a solution of the 5x5 grid.
Note that we can easily program the puzzle in such a manner that when the student solves the puzzle, the program sends a message to the student (and the testing program) that the student has finished the task. For example, the original program written by Yasunari Hiramatsu accomplished this by flashing the yellow lights!
Further, this task gives an opportunity to the student to experiment with the problem a bit. Before trying to find an algorithm for solving this game, it is a good idea to play with it to appreciate how difficult the problem is for a human being to do.
Further, it gives the testing software an opportunity to test a common problem solving heuristic. One very common heuristic of problem solving is that it may be useful to solve smaller problems first. If the student tries the 5x5 game and cannot solve the puzzle immediately, this heuristic suggests that the student should try to solve smaller problems first. We can easily track this by capturing whether the student chooses the 2x2 grid, solves it, then moves on to the 3x3 grid, and so on. If the student tries smaller grid sizes, solves them and then returns to the 5x5 grid, we have evidence that she has incorporated this problem solving heuristic in her bag of skills.
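One way the testing software might look for this evidence is sketched below. The event log format, a list of `("select", d)` and `("solved", d)` pairs, and the function name `tried_smaller_first` are assumptions made for this illustration; the paper does not specify how events are recorded.

```python
def tried_smaller_first(events, target=5):
    """Return True if the log shows the student solving a smaller grid
    and afterwards returning to the target grid size.

    `events` is a chronological list of (kind, d) pairs, where kind is
    "select" (grid size chosen from the menu) or "solved" (puzzle solved).
    """
    solved_smaller = False
    for kind, d in events:
        if kind == "solved" and d < target:
            solved_smaller = True
        elif kind == "select" and d == target and solved_smaller:
            return True
    return False


log = [
    ("select", 5),               # starts with the default 5x5 grid
    ("select", 2), ("solved", 2),
    ("select", 3), ("solved", 3),
    ("select", 5),               # returns to the original problem
]
print(tried_smaller_first(log))  # prints True
```

A positive result is evidence, not proof, that the heuristic was applied deliberately; repeating the check across tasks would strengthen it.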
Another thing for the testing software to capture is whether the student can repeat the solution. Has the student written down or memorized the solutions for d=2,3,4 and 5 for further analysis for solving the problem? We can test this by repeatedly asking the student to solve the same task, or a related task of finding a solution to a smaller grid.
Once the testing software has obtained enough evidence that the student has completed the task, we can move on to the second set of tasks. To summarize, in this task we have simply obtained evidence whether the student has tried the heuristic of “trying smaller problems first” before moving on to a harder problem. Similarly, we can design test items to test other problem solving strategies.
The Second Set
The next set of tasks describes a series of activities to find out whether the student has arrived at the critical observation mentioned above in this section.
Before the second set of tasks we ask the student to try solving the Game of Lights for many values of d, and record the solutions somehow. This activity can be done off-line. We are hoping at this time that the student will arrive at the observation outlined above, namely that knowing the first row is enough to find a solution. Note that the student may at this stage have already developed some algorithm to automatically solve the game. Then come a series of tasks to test whether the student has managed to solve the puzzle and whether the student has arrived at the observation above.
This can be easily done by first asking the student about the values of d that the student was able to solve. Suppose the student has solved all the puzzles with d going from 2 to 10. Then we can present the grid with these values of d, and ask the student to switch on all the lights. If the student is able to do so correctly, we have obtained evidence that the student has been able to solve the puzzle for these values of d, or has written a program to solve it.
Next, we would like to find a value of d large enough that the student (or program) is not able to solve the puzzle. We ask the student to solve the 11x11 puzzle. If the student is able to solve it quickly, we try some other larger value of d. Once we are sure the student does not know the solution (say for d=11), we can present the puzzle with the first row of a solution already clicked. We will also have to disable clicking in the first row, so that the student cannot change the state of the first row of the solution.
Now the task is to solve the puzzle, but not click any square in the first row. We track whether the student is going row by row and clicking the right squares in Row 2 and below. If the student does it right we can repeat the process with some other d to make sure. If not, it seems likely that the student will at some point figure it out (since giving the first row of the solution is a strong hint). If this happens, we may perhaps give a smaller score for this part of the test.
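The row-by-row check could be automated along the following lines. This is a sketch under assumptions: the function name `follows_observation` is hypothetical, clicks are recorded as zero-based `(row, column)` pairs in the order they were made, and a click counts as "right" only if the light directly above it was off at that moment and the student never goes back to an earlier row.

```python
def follows_observation(first_row, clicks):
    """Check a recorded click sequence against the row-by-row strategy.

    `first_row` is the pre-clicked first row of a solution (0/1 values);
    `clicks` is the chronological list of (r, c) squares the student clicked.
    Returns True only if every click is in Row 2 or below, rows are visited
    top to bottom, each click is on a square whose light above is off, and
    all the lights are on at the end.
    """
    d = len(first_row)
    lights = [[0] * d for _ in range(d)]

    def press(r, c):
        for rr, cc in ((r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < d and 0 <= cc < d:
                lights[rr][cc] ^= 1

    for c, bit in enumerate(first_row):
        if bit:
            press(0, c)

    current_row = 1  # the first row (index 0) is disabled
    for r, c in clicks:
        if not (current_row <= r < d and 0 <= c < d):
            return False  # clicked the first row, went back up, or off the grid
        if lights[r - 1][c]:
            return False  # the light above was already on
        press(r, c)
        current_row = r
    return all(all(row) for row in lights)


print(follows_observation([1, 0, 1], [(1, 1), (2, 0), (2, 2)]))  # prints True
```

A stricter or more lenient rule, for example tolerating a few corrected mistakes and awarding a smaller score, would be a rubric decision outside the scope of this sketch.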
To summarize, the performance of the student in the second set of tasks allows us to collect evidence that the student has arrived at an observation that is crucial to finding a nice solution to the problem at hand. This kind of observation or understanding is specific to the problem at hand.
Key Design Features of the Test
We end this section by noting some of the key features of the above test. From the above, it should be clear that:
· Important milestones that the student has to reach while solving the problem were identified.
· We determined quite precisely the aspect of the problem solving process/concepts to be tested, and designed the set of tasks appropriately.
· Each single task that is used in the test is clearly defined, and the testing software is able to track its completion.
· The tasks just collect evidence about the competency of the student. Like any other test, they do not unambiguously determine whether the student has understood a concept or has displayed understanding.
· There is some repetition built into the test to increase the evidence that the student has displayed the required competency.
The performance task we chose indicates that a test can be designed to indicate whether the student can do fairly complex problems. Moreover, it is clear that the features above allow us to decide suitable scoring rubrics for the test.
5. Conclusion
We have seen that a performance based test can be designed and automated for a fairly complex problem. This performance based test can be used to assess aspects of the student’s problem solving strategies, and whether students have solved the problem. Perhaps the most important consideration for automating the test is to use changes in the state of the test software to collect evidence that the student has demonstrated the required competency.
References
[1] Baker E. L., ‘The role of domain specifications in improving the technical quality of performance assessment’ (CSE Tech. Rep.). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing, 1992.
[2] Barton P. E., and Coley R. J., ‘Testing in America's Schools Policy Informational Report’, ETS, 1995, 46 p.
[3] Bhatnagar G. and Sharma A., ‘Learn Computing: A Game Plan’, Proceedings of the F13 conference, August 1999, NIIT Ltd., New Delhi.
[4] Bryant D., ‘A Comparison of Multiple Choice versus Alternative Assessment: Strengths and Limitations’, The New York State Education Department, Albany, New York, 1996 (available at http://www.nysed.gov/rscs/rschmult.html).
[5] Dunbar S. B., Koretz D., & Hoover H. D., ‘Quality control in the development and use of performance assessments’, Applied Measurement in Education, 4, 1991, 289-304.
[6] Herman J., ‘Research in cognition and learning: Implications for achievement testing’, In M. C. Wittrock & E. L. Baker (Eds.), Testing and cognition (pp. 154-165). Englewood Cliffs, NJ: Prentice Hall, 1991.
[7] Linn R. L., Baker E. L., & Dunbar S. B., ‘Complex Performance-based assessment: Expectations and validation criteria’, Educational Researcher, 20(8), 1991, 15-21.
[8] Lipman M., ‘Some Thoughts on the Formation of Reflective Education’, In Teaching-Thinking Skills: Theory and Practice, pp. 151-161. Edited by J. B. Baron and R. J. Sternberg. New York: W. H. Freeman, 1987.
[9] Lukhele R., Thissen D. and Wainer H., ‘On the Relative Value of Multiple Choice Constructed Response and Examinee Selected Items on Two Achievement Tests’, Journal of Educational Measurement, 31(3), 1994, 234-250.
[10] Mehrens W. A., ‘Using performance assessment for accountability purposes’, Educational Measurement: Issues and Practice, 11(1), 1992, 3-20.
[11] Schoenfeld A. H., ‘Mathematical Problem Solving’, Academic Press, Orlando FL, 1985.
[12] Seldon A., Seldon J., ‘What does it take to be an expert problem solver’, Research Sampler, MAA Online, No. 4, 1997 (available at www.maa.org/t_and_l_/sampler/research_sampler.html).
[13] Shavelson R. J., Baxter G. P., & Pine J., ‘Performance assessments: Political rhetoric and measurement reality’, Educational Researcher, 21(4), 1992, 22-27.
[14] Starch D., & Elliot E. C., ‘Reliability of grading high school work in English’, School Review, 20, 1912, 442-457.
[15] Starch D., & Elliot E. C., ‘Reliability of grading high school work in mathematics’, School Review, 21, 1913, 254-259.
[16] Tuckman B., ‘The Essay Test: A Look at the Advantages and Disadvantages’, NASSP Bulletin, 77, 1993, pp. 20-26.
[17] Zaremba S., and Schultz M., ‘An Analysis of Traditional Classroom Assessment Techniques and Discussion’, ED 365404, 1993, p. 13.