«Far East Journal of Experimental and Theoretical Artificial Intelligence Volume 1, Issue 2, 2008, Pages 87-125 Published online: August 12, 2008 This ...»
These students learned more than a control group of students who read a carefully chosen chapter of text focusing on the same physiology and problem-solving tasks. The pre-test, the post-test, the results of the questionnaire, the control group and the learning gains achieved by these students are described in detail in .
MICHAEL S. GLASS and MARTHA W. EVENS
Table 4. Pre-test and post-test scores for individual students using CIRCSIM-Tutor. (T-tests show that the improvement on Section 1 (DiffRel) and the improvement on Section 2 (DiffCorr) are both significant at the .02 level. The two groups of students above and below the line had exactly the same experience except that the pre-test and post-test were switched.)
turns per session ranges from 9 to 94. Examining these results we see that the students who made few errors in their predictions were asked very few questions. This is different from human tutoring, as in our transcripts of human tutoring the tutors probe for common misconceptions when students produce perfectly correct predictions [30, 44]. Later versions of CIRCSIM-Tutor included questions for use in these circumstances.
Non-alphabetic turns
The 1642 student turns include 449 that were not alphabetic. The majority of these were plus signs, minus signs, or zeros in response to questions about values of variables. One student typed the letter “o”, which the spelling correction module correctly mapped into a “0” (zero).
There were a few student utterances consisting of question marks, for which the input understander issued a message explaining the kind of input it wanted to see. These messages are shown in Table 6. There were also three other instances of apparently random punctuation marks, to which the system responded by issuing a new prompt. In all three, the student then came up with the right answer, so we judged that simply issuing a new prompt was an appropriate response. However, we are considering making the system produce an error message specifying the desired input in these cases also.
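The handling of non-alphabetic turns described above can be illustrated with a minimal sketch. The function name, the category labels, and the treatment of empty input are assumptions made for illustration; they are not taken from the CIRCSIM-Tutor implementation.

```python
import string

# Symbols accepted as predictions about a variable (rise, fall, no change).
VALUE_SYMBOLS = {"+", "-", "0"}


def classify_turn(text: str) -> str:
    """Return a coarse, hypothetical label for one student turn."""
    stripped = text.strip()
    if stripped.lower() == "o":
        # A lone letter "o" is read as a mistyped zero.
        stripped = "0"
    if stripped in VALUE_SYMBOLS:
        return "value"
    if stripped and set(stripped) == {"?"}:
        # Only question marks: explain what kind of input is wanted.
        return "explain_input"
    if all(ch in string.punctuation for ch in stripped):
        # Other punctuation-only (or empty) input: simply prompt again.
        return "reprompt"
    return "text"
```

Turns labeled "text" would go on to spelling correction and information extraction; the other labels can be answered directly by the input understander.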
The spelling correction module made 96 attempts at spelling correction and succeeded in all but five. Two of the failures were errors in correcting phrases, and in two others the system corrected “in” to “increase” when “inverse” would have been preferable. In the remaining case the system failed to correct “cop”, an unusual abbreviation for cardiac output, to the standard abbreviation “co”. This omission has been corrected.
Most of the successful corrections involved handling of abbreviations. For example, the system corrected “cvb” to “cbv” for central blood volume and “vp” to “cvp” for central venous pressure. It restored the initial “n” to “ormal” and “eural”. It successfully corrected the phrases: “stroke voulme”, “cardiac ouptut”, “mean adreterial pressure”, and “total periferal resistanse”. It correctly changed “ionotropic state” to “inotropic state” and “contractibility” to “contractility”.
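Single-word corrections of this kind can be sketched as a search for the closest lexicon entry. The sketch below is not the CIRCSIM-Tutor algorithm: the distance measure (edit distance with adjacent transpositions), the threshold, and the tiny lexicon are all assumptions chosen for illustration.

```python
from typing import Optional


def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions,
    and swaps of two adjacent characters (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


# Hypothetical miniature lexicon.
LEXICON = ["cbv", "cvp", "co", "sv", "hr", "normal", "neural",
           "inotropic", "contractility"]


def correct(word: str, lexicon=LEXICON, max_distance: int = 2) -> Optional[str]:
    """Return the closest lexicon entry, or None if nothing is close enough."""
    word = word.lower()
    if word in lexicon:
        return word
    # min() keeps the earliest entry when several are equally close.
    best = min(lexicon, key=lambda entry: osa_distance(word, entry))
    return best if osa_distance(word, best) <= max_distance else None
```

Under this measure “vp” is closest to “cvp” and “ormal” to “normal”. Note that “cvb” is equally close to “cbv” (one transposition) and “cvp” (one substitution), so a distance measure alone cannot decide between them; the sketch simply returns whichever comes first in the lexicon.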
The following example shows one of two instances where a misspelled phrase was mishandled by spelling correction. The system corrected “venour” to “veins”, found the phrase “central veins”, did not even look for “central venous pressure”, and eventually extracted the word “pressure” as an answer to the question.
EXTRACTING INFORMATION FOR AN ITS
T: Stroke Volume is determined by Central Venous Pressure and Inotropic State. Which determinant is dominant in this case?
S: central venour pressure (the system picks up the word pressure and decides that this means MAP)
T: Nope, the important determinant of Stroke Volume in this case is not Mean Arterial Pressure. The important determinant of Stroke Volume in this case is Central Venous Pressure.
These errors were caused by the interaction between our method for processing phrases and our information extraction approach.
Missing vocabulary
In examining these sessions we found two additions to the lexicon that could have improved system understanding of student input. In one case the system asked the student to name a stage and the student typed “initial” instead of the expected “DR” or “Direct Response”. Since the DR stage is indeed the initial one, we decided to add “initial” to the lexicon as a synonym for “DR”. The other addition involves the above-mentioned polysemous abbreviation “in.” for “inverse” and “increase”, depending on context.
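The two lexicon additions can be pictured as a synonym table plus a context-dependent expansion table. This is only a sketch: the table layout, the function name, and the category labels "direction" and "relationship" are hypothetical and are not taken from the system.

```python
# Hypothetical tables; the entries mirror the two additions described above.
SYNONYMS = {"initial": "DR", "direct response": "DR", "dr": "DR"}
CONTEXT_EXPANSIONS = {
    "in": {"direction": "increase", "relationship": "inverse"},
}


def normalize_answer(text: str, expected_category: str) -> str:
    """Map a student answer to its lexicon form, using the kind of
    answer the tutor's question expects to resolve ambiguous abbreviations."""
    key = text.strip().lower().rstrip(".")
    by_context = CONTEXT_EXPANSIONS.get(key)
    if by_context is not None:
        # Unknown context: leave the abbreviation unresolved.
        return by_context.get(expected_category, key)
    return SYNONYMS.get(key, key)
```

With these tables, “in.” becomes “inverse” after a question about a relationship and “increase” after a question about the direction of change, while “initial” maps to “DR” regardless of context.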
Error messages specifying desired input
The 35 one-hour sessions contained 86 error messages, and four-fifths of the sessions contained at least one of these messages, so it is clear that they impact on users. We believe that they are responsible for increased user satisfaction with the system, since we saw very few of the unrepaired repeated misunderstanding loops that we often observed in use of the previous input understander. These messages and their frequencies observed during this trial are listed in Table 6.
(Used when two alternative answers are detected in one response)
Most of the time one of these messages was enough to get the student back on track. There were seven sequences of two or more of these turns; the longest had four error messages. These messages are issued by the input understander, which does not inform the instructional planner that the student is in trouble. The system has since been changed so that after two error messages the student receives the correct answer.
Problem dialogues and student frustration
There were only a handful of dialogue segments that exhibited serious problems. We exhibit a few of them here.
Bozena Thompson, who wrote one of the first interactive interfaces to database systems, discovered almost 30 years ago that any interactive system that accepts natural language input must expect remarks from frustrated users. At this point such remarks from the student are not understood by CIRCSIM-Tutor. In November 1999, the system received only three inputs that could be interpreted this way, and they were pretty mild. Two were strings of random letters; they could have been either expressions of frustration or accidents with the keyboard. The last is illustrated in this exchange:
T: What is the other determinant of Mean Arterial Pressure?
It is possible that “clueless” was an insult directed at the system, or it may have been intended as “I don’t know” student initiative that the system should have recognized. In trials with fifty students of an earlier version of CIRCSIM-Tutor in Fall, 1998 we had some clearly negative comments not fit to print. We did not see any in this trial. We would like to believe that improvements in the system averted these extremes of student frustration. It is also possible that the students were inhibited by the fact that we had team members patrolling the room to help in case of system crashes (there were none) and to make sure that no one left without taking a post-test and filling out a questionnaire. Since we do not have a cogent plan for responding to expressions of frustration we have not put any effort into recognizing them.
In this next example the student seems to be doggedly answering some other question rather than paying attention to the error messages.
T: What stage must the value of Cardiac Output follow in SS?
T: Nope, the value of Cardiac Output doesn’t follow the value of Cardiac Output in SS.
We have fixed this problem by giving the student the correct answer when the second failed attempt is not of the proper category, since a second category error indicates that the student did not understand the error message. This shows that the input understander should perhaps alert the planner when an attempt at repair fails.
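The repair policy just described can be sketched as a small counter kept per question. The class, the function, and the returned action labels are assumptions made for illustration; the text only states that a second category error leads to the correct answer being given.

```python
from dataclasses import dataclass


@dataclass
class RepairState:
    """Hypothetical per-question bookkeeping for the input understander."""
    category_errors: int = 0


def handle_attempt(state: RepairState, answer_category: str,
                   expected_category: str) -> str:
    """Return the action to take for one student attempt."""
    if answer_category == expected_category:
        state.category_errors = 0
        return "pass_to_planner"   # answer is of the right kind; evaluate it
    state.category_errors += 1
    if state.category_errors >= 2:
        # Second category error: the error message did not help.
        state.category_errors = 0
        return "give_answer"
    return "error_message"         # first category error: explain desired input
```

In the dialogue above, the question asks for a stage and the student answers with a variable, so the first attempt would produce "error_message" and a second answer of the wrong category would produce "give_answer".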
There are several areas where improvements in the capabilities of the input understander seem particularly germane to the future of CIRCSIM-Tutor. As noted, there are still problems in the spelling correction phase of the system. When the misspelled word is in the context of a phrase, the system sometimes fails. It also seems necessary to use the dialogue context to weight candidate words, as in the case where “in” could mean “increase” or “inverse”. We are still not using phrases and context effectively in spelling correction. Spelling correction is triggered only when a word in a string bounded by white space and punctuation does not appear in the lexicon. However, this is inconsistent with correcting errors obtained by garbling one word into another, such as the error that misspells “form” as “from”.
Furthermore, in the human tutoring transcripts we see conjoined terms such as “svi” for “sv i”, a phenomenon that is inconsistent with our algorithm of finding the closest single word in the lexicon.
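One simple way to handle conjoined terms before falling back on closest-word correction is to try splitting the unknown string into two lexicon entries. The following is a sketch under that assumption, with a hypothetical lexicon; it is not a description of the existing system.

```python
from typing import List, Optional


def split_conjoined(token: str, lexicon: set) -> Optional[List[str]]:
    """Return the token as one or two lexicon entries, or None if
    no such reading exists. Only the first valid split is returned."""
    token = token.lower()
    if token in lexicon:
        return [token]
    for cut in range(1, len(token)):
        left, right = token[:cut], token[cut:]
        if left in lexicon and right in lexicon:
            return [left, right]
    return None
```

With a lexicon containing “sv” and “i”, `split_conjoined("svi", {"sv", "i", "co", "hr"})` yields `["sv", "i"]`. A token that can be split in more than one way would need the dialogue context to choose among the readings.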
From our study of the human tutoring sessions we can see that possible new tutoring plans will require the students to utter physiological concepts that are commonly expressed as a variety of prepositional phrases. We already observed several such instances in the November 1999 experiment. For example, the phrase “volume of blood” is sometimes used to denote the volume of blood in the central venous compartment. Our first thought was to include it in the lexicon as a synonym for the parameter “central blood volume”, or CBV as it is usually abbreviated in the tutoring sessions. The problem is two-fold: there is a certain amount of linguistic creativity in describing parameters, and some phrases can be used in other contexts to denote different parameters. For example, “volume of blood in the ventricle” denotes ventricular filling and “volume of blood ejected by the heart” is a roundabout way of describing “stroke volume”, so it would be infelicitous to recognize “volume of blood” as a synonym for CBV. Similarly, the word “pressure” is used in a range of contexts of the form
“pressure” + [“of blood”] + preposition + anatomical structure
Generally “pressure” when unqualified denotes mean arterial pressure, but “pressure in the veins” means central venous pressure. We can add finite state machines to handle these phrases by looking for words such as “volume”, “pressure”, “blood”, and “length” combined with prepositions such as “of” and “in” followed by noun phrases. We would also need a small ontology of anatomical terms.
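A recognizer for the pattern head + [“of blood”] + preposition + anatomical structure could be sketched as follows. The word lists, the two-entry ontology, and the function name are assumptions for illustration, built only from the examples in the text; a real ontology of anatomical terms would be considerably larger.

```python
import re
from typing import Optional

HEADS = {"pressure", "volume"}
PREPOSITIONS = {"in", "of"}
DETERMINERS = {"the", "a", "an"}

# Hypothetical mini-ontology: (head word, anatomical structure) -> parameter.
ANATOMY = {
    ("pressure", "veins"): "central venous pressure",
    ("volume", "ventricle"): "ventricular filling",
}
# Reading of a head word that has no prepositional qualifier.
UNQUALIFIED = {"pressure": "mean arterial pressure"}


def extract_parameter(utterance: str) -> Optional[str]:
    """Scan for a head word and step through the states
    head -> optional 'of blood' -> preposition -> optional determiner -> structure."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    for start, head in enumerate(tokens):
        if head not in HEADS:
            continue
        i = start + 1
        if tokens[i:i + 2] == ["of", "blood"]:
            i += 2
        if i < len(tokens) and tokens[i] in PREPOSITIONS:
            i += 1
            if i < len(tokens) and tokens[i] in DETERMINERS:
                i += 1
            structure = tokens[i] if i < len(tokens) else None
            # Qualified by a structure: accept only what the ontology knows.
            return ANATOMY.get((head, structure))
        return UNQUALIFIED.get(head)
    return None
```

Here “pressure in the veins” yields central venous pressure and bare “pressure” yields mean arterial pressure, while “volume of blood” on its own yields nothing, so it is not mistaken for CBV. The sketch only covers qualifiers that follow the head word; phrases such as “central venous pressure” would still have to be matched by the phrase lexicon first.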
Equations in the input
Examination of student utterances in human tutoring sessions shows many student answers containing algebraic expressions, even when the tutor did not explicitly ask for an equation. So a question asking for the determinants of cardiac output may be answered with an equation:
K40-tu-108-6: Can you tell me what determines the value of CO?
These equations are invoked for their descriptive power; students are not solving problems quantitatively. Glass catalogs these answers, discusses the possibility of adding a special purpose expression grammar, and discards that plan in favor of a grammar combining algebra and English because there are so many examples in which algebra and English are combined into one sentence, as in:
K7-st-100-1: But isn’t CO X TPR =MAP ?
K13-st-52-1: Sv times hr =co.
Understanding student input involving equations and making appropriate responses is known to be a tricky problem [28, 52].
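To give a feel for what extracting an equation from mixed algebra and English involves, here is a deliberately narrow sketch that recognizes only the shape A times B = C seen in the two student turns quoted above. The variable list, the operator words, and the function name are assumptions; this is not the combined grammar discussed in the text.

```python
import re
from typing import Optional, Tuple

# Hypothetical variable abbreviations and multiplication markers.
VARIABLES = {"co", "tpr", "map", "sv", "hr"}
TIMES_WORDS = {"x", "times", "*"}


def extract_product(utterance: str) -> Optional[Tuple[str, str, str]]:
    """Return (factor, factor, product) for input shaped like 'A times B = C'.
    Surrounding English words are skipped; anything else returns None."""
    tokens = re.findall(r"[a-z]+|[=*]", utterance.lower())
    symbols = []
    for tok in tokens:
        if tok in VARIABLES:
            symbols.append(tok.upper())
        elif tok in TIMES_WORDS:
            symbols.append("*")
        elif tok == "=":
            symbols.append("=")
    if len(symbols) == 5 and symbols[1] == "*" and symbols[3] == "=":
        return (symbols[0], symbols[2], symbols[4])
    return None
```

Applied to the two transcript examples, the sketch reads the first as CO times TPR equals MAP and the second as SV times HR equals CO. Skipping unknown words is what lets the English around the algebra pass through, but it also discards information (such as the questioning tone of “But isn’t”) that a fuller grammar would need to keep.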
We believe that the reason we see such language in human tutoring transcripts but not often in student use of CIRCSIM-Tutor is that unlike the human tutors, the computer does not invoke equations in its own language. Human tutors spontaneously produce combinations like this
Because of its power and ubiquity in the human tutoring transcripts, we believe it will be useful to incorporate equation language into CIRCSIM-Tutor’s dialogues as well.
Answering “Why” questions
Another concern is that the system does not ask as many deep questions such as “why” questions as we would like, because it cannot understand the answers well enough to respond to them. In order to collect more student answers to “why” questions and to discover whether the students benefit from trying to answer these questions even without much feedback, we added at least one “why” question to each procedure during the next set of experiments and then provided the student with a good answer based on our most recent set of human tutoring sessions.
This change also addresses the fact that the computer tutor does not elicit any student language when the student correctly solves the problem with no errors. We also observe that in sessions with human tutors the students periodically ask substantive questions. Even more often they state a mini-theory and ask the tutor to comment. We would like to try to provide this experience for the students. ITS architectures such as AutoTutor have evolved toward a two-tiered approach, one tier for short (several word) student utterances and another using technologies such as robust parsing or latent semantic analysis for longer student utterances.
The CIRCSIM-Tutor dialogue-based intelligent tutoring system incorporates an approach toward processing student utterances that is designed to be simple and robust, using finite state transducers to extract the answer to the tutor’s question from the student’s utterance.
The tutoring system with this input understanding component was successfully tested in 35 tutoring sessions with first-year medical students at Rush Medical College in a regular physiology laboratory.