Background for IPIG in Dublin, Agenda Item #4
Comments by Linda Driver (RLG)
Anyone have thoughts on what to do about interoperability testing? Because many implementors won't be able to participate live at the Stockholm meetings, I'm throwing out a few questions and a few thoughts of my own that may generate some discussion.
Linda Driver (RLG): I won't be at the IPIG meeting in Stockholm, but fortunately Dave Richards and Lennie Stovel will be there representing RLG. I do think this is a very important issue that needs further thought and discussion. We learned a great deal from the last round of interop testing, but we only skimmed the surface.
In that flurry of testing in June, what went right? What went wrong?...
Why are you (i.e., "you" in a corporate sense) participating in interoperability testing? What do you want to get out of it? How best can that be achieved?
Here are some of my thoughts on it. If you find them a bit too rambling, I apologize.
I watched with interest as the reports on successful testing poured in during the second week of June this year. And, like any spectator at a good game, I started to keep a score-card, as I found it difficult to keep track of the activity without "keeping score".
I found it very difficult to get a picture of what the test results actually meant, as the results were recorded, message by message, on 21 different charts. At the same time, I noticed that there seemed to be a pattern of different levels of testing, from the very basic connections to the testing of a complete transaction.
As an old cataloguer, I naturally started to classify the test results in a simple schema:
IPIG Phase I (e.g., ILL-REQUEST and STATUS-AND-ERROR-REPORT)
Basic connection
Non-returnables
Returnables
I noticed that the last three classes were used in the 1992 Interoperability Test Suite. So, using that as a guide, I went through the tests and classified them according to the results reported. Then I drew up a blank chart and started to fill it in with a summary of the activity. I also annotated whether the tests were using MIME-encoded SMTP or TCP, and whether the role played was that of the Requester or the Responder. I added colour to make the different levels of results stand out. And I began to see an overall picture of interoperability emerging.
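As a rough illustration of the kind of summary chart described above, here is a minimal sketch in Python. It is not the chart that was actually drawn up. The names `ResultClass`, `Result` and `summarize` are hypothetical, and the sample entries use placeholder systems ("System A", etc.) rather than real test results. It records, for each pair of systems, which classes of test were reported, over which transport, and in which role.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class ResultClass(Enum):
    """The four classes from the simple schema (names are illustrative)."""
    PHASE_I = "IPIG Phase I"
    BASIC_CONNECTION = "Basic connection"
    NON_RETURNABLES = "Non-returnables"
    RETURNABLES = "Returnables"


@dataclass(frozen=True)
class Result:
    system: str               # application whose result is recorded
    partner: str              # application it tested with
    result_class: ResultClass
    transport: str            # "SMTP" (MIME-encoded) or "TCP"
    role: str                 # "Requester" or "Responder"


def summarize(results):
    """Group reported results by (system, partner) pair."""
    chart = defaultdict(set)
    for r in results:
        chart[(r.system, r.partner)].add((r.result_class, r.transport, r.role))
    return dict(chart)


if __name__ == "__main__":
    # Placeholder data only; these are NOT actual IPIG test results.
    sample = [
        Result("System A", "System B", ResultClass.PHASE_I, "SMTP", "Requester"),
        Result("System A", "System B", ResultClass.RETURNABLES, "TCP", "Responder"),
        Result("System A", "System C", ResultClass.BASIC_CONNECTION, "TCP", "Requester"),
    ]
    for pair, entries in sorted(summarize(sample).items()):
        labels = sorted(f"{c.value} ({t}, {role})" for c, t, role in entries)
        print(pair[0], "<->", pair[1], ":", "; ".join(labels))
```

Each cell of a chart built this way corresponds to one pair of systems, which is what makes the overall picture easier to read than message-by-message charts.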
TLC, with a full range of messages exchanged with AG-Canada, Fretwell-Downing, and MnSCU, seemed to have achieved the fullest range of testing with the largest group of partners. OCLC had exchanged the two APDUs associated with "IPIG Phase I" with many other applications, but hadn't made much progress beyond that in interoperability testing with other systems. Ameritech and Fretwell-Downing appeared to be the only ones successfully exchanging variants of ILL-ANSWER, such as "conditional" and "retry", in the latest round of testing, although TLC and Fretwell-Downing had done the full round of testing, including all but one of the ILL-ANSWER variants, in March.
Linda Driver made a minor correction, noting that it was not RLG who exchanged variants of ILL-ANSWER with FDI: I wish I could claim that we've tested all the flavors of ILL-Answer, but so far we've only exchanged Ill-Answer.unfilled.
Barb: My mistake, and my apologies to Ameritech. For it was Ameritech, not RLG, that tested variants of ILL-ANSWER with FDI, so I've corrected this in the paragraph above.
Brian Kowalski, Product Development Programmer, TLC, sent Barb a note to say:
The current charts don't represent the achievement of this level of testing complexity. And I feel that it is important that this is somehow conveyed.
Nic Sprauel's comments on the list on 16 June indicated that things didn't always go smoothly in the latest round of testing, and that most of the testing avoided "a maximum of externals or service configuration (eg electronic vs physical delivery, copy vs loan, etc...)".
He also suggested that changes are needed to make the tests meaningful. This is even more critical now that the IPIG Profile has been approved. IPIG, as a group, now needs to decide how implementors are to dynamically exhibit conformance to the constraints identified in Clause 7 of the Profile. One way would be by successful completion of a group of interoperability test cases designed to exhibit the specified behaviour.
When IPIG last talked about testing, many implementors indicated that they had internal testing procedures and didn't require any external testing procedures at that time. But to demonstrate interworking, tests need to be done in a standard way, ensuring that when implementors say the same thing, it also means the same thing. And we need to work at the hard bits (externals, etc.).
Therefore, I think that we need to start working on a standard set of test cases. There is no external testing lab, so we're going to have to do it ourselves. Any development of a set of test cases should be based on a common understanding of why we're testing and whom we're testing for. We need to find a way to give the purchasers of implementations conforming to the IPIG Profile and the broader library community reasonable assurances that these systems will interoperate.
The 1992 test suite covers basic testing of the Protocol requirements. We need a way to cover the testing of this functionality, and the 1992 suite could be used as a starting point. We should also use the expertise gained thus far by the current implementors. However, we must go further and develop test cases to cover the constraints identified in the IPIG Profile.
Linda Driver: I fully agree. I think there are also some interesting problems that surfaced in the last round of testing that might not be discovered by standard tests. I'm sure most of us who were directly involved with the testing can think of one or two problems that might escape notice (at least initially) if we don't construct specific test situations to catch them. The next round of interop testing is extremely important to be sure that our systems can successfully interoperate at a user level.
IPIGlettes who were in Stockholm will recall that I had proposed an agenda item for that meeting to discuss testing. However, it was postponed because it was felt that some of the players were not present. I'm not sure, given the roll-call for the next meeting, that the right players will be at this upcoming meeting either.
But I do think that we should be looking at a more comprehensive testing plan than what is currently in place. And we can't keep on postponing the discussion until the "right" players attend a meeting.
Ruth Moulton (now off touring Australia on a 6-month leave of absence) gave an excellent presentation on testing at the tutorial in Stockholm, and I would like to draw your attention to it as an introduction to this discussion: http://www.nlc-bnc.ca/iso/ill/document/tutor/stut9910/index.htm
She outlined 3 levels of testing:
The current IPIG testing is only at the first level, that of exchanging APDUs. Some developers have succeeded in exchanging all APDUs among themselves, and should be ready to go on to a higher level of interoperability testing. And with the IPIG Profile now approved, we should be integrating profile requirements into the testing algorithm.
I sat on the sidelines as an observer, and wondered if there could be some sort of organization of the testing. I found that with the 20-odd charts, it was really hard to get a picture of how an application was doing. And I thought more about it after Nick Sprauel from Fretwell-Downing pleaded for more organization next time around. What about grouping the tests into categories that get progressively more difficult and complex?
As a starting point, I'm looking at the type of structure that's in the Interoperability Test Suite that was prepared in 1992 for the early Canadian implementation. But this would have to be adjusted to meet the requirements of the IPIG Profile and current implementors.
So I'm throwing out the following as a basis for discussion. What about something like:
Testers would agree on the level of test, and that could be recorded on a single summary page. Potential customers could see quite readily the level at which an application had been tested.
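To make the idea of a single summary page a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the level numbers are hypothetical integers (higher meaning a more difficult category), the application names are placeholders, and the sketch assumes that a test between two applications counts toward the level of both. The functions `highest_levels` and `render_summary` are invented names, not part of any existing tool.

```python
from typing import Dict, Iterable, List, Tuple

# One record per agreed test: (application, partner, agreed level).
# Levels are hypothetical integers; higher means a more difficult category.
Record = Tuple[str, str, int]


def highest_levels(records: Iterable[Record]) -> Dict[str, int]:
    """Return the highest level each application has been tested at."""
    best: Dict[str, int] = {}
    for app, partner, level in records:
        # Assumption: a completed test counts for both participants.
        for name in (app, partner):
            best[name] = max(best.get(name, 0), level)
    return best


def render_summary(records: List[Record]) -> str:
    """Render a one-page, plain-text summary, one line per application."""
    best = highest_levels(records)
    lines = [f"{'Application':<20} Highest level tested"]
    for name in sorted(best):
        lines.append(f"{name:<20} {best[name]}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder data only; not actual test results.
    sample = [
        ("System A", "System B", 1),
        ("System A", "System C", 3),
        ("System B", "System C", 2),
    ]
    print(render_summary(sample))
```

A potential customer reading such a page would see one line per application rather than having to piece the picture together from many separate charts.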
Linda Driver:
I agree that we need to organize the testing. It would be useful to have consistent testing scenarios that exercise the requirements you outlined above. I've developed 7 scenarios for our own internal testing that might be adapted for this purpose. If you'd like, I could revise the scenarios and send them to you as an example.
Linda's notes (1999/11/18) on revised set of scenarios:
I revised the testing scenarios and ended up with 12 levels. A clever tester could combine a few of the scenarios into a single request scenario, thus cutting down on the number of real requests that would need to be exchanged (actually, scenario #8 simply combines #5-7 into one scenario).
I've tried to make the levels progressively more difficult. Actually, the levels reflect the way the actual testing progressed--from a simple request that was canceled or not supplied, to a loan request that moved through all the possible tracking stages. The last APDUs that seem to get tested are Lost, Message, Expired and the various flavors of ILL-Answer (other than Unfilled or Conditional).
I've also included what I think might be a sample "Level 1" for operational interoperability, using some of the requirements from the IPIG Profile. This could get tricky, however. I don't think we want to have levels of compliance with the IPIG Profile, so I've just called it "operational interoperability."