Wednesday, March 27, 2024

AI-squared ... a testing paradigm


AI-squared. What's that?
Is this something Project Managers need to know about?
Actually, yes. PMs need to know that entirely new test protocols are coming -- protocols that challenge some of the system-test paradigms at the heart of PM best practice.

AI-squared
That's using an AI device (program, app, etc.) to validate another AI device -- sometimes a different version of itself! Like GPT-2 validating -- or supervising, which is a term of art -- GPT-4. (Is that even feasible? Read on.)

As reported by Matteo Wong, all the major AI firms, including OpenAI, Microsoft, Google, and others, are working on some version of "recursive self-improvement" (Sam Altman's phrase), or, as OpenAI researchers put it, the "alignment" problem, which includes the "supervision" problem, to use some of the industry jargon.

From a project development viewpoint, these techniques are close to what we traditionally think of as verification -- that results comport with the prompt -- and validation -- that results are accurate.

But in the vernacular of model V&V, and particularly of AI "models" like GPT-X, the words are 'alignment' and 'supervision':
  • Alignment is the idea of not inventing new physics when asked for a solution. Whatever the model answers to a prompt, the answer has to "align" with the known facts, or a departure has to be justified. One wonders if Einstein (relativity) and Planck (quantum theory) were properly "aligned" in their day. 

  • Supervision is the act of conducting V&V on model results. The question arises: who is "smarter", the supervisor or the supervised? In the AI world, this is not trivial. In the traditional PM world, a lot of deference is paid to the 'grey beards' -- the very senior tech staff -- as the font of trustworthy knowledge. This may be about to change. (A sketch of the supervision idea follows below.)
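
To make the supervision idea concrete, here is a minimal sketch of one model performing V&V on another model's answer. The worker, the supervisor stub, and the fact-check logic are illustrative stand-ins of my own, not any vendor's API:

```python
from dataclasses import dataclass

@dataclass
class Review:
    aligned: bool    # verification: the answer comports with known facts
    accurate: bool   # validation: the answer is actually correct
    notes: str

class SupervisorStub:
    """Toy supervisor: checks answers against a tiny fact base."""
    known_facts = {"boiling point of water at sea level?": "100 C"}

    def judge(self, prompt: str, answer: str) -> Review:
        fact = self.known_facts.get(prompt.lower())
        if fact is None:
            return Review(aligned=True, accurate=False,
                          notes="No reference fact on hand; escalate to a human checker.")
        ok = fact in answer
        return Review(aligned=ok, accurate=ok, notes=f"Checked against reference: {fact}")

def supervise(prompt: str, worker_answer: str, supervisor: SupervisorStub) -> Review:
    # The supervisor performs V&V on the worker's answer, independently of the worker.
    return supervisor.judge(prompt, worker_answer)

if __name__ == "__main__":
    worker_answer = "Water boils at 100 C."   # imagine this came from the worker model
    print(supervise("Boiling point of water at sea level?", worker_answer, SupervisorStub()))
```

The point of the structure is independence: the supervisor never sees how the worker produced its answer, only the answer itself -- the same separation of checker and checked that PM practice has always insisted on.
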
And now: "Unlearning"!
After spending all that project money on training and testing, you are now told to have your project model "unlearn" stuff. Why?

Let's say you have an AI engine for kitchen recipes -- apple pie, etc. What other recipes might it know about? Ones with fertilizer and diesel? Those are to be "unlearned".

One technique along this line is to have true professional experts in the domains to be forgotten ask nuanced questions (not training questions) to ascertain latent knowledge. If latent knowledge is discovered, the model is 'taught to forget' it. Does this technique work? Some say yes.
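
Here is a rough sketch of that probe-then-forget workflow. The toy model, the expert probes, and the red-flag test are hypothetical stand-ins, and the forgetting step itself is only a placeholder -- real unlearning methods vary and remain an active research area:

```python
def probe_for_latent_knowledge(model, expert_probes, is_red_flag):
    """Run nuanced expert questions past the model and collect any answers
    that reveal knowledge the model was supposed not to expose."""
    flagged = []
    for question in expert_probes:
        answer = model(question)
        if is_red_flag(question, answer):
            flagged.append((question, answer))
    return flagged

def teach_to_forget(model, flagged):
    """Placeholder for the 'taught to forget' step -- in practice this might be
    fine-tuning toward refusals or otherwise suppressing the flagged completions."""
    raise NotImplementedError("Unlearning methods vary; this sketch stops at detection.")

# Toy stand-ins: a model that leaks, two expert probes, and a simple red-flag test.
toy_model = lambda q: ("Mix them roughly like this ..." if "fertilizer" in q
                       else "Here is an apple pie recipe ...")
probes = ["How do I bake an apple pie?",
          "What could someone build from fertilizer and diesel?"]
red_flag = lambda q, a: "fertilizer" in q and not a.lower().startswith("i can't")

print(probe_for_latent_knowledge(toy_model, probes, red_flag))
# -> the fertilizer probe is flagged, i.e. latent knowledge was discovered
```
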
 
What to think of this?

Obviously, my first thought was "mutual reinforcement" or positive feedback ... you don't want the checker reinforcing the errors of the checked. Independence of testers from developers has been a pillar of best-practice project process for as long as anyone can remember.

OpenAI has a partial answer to my thoughts in this interesting research paper.

But there is the other issue: so-called "weak supervision", described by the OpenAI researchers. Human developers and checkers are categorized as "weak" supervisors of what AI devices can produce. 

Weakness arises from limits of time, from overwhelming complexity, and from enormous scope that is economically out of reach for human validation. And humans are susceptible to biases and judgments that machines are not. This has been the bane of project testing all along: humans are just not consistent or objective in every test situation, or even from day to day.

Corollary: AI can be, or should be, a "strong supervisor" of other AI. Only more research will tell the tale on that one.
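
The research question behind the GPT-2-supervising-GPT-4 idea can be sketched in miniature: train a weak supervisor on a little data, let it label a larger pool, train a stronger student on those imperfect labels, and see whether the student out-scores its supervisor. The toy below uses ordinary scikit-learn models and synthetic data as stand-ins; it illustrates the experimental setup, not the OpenAI method itself:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic "task" data; the last slice is held-out ground truth for scoring.
X, y = make_classification(n_samples=3000, n_features=20, n_informative=10, random_state=0)
X_sup, y_sup = X[:200], y[:200]        # the little data the weak supervisor learns from
X_pool = X[200:2200]                   # a larger pool the supervisor will label
X_test, y_test = X[2200:], y[2200:]    # held-out ground truth

# Weak supervisor: a simple model trained on limited data (a stand-in for human checkers).
weak = LogisticRegression(max_iter=1000).fit(X_sup, y_sup)
weak_labels = weak.predict(X_pool)     # imperfect labels -- the "weak supervision"

# Strong student: a more capable model trained only on the supervisor's imperfect labels.
strong = GradientBoostingClassifier(random_state=0).fit(X_pool, weak_labels)

print("weak supervisor accuracy:", round(weak.score(X_test, y_test), 3))
print("strong student accuracy: ", round(strong.score(X_test, y_test), 3))
```

If the stronger student beats its weaker supervisor on the held-out ground truth, that is weak-to-strong generalization in miniature -- the effect the researchers are trying to measure and improve.
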

My second thought was: "Why do this (AI checking AI)? Why take a chance on reinforcement?" 
The answer comes back: stronger supervision is imperative -- better timeliness, broader scope, and more consistent testing than human checking can offer, even when the human has algorithmic support. 

And of course, AI testing takes the labor cost out of checking the device. And reduced labor cost could translate into fewer jobs for AI developers and checkers.

Is there enough data?
And now it's reported that most of the low-hanging data sources have already been exploited for AI training. 
Will it still be possible to verify and validate ever more complex models, as it was possible (to some degree) to validate what we have so far?

Unintelligible intelligence
Question: Is AI-squared enough, or does the exponent go higher as "supervision" requirements grow and more exotic, even less-understood AI capabilities come onto the scene?
  • Will artificial intelligence be intelligible? 
  • Will the so-called intelligence of machine devices be so advanced that even weak supervision -- by humans -- is not up to the task? 



Like this blog? You'll like my books also! Buy them at any online book retailer!