1. Introduction
In recent years, global big-tech and high-tech companies such as Google, Meta, Apple, and NAVER have been competitively developing and gradually releasing advanced Generative AI (Artificial Intelligence). A leader in this field, ChatGPT, a chatbot using conversational AI, was released by OpenAI in November 2022, leading to a surge of interest in Generative AI. ChatGPT is implemented using a Large Language Model (LLM) built on a transformer model with natural language processing and unsupervised learning, which operates by considering extensive contexts and analyzing the content embedded among words. To improve the accuracy of its outcomes, ChatGPT adopts Reinforcement Learning from Human Feedback (RLHF) through fine-tuning and prompt-tuning based on human feedback [1]. The achieved outcomes can be reinforced by probabilistic facts, human rewards, and other methods. In Generative AI, it is possible to keep reaching better goals beyond the currently optimized ones because the system improves through updates and adjustments via reinforcement-based fine-tuning.
Generative AIs work well to find solutions, but the accuracy of ChatGPT 4.0 is 85.5%, according to an announcement by OpenAI [2]. In addition, they still struggle with issues such as hallucinations, confirmation bias, fake news, and various types of fraud [3]. Explanations of the achieved solutions are needed to determine their validity. However, Generative AIs rarely explain how their outputs are achieved, since the learning algorithms are based on black-box AI techniques such as Neural Networks (NN), Convolutional Neural Networks (CNN), and Deep Learning (DL) [4]. AlphaGo, for instance, is also trained using black-box AI such as CNN. There, an explainable path might not be crucial because the approach is based on probabilistic models that assume the existence of unchangeable optimal goals for given problems. In mutually exclusive decision-making problems, it is important to show the paths to the goal, as in Explainable AI (XAI), because the goal is achieved by choosing one branch among alternatives according to various conditions.
DARPA (Defense Advanced Research Projects Agency) started the XAI program in 2015 [5], as a type of white-box AI. Recently, interest in XAI has grown, and it has been applied to critical problem-solving, especially in financial, medical, and legal domains [6, 7]. For instance, XAI can be applied to diagnosing disease because it provides more reliable accuracy in diagnostic tests together with interpretability [8, 9]. In addition, explanations of the paths taken to reach goals, such as medical decisions, are necessary for those decisions to be accepted more easily. The goal can be achieved through a trade-off among mutually exclusive alternatives according to the specific contextual conditions.
In this research, we focused on mutually exclusive decision-making that requires explanations to show the path to the goals. To achieve this, we utilized a tree-based structure to address the problems. The proposed tree-based decision-making process involves choosing and pruning branches based on multi-dimensional contextual conditions. The goal may dynamically change depending on the chosen contextual conditions and can be achieved at the expense of the pruned branches in the tree-based structure. The achieved goal includes explanations of which branches are selected at each stage and why they were chosen, considering the contextual variables. Therefore, the proposed Mutually Exclusive Learning (MEL) is based on a tree-based structure to provide explanations for the uncertainty-aware problem.
This paper is organized as follows. In Section 2, the related works and research motivation are reviewed. Section 3 addresses the learning structure and problem-solving process of MEL. In Section 4, the feature and conceptual variables for MEL are defined. Section 5 illustrates the scenario and verification of the proposed model. Finally, Section 6 presents the conclusions and suggests further research related to MEL.
2. Literature Review
There are many machine learning algorithms such as NN, Reinforcement Learning (RL), Natural Language Processing (NLP) that are applied to develop AI models such as ChatGPT, Bard, LLaMA, AlphaGo, and Adaptive AI. One of the most popular AI models is Generative AI, which is developed using LLM and NLP in text-based conversational systems. It can generate tailored answers according to the user’s prompts.
For the development of more flawless AI, ChatGPT adopted RLHF, which can extract advanced outputs from rough drafts by fine-tuning based on human feedback. In addition, ChatGPT 4.0 supports multi-modality to provide advanced Generative AI. This is known as multi-modal AI, which interacts with users through conversational systems using a variety of converged modalities such as visual data (images, videos), audio (speech and sound), text, and other sensory inputs. As for other Generative AI, Google’s Bard was released in March 2023. Bard is based on LaMDA (Language Model for Dialogue Applications) using an LLM and serves as an AI-based search engine service that supports online data surfing from the web. Meta also released LLaMA (Large Language Model Meta AI) in February 2023, which was developed for non-commercial use to assist expert studies, unlike ChatGPT, which is intended for public use. NAVER has released HyperCLOVA X, which is specialized in Korean compared to other Generative AIs developed by global AI companies. It has a deep understanding of Korean culture and law, as exemplified in the “NAVER knowledge iN” service. It also functions as a real-time search service based on a large store of knowledge. Additionally, Korean companies have developed other models, such as Kakao’s KoGPT, SKT’s A., and LG’s EXAONE. However, the learning process of these AI models is based on black-box AI, so they do not provide any explanations for the goals they reach.
In ChatGPT, as a representative black-box AI, the learning process is composed of three steps: preparing tasks, developing a reward model, and applying RL-based fine-tuning to the LLM. First, preparing tasks involves training the model repeatedly using transformers with NLP and Supervised Learning (SL) to generate answers for any given prompt. Second, the reward model is trained on human preferences. It is derived from the trained language model using a ranked data set built from human feedback, and it evaluates the outputs generated by AI models. Finally, RL-based fine-tuning of the LLM involves defining the action space, the observation space, and the reward function in order to achieve more reliable outputs; generating qualified outcomes for a given prompt is crucial. The observation space includes all possible input token sequences from the environment, while the action space covers all tokens in the model’s vocabulary, over which the RL algorithm operates. The reward function combines the preference model with constraints on the AI agent’s output, and the reward predictor is tuned by human feedback within the reward function. The RLHF process thus involves the RL algorithm, the environment, and the reward predictor [2].
AlphaGo, another type of black-box AI, was developed for the game of Go. To reach the goal, it is necessary to reduce the search space from the enormous number of possible cases, so effective prediction of board positions and moves is crucial [4]. To achieve this, AlphaGo adopted a combination of Deep Neural Networks (DNN) and a tree-based search engine, integrating Monte-Carlo Tree Search (MCTS) with policy and value networks [4]. The DNN is trained using a novel combination of SL from human expert games and RL from games of self-play [4]. The tree-based MCTS engine reduces the search space by selecting actions through lookahead search. Policy networks are developed for selecting moves, and value networks are used to evaluate board positions. SL is used to train the policy networks to predict human expert moves, while RL is used to improve on the SL results through self-play positions in the value networks [4].
Adaptive AI, a recent advancement, focuses on dynamic customization of its decision-making algorithms without human feedback. This AI model employs self-evolving algorithms to interpret and integrate newly acquired data autonomously [10]. Adaptive AI has diverse applications, including business problem-solving, fraud detection in financial transactions, improving logistics operations, and identifying patterns in patient symptoms. Its computational model is continuously self-evolving without human intervention, making it suitable for volatile and dynamic environments.
Despite their capabilities, the learning approaches described above still have limitations, such as generating distorted outcomes due to algorithmic bias, data bias, and system errors. These limitations can perpetuate harmful stereotypes through the continuous supply of incomplete information, leading to issues like hallucinations, fake news, and confirmation bias [3, 11]. There are ongoing discussions on addressing the problem of hallucinations. AI-generated content that is not based on truthful sources highlights the need for caution against over-relying on AI-generated results. In Generative AI, it is challenging to determine the accuracy of outcomes due to the inexplicable nature of the learning process [5]. If the results are related to mutually exclusive decision-making problems such as employment processes, medical decisions, wastewater discharge status, and business problems requiring choices, then it becomes difficult to use the outcomes derived from black-box AI without any explanations [2].
Recent AI research has focused on explaining achieved results and correcting inaccurate and unreliable outcomes. There is an increasing number of studies on Explainable Artificial Intelligence (XAI) programs [5, 11, 12], also known as white-box AI. Conversely, since black-box AI models do not provide any explanations, various issues arise from generated fake news, confirmation bias, and hallucinations [3]. Consequently, the demand for developing XAI has increased [5], particularly in fields related to critical decision-making problems such as medicine, finance, transportation, security, legal, military [5, 13, 14, 15]. Providing explanations for the decision-making process is essential to achieve these goals.
DARPA spent four years on the research progress of the XAI program from 2017 to 2021 [5]. Numerous technical approaches have been developed as part of DARPA's XAI program, focusing on extracting explanations from black-box AI models like deep learning or hybrid deep learning approaches [5]. Ultimately, the explanations would be provided to users who need to understand and interact with the decision-making process [5]. Furthermore, there is ongoing R&D on XAI in various organizations, such as Google, IBM, Watson’s openspace, Kyndi (Cognitive search platform) [16], AITRICS, and more.
As discussed, ChatGPT is applied to find incrementally reinforced solutions with human feedback at a particular moment. AlphaGo is based on tree-based rigid problem-solving to find the optimal solutions. Adaptive AI is suitable for business problems, acting as a dynamic decision-making tool using self-evolving algorithms with data. These AI models are based on black-box AI. Therefore, there are limitations in providing explanations for how they reach their goals, even though DARPA is working on developing XAI.
Specifically, this research focuses on providing explanations for the achieved goals in mutually exclusive decision-making. Tree-based MEL consists of the processes to reach the goal by pruning branches as alternatives and providing explanations with the pruning history while considering various conditions.
3. Mutually Exclusive Learning
3.1 Tree-based rigid learning vs. Graph-based evolving learning
Problems solved by AI learning can be roughly classified into those with a graph-based structure and those with a tree-based structure, as in Figure 1. As black-box AI, reinforcement-driven decision-making has a graph-based evolving structure because it involves solving problems to extract enhanced solutions. The learning of AlphaGo is based on planning that uses a high-performance tree-based search engine, such as MCTS, in conjunction with policy and value networks. In that problem, it is clear that optimal solutions exist under the considered conditions, even though tremendous effort is needed to find the optimal moves in the game of Go.
As shown in Figure 1, the decision-making process is classified into two types: reinforced decision and mutually exclusive decision. The proposed MEL (Mutually Exclusive Learning) is based on tree-based structure, which is applied to prune branches according to contextual conditions at a moment.
(Figure 1) Graph-based Structure vs. Tree-based Structure in AI Learning
3.1.1 Graph-based evolving AI learning
In Generative AI, the learning process is based on the graph structure to drive incrementally reinforced solutions. So, the currently given solutions can be improved through the interaction of training models, reward programs, and so on. The graph-based problem-solving process has an evolving architecture. For instance, adaptive AI supports dynamic refinement of decision-making with fluid data as the real-world environment changes. In problems with a graph-based evolving structure, it is possible to customize the result and maximize the reward in a particular situation.
As in Figure 2, in graph-based evolving AI learning, there are various paths to reach the goal between input as the prompt node and the output as the answer node. In the learning process, it is not important which path is chosen to reach the goal but rather which better goal is chosen. Therefore, it does not focus on explaining how the goal was reached.
(Figure 2) Graph-based Evolving Structure in AI Learning
3.1.2 Tree-based AI learning
Critical decision-making, as in mutually exclusive decision-making, needs to guarantee that an optimized solution is found by pruning alternative branches under the considered contextual conditions. As illustrated in Figure 3, a tree-based structure for reasoning is adopted in this research to address mutually exclusive decision-making. In mutually exclusive decision-making, there is no intersection among the branches that serve as paths to a goal. To achieve the goal, it is necessary to determine the choice of branches in the tree-based structure under consideration of contextual conditions.
(Figure 3) Tree-based Structure in AI Learning
In Figure 3, the bold lines are the chosen paths to reach a goal and the dotted lines are the pruned paths. For instance, if the reached goal is ‘1’, then the chosen nodes are ‘12’ and ‘121’, and the pruned nodes are ‘2’, ‘11’, and ‘122’. The chosen paths can be used to provide explanations together with the contextual conditions used for pruning. Because the problem is mutually exclusive, pruning a branch from the tree-based structure can lead to an unrecoverable decision, so it is important to explain why the nodes are chosen. To do this, it is necessary to consider the conditions for pruning the branches at a given moment. An optimized result can be obtained under consideration of contextual conditions such as the time window as a time interval, the context as the lookahead search depth of phases, and the decision criteria. If these are deployed in an environment to make a mutually exclusive decision, then the decision-making process becomes more intricate. The achieved goals can be mutually exclusive at a particular moment.
3.2 White-box AI vs. Black-box AI
In MEL, the process is based on the sequential reduction of alternatives by pruning branches through selectively chosen contextual conditions. Tree-based MEL can provide the pruning histories as explanations, as in white-box AI. On the other hand, RL is based on improving the reward to achieve the goal, depending on human feedback, rewards, and so on, using a graph-based evolving structure as black-box AI, as in Figure 4. Its limitation is that it cannot provide any explanation of how the goal is reached. As black-box learning, RL is based on probability and provides more reliable and accurate solutions through verification and cross-referencing with humans, as in ChatGPT using an LLM. RL is driven to achieve better goals rather than to provide explanations. So, the learning algorithm focuses on identifying a better goal regardless of how the goal is reached.
(Figure 4) Graph-based Structure in Black-box AI
In AlphaGo, learning is based on SL and RL. SL is adopted to learn human expert knowledge of moves. RL is adopted for value networks to find the optimal position by evaluating board positions. AlphaGo is also developed as a kind of black-box AI, but it assumes that problems have optimal solutions even if the search space is enormous. As a rigid problem, the goal-seeking learning is based on a tree-based structure with lookahead search. The AI model is also conceptually based on the assumption that an optimal goal exists regardless of the path taken to reach it.
On the other hand, mutually exclusive decision-making needs an explanation of why and how the decision is reached, as it involves sacrificing other alternatives. The explanations can be composed of the paths obtained by exploring the breadth and depth of the tree while pruning branches according to the considered conditions, as in white-box AI.
3.3 Conditions and Situations
3.3.1 Contextual conditions
The reasoning of tree-based MEL is constructed from the trace of the chosen branches of the tree. The reasoning process is based on a trade-off with the other choices according to the specified conditions. In MEL, contextual conditions are composed of the time window, which determines the temporal perspective; the phases, which are the lookahead search spaces in a particular context; and the decision criteria used to prune branches. The decision-making process might be tailored by the chosen criteria depending on contextual environments [17]. As contextual conditions for pruning branches, the time window and the phases, as the depth of the process sequence, are considered in order to explore the context deeply and reach an optimal goal. In addition, decision criteria are proposed as a type of constraint for pruning the branches in mutually exclusive decision-making. Depending on which conditions are chosen, different outcomes can arise in the decision-making process.
3.3.2 Multi-dimensional conditions
In MEL, to reach a goal, time window, phase and criteria are illustrated as multi-dimensional contextual conditions for mutually exclusive decision-making.
First, the time window must be considered. It refers to a specific period during which an event can occur or an activity can be performed, that is, the range of time within which a particular goal must be reached or a particular process or action is expected to happen. For instance, as a multi-dimensional condition, a time window can be a range of dates composed of yesterday, today, and tomorrow. Yesterday refers to the time before the action or process happens. Today refers to the current time, when the action or process happens. Tomorrow refers to the time after it, when a particular action or process is expected to happen. The current state is affected by previously occurring actions or processes; for example, it is affected by the state of the phase from yesterday.
Second, the phase is composed of actions or processes based on contextual conditions. Actions are selected in advance from all states that can be considered in the phase, taking into account contextual conditions. The lookahead search levels of a phase are the depths to which predicted processes or actions are explored, depending on contextual conditions. The actions or processes of the current phase can propagate to the actions or processes of the next phase. The goal can differ according to the multi-dimensional contextual conditions, such as the scope of the time window and the lookahead search space of phases. For instance, four different situations are illustrated in Figure 5, each with a distinct scope of time window and distinct levels of phase.
(Figure 5) Time window, Phase and Situation
Finally, the decision criteria are used to prune the branches and so determine the path to a decision. The path constructed with the chosen criteria can be used to explain the output derived from mutually exclusive learning.
3.3.3 Configured situation
In mutually exclusive decision-making, it is possible to reach different goals depending on the particular situation. To reach a goal, it is necessary to consider converged contextual conditions, such as the situation S constructed by the configuration of the multi-dimensional time window, phases, and decision criteria. In MEL, the reached goal might change depending on which multi-dimensional contextual conditions are chosen as the configured conditions to construct the situation S, as in Figure 5.
For instance, as shown in Table 1, the configured situation 1 is composed of time window at a particular moment t and the considered phase p. Situation 2 is composed of time window including the previous time step t-1, current time t and the lookahead time t+1 in the phase p. In situation 3, phase p and the lookahead state of the phase p+1 are considered at a particular time t. Situation 4 is composed of current time t and the lookahead time t+1 under consideration of phase p, the lookahead state of the phase p+1 and state of the phase after the next phase p+2.
(Table 1) The illustrations of Situations with Multi-Dimensional Time window and Phase
In MEL, the situation is configured by the multi-dimensional conditions in order to derive the goal. Unlike ChatGPT, MEL has no best or better output; instead, there is an optimal outcome that depends on the considered situation, including the multi-dimensional conditions, at a particular moment.
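The four situations described above can be written down as data. The following is a minimal sketch, assuming that each situation is encoded as a tuple of time offsets relative to t and a tuple of phase offsets relative to p; the class name `Situation` and its helper methods are hypothetical and are not part of the original model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Situation:
    """A configured situation: offsets are relative to time t and phase p."""
    name: str
    time_offsets: Tuple[int, ...]   # e.g. (-1, 0, 1) means t-1, t, t+1
    phase_offsets: Tuple[int, ...]  # e.g. (0, 1) means p, p+1

    def lookahead_depth(self) -> int:
        # Deepest phase considered beyond the current phase p.
        return max(self.phase_offsets)

    def uses_history(self) -> bool:
        # True when a previous time step (t-1, ...) is part of the window.
        return min(self.time_offsets) < 0


# The four situations as described in the text for Table 1.
SITUATIONS = [
    Situation("situation 1", (0,), (0,)),
    Situation("situation 2", (-1, 0, 1), (0,)),
    Situation("situation 3", (0,), (0, 1)),
    Situation("situation 4", (0, 1), (0, 1, 2)),
]

for s in SITUATIONS:
    print(s.name, s.time_offsets, s.phase_offsets,
          s.lookahead_depth(), s.uses_history())
```

Encoding the conditions as offsets keeps the time window and the phase depth independent, so a new situation is only another combination of the two tuples.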
3.4 Explanations
To solve the given problems, AI processes can be automatically activated for decision-making. In AI learning, if accurate and reliable outputs are guaranteed, then it is not necessary to explain how the goal is reached. As discussed, however, black-box AI still has limitations, such as adverse reactions stemming from its outputs, including hallucinations, confirmation bias, and the spread of fake news, which highlight the need for explainable AI. To mitigate these limitations, it is necessary to address the potential issues in explaining the decision-making process in AI learning.
To provide explanations, MEL is proposed with the assistance of multi-dimensional contextual conditions and chosen decision criteria, which prune the branches of the tree to obtain the goal. The sequential process derived by pruning can explain how the goal is extracted in decision-making. The explanation is based on the sequence of activated actions or processes according to the chosen situation. If pruning branches is taken as the action in the configured situation with multi-dimensional contextual conditions, then the agent looks ahead to the predicted next phase to find the optimal policy. The history of the chosen processes can be used as an explanation of how the goal is reached.
In MEL, multiple actions cannot occur simultaneously in mutually exclusive environments. If an action is chosen to select a branch, then the other branches are excluded and pruned by the criteria. The sequence of decisions and the considered multi-dimensional conditions are used to explain how the goal is reached.
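The idea that the pruning history itself serves as the explanation can be sketched as a small log of decision records. This is an illustrative sketch only: the record fields, the function `explain`, the criterion names, and the time-window labels are assumptions, and the node labels merely reuse those of Figure 3.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PruningStep:
    """One mutually exclusive choice: one branch kept, the rest pruned."""
    phase: str
    chosen: str
    pruned: Tuple[str, ...]
    criterion: str                 # hypothetical criterion label
    time_window: Tuple[str, ...]   # hypothetical time-window labels


def explain(history: List[PruningStep]) -> str:
    """Render the pruning history as a step-by-step textual explanation."""
    lines = []
    for i, step in enumerate(history, start=1):
        lines.append(
            f"Step {i}: at phase {step.phase}, chose {step.chosen} "
            f"and pruned {', '.join(step.pruned)} "
            f"(criterion: {step.criterion}; "
            f"time window: {', '.join(step.time_window)})"
        )
    return "\n".join(lines)


# Assumed ordering of the choices for the node labels of Figure 3.
history = [
    PruningStep("root", "1", ("2",), "c1", ("t",)),
    PruningStep("1", "12", ("11",), "c12", ("t",)),
    PruningStep("12", "121", ("122",), "c121", ("t", "t+1")),
]

print(explain(history))
```

Because each record stores the pruned alternatives next to the chosen one, the explanation shows both what was decided and what was given up at every step.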
4. MEL-based Decision Making
In MEL with a tree-based structure, the inference process for the explanation is composed of the sequence of the chosen branches. This sequence is one of the episodes that can be extracted from the decision paths constructed for the configured situations, which are composed of multi-dimensional contextual conditions.
Definition 1 (Multi-dimensional ContextualCondition) Multi-dimensional ContextualCondition MDCC is composed of ‘Phase,’ ‘TimeWindow,’ and ‘DecisionCriteria.’
MDCC=< P,TW,DC> (1)
MDCC is a finite set of multi-dimensional contextual conditions as in Equation (1).
Definition 2 (Phase) Phase P is composed of actions. These are propagated by the results derived from the previous phase or by the multi-dimensional contextual conditions, as in Equation (2).
P = {pi | pi is the ith level of phase, i is an integer} (2)
pi = <A>, where A is a multi-dimensional set of actions of the ith phase.
Definition 3 (TimeWindow) TimeWindow TW is specified by the time frame within which an event or action can occur, such as a range of time as in Equation (3).
TW = {twj | twj is the jth Time Window, j is an integer} (3)
twj =<T>, where T is a set of time with multi-dimension.
Definition 4 (DecisionCriteria) DecisionCriteria DC is used to prune branches for mutually exclusive decision-making. It is represented as in Equation (4).
DC={dck | dck is the kth Decision Criteria, k is an integer} (4)
dck = <C>, where C is a multi-dimensional set of criteria and dck is the kth decision criteria, which is used in the exclusive decision-making process and comprises the set of criteria used to reach a goal.
Definition 5 (Action) Action A is a finite set of actions or processes propagated depending on contextual conditions to reach a goal. It is simply represented as in Equation (5).
A = {al | al is the lth action, l is an integer} (5)
A is a set of actions that occur at a phase to reach the goal.
Definition 6 (Time) Time T is a finite set of time boundaries. It is represented as in Equation (6).
T = {t±m | t±m is a time condition, m ≥ 0, m is an integer} (6)
where ±m is the offset of the time condition (previous, current, or lookahead) used to construct the time frame.
If m is 0, then t±m refers to the current time. For m = 1, t+1 refers to the lookahead time and t-1 refers to the time previous to the current time.
Definition 7 (Criteria) Criteria C is a finite set of the chosen criteria for MEL. It is represented as in Equation (7).
C = {co | co is the oth criterion, o ≥ 0, o is an integer} (7)
The criteria are used to prune the branches in MEL.
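Definitions 1 to 7 can be mirrored by a small data model. The sketch below is a simplification under stated assumptions: it models a single phase, time window, and decision criteria per MDCC instance rather than the full sets P, TW, and DC, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Phase:
    """Definition 2: a phase p_i = <A> with its set of actions."""
    level: int
    actions: FrozenSet[str]


@dataclass(frozen=True)
class TimeWindow:
    """Definition 3: a time window tw_j = <T>, stored as time offsets."""
    offsets: FrozenSet[int]


@dataclass(frozen=True)
class DecisionCriteria:
    """Definition 4: decision criteria dc_k = <C>."""
    criteria: FrozenSet[str]


@dataclass(frozen=True)
class MDCC:
    """Definition 1: MDCC = <P, TW, DC> (one element of each, for brevity)."""
    phase: Phase
    time_window: TimeWindow
    decision_criteria: DecisionCriteria


def describe_offset(m: int) -> str:
    """Definition 6: 0 is current, positive is lookahead, negative is previous."""
    if m == 0:
        return "current"
    return "lookahead" if m > 0 else "previous"


# Hypothetical instance: two actions, window {t, t+1}, one criterion.
mdcc = MDCC(
    phase=Phase(level=1, actions=frozenset({"a1", "a2"})),
    time_window=TimeWindow(offsets=frozenset({0, 1})),
    decision_criteria=DecisionCriteria(criteria=frozenset({"c1"})),
)

print(sorted(describe_offset(m) for m in mdcc.time_window.offsets))
```

Frozen dataclasses are used so that a configured condition cannot be altered after a decision has been recorded against it.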
Definition 8 (Situation) Situation S is composed of phases and time. It is represented as in Equation (8).
S = {sy | sy is the yth situation, y is an integer} (8)
sy = <P, T>.
The phases are used to construct the paths to reach a goal in the decision-making process. They form a kind of map of decision processes from the ith phase to the jth phase, including actions that depend on the kth decision criteria, as in Figure 6.
(Figure 6) Phase and DecisionCriteria
Definition 9 (Value function) Value function v* (sy) has the maximum value over all considered situations at the moment as in Equation (9).
\(v_{*}\left(s_{y}\right)=\max _{\pi} v_{\pi}\left(s_{y}\right), \; s_{y} \text{ is the } y^{\text{th}} \text{ situation}\) (9)
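Equation (9) can be illustrated numerically. The sketch below assumes a small lookup table of values v_π(s_y) for two hypothetical policies and two hypothetical situations; the numbers and names are invented for illustration and carry no empirical meaning.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical value table: VALUES[policy][situation] = v_pi(s_y).
VALUES: Dict[str, Dict[str, float]] = {
    "pi_a": {"s1": 0.50, "s2": 0.70},
    "pi_b": {"s1": 0.60, "s2": 0.40},
}


def v_star(situation: str) -> float:
    """v_*(s_y): the maximum of v_pi(s_y) over all policies pi."""
    return max(table[situation] for table in VALUES.values())


def best_situation(situations: Iterable[str]) -> Tuple[str, float]:
    """Pick the considered situation whose optimal value is largest."""
    best = max(situations, key=v_star)
    return best, v_star(best)


print(v_star("s1"), v_star("s2"))
print(best_situation(["s1", "s2"]))
```

The first function follows Equation (9) directly, while the second reflects the statement that the maximum is taken over all considered situations at the moment.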
Definition 10 (Episode) Episode E is composed of a sequence of phases that include actions according to the multi-dimensional contextual conditions. It is a set of histories of the phases chosen for making a mutually exclusive decision, as in Equation (10).
E = {eq | eq is the qth episode, q is an integer} (10)
eq = <P>.
The episode is composed of the sequence of the chosen phases.
Definition 11 (Probability) Probability Prob is a phase transition probability matrix. It is represented as in Equation (11).
\(\begin{align}\operatorname{Prob}\left(p \mid p_{c_{j}}\right) &=\operatorname{prob}_{p p_{j}}^{c_{j}} \\ \operatorname{prob}_{p p_{j}}^{c_{j}} &=\operatorname{prob}\left[p_{i+1}=p_{j} \mid c_{i j}=c_{j}, \; p_{i}=p\right]\end{align}\) (11)
Here p is the ith phase and cij is the criteria from the ith phase to the jth phase, so the next phase p(i+1) is the jth phase pj. The phase pj is determined by the branch cbcjp chosen by pruning. The probability describes the phase transition, as in Figure 7.
(Figure 7) Probability of Phase Transition
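A phase transition probability of the form in Equation (11) can be stored as a table keyed by the current phase and the criterion. The following sketch assumes invented probabilities and reuses the phase names of Figure 8; the table `TRANSITIONS` and both helper functions are hypothetical.

```python
import math
from typing import Dict, Tuple

# TRANSITIONS[(p, c_j)] maps each candidate next phase p_j to
# prob[p_(i+1) = p_j | c_ij = c_j, p_i = p]. All numbers are invented.
TRANSITIONS: Dict[Tuple[str, str], Dict[str, float]] = {
    ("p", "c1"): {"pc1": 0.8, "pc2": 0.2},
    ("pc1", "c12"): {"pc12": 0.7, "pc11": 0.3},
    ("pc12", "c121"): {"pc121": 0.9, "pc122": 0.1},
}


def transition_prob(p: str, c: str, p_next: str) -> float:
    """Look up the transition probability; unknown transitions get 0."""
    return TRANSITIONS.get((p, c), {}).get(p_next, 0.0)


def rows_are_normalized() -> bool:
    """Each conditional distribution over next phases should sum to 1."""
    return all(
        math.isclose(sum(row.values()), 1.0) for row in TRANSITIONS.values()
    )


print(transition_prob("p", "c1", "pc1"))
print(rows_are_normalized())
```

Keeping one row per (phase, criterion) pair makes it easy to check that every conditional distribution is a valid probability distribution before it is used.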
Definition 12 (ChosenBranch) ChosenBranch CB is a finite set of the chosen branches by pruning as in Figure 8 and Equation (12).
(Figure 8) Chosen Branch by Pruning
CB = {cbcjp | p is the ith phase and cj is the jth criteria} (12)
In Figure 8, for instance, phase p is pruned by the criteria c1 and, as a result, phase pc1 is selected. pc1 is pruned by criteria c12 and phase pc12 is selected. pc12 is pruned by criteria c121 and pc121 is selected. So, the chosen branch CB is composed of cbc1p, cbc12pc1, and cbc121pc12, that is, CB = {cbc1p, cbc12pc1, cbc121pc12}.
Definition 13 (Pruning function) Pruning function uPrn, as a kind of the step function, is defined as in Equation (13).
\(\begin{align}\operatorname{uPrn}\left(p_{i}\right) &=\left\{\begin{array}{ll} 0, & p_{i} \text{ is pruned} \\ 1, & p_{i} \text{ is selected,} \end{array}\right. \quad p_{i} \text{ is the } i^{\text{th}} \text{ phase}, \; i \text{ is an integer} \\ \operatorname{uPrn}_{*}\left(s_{y}\right) &=v_{*}\left(s_{y}\right)\end{align}\) (13)
As in Figure 8, the chosen branch CB is composed of p, pc1, pc12, and pc121. By the pruning function, the selected branches are as follows: uPrn(p) = 1, uPrn(pc1) = 1, uPrn(pc12) = 1, uPrn(pc121) = 1. On the other hand, the pruned branches are as follows: uPrn(pc2) = 0, uPrn(pc11) = 0, uPrn(pc122) = 0.
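The step function of Equation (13) and the Figure 8 example can be reproduced in a few lines. This is a minimal sketch: the function name `u_prn` and the flat list of phase names are assumptions made for illustration, and the tree shape itself is not modeled.

```python
from typing import FrozenSet, List

# Phases named in the Figure 8 example.
ALL_PHASES: List[str] = ["p", "pc1", "pc2", "pc11", "pc12", "pc121", "pc122"]
CHOSEN_BRANCH: FrozenSet[str] = frozenset({"p", "pc1", "pc12", "pc121"})


def u_prn(phase: str, chosen: FrozenSet[str] = CHOSEN_BRANCH) -> int:
    """Step function: 1 if the phase is selected, 0 if it is pruned."""
    return 1 if phase in chosen else 0


selected = [ph for ph in ALL_PHASES if u_prn(ph) == 1]
pruned = [ph for ph in ALL_PHASES if u_prn(ph) == 0]

print("selected:", selected)
print("pruned:", pruned)
```

The two lists partition the phases, which reflects that a phase is either selected or pruned and never both.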
Definition 14 (Utility function) Utility function U as a predicted utility to reach a goal is composed of phases as alternatives of phase pi as in Equation (14).
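\(U=\sum_{c_{j}} \operatorname{Prob}\left(p \mid p_{c_{j}}\right)_{p_{i}} \, p_{c_{j} p_{i}}\) (14)
where \(\sum_{c_{j}} \operatorname{Prob}\left(p \mid p_{c_{j}}\right)_{p_{i}}=1\), pi is the ith level of phase, and cj is the jth criteria from the ith level of phase to the jth level of phase.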
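One way to read Equation (14) is as a probability-weighted sum over the alternatives reachable from phase pi. The sketch below rests on a strong assumption that is not stated in the text: each alternative phase is given a numeric score so that the sum can be evaluated. The function name, the scores, and the probabilities are all hypothetical.

```python
import math
from typing import Dict


def expected_utility(probs: Dict[str, float], scores: Dict[str, float]) -> float:
    """Sum over criteria c_j of Prob(c_j) * score(alternative reached via c_j).

    probs:  probability attached to each criterion; must sum to 1.
    scores: ASSUMED numeric score of the phase reached via each criterion.
    """
    if not math.isclose(sum(probs.values()), 1.0):
        raise ValueError("probabilities over the criteria must sum to 1")
    return sum(probs[c] * scores[c] for c in probs)


# Invented example with two criteria leaving the same phase.
probs = {"c1": 0.75, "c2": 0.25}
scores = {"c1": 4.0, "c2": 8.0}

print(expected_utility(probs, scores))
```

The normalization check mirrors the constraint stated with Equation (14) that the probabilities over the criteria sum to 1.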
5. Illustration and Verification
5.1 Illustration
Many problems are mutually exclusive and require exclusive decision-making. One of them is the employment process, as in Figure 9. In a hiring process, we can assume that there are two candidates who have different strengths. To decide to hire only one candidate, it is necessary to consider the configured situation with multi-dimensional contextual conditions, including the time window, lookahead phases, and criteria for pruning branches.
(Figure 9) Employment Process as the Mutually Exclusive Problem
To do this, it is necessary to consider the hiring environment of the company. One of the candidates is strong in general computer tasks. The other is specialized in AI tasks. Only one employee can be hired. In the current environment, the vacancy is open for the position of a computer specialist to process general computing tasks. However, in the near future, the company expects to hire an employee who can handle AI tasks. It is also necessary to consider contexts such as the current conditions of manpower supply and demand in the company. The considered contexts can be related to the company’s need for either a person who handles general computer tasks or an AI specialist.
For the processing of MEL, Figure 9 shows a decision tree with two branches that are completely mutually exclusive because there is no intersection between them.
For mutually exclusive decision-making, it is important to determine the time window, as a multi-dimensional contextual condition, in terms of ‘current(t),’ ‘previous(t-1),’ and ‘lookahead(t+1).’ In addition, it is necessary to consider phases with actions. Phases are composed of the lookahead search depths of phases. As in Figure 9, phase p, with the levels of the tree, is composed of the first and second phases, pfirst and psecond. As search levels or depths of the tree, the first phase pfirst is composed of ‘CS’ and ‘AI’, that is, pfirst = {pCS, pAI}, as in Equation (2).
For instance, if it is possible to decide who will be hired at the moment, then the inference stops at the first phase. However, if it is difficult to make a decision, then it is necessary to consider more contextual conditions, such as the lookahead of the next phase. So, it is important to determine which search level of phase will be chosen in MEL. The actions of the second phase, as the lookahead phase, are expected to propagate from the first phase. The second phase psecond is composed of ‘GW,’ ‘SP,’ and ‘pc11’, that is, psecond = {ppc11, pGW, pSP}. ‘GW’ refers to the ‘general workers’ and ‘SP’ refers to the ‘AI specialist.’
If the company has a demand plan to hire an AI specialist in the near future, then time T is composed of ‘current(t)’ and ‘lookahead(t+1)’ as multi-dimensional conditions at the first phase pfirst. ‘current(t)’ refers to the current environment of the company and ‘lookahead(t+1)’ refers to the future environment. As in Equations (3) and (6), T is comprised of twfirst for pfirst and twsecond for psecond. Here, twfirst = {t0, t1} is the time window of phase pfirst, ranging from the current time t0 to the lookahead time t1.
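The time dimension described above can be illustrated with a minimal Python sketch. It assumes that time points are represented as integer offsets from the current time t (negative for previous, zero for current, positive for lookahead); the function names `classify_time` and `time_window` are hypothetical and do not come from the paper.

```python
def classify_time(offset: int) -> str:
    """Label an integer offset relative to the current time t (assumed encoding)."""
    if offset < 0:
        return "previous"    # t-1, t-2, ...
    if offset == 0:
        return "current"     # t
    return "lookahead"       # t+1, t+2, ...


def time_window(offsets):
    """Build a time window as a mapping from offset to its label."""
    return {o: classify_time(o) for o in sorted(offsets)}


# tw_first = {t0, t1}: current and lookahead time for the first phase.
tw_first = time_window([0, 1])
# tw_second = {t0}: only the current time for the second phase.
tw_second = time_window([0])
```

Under this encoding, `tw_first` contains both a current and a lookahead entry, while `tw_second` contains only the current one, matching the windows used in the hiring example.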
It is possible to consider ‘current’ and ‘future’ as the decision criteria, that is, C = {ccurrent, cfuture}, as in Equation (7). If the company focuses on hiring workers for the future, then the criterion for the decision can be ‘future,’ in line with the demand plan. So, the two phases considered are pfirst = {pCS, pAI} and psecond = {ppc11, pGW, pSP}.
In Figure 9, pAI is selected in pfirst and pSP is selected in psecond, corresponding to the AI specialist as the strength for the job position. The first phase takes into account the time window twfirst = {t0, t1}, that is, the ‘current’ and ‘lookahead’ times. For phase psecond, the time window is twsecond = {t0} with the criterion ‘current.’ This means that there is no need to look ahead to a further phase at the moment. The derived result can differ when considering only time t0 versus when considering both times t0 and t1.
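As a rough illustration, the phases, time windows, and criteria of the hiring example can be held in a small data structure. The following Python sketch is an assumption about one possible encoding, not the author's implementation; the names `PHASES`, `CRITERIA`, `Step`, and `looks_ahead` are hypothetical.

```python
from dataclasses import dataclass

# Phases available at each level of the tree in the hiring example.
PHASES = {
    "first": {"CS", "AI"},
    "second": {"pc11", "GW", "SP"},
}
CRITERIA = {"current", "future"}


@dataclass(frozen=True)
class Step:
    """One selected phase together with its contextual conditions."""
    level: str           # "first" or "second"
    phase: str           # phase selected at this level
    time_window: tuple   # e.g. ("t0", "t1")
    criterion: str       # "current" or "future"

    def __post_init__(self):
        # Reject phases that do not belong to the stated level.
        if self.phase not in PHASES[self.level]:
            raise ValueError(f"{self.phase} is not a phase of level {self.level}")
        if self.criterion not in CRITERIA:
            raise ValueError(f"unknown criterion {self.criterion}")

    @property
    def looks_ahead(self) -> bool:
        # The window includes lookahead time when it contains t1.
        return "t1" in self.time_window


# Situation with p_AI under {t0, t1} and p_SP under {t0}, as in the text.
s_y = (
    Step("first", "AI", ("t0", "t1"), "future"),
    Step("second", "SP", ("t0",), "current"),
)
# The sequence of selected phases.
episode = tuple(step.phase for step in s_y)
```

Keeping the criterion and time window next to each selected phase means that the conditions behind every choice remain available when the decision is later explained.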
As in Equation (8), a situation is composed of the phases and time windows under consideration at a particular moment. A set of situations S = {sfirst, ssecond, sy, …} can be derived, for example sy = {{pAI, {t0, t1}}, {pSP, {t0}}}, and more. Among them, the situation with value v*(sy) is selected by the maximum value max_π v_π(sy), as in Equation (9). The selected v*(sy) is sfirst with criteria cfuture at pfirst and ccurrent at psecond.
As in Equation (10), E is a set of episodes, each comprised of the phases extracted from a situation. Each episode is a sequence of the phases selected in that situation. The selected episode ey is composed of the phases of sy, namely pAI and pSP.
Each phase has a transition probability, as in Equation (11). For instance, as in Prob(pAI | pSP, cfuture), prob^{cfuture}_{pAI, pSP} is the phase transition probability from pfirst to psecond, that is, from pAI to pSP under criterion cfuture. So, pSP of psecond is determined as the next phase by the choosing-branch function cb^{cfuture}_{pfirst}, as in Equation (12). The phase pCS is pruned under the time windows twfirst = {t0, t1} and twsecond = {t0}; the lookahead of phase pfirst is psecond with criterion cfuture.
As in Equation (13), by the pruning function uPrn, the selected phases are those with uPrn(p) = 1: uPrn(pAI) = 1 and uPrn(pSP) = 1. The pruned phases are as follows: uPrn(pCS) = 0, uPrn(pGW) = 0, uPrn(ppc11) = 0. The selected situation is sy, that is, uPrn*(sy) = 1.
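The indicator behaviour of the pruning function can be sketched as follows. This is a minimal illustration that assumes uPrn simply returns 1 for phases on the selected episode and 0 otherwise; the exact form of Equation (13) may differ, and the names `ALL_PHASES`, `SELECTED_EPISODE`, and `u_prn` are hypothetical.

```python
# All phases in the two-level hiring tree.
ALL_PHASES = ["CS", "AI", "pc11", "GW", "SP"]
# Phases kept by the selected situation (p_AI, then p_SP).
SELECTED_EPISODE = ("AI", "SP")


def u_prn(phase, episode=SELECTED_EPISODE):
    """Return 1 if the phase is selected, 0 if it is pruned (assumed form)."""
    return 1 if phase in episode else 0


kept = [p for p in ALL_PHASES if u_prn(p) == 1]
pruned = [p for p in ALL_PHASES if u_prn(p) == 0]
```

Here `kept` holds the phases AI and SP, and `pruned` holds CS, pc11, and GW, which mirrors the values listed in the text.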
The utility function U of the selected situation sy is composed of the phases selected at each level of the inference tree, as in Equation (14). For instance, the first level pfirst is composed of phases pCS and pAI, that is, pfirst = {pCS, pAI}. The utility function can be derived from the following probabilities, where pCS, pAI ∈ pfirst: Prob(p | pCS, ccurrent), Prob(p | pCS, cfuture), Prob(p | pAI, ccurrent), and Prob(p | pAI, cfuture).
The second level psecond is composed of phases pGW, pSP, and ppc11, that is, psecond = {pGW, pSP, ppc11}, in the employment process. The utilities Prob(p | pGW, ccurrent), Prob(p | pSP, ccurrent), and Prob(p | ppc11, ccurrent), where pGW, pSP, ppc11 ∈ psecond, are compared. If Prob(p | pSP, ccurrent) has the maximum value v*(sy), then pSP is selected as one of the phases. At each level pi, the probabilities of all phases sum to 1, that is, ∑_{cj} Prob(p | pcj) = 1. The utility function supporting the proposed mutually exclusive decision-making can be written as U = Prob(pi) + (1 - Prob(pi)), as in Equation (14).
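The comparison of phase probabilities at each level can be sketched in Python. The numeric probabilities below are purely illustrative assumptions, since the paper reports no values, and the names `level_probs` and `select_phase` are hypothetical. The sketch only checks the sum-to-one constraint and picks the phase with the maximum probability at each level.

```python
import math

# Illustrative probabilities only (assumed); keys are (level, criterion).
level_probs = {
    ("first", "future"): {"CS": 0.3, "AI": 0.7},
    ("second", "current"): {"pc11": 0.1, "GW": 0.3, "SP": 0.6},
}


def select_phase(probs):
    """Return the phase with the highest probability at one level."""
    # The probabilities of all phases at a level must sum to 1.
    if not math.isclose(sum(probs.values()), 1.0):
        raise ValueError("phase probabilities at one level must sum to 1")
    return max(probs, key=probs.get)


# One selected phase per level, in level order.
selected = [select_phase(p) for p in level_probs.values()]
```

With these assumed values, the selection yields AI at the first level and SP at the second, which is the outcome described in the example.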
Finally, the AI specialist is employed in the employment process. MEL provides an explanation of how the goal is reached through the chosen branch CB, composed of cb^{cfuture}_{pAI} and cb^{ccurrent}_{pSP}, as in Equation (12). The sequence of phases explains why the AI specialist is employed for the demand plan of the company, with criterion ‘future’ at pfirst and ‘current’ at psecond.
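One way to turn the chosen branches into a readable explanation is sketched below. The record layout of `chosen_branches` and the function `explain` are hypothetical; the sketch only shows that the sequence of selections and prunings can be rendered as text, under the assumption that each level records its criterion, selected phase, and pruned phases.

```python
# Chosen branches of the hiring example, one record per level (assumed layout).
chosen_branches = [
    {"level": "first", "criterion": "future", "selected": "AI", "pruned": ["CS"]},
    {"level": "second", "criterion": "current", "selected": "SP", "pruned": ["pc11", "GW"]},
]


def explain(branches):
    """Render each pruning step as one sentence of the explanation."""
    lines = []
    for b in branches:
        pruned = ", ".join(b["pruned"])
        lines.append(
            f"At the {b['level']} phase, criterion '{b['criterion']}' "
            f"selects {b['selected']} and prunes {pruned}."
        )
    return lines


explanation = explain(chosen_branches)
```

Each sentence corresponds to one level of the tree, so the explanation has exactly as many steps as the lookahead depth used for the decision.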
5.2 Verification
In tree-based learning with MEL, the goal can be reached by pruning branches under specific conditions, such as the proposed multi-dimensional contextual conditions at a particular moment.
In MEL, a tree-based structure is applied to reduce the search space from the enormous number of alternative cases. MEL can reduce the search space compared with graph-evolving learning, which considers all possible states in the entire environment to reach a better goal. A tree-based search engine reduces the search space through actions selected by lookahead search, as in AlphaGo. Graph-based evolving AI learning operates on a search space of an n×m matrix, whereas the search space of tree-based learning is smaller than the n×m matrix because it is based on the pruning of branches for mutually exclusive decision-making.
To derive a goal, it is important to determine which phases should be looked ahead in the tree. The search space increases with the number of lookahead search levels. If the depth of the lookahead search is increased, the search space to achieve the goal expands exponentially.
To reduce the search space, it is necessary to prune branches by criteria. The criteria chosen for any search process are closely related to the size of the search space. The effective determination of the criteria affects the search effort, the performance efficiency, and the optimization of the reached goal. The selection of criteria can be learned through MEL from a variety of cases, such as cumulative results and rewards with contextual conditions.
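The effect of lookahead depth and pruning on the search space can be made concrete with a small calculation. The sketch below assumes a uniform branching factor b and a lookahead depth d, and it assumes that pruning keeps exactly one branch per level; these simplifications and the function names are illustrative and are not taken from the paper.

```python
def full_tree_nodes(b: int, d: int) -> int:
    """Nodes examined below the root when no branch is pruned: b + b^2 + ... + b^d."""
    return sum(b ** k for k in range(1, d + 1))


def pruned_tree_nodes(b: int, d: int) -> int:
    """Nodes examined when one branch is kept per level: b candidates at each of d levels."""
    return b * d


# Compare growth for a branching factor of 3 at increasing lookahead depths.
growth = [(d, full_tree_nodes(3, d), pruned_tree_nodes(3, d)) for d in range(1, 5)]
```

Under these assumptions the unpruned count grows exponentially with depth, while the pruned count grows only linearly, which is the reduction that criteria-based pruning aims for.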
In MEL, the tree-based structure is also applied to provide an explanation of how to reach a goal. The sequential pruning processes under multi-dimensional contextual conditions provide the explanation of how a goal is reached, as in white-box AI. On the other hand, graph-evolving learning with RL cannot provide explanations for the reached goal, as in black-box AI.
The effectiveness and efficiency of MEL depend on the finite search spaces determined by multi-dimensional conditions, which reduce the otherwise enormous search space. In addition, the tree-based structure of MEL provides explanations of how goals are reached, composed of the sequence of branches pruned by criteria. Therefore, it is important to consider how to determine the lookahead search level in order to find feasible or optimal explainable solutions effectively under multi-dimensional contextual conditions.
6. Conclusion
The proposed MEL focuses on solving the mutually exclusive problem in decision-making. In MEL, according to the given multi-dimensional contextual conditions, the optimized goal at a particular moment will be reached with explanations.
As briefly discussed, AlphaGo exemplifies a tree-rigid model that searches for optimal solutions. ChatGPT, as a graph-evolving model, uses heuristic algorithms that can be refined through human feedback and reward. RL, based on the reward hypothesis, is grounded on fine-tuning, as in ChatGPT. RL is a training process of trial and error that interacts with environments, including contexts, and receives reward signals. Adaptive AI emphasizes autonomous evolution. These differences highlight the varying approaches to AI.
These models focus on finding goals as optimal or enhanced solutions with human-guided improvements. In ChatGPT, the goal is continually reinforced by data and human reward because it is based on a graph-evolving structure, a kind of black-box AI. AlphaGo, on the other hand, assumes that an optimal solution exists, even if a significant amount of effort is needed to achieve the goal, which is based on probabilistic facts.
This research focuses on mutually exclusive problems, which differ from those addressed by the previously discussed approaches. In everyday life, the problems we face often require making mutually exclusive decisions. MEL, which has a tree-based structure, is proposed to solve the mutually exclusive problem in decision-making. The goal can be reached by a trade-off among the mutually exclusive alternatives of the tree according to the specific contextual conditions. So, in MEL, decision-making is based on pruning branches under multi-dimensional contextual conditions to achieve the goal in the tree-based structure. As in white-box AI, the pruning process can provide an explanation of how the goal is reached.
In MEL, the decision-making process needs to prune mutually exclusive alternatives to reach a goal. To do this, multi-dimensional contextual conditions are considered: the time window as the temporal perspective, the phases, including actions or processes, with their significant lookahead levels, and the decision criteria used to choose or prune alternatives.
Among the multi-dimensional contextual conditions, the time window is comprised of previous, current, and lookahead times. For instance, t is the current time, that is, the present moment. t-i (i ≥ 0) refers to previous events, actions, or processes that have already occurred. t+i refers to lookahead events, actions, or processes that are yet to happen. The phase is composed differently depending on how many dimensions are included in the lookahead phase. Criteria are selected whenever it is necessary to prune the branches of the tree. The configuration of multi-dimensional contextual conditions is proposed as the configured situation. It shows the specific, finite environment in which a goal is reached in mutually exclusive decision-making at a particular moment. Depending on the chosen situation, different goals might be reached through mutually exclusive decision-making.
MEL adopts the tree-based learning model to provide an explanation for the derived goal under specific conditions. The goal depends on the policy of pruning branches, which can change dynamically with the specific multi-dimensional contextual conditions at any particular moment. Through the pruning process, the explanation of how to reach a goal is represented by the chosen episode of selected branches. As one of the mutually exclusive problems, the illustrated employment process demonstrates how to reach the goal in mutually exclusive decision-making and provides an explanation by pruning branches using MEL.
Furthermore, it is necessary to implement and experiment with the proposed MEL for mutually exclusive decision-making and to apply it to real-world problems. Through experiments and interaction with environments, it is necessary to train the learning processes with reward signals, such as multi-dimensional conditions and criteria, to prune the branches. The learning process involves how to weight the criteria for pruning branches, which levels are considered as contextual conditions, how to optimally configure the situation, and other factors according to the specified environment at a particular moment. The tree-based approach makes it possible to provide explanations of how the goals are reached. Verification through experimentation is also needed to assess the effectiveness of the proposed MEL.
References
- Y. Wu, L. Wang, C. Kai, and M. Peng, "Dynamics-Based Location Prediction and Neural Network Fine-Tuning for Task Offloading in Vehicular Networks," KSII Transactions on Internet and Information Systems, Vol. 17, No. 12, pp. 3416-3435, 2023. https://dx.doi.org/10.3837/tiis.2023.12.011
- OpenAI, "GPT-4 Technical Report," Cornell University, 2023. https://doi.org/10.48550/arXiv.2303.08774
- L. Zhou, Y. Duan, and W. Wei, "Research on the Financial Data Fraud Detection of Chinese Listed Enterprises by Integrating Audit Opinions," KSII Transactions on Internet and Information Systems, Vol. 17, No. 12, pp. 3218-3241, 2023. https://dx.doi.org/10.3837/tiis.2023.12.001
- D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, "Mastering the Game of Go with Deep Neural Networks and Tree Search," Nature, Vol. 529, pp. 484-489, 2016. https://doi.org/10.1038/nature16961
- D. Gunning, E. Vorm, J. Y. Wang, and M. Turek, "DARPA's explainable AI (XAI) program: A retrospective," Applied AI Letters, Vol. 2, No. 4, 2021. https://www.researchgate.net/publication/356781652_DARPA_%27s_explainable_AI_XAI_program_A_retrospective
- J. Wiles, and L. Perri, "Why Adaptive AI Should Matter to Your Business," Gartner, 2022. https://www.gartner.com/en/articles/why-adaptive-ai-should-matter-to-your-business
- S. Ali, T. Abuhmed, S. El-Sappagh, K. Muhammad, J. M. Alonso-Moral, R. Confalonieri, R. Guidotti, J. D. Ser, N. Diaz-Rodriguez, and F. Herrera, "Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence," Information Fusion, Vol. 99, pp. 1-52, 2023. https://doi.org/10.1016/j.inffus.2023.101805
- H. A. Simon, "Administrative Behavior: A Study of Decision-making Processes in Administrative Organizations," The Oxford Handbook of Classics in Public Policy and Administration, pp. 12-21, 2016. https://doi.org/10.1093/oxfordhb/9780199646135.013.22
- J. Simonsen, and H. A. Simon, "Administrative Behavior How organizations can be understood in terms of decision processes," Computer Science, Roskilde University, 1994. https://www.semanticscholar.org/paper/Herbert-A.-Simon%3A-Administrative-Behavior-how-Can-Simonsen-Simon/a78c493ee0e8c9dfe7bd0fbaa2ef0ca2c8aa4562
- Gartner, "Information Technology Glossary. Adaptive AI," Gartner Glossary, 2024. https://www.gartner.com/en/information-technology/glossary/adaptive-ai
- A. Bertrand, R. Belloum, J. R. Eagan, and W. Maxwell, "How Cognitive Biases Affect XAI-assisted Decision-making: A Systematic Review," In Proc. of the 2022 AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022. https://doi.org/10.1145/3514094.3534164
- V. Belle, and I. Papantonis, "Principles and Practice of Explainable Machine Learning," Frontiers in Big Data, Vol. 4, 2021. https://doi.org/10.3389/fdata.2021.688969
- H. J. Lee, "Mutually Exclusive Decision-Making Learning of AI based on Contexts," KSII The 18th Asia Pacific International Conference on Information Science and Technology (APIC-IST), 2023. https://apicist.org/media?key=site/apicist2023/Proceedings_of_APIC-IST_2023.pdf
- G. Campitelli, and F. Gobet, "Herbert Simon's Decision-Making Approach: Investigation of Cognitive Processes in Experts," Brunel University, 2010. https://journals.sagepub.com/doi/10.1037/a0021256
- V. Harish, F. Morgado, A. D. Stern, and S. Das, "Artificial Intelligence and Clinical Decision Making: The New Nature of Medical Uncertainty," Academic Medicine, Vol. 96, No. 1, pp. 31-36, 2021. https://dx.doi.org/10.1097/ACM.0000000000003707
- C. H. Choi, "KYNDI," Artificial Intelligence Times, 2021. https://www.aitimes.kr/news/articleView.html?idxno=20974
- Y. Li, "Knowledge Recommendation Based on Dual Channel Hypergraph Convolution," KSII Transactions on Internet and Information Systems, Vol. 17, No. 11, pp. 2903-2923, 2023. https://dx.doi.org/10.3837/tiis.2023.11.001