
UniER Model Frameworks


Model Frameworks by Generation Family

The models are grouped according to UniER's benchmark taxonomy: end-to-end ILER, two-stage ILER, full-path PLER, and step-by-step PLER.

End-to-end ILER

End-to-end ILER methods map a diagnosed student state, or a set of unmastered concepts, directly to exercise recommendations.

DRER

End-to-end ILER

DRER is a deep reinforcement learning framework for adaptive exercise recommendation. It uses Exercise Q-Networks to capture students' exercising states and histories, and optimizes multi-objective rewards.

State Modeling: Represents exercising states and historical interaction signals.
Exercise Q-Network: Scores candidate exercises through a reinforcement learning formulation.
Multi-objective Reward: Jointly considers review and exploration, smoothness, and engagement.
Figure: DRER model framework
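The three reward factors listed above can be combined in many ways. The sketch below is only an illustration of a weighted sum: the function name, the concrete formula for each term, the `target_success` value, and the weights are all assumptions made for this example and are not taken from DRER itself.

```python
def multi_objective_reward(covers_new, covers_failed, prev_difficulty, difficulty,
                           recent_success_rate, target_success=0.7,
                           weights=(1.0, 1.0, 1.0)):
    """Hypothetical weighted sum of three reward terms (illustrative only)."""
    # Review and exploration: reward exercises that revisit a failed concept
    # or introduce a new one, penalise everything else.
    review_explore = 1.0 if (covers_new or covers_failed) else -1.0
    # Smoothness: penalise large jumps in difficulty between consecutive exercises.
    smoothness = -(difficulty - prev_difficulty) ** 2
    # Engagement: highest when the recent success rate sits near an assumed target.
    engagement = 1.0 - abs(target_success - recent_success_rate)
    w_review, w_smooth, w_engage = weights
    return w_review * review_explore + w_smooth * smoothness + w_engage * engagement


steady = multi_objective_reward(True, False, 0.5, 0.5, 0.7)
jumpy = multi_objective_reward(True, False, 0.2, 0.9, 0.7)
```

With equal weights, `steady` is larger than `jumpy` because only the smoothness term differs between the two calls.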

AKTRec

End-to-end ILER

AKTRec adapts the AKT knowledge tracing model for exercise recommendation. AKT uses self-attentive encoders and a monotonic attention mechanism to model past exercise history; AKTRec then identifies unmastered knowledge concepts and recommends exercises covering these gaps.

Attentive KT Encoder: Models the student's historical interactions with attention.
Knowledge Gap Diagnosis: Finds knowledge concepts that remain insufficiently mastered.
Exercise Retrieval: Maps weak concepts to corresponding exercises.
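The last two steps, gap diagnosis and retrieval, can be sketched without the attention encoder. The example below assumes that mastery estimates are already available as a dictionary; the threshold, the gap-based score, and the function name are hypothetical choices for illustration.

```python
def recommend_for_gaps(mastery, exercise_concepts, threshold=0.5, top_k=3):
    """Rank exercises by how much unmastered material they cover (illustrative)."""
    # Gap diagnosis: concepts whose estimated mastery is below the threshold.
    weak = {concept for concept, level in mastery.items() if level < threshold}
    scored = []
    for exercise, concepts in exercise_concepts.items():
        covered = weak & set(concepts)
        if covered:
            # Assumed score: total shortfall of the weak concepts this exercise covers.
            score = sum(threshold - mastery[c] for c in covered)
            scored.append((score, exercise))
    # Highest score first; ties are broken by exercise id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [exercise for _, exercise in scored[:top_k]]


mastery = {"fractions": 0.2, "decimals": 0.9, "ratios": 0.4}
exercise_concepts = {
    "e1": ["fractions"],
    "e2": ["decimals"],
    "e3": ["fractions", "ratios"],
}
recommended = recommend_for_gaps(mastery, exercise_concepts)  # ["e3", "e1"]
```

Exercise `e2` is never returned here because it only covers a concept that is already above the threshold.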

SimpleKTRec

End-to-end ILER

SimpleKTRec adapts SimpleKT, a lightweight attention-based knowledge tracing framework. It combines exercise embeddings with their associated knowledge concepts and uses dot-product attention to extract temporal knowledge states for recommendation.

Concept-aware Embedding: Represents exercises through their related knowledge concepts.
Temporal Attention: Extracts evolving knowledge states efficiently.
Gap-based Recommendation: Recommends exercises linked to unmastered concepts.
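The temporal attention step relies on dot-product attention. Below is a minimal, dependency-free sketch of scaled dot-product attention with a causal mask; it assumes that position t may attend to positions 0 through t, which is a simplification chosen for this example rather than a statement about SimpleKT's exact masking.

```python
import math


def causal_attention(queries, keys, values):
    """Scaled dot-product attention where step t only sees steps 0..t."""
    dim = len(queries[0])
    outputs = []
    for t, query in enumerate(queries):
        # Dot-product scores against the visible prefix, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, keys[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible prefix.
        top = max(scores)
        exps = [math.exp(score - top) for score in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the visible value vectors.
        outputs.append([
            sum(w * values[s][j] for s, w in enumerate(weights))
            for j in range(len(values[0]))
        ])
    return outputs


states = causal_attention(
    queries=[[1.0, 0.0], [0.0, 1.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 0.0], [0.0, 1.0]],
)
```

The first output row equals the first value vector, since the first step has nothing else to attend to.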

MMER

End-to-end ILER

MMER is a meta multi-agent reinforcement learning framework for exercise recommendation. Each knowledge concept is treated as an agent, enabling cooperative and competitive interactions among concepts and adaptation to new student groups.

Concept Agents: Models each KC as an agent in a multi-agent setting.
Meta-training: Supports quick adaptation with few-shot student data.
Recommendation Policy: Coordinates KC-level signals to select exercises.
Figure: MMER model framework

Two-stage ILER

Two-stage ILER methods separate candidate generation from filtering, re-ranking, or set-level optimization.

KCP-ER

Two-stage ILER

KCP-ER uses a Knowledge Concept Prediction Layer and an Exercise Set Filtering Layer. It predicts concept coverage and mastery, filters suitable exercises, and generates a diversified recommendation list.

KC Prediction: Predicts coverage and mastery of knowledge concepts.
Exercise Filtering: Selects exercises by difficulty and novelty constraints.
List Generation: Uses simulated annealing to improve diversity.
Figure: KCP-ER model framework
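The list generation step can be illustrated with a small simulated annealing loop. The sketch below assumes diversity is measured as the mean pairwise Jaccard distance between the concept sets of the selected exercises; that metric, the swap move, and the cooling schedule are assumptions for this example, not details confirmed for KCP-ER.

```python
import math
import random


def diversity(exercise_list, concepts):
    """Mean pairwise Jaccard distance between concept sets (assumed metric)."""
    pairs = [(a, b) for i, a in enumerate(exercise_list) for b in exercise_list[i + 1:]]
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        union = concepts[a] | concepts[b]
        overlap = len(concepts[a] & concepts[b]) / len(union) if union else 1.0
        total += 1.0 - overlap
    return total / len(pairs)


def anneal_list(candidates, concepts, k, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Search for a diverse k-item list by swapping items in and out."""
    rng = random.Random(seed)
    current = rng.sample(candidates, k)
    current_score = diversity(current, concepts)
    best, best_score = list(current), current_score
    temperature = t0
    for _ in range(steps):
        outside = [c for c in candidates if c not in current]
        if not outside:
            break
        # Move: replace one random position with an exercise not yet in the list.
        proposal = list(current)
        proposal[rng.randrange(k)] = rng.choice(outside)
        score = diversity(proposal, concepts)
        # Always accept improvements; accept worse lists with a shrinking probability.
        if score >= current_score or rng.random() < math.exp((score - current_score) / temperature):
            current, current_score = proposal, score
            if score > best_score:
                best, best_score = list(proposal), score
        temperature *= cooling
    return best, best_score


concept_sets = {"e1": {"a"}, "e2": {"a"}, "e3": {"b"}}
best_list, best_score = anneal_list(["e1", "e2", "e3"], concept_sets, k=2)
```

In this toy case the best list always pairs `e3` with one of the other two exercises, because `e1` and `e2` cover the same concept.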

KG4Ex

Two-stage ILER

KG4Ex is a two-stage exercise recommendation method with feature extraction and knowledge graph-based recommendation. It learns relationships among students, exercises, and knowledge concepts to produce explainable recommendations.

Feature Extraction: Computes mastery, future concept probability, and forgetting-related signals.
Knowledge Graph: Builds structured relations among students, exercises, and KCs.
Explainable Ranking: Uses graph embeddings to recommend exercises with reasons.
Figure: KG4Ex model framework
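To make the idea of ranking with graph embeddings concrete, the sketch below scores (student, relation, exercise) triples with a translation-based distance in the style of TransE. Treating the embedding model as plain TransE is an assumption for illustration, and the vectors and the function names are hypothetical.

```python
import math


def translation_score(head, relation, tail):
    """TransE-style score: higher (closer to zero) means a more plausible triple."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_exercises(student_vec, relation_vec, exercise_vecs):
    """Order exercises by how well student + relation lands on the exercise vector."""
    scored = [
        (translation_score(student_vec, relation_vec, vec), name)
        for name, vec in exercise_vecs.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


ranking = rank_exercises(
    student_vec=[0.0, 0.0],
    relation_vec=[1.0, 0.0],  # hypothetical "should practise" relation
    exercise_vecs={"e1": [1.0, 0.0], "e2": [0.0, 1.0]},
)
```

Here `e1` ranks first because the student vector translated by the relation vector lands exactly on it.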

MulOER

Two-stage ILER

MulOER-SAN adopts a two-layer multi-objective framework. The bottom layer traces mastery and predicts concept coverage with self-attention, while the top layer filters candidate exercises and optimizes diversity with smooth difficulty progression.

Self-attention Layer: Predicts KC coverage and traces student mastery.
Candidate Filtering: Constructs a candidate subset for recommendation.
Multi-objective Search: Optimizes diversity while controlling difficulty smoothness.
Figure: MulOER model framework
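When diversity and difficulty smoothness pull in different directions, one common way to compare candidate lists is Pareto dominance. The sketch below assumes each candidate list has already been scored on both objectives, with higher values being better; using a Pareto front here is an illustrative choice and is not claimed to be the search procedure of MulOER.

```python
def dominates(a, b):
    """True when a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scored):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, score in scored.items()
        if not any(dominates(other_score, score)
                   for other, other_score in scored.items() if other != name)
    ]


# Hypothetical (diversity, negative difficulty jump) scores for three candidate lists.
scores = {"A": (0.9, -0.1), "B": (0.5, -0.3), "C": (0.4, -0.05)}
front = pareto_front(scores)  # ["A", "C"]
```

List `B` drops out because `A` beats it on both objectives, while `A` and `C` each win on one objective and therefore both remain.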

ER-TGA

Two-stage ILER

ER-TGA combines cognitive diagnosis with a tribal-alliance genetic algorithm. It identifies weak knowledge components, builds candidate exercises, and selects an exercise subset by minimizing multi-objective recommendation cost.

Cognitive Diagnosis: Identifies weak knowledge components and supports cold-start scenarios.
Candidate Exercise Set: Builds candidate exercises aligned with knowledge gaps.
TGA Optimization: Balances accuracy, novelty, diversity, proximity, coverage, quantity, and volatility.
Figure: ER-TGA model framework

NR4DER

Two-stage ILER

NR4DER contains an exercise filter module and a neural re-ranking module. It predicts mastery, builds a set of suitable candidates, enhances the representations of inactive students, and re-ranks exercises for relevance and diversity.

Exercise Filter: Predicts mastery and constructs difficulty-appropriate candidates.
Student Enhancer: Improves representation for inactive students.
Neural Re-ranking: Integrates relevance and learning pattern diversity.
Figure: NR4DER model framework
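The filter-then-re-rank flow can be sketched with simple heuristics. In the example below a greedy rule stands in for the neural re-ranker, and the probability band, the `trade_off` weight, and the concept-novelty term are all assumptions introduced for illustration.

```python
def filter_by_difficulty(predicted_correct, low=0.4, high=0.8):
    """Keep exercises whose predicted success probability is neither too low nor too high."""
    return [ex for ex, p in predicted_correct.items() if low <= p <= high]


def rerank(candidates, relevance, concepts, trade_off=0.5):
    """Greedy re-ranking that mixes relevance with coverage of not-yet-seen concepts."""
    remaining = list(candidates)
    covered = set()
    ranked = []

    def gain(exercise):
        # Share of this exercise's concepts that the ranked list has not covered yet.
        novelty = len(concepts[exercise] - covered) / max(len(concepts[exercise]), 1)
        return trade_off * relevance[exercise] + (1.0 - trade_off) * novelty

    while remaining:
        best = max(remaining, key=gain)
        ranked.append(best)
        remaining.remove(best)
        covered.update(concepts[best])
    return ranked


predicted = {"e1": 0.95, "e2": 0.6, "e3": 0.5, "e4": 0.7}
kept = filter_by_difficulty(predicted)  # ["e2", "e3", "e4"]
order = rerank(
    kept,
    relevance={"e2": 0.9, "e3": 0.8, "e4": 0.5},
    concepts={"e2": {"a"}, "e3": {"a"}, "e4": {"b"}},
)  # ["e2", "e4", "e3"]
```

Exercise `e4` overtakes `e3` in the final order even though it is less relevant, because it covers a concept the list has not touched yet.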

Full-path PLER

Full-path PLER generates the complete learning path in one step, typically with encoder-decoder style generation.

SRC

Full-path PLER

SRC formulates learning path recommendation as a set-to-sequence task. A concept-aware encoder captures correlations among candidate learning concepts, and an attention-based decoder sequentially generates the learning path.

Concept-aware Encoder: Captures relations among candidate concepts.
Attention Decoder: Generates the full learning path sequence.
KT Auxiliary Module: Predicts mastery at each step to stabilize training.
Figure: SRC model framework
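Set-to-sequence generation means that an unordered set of candidate concepts is turned into an ordered path, with each concept used once. The sketch below shows a greedy, pointer-style decoding loop under strong simplifications: concept vectors are given rather than encoded, scores are plain dot products, and the decoder state update (averaging) is a hypothetical stand-in for a learned recurrent update.

```python
def decode_path(concept_vecs, start_state):
    """Greedily order a set of concepts by repeatedly pointing at the best remaining one."""
    remaining = dict(concept_vecs)
    state = list(start_state)
    path = []
    while remaining:
        # Attention-like score: dot product between decoder state and each candidate.
        name = max(
            sorted(remaining),
            key=lambda n: sum(s * c for s, c in zip(state, remaining[n])),
        )
        path.append(name)
        chosen = remaining.pop(name)  # each concept is emitted exactly once
        # Assumed state update: move the state halfway toward the chosen concept.
        state = [0.5 * s + 0.5 * c for s, c in zip(state, chosen)]
    return path


vectors = {"add": [1.0, 0.0], "mul": [0.6, 0.6], "pow": [0.0, 1.0]}
path_from_add = decode_path(vectors, start_state=[1.0, 0.0])  # ["add", "mul", "pow"]
path_from_pow = decode_path(vectors, start_state=[0.0, 1.0])  # ["pow", "mul", "add"]
```

The same concept set yields different paths depending on the initial state, which is the property that lets a set-to-sequence model personalise the ordering.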

Step-by-step PLER

Step-by-step PLER treats learning path generation as sequential decision-making and updates decisions along the path.

AC

Step-by-step PLER

AC applies the classical Actor-Critic framework to step-by-step learning path recommendation. The critic estimates the value of the current student state, and the actor selects the next exercise to optimize learning outcomes.

Actor: Selects the next exercise according to the current policy.
Critic: Estimates expected value for the current student state.
TD Update: Uses temporal-difference learning to update policy and value estimates.
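A single actor-critic update can be written out in tabular form. The sketch below assumes discrete student states, a softmax policy over per-state action preferences, and a one-step TD error; the learning rates and the table layout are illustrative choices.

```python
import math


def softmax(preferences):
    """Convert action preferences into a probability distribution."""
    top = max(preferences)
    exps = [math.exp(p - top) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(values, prefs, state, action, reward, next_state,
                      gamma=0.9, lr_critic=0.1, lr_actor=0.1, done=False):
    """Apply one TD(0) update to the critic and one policy-gradient update to the actor."""
    # Critic: one-step TD error for the observed transition.
    target = reward + (0.0 if done else gamma * values[next_state])
    td_error = target - values[state]
    values[state] += lr_critic * td_error
    # Actor: gradient of log softmax is (indicator of chosen action) - probability.
    probs = softmax(prefs[state])
    for a in range(len(prefs[state])):
        grad_log_prob = (1.0 if a == action else 0.0) - probs[a]
        prefs[state][a] += lr_actor * td_error * grad_log_prob
    return td_error


values = {0: 0.0, 1: 0.0}
prefs = {0: [0.0, 0.0], 1: [0.0, 0.0]}
delta = actor_critic_step(values, prefs, state=0, action=1, reward=1.0, next_state=1)
```

After this step the preference for the rewarded action in state 0 rises and the preference for the other action falls by the same amount.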

DQN

Step-by-step PLER

DQN adapts Deep Q-Network style reinforcement learning to step-by-step recommendation. The agent estimates action values for possible next exercises and updates them through Q-learning with experience replay.

State Observation: Represents the current learning state before choosing the next action.
Q-value Estimation: Scores candidate next exercises as actions.
Experience Replay: Stabilizes off-policy updates by replaying past transitions.
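The interaction between Q-learning and experience replay can be shown with a small agent. To stay dependency-free, the sketch below uses a Q-table as a stand-in for the deep network, so it illustrates the replay and bootstrapped-target logic rather than DQN's function approximation; the class name and hyperparameters are hypothetical.

```python
import random
from collections import deque


class ReplayAgent:
    """Tabular Q-learning agent with a replay buffer (stand-in for a Q-network)."""

    def __init__(self, n_states, n_actions, gamma=0.9, lr=0.5, capacity=1000, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.gamma = gamma
        self.lr = lr
        self.rng = random.Random(seed)

    def remember(self, state, action, reward, next_state, done):
        """Store one transition for later replay."""
        self.buffer.append((state, action, reward, next_state, done))

    def replay(self, batch_size):
        """Sample stored transitions and apply the Q-learning update to each."""
        batch = self.rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))
        for state, action, reward, next_state, done in batch:
            # Bootstrapped target: reward plus discounted best next-state value.
            target = reward if done else reward + self.gamma * max(self.q[next_state])
            self.q[state][action] += self.lr * (target - self.q[state][action])


agent = ReplayAgent(n_states=2, n_actions=2)
agent.remember(state=0, action=1, reward=1.0, next_state=1, done=True)
agent.replay(batch_size=4)
first_estimate = agent.q[0][1]  # 0.5 after a single replayed update
```

Replaying the same stored transition again keeps moving the estimate toward the observed reward of 1.0.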

CSEAL

Step-by-step PLER

CSEAL models learning path recommendation as a Markov Decision Process. It combines knowledge tracing, cognitive navigation, and an actor-critic recommender to generate exercises sequentially.

Knowledge Tracing: Estimates evolving student mastery levels.
Cognitive Navigation: Selects candidate items through prerequisite graph constraints.
Actor-Critic Recommender: Chooses exercises while reducing search space.
Figure: CSEAL model framework
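The search-space reduction from cognitive navigation can be sketched as a neighbourhood lookup on the prerequisite graph. The example below assumes the graph is stored as a mapping from a concept to its successors and that candidates are the concepts within a fixed number of hops of the current one, in either direction; the hop limit and the function name are illustrative.

```python
def navigation_candidates(prereq_graph, current, hops=1):
    """Return concepts within `hops` prerequisite edges of the current concept."""
    # Build the reverse mapping so predecessors can be reached as well as successors.
    predecessors = {}
    for source, targets in prereq_graph.items():
        for target in targets:
            predecessors.setdefault(target, set()).add(source)

    frontier = {current}
    seen = {current}
    for _ in range(hops):
        reached = set()
        for node in frontier:
            reached |= set(prereq_graph.get(node, ()))
            reached |= predecessors.get(node, set())
        frontier = reached - seen
        seen |= frontier
    return sorted(seen)


graph = {
    "counting": ["addition"],
    "addition": ["multiplication"],
    "multiplication": ["exponents"],
}
one_hop = navigation_candidates(graph, "addition")
two_hops = navigation_candidates(graph, "addition", hops=2)
```

With one hop the recommender only has to choose among `addition`, `counting`, and `multiplication`; `exponents` becomes available only when the hop limit is raised to two.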

RLTutor

Step-by-step PLER

RLTutor models adaptive tutoring as a step-by-step sequential decision problem. An inner model estimates student knowledge state, and a PPO-based reinforcement learning agent optimizes the teaching strategy.

Inner Model: Estimates current student knowledge state with temporal memory dynamics.
Virtual Student: Supports learning policies with fewer real student interactions.
PPO Agent: Optimizes the next exercise recommendation strategy.
Figure: RLTutor model framework
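PPO's core idea is the clipped surrogate objective, which limits how far a single update can move the policy. The sketch below computes that objective for a batch of probability ratios and advantage estimates; the inputs are assumed to be precomputed, and the function name and default `clip_eps` are illustrative.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximised)."""
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the new/old probability ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum makes the objective a pessimistic bound.
        terms.append(min(ratio * advantage, clipped * advantage))
    return sum(terms) / len(terms)


gain_capped = ppo_clip_objective([1.5], [1.0])    # ratio is clipped to 1.2
loss_kept = ppo_clip_objective([0.5], [-1.0])     # pessimistic branch gives -0.8
```

In the first call the benefit of raising an action's probability is capped, and in the second the penalty is not reduced by clipping, which is the asymmetry that keeps updates conservative.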

GEHRL

Step-by-step PLER

GEHRL uses hierarchical reinforcement learning for goal-oriented learning path recommendation. A high-level agent selects sub-goals, while a low-level agent recommends learning items to achieve each sub-goal.

High-level Agent: Selects sub-goals aligned with the target objective.
Low-level Agent: Recommends items sequentially for each sub-goal.
Graph Candidate Selector: Constrains actions to goal-related exercises.
Figure: GEHRL model framework

DLPR

Step-by-step PLER

DLPR addresses learning path recommendation as a dynamic sequential decision problem. It uses a Difficulty-driven Hierarchical Reinforcement Learning framework with an L-Agent and a P-Agent, together with knowledge state estimation.

L-Agent: Selects learning items at the high level.
P-Agent: Chooses associated practice items at the low level.
DIMKT State Estimator: Tracks evolving knowledge and difficulty signals.
Figure: DLPR model framework

PKSD

Step-by-step PLER

PKSD introduces privileged knowledge state distillation for reinforcement learning-based educational path recommendation. During training it uses the simulator-derived knowledge state as privileged information; at inference, where that state is unavailable, it estimates the latent knowledge state instead.

Privileged State Encoder: Learns from simulator-derived knowledge state during training.
Knowledge State Adapter: Estimates latent state from regular exercise logs at inference time.
RL Path Agent: Generates personalized and structured learning paths.
Figure: PKSD model framework
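The distillation idea can be illustrated by training an adapter to reproduce the privileged state from features that are also available at inference. The sketch below assumes a linear adapter and a mean squared error loss; both are simplifications chosen for this example and are not claimed to match PKSD's architecture or loss.

```python
def distillation_loss(estimated_state, privileged_state):
    """Mean squared error between the adapter's estimate and the privileged state."""
    return sum(
        (est - priv) ** 2 for est, priv in zip(estimated_state, privileged_state)
    ) / len(privileged_state)


def adapter_step(weights, log_features, privileged_state, lr=0.1):
    """One gradient step for a linear adapter; returns the loss before the update."""
    # Linear adapter: each state dimension is a weighted sum of log-derived features.
    estimate = [sum(w * x for w, x in zip(row, log_features)) for row in weights]
    loss = distillation_loss(estimate, privileged_state)
    n = len(privileged_state)
    for j, row in enumerate(weights):
        error = estimate[j] - privileged_state[j]
        for i in range(len(row)):
            # Gradient of the mean squared error with respect to weights[j][i].
            row[i] -= lr * 2.0 * error * log_features[i] / n
    return loss


adapter = [[0.0, 0.0]]                 # hypothetical 1-dimensional knowledge state
features = [1.0, 0.0]                  # hypothetical features from exercise logs
simulator_state = [1.0]                # privileged state, available only in training
loss_before = adapter_step(adapter, features, simulator_state)
loss_after = adapter_step(adapter, features, simulator_state)
```

The loss falls between the two calls, and once training ends only `adapter` and `features` are needed to produce a state estimate.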

KnowLP

Step-by-step PLER

KnowLP constructs personalized learning paths with step-by-step reinforcement learning guided by dual knowledge structure graphs. It uses specialized agents to address prerequisite, similarity, and difficulty factors.

Dual Knowledge Graphs: Capture prerequisite and similarity relations among concepts.
Specialized Agents: Select concepts and exercises according to prerequisite, similarity, and difficulty signals.
Knowledge Tracing: Tracks mastery and difficulty to adapt the generated path.
Figure: KnowLP model framework