This project trains an agent for Scopone (10-card base variant) using CTDE + PPO with an action-conditioned policy, a compact observation, an optional belief module + IS-MCTS booster, a self-play league, and a GUI.
- Compact observation (hist-k + sets), seat/team embedding.
- Action-conditioned actor/critic; PPO with GAE, KL early-stop, cosine LR, entropy schedule.
- Belief module (particle filter) and IS-MCTS booster with determinisations.
- Self-play league with Elo and softmax sampling.
- Benchmark tools and evaluation utilities.
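The PPO loss here builds on GAE; a minimal sketch of the advantage computation (function name, shapes, and defaults are illustrative assumptions, not the repo's API):

```python
import numpy as np

def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout segment.

    rewards, values: shape (T,); last_value: bootstrap V(s_T).
    Returns (advantages, returns), both shape (T,).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    T = len(rewards)
    adv = np.zeros(T, dtype=np.float64)
    gae = 0.0
    # Walk the rollout backwards, accumulating discounted TD residuals.
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        adv[t] = gae
    returns = adv + values  # targets for the critic
    return adv, returns
```

The returns are then used as value targets, while (normalized) advantages weight the clipped PPO policy loss.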
```bash
pip install -r requirements.txt
```

```bash
python trainers/train_ppo.py --iters 2000 --horizon 256 --compact --k-history 12 --seed 0 --ckpt checkpoints/ppo_ac.pth
```
Logs (if TensorBoard is available):

```bash
tensorboard --logdir runs
```
- Policy only:

```bash
python tools/benchmark_ac.py --games 100 --compact --k-history 12 --ckpt checkpoints/ppo_ac.pth \
  --out-csv results.csv --out-json summary.json
```

- With the IS-MCTS booster:

```bash
python tools/benchmark_ac.py --mcts --sims 256 --dets 16 --games 50 --compact --k-history 12 \
  --ckpt checkpoints/ppo_ac.pth --out-json summary_mcts.json
```
- Versus baseline heuristic:
```bash
python -c "from evaluation.eval import eval_vs_baseline; print(eval_vs_baseline(games=50))"
```
- League Elo update between last two checkpoints:
```bash
python -c "from evaluation.eval import league_eval_and_update; print(league_eval_and_update())"
```
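The league's Elo update and softmax opponent sampling can be sketched as follows (function names, the K-factor, and the temperature are illustrative assumptions, not the repo's API):

```python
import math
import random

def elo_update(r_a, r_b, score_a, k=32.0):
    """Standard Elo update; score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

def sample_opponent(ratings, temperature=100.0, rng=random):
    """Softmax sampling over checkpoint Elo ratings: stronger checkpoints drawn more often."""
    names = list(ratings)
    best = max(ratings[n] for n in names)  # subtract max for numerical stability
    weights = [math.exp((ratings[n] - best) / temperature) for n in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

A higher temperature flattens the distribution toward uniform sampling over the checkpoint pool; a lower one concentrates play against the strongest checkpoints.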
Integration of the actor/critic into scopone_gui.py, with checkpoint selection and optional IS-MCTS, is work in progress.
Note: the legacy full-history encoding has been removed. Use the compact observation with --k-history.
- Observation: `--compact`, `--k-history`
- PPO: cosine LR, entropy schedule (linear), KL target, minibatching, multi-epoch updates
- IS-MCTS: `--mcts`, `--sims`, `--dets`
- Seeds: `--seed`
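The cosine LR and linear entropy schedules can be sketched as simple step-indexed functions (names and the boundary values are illustrative assumptions, not the repo's defaults):

```python
import math

def cosine_lr(step, total_steps, lr_max=3e-4, lr_min=3e-5):
    """Cosine-annealed learning rate: lr_max at step 0 down to lr_min at total_steps."""
    frac = min(step / max(total_steps, 1), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * frac))

def linear_entropy_coef(step, total_steps, start=0.01, end=0.001):
    """Linearly decayed entropy bonus coefficient, from start to end."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * frac
```

Both are evaluated once per PPO iteration; the KL target additionally stops the epoch loop early when the policy drifts too far from the rollout policy.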
See requirements.txt: torch, numpy, tqdm, gymnasium, pandas, openpyxl, scipy, tensorboard (numba is optional).
- `environment.py` — Gym env with compact obs and caches
- `observation.py` — encoders and features (compact, plus a fixed 10823-dim encoding for legacy compatibility)
- `models/` — action-conditioned actor/critic encoders
- `algorithms/ppo_ac.py` — PPO (CTDE-ready) with schedules and KL control
- `belief/` — particle filter for hidden hands
- `algorithms/is_mcts.py` — IS-MCTS with determinisations
- `selfplay/league.py` — checkpoint league and Elo
- `trainers/train_ppo.py` — trainer with multi-seat self-play
- `tools/benchmark_ac.py` — benchmark CLI for AC
- `evaluation/eval.py` — evaluation helpers
- `tests/` — unit tests
Use --seed in the trainer/benchmark. If --seed < 0, a random non-negative seed is generated and printed. Checkpoints include the run config.
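The seed-resolution behavior described above can be sketched as (`resolve_seed` and the seed range are illustrative assumptions):

```python
import random

def resolve_seed(seed: int) -> int:
    """If seed < 0, draw a random non-negative seed and print it so the run stays reproducible."""
    if seed < 0:
        seed = random.randrange(2**31)  # any non-negative 31-bit seed
        print(f"[seed] generated seed: {seed}")
    return seed
```

Printing the generated seed lets you rerun the exact same configuration later by passing it explicitly.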
- Training uses IS-MCTS optionally. Defaults are now neutral: `prior_smooth_eps=0.0`, `root_dirichlet_eps=0.0`. Set them explicitly if you want smoothing or root noise.
- Root temperature can be scheduled during rollout; pass `--mcts-root-temp` to override.
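With the neutral defaults, the root priors pass through unchanged; a sketch of how prior smoothing, Dirichlet root noise, and root temperature could be applied (the function name and the Dirichlet alpha are assumptions, not the repo's implementation):

```python
import numpy as np

def shape_root_priors(priors, prior_smooth_eps=0.0, root_dirichlet_eps=0.0,
                      dirichlet_alpha=0.3, root_temp=1.0, rng=None):
    """Optionally smooth, noise, and temper MCTS root priors.

    With the neutral defaults (eps = 0.0, temp = 1.0) the priors are returned unchanged.
    """
    rng = rng or np.random.default_rng()
    p = np.asarray(priors, dtype=np.float64)
    p = p / p.sum()
    if prior_smooth_eps > 0.0:
        # Mix with a uniform distribution so no legal action has zero prior mass.
        p = (1.0 - prior_smooth_eps) * p + prior_smooth_eps / len(p)
    if root_dirichlet_eps > 0.0:
        # AlphaZero-style exploration noise, applied at the root only.
        noise = rng.dirichlet([dirichlet_alpha] * len(p))
        p = (1.0 - root_dirichlet_eps) * p + root_dirichlet_eps * noise
    if root_temp != 1.0:
        # Temperature < 1 sharpens the root policy, > 1 flattens it.
        p = p ** (1.0 / root_temp)
        p = p / p.sum()
    return p
```

Scheduling `root_temp` over the rollout (e.g. high early for exploration, then annealed) is what `--mcts-root-temp` would override with a fixed value.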
- Environment runs on CPU by default; models use `SCOPONE_DEVICE` (auto-selects CUDA if available unless overridden).
- GradScaler uses the unified AMP API when on CUDA; falls back gracefully otherwise.
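Device resolution of this kind can be sketched as follows (`resolve_device` is a hypothetical helper; only the `SCOPONE_DEVICE` variable comes from the source). The AMP GradScaler fallback mentioned above would then key off whether the resolved device is `cuda`:

```python
import os

def resolve_device(default: str = "cpu") -> str:
    """Resolve the model device: SCOPONE_DEVICE wins; otherwise prefer CUDA when available."""
    override = os.environ.get("SCOPONE_DEVICE")
    if override:
        return override  # explicit user override, e.g. "cpu", "cuda", "cuda:1"
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass  # torch not installed: fall back to the default device
    return default
```

Keeping the environment on CPU while only the models move to the resolved device avoids device transfers inside the rollout loop.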