Have you tried RL approaches such as DPO/PPO? #15

Open

Opened on Feb 25, 2026

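Thank you very much for your excellent work! Your approach has been very inspiring for what I am working on, though my current scenario is limited by compute. Would building preference pairs and training on them with DPO, as the RL step, be a viable direction to explore?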
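To make the question concrete, here is a minimal sketch of the standard per-pair DPO loss, not anything taken from this repository. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for illustration. The inputs are summed sequence log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio))).

    All names and the default beta are illustrative assumptions.
    """
    # How much more (in log space) the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) equals log(1 + exp(-margin)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities for one preference pair.
loss = dpo_loss(policy_chosen_logp=-1.0, policy_rejected_logp=-3.0,
                ref_chosen_logp=-2.0, ref_rejected_logp=-2.0)
print(loss)
```

Because the loss needs only log-probabilities over a fixed set of preference pairs, there is no separate reward model to train and no online sampling during training, which is why it seems attractive under a tight compute budget.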
