Hi. I've recently been using your framework to reproduce the performance of qwen2.5-math-7b-base on the LUFFY dataset. My YAML config is as follows:
```yaml
project: "mix_chord"
name: "mix_chord_math_qwen2.5-math_luffy"
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,/r-contentsecurity/share/datas_yl/yanlong/trinity_ckpt}
algorithm:
  algorithm_type: mix_chord
  repeat_times: 8  # or 16 for better performance in math-related tasks
  kl_loss_fn_args:
    kl_coef: 0.00
  sample_strategy_args:
    expert_data_ratio: 0.111
  policy_loss_fn_args:  # feel free to change; we encourage you to try out different hyperparameters
    mu_warmup_steps: 0  # 0 for chord-mu and chord-phi
    mu_decay_steps: 0  # 200 for chord-mu and 0 for chord-phi
    mu_peak: 0.5  # 0.9 for chord-mu and 0.1 for chord-phi
    mu_valley: 0.1  # 0.05 for chord-mu and 0.1 for chord-phi
    enable_phi_function: true  # false for chord-mu and true for chord-phi
    clip_range: 0.28
    sft_loss_agg_mode: "token-mean"
    use_dynamic_bsz: true
    ppo_mini_batch_size: 576  # 576 = 512 + 64; if you set repeat_times = 16, it should be 32 * 16 + 64
    ppo_micro_batch_size_per_gpu: 8
    ngpus_trainer: 8
    train_batch_size_expert: 64
    train_batch_size_usual: 512  # 64 batch size * 8 repeat times
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,/r-contentsecurity/share/datas_yl/yanlong/checkpoints/Qwen2.5-Math-7B}
  max_response_tokens: 3072
  max_model_len: 4096
cluster:
  node_num: 1
  gpu_per_node: 16
buffer:
  total_epochs: 1
  batch_size: 64
  train_batch_size: 576
  explorer_input:
    taskset:
      name: math_aligned
      storage_type: file
      path: ${oc.env:TRINITY_TASKSET_PATH, /mnt/nas/yanlong/data-important/ContinualRL/datasets/data/chord/luffy/rl}
      split: 'train'
      format:
        prompt_key: 'question'
        response_key: 'answer'
        system_prompt: "Your task is to follow a systematic, thorough reasoning process before providing the final solution. This involves analyzing, summarizing, exploring, reassessing, and refining your thought process through multiple iterations. Structure your response into two sections: Thought and Solution. In the Thought section, present your reasoning using the format: \"<think>\n {thoughts} </think>\n\". Each thought should include detailed analysis, brainstorming, verification, and refinement of ideas. After \"</think>\n,\" in the Solution section, provide the final, logical, and accurate answer, clearly derived from the exploration in the Thought section. If applicable, include the answer in \\boxed{} for closed-form results like multiple choices or mathematical solutions."
      rollout_args:
        temperature: 1.0
        logprobs: 0
      workflow_args:
        with_think: false
    eval_tasksets:
      - name: AIME2024
        storage_type: file
        path: ${oc.env:TRINITY_TASKSET_PATH, /mnt/nas/yanlong/datasets/aime_2024}  # e.g. path to AIME2024
        split: 'test'
        repeat_times: 8
        format:
          prompt_key: 'Problem'
          response_key: 'Answer'
          system_prompt: "Your task is to follow a systematic, thorough reasoning process before providing the final solution. This involves analyzing, summarizing, exploring, reassessing, and refining your thought process through multiple iterations. Structure your response into two sections: Thought and Solution. In the Thought section, present your reasoning using the format: \"<think>\n {thoughts} </think>\n\". Each thought should include detailed analysis, brainstorming, verification, and refinement of ideas. After \"</think>\n,\" in the Solution section, provide the final, logical, and accurate answer, clearly derived from the exploration in the Thought section. If applicable, include the answer in \\boxed{} for closed-form results like multiple choices or mathematical solutions."
        rollout_args:
          temperature: 1.0
          top_p: 0.7
    default_workflow_type: 'math_boxed_workflow'
  trainer_input:
    experience_buffer:
      name: math_buffer
      storage_type: queue
      path: 'sqlite:////mnt/nas/yanlong/data-important/cache/mix_chord_math_qwen2.5-math.db'
    auxiliary_buffers:
      sft_dataset:
        total_epochs: 1
        name: SFT_data
        storage_type: file
        schema_type: sft
        path: ${oc.env:TRINITY_SFT_DATASET_PATH, /mnt/nas/yanlong/data-important/ContinualRL/datasets/data/chord/luffy/sft}
        split: 'train'
        format:
          prompt_type: messages
          messages_key: 'messages'
explorer:
  eval_interval: 10
  runner_per_model: 8
  rollout_model:
    engine_num: 8
    tensor_parallel_size: 1
    enable_prefix_caching: true
    enforce_eager: false
    dtype: bfloat16
    seed: 42
synchronizer:
  sync_method: 'nccl'
  sync_interval: 1
  sync_timeout: 1200
trainer:
  save_interval: 50
  trainer_config:
    actor_rollout_ref:
      model:
        use_remove_padding: true
      actor:
        use_dynamic_bsz: true
        ppo_max_token_len_per_gpu: 25600
        ulysses_sequence_parallel_size: 2
        optim:
          lr: 1e-6  # or 5e-6; a larger lr with warmup can give better performance for SFT training
      ref:
        log_prob_use_dynamic_bsz: ${trainer.trainer_config.actor_rollout_ref.actor.use_dynamic_bsz}
        log_prob_max_token_len_per_gpu: ${trainer.trainer_config.actor_rollout_ref.actor.ppo_max_token_len_per_gpu}
        ulysses_sequence_parallel_size: ${trainer.trainer_config.actor_rollout_ref.actor.ulysses_sequence_parallel_size}
monitor:
  monitor_type: wandb
```
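
For reference, here is how I'm reading the batch arithmetic in this config. The variable names are copied from the YAML above, but the consistency rules are my own interpretation of how mix_chord combines expert (SFT) and usual (RL) data, not a documented contract of the framework:

```python
# Sanity check of the batch arithmetic implied by the config above.
# Names mirror the YAML fields; the relationships asserted here are
# my own reading, not the framework's documented behavior.

batch_size = 64                # buffer.batch_size (tasks per explore step)
repeat_times = 8               # algorithm.repeat_times (rollouts per task)
train_batch_size_expert = 64   # expert (SFT) experiences per train step
train_batch_size_usual = 512   # usual (RL) experiences per train step
ppo_mini_batch_size = 576
train_batch_size = 576         # buffer.train_batch_size
expert_data_ratio = 0.111

# Usual (RL) experiences per step: 64 tasks x 8 rollouts each.
assert train_batch_size_usual == batch_size * repeat_times  # 512

# The mixed training batch is usual RL data plus expert SFT data.
assert ppo_mini_batch_size == train_batch_size_usual + train_batch_size_expert
assert train_batch_size == ppo_mini_batch_size

# expert_data_ratio should match the expert share of the mixed batch.
print(train_batch_size_expert / train_batch_size)  # 0.1111... ~= 0.111
```

All of these checks pass for my numbers, so at least the batch sizes look internally consistent to me.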
However, the results don't seem very good. The training reward and AIME eval accuracy are shown in the attached screenshots.
Could you tell me whether any of my settings are wrong?
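
In case it helps pin down the issue, this is how I'm interpreting the mu schedule parameters (a minimal sketch assuming linear warmup to `mu_peak` and linear decay to `mu_valley`, inferred from the parameter names only; the actual schedule is whatever the framework's mix_chord implementation does, and as I understand it mu is the weight on the SFT loss term):

```python
# A sketch of one plausible reading of the mu schedule parameters.
# Inferred from the parameter names alone; not the framework's
# actual implementation.

def mu_at_step(step: int, warmup: int, decay: int,
               peak: float, valley: float) -> float:
    if warmup > 0 and step < warmup:
        return peak * step / warmup  # ramp up toward the peak
    step -= warmup
    if decay > 0 and step < decay:
        return peak + (valley - peak) * step / decay  # decay toward the valley
    # Past warmup + decay, mu is flat; with warmup = decay = 0
    # (as in my config) it never moves off the peak.
    return valley if decay > 0 else peak

# With my chord-phi style settings (warmup=0, decay=0, peak=0.5),
# mu would stay constant at 0.5 for the whole run:
print(mu_at_step(0, 0, 0, 0.5, 0.1), mu_at_step(100, 0, 0, 0.5, 0.1))  # 0.5 0.5
```

If this reading is right, my config keeps the SFT weight fixed at 0.5 throughout training. Is that an appropriate choice for chord-phi, or could it explain the weak reward and AIME curves?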