
Problem reproducing the mix_chord algorithm #425

@23557-l

Description

Hello. I've recently been using your framework to try to reproduce the performance of qwen2.5-math-7b-base on the LUFFY dataset. The YAML is below; after it I've added two small sketches showing how I read the batch-size arithmetic and the mu schedule:

project: "mix_chord"
name: "mix_chord_math_qwen2.5-math_luffy"
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,/r-contentsecurity/share/datas_yl/yanlong/trinity_ckpt}
algorithm:
  algorithm_type: mix_chord
  repeat_times: 8 # or 16 for better performance in math-related tasks
  kl_loss_fn_args:
    kl_coef: 0.00
  sample_strategy_args:
    expert_data_ratio: 0.111
  policy_loss_fn_args: # feel free to change, we encourage you to try out different hyperparameters
    mu_warmup_steps: 0  # 0 for chord-mu and chord-phi
    mu_decay_steps: 0 # 200 for chord-mu and 0 for chord-phi
    mu_peak: 0.5 # 0.9 for chord-mu and 0.1 for chord-phi
    mu_valley: 0.1 # 0.05 for chord-mu and 0.1 for chord-phi
    enable_phi_function: true # false for chord-mu and true for chord-phi
    clip_range: 0.28
    sft_loss_agg_mode: "token-mean"
    use_dynamic_bsz: true
    ppo_mini_batch_size: 576 # 576 = 512 + 64 (usual + expert); if you set repeat_times = 16, it should be 64 * 16 + 64
    ppo_micro_batch_size_per_gpu: 8
    ngpus_trainer: 8
    train_batch_size_expert: 64
    train_batch_size_usual: 512 # 64 batch_size * 8 repeat_times
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,/r-contentsecurity/share/datas_yl/yanlong/checkpoints/Qwen2.5-Math-7B}
  max_response_tokens: 3072
  max_model_len: 4096
cluster:
  node_num: 1
  gpu_per_node: 16
buffer:
  total_epochs: 1
  batch_size: 64
  train_batch_size: 576
  explorer_input:
    taskset:
      name: math_aligned
      storage_type: file
      path: ${oc.env:TRINITY_TASKSET_PATH, /mnt/nas/yanlong/data-important/ContinualRL/datasets/data/chord/luffy/rl}
      split: 'train'
      format:
        prompt_key: 'question'
        response_key: 'answer'
        system_prompt: "Your task is to follow a systematic, thorough reasoning process before providing the final solution. This involves analyzing, summarizing, exploring, reassessing, and refining your thought process through multiple iterations. Structure your response into two sections: Thought and Solution. In the Thought section, present your reasoning using the format: \"<think>\n {thoughts} </think>\n\". Each thought should include detailed analysis, brainstorming, verification, and refinement of ideas. After \"</think>\n,\" in the Solution section, provide the final, logical, and accurate answer, clearly derived from the exploration in the Thought section. If applicable, include the answer in \\boxed{} for closed-form results like multiple choices or mathematical solutions."
      rollout_args:
        temperature: 1.0
        logprobs: 0
      workflow_args:
        with_think: false
    eval_tasksets:
    - name: AIME2024
      storage_type: file
      path: ${oc.env:TRINITY_TASKSET_PATH, /mnt/nas/yanlong/datasets/aime_2024}  # e.g. path to AIME2024
      split: 'test'
      repeat_times: 8
      format:
        prompt_key: 'Problem'
        response_key: 'Answer'
        system_prompt: "Your task is to follow a systematic, thorough reasoning process before providing the final solution. This involves analyzing, summarizing, exploring, reassessing, and refining your thought process through multiple iterations. Structure your response into two sections: Thought and Solution. In the Thought section, present your reasoning using the format: \"<think>\n {thoughts} </think>\n\". Each thought should include detailed analysis, brainstorming, verification, and refinement of ideas. After \"</think>\n,\" in the Solution section, provide the final, logical, and accurate answer, clearly derived from the exploration in the Thought section. If applicable, include the answer in \\boxed{} for closed-form results like multiple choices or mathematical solutions."
      rollout_args:
        temperature: 1.0
        top_p: 0.7
    default_workflow_type: 'math_boxed_workflow'
  trainer_input:
    experience_buffer:
      name: math_buffer
      storage_type: queue
      path: 'sqlite:////mnt/nas/yanlong/data-important/cache/mix_chord_math_qwen2.5-math.db'
    auxiliary_buffers:
      sft_dataset:
        total_epochs: 1
        name: SFT_data
        storage_type: file
        schema_type: sft
        path: ${oc.env:TRINITY_SFT_DATASET_PATH, /mnt/nas/yanlong/data-important/ContinualRL/datasets/data/chord/luffy/sft}
        split: 'train'
        format:
          prompt_type: messages
          messages_key: 'messages'
explorer:
  eval_interval: 10
  runner_per_model: 8
  rollout_model:
    engine_num: 8
    tensor_parallel_size: 1
    enable_prefix_caching: true
    enforce_eager: false
    dtype: bfloat16
    seed: 42
synchronizer:
  sync_method: 'nccl'
  sync_interval: 1
  sync_timeout: 1200
trainer:
  save_interval: 50
  trainer_config:
    actor_rollout_ref:
      model:
        use_remove_padding: true
      actor:
        use_dynamic_bsz: true
        ppo_max_token_len_per_gpu: 25600
        ulysses_sequence_parallel_size: 2
        optim:
          lr: 1e-6 # or 5e-6, larger lr with warm up can result in better performance for SFT training.
      ref:
        log_prob_use_dynamic_bsz: ${trainer.trainer_config.actor_rollout_ref.actor.use_dynamic_bsz}
        log_prob_max_token_len_per_gpu: ${trainer.trainer_config.actor_rollout_ref.actor.ppo_max_token_len_per_gpu}
        ulysses_sequence_parallel_size: ${trainer.trainer_config.actor_rollout_ref.actor.ulysses_sequence_parallel_size}
monitor:
  monitor_type: wandb
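
For reference, here is how I understand the batch-size arithmetic in this config. This is just a sanity-check sketch in Python; the variable names mirror the YAML keys above and none of it is Trinity-RFT API:

batch_size = 64                 # buffer.batch_size (tasks per explorer step)
repeat_times = 8                # algorithm.repeat_times (rollouts per task)
train_batch_size_expert = 64    # SFT (expert) samples mixed into each step

train_batch_size_usual = batch_size * repeat_times                      # 64 * 8 = 512
ppo_mini_batch_size = train_batch_size_usual + train_batch_size_expert  # 512 + 64 = 576
expert_data_ratio = train_batch_size_expert / ppo_mini_batch_size       # 64 / 576 ~ 0.111

assert ppo_mini_batch_size == 576            # matches buffer.train_batch_size
assert round(expert_data_ratio, 3) == 0.111  # matches sample_strategy_args

# Token budget: prompt + response must fit into max_model_len.
max_response_tokens, max_model_len = 3072, 4096
assert max_response_tokens < max_model_len   # leaves 1024 tokens for the prompt

And here is how I read the mu_* knobs. This is purely illustrative, not the actual CHORD implementation; my understanding from the paper is that mu weights the SFT loss against the RL loss, roughly loss = mu * sft_loss + (1 - mu) * rl_loss, with mu warmed up to mu_peak and then decayed to mu_valley:

def mu_schedule(step, warmup, decay, peak, valley):
    # Illustrative only: linear warmup to peak, then linear decay to valley.
    if warmup > 0 and step < warmup:
        return peak * step / warmup
    if decay > 0 and step < warmup + decay:
        return peak + (valley - peak) * (step - warmup) / decay
    return valley

# With mu_warmup_steps = 0 and mu_decay_steps = 0 as above, this reading is
# constant at mu_valley = 0.1, so mu_peak = 0.5 would never take effect.
print([mu_schedule(s, 0, 0, 0.5, 0.1) for s in range(3)])  # [0.1, 0.1, 0.1]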

However, the results don't look ideal. The training reward and the AIME eval accuracy are shown below:

[Two screenshots: training reward curve and AIME eval accuracy curve]

Is there a problem with any of these settings?
