diff --git a/docs/flagrelease_en/model_list.txt b/docs/flagrelease_en/model_list.txt index f6548b9..71cfd2d 100644 --- a/docs/flagrelease_en/model_list.txt +++ b/docs/flagrelease_en/model_list.txt @@ -70,6 +70,8 @@ FlagRelease/MiniMax-M2.7-iluvatar-FlagOS FlagRelease/MiniMax-M2.7-metax-FlagOS FlagRelease/MiniMax-M2.7-nvidia-FlagOS FlagRelease/MiniMax-M2.7-zhenwu-FlagOS +FlagRelease/MiniMax-M3-mthreads-FlagOS +FlagRelease/MiniMax-M3-nvidia-FlagOS FlagRelease/QwQ-32B-FlagOS-Cambricon FlagRelease/QwQ-32B-FlagOS-Iluvatar FlagRelease/QwQ-32B-FlagOS-Nvidia @@ -105,6 +107,8 @@ FlagRelease/Qwen3.5-35B-A3B-iluvatar-FlagOS FlagRelease/Qwen3.5-397B-A17B-metax-FlagOS FlagRelease/Qwen3.5-397B-A17B-nvidia-FlagOS FlagRelease/Qwen3.5-397B-A17B-zhenwu-FlagOS +FlagRelease/Qwen3.6-27B-hygon-FlagOS +FlagRelease/Qwen3.6-27B-metax-FlagOS FlagRelease/Qwen3.6-35B-A3B-nomtp-ascend-FlagOS FlagRelease/Qwen3.6-35B-A3B-nomtp-hygon-FlagOS FlagRelease/Qwen3.6-35B-A3B-nomtp-iluvatar-FlagOS diff --git a/docs/flagrelease_en/model_readmes/FlagRelease_MiniMax-M3-mthreads-FlagOS.md b/docs/flagrelease_en/model_readmes/FlagRelease_MiniMax-M3-mthreads-FlagOS.md new file mode 100644 index 0000000..a34a15b --- /dev/null +++ b/docs/flagrelease_en/model_readmes/FlagRelease_MiniMax-M3-mthreads-FlagOS.md @@ -0,0 +1,152 @@ +--- +base_model: +- "" +language: +- zh +- en +license: apache-2.0 +--- + +# Introduction +MiniMax M3, released on June 1st, is the first Chinese model to simultaneously deliver frontier coding/agentic capabilities, 1M ultra-long context, and native multimodality — and the only open-source model in the world with all three. The core innovation is a proprietary MSA sparse attention architecture: at 1M context, compute per token is just 1/20th of the previous generation, with 9× prefilling speedup and 15× decoding speedup. 
On SWE-Bench Pro, M3 scores 59.0%, surpassing GPT-5.5 and Gemini 3.1 Pro, and approaching Opus 4.7; on the multimodal benchmark OmniDocBench, it also outperforms Gemini 3.1 Pro. In real-world tests, M3 autonomously ran for nearly 12 hours to successfully reproduce an ICLR award-winning paper, and within ~24 hours pushed FP8 GEMM kernel utilization from 7.6% to 71.3% — a 9.4× speedup. + +### Integrated Deployment +- Out-of-the-box inference scripts with pre-configured hardware and software parameters +- Released **FlagOS-Mthreads** container image supporting deployment within minutes +### Consistency Validation +- Rigorously evaluated through benchmark testing: Performance and results from the FlagOS software stack are compared against native stacks on multiple public. + + +# Evaluation Results +## Benchmark Result +| Metrics | MiniMax-M3-Nvidia-Origin | MiniMax-M3-Mthreads-FlagOS | +|--------------|-------------------------------|--------------------------------------| +| GPQA_Diamond | 0.8636 | 0.8182 | + +# User Guide +Environment Setup + +| Item | Version | +|------------------|----------------------| +| Docker Version | Docker version 27.5.1, build 9f9e405 | +| Operating System | 22.04.4 LTS (Jammy Jellyfish) | + +## Operation Steps + +### Download FlagOS Image +```bash +docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease-minimaxm3-mthreads-tree_0.5.2-gems_5.0.2-sglang_0.5.11-plugin_01.0-cx_none-python_3.10.12-torch_2.9.0-pcp_musa4.3.5-gpu_mthreads001-arc_amd64-driver_3.3.6-server:202606121704 +``` + +### Download Open-source Model Weights +```bash +pip install modelscope +modelscope download --model FlagRelease/MiniMax-M3-mthreads-FlagOS --local_dir /data/MiniMax-M3 +``` + +### Start the Container +```bash +docker run -dit \ + --name flagos \ + --privileged \ + --ipc host \ + --network host \ + --shm-size 64g \ + --env MTHREADS_VISIBLE_DEVICES=all \ + -v /data:/data \ + 
harbor.baai.ac.cn/flagrelease-public/flagrelease-minimaxm3-mthreads-tree_0.5.2-gems_5.0.2-sglang_0.5.11-plugin_01.0-cx_none-python_3.10.12-torch_2.9.0-pcp_musa4.3.5-gpu_mthreads001-arc_amd64-driver_3.3.6-server:202606121704 \ + sleep infinity +``` +### Start the Server +```bash +export SGLANG_FL_FLAGOS_BLACKLIST=cumsum,index_put,nonzero,nonzero_numpy,sort,mm,topk,isin +export MUSA_LAUNCH_BLOCKING=1 +export MCCL_TIMEOUT=14400 +export TORCH_COMPILE_DISABLE=1 + +# in node1 +SGLANG_FL_DISPATCH_LOG=/tmp/flaggems_dispatch.log nohup python -m sglang.launch_server \ +--model-path /data/MiniMax-M3 \ +--tp-size 8 --pp-size 2 \ +--nnodes 2 --node-rank 0 \ +--dist-init-addr 10.1.15.176:29500 \ +--host 0.0.0.0 --port 30000 \ +--page-size 1 --disable-cuda-graph --disable-piecewise-cuda-graph \ +--trust-remote-code --watchdog-timeout 3600 --mem-fraction-static 0.75 --max-running-requests 1 \ +> minimax3.log 2>&1 & + +# in node2 +SGLANG_FL_DISPATCH_LOG=/tmp/flaggems_dispatch.log nohup python -m sglang.launch_server \ +--model-path /data/MiniMax-M3 \ +--tp-size 8 --pp-size 2 \ +--nnodes 2 --node-rank 1 \ +--dist-init-addr 10.1.15.176:29500 \ +--host 0.0.0.0 --port 30000 \ +--page-size 1 --disable-cuda-graph --disable-piecewise-cuda-graph \ +--trust-remote-code --watchdog-timeout 3600 --mem-fraction-static 0.75 --max-running-requests 1 \ +> minimax3.log 2>&1 & +``` + +## Service Invocation +### Invocation Script +```bash +curl http://localhost:30000/v1/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "Minimax", + "prompt": "中国的首都是?", + "max_tokens": 32, + "temperature": 0 + }' +``` + + +### AnythingLLM Integration Guide + +#### 1. Download & Install + +- Visit the official site: https://anythingllm.com/ +- Choose the appropriate version for your OS (Windows/macOS/Linux) +- Follow the installation wizard to complete the setup + +#### 2. 
Configuration + +- Launch AnythingLLM +- Open settings (bottom left, fourth tab) +- Configure core LLM parameters +- Click "Save Settings" to apply changes + +#### 3. Model Interaction + +- After model loading is complete: +- Click **"New Conversation"** +- Enter your question (e.g., “Explain the basics of quantum computing”) +- Click the send button to get a response +# Technical Overview +**FlagOS** is a fully open-source system software stack designed to unify the "model–system–chip" layers and foster an open, collaborative ecosystem. It enables a “develop once, run anywhere” workflow across diverse AI accelerators, unlocking hardware performance, eliminating fragmentation among vendor-specific software stacks, and substantially lowering the cost of porting and maintaining AI workloads. With core technologies such as the **FlagScale** distributed training/inference framework (together with vllm-plugin-fl), the **FlagGems** universal operator library, the **FlagCX** communication library, and the **FlagTree** unified compiler, the **FlagRelease** platform leverages the **FlagOS** stack to automatically produce and release various combinations of \. This enables efficient and automated model migration across diverse chips, opening a new chapter for large model deployment and application. +## FlagGems +FlagGems is a high-performance, generic operator library implemented in the [Triton](https://github.com/openai/triton) language. It is built on a collection of backend-neutral kernels that aim to accelerate LLM (large language model) training and inference across diverse hardware platforms. +## FlagTree +FlagTree is an open-source, unified compiler project for multiple AI chips, dedicated to developing a diverse ecosystem of AI chip compilers and related tooling platforms, thereby fostering and strengthening the upstream and downstream Triton ecosystem. 
Currently in its initial phase, the project aims to maintain compatibility with existing adaptation solutions while unifying the codebase to rapidly implement single-repository multi-backend support. For upstream model users, it provides unified compilation capabilities across multiple backends; for downstream chip manufacturers, it offers examples of Triton ecosystem integration. +## FlagScale and vllm-plugin-fl +FlagScale is a comprehensive toolkit designed to support the entire lifecycle of large models. It builds on the strengths of several prominent open-source projects, including [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) and [vLLM](https://github.com/vllm-project/vllm), to provide a robust, end-to-end solution for managing and scaling large models. +vllm-plugin-fl is a vLLM plugin built on the FlagOS unified multi-chip backend that helps FlagScale support multiple chips on the vLLM framework. +## **FlagCX** +FlagCX is a scalable and adaptive cross-chip communication library. It serves as a platform where developers, researchers, and AI engineers can collaborate on various projects, contribute to the development of cutting-edge AI solutions, and share their work with the global community. + +## **FlagEval Evaluation Framework** + FlagEval is a comprehensive evaluation system and open platform for large models launched in 2023. It aims to establish scientific, fair, and open benchmarks, methodologies, and tools to help researchers assess model and training algorithm performance. It features: + - **Multi-dimensional Evaluation**: Supports 800+ model evaluations across NLP, CV, Audio, and Multimodal fields, covering 20+ downstream tasks including language understanding and image-text generation. + - **Industry-Grade Use Cases**: Has completed horizontal evaluations of mainstream large models, providing authoritative benchmarks for chip-model performance validation. + +# Contributing + +We warmly welcome global developers to join us: + +1. 
Submit Issues to report problems +2. Create Pull Requests to contribute code +3. Improve technical documentation +4. Expand hardware adaptation support +# License +The model weights are derived from MiniMaxAI/MiniMax-M3 and are open‑sourced under the Apache License 2.0: https://www.apache.org/licenses/LICENSE-2.0.txt + diff --git a/docs/flagrelease_en/model_readmes/FlagRelease_MiniMax-M3-nvidia-FlagOS.md b/docs/flagrelease_en/model_readmes/FlagRelease_MiniMax-M3-nvidia-FlagOS.md new file mode 100644 index 0000000..3762ddd --- /dev/null +++ b/docs/flagrelease_en/model_readmes/FlagRelease_MiniMax-M3-nvidia-FlagOS.md @@ -0,0 +1,143 @@ +--- +base_model: +- "" +language: +- zh +- en +license: apache-2.0 +--- + +# Introduction +MiniMax M3, released on June 1st, is the first Chinese model to simultaneously deliver frontier coding/agentic capabilities, 1M ultra-long context, and native multimodality — and the only open-source model in the world with all three. The core innovation is a proprietary MSA sparse attention architecture: at 1M context, compute per token is just 1/20th of the previous generation, with 9× prefilling speedup and 15× decoding speedup. On SWE-Bench Pro, M3 scores 59.0%, surpassing GPT-5.5 and Gemini 3.1 Pro, and approaching Opus 4.7; on the multimodal benchmark OmniDocBench, it also outperforms Gemini 3.1 Pro. In real-world tests, M3 autonomously ran for nearly 12 hours to successfully reproduce an ICLR award-winning paper, and within ~24 hours pushed FP8 GEMM kernel utilization from 7.6% to 71.3% — a 9.4× speedup. + +### Integrated Deployment +- Out-of-the-box inference scripts with pre-configured hardware and software parameters +- Released **FlagOS-Nvidia** container image supporting deployment within minutes +### Consistency Validation +- Rigorously evaluated through benchmark testing: Performance and results from the FlagOS software stack are compared against native stacks on multiple public. 
+ + +# Evaluation Results +## Benchmark Result +| Metrics | MiniMax-M3-Nvidia-Origin | MiniMax-M3-Nvidia-FlagOS | +|--------------|-------------------------------|-------------------------------------| +| GPQA_Diamond | 0.8636 | 0.8283 | + + +# User Guide +Environment Setup + +| Item | Version | +|------------------|----------------------| +| Docker Version | Docker version 24.0.0, build 98fdcd7 | +| Operating System | 22.04.4 LTS (Jammy Jellyfish) | + +## Operation Steps + +### Download FlagOS Image +```bash +docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease-minimax-m3-nvidia-tree_none-gems_5.0.2-sglang_plugin_0.1.0-cx_none-python_3.12.3-torch_2.11.0-pcp_cuda13.2-gpu_nvidia003-arc_amd64-driver_570.158.01:202606051536 +``` + +### Download Open-source Model Weights +```bash +pip install modelscope +modelscope download --model FlagRelease/MiniMax-M3-nvidia-FlagOS --local_dir /data/MiniMax-M3 +``` + +### Start the Container +```bash +docker run -d --name flagos-m3 \ + --gpus all \ + --network host \ + --ipc host \ + --ulimit memlock=-1 \ + --ulimit stack=67108864 \ + -v /dev/shm:/dev/shm \ + -v /root/.cache:/root/.cache \ + -v /data:/data \ + harbor.baai.ac.cn/flagrelease-public/flagrelease-minimax-m3-nvidia-tree_none-gems_5.0.2-sglang_plugin_0.1.0-cx_none-python_3.12.3-torch_2.11.0-pcp_cuda13.2-gpu_nvidia003-arc_amd64-driver_570.158.01:202606051536 \ + sleep infinity +``` +### Start the Server +```bash +export FLASHINFER_DISABLE_VERSION_CHECK=1 +export USE_FLAGGEMS=1 +export SGLANG_FL_OOT_ENABLED=1 +export SGLANG_FL_PREFER=flagos + +python3 -m sglang.launch_server \ + --model-path /data/MiniMax-M3 \ + --tp 8 --trust-remote-code --port 30000 --host 0.0.0.0 \ + --dtype bfloat16 --quantization mxfp8 \ + --attention-backend flashinfer \ + --mem-fraction-static 0.80 \ + --max-total-tokens 414018 \ + --chunked-prefill-size 4096 \ + --max-prefill-tokens 16384 \ + --disable-custom-all-reduce +``` + +## Service Invocation +### Invocation Script +```bash +curl 
http://localhost:30000/v1/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "Minimax", + "prompt": "中国的首都是?", + "max_tokens": 32, + "temperature": 0 + }' +``` + + +### AnythingLLM Integration Guide + +#### 1. Download & Install + +- Visit the official site: https://anythingllm.com/ +- Choose the appropriate version for your OS (Windows/macOS/Linux) +- Follow the installation wizard to complete the setup + +#### 2. Configuration + +- Launch AnythingLLM +- Open settings (bottom left, fourth tab) +- Configure core LLM parameters +- Click "Save Settings" to apply changes + +#### 3. Model Interaction + +- After model loading is complete: +- Click **"New Conversation"** +- Enter your question (e.g., “Explain the basics of quantum computing”) +- Click the send button to get a response +# Technical Overview +**FlagOS** is a fully open-source system software stack designed to unify the "model–system–chip" layers and foster an open, collaborative ecosystem. It enables a “develop once, run anywhere” workflow across diverse AI accelerators, unlocking hardware performance, eliminating fragmentation among vendor-specific software stacks, and substantially lowering the cost of porting and maintaining AI workloads. With core technologies such as the **FlagScale** distributed training/inference framework (together with vllm-plugin-fl), the **FlagGems** universal operator library, the **FlagCX** communication library, and the **FlagTree** unified compiler, the **FlagRelease** platform leverages the **FlagOS** stack to automatically produce and release various combinations of \. This enables efficient and automated model migration across diverse chips, opening a new chapter for large model deployment and application. +## FlagGems +FlagGems is a high-performance, generic operator library implemented in the [Triton](https://github.com/openai/triton) language. 
It is built on a collection of backend-neutral kernels that aim to accelerate LLM (large language model) training and inference across diverse hardware platforms. +## FlagTree +FlagTree is an open-source, unified compiler project for multiple AI chips, dedicated to developing a diverse ecosystem of AI chip compilers and related tooling platforms, thereby fostering and strengthening the upstream and downstream Triton ecosystem. Currently in its initial phase, the project aims to maintain compatibility with existing adaptation solutions while unifying the codebase to rapidly implement single-repository multi-backend support. For upstream model users, it provides unified compilation capabilities across multiple backends; for downstream chip manufacturers, it offers examples of Triton ecosystem integration. +## FlagScale and vllm-plugin-fl +FlagScale is a comprehensive toolkit designed to support the entire lifecycle of large models. It builds on the strengths of several prominent open-source projects, including [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) and [vLLM](https://github.com/vllm-project/vllm), to provide a robust, end-to-end solution for managing and scaling large models. +vllm-plugin-fl is a vLLM plugin built on the FlagOS unified multi-chip backend that helps FlagScale support multiple chips on the vLLM framework. +## **FlagCX** +FlagCX is a scalable and adaptive cross-chip communication library. It serves as a platform where developers, researchers, and AI engineers can collaborate on various projects, contribute to the development of cutting-edge AI solutions, and share their work with the global community. + +## **FlagEval Evaluation Framework** + FlagEval is a comprehensive evaluation system and open platform for large models launched in 2023. It aims to establish scientific, fair, and open benchmarks, methodologies, and tools to help researchers assess model and training algorithm performance. 
It features: + - **Multi-dimensional Evaluation**: Supports 800+ model evaluations across NLP, CV, Audio, and Multimodal fields, covering 20+ downstream tasks including language understanding and image-text generation. + - **Industry-Grade Use Cases**: Has completed horizontal evaluations of mainstream large models, providing authoritative benchmarks for chip-model performance validation. + +# Contributing + +We warmly welcome global developers to join us: + +1. Submit Issues to report problems +2. Create Pull Requests to contribute code +3. Improve technical documentation +4. Expand hardware adaptation support +# License +The model weights are derived from MiniMaxAI/MiniMax-M3 and are open‑sourced under the Apache License 2.0: https://www.apache.org/licenses/LICENSE-2.0.txt + diff --git a/docs/flagrelease_en/model_readmes/FlagRelease_Qwen3.6-27B-hygon-FlagOS.md b/docs/flagrelease_en/model_readmes/FlagRelease_Qwen3.6-27B-hygon-FlagOS.md new file mode 100644 index 0000000..b8b68b0 --- /dev/null +++ b/docs/flagrelease_en/model_readmes/FlagRelease_Qwen3.6-27B-hygon-FlagOS.md @@ -0,0 +1,145 @@ +--- +base_model: +- "" +language: +- zh +- en +license: apache-2.0 +--- + +# Introduction +The first open-weight release of Qwen3.6 is now available. Building on the Qwen3.5 series released in February and shaped by direct community feedback, Qwen3.6 prioritizes stability and real-world utility to deliver a more intuitive, responsive, and productive coding experience. Key improvements include enhanced agentic coding capabilities for frontend workflows and repository-level reasoning, along with a new thinking preservation option that retains reasoning context from historical messages to streamline iterative development. 
+ +### Integrated Deployment +- Out-of-the-box inference scripts with pre-configured hardware and software parameters +- Released **FlagOS-Hygon** container image supporting deployment within minutes +### Consistency Validation +- Rigorously evaluated through benchmark testing: Performance and results from the FlagOS software stack are compared against native stacks on multiple public. + + +# Evaluation Results +## Benchmark Result +| Metrics | Qwen3.6-27B-Nvidia-Origin | Qwen3.6-27B-Hygon-FlagOS | +|--------------|---------------------------|--------------------------| +| GPQA_Diamond | 85.86 | 82.83 | +| ERQA | 59.25 | 50.5 | + +# User Guide +Environment Setup + +| Item | Version | +|------------------|----------------------| +| Docker Version | Docker version 28.2.2, build 28.2.2-0ubuntu1~22.04.1 | +| Operating System | Ubuntu 22.04.4 LTS (Jammy Jellyfish) | + +## Operation Steps + +### Download FlagOS Image +```bash +docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease-qwen3.6-27b-hygon-tree_0.5.1_hcu3.6-gems_5.0.2-vllm_0.20.0_das.dtk2604-plugin_0.1.1_vllm0.13.0.g90e8c497e-cx_none-python_3.10.12-torch_2.10.0_das.opt1.dtk2604.20260325.g:202606101401 +``` + +### Download Open-source Model Weights +```bash +pip install modelscope +modelscope download --model FlagRelease/Qwen3.6-27B-hygon-FlagOS --local_dir /data/Qwen3.6-27B +``` + +### Start the Container +```bash +docker run \ + --name flagos \ + --network=host \ + --ipc=host \ + --device=/dev/kfd \ + --device=/dev/mkfd \ + --device=/dev/dri \ + -v /opt/hyhal:/opt/hyhal \ + -v /data:/data \ + --group-add video \ + --cap-add=SYS_PTRACE \ + --security-opt seccomp=unconfined \ + -itd \ + harbor.baai.ac.cn/flagrelease-public/flagrelease-qwen3.6-27b-hygon-tree_0.5.1_hcu3.6-gems_5.0.2-vllm_0.20.0_das.dtk2604-plugin_0.1.1_vllm0.13.0.g90e8c497e-cx_none-python_3.10.12-torch_2.10.0_das.opt1.dtk2604.20260325.g:202606101401 \ + bash +docker exec -it flagos bash +``` +### Start the Server +```bash +export 
GEMS_VENDOR="hygon" +export VLLM_PLUGINS="fl" +export VLLM_FL_FLAGOS_WHITELIST="cos,cumsum,fill,full,gather,gt,le,lt,max,mul,sin,softmax,to,where,zeros,zeros_like" +vllm serve /data/Qwen3.6-27B \ + --port 8000 \ + --trust-remote-code \ + --served-model-name flagOS \ + --dtype bfloat16 \ + --tensor-parallel-size 2 \ + --gpu-memory-utilization 0.925 \ + --max-model-len 262144 \ + --reasoning-parser qwen3 \ + --no-enable-log-requests \ + --no-enable-prefix-caching +``` + +## Service Invocation +### Invocation Script +```bash +curl http://localhost:8000/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "flagOS", + "messages": [{"role": "user", "content": "你好"}] + }' +``` + + +### AnythingLLM Integration Guide + +#### 1. Download & Install + +- Visit the official site: https://anythingllm.com/ +- Choose the appropriate version for your OS (Windows/macOS/Linux) +- Follow the installation wizard to complete the setup + +#### 2. Configuration + +- Launch AnythingLLM +- Open settings (bottom left, fourth tab) +- Configure core LLM parameters +- Click "Save Settings" to apply changes + +#### 3. Model Interaction + +- After model loading is complete: +- Click **"New Conversation"** +- Enter your question (e.g., “Explain the basics of quantum computing”) +- Click the send button to get a response +# Technical Overview +**FlagOS** is a fully open-source system software stack designed to unify the "model–system–chip" layers and foster an open, collaborative ecosystem. It enables a “develop once, run anywhere” workflow across diverse AI accelerators, unlocking hardware performance, eliminating fragmentation among vendor-specific software stacks, and substantially lowering the cost of porting and maintaining AI workloads. 
With core technologies such as the **FlagScale** distributed training/inference framework (together with vllm-plugin-fl), the **FlagGems** universal operator library, the **FlagCX** communication library, and the **FlagTree** unified compiler, the **FlagRelease** platform leverages the **FlagOS** stack to automatically produce and release various combinations of \. This enables efficient and automated model migration across diverse chips, opening a new chapter for large model deployment and application. +## FlagGems +FlagGems is a high-performance, generic operator library implemented in the [Triton](https://github.com/openai/triton) language. It is built on a collection of backend-neutral kernels that aim to accelerate LLM (large language model) training and inference across diverse hardware platforms. +## FlagTree +FlagTree is an open-source, unified compiler project for multiple AI chips, dedicated to developing a diverse ecosystem of AI chip compilers and related tooling platforms, thereby fostering and strengthening the upstream and downstream Triton ecosystem. Currently in its initial phase, the project aims to maintain compatibility with existing adaptation solutions while unifying the codebase to rapidly implement single-repository multi-backend support. For upstream model users, it provides unified compilation capabilities across multiple backends; for downstream chip manufacturers, it offers examples of Triton ecosystem integration. +## FlagScale and vllm-plugin-fl +FlagScale is a comprehensive toolkit designed to support the entire lifecycle of large models. It builds on the strengths of several prominent open-source projects, including [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) and [vLLM](https://github.com/vllm-project/vllm), to provide a robust, end-to-end solution for managing and scaling large models. 
+vllm-plugin-fl is a vLLM plugin built on the FlagOS unified multi-chip backend that helps FlagScale support multiple chips on the vLLM framework. 
+## **FlagCX** +FlagCX is a scalable and adaptive cross-chip communication library. It serves as a platform where developers, researchers, and AI engineers can collaborate on various projects, contribute to the development of cutting-edge AI solutions, and share their work with the global community. + +## **FlagEval Evaluation Framework** + FlagEval is a comprehensive evaluation system and open platform for large models launched in 2023. It aims to establish scientific, fair, and open benchmarks, methodologies, and tools to help researchers assess model and training algorithm performance. It features: + - **Multi-dimensional Evaluation**: Supports 800+ model evaluations across NLP, CV, Audio, and Multimodal fields, covering 20+ downstream tasks including language understanding and image-text generation. + - **Industry-Grade Use Cases**: Has completed horizontal evaluations of mainstream large models, providing authoritative benchmarks for chip-model performance validation. + +# Contributing + +We warmly welcome global developers to join us: + +1. Submit Issues to report problems +2. Create Pull Requests to contribute code +3. Improve technical documentation +4. Expand hardware adaptation support +# License +The model weights are derived from Qwen/Qwen3.6-27B and are open‑sourced under the Apache License 2.0: https://www.apache.org/licenses/LICENSE-2.0.txt + diff --git a/docs/flagrelease_en/model_readmes/FlagRelease_Qwen3.6-27B-metax-FlagOS.md b/docs/flagrelease_en/model_readmes/FlagRelease_Qwen3.6-27B-metax-FlagOS.md new file mode 100644 index 0000000..407d2dd --- /dev/null +++ b/docs/flagrelease_en/model_readmes/FlagRelease_Qwen3.6-27B-metax-FlagOS.md @@ -0,0 +1,142 @@ +--- +base_model: +- "" +language: +- zh +- en +license: apache-2.0 +--- + +# Introduction +The first open-weight release of Qwen3.6 is now available. 
Building on the Qwen3.5 series released in February and shaped by direct community feedback, Qwen3.6 prioritizes stability and real-world utility to deliver a more intuitive, responsive, and productive coding experience. Key improvements include enhanced agentic coding capabilities for frontend workflows and repository-level reasoning, along with a new thinking preservation option that retains reasoning context from historical messages to streamline iterative development. + + +### Integrated Deployment +- Out-of-the-box inference scripts with pre-configured hardware and software parameters +- Released **FlagOS-Metax** container image supporting deployment within minutes +### Consistency Validation +- Rigorously evaluated through benchmark testing: Performance and results from the FlagOS software stack are compared against native stacks on multiple public. + + +# Evaluation Results +## Benchmark Result +| Metrics | Qwen3.6-27B-Nvidia-Origin | Qwen3.6-27B-Metax-FlagOS | +|--------------|---------------------------|--------------------------| +| GPQA_Diamond | 85.86 | 84.34 | +| ERQA | 59.25 | 57.5 | + +# User Guide +Environment Setup + +| Item | Version | +|------------------|----------------------| +| Docker Version | Docker version 27.5.1, build 27.5.1-0ubuntu3~22.04.2 | +| Operating System | Ubuntu 22.04.5 LTS (Jammy Jellyfish) | + +## Operation Steps + +### Download FlagOS Image +```bash +docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease-qwen3.6-27b-metax-tree_0.5.1_metax3.0-gems_5.0.2-vllm_0.13.0_empty-plugin_0.1.0_vllm0.13.0-cx_0.8.0-python_3.12.11-torch_2.8.0_metax3.3.0.2-pcp_maca3.3.0.15-gpu_metax001-arc_amd64-driver_3.8.1:202606090203 +``` + +### Download Open-source Model Weights +```bash +pip install modelscope +modelscope download --model FlagRelease/Qwen3.6-27B-metax-FlagOS --local_dir /data/Qwen3.6-27B +``` + +### Start the Container +```bash +docker run -itd \ + --name flagos \ + --privileged \ + --network=host \ + --security-opt 
seccomp=unconfined \ + --security-opt apparmor=unconfined \ + --shm-size '100gb' \ + --ulimit memlock=-1 \ + --group-add video \ + --device=/dev/dri \ + --device=/dev/mxcd \ + --device=/dev/mem \ + --device=/dev/infiniband \ + -v /usr/local/:/usr/local/ \ + -v /data/:/data/ \ + harbor.baai.ac.cn/flagrelease-public/flagrelease-qwen3.6-27b-metax-tree_0.5.1_metax3.0-gems_5.0.2-vllm_0.13.0_empty-plugin_0.1.0_vllm0.13.0-cx_0.8.0-python_3.12.11-torch_2.8.0_metax3.3.0.2-pcp_maca3.3.0.15-gpu_metax001-arc_amd64-driver_3.8.1:202606090203 \ + /bin/bash + docker exec -it flagos /bin/bash + +``` +### Start the Server +```bash +FLAGGEMS_VENDOR=metax \ +CUDA_VISIBLE_DEVICES=0,1 \ +VLLM_FL_FLAGOS_WHITELIST=cat,cos,cumsum,fill,full,gather,gt,le,lt,max,mul,sin,softmax,to,where,zeros,zeros_like \ +vllm serve /data/Qwen3.6-27B \ + --tensor-parallel-size 2 --port 8000 --trust-remote-code --dtype bfloat16 \ + --served-model-name flagOS \ + --max-num-batched-tokens 65536 --max-num-seqs 256 --async-scheduling +``` + +## Service Invocation +### Invocation Script +```bash +curl http://localhost:8000/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "flagOS", + "messages": [{"role": "user", "content": "你好"}] + }' +``` + + +### AnythingLLM Integration Guide + +#### 1. Download & Install + +- Visit the official site: https://anythingllm.com/ +- Choose the appropriate version for your OS (Windows/macOS/Linux) +- Follow the installation wizard to complete the setup + +#### 2. Configuration + +- Launch AnythingLLM +- Open settings (bottom left, fourth tab) +- Configure core LLM parameters +- Click "Save Settings" to apply changes + +#### 3. 
Model Interaction + +- After model loading is complete: +- Click **"New Conversation"** +- Enter your question (e.g., “Explain the basics of quantum computing”) +- Click the send button to get a response +# Technical Overview +**FlagOS** is a fully open-source system software stack designed to unify the "model–system–chip" layers and foster an open, collaborative ecosystem. It enables a “develop once, run anywhere” workflow across diverse AI accelerators, unlocking hardware performance, eliminating fragmentation among vendor-specific software stacks, and substantially lowering the cost of porting and maintaining AI workloads. With core technologies such as the **FlagScale**, together with vllm-plugin-fl, distributed training/inference framework, **FlagGems** universal operator library, **FlagCX** communication library, and **FlagTree** unified compiler, the **FlagRelease** platform leverages the **FlagOS** stack to automatically produce and release various combinations of \. This enables efficient and automated model migration across diverse chips, opening a new chapter for large model deployment and application. +## FlagGems +FlagGems is a high-performance, generic operator libraryimplemented in [Triton](https://github.com/openai/triton) language. It is built on a collection of backend-neutralkernels that aims to accelerate LLM (Large-Language Models) training and inference across diverse hardware platforms. +## FlagTree +FlagTree is an open source, unified compiler for multipleAI chips project dedicated to developing a diverse ecosystem of AI chip compilers and related tooling platforms, thereby fostering and strengthening the upstream and downstream Triton ecosystem. Currently in its initial phase, the project aims to maintain compatibility with existing adaptation solutions while unifying the codebase to rapidly implement single-repository multi-backend support. 
For upstream model users, it provides unified compilation capabilities across multiple backends; for downstream chip manufacturers, it offers examples of Triton ecosystem integration. +## FlagScale and vllm-plugin-fl +FlagScale is a comprehensive toolkit designed to support the entire lifecycle of large models. It builds on the strengths of several prominent open-source projects, including [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) and [vLLM](https://github.com/vllm-project/vllm), to provide a robust, end-to-end solution for managing and scaling large models. +vllm-plugin-fl is a vLLM plugin built on the FlagOS unified multi-chip backend; it helps FlagScale support multiple chips on the vLLM framework. +## **FlagCX** +FlagCX is a scalable and adaptive cross-chip communication library. It serves as a platform where developers, researchers, and AI engineers can collaborate on various projects, contribute to the development of cutting-edge AI solutions, and share their work with the global community. + +## **FlagEval Evaluation Framework** + FlagEval is a comprehensive evaluation system and open platform for large models launched in 2023. It aims to establish scientific, fair, and open benchmarks, methodologies, and tools to help researchers assess model and training algorithm performance. It features: + - **Multi-dimensional Evaluation**: Supports 800+ model evaluations across NLP, CV, Audio, and Multimodal fields, covering 20+ downstream tasks including language understanding and image-text generation. + - **Industry-Grade Use Cases**: Has completed horizontal evaluations of mainstream large models, providing authoritative benchmarks for chip-model performance validation. + +# Contributing + +We warmly welcome global developers to join us: + +1. Submit Issues to report problems +2. Create Pull Requests to contribute code +3. Improve technical documentation +4. 
Expand hardware adaptation support +# License +The model weights are derived from Qwen/Qwen3.6-27B and are open‑sourced under the Apache License 2.0: https://www.apache.org/licenses/LICENSE-2.0.txt + diff --git a/docs/flagrelease_en/model_readmes/FlagRelease_Qwen3.6-35B-A3B-nomtp-metax-FlagOS.md b/docs/flagrelease_en/model_readmes/FlagRelease_Qwen3.6-35B-A3B-nomtp-metax-FlagOS.md index de5e09b..16d48e5 100644 --- a/docs/flagrelease_en/model_readmes/FlagRelease_Qwen3.6-35B-A3B-nomtp-metax-FlagOS.md +++ b/docs/flagrelease_en/model_readmes/FlagRelease_Qwen3.6-35B-A3B-nomtp-metax-FlagOS.md @@ -1,7 +1,10 @@ --- -base_model: -- "" +license: apache-2.0 +language: +- zh +- en --- + # Introduction **Qwen3.6-35B-A3B** is a fully open-source sparse MoE model (35B total parameters / 3B active parameters) that excels at agentic coding, significantly outperforming its predecessor Qwen3.5-35B-A3B and holding its own against dense models such as Qwen3.5-27B and Gemma4-31B. Key features include: @@ -19,8 +22,8 @@ base_model: ## Benchmark Result |Metrics|Qwen3.6-35B-A3B-nomtp-Nvidia-Origin|Qwen3.6-35B-A3B-nomtp-Metax-FlagOS| |-------|---------------|---------------| -|GPQA_Diamond |0.8283 |0.8384| -|ERQA | 0.5875 | 0.55| +|GPQA_Diamond |0.8283 |0.8081| +|ERQA | 0.5875 | 0.555| # User Guide Environment Setup @@ -34,7 +37,7 @@ Environment Setup ### Download FlagOS Image ```bash -docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease-metax-release-model_qwen3.6-35b-a3b-nomtp-tree_none-gems_4.2.0-vllm_0.13.0-cx_none-python_3.12.11-torch_musa-2.8.0-pcp_maca3.3.0.15-gpu_metax-arc_amd64-driver_2.2.9:202604152134 +docker pull harbor.baai.ac.cn/flagrelease-public/flagrelease-qwen3.6-35b-a3b-nomtp-metax-tree_none-gems_4.2.0-vllm_0.13.0_empty-plugin_0.0.0-cx_0.8.0-python_3.12.11-torch_2.8.0_metax3.3.0.2-pcp_maca3.3.0.15-gpu_metax001-arc_amd64-driver_2.15.9:202606100608 ``` ### Download Open-source Model Weights @@ -46,30 +49,43 @@ modelscope download --model 
FlagRelease/Qwen3.6-35B-A3B-nomtp-metax-FlagOS --loc ### Start the Container ```bash #Container Startup -docker run -itd \ - --name flagos \ - --privileged \ - --network=host \ - --security-opt seccomp=unconfined \ - --security-opt apparmor=unconfined \ - --shm-size '100gb' \ - --ulimit memlock=-1 \ - --group-add video \ - --device=/dev/dri \ - --device=/dev/mxcd \ - --device=/dev/mem \ - --device=/dev/infiniband \ - -v /usr/local/:/usr/local/ \ - -v /data/:/data/ \ - harbor.baai.ac.cn/flagrelease-public/flagrelease-metax-release-model_qwen3.6-35b-a3b-nomtp-tree_none-gems_4.2.0-vllm_0.13.0-cx_none-python_3.12.11-torch_musa-2.8.0-pcp_maca3.3.0.15-gpu_metax-arc_amd64-driver_2.2.9:202604152134 \ - /bin/bash +docker run -itd \ + --name flagos \ + --privileged \ + --network=host \ + --security-opt seccomp=unconfined \ + --security-opt apparmor=unconfined \ + --shm-size '100gb' \ + --ulimit memlock=-1 \ + --group-add video \ + --device=/dev/dri \ + --device=/dev/mxcd \ + -p 8000:8000 \ + --env CUDA_VISIBLE_DEVICES=0,1 \ + --device=/dev/mem \ + --device=/dev/infiniband \ + -v /usr/local/:/usr/local/ \ + -v /data/:/data/ \ + harbor.baai.ac.cn/flagrelease-public/flagrelease-qwen3.6-35b-a3b-nomtp-metax-tree_none-gems_4.2.0-vllm_0.13.0_empty-plugin_0.0.0-cx_0.8.0-python_3.12.11-torch_2.8.0_metax3.3.0.2-pcp_maca3.3.0.15-gpu_metax001-arc_amd64-driver_2.15.9:202606100608 /bin/bash docker exec -it flagos /bin/bash ``` ### Start the Server ```bash -USE_FLAGGEMS=1 vllm serve /data/Qwen3.6-35B-A3B-nomtp --tensor-parallel-size 2 --port 8000 --served-model-name qwen36 +export USE_FLAGGEMS=1 +export VLLM_PLUGINS=fl +export VLLM_FL_PLATFORM=maca +export CUDA_VISIBLE_DEVICES=0,1 +export VLLM_FL_PREFER=flagos +export VLLM_FL_SKIP_ATEN_OVERRIDE=1 +export VLLM_FL_NO_MCOP_MOESUM=1 +export VLLM_FL_MCOP_MOEALIGN=1 +export VLLM_FL_MOE_TUNED_CFG=1 +export MACA_PATH=/opt/maca +export LD_LIBRARY_PATH=/opt/maca/lib:/opt/maca/mxgpu_llvm/lib:/opt/maca/ompi/lib +export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True +vllm serve 
/data/Qwen3.6-35B-A3B-nomtp --served-model-name qwen36 --host 0.0.0.0 --port 8000 --trust-remote-code --max-model-len 73728 --gpu-memory-utilization 0.90 --tensor-parallel-size 2 --no-enable-prefix-caching --compilation-config '{"cudagraph_mode":"FULL"}' --max-num-batched-tokens 16384 --block-size 32 ``` ## Service Invocation
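As a minimal sketch, the server started above could also be called from Python using only the standard library. It assumes the server is reachable at `http://localhost:8000`, that it exposes the OpenAI-compatible `/v1/chat/completions` route, and that the served model name is `qwen36`, as passed to `vllm serve`. The helper names `build_chat_request` and `chat` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000", model="qwen36"):
    """Build (but do not send) a POST request for the chat completions route.

    base_url and model are assumptions taken from the `vllm serve` flags above
    (--port 8000, --served-model-name qwen36); adjust them for your deployment.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the text of the first returned choice."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Once the server has finished loading the model, `print(chat("Hello"))` run on the same host should print the reply. Building the request separately from sending it makes the payload easy to inspect without a running server.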