This test suite benchmarks the bandwidth and latency of data transfers between the CPU and global memory. It focuses on the transfer speeds of various APIs (the XRT native API and OpenCL/OCL) between the host and the FPGA board.
Below, you will find setup and execution instructions to reproduce my results.
The emulations were run on both CentOS 7.9.2009 and CentOS 8.4.2105. Their respective disk image (`.iso`) files can be found in the CentOS vault mirror provided by Internet Initiative Japan, linked here:
https://ftp.iij.ad.jp/pub/linux/centos-vault/
If you plan to install it locally, save the disk image and flash it onto a bootable USB drive with at least 16 GB of storage. To flash it, I recommend using https://rufus.ie/.
Next, reboot your computer and enter the BIOS, where you can select the flashed USB drive as the boot device. Boot from it, then follow the on-screen setup instructions to install CentOS. Here is a useful tutorial to help visualize the steps: https://www.youtube.com/watch?v=4viTo4gulQk.
If you plan to run the tests locally, you first need to set up the environment. Keep the software versions consistent across the following tools; for my emulation and hardware tests, the Xilinx tool version used was 2021.2.
The required software is listed below:
- Vivado and Vitis

  For a simpler installation, use the Linux Self-Extracting Web Installer or the Single-File Installer, then follow the instructions from the installer.

  Vitis: https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/vitis.html

  The default installation location for these tools is the `/opt/Xilinx/` folder. If they are not placed there automatically, you will need to copy them over manually.

- Xilinx Runtime (XRT) and Platform Packages

  The target platform used for emulation was the Alveo U200; however, any of the supported platforms listed at https://xilinx.github.io/Vitis_Accel_Examples/2021.2/html/shells.html can be used.

  Xilinx Runtime (XRT) & Deployment Target Platform: https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/alveo.html

  The above will provide you with two `*.rpm` files. Ensure that you have admin access, then run `sudo yum install <file>.rpm` for each `.rpm` file. The default installation location for XRT is the `/opt/xilinx/` folder, and the target platform can be found under `/opt/xilinx/platforms/`. If they are not placed there automatically, you will need to copy them over manually.

- Python

  Ensure that Python >= 2.7.5 is installed by running `python --version`. You will also need Python 3.6+ to run the scripts that parse the logs and graph the data.

  On CentOS 7, Python 2.7.5 is installed by default; that is not the case on CentOS 8. The latest Python version packaged for CentOS 8 is 3.9.6. To install it, run `sudo yum install python39`.
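Since the parsing and graphing scripts require Python 3.6+, a quick sanity check can confirm which interpreter version the scripts will see. This is a minimal sketch, not part of the repository; the function name is illustrative:

```python
import sys

def check_python(minimum=(3, 6)):
    """Return the running interpreter's version string, raising if it is
    older than the minimum required by the parsing/graphing scripts."""
    if sys.version_info < minimum:
        raise RuntimeError(
            "Python %d.%d+ required, found %s"
            % (minimum[0], minimum[1], sys.version.split()[0])
        )
    return sys.version.split()[0]

print(check_python())  # e.g. "3.9.6" on a CentOS 8 box with python39 installed
```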
If you plan to run the tests on an AWS EC2 F1 Instance, you will need:
- Set up AWS & AWS permissions

  Assuming that you already have an AWS account, you will need to:

  - Set up AWS IAM permissions for creating FPGA images.

    Navigate to `IAM > Policies > Create policy`. Select the service `EC2`; check the permissions `CreateFpgaImage` and `DescribeFpgaImages`; for `Resources`, select `All`; then click `Next`. Name the policy and create it.

    Next, go to `Users` or `User groups` and add the permissions to your user or to the group you belong to.

  - Set up an S3 bucket for the compiled target (`*.awsxclbin`).

    First, install the AWS CLI. Then set your credentials (found on your console.aws.amazon.com page), your region (`us-east-1`), and the output format (`json`):

    `aws configure`

    Create an S3 bucket (choose a unique bucket name):

    `aws s3 mb s3://<bucket-name> --region us-east-1`

    Create a temporary file used to create folders in the S3 bucket:

    `touch temp.txt`

    Choose a Design Checkpoint (DCP) folder name:

    `aws s3 cp temp.txt s3://<bucket-name>/<dcp-folder-name>/`

    Choose a logs folder name:

    `aws s3 cp temp.txt s3://<bucket-name>/<logs-folder-name>/`
- FPGA Developer AMI

  https://aws.amazon.com/marketplace/pp/prodview-gimv3gqbpe57k

  Subscribe to this software and continue to configuration. From here, choose software version `1.12.2 (Jan 31, 2023)` in order to use the 2021.2 version of the tools. Make sure that the region is set to `US East (N. Virginia)`, as that is the closest location that supports EC2 F1 instances.

  Next, continue to launch. In the `Choose Action` box, select `Launch through EC2`. If you plan to run emulations, choose the `z1d.2xlarge` instance; if you plan to run on hardware/FPGA, choose the `f1.2xlarge` instance.

  In the `Configure storage` block, click `Advanced`, expand Volume 2, and change `Delete on termination` to `Yes`. You can now click `Launch instance`.

  From here, assuming AWS permissions are set up correctly, you will need to set up an `Elastic IP` for this EC2 instance. First, click `Allocate Elastic IP address`, then click `Allocate`. Next, click `Actions` and select the newly created instance and its private IP address.

  Next, edit your security group's `Inbound rules` by creating a new rule with `SSH` and `My IP`, then click `Save rules` to whitelist your IP so it is allowed to access the instance.

  Now, to connect to the EC2 instance through SSH, run:

  `ssh -i "<path>/<to>/<private>/<key>" centos@<public ip address>`

- Set up the EC2 instance

  - Patch the outdated CentOS mirror list:
    sudo sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
    sudo sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
    sudo sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo

    NOTE: you will need to copy the commands from https://serverfault.com/a/1161847 due to the way LaTeX formats the caret.

    Once updated, refresh the cache:

    yum clean all ; yum makecache

  - Set up Vitis:

    git clone https://github.com/aws/aws-fpga.git $AWS_FPGA_REPO_DIR
    cd $AWS_FPGA_REPO_DIR
    source vitis_setup.sh

  - Set up XRT

    Replace the release tag below with your version from the table at https://github.com/aws/aws-fpga/blob/master/Vitis/docs/XRT_installation_instructions.md; the value shown is correct for version 2021.2:

    XRT_RELEASE_TAG=202120.2.12.427
    cd aws-fpga
    source vitis_setup.sh
    cd $VITIS_DIR/Runtime
    export XRT_PATH="${VITIS_DIR}/Runtime/${XRT_RELEASE_TAG}"
    git clone http://www.github.com/Xilinx/XRT.git -b ${XRT_RELEASE_TAG} ${XRT_PATH}
    cd ${XRT_PATH}
    sudo ./src/runtime_src/tools/scripts/xrtdeps.sh
    cd build
    scl enable devtoolset-9 bash
    ./build.sh
    cd Release
    sudo yum reinstall xrt_*-aws.rpm -y

    If the above fails, you may need to clear the cache again and rerun the commands:

    yum clean all ; yum makecache
NOTE: you will need to source the following two scripts before you compile or run anything:

  source /home/centos/src/project_data/aws-fpga/vitis_setup.sh
  source /home/centos/src/project_data/aws-fpga/vitis_runtime_setup.sh

The value for PLATFORM is `$AWS_PLATFORM`.
Now that the environment is set up, refer to EXECUTION for steps to run the tests.
Before you can run anything, you will need to clone the repository, in which you will find three folders: `local_emu`, `aws_emu`, and `aws_hw`. Note that emulation will take a long time for larger data sizes.
If your Python binary is not on the standard path, you will need to `export python=/path/to/python` so that the bash scripts can invoke the Python scripts.
These tests are set up so that `./src/template` contains 6 designs, each testing one of the data transfer actions from the above list. Each design performs the same vector-add function: two identical vectors of variable size are added, and the result is returned and verified. The size of the transferred data/vectors is incremented by powers of 2, from the smallest possible value to the largest possible value.
There are 3 parameters/variables that are needed:
- Vector A: The first vector to be added.
- Vector B: The second vector to be added.
- Size: The size of the vectors, passed so the FPGA knows the vector length for the addition.
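The per-design check described above can be sketched as a plain-Python golden model. This is illustrative only; the real designs do this in OpenCL/XRT host code, and all names here are hypothetical:

```python
import random

def vector_add_check(size):
    """Build two random integer vectors of `size` elements, add them
    (stand-in for the FPGA kernel), and verify the result element-wise."""
    a = [random.randrange(1 << 16) for _ in range(size)]
    b = [random.randrange(1 << 16) for _ in range(size)]
    result = [x + y for x, y in zip(a, b)]  # the FPGA kernel would compute this
    assert all(r == x + y for r, x, y in zip(result, a, b)), "verification failed"
    return result

# Sweep sizes by powers of two, as the test suite does
# (the real suite goes from the smallest to the largest supported size).
for size in (2 ** p for p in range(4, 8)):  # 16, 32, 64, 128 elements
    vector_add_check(size)
```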
In a typical test run (`./src/run_compile.sh`), each design folder is copied to `./src/$PROJECT`, where it is compiled for the target platform and then run. The compilation stage is done asynchronously for speed, while the executions are run one at a time to record maximum performance. As the designs are compiled and run, information about the run is printed out, and everything is logged under `./logs`, one log per design (6 logs per test run, minus any designs that are unsupported or not run).
From here, to gather the size of each data transfer and its elapsed time from the test runs, run `./src/parse_logs.py`, which creates `.csv` files with that information in `./logs_data`. To graph the `.csv` files, run `./src/graph_data.py`, which plots transfer size on the x-axis and elapsed time on the y-axis on a log10 scale.
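As a rough illustration of what `parse_logs.py` produces, the following sketch extracts (size, elapsed-time) pairs and writes them to CSV. The log-line format shown here is an assumption for the example; the real logs may be formatted differently:

```python
import csv
import re

# Assumed log-line format (hypothetical):
#   "size: 1024 bytes, elapsed: 0.000123 s"
LINE_RE = re.compile(r"size:\s*(\d+)\s*bytes.*elapsed:\s*([0-9.eE+-]+)\s*s")

def parse_log(lines):
    """Extract (size_bytes, elapsed_seconds) pairs from benchmark log lines,
    skipping lines that do not match the expected format."""
    rows = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2))))
    return rows

def write_csv(rows, path):
    """Write the parsed pairs to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["size_bytes", "elapsed_s"])
        writer.writerows(rows)
```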
- Navigate to the `local_emu` folder: `cd local_emu`
- Change the `PLATFORM` variable in `src/run_compile.sh` to the target platform found in `/opt/xilinx/platforms/` (the folder name of the platform). Save and exit.
- Change the `data_sizes` variable in `src/update_data_size.py` to the data size values you want. Each value is the number of vector elements to transfer; for example, `data_sizes = [16, 32]` transfers 16 integers in the first test run and 32 integers in the next. Note that to calculate the total bytes sent, multiply the number of integers by 4. Save and exit.
- Go into the `src` folder and run `run_compile.sh`: `cd src; sh run_compile.sh`
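The bytes-per-run arithmetic from the `data_sizes` step can be written out directly. A trivial sketch: `data_sizes` mirrors the variable in `src/update_data_size.py`, and each element is a 4-byte integer as noted above:

```python
def total_bytes(num_ints, bytes_per_int=4):
    """Total bytes moved for a vector of `num_ints` 4-byte integers."""
    return num_ints * bytes_per_int

data_sizes = [16, 32]  # example values from the step above
print([total_bytes(n) for n in data_sizes])  # [64, 128]
```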
- Navigate to the `aws_emu` folder: `cd aws_emu`
- Change the `data_sizes` variable in `src/update_data_size.py` to the data size values you want. Each value is the number of vector elements to transfer; for example, `data_sizes = [16, 32]` transfers 16 integers in the first test run and 32 integers in the next. Note that to calculate the total bytes sent, multiply the number of integers by 4. Save and exit.
- Go into the `src` folder and run `run_compile.sh`: `cd src; sh run_compile.sh`
- Start on a `z1d.2xlarge` instance and navigate to the `aws_hw` folder: `cd aws_hw`
- Change the `data_sizes` variable in `src/update_data_size.py` to the data size values you want. Each value is the number of vector elements to transfer; for example, `data_sizes = [16, 32]` transfers 16 integers in the first test run and 32 integers in the next. Note that to calculate the total bytes sent, multiply the number of integers by 4. Save and exit.
- Go into the `src` folder and run `run_compile.sh`. This step may take up to 3 hours: `cd src; sh run_compile.sh`
- Create a backup of the newly compiled binary files. This creates a `.backup` folder in the `aws_hw` folder: `sh backup.sh`
- Open `create_afi.sh` and edit `S3_BUCKET_NAME`, `S3_DCP`, and `S3_LOGS` according to what you named the bucket and folders. Note that for this step, you must have set up AWS & permissions as described in SETUP.
- Run `create_afi.sh`: `sh create_afi.sh`
- Save the folders in `.backup/src` containing the `*.awsxclbin` files locally. An easy method without further setup is SFTP; this tutorial shows how: https://www.youtube.com/watch?v=o-dH2C_Nz-E
- Wait until the Amazon FPGA Image (AFI) is available. You can check its status by finding the AFI ID in the `*_afi_id.txt` file in the `.backup/src/<PROJECT>` folder and then running `aws ec2 describe-fpga-images --fpga-image-ids <AFI ID>`. When the AFI is ready, the status JSON should show:

  ... "State": { "Code": "available" }, ...

- Once ready, create an AWS EC2 F1 instance, then set up Vitis and XRT as shown in SETUP, in the same manner as the `z1d` instance was set up.
- In the same manner as before, copy the project files from the backup folder back into the `src` directory of the newly cloned project. Thus, when you navigate to `aws_hw/src`, you will find `ocl_cpu_to_gmem`, `ocl_cpu_to_gmem_rw`, ....
- Now execute the test on hardware by running `run_hw.sh`: `sh run_hw.sh`
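The AFI-availability wait in the steps above can also be scripted instead of checked by hand. This is a hedged sketch using boto3's `describe_fpga_images` call; the response-parsing helper is hypothetical, and credentials/region are assumed to be configured already via `aws configure`:

```python
import time

def afi_ready(response):
    """True if a describe-fpga-images response reports the first AFI's
    State Code as 'available'."""
    images = response.get("FpgaImages", [])
    return bool(images) and images[0].get("State", {}).get("Code") == "available"

def wait_for_afi(afi_id, poll_seconds=60):
    """Poll EC2 until the AFI is available (sketch; blocks until ready)."""
    import boto3  # AWS SDK for Python: pip install boto3
    ec2 = boto3.client("ec2")
    while True:
        if afi_ready(ec2.describe_fpga_images(FpgaImageIds=[afi_id])):
            return
        time.sleep(poll_seconds)
```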
You will notice that after every test, the `./logs` folder becomes populated with log files for each project/design. These log files record the type of data transfer, the size, and the time it took to complete. Below are the steps to parse them into CSV format and graph them.
Note that running `parse_logs.py` and `graph_data.py` requires Python 3.6+.
- Install the needed Python library: `pip install plotnine`
- Navigate to the `src` folder: `cd ./src`
- Create the `logs_data` folder: `mkdir logs_data`
- Parse the logs and generate CSV files in `logs_data`: `python parse_logs.py`
- Edit the graph titles and output file names to your liking. Edit these lines:

  labs(title=f'Benchmark {action} Speeds vs Data Sizes ({log_type})', x='Transfer Size (GB) (log scale)', y='Transfer Speed (GB/s)') +

  and

  plot.save(f'../logs_data/{log_type}_{action}_log.png', width=10, height=6, dpi=300)

- Graph the CSV files into PNGs in `logs_data`: `python graph_data.py`