13 changes: 12 additions & 1 deletion .gitignore
@@ -9,9 +9,20 @@ idmap.txt
*_test
*.out
*.pyc
.vscode
**/validation/**/*.cm
*.pdf
*.png
!example_validate.json
build/
*.bin
*.bin
*:Zone.Identifier
**/*:Zone.Identifier
htsim/sim/datacenter/experiments_output*/
htsim/sim/datacenter/output_metrics/
output_metrics/
paper_plots/
sim_progress.log
core_*
htsim/sim/datacenter/core_*
*.err
34 changes: 33 additions & 1 deletion README.md
@@ -98,6 +98,38 @@ Connections <M>

For more details, see [htsim/README.md](htsim/README.md).

## Spritz Integration

This branch integrates the Spritz source-routing load balancers from the `ad` branch of `https://github.com/aleskubicek/sc25-spritz` into the Dragonfly and SlimFly UEC binaries. With `-routing SOURCE`, select the load balancer with `-LB`: `ECMP`, `OPS`, `FLICR`, `FLOW_V1` (Spritz-Scout), or `FLOW_V2` (Spritz-Spray). The Spritz artifact workloads, topology assets, batch scripts, and reproduction drivers are under [htsim/sim/datacenter](htsim/sim/datacenter).

To reproduce the compact Dragonfly comparison:

```bash
cd htsim/sim
cmake -S . -B build
cmake --build build --parallel
cd datacenter
python3 reproduce_spritz_subset.py
```

The resulting CSV is written to `experiments_output/spritz_subset/p4a8h4/permutation_global_4MiB/summary.csv`. See [htsim/README.md](htsim/README.md) for the full Spritz flag list and artifact script commands.

The full `ad` artifact pipeline is available via:

```bash
bash reproduce.sh quick
bash reproduce.sh full
```

Paper-style plots are written under `paper_plots/`. For example:

```bash
bash reproduce.sh plot fig6 quick
cd htsim/sim/datacenter
python3 simulate_df_no_fail.py --output-root experiments_output_quick --only-experiment adv_i5_4MiB --parallel 4
OUTPUT_ROOT=experiments_output_quick OUT_DIR=../../../paper_plots/quick/fig1 bash run_fig_1.sh
```


# References
If you use ATLAHS for your research, please cite our paper using:
@@ -111,4 +143,4 @@ If you use ATLAHS for your research, please cite our paper using:
primaryClass={cs.DC},
url={https://arxiv.org/abs/2505.08936},
}
```
```
96 changes: 90 additions & 6 deletions htsim/README.md
@@ -69,18 +69,15 @@ You can run a single network connection using UEC CMS as follows:
The output consists of two major parts: the configuration and setup section, and the runtime section.
The first part of the output shows the configured/derived/default parameter settings and values used by htsim.
It is important to verify that all the parameters are indeed accepted as expected when using custom configurations.
The section part of the output starts with the `Starting simulation` line and displays per flow information.
The runtime section starts with the `Starting simulation` line and ends with a packet summary. Per-flow completion records are written to `output_metrics/flowsInfo.csv`.

```
Starting simulation
Flow Uec_0_13 flowId 1000000001 uecSrc 0 starting at 0
Flow Uec_0_13 flowId 1000000001 uecSrc 0 finished at 176.2 total messages 1 total packets 490 RTS 0 total bytes 2002140 in_flight now 0 fair_inc 0 prop_inc 15978 fast_inc 519430 eta_inc 9321 multi_dec -0 quick_dec -0 nack_dec -0
.Done
New: 490 Rtx: 0 RTS: 0 Bounced: 0 ACKs: 124 NACKs: 0 Pulls: 0 sleek_pkts: 0
```

In this specific example, it displays the single flow's start and end information, including the flow completion time and details on the specific congestion control mechanism used.
The last line shows a summary of the run, starting with the total number of packets sent, retransmissions, control messages, ACKs, etc.
The last line shows a summary of the run, starting with the total number of packets sent, retransmissions, control messages, ACKs, etc. To restore verbose flow and trigger traces in stdout, set `HTSIM_TRACE_FLOW_COMPLETIONS=1` and/or `HTSIM_TRACE_TRIGGERS=1`.

For more detail, pass the `-debug` flag, which prints additional information about the active congestion control mechanism.

@@ -112,6 +109,41 @@ Supported routing strategies: `MINIMAL`, `VALIANT`, `UGAL_L`, `SOURCE`

For `SOURCE` routing, host-level routing tables are loaded automatically from the `host_table/` subdirectory within the topology path.

#### Spritz Source Routing

The Dragonfly and SlimFly binaries accept the Spritz artifact-style source load balancer flags when `-routing SOURCE` is selected:

- `-LB ECMP`: deterministic source-path hashing.
- `-LB OPS`: oblivious path selection; use `-flow-weight-scaling 0` for uniform OPS and a positive value such as `3` for latency-weighted OPS.
- `-LB FLICR`: Flicr-style adaptive source-path selection.
- `-LB FLOW_V1`: Spritz-Scout.
- `-LB FLOW_V2`: Spritz-Spray.

The Spritz knobs from the artifact scripts are also supported: `-flow-explore-threshold`, `-flow-ecn-threshold`, `-flow-weight-scaling`, `-flow-sort-insert`, `-flow-small-flows-bias`, `-flow-small-flows-threshold`, and `-flow-small-flows-weight`. The `-CC` aliases used by the Spritz artifact are accepted; `SMARTT_ECN` maps onto this tree's NSCC/SMaRTT-compatible congestion-control path.

Example Spritz-Spray run on the Dragonfly workload used by the paper scripts:

```bash
cd htsim/sim/datacenter
./htsim_uec_df \
-basepath ./topologies/dragonfly/p4a8h4 \
-tm ./experiments/df/p4a8h4/permutation_global_4MiB.cm \
-p 4 \
-routing SOURCE \
-CC SMARTT_ECN \
-LB FLOW_V2 \
-flow-explore-threshold 44 \
-flow-ecn-threshold 8 \
-flow-weight-scaling 3 \
-flow-sort-insert 1 \
-flow-small-flows-bias 1 \
-flow-small-flows-threshold 524288 \
-flow-small-flows-weight 100.0
```

Each run exports CSV metrics in `output_metrics/`, including `flowsInfo.csv`, `packetInfo.csv`, and `globalInfo.csv`.
By default, UEC flow completion and trigger traces are not printed to stdout; set `HTSIM_TRACE_FLOW_COMPLETIONS=1` or `HTSIM_TRACE_TRIGGERS=1` when debugging old log-parsing workflows.

To generate custom Dragonfly topology assets (`dragonfly.topo`, `dragonfly.adjlist`, `host_table/`) for any `(p,a,h)`, use:

```bash
@@ -180,6 +212,59 @@ Connections 1

The `datacenter/topologies` folder contains a set of examples.

## Reproducing Spritz Results

The Spritz artifact workloads and topology assets are ported from the `ad` branch of `https://github.com/aleskubicek/sc25-spritz`. They are available under `sim/datacenter/experiments/` and `sim/datacenter/topologies/`. To run a compact Dragonfly comparison over the paper's `permutation_global_4MiB.cm` workload:

```bash
cd htsim/sim
cmake -S . -B build
cmake --build build --parallel
cd datacenter
python3 reproduce_spritz_subset.py
```

The summary is written to:

```text
experiments_output/spritz_subset/p4a8h4/permutation_global_4MiB/summary.csv
```

The script covers `MINIMAL`, `VALIANT`, `UGAL_L`, `SOURCE+ECMP`, uniform/weighted `OPS`, `FLICR`, `SPRITZ_SCOUT`, and uniform/weighted `SPRITZ_SPRAY`. To run only selected algorithms, pass `--only`, for example:

```bash
python3 reproduce_spritz_subset.py --only ECMP FLICR SPRITZ_SCOUT SPRITZ_SPRAY_W
```

The original artifact-style batch scripts are also included in `sim/datacenter/`:

```bash
python3 simulate_df_no_fail.py
python3 simulate_sf_no_fail.py
python3 simulate_df_fail_2p.py
python3 simulate_sf_fail_2p.py
```

Those scripts run the broader Dragonfly/Slim Fly experiment matrix and save per-experiment summaries under `experiments_output/`.

For the full `ad` artifact flow, use the root-level reproduction wrapper:

```bash
bash reproduce.sh quick
bash reproduce.sh full
bash reproduce.sh plot fig6 quick
```

`quick` runs a reduced simulation set and generates paper-style plots under `paper_plots/quick/`; `full` runs the broader artifact matrix. `validate_results.py` checks the full output layout and quantitative claims.

To regenerate the Figure 1-style panels, run the Dragonfly 4 MiB adversarial case and then the parameterized plotting script:

```bash
cd sim/datacenter
python3 simulate_df_no_fail.py --output-root experiments_output_quick --only-experiment adv_i5_4MiB --parallel 4
OUTPUT_ROOT=experiments_output_quick OUT_DIR=../../../paper_plots/quick/fig1 bash run_fig_1.sh
```


## Default Parameters

@@ -220,4 +305,3 @@ cmake --build build --parallel # To build the project
cd build
ctest
```

4 changes: 3 additions & 1 deletion htsim/sim/CMakeLists.txt
@@ -79,6 +79,8 @@ set(SOURCE_FILES
hpccpacket.cpp
logfile.cpp
loggers.cpp
metric.cpp
data_collector.cpp
meter.cpp
mtcp.cpp
ndp.cpp
@@ -158,4 +160,4 @@ endif()
add_custom_target(
always_run ALL
COMMAND ${CMAKE_COMMAND} -E echo "ENABLE_TESTS=${ENABLE_TESTS}"
)
)
45 changes: 33 additions & 12 deletions htsim/sim/compositequeue.cpp
@@ -1,8 +1,9 @@
// -*- c-basic-offset: 4; indent-tabs-mode: nil -*-
#include "compositequeue.h"
#include <math.h>
#include <iostream>
#include <sstream>
#include "compositequeue.h"
#include <cstdlib>
#include <math.h>
#include <iostream>
#include <sstream>
#include "ecn.h"

static int global_queue_id=0;
@@ -24,11 +25,14 @@ CompositeQueue::CompositeQueue(linkspeed_bps bitrate, mem_b maxsize, EventList&
_num_pulls = 0;
_num_drops = 0;
_num_stripped = 0;
_num_bounced = 0;
_ecn_minthresh = maxsize*2; // don't set ECN by default
_ecn_maxthresh = maxsize*2; // don't set ECN by default

_return_to_sender = false;
_num_bounced = 0;
_ecn_minthresh = maxsize*2; // don't set ECN by default
_ecn_maxthresh = maxsize*2; // don't set ECN by default
_is_failing = false;
_fail_rate = 0;
_packet_count = 0;

_return_to_sender = false;

_queuesize_high = _queuesize_low = 0;
_queuesize_high_watermark = 0;
@@ -147,9 +151,26 @@ void CompositeQueue::doNextEvent() {
completeService();
}

void CompositeQueue::receivePacket(Packet& pkt)
{
if (_queue_id == DEBUG_QUEUE_ID)
void CompositeQueue::receivePacket(Packet& pkt)
{
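_packet_count++; // counts every arrival, not only UECDATA
if (_is_failing && pkt.type() == UECDATA && _packet_count > _fail_rate) {
_packet_count = 0; // restart the count after each injected drop
pkt.free(); // failure injection: discard the packet silently
return;
}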

static const char* trace_flow_env = std::getenv("HTSIM_TRACE_FLOW"); // read once per process
static int trace_flow_id = trace_flow_env ? std::atoi(trace_flow_env) : -1; // -1 disables tracing

if (trace_flow_id >= 0 && pkt.size() > 64 &&
static_cast<int>(pkt.flow_id()) == trace_flow_id) {
cout << "I am at Queue " << _nodename << " receiving packet seqno " << pkt.id()
<< " size " << pkt.size() << " flowid " << pkt.flow_id() << " at time "
<< timeAsUs(eventlist().now()) << endl;
}

if (_queue_id == DEBUG_QUEUE_ID)
{
cout << timeAsUs(eventlist().now()) << " name " << _nodename << " arrive "
<< _queuesize_low * 8 / ((_bitrate / 1000000.0)) << " _queueid " << _queue_id << " switch " << _switch->getID()
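The per-flow trace added to `receivePacket` reads `HTSIM_TRACE_FLOW` once through function-local statics and then filters on packet size and flow id. The following is a minimal sketch of that pattern in isolation, not the code from this change: `parse_trace_flow_id`, `trace_flow_id`, and `should_trace` are hypothetical helper names, and the reading of the `> 64` size check as "skip header-sized packets" is an assumption.

```cpp
#include <cstdlib>

// Hypothetical helper: turns the raw environment value into a flow id.
// Returns -1 (tracing disabled) when the variable is unset, as in the diff.
int parse_trace_flow_id(const char* value) {
    return value ? std::atoi(value) : -1;
}

// Hypothetical helper: looks the variable up on the first call only and
// caches the result, like the function-local statics in receivePacket.
int trace_flow_id() {
    static const int cached = parse_trace_flow_id(std::getenv("HTSIM_TRACE_FLOW"));
    return cached;
}

// Hypothetical helper: the filter applied to each arriving packet. The size
// check presumably skips header-sized packets (an assumption, not confirmed).
bool should_trace(int trace_id, unsigned pkt_size, unsigned flow_id) {
    return trace_id >= 0 && pkt_size > 64 &&
           static_cast<int>(flow_id) == trace_id;
}
```

One consequence of using `atoi` is that a non-numeric value parses as 0, so a mistyped `HTSIM_TRACE_FLOW` would enable tracing for flow id 0 rather than disabling it.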
33 changes: 21 additions & 12 deletions htsim/sim/compositequeue.h
@@ -48,14 +48,19 @@ class CompositeQueue : public Queue {
_ecn_minthresh = ecn_thresh;
_ecn_maxthresh = ecn_thresh;
}
void set_ecn_thresholds(mem_b min_thresh, mem_b max_thresh) {
_ecn_minthresh = min_thresh;
_ecn_maxthresh = max_thresh;
if (_queue_id == 2)
cout << "queue_id " << _queue_id << " ecn_low " << _ecn_minthresh << " ecn_high " << _ecn_maxthresh << endl;
}

int _num_packets;
void set_ecn_thresholds(mem_b min_thresh, mem_b max_thresh) {
_ecn_minthresh = min_thresh;
_ecn_maxthresh = max_thresh;
if (_queue_id == 2)
cout << "queue_id " << _queue_id << " ecn_low " << _ecn_minthresh << " ecn_high " << _ecn_maxthresh << endl;
}

void set_fail_rate(int fail_rate) {
_is_failing = true;
_fail_rate = fail_rate;
}

int _num_packets;
int _num_headers; // only includes data packets stripped to headers, not acks or nacks
int _num_acks;
int _num_nacks;
@@ -84,9 +89,13 @@ class CompositeQueue : public Queue {

bool _return_to_sender;

int _queue_id;
CircularBuffer<Packet*> _enqueued_low;
CircularBuffer<Packet*> _enqueued_high;
};
int _queue_id;
CircularBuffer<Packet*> _enqueued_low;
CircularBuffer<Packet*> _enqueued_high;

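bool _is_failing; // true once set_fail_rate() has been called
int _fail_rate; // arrivals that must be exceeded before the next injected drop
int _packet_count; // arrivals counted since the last injected drop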
};

#endif
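The failure-injection state added here (`_is_failing`, `_fail_rate`, `_packet_count`) together with the check in `receivePacket` amounts to a counter-based periodic drop. A minimal sketch of that behavior outside the simulator follows. `FailureInjector` and `count_drops` are hypothetical names introduced for illustration, and the `is_data` flag stands in for `pkt.type() == UECDATA`.

```cpp
// Hypothetical stand-in for the failure-injection state in CompositeQueue.
// Field names mirror the diff; the struct itself is illustrative only.
struct FailureInjector {
    bool is_failing = false;
    int fail_rate = 0;
    int packet_count = 0;

    void set_fail_rate(int rate) {
        is_failing = true;
        fail_rate = rate;
    }

    // Returns true when the arriving packet should be dropped.
    // `is_data` plays the role of `pkt.type() == UECDATA`.
    bool should_drop(bool is_data) {
        ++packet_count;  // every arrival is counted, including control packets
        if (is_failing && is_data && packet_count > fail_rate) {
            packet_count = 0;  // restart the count after each drop
            return true;
        }
        return false;
    }
};

// Counts how many of `n` back-to-back data packets would be dropped.
int count_drops(FailureInjector& q, int n) {
    int drops = 0;
    for (int i = 0; i < n; ++i) {
        if (q.should_drop(true)) ++drops;
    }
    return drops;
}
```

Under this reading, `set_fail_rate(9)` with only data packets arriving drops every tenth packet, and `set_fail_rate(0)` drops every data packet. Because control packets also advance the counter, the effective drop ratio for data packets depends on the traffic mix.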