Project Title: BIFROST: A Modular Simulation Framework for Multi-Objective Scheduling of ML Pipelines on Heterogeneous Cloud Infrastructure
Student: Marco Mehta
Course: BSc Hons Computer Science with Year in Industry

Abstract: A machine learning (ML) pipeline is a sequence of stages (data preprocessing, model training, evaluation) that run across a cluster of machines, often with different hardware. Deciding which stage runs on which machine is the scheduling problem tackled here. Comparing different scheduling strategies is hard: live cluster experiments are expensive and difficult to reproduce, while simulation is only useful if its inputs accurately reflect real hardware behaviour. This dissertation treats the problem as one of measurement before scheduling, with two complementary artefacts. A reproducible benchmarker profiles nine supervised-learning workloads across six Amazon Web Services (AWS) cloud server types, producing empirical distributions of execution time, startup latency, and estimated energy for each pipeline stage, measured over 30 isolated runs per (workload, hardware) pair. BIFROST (Benchmark-Informed Framework for Resource Oriented Scheduling Trade-offs), a modular discrete-event simulation framework, consumes these profiles to evaluate scheduling strategies against a configurable set of objectives. The evaluation uses five objectives: total elapsed time (makespan), energy consumption, aggregate deadline overrun (tardiness), plan churn across successive scheduling decisions (scheduling instability), and unevenness of work distribution across nodes (load imbalance). The simulator additionally models three runtime dynamics that a realistic cloud cluster exhibits and that make scheduling decisions harder: bursty job arrivals, automatic cluster resizing in response to workload, and intermittent task and node failures.
The framework is exercised through a 1,680-run empirical study using four scheduling strategies of increasing sophistication: two objective-blind baselines (First-Come-First-Served and Random), a heuristic that minimises makespan alone (Heterogeneous Earliest Finish-Time, HEFT), and a multi-objective evolutionary algorithm (Non-dominated Sorting Genetic Algorithm II, NSGA-II). These are compared across three principal scenarios under a pre-committed non-parametric statistical protocol. Profile data validity is established before any scheduling result is interpreted. The results fall into three qualitatively different regimes depending on how much spare capacity the cluster has relative to the workload it is being asked to run. When the cluster has more compute capacity than the workload needs, the four strategies produce clearly different scores on every objective, and NSGA-II wins on four of the five. When workload pressure on the cluster is moderate, the final objective scores measured after workload completion show no statistically significant difference between strategies, yet a per-decision audit reveals NSGA-II actively trading off energy against load imbalance at the moment each task is placed on a machine. Under severe contention, the scheduler is left with only one viable option on 94.3% of its decisions, because the hardware constraints eliminate all other trade-off candidates, structurally limiting how much strategies can differ on their final scores. The contribution is a validated, reusable harness for measurement and simulation in multi-objective ML pipeline scheduling research. It also provides empirical evidence that whether scheduling strategies can be told apart at all depends on how constrained the cluster is relative to the workload.
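
The trade-off language used above rests on Pareto dominance: one placement dominates another if it is no worse on every objective and strictly better on at least one. The following is a minimal illustrative sketch of that idea, not BIFROST's implementation. The function names, node names, and (energy, load imbalance) scores are hypothetical, and both objectives are assumed to be minimised.

```python
from typing import Sequence

Objectives = Sequence[float]  # assumption: every objective is minimised


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(candidates: dict[str, Objectives]) -> dict[str, Objectives]:
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: score
        for name, score in candidates.items()
        if not any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
    }


# Hypothetical (energy, load imbalance) scores for placing a single task.
placements = {
    "node_a": (12.0, 0.40),
    "node_b": (15.0, 0.10),
    "node_c": (16.0, 0.45),  # worse than node_a on both objectives
}

# With spare capacity, every node is feasible and a real trade-off exists.
front = non_dominated(placements)

# Under contention, suppose hardware constraints leave only node_c feasible.
feasible = {"node_c"}
constrained_front = non_dominated(
    {name: score for name, score in placements.items() if name in feasible}
)

print(sorted(front))              # candidates that survive with spare capacity
print(sorted(constrained_front))  # candidates that survive under contention
```

In this toy case the unconstrained front holds two placements that trade energy against load imbalance, while the constrained front collapses to a single candidate, so any strategy would make the same choice. This mirrors, in miniature, why severe contention limits how far strategies can diverge.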