| The Generative ABM Experiment (1/3): From Concept to First Prototype |
| Credits: Concept, draft text, and cross validation by Peer-Olaf Siebers. Advice and copy-editing by Claude Sonnet 4.5 |
| Welcome! |
|
How do you actually integrate LLMs to make agents in social simulations mimic human-like decision-making behaviour? That is the practical challenge we are tackling in this new blog series.
Welcome to the Generative Agent-Based Modelling (GABM) experiment! In the next three blog posts, I will discuss my attempt to implement my own GABM from scratch, based on the concepts described in my previous post. I will explain how I approached the task, what I learned from it, and what needs to change in future iterations. The planned blog posts are:
|
| Purpose |
| This first post focuses on the foundations. Here, I will describe the development of a first functional prototype. By prototype I mean an abstract and minimal version of the intended application. It must run and produce some kind of "meaningful" output. It is an exploratory tool that allows me to understand how things work and what I need to pay attention to in the next development stages. |
| VIBE Coding |
| To build this quickly, I used an approach called VIBE coding. Instead of writing every line of code by hand, I described what I wanted in plain English to the LLMs I used for these coding conversations: Claude Sonnet 4.5 and DeepSeek-V3. These LLMs then generated the required Python code on my behalf. This is a fantastic approach for rapid prototyping: you focus on the "what" and the "why", and the AI helps with the "how". The catch? You are likely to get bloated code and may not fully understand every line that is generated, but for a first prototype, the priority is to get something that works. |
| Using LLMs Locally |
|
For LLM-driven agent communication I deployed a local LLM using KoboldCPP 1.98.1. KoboldCPP is an easy-to-use, free and open-source text generation tool that allows users to run LLMs locally. Running a model locally keeps all conversations private, since nothing leaves the machine. It also allows for unlimited API usage free of charge. Execution is straightforward. After downloading the KoboldCPP executable (scroll down to "Assets" on the release page), it can be launched from the command line using its default settings. The only required input is the model file. Smaller models run faster, but this may reduce output quality. I say "may" because I have not tested the effect yet.
My "lab" setup was pretty modest: a desktop from 2018 with an Intel i5 processor, 2 GB of VRAM and 8 GB of RAM, running Windows 10 (x64). After trying out a few different models and different model sizes I decided to go for "qwen2.5-1.5b-instruct-q6_k.gguf" (1.5 GB) for my initial exploratory experiments. The model is available on Hugging Face. Which model works best for you depends, of course, on your computer specs. To run KoboldCPP in the terminal window we can use the command "koboldcpp --model qwen2.5-1.5b-instruct-q6_k.gguf --port 5001", assuming that the model is stored in the same folder as KoboldCPP. |
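To check that the server responds, a few lines of Python are enough. The sketch below assumes KoboldCPP's KoboldAI-compatible REST endpoint ("/api/v1/generate" on port 5001, matching the command above); the parameter names and response format follow that API.

```python
# Minimal connectivity test for the local KoboldCPP server.
# Assumes the KoboldAI-compatible endpoint /api/v1/generate on port 5001.
import requests

def ask_local_llm(prompt: str, max_length: int = 80) -> str:
    payload = {
        "prompt": prompt,          # text sent to the model
        "max_length": max_length,  # cap on the number of generated tokens
        "temperature": 0.7,        # moderate randomness
    }
    response = requests.post("http://localhost:5001/api/v1/generate",
                             json=payload, timeout=60)
    response.raise_for_status()
    # The server returns a list of results, each with a "text" field
    return response.json()["results"][0]["text"].strip()

if __name__ == "__main__":
    print(ask_local_llm("In one sentence, what does an SIR model simulate?"))
```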
| Bringing Agents to Life Through LLM Dialogue |
| I started with a very simple Python script in which a few agents, each with their own personality, could send questions to the local LLM and receive personalised responses. With KoboldCPP running in one terminal window and my Python script in another, the stage was set. |
![]()
| Communication worked! This marked the first real milestone :). |
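In spirit, the script did something like the following sketch (illustrative only, not the actual prototype code; it reuses the ask_local_llm helper sketched above, and the personas are invented):

```python
# Give each agent a persona and let it answer through the local LLM.
AGENTS = {
    "Ava": "a cautious retired teacher who worries about her health",
    "Ben": "a sociable student who hates staying indoors",
    "Cara": "a pragmatic nurse who follows official guidance",
}

def ask_agent(name: str, persona: str, question: str) -> str:
    prompt = (
        f"You are {name}, {persona}.\n"
        f"Question: {question}\n"
        f"Answer briefly and stay in character."
    )
    return ask_local_llm(prompt)

for name, persona in AGENTS.items():
    print(f"{name}: {ask_agent(name, persona, 'How do you feel about a local flu outbreak?')}")
```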
| The Experiment: An AI-Powered Epidemic Model |
|
With basic communication working, it was time for a real test. I chose a classic Agent-Based Model, the SIR model, which simulates how a disease like COVID-19 spreads through a population, with people transitioning between the "susceptible", "infected", and "recovered" states.
The twist? Instead of agents following pre-defined rules for decision-making, I tasked the LLM with making decisions on their behalf. The scenario: "Should a healthy agent decide to self-isolate based on its personality and the number of infected people nearby?" Here is the pseudocode of the LLM-driven decision process:
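In Python-flavoured pseudocode, the decision step looks roughly like this (a sketch only; the helper names and the fallback threshold are illustrative assumptions, not the exact prototype code):

```python
# Sketch of the LLM-driven decision step for a healthy (susceptible) agent.
def decide_isolation(agent, infected_nearby: int) -> bool:
    prompt = (
        f"You are {agent.name}, {agent.persona}. "
        f"{infected_nearby} people near you are currently infected. "
        f"Should you self-isolate today? Answer only YES or NO."
    )
    try:
        answer = ask_local_llm(prompt, max_length=5)
        return answer.upper().startswith("YES")
    except Exception:
        # Fallback rule used when communication or transmission errors occur
        return infected_nearby >= 3
```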
|
Example interaction:
|
![]()
![]()
| The "It Looks Right, But Is It?" Problem |
|
From a macro-level perspective, everything looked fine. However, a careful look at the micro level via the communication log revealed that many log entries contained contradictory information about persona, decision, and reasoning. This brings us back to a typical LLM phenomenon: something looks right on the surface because it is presented in fluent natural language, but when we dig deeper we realise that it is not. The log also showed that quite a few decisions were based on the fallback rules used when communication and transmission errors occurred.
Here is an example where the response contains fundamental contradictions upon closer inspection.
Avoiding transmission errors requires fine-tuning (calibrating) the parameters that control the LLM operations, such as prompt and response length and the choice of LLM, as well as ensuring sufficient computing power and memory. |
| Conclusions |
|
This first prototype achieved its main goal: we have a collection of agents independently communicating with an LLM for decision support, and we have a working simulation model. We cannot expect a fully robust decision-support tool after just two weeks of evening work! What is good about this prototype is that it does not crash (it has a sophisticated error-handling framework) and communication runs smoothly, as long as the number of concurrent requests is kept at an appropriate level (three on my machine). The prototype has proven to be very helpful for exploring how to build such GABMs in principle and where the pitfalls are. The logging system and the stats displayed at the end of a simulation run are very helpful when validating the model at both micro and macro levels.
The following sequence diagram shows how the system currently operates: |
![]()
The main shortcomings of the current prototype are:
|
In order to overcome these shortcomings, here are some suggestions for improvements:
|
| Of course, such improvements would require more computing power and memory than the current solution. The following sequence diagram shows how such a system might operate: |
![]()
|
In the next post we will see how to develop context-rich prompts and what the impact is in terms of decision quality. It is not as straightforward as you might think!
The VIBE-coded SIR prototype ABM Python source code is available on GitHub. |
| Last but Not Least, a Warning |
|
We must maintain a healthy scepticism toward LLMs. Despite their capabilities, they are prone to high hallucination rates and are often disconcertingly good at presenting these fabrications with confidence. While solutions like Retrieval-Augmented Generation (RAG) show promise, the core issue remains: LLMs operate on probability, not true understanding.
Additionally, an often-overlooked aspect is that an LLM's responses are contextual to the entire conversation. If you are debugging code, a crucial best practice is to start a new chat once you have a fixed version. This ensures the model is no longer influenced by the "memory" of your earlier, buggy code, leading to cleaner and more accurate responses. This inherent unpredictability makes the use of good operational practices as well as rigorous verification and validation more important than ever. Do not assume an LLM will correctly handle even simple tasks. Always double-check the output. |
| Back to Top |
| From Rules to Reasoning: Engineering LLM-Powered Agent-Based Models |
| Credits: Concept, draft text, and cross validation by Peer-Olaf Siebers. Advice and copy-editing by Claude Sonnet 4.5 |
| Welcome! |
|
Agent-based simulations have traditionally relied on explicit rule-based logic: agents follow predetermined if-then statements to make decisions. But what happens when we replace these rigid rules with large language models (LLMs) that can reason in natural language about complex scenarios? This shift introduces powerful new capabilities: agents can handle nuanced situations, demonstrate emergent reasoning, and respond to contexts that weren't explicitly programmed. However, it also introduces new technical challenges. Unlike instant rule evaluation, LLM calls require network requests to external servers, taking hundreds of milliseconds per decision. When you have hundreds or thousands of agents, this creates bottlenecks that traditional ABM frameworks weren't designed to handle. This post explores three fundamental concepts for building LLM-powered agent simulations: concurrent processing, decision independence, and context management. Throughout, we'll use Python code snippets to illustrate these concepts in practice, showing how to avoid common pitfalls and design systems that are both realistic and scalable. |
| Use Case: Simulating Disease Spread |
| Imagine simulating a disease outbreak using an SIR (Susceptible-Infected-Recovered) model in a town with 500 residents. Each person needs to decide their daily actions—whether to go to work, stay home, or seek medical care—based on local infection rates, their health status, and personal circumstances. |
![]() |
| The Temporal Ordering Problem |
|
In real life, people make decisions simultaneously based on the same shared reality. To replicate this in our simulation, we must ensure decision independence—all agents observe the same world state when making their choices. If we were to take a purely sequential approach—where each individual makes a decision and immediately updates the world state—we would create an artificial causality chain. The first agent would act based on yesterday's world state, but by the time we reach agent 500, the simulation would have already applied 499 decisions. This introduces temporal ordering bias. In an SIR model, this means that if agent 1 becomes infected and its state is immediately updated, agent 2 now perceives a higher infection rate than agent 1 did, possibly leading it to take more cautious actions such as "stay_home". This artificial dependency on agent ordering distorts the simulation's realism and can lead to systematically biased outcomes that don't reflect how decisions would actually unfold. |
| The Two-Phase Mechanism Solution |
|
To remove this ordering bias, we use a two-phase update mechanism that separates decision-making from decision-application. All agents first decide what to do based on the same snapshot of the world state (Phase 1), and only after every decision is collected do we apply all of them simultaneously (Phase 2). The solution is a two-phase update cycle: |
![]() |
| This ensures that all decisions are based on a consistent world view, eliminating the artefact of sequential bias. |
| Implementing Concurrency for Rule-Based Agents |
For a traditional rule-based Agent-Based Model (ABM), a synchronous implementation of this two-phase approach looks like this:
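A minimal sketch of such a loop is shown below (the agent and world method names are assumptions for illustration, not a specific framework's API):

```python
# Synchronous two-phase update: all agents decide on the same snapshot,
# then all decisions are applied together.
def run_timestep(agents, world):
    snapshot = world.get_state()  # frozen view of the world for this timestep

    # Phase 1: every agent decides based on the identical snapshot
    decisions = [agent.decide(snapshot) for agent in agents]

    # Phase 2: apply all decisions at once, then advance the world
    for agent, decision in zip(agents, decisions):
        world.apply(agent, decision)
    world.step()
```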
In this design, every agent observes the same snapshot of infection rates, hospital capacity, and other environmental variables before any changes are applied. Only after every agent has made its decision do we update the system state. Note that whilst this implements the correct logical structure for concurrent decisions, the execution itself is sequential—we process one agent at a time. For fast, rule-based logic where decision-making happens in microseconds, this is perfectly adequate. However, once decision-making involves external calls—such as requests to an LLM API—we need true concurrent execution to avoid unacceptable wait times. |
| Implementing Concurrency for LLM-Driven Agents |
|
The same two-phase structure applies to LLM-driven agents, but the technical implementation must change. The challenge is no longer just logical correctness but managing I/O latency from potentially hundreds of concurrent API calls. Here we use asynchronous programming—a form of concurrency where a single thread can pause (or yield control) while waiting for external operations—such as network requests—to complete. This allows other tasks to continue during that waiting period. In Python, this is implemented with the "asyncio" framework. The following implementation uses "await asyncio.gather()" to run all decision tasks with true concurrent execution:
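A sketch of this pattern, assuming each agent now exposes an asynchronous decide(snapshot) coroutine that wraps the LLM request:

```python
import asyncio

# Asynchronous two-phase update: all LLM-backed decisions run concurrently.
async def run_timestep(agents, world):
    snapshot = world.get_state()  # the same frozen view for every agent

    # Phase 1: launch one decision coroutine per agent and await them together
    decisions = await asyncio.gather(*(agent.decide(snapshot) for agent in agents))

    # Phase 2: apply all decisions simultaneously, then advance the world
    for agent, decision in zip(agents, decisions):
        world.apply(agent, decision)
    world.step()

# Entry point for one timestep: asyncio.run(run_timestep(agents, world))
```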
Here, "await" suspends execution of the coroutine until the awaited task completes, allowing other tasks to run in the meantime. "asyncio.gather()" runs all tasks concurrently and returns their results once all are complete. This structure ensures that all decisions are made "at the same simulated moment" whilst also fully utilising available I/O time to maximise throughput.
|
| Why Not Use Threads? |
|
It's important to distinguish between asynchronous I/O concurrency and multithreading. Threading runs tasks in parallel using multiple threads within a process, but in Python this introduces complexity and overhead. Since LLM API calls are I/O-bound rather than CPU-bound, multithreading provides little benefit and may even reduce performance due to Python's Global Interpreter Lock (GIL), which prevents true parallel execution of Python code. Asynchronous I/O, in contrast, is lightweight, avoids GIL contention, and scales efficiently across large numbers of agents. The program doesn't waste time waiting—when one request is pending, it immediately starts or continues processing other requests. Only CPU-intensive tasks, such as complex numerical computation or local model inference, justify using threads or multiprocessing. For LLM-driven agents that primarily issue network requests, asynchronous I/O concurrency is the optimal strategy. |
| Managing Server Load with Semaphores |
Firing 500 simultaneous requests at your LLM server will likely crash it with HTTP 503 errors (indicating the server is temporarily overloaded). A semaphore limits concurrent requests:
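For example, a shared asyncio.Semaphore wrapped around each decision call (the limit of 50 below is an assumption about server capacity, matching the discussion that follows):

```python
import asyncio

max_concurrent_requests = 50  # tune to your server capacity and rate limits
semaphore = asyncio.Semaphore(max_concurrent_requests)

async def decide_with_limit(agent, snapshot):
    # At most `max_concurrent_requests` coroutines enter this block at a time;
    # the remaining agents wait here until a slot becomes free.
    async with semaphore:
        return await agent.decide(snapshot)

async def gather_decisions(agents, snapshot):
    return await asyncio.gather(*(decide_with_limit(a, snapshot) for a in agents))
```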
Now only 50 requests run simultaneously, whilst the others queue. This balances speed with reliability. Adjust the limit based on your server capacity and rate limits. If you're experiencing constant 503 errors, you can use this concurrent architecture but set "max_concurrent_requests=1". This is not the same as sequential execution. It still maintains the crucial separation between decision and application phases, eliminating temporal ordering bias, whilst preventing server overload. You can then gradually increase concurrency as your infrastructure allows. |
| Robust Error Handling |
Network requests fail. APIs have outages. Timeouts occur. Reliable LLM-driven simulations require robust error handling with retry logic:
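A sketch of such a wrapper, assuming an asynchronous llm_client.generate(prompt) call; when all attempts fail it returns None so the caller can fall back to a rule-based decision:

```python
import asyncio

async def robust_generate(llm_client, prompt, retries=3, base_delay=1.0):
    """Call the LLM with retries and exponential backoff."""
    for attempt in range(retries):
        try:
            return await asyncio.wait_for(llm_client.generate(prompt), timeout=30)
        except Exception:
            if attempt == retries - 1:
                break
            # Exponential backoff: wait 1 s, 2 s, 4 s, ... before retrying
            await asyncio.sleep(base_delay * (2 ** attempt))
    return None  # signal failure; the caller substitutes a rule-based fallback
```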
This pattern ensures your simulation continues even when individual LLM calls fail, whilst attempting to recover from transient network issues through exponential backoff.
|
| Understanding Context Windows |
|
The context window represents an LLM's working memory, measured in tokens. For English text, one token typically represents between 0.5 and 1.3 words, depending on the tokeniser and text complexity (technical terms often require more tokens than common words). Typical context capacities range from 4,096 to 128,000 tokens. This limit encompasses both the input prompt and the generated response. Crucially, the context window operates on a per-request basis rather than per agent. Each time an agent calls the LLM (for example, using "await self.llm_client.generate(prompt)"), the model processes that specific request independently, produces a response, and then discards all information from that interaction. There is no persistent memory between calls—each request begins with a completely blank state. |
| Designing Self-Contained Prompts |
Since each API call is independent, your prompts must be self-contained. Every decision requires a complete description of the agent's situation:
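A sketch of such a prompt builder for the SIR use case (the attribute and field names are illustrative assumptions):

```python
# Build a self-contained prompt: everything the LLM needs is in this one string.
def build_prompt(agent, snapshot):
    return (
        f"You are {agent.name}, a {agent.age}-year-old {agent.occupation}.\n"
        f"Your health status: {agent.health_status}.\n"
        f"Current infection rate in your town: {snapshot['infection_rate']:.1%}.\n"
        f"Hospital occupancy: {snapshot['hospital_occupancy']:.0%}.\n"
        f"Decide what you will do today.\n"
        f"Respond with exactly one word: WORK, STAY_HOME, or SEEK_CARE."
    )
```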
For truly independent decisions, include only current state—no conversation history. This keeps context requirements minimal and ensures agents don't influence each other's reasoning. In our SIR model, each agent sees only their own status and the current infection rates, not the decisions other agents are making. Note the explicit format instruction at the end—this makes parsing the LLM's response more reliable and handles one of the practical challenges of LLM integration. |
| When Context Windows Become Critical |
|
The design choice between stateless and stateful agents has significant implications for context management. Stateless agents, recommended for most simulations, make independent decisions that require only a single prompt within the context window, which keeps them simple and scalable. Stateful agents, used in complex social simulations, accumulate memory across timesteps; both the current prompt and the conversation history must fit within the context window, which introduces the risk of forgetting once the token limit is exceeded and necessitates careful memory management. Such memory management strategies include keeping only recent interactions, summarising older history, retaining emotionally or causally significant events, or using vector databases for semantic memory retrieval. For stateful agents, a memory management strategy could look like this:
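A simple sketch that combines truncation of old interactions with a running summary (the summarisation prompt, the limits, and the llm_client interface are illustrative assumptions):

```python
# Keep only the most recent interactions verbatim and fold older ones into a
# short summary, so the agent's context stays within the token limit.
class AgentMemory:
    def __init__(self, max_recent: int = 5):
        self.max_recent = max_recent
        self.recent = []   # most recent interactions, kept verbatim
        self.summary = ""  # compressed view of everything older

    async def remember(self, interaction: str, llm_client):
        self.recent.append(interaction)
        if len(self.recent) > self.max_recent:
            oldest = self.recent.pop(0)
            # Fold the evicted interaction into the running summary
            self.summary = await llm_client.generate(
                f"Summarise in two sentences:\n{self.summary}\n{oldest}"
            )

    def as_context(self) -> str:
        return (f"Summary of earlier events: {self.summary}\n"
                f"Recent events: {'; '.join(self.recent)}")
```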
|
| Controlling Randomness in LLM Responses |
|
Unlike rule-based agents that produce identical outputs for identical inputs when using the same random seed, LLMs introduce stochasticity. The same prompt can yield different responses due to the model's temperature parameter, which controls randomness. For reproducible simulations:
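For example (the parameter names follow common LLM APIs and the llm_client interface is assumed; whether a seed is honoured depends on the backend):

```python
# Reproducible runs: remove sampling randomness and, where supported, fix a seed.
GENERATION_PARAMS = {
    "temperature": 0.0,  # greedy decoding: the same prompt gives (near-)identical output
    "seed": 42,          # only takes effect if the backend supports seeding
}

async def decide_reproducibly(llm_client, prompt: str):
    return await llm_client.generate(prompt, **GENERATION_PARAMS)
```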
For exploratory simulations where you want to capture the range of possible agent behaviours, use moderate temperature values (0.3-0.7) and run multiple simulation replications with different random seeds.
|
| Validation and Calibration |
How do you verify that LLM agent behaviour aligns with domain expectations? Unlike rule-based models where logic is transparent, LLM reasoning is opaque. Implement these validation practices:
|
| Practical Guidelines |
The following guidelines summarise key best practices for developing robust and scalable LLM-driven agent-based models:
|
| Conclusion |
|
Building LLM-powered agent simulations requires careful consideration of concurrency, context, and control flow. By processing decisions concurrently with proper rate limiting, designing self-contained prompts with robust error handling, and systematically validating agent behaviour, you can create realistic, scalable simulations that avoid common pitfalls. The transition from rule-based to LLM-driven agents isn't just a technical upgrade—it's a paradigm shift that enables simulations of unprecedented behavioural complexity. However, this power comes with responsibilities: managing costs, ensuring reproducibility, and validating that emergent behaviours reflect genuine insights rather than prompt engineering artefacts. The future of agent-based simulation is "conversational" - let's build it together. |
| Back to Top |
| Publication: A Novel Multi-Agent Reinforcement Learning System for Trading Strategies |
| Credits: Written by Peer-Olaf Siebers |
|
The paper titled "StockMarl: A Novel Multi-Agent Reinforcement Learning System To Dynamically Improve Trading Strategies", authored by Peiyan Zou and Peer-Olaf Siebers, has been presented at the 37th European Modeling & Simulation Symposium (EMSS 2025), which is part of the I3M Conference. The paper deals with the development of StockMARL, an innovative simulation platform that integrates multi-agent modelling with deep reinforcement learning to create adaptive trading strategies. The system enables learning agents to observe and interact with diverse rule-based traders, allowing them to develop resilient and interpretable strategies within dynamic, behaviourally rich market environments. |
![]()
| The paper is based on the BSc dissertation of Peiyan and is available here. The presentation, given at the conference, is available on YouTube. The slides are available here. |
| Back to Top |
| PROJECT UPDATE: Streamlining Simulation Modelling with Generative AI |
| Credits: Drafted by Peer-Olaf Siebers; turbocharged and summarised by ChatGPT-5. |
|
Simulation modelling is a powerful tool for exploring complex systems, particularly in Operations Research and Social Simulation. Agent-based modelling allows researchers to capture human decision-making and social dynamics, but its reliance on extensive manual coding creates a significant bottleneck. A University of Nottingham summer internship project, led by Sener Topaloglu and supervised by Peer-Olaf Siebers, investigated how Generative AI can help overcome this barrier. By leveraging Large Language Models (LLMs), the project explored automating the translation of natural language descriptions into GAML scripts for the GAMA simulation platform. The approach used prompt engineering and reusable design patterns, aligned with the Engineering Agent-Based Social Simulation (EABSS) framework, to streamline the scripting process and enable model reusability. The feasibility study demonstrated that open-source models like Mistral and Llama can generate useful code. Smaller-scale fine-tuning proved effective, though larger datasets introduced hallucinations. The research also highlighted challenges, including resource limitations, context loss during crashes, and syntactic or logical errors in generated scripts. Despite these hurdles, the project showed that Generative AI can significantly reduce coding effort in simulation modelling. Since the internship concluded in August 2024, further progress has been made on this project. Sener has continued the research in his spare time, concentrating on improving model reliability, experimenting with the latest LLMs, streamlining the EABSS script, and building a fully automated pipeline that connects conceptual design and implementation. Detailed reports documenting the project and its extensions are available here:
|
| Back to Top |
| LLM4ABM Discussion @ The Ethics of LLM-Augmented ABM |
| Credits: Content co-created by the LLM4ABM SIG members. Outlined by Peer-Olaf Siebers. Copy-edited by Claude Sonnet 4. |
| Welcome! |
|
The integration of Large Language Models (LLMs) into Agent-Based Modelling (ABM) is moving faster than our ability to fully grasp its ethical consequences. Researchers are already experimenting with LLM-augmented workflows, yet the community lacks a shared framework for thinking about the risks and responsibilities involved. This makes it urgent to pause, reflect, and start shaping collective guidelines before questionable practices become entrenched. What follows is a glimpse into a lively, and at times chaotic, discussion from a recent LLM4ABM SIG meeting. The conversation moved in many directions, but with Claude's help it has been distilled into clear themes that reveal both the promise and the risks of LLM-augmented ABM. |
![]() |
| Image by Copilot (08/2025) |
| Why Do We Need an LLM4ABM Ethics Framework? |
| ABM has always involved ethical considerations, from how we represent human behaviour to whose voices we include in our models. But the integration of LLMs into ABM research introduces what one ethicist calls "the seduction of the frictionless". This seduction is dangerous. When we use LLMs to simulate stakeholder perspectives, we eliminate the messy, uncomfortable negotiations that define real human relationships. Unlike actual humans who push back, argue, and disagree, LLMs always comply. They will happily play any stakeholder role we assign them, creating an illusion of participatory modelling while actually silencing the very voices we claim to represent. This frictionless interaction risks making us forget that "the worth of a relationship is in the friction"; the challenging process of negotiating different viewpoints to reach genuine consensus. |
| What Do We Mean by Ethics in LLM4ABM? |
|
Ethics in this context operates on two levels. At its core, it is about choice and intentionality: we can only act ethically when we have alternatives and make deliberate decisions about our actions. In ABM research, this translates to ensuring our models do not systematically disadvantage or exclude people, particularly marginalised communities. The framework emerging from recent discussions identifies ethics as both deontological (rule-based obligations like "everyone should be heard") and consequentialist (focusing on long-term impacts rather than short-term gains). Crucially, ethics becomes meaningful only when there are "others" whose rights and perspectives we must protect—whether they are research participants, affected communities, or future generations. |
| Ethical Dimensions Across the ABM Lifecycle |
The ethical risks of LLM integration are not evenly distributed across the modelling process. They are concentrated in the initial stages of the ABM lifecycle:
|
| Checklist of Ethical Risks |
As a practical starting point for responsible use of LLM4ABM, the following checklist outlines the core ethical risks that demand our attention and deliberate action:
|
| Next Steps for the ABM Community |
| We need ethical guidelines that acknowledge both LLMs' potential benefits (like protecting participant privacy through synthetic data) and their risks. This means developing transparent declaration standards for LLM use, creating frameworks for validating synthetic data quality, and establishing community norms around responsible AI integration. The goal is not to ban LLMs from ABM research, but to use them ethically, recognising that true innovation comes not from eliminating friction, but from navigating it responsibly. |
| Afterthought |
After we finished our discussion, I continued thinking about the topic and why our conversation felt so scattered. It then dawned on me that we actually have multiple dimensions of ethics that need to be considered separately at each stage of the ABM life cycle:
|
| Back to Top |
| LLM4ABM Discussion @ LLMs and ABMs: Promise, Pitfalls, and the Path to Trust |
| Credits: Content co-created by the LLM4ABM SIG members. Written by Peer-Olaf Siebers. Copy-edited by Claude Sonnet 4. |
| Welcome! |
|
Large Language Models (LLMs) are increasingly influencing research practices. For those working with Agent-Based Models (ABMs), the key question is how to integrate them effectively. They offer practical support in generating ideas and drafting code, but their use also brings uncertainty and caution. This post presents perspectives on using LLMs as tools that enhance research practices without undermining scientific rigour. The ABM community is at a pivotal moment as LLMs demonstrate unprecedented capabilities in social simulation and computational modelling. Recent discussions in the LLM4ABM SIG reflect both enthusiasm and concern regarding the integration of these models into research workflows. The central question addressed in this post is: Where can these tools provide genuine value, and where might they lead us off course? |
| Exploration versus Explanation |
| Perhaps the clearest boundary is between exploration and explanation. In the early stages of a project, LLMs shine. They can suggest new angles, summarise background material, or help generate initial model ideas. These tasks are creative and low risk, and the speed at which LLMs work makes them useful companions. But the stakes change when it comes to explanation, when we are trying to generate evidence, justify findings, or interpret results. Here, over-reliance on LLMs is dangerous. Their outputs are persuasive but not always reliable. If we mistake fluency for accuracy, we risk misleading ourselves and others. Many in the community agree: LLMs may be powerful for idea generation, but they cannot yet be trusted as evidence-making engines. |
| Trust and Transparency |
| That raises the question of trust. How do we create confidence in when and how these tools are used? The answer will not come from individuals working in isolation. It will require community-wide norms and practices. Other tools went through a similar journey. Calculators and spell checkers were once controversial in academic settings. They only became unremarkable once standards were set for when and how they could be used. LLMs are on the same trajectory. Until they become routine, transparency matters. Researchers have a duty to report how they used them—whether for editing, coding, brainstorming, or interpretation. This is not about shaming anyone, but about making practices visible and building trust. |
| Standards, Ethics, and Practicality |
| Of course, the practicalities matter. Some suggest every paper should include a short acknowledgement describing how LLMs were used. Others even argue for sharing prompts, so results can be reproduced. But would this create needless complexity? Many believe a simpler, lighter-touch approach is more realistic. Ethics are another concern. If LLMs shape research directions, code, or even conclusions, should we treat their influence the same way we treat human collaborators? Should there be explicit ethical guidelines about where they fit in? These questions are unresolved, but they are not going away. |
| Why the Unease? |
| Interestingly, researchers rarely feel nervous admitting they used machine learning in their work. Yet many hesitate to disclose LLM usage. Why the double standard? Perhaps because machine learning is seen as a technical method, while LLMs feel more like intellectual partners—closer to the work of writing and thinking. Whatever the reason, silence will not build confidence. Only openness will. |
| Conclusion |
| For now, the safest position is pragmatic. Use LLMs freely for exploration and brainstorming but be cautious about treating them as sources of evidence. Report their role openly, even if only briefly. Push for simple, shared standards that do not overburden researchers, but still ensure integrity. In time, the unease will fade. Just as calculators and spell checkers became unremarkable, LLMs will eventually find their place in everyday research. The transition will be messy, but the path is clear: cautious experimentation, transparent reporting, and collective responsibility for shaping how these tools are used. |
| Back to Top |
| LLM4ABM: A Forum for Discussing the Role of LLMs in ABMs |
| Credits: Written by Peer-Olaf Siebers. Copy-edited by Claude Sonnet 4 |
|
LLM4ABM is a Social Simulation discussion group founded in 2024 that meets online once a month. We are united by a shared interest in exploring how Large Language Models (LLMs) and Agent-Based Modelling (ABM) interact across the entire simulation study life cycle. Our starting point was a focused discussion on how LLMs might help transform qualitative evidence (interview data, ethnographic insights, case studies, expert knowledge, etc.) into behavioural rules that can be utilised in agent-based models. In practice, our conversations have expanded well beyond this. We often find ourselves debating the broader roles of LLMs within the ABM process, as well as their wider implications for scientific research that is built upon established standards and norms. These threads are not separate, but tightly interwoven, and our discussions tend to move fluidly between them. Along the way, I have been taking notes. With the agreement of the group, I will share some of these reflections as blog posts here. They capture many of the engaging and thought-provoking ideas that would otherwise remain tucked away in my notebook. If you would like to join the discussions, please get in touch, and I will add you to the group. |
| Back to Top |
| Tiya's Student Internship: The Use of LLMs for Social Simulation Development |
| Credits: Written by Peer-Olaf Siebers. Research conducted by Tiya Teshome (University of Leicester) |
|
| Introduction |
In this blog post, I would like to share insights from a recent undergraduate internship project that explored the intersection of Large Language Models (LLMs) and Agent-Based Social Simulation (ABSS). ABSS has emerged as a powerful methodology for modelling complex systems, yet the manual design process remains a significant barrier to accessibility. This project investigated how LLMs can automate and streamline social simulation development, addressing three key research questions through systematic investigation:
| 1. Comparing LLM Output Quality |
The first task evaluated four leading LLMs, GPT-4, Claude, DeepSeek, and Gemini, across two distinct prompt types: (1) general use cases, and (2) ABSS specific use cases. Each LLM was assessed for precision, accuracy, and simulation modelling suitability. Results revealed varying strengths: whilst all models generated structured agent-based designs, consistency differed significantly. GPT-4 and Claude demonstrated superior architectural understanding, whilst Gemini excelled at generating visual components for enhanced model realism. Prompt engineering proved crucial, with iterative refinement necessary to achieve consistent, structured outputs suitable for implementation.
| 2. NetLogo vs Python Implementation Quality |
The second investigation compared LLM-assisted implementation across NetLogo (an ABSS IDE) and Python using an Epidemic SIR and a futuristic museum model. The NetLogo implementations proved more successful, with LLMs effectively debugging turtle logic and variable usage through targeted feedback. Python development presented greater challenges, with frequent non-existent method calls requiring manual corrections. Whilst LLMs provided valuable architectural guidance, Python's complexity demanded more human intervention than NetLogo's simplified agent-based environment.
| 3. Streamlit Web Application Development |
The final component produced a functional prototype using LLaMa 3.3 and Streamlit, enabling non-specialists to convert concepts into structured simulation designs. The application guides users through five modular stages: agent roles, behaviours, environment layout, interaction rules, and simulation measures. Key features include component editing capabilities and structured output generation, successfully democratising access to social simulation design whilst maintaining scientific rigour.
| Conclusions |
Overall, this project has been a big success thanks to the hard work of Tiya. The research demonstrates LLMs' potential to enhance social simulation accessibility, though human oversight remains essential for ensuring accuracy and implementation success.
Acknowledgement: This internship was sponsored by the Royal Academy of Engineering in collaboration with Google DeepMind Research Ready and the Hg Foundation. |
| Back to Top |
| Publication: Large Language Models for Agent-Based Modelling |
| Credits: Written by Peer-Olaf Siebers |
| The paper titled Large Language Models for Agent-Based Modelling: Current and Possible Uses Across the Modelling Cycle, authored by the LLM4ABM Gang (Loïs Vanhée, Melania Borit, Peer-Olaf Siebers, Roger Cremades, Christopher Frantz, Önder Gürcan, František Kalvas, Denisa Reshef Kera, Vivek Nallur, Kavin Narasimhan, and Martin Neumann) has been accepted for presentation at the Social Simulation Conference 2025 (SSC2025). |
| Abstract: The emergence of Large Language Models (LLMs) with increasingly sophisticated natural language understanding and generative capabilities has sparked interest in the Agent-based Modelling (ABM) community. With their ability to summarize, generate, analyze, categorize, transcribe and translate text, answer questions, propose explanations, sustain dialogue, extract information from unstructured text, and perform logical reasoning and problem-solving tasks, LLMs have a good potential to contribute to the modelling process. After reviewing the current use of LLMs in ABM, this study reflects on the opportunities and challenges of the potential use of LLMs in ABM. It does so by following the modelling cycle, from problem formulation to documentation and communication of model results, and holding a critical stance. |
| Back to Top |
| Publication: Using an AI-powered Buddy for Designing Innovative ABMs |
| Credits: Written by Peer-Olaf Siebers |
| After working on it for more than a year, my paper Exploring Conversational AI for Agent-Based Social Simulation Design has finally been published in the Journal of Artificial Societies and Social Simulation. It explores the use of ChatGPT for conceptual modelling and the co-creation of agent-based models. To promote the paper, I gave a presentation at the LLM4ABM Special Interest Group meeting yesterday. Below, you can find links to the presentation slides, the published paper, and a GitHub Repository containing additional resources. The repository is a dynamic resource and over the summer I will add further examples, an updated script, and other resources. You are welcome to add your examples to the repository as well :-). |
| Abstract: ChatGPT, the AI-powered chatbot with a massive user base of hundreds of millions, has become a global phenomenon. However, the use of Conversational AI Systems (CAISs) like ChatGPT for research in the field of Social Simulation is still limited. Specifically, there is no evidence of its usage in Agent-Based Social Simulation (ABSS) model design. This paper takes a crucial first step toward exploring the untapped potential of this emerging technology in the context of ABSS model design. The research presented here demonstrates how CAISs can facilitate the development of innovative conceptual ABSS models in a concise timeframe and with minimal required upfront case-based knowledge. By employing advanced prompt engineering techniques and adhering to the Engineering ABSS framework, we have constructed a comprehensive prompt script that enables the design of conceptual ABSS models with or by the CAIS. A proof-of-concept application of the prompt script, used to generate the conceptual ABSS model for a case study on the impact of adaptive architecture in a museum environment, illustrates the practicality of the approach. Despite occasional inaccuracies and conversational divergence, the CAIS proved to be a valuable companion for ABSS modellers. |
| Back to Top |
| EABSS-2: A Software Engineer's Approach to Creating Agent-Based Models |
| Credits: Drafted by Peer-Olaf Siebers, turbocharged by ChatGPT-4o |
|
Ever wondered how to build agent-based models the smart way, without the usual headaches? If you are a fan of modelling human behaviour, testing policy impacts, or just love crafting digital societies, then you're going to love what's new in the world of simulation frameworks. Meet EABSS-2, the fresh and improved version of the Engineering Agent-Based Social Simulations (EABSS) framework! We all know that designing agent-based models can be a complex (sometimes messy) process, especially when working in teams. That's where EABSS-2 steps in to save the day. It is more than just a tool; it's a guided workflow that helps both solo and collaborative creators turn great ideas into working simulations with far less friction. Although EABSS-2 is still a work in progress, a preview and supplementary material are already available for those keen to take a first look. The framework's new features are introduced in an upcoming journal paper, which offers a detailed walkthrough of the improvements and a case study showcasing them in action. The official release is planned for December 2025, so stay tuned for more updates. |
So, what's new and exciting in EABSS-2?
|
| Back to Top |
| From Roots to Horizons: The Evolution of My ABM Research Journey |
| Credits: Words by Peer-Olaf Siebers. Title crafted by ChatGPT-4o |
| My research related to Agent-Based Modelling (ABM) falls under the broader theme of Collaboratively Creating Artificial Labs for Better Understanding Current and Future Human and Mixed Human/Robot Societies. I am a strong advocate for Computational ABM. Initially, my focus was on applying Computational ABM across a wide range of domains (poster 1 from 2012). Subsequently, I concentrated on integrating software engineering methods and techniques to develop conceptual agent-based models (see poster 2 from 2023). My current research explores how large language models (LLMs) can be used at various stages of the ABM study lifecycle (see poster 3 from 2024). For more information, please consult the posters:
|
| Back to Top |