Workflow Generation Powered by LLM
The workflow is a concept used widely across software systems, playing a pivotal role in automating business processes and scientific methodologies by encoding them as workflow models. Achieving this automation in a no-code or low-code environment could drive unprecedented productivity gains and maximize process efficiency.
However, designing and implementing effective workflows remains a significant challenge. Creating such models requires deep domain knowledge and sophisticated workflow modeling skills, which puts the task out of reach for most non-experts. This has been a major barrier to the broader adoption of workflow applications.
In this context, the emergence of Large Language Models (LLMs) has introduced new possibilities. LLMs, renowned for their ability to understand natural language instructions and generate contextually relevant code, have already demonstrated their potential in software development. Applying these capabilities to Workflow Generation offers a novel approach, lowering technical barriers and enabling users to design and modify workflow models effortlessly using natural language.
Nonetheless, challenges remain. Two major hurdles in leveraging LLMs for Workflow Generation are:
- Accurately generating entirely new workflows from scratch is still demanding, even for capable models.
- Modifying or optimizing generated workflows involves nuances and requirements of specific business contexts, which makes the process hard for non-experts.
This article explores the current state of Workflow Generation, including its relationship with Robotic Process Automation (RPA), and examines approaches leveraging LLMs to overcome these challenges. (For simplicity, this blog considers Workflow Generation and RPA as synonymous concepts.)
Research Trends
Research by Moreira et al. shows that studies on RPA have been growing linearly, with over 3,000 papers published annually by 2021. They define RPA as:
- Robotic: systems that copy human behavior and perform tasks accordingly
- Process: steps that lead to the fulfillment of a task
- Automation: any task that's performed with assistance and not manually
Source: Ahuja, Shefali, and R. K. Tailor. "Performance Evaluation of Robotic Process Automation on Waiting Lines of Toll Plazas." NVEO - Natural Volatiles & Essential Oils Journal (2021): 10437-10442.
For an automation system to qualify as RPA, it must "understand" human behavior, complete assigned tasks, and perform them autonomously.
The advent of LLMs has accelerated research and applications in this domain. By combining LLMs' powerful reasoning and natural language processing capabilities with traditional automation, the boundaries of Workflow Generation are expanding.
LLMs' ability to infer meaning from commands and perform reasoning beyond simple text prediction has gained significant attention. For instance, OpenAI's model o1-preview has demonstrated the capacity to accurately interpret user intentions and generate logical, creative responses across diverse scenarios. These capabilities open new opportunities for addressing unstructured data and complex workflows previously challenging for traditional RPA.
Moreover, LLMs excel at understanding natural language instructions, significantly increasing input flexibility. Unlike traditional RPA systems, which require strictly defined input formats, LLMs allow users to provide commands in varied formats and contexts. This enhances RPA’s potential, extending its application from repetitive, rule-based tasks to more creative and intelligent processes.
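To make this input flexibility concrete, here is a minimal sketch of how a loosely phrased instruction could be mapped to a structured workflow definition that a conventional automation engine could execute. Everything in it, the `call_llm` helper, the JSON schema, and the prompt wording, is an illustrative assumption rather than part of any cited system.

```python
import json

# Hypothetical example; `call_llm` stands in for whatever chat-completion
# client is in use, and the JSON schema is an assumption, not a cited system.

SYSTEM_PROMPT = """You convert plain-English requests into workflow definitions.
Respond with JSON only, using the schema:
{"steps": [{"action": "<verb>", "target": "<object>", "params": {}}]}"""

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Stand-in for a real LLM call; replace with your provider's client."""
    raise NotImplementedError

def instruction_to_workflow(instruction: str) -> dict:
    """Map a free-form natural-language command to a structured workflow.

    Traditional RPA tools would require a rigid, predefined input format;
    here the LLM absorbs the variability of the user's phrasing instead.
    """
    raw = call_llm(SYSTEM_PROMPT, instruction)
    return json.loads(raw)

# Example usage with a loosely phrased command:
# wf = instruction_to_workflow(
#     "Every Monday, pull last week's sales report and email it to the team."
# )
# print(wf["steps"])
```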
The following sections explore key research and case studies showcasing how LLMs are transforming Workflow Generation and overcoming its limitations.
FlowMind: Automatic Workflow Generation with LLMs
FlowMind introduces a method to automatically generate workflows using LLMs. The system structures the workflow creation process based on prompt engineering and leverages APIs to reduce hallucination risks while ensuring data and code security. Users provide feedback on high-level descriptions of generated workflows, enabling the system to refine and adjust workflows accordingly. This interactive process enhances both flexibility and accuracy.
The proposed model operates in two main stages (a minimal sketch follows the list):
- Lecture Phase: A system prompt is configured with contextual information, API definitions, and a list of available resources.
- Execution Phase: High-level descriptions are translated into actionable code, incorporating user feedback to refine the output.
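A minimal sketch of this two-phase interaction is shown below. It is not FlowMind's actual code; the API list, prompt wording, and the `ask_llm` helper are assumptions used only to illustrate how a "lecture" system prompt and a feedback-driven execution loop could fit together.

```python
# Illustrative sketch of a FlowMind-style two-phase interaction.
# The API descriptions and helper functions are assumptions for
# demonstration, not the paper's actual prompts or code.

API_DESCRIPTIONS = """
get_customer(id: str) -> dict        # fetch a customer record
get_fund_holdings(fund: str) -> list # list holdings of a mutual fund
summarize(records: list) -> str      # produce a short text summary
"""

LECTURE_PROMPT = f"""You are an assistant that writes Python workflow functions.
Context: you answer questions about customers and funds.
You may ONLY call the following vetted APIs (this keeps raw data and
hallucinated endpoints out of the generated code):
{API_DESCRIPTIONS}
Given a user question, write one function `workflow()` that answers it,
and provide a short high-level description of what the function does.
"""

def ask_llm(system_prompt: str, user_prompt: str) -> str:
    """Stand-in for a real chat-completion call."""
    raise NotImplementedError

def generate_workflow(question: str, max_rounds: int = 3) -> str:
    """Execution phase: generate code, show its high-level description
    to the user, and refine it until the user is satisfied."""
    prompt = question
    for _ in range(max_rounds):
        answer = ask_llm(LECTURE_PROMPT, prompt)   # code + description
        print(answer)                              # user reviews the description
        feedback = input("Feedback (empty to accept): ")
        if not feedback.strip():
            return answer
        prompt = f"{question}\nRevise the workflow: {feedback}"
    return answer
```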
FlowMind achieved approximately 30% higher performance than GPT baseline models on the NCEN-QA-Easy dataset, highlighting its effectiveness.
StateFlow: Enhancing LLM Task-Solving through State-driven Workflows
StateFlow adapts the concept of state machines, commonly used in system design, to generate workflows. A state machine models systems as states and transitions, facilitating structured problem-solving for workflow generation.
For example, when solving SQL problems, the process is organized into three stages (Observe, Solve, and Verify), with corresponding actions defined for each. Using this structure, LLMs are guided to develop detailed logic for specific tasks.
A closer examination reveals that in the Observe phase, the primary objective is to extract necessary information based on the given contextual data. In the Solve phase, this extracted information is used to guide the creation of executable SQL queries, enabling more structured reasoning. The researchers claim that this approach achieves superior performance compared to baseline GPT-4 models and ReAct-based models.
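The sketch below illustrates the general idea with a three-state loop for the SQL example. The state names follow the description above, but the prompts, transition rules, and the `ask_llm` stub are simplifications of ours, not StateFlow's implementation.

```python
# Illustrative state-machine-driven prompting in the spirit of StateFlow.
# States, prompts, and transition logic are simplified assumptions.

STATE_PROMPTS = {
    "observe": "Inspect the database schema and note the tables and columns relevant to: {task}",
    "solve":   "Using these observations:\n{context}\nWrite a SQL query for: {task}",
    "verify":  "Check whether this query answers the task. Reply 'OK' or describe the problem.\nTask: {task}\nQuery:\n{context}",
}

def ask_llm(prompt: str) -> str:
    """Stand-in for a real chat-completion call."""
    raise NotImplementedError

def run_stateflow(task: str, max_steps: int = 6) -> str:
    state, context = "observe", ""
    for _ in range(max_steps):
        output = ask_llm(STATE_PROMPTS[state].format(task=task, context=context))
        if state == "observe":
            state, context = "solve", output       # pass observations to the solver
        elif state == "solve":
            state, context = "verify", output      # pass the candidate query to verification
        elif state == "verify":
            if output.strip().upper().startswith("OK"):
                return context                      # accepted query: terminal state
            state = "solve"                         # otherwise loop back with the critique
            context = f"{context}\n-- reviewer feedback: {output}"
    return context
```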
ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback
Another line of work proposes to tackle the problem using the Monte Carlo Tree Search (MCTS) concept.
MCTS explores possible continuations of the next few steps and selects the most promising course of action based on estimated probabilities, much as AlphaGo chooses moves in Go: it simulates candidate move sequences, estimates the likelihood of winning for each, and then plays the one that looks best. The same search idea can be applied to choosing the next steps of a workflow, which is the core intuition behind this methodology.
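To make the analogy concrete, here is a generic Monte Carlo tree search loop over candidate workflow actions. It illustrates only the search idea; the random `simulate` reward, the action set, and the depth limit are placeholder assumptions and do not reproduce ToolPlanner's actual planning or training procedure.

```python
import math
import random

# Generic MCTS skeleton over workflow actions (illustrative assumptions only).

class Node:
    def __init__(self, action=None, parent=None):
        self.action, self.parent = action, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb(self, c=1.4):
        """Upper-confidence bound: balances exploitation and exploration."""
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(math.log(self.parent.visits) / self.visits)

def simulate(path):
    """Placeholder reward: a real system would execute or score the
    partial workflow (e.g. via tool feedback or a learned evaluator)."""
    return random.random()

def mcts(actions, iterations=200, depth=3):
    root = Node()
    for _ in range(iterations):
        node, path = root, []
        # 1. Selection: walk down by UCB until reaching a leaf.
        while node.children:
            node = max(node.children, key=Node.ucb)
            path.append(node.action)
        # 2. Expansion: add candidate actions below the leaf.
        if len(path) < depth:
            node.children = [Node(a, parent=node) for a in actions]
            node = random.choice(node.children)
            path.append(node.action)
        # 3. Simulation: estimate how promising this partial workflow is.
        reward = simulate(path)
        # 4. Backpropagation: update statistics up to the root.
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the most-visited first action, as AlphaGo-style MCTS does.
    return max(root.children, key=lambda n: n.visits).action

# Example: choose the best next step among candidate tool calls.
# print(mcts(["search_flights", "check_weather", "book_hotel"]))
```

In a real system, the simulation step would be replaced by executing the partial workflow or scoring it with a learned evaluator, so the search is guided by actual tool feedback rather than random rewards.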
The proposed methodology, as the paper's title suggests, combines path planning over candidate tool calls with a feedback stage that refines the selected solution path.
The researchers report that this reinforcement learning-based approach produces increasingly reliable models; specifically, they claim it generates workflows that are over 20% more effective than using GPT-4 alone.
Other notable studies refine the feedback process to solve problems in the game Minecraft [6], extract knowledge from API documentation in a Retrieval-Augmented Generation (RAG) style to generate workflows [7], leverage multi-agent systems [8], develop more robust discriminators to strengthen the evaluation of generated outputs [9], and explore alternative ways of applying tree search [10, 11].
Conclusion
Workflow Generation is increasingly critical in modern software systems and business processes, yet it still faces significant challenges in automation and design. LLMs, with their ability to understand, reason, and generate natural language-based instructions, are driving innovative solutions to these challenges.
Research such as FlowMind and StateFlow maximizes LLMs' strengths, improving the accuracy and flexibility of workflow generation. ToolPlanner’s integration of reinforcement learning and MCTS further pushes the boundaries of what is achievable. Additionally, studies incorporating feedback processes, API-based knowledge extraction, and multi-agent collaboration showcase a broad array of innovative techniques.
As LLM-driven Workflow Generation continues to evolve, it holds the potential to transform problem-solving processes by simultaneously enhancing creativity and efficiency. Future research and applications in this field will undoubtedly uncover even greater possibilities, bringing us closer to a seamless integration of AI into complex workflows.