ARCH20: Papers with Abstracts

Papers
Abstract. This report presents the results of a friendly competition for formal verification of continuous and hybrid systems with piecewise constant dynamics. The friendly competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in 2020. In this fourth edition, five tools have been applied to solve six different benchmark problems in the category for piecewise constant dynamics: BACH, PHAVerLite, PHAVer/SX, TROPICAL, and XSpeed. Compared to last year, we combine the HBMC and HPWC categories of ARCH-COMP 2019 to a new category PCDB (hybrid systems with Piecewise Constant bounds on the Dynamics (HPCD) and Bounded model checking (BMC) of HPCD systems). The result is a snapshot of the current landscape of tools and the types of benchmarks they are particularly suited for. Due to the diversity of problems, we are not ranking tools, yet the presented results probably provide the most complete assessment of tools for the safety verification of continuous and hybrid systems with piecewise constant dynamics up to this date.
Abstract. We present the results of the ARCH1 2020 friendly competition for formal verification of continuous and hybrid systems with linear continuous dynamics. In its fourth edition, eight tools have been applied to solve eight different benchmark problems in the category for linear continuous dynamics (in alphabetical order): CORA, C2E2, HyDRA, Hylaa, Hylaa-Continuous, JuliaReach, SpaceEx, and XSpeed. This report is a snapshot of the current landscape of tools and the types of benchmarks they are particularly suited for. Due to the diversity of problems, we are not ranking tools, yet the presented results provide one of the most complete assessments of tools for the safety verification of continuous and hybrid systems with linear continuous dynamics up to this date.
Abstract. We present the results of a friendly competition for formal verification of continuous and hybrid systems with nonlinear continuous dynamics. The friendly competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in 2020. This year, 6 tools Ariadne, CORA, DynIbex, Flow*, Isabelle/HOL, and JuliaReach (in alphabetic order) participated. These tools are applied to solve reachability analysis problems on six benchmark problems, two of them featuring hybrid dynamics. We do not rank the tools based on the results, but show the current status and discover the potential advantages of different tools.
Abstract. This report presents the results of a friendly competition for formal verification and policy synthesis of stochastic models. It also introduces new benchmarks within this category, and recommends next steps for this category towards next year’s edition of the competition. The friendly competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in Spring/Summer 2020.
Abstract. This report presents the results of a friendly competition for formal verification of continuous and hybrid systems with artificial intelligence (AI) components. Specifically, machine learning (ML) components in cyber-physical systems (CPS), such as feedforward neural networks used as feedback controllers in closed-loop systems are considered, which is a class of systems classically known as intelligent control systems, or in more modern and specific terms, neural network control systems (NNCS). We more broadly refer to this category as AI and NNCS (AINNCS). The friendly competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in 2020. In the second edition of this AINNCS category at ARCH-COMP, four tools have been applied to solve seven different benchmark problems, (in alphabetical order): NNV, OVERT, ReachNN*, and VenMAS. This report is a snapshot of the current landscape of tools and the types of benchmarks for which these tools are suited. Due to the diversity of problems, lack of a shared hardware platform, and the early stage of the competition, we are not ranking tools in terms of performance, yet the presented results probably provide the most complete assessment of current tools for safety verification of NNCS.
Abstract. This report presents the results from the 2020 friendly competition in the ARCH workshop for the falsification of temporal logic specifications over Cyber-Physical Systems. We briefly describe the competition settings, which have been inherited from the previous year, give background on the participating teams and tools and discuss the selected benchmarks. The benchmarks are available on the ARCH website1, as well as in the competition’s gitlab repository2. In comparison to 2019, we have two new participating tools with novel approaches, and the results show a clear improvement over previous performances on some benchmarks.
Abstract. This paper reports on the Hybrid Systems Theorem Proving (HSTP) category in the ARCH-COMP Friendly Competition 2020. The characteristic features of the HSTP category remain as in the previous editions [MST+18, MST+19]: i) The flexibility of pro- gramming languages as structuring principles for hybrid systems, ii) The unambiguity and precision of program semantics, and iii) The mathematical rigor of logical reason- ing principles. The HSTP category especially features many nonlinear and parametric continuous and hybrid systems. Owing to the nature of theorem proving, HSTP again accommodates three modes: A) Automatic in which the entire verification is performed fully automatically without any additional input beyond the original hybrid system and its safety specification. H) Hints in which select proof hints are provided as part of the input problem specification, allowing users to communicate specific advice about the sys- tem such as loop invariants. S) Scripted in which a significant part of the verification is done with dedicated proof scripts or problem-specific proof tactics. This threefold split makes it possible to better identify the sources of scalability and efficiency bottlenecks in hybrid systems theorem proving. The existence of all three categories also makes it easier for new tools with a different focus to participate in the competition, wherever they focus on in the spectrum from fast proof checking all the way to full automation. The types of benchmarks considered and experimental findings with the participating theorem provers KeYmaera, KeYmaera X 4.6.3, KeYmaera X 4.8.0, Isabelle/HOL/Hybrid-Systems-VCs, and HHL Prover are described in this paper as well.
Abstract. This report presents the results of the repeatability evaluation for the 4th International Competition on Verifying Continuous and Hybrid Systems (ARCH-COMP’20). The competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in 2020, affiliated with the IFAC World Congress. In its fourth edition, twenty-eight tools submitted artifacts through a Git repository for the repeatability evaluation, applied to solve benchmark problems for seven competition categories. The majority of participants adhered to the requirements for this year’s repeatability evaluation, namely to submit scripts to automatically install and execute tools in containerized virtual environments (specifically Dockerfiles to execute within Docker), and several categories used performance evaluation information from a common execution platform. The repeatability results represent a snapshot of the current landscape of tools and the types of benchmarks for which they are particularly suited and for which others may repeat their analyses. Due to the diversity of problems in verification of continuous and hybrid systems, as well as basing on standard practice in repeatability evaluations, we evaluate the tools with pass and/or failing being repeatable.
Abstract. Reachable set computation is one of the many widely-used techniques for the verification of safety properties of dynamical systems. One of the simplest algorithms for computing reachable sets for discrete nonlinear systems uses parallelotope bundles and Bernstein polynomials. In this paper, we describe Kaa, a terse Python implementation of reachable set computation which leverages the widely used symbolic package sympy. Additionally, we simplify the user interface and provide easy-to-use plotting utilities. We believe that our tool has pedagogical value given the simplicity of the implementation and its user- friendliness.
Abstract. The phenomenon of phase synchronization was evidenced in the 17th century by Huy- gens while observing two pendulums of clocks leaning against the same wall. This phe- nomenon has more recently appeared as a widespread phenomenon in nature, and turns out to have multiple industrial applications. The exact parameter values of the system for which the phenomenon manifests itself are however delicate to obtain in general, and it is interesting to find formal sufficient conditions to guarantee phase synchronization. Using the notion of reachability, we give here such a formal method. More precisely, our method selects a portion S of the state space, and shows that any solution starting at S returns to S within a fixed number of periods k. Besides, our method shows that the components of the solution are then (almost) in phase. We explain how the method applies on the Brusselator reaction-diffusion and the biped walker examples. These examples can also be seen as “challenges” for the verification of continuous and hybrid systems.
Abstract. Experience Report: Real-Time control systems can be difficult to analyze due to the mixture of discrete-time and continuous-time dynamics. This difficulty is particularly pronounced if the timing is non-periodic, e.g., due to network or execution effects. Still, most control loops behave similar to a purely continuous-time system disturbed by a small discretization error, which is exploited by Bak and Johnson (2015) in the method of Continuization. This paper uncovers limitations of that work and presents an extension, First-Order Continuization, based on a new formal framework that recovers previous results and eases future development.
Abstract. Benchmark Proposal: We describe how a well-known backward reachability problem with nonlinear dynamics and adversarial inputs—based on a pursuit evasion game with two identical vehicles that have Dubins car dynamics—can be viewed as a robust controlled backward reach tube. The resulting set is nonconvex with a surface that is nondifferentiable in places, yet (mostly explicit) closed form solutions for points on the surface of this set have been derived based on a classical differential game analysis, and so these points can be sampled with high accuracy at arbitrary density. We propose this problem as a benchmark because few existing reachability algorithms can tackle robust controlled backward reach tubes despite their potential for proving the robust safety of systems, and this (almost) analytic solution exists against which to compare prospective solutions. We then describe some extensions to the problem to provide additional future challenges. Code is provided.
Abstract. Temporal-logic based falsification of Cyber-Physical Systems is a testing technique used to verify certain behaviours in simulation models, however the problem statement typically requires some model-specific tuning of parameters to achieve optimal results. In this experience report, we investigate how different optimization solvers and objective functions affect the falsification outcome for a benchmark set of models and specifications. With data from the four different solvers and three different objective functions for the falsification problem, we see that choice of solver and objective function depends both on the model and the specification that are to be falsified. We also note that using a robust semantics of Signal Temporal Logic typically increases falsification performance compared to using Boolean semantics.
Abstract. In this benchmark proposal, we present a set of large specifications stated in Signal Temporal Logic (STL) intended for use in falsification of Cyber-Physical Systems. The main purpose of the benchmark is for tools that monitor STL specifications to be able to test their performance on complex specifications that have structure similar to industrial specifications. The benchmark itself is a Git repository which will therefore be updated over time, and new specifications can be added. At the time of submission, the repository contains a total of seven Simulink requirement models, resulting in 17 generated STL specifications.