1st Workshop on Advances in Open Runtime Technology for the Cloud

Associated with CASCON 2017

Modern language runtimes are complex, dynamic environments that involve a myriad of components that must work cooperatively to achieve the functional and performance requirements of a given language. Typical core runtime technologies include dynamic just-in-time compilers for performance, garbage collection for heap management, platform abstraction for ease of portability to different hardware and operating system environments, developer tooling for diagnosis and tuning of the various components, and interoperability between different language environments.

Cloud services such as IBM Bluemix or AWS are increasingly becoming the environments where applications are developed and deployed, data is stored, and businesses are run. Many of the features that define a cloud (e.g., resiliency, elasticity, consistency, security) are realized through runtime technologies. Clouds are polyglot environments, and therefore advances in cloud development are directly driven by innovation in runtime technologies. However, cloud environments pose unique, often conflicting demands on runtime systems that are often less of a concern in isolated systems. Throughput performance (how fast is my app?), density (how many instances of my app can I run simultaneously in my provisioned environment?), startup performance (how quickly can I launch a new instance of my app?), and language interoperability (how can my Ruby app efficiently call a function in a Python module?) are all important considerations that require innovation to solve effectively.

The goal of this workshop was to bring together development and research communities to share and discuss innovations, challenges, and research across a broad set of open-source runtime technologies (such as Eclipse OMR, LLVM, Eclipse OpenJ9, Node.js) for cloud environments. The focus on open technology solutions was key as it allowed for greater collaboration amongst individuals, communities, researchers, and companies through shared learning on common technology. The workshop did not publish formal proceedings.

Agenda

  • Greetings and introduction (Dr. Kenneth Kent, UNB & Daryl Maier, IBM Canada Lab)
  • Eclipse OMR at age 1: A retrospective and a look ahead (Daryl Maier, IBM)
  • Eclipse OpenJ9: an open source Java Virtual Machine for everyone (Mark Stoodley, IBM)
  • Testing dynamic compilers with Tril (Matthew Gaudet, IBM)
  • JIT Compilation as a Service (Marius Pirvu, IBM)
  • NUMA Awareness for Cloud Runtimes (Maria Patrou, University of New Brunswick)
  • Enhancing Variability Aware Support in Eclipse OMR (Samer AL Masri, University of Alberta)
  • Cold Object Segregation (Scott Young, University of New Brunswick)
  • Pause-Less garbage collection in Eclipse OpenJ9 (Irwin D'Souza, IBM)
  • Efficient and minimally invasive on stack replacement - OSR (Nic Coughlin, IBM)
  • GraphJIT: Runtime Graph Simplification in the JVM (David Bremner, University of New Brunswick)
  • Closing Discussion & Remarks

Workshop Speaker(s)

Daryl Maier (IBM Canada) is a Senior Software Developer in the IBM Canada Lab and the OMR Compiler Technology Lead in the IBM Runtimes team. He has spent the past three years leading a team to open-source the high performance compiler technology that underpins much of IBM's compiler technology stack as part of the Eclipse OMR project. Prior to that, Daryl worked on the J9 Java Virtual Machine focusing on Java performance and innovative dynamic compiler and garbage collection technologies.

Topic: Eclipse OMR at age 1: A retrospective and a look ahead

The Eclipse OMR project was created in 2016 as a toolkit of language-agnostic components for building language runtimes. This talk will provide a retrospective of the project over the past 18 months. It will highlight the ongoing contributions to the technology over that time, incubation projects incorporating Eclipse OMR technology, some lessons learned as a new open-source project, and exciting future directions.

Mark Stoodley (IBM Canada) has been building Just In Time compilers professionally for 15 years after graduating from the University of Toronto with a Ph.D. in computer engineering. Most recently, he has been leading the effort to open source the IBM J9 Java Virtual Machine at the Eclipse Foundation in the two projects Eclipse OMR and Eclipse OpenJ9. He is also the creator of the JitBuilder open source library to simplify the creation of Just In Time compilers for all kinds of different languages.

Topic: Eclipse OpenJ9: an open source Java Virtual Machine for everyone

In September 2017, the OpenJ9 project was created at the Eclipse Foundation and 3.5M lines of code were contributed by IBM, representing virtually all of the implementation of the JVM that Fortune 500 companies have battle-tested for more than a decade running their Java workloads in production. This project is openly governed and is licensed so that you can combine Eclipse OpenJ9 with OpenJDK 9 to run any Java application with quick start-up, low footprint, and high throughput. In this talk, I'll briefly explain why IBM has open sourced its J9 JVM and tell you about the technical innovations and performance results that we hope will motivate you to try it out. Since it's all open source now, I'll also talk about how you can get it, how it integrates with OpenJDK, and how you can get involved!

Matthew Gaudet (IBM Canada) has been a member of the IBM Runtimes team since 2015. His work has centred around open-sourcing the Testarossa compiler technology as part of the Eclipse OMR project, as well as making it consumable and usable by many language implementations. Lately, his focus is on improving testability of the compiler technology.

Topic: Testing dynamic compilers with Tril

Testing the functional correctness of a dynamic compiler poses a number of challenges, in particular because its behaviour can change depending on the state of the environment at compile time. This talk will introduce "Tril", a serialized representation of the tree-based intermediate language used within the open compiler technology of the Eclipse OMR project, and demonstrate how it can be used to construct test cases that trigger precise behaviour within the compiler technology. Application of this technology to more effective unit testing will be discussed, as well as future applications to problem determination and serviceability.

Marius Pirvu (IBM Canada) is an Advisory Software Developer in the IBM Runtime Technologies Group. After receiving his Ph.D. in Computer Science from Texas A&M University in 2000, he joined HP, where he designed the chipset architecture of next-generation servers. For the last 14 years he has been working at the IBM Canada Lab on the design and implementation of the Testarossa Just-In-Time compiler used in IBM’s J9 JVM (now Eclipse OpenJ9). His main focus is optimizing the JVM to improve start-up, ramp-up and footprint.

Topic: JIT Compilation as a Service

Many language runtimes rely on Just-in-Time (JIT) compilation to improve the performance of the applications running on top of them. Unfortunately, the JIT compiler can add significant overhead in terms of processing power and memory footprint. In this talk we will present the early stages of a new project which proposes to move the JIT compilation activity into its own process and offer on-demand compilation services in the cloud. We will highlight the multiple advantages of such an approach and explore the challenges, both technical and performance-related. Finally, we will show some preliminary results from an early prototype and discuss future directions for the project.

Maria Patrou is a PhD student in Computer Science at the University of New Brunswick. Her Master's thesis was titled "NUMA Awareness: Improving Thread and Memory Management in the JVM".

Topic: NUMA Awareness for Cloud Runtimes

Memory management and thread organization are particularly important in systems with multiple cores and memory resources. In the cloud, different types of accesses are introduced by connecting several machines together and sharing the same virtual space. Due to network and latency issues, object location is important for application performance. NUMA systems are a hardware implementation of this architecture. IBM's Java runtime identifies the NUMA hardware and organizes memory and threads accordingly. Its main characteristic is that memory and threads are distributed across nodes as evenly as possible: the same number of threads is pinned to each node, and the same amount of memory is used from each node. However, threads are allowed to perform allocations in neighbor nodes, introducing remote accesses. In this presentation, a technique is described that avoids thread or memory migration; therefore, no calculations that increase the overall overhead occur. Instead, memory is organized per node, which enables the resizing of the heap without requiring extra synchronization. The thread affinity policies presented require little calculation and place each thread, at creation time, based on specific hardware and thread characteristics. Experiments revealed improvements in execution time, cache misses, and memory usage.

Samer AL Masri (University of Alberta)

Topic: Enhancing Variability Aware Support in Eclipse OMR

The Eclipse OMR project is quickly evolving to support multiple architectures and languages. Hence, the need for a strong, stable variability backend supporting all these architectures is increasing as the project develops. In our presentation, we describe the work we did to facilitate developers' interaction with OMR from a variability perspective. Our contributions are as follows:

1. We modified Clang so that it can compile the source code for all supported architectures consecutively (in one run). If errors occur, they are displayed in an organized manner, sorted by architecture. Developers can therefore use Clang to compile (or check the syntax of) OMR for all architectures in a single Clang invocation, for instance after a contributor opens a pull request. Because the OMR static linter (OMRChecker) is a Clang plugin, our contribution also benefits OMRChecker, which can now easily be run on all the architectures.
2. To improve developers' understanding of how variability is implemented in OMR, we created another Clang plugin that goes through the source code and displays all the class hierarchies and the function information in each hierarchy (the locations where a function is overloaded or overridden).

Scott Young is a Master of Computer Science student at the University of New Brunswick. His research interests include runtimes, virtual machines and performance optimizations.

Topic: Cold Object Segregation

As enterprise applications running on virtual machines get larger, more of the program must be paged out by the operating system. Also, objects created by object-oriented applications may be infrequently accessed; these objects can be dynamically identified at run-time. It is possible to take inspiration from Online Transaction Processing (OLTP) databases, where data is grouped by frequency of access, and page out blocks that are not being used at the application level. Objects that are rarely accessed (Cold Objects) can be moved to tertiary memory. This would save memory and increase cache hits during runtime. Due to the cost of access barriers, segregation of cold objects requires an approximation of access frequency that is both fast and accurate. Moving these objects requires updating the references that point to them. Doing this safely requires locking or some other concurrency mechanism. By doing this work during garbage collection, it is possible to reuse the same concurrency mechanism and avoid unnecessary extra locking. This talk explores the advantages of segregating objects based on access frequency, why using access barriers (which measure access frequency precisely) is worse than performing no segregation, and other challenges involved in cold object segregation.

Irwin D'Souza (IBM Canada) has been a member of the IBM Runtimes team since graduating in 2013 from the Faculty of Electrical and Computer Engineering at the University of Toronto. His focus has been on compilation control in the JIT Compiler in Eclipse OpenJ9. More recently, he has worked on the JIT Z code generator to support Pause-Less Garbage Collection, exploiting hardware profiling technology in the JIT, as well as development of internal profiling tools.

Topic: Pause-Less garbage collection in Eclipse OpenJ9

Java Virtual Machines (JVMs) employ built-in garbage collection (GC) technology that automatically manages memory usage within the Java heap. Typical GC policies in modern JVMs require synchronization between the application and GC threads as objects move around the heap. This synchronization is typically implemented via 'stop-the-world' pauses, where application threads are suspended while the GC relocates live objects. For Java applications with strict response-time-sensitive service level agreements and large heaps, such pause times can result in unpredictable and inconsistent spikes in response times during a GC cycle. Concurrent Scavenge is a hardware-supported GC mode in Eclipse OpenJ9 which aims to minimize the time spent in 'stop-the-world' pauses by garbage collecting in parallel with running application threads. The new GC mode employs the Guarded Storage Facility on IBM z14 (z14), which provides hardware-based support to detect when potentially stale references are accessed. This talk gives a technical overview of the implementation, discusses the challenges, and presents the final performance results.

Nicholas Coughlin (IBM Canada) is a Master’s student from the University of Queensland, working as an intern on IBM’s OpenJ9 JIT team.

Topic: Efficient and minimally invasive on stack replacement - OSR

Dynamic language runtimes have traditionally relied on Just-in-Time (JIT) compilation to boost throughput performance at runtime, bypassing the language’s REPL or interpreter loop and directly executing equivalent native code. This equivalency requirement, coupled with the dynamic nature of these runtimes, often complicates JIT compilation: the generated code must handle the expected execution path as well as one or more ‘fallback’ paths capable of managing rarer corner-case execution states. The generation and preservation of these ‘fallback’ paths introduces compile-time, memory and complexity overhead, whilst also limiting optimization opportunities. More recently, an alternative approach has been to transfer execution of sufficiently rare ‘fallback’ paths to the runtime’s existing REPL or interpreter loop, through a process called On Stack Replacement (OSR). In this talk we will present the Eclipse OMR compiler’s unique approach to OSR transitions, which maintains the conceptual integrity of optimizations whilst providing the opportunity for better compile-time performance and improved application throughput where desired, ultimately generating preferable results when compared to the traditional ‘fallback’ path implementations.

David Bremner is currently a Professor of Computer Science at the University of New Brunswick. His current research interests include Programming Languages, Computational Geometry, and Mathematical Optimization.

Topic: GraphJIT: Runtime Graph Simplification in the JVM

A Frequently Traversed but Stable Directed Acyclic (FTSDA) graph is a rooted directed graph in which all nodes are of the same type (i.e., they share the same superclass or interface) and the graph structure is rarely modified. When traversing an FTSDA graph, repeated visits to intermediate nodes in order to reach a distant node from the root increase runtime overhead. To reduce the cost of repeated graph traversal, we provide GraphJIT, a JIT compiler that translates an FTSDA graph into an equivalent simpler graph (with fewer edges and nodes) by fusing internal graph nodes. This translation works on the bytecode, dynamically selecting hot nodes and determining when to perform the fusion operation. The dynamic selection in GraphJIT is based on a graph node’s Entry Counter (EC), a count of the number of times the node has been visited. During runtime, a node is classified as hot if its EC exceeds a threshold. The fusion operation mainly consists of a) dynamic bytecode generation that fuses the classes of adjacent internal graph nodes on hot paths, and b) replacement of nodes in the source graph by nodes initialized from the generated bytecode, for execution at runtime.
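The entry-counter idea in the abstract above can be illustrated with a small sketch. This is not GraphJIT's implementation: it assumes a simple chain-shaped graph, replaces bytecode generation with plain function composition (IntUnaryOperator.andThen), and every name in it (GraphJitSketch, Node, HOT_THRESHOLD, fuseHotSuccessors) is hypothetical and chosen only for illustration.

```java
import java.util.function.IntUnaryOperator;

public class GraphJitSketch {
    // Hypothetical threshold; the abstract does not state the value GraphJIT uses.
    static final int HOT_THRESHOLD = 3;

    /** A node in a chain-shaped graph: applies its operation, then forwards to the next node. */
    static final class Node {
        IntUnaryOperator op;
        Node next;            // null for the last node in the chain
        int entryCounter;     // EC: number of times this node has been visited

        Node(IntUnaryOperator op, Node next) {
            this.op = op;
            this.next = next;
        }

        int visit(int x) {
            entryCounter++;
            int y = op.applyAsInt(x);
            return next == null ? y : next.visit(y);
        }

        // A node is hot once its entry counter exceeds the threshold.
        boolean isHot() {
            return entryCounter > HOT_THRESHOLD;
        }
    }

    /**
     * Fuses a hot node with its directly following hot nodes by composing
     * their operations. Returns the number of nodes removed from the chain.
     * (Stand-in for the bytecode-level class fusion described in the abstract.)
     */
    static int fuseHotSuccessors(Node node) {
        int fused = 0;
        while (node.isHot() && node.next != null && node.next.isHot()) {
            Node successor = node.next;
            node.op = node.op.andThen(successor.op); // run node's op, then successor's op
            node.next = successor.next;              // bypass the fused successor
            fused++;
        }
        return fused;
    }

    static int countNodes(Node root) {
        int n = 0;
        for (Node cur = root; cur != null; cur = cur.next) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Node c = new Node(x -> x - 3, null);
        Node b = new Node(x -> x * 2, c);
        Node a = new Node(x -> x + 1, b);

        // Warm up: repeated traversals push every entry counter past the threshold.
        for (int i = 0; i < 5; i++) {
            a.visit(i);
        }

        int before = a.visit(10);
        int removed = fuseHotSuccessors(a);
        int after = a.visit(10);

        System.out.println("result before fusion: " + before);
        System.out.println("nodes removed: " + removed);
        System.out.println("result after fusion: " + after);
        System.out.println("nodes remaining: " + countNodes(a));
    }
}
```

In this sketch the traversal result is unchanged by fusion, while a traversal from the root touches fewer nodes afterwards. A real DAG would also need to handle nodes with several predecessors and occasional structural changes, which the sketch ignores.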