DeepSpec Summer School 2018

I had the good fortune of attending the DeepSpec Summer School 2018 (DSSS18) at Princeton University this summer. There, I met leading researchers in the field, got to learn about some truly-impressive research and meet some incredibly smart and like-minded, verification-inclined people. I strongly you to checkout the materials on the repository, and the lectures (once they are posted) online.

Prep

I had some time after graduation, so I was able to refresh my skills with Coq using Software Foundations Volume 1. Having worked with Coq before (CPDT, Chapters 1-4), being familiar with Idris and Isabelle/HOL, I found that my rate of getting through the exercises was limited by the amount of work in the exercises themselves as opposed to difficulty of new material. And also a dislike of skipping material. (I get the impression that knowing what to skip is a key-skill in post-graduate academic learning.)

Week 0: Jersey City and New York

I had the opportunity to stay with family in the area and do a bunch of touristy things: Metropolitan Museum of Art, Grove Street Market, Statue of Liberty (smaller than I expected) via the Staten Island Ferry, Central Park, Times Square, High Line, Manhattan Henge, Rockefeller/"Top of the Rock", Brooklyn Bridge Park and Exchange Place. I will not even begin to list all the amazing food I got to try. Definitely was not a fan of the heat, but the evenings were really lovely.

Week 1: VST Hackathon

A crash course in the Verified Software Toolchain (developed at Princeton) for those familiar with Coq (whilst those not so familar did the Coq Intensive). This will soon be Volume 5 of Software Foundations. Learning about VST's impredicative, higher-order, concurrent separation logic, directly from Andrew Appel (a incredibly clear and concise teacher) was nothing short of eye-opening. VST enables users to write C programs which are correct, and then using the CompCert verified C compiler, compile and run them. A high-level overview of the typical methodology is as follows:

  1. Write your specification. A specification is mathematical statement of what it means for a program to be correct. In most software development, these are usually assumed, or at best, written down in informal, English prose.

  2. Write a functional program, e.g. model that implements the specification. In this case, this means an actual, honest-to-goodness executable program in Coq's functional language Gallina.

  3. Write a C program, that refines the functional model. Let's consider the implementation of a stack: in most functional languages, to pop an element off a stack is a simple operation that returns a new, (optional) tail of the list. However, in C, a pop will involve in-place mutation of the list (with pointer re-assignments) and maybe even a call to "free". In this case, the C "pop" refines the functional "pop".

  4. Prove A: 2 implies 1 and B: 3 refines 2.

    • These proofs are very different in their nature.

    • A: 2 implies 1 (Domain-specific). An example given was for verifying cryptographic primitive: this proof requires knowledge of number theory, but not of VST's logic and C-level memory concerns.

    • B: 3 refines 2 (C-specific). Continuing the example, the prover need not know anything about what the code actually does, just that it faithfully implements the functional program. An understanding of VST's logic and C is essential for this.

I would like to give a special shout-out to William Mansky and the other students who helped at the lectures. They were incredibly friendly, empathic, enthusiastic and helpful towards ignorant and excited students such as myself and I thoroughly enjoyed learning from them. VST is surprisingly easy to get to grips with (assuming strong familiarity with Coq, which, is admittedly a high-bar) and their attention to the ergonomics of it was very much appreciated by the whole class.

Week 1: Verified Functional Algorithms

A dive into Volume 3 of Software Foundations, again taught by Andrew Appel. I finally understood the importance of inductive predicates and the importance of stating proofs in a specific way when I did the exercise on proving insertion sort correct "directly" with respect to the "obvious" specification as opposed to going via the slightly-unintuitive inductive predicate. I saw that/why selection sort is basically evil and should be avoided (from a reasoning perspective of course). I also learnt (not for the first time, and almost definitely not for the last time) the importance of stepping away from the computer and working proofs out in detail on pen-and-paper when proofs get difficult.

Week 1: CertiKOS

Ronghui Gu and Zong Shao presented the CertiKOS project, developed at Yale and Columbia. Concurrent, Certified Abstraction Layers (CCCAL) was the talk of the town with these guys. My grasp of CCAL (and its concurrent cousin) is tenuous, but it's what you'd expect from a methodology with the word 'layers' in it, except there are a lot more layers than you'd think. I particularly liked some of the meta-commentary in Shao's talk, so I've included the relevant slides below. He also had a few lovely things to say about linear-logic, such as it and other resource-aware logics being the future of reasoning about programs going forward.

Week 1: Narcissus

Ben Delaware presented this project, which best summarised by the title of its paper: Deriving Correct-by-Construction Decoders and Encoders from Binary Formats. The talk was a really awesome mix of concepts related to automatic program-synthesis and parser combinators and I'm confident there's an adjunction in the maths of it somewhere.

Week 1: Non-lecture stuff

Student talks (I rushed through mine so fast), drinks at the Ivy Inn, homework parties, long walks to the river, incredibly cold rooms, an excursion to Point Place beach (and a thorough pummeling by the waves), visiting old friends in New York. Princeton is a lovely little town, though I still prefer Cambridge ;-) I should add here, the (vegetarian and vegan) food was great: at the lecutres (breakfast and lunch), the Princeton cafeteria and the surrounding restaraunts, which was convenient for the unusually high-proportion of vegetarians and vegans at the program.

Week 2: QuickChick

The irony of many Coq projects is that although the artifacts the produce are formally verified, the actual structuring, quality of code and best practices are less than ideal. Concepts such as continuous-integration, testing and code-formatting are not emphasised a lot. QuickChick aims to change that by bringing property-based testing (à la QuickCheck) to Coq. The reason for doing so is two-fold: increasing the confidence is some supporting code you don't want or need to verify and refining statements of specifications and theorems early-on, before proving them, so that a project doesn't need to make massively-cascading changes to a definition in the late stages of proofs. I was very lucky to talk to Benjamin Pierce about the project and the gap between research and industry. Property-based testing will soon be Volume 4 of Software Foundations, where they describe the structure of QuickCheck in a "roll-your-own" manner, getting into the details of generators and shrinkers.

Week 2: Kami

I liked Adam Chlipala's (of MIT) old-school, blackboard based approach to explaining the concepts intuitively before diving deep into some intense Coq proofs. Kami uses Coq to construct verified hardware. It operates in the same space as BlueSpec, that is a meta-programming HDL, parameterised over various types and values (e.g. word-length). When the meta-program is run, it generates a "flat" description of hardware, much like a person may write in VHDL or *Verilog. The process of taking this flat description, turning it into RTL, and then eventually into silicon is a different topic, and very much the domain of expensive, proprietary tools. Fully understanding his lectures required an ridiculously broad background of Computer Science topics: the intersection of people who know about hardware, process calculi and parametric higher-order abstract syntax is probably very small. Luckily, my time at Arm, interest in Topics in Concurrency and PL research meant I could just about follow all of it and really appreciate the beauty of the ideas he was presenting.

Week 2: Verdi

Zack Tatlock of University of Washington was an invited lecturer, to present Verdi, a framework for verifying distributed systems. His style of teaching was really honest, and approachable, like a friend describing their experience of a long journey. I also liked his encouraging words about tackling research problems progressively, and never believing "oh fine, you've verified X, but that's easy because Y, you'll never be able to verify Z". Tatlock presented how to take a program (e.g. distributed protocol) written against an ideal network semantics and perform very general, program-generic transformations of it (and its associated properties) to progressively weaker and weaker network guarantees ("vertical composition"). Unfortunately, he didn't have time to present Diesel, a separation logic for reasoning about the "horizontal composition" of distributed systems. My understanding of it as follows: right now, Verdi relies on "closed-world" assumptions about a system (that the entire configuration of the system is fixed/known). Adding more nodes requires re-doing the proofs, because there's no notion of that node not affecting the rest of the system (separation/modularity/non-interfence, whatever you want to call it) built-in to the logic, which is what Diesel aims to address.

Week 2: Separation Logic Session

Andrew Appel led a very useful hour-long discussion about various C-level software verification tools and separation logics (including the talk of town, Iris), contrasting their approaches, merits and deficiencies in a way that helped contextualise and understand how different projects and research lines related to one-another.

Week 2: DeepSpec Web Server (Deep Web)

This was towards the end of the summer school and it definitely felt like my brain was melting at this point. Although Lennart Beringer and Benjamin Pierce presented the material well, there was a lot to get through and it became difficult to separate the what and how from the why. DeepWeb is an attempt to bring together various aspects of the DeepSpec project (QuickChick, CertiKOS and VST) and see if and how they can fit together to build a verified web server. This talk was about how to straddle the difference in methodology/philosophy between CertiKOS (CCAL) and VST (functional modelling & C-level refinement). I learnt about two really cool connections in DSSS18: the first was about the link process-calculi and hardware as inherently concurrent components. The second was in the DeepWeb talk: a process-calculi called interaction-trees, implemented using freer-monads. While using free(r)-monads to model effects is not particularly new, the idea that they relate to process-calculi was a bit mind-bending. My understanding of the following is a bit sketchy, and I hope a paper will soon shed some light on this approach, but until then: with these interaction-trees set-up, it is up to CertiKOS to prove they can provide the primitives and the associated properties of the trees, and VST to use those primitives to implement the web server.

 

Week 2: Non-Lecture stuff

Again, more food, more homework parties, more student talks and this week, movies as well (Troll 2)! A careers panel, party, bubble tea and some incredibly fun late night conversations with new friends I made were amongst the many things I'll remember about my time in Princeton.