16 August, 2010 1 Comment
This is a discussion about where Potential is today, and what changes need to be made in the near future. The primary aim of this report is to articulate the language’s goals and to frankly discuss what work remains; this document does not aim to evangelize.
Any comments will be incredibly useful.
Goals and philosophy
Potential should be viewed as a domain specific language for systems programming, particularly operating system kernels. It is hoped that Potential can be used to implement systems that a user can trust when there is a gun to his head.
A moving target
General purpose languages rarely see radical change. After all, if a change to the language is likely to make web programming simpler, but half of the language’s uses are outside of that domain, it is difficult to see why the language should be changed to accommodate what some may consider a niche area.
Domain specific languages, on the other hand, are built to serve a particular domain. If a change is proposed which allows the language to better serve its domain, that change should be adopted. A particularly strong example of this is when a language change has the potential to prevent domain-specific bugs from being written. We hope to see such changes take place in Potential as opportunities are identified.
Potential is embedded in Haskell and compiles to assembly. We say that Haskell is the “host” language and assembly is the “target” language. This approach allows us to offload a considerable amount of the work associated with language implementation:
- We don’t need to worry about byte-code generation because we can use an existing assembler.
- We don’t need to worry about parsers and type checkers because we can use GHC.
- We don’t need to worry about syntax highlighting because Haskell syntax is widely recognized.
Inventive and adoptive
Things like ASCII art data structures are both non-standard and very popular. Potential isn’t afraid of looking dissimilar to other languages if the result is something which better serves its domain. Systems programming is not an area well-suited for people who are afraid of learning how to use their tools; therefore, Potential should not be afraid of inventing new syntax and approaches which better serve the goals of the language.
Of course, there is considerable theoretical work in the area of typed low-level languages. Opportunities to exploit this work should be eagerly explored.
Aside from the technical details surrounding the project, there is a lot to do pertaining to the culture and community.
At present time Potential is copyrighted, with all rights reserved, by Tim Carstens. This should change in two ways:
- Tim would prefer to transfer the copyright to a body that has more longevity than he possesses.
- The license needs to change from “all rights reserved” to something more permissive.
Due to the way in which Potential code is compiled, the language is particularly well suited to “infectious” licenses: since Potential mnemonics are really just Haskell combinators, there is an opportunity to license Potential in such a way that code written in Potential necessarily adopts the same license. For instance, it appears that if Potential is licensed under GPL, then all code written in Potential would be as well.
In keeping with the goal of providing a language for trustworthy systems, the following principles are relevant:
- Users must be able to modify code written in Potential.
- Users must be able to deploy modified code on any devices they own which run code written in Potential.
- Potential should not be used as a foundation for denying users transparency into the devices and systems that they rely on.
The GPLv3 appears well suited to the protection of these principles. We will soon reach a point where we will need to have a discussion about whether or not this license is the best fit. Feedback on this issue is desperately wanted.
In terms of copyright ownership, it seems wise to keep ownership centralized, but again Tim would prefer not to be the sole arbiter of these rights. Once the language has reached further maturity, and development of core technologies has slowed down, we will need to decide on a path for transfer of ownership to an organization which is better able to protect Potential’s core values in the long term.
At present time Potential has a repository on github and a website. We need better online tools for managing bugs, documentation, and test results. In the coming months we will need better tools for discussing changes to the language, which may come in the form of an IRC channel, a mailing list, or a wiki. As the project grows into these needs, there will be a greater requirement for technical assistance in managing this infrastructure.
It is especially critical in the early stages of development that Potential’s status be accurately and openly reported. A bug tracking system will help with this, but more condensed and consumable “snapshot” status reports will become essential. Ideally no one should be able to make a reasonable argument that the project is hiding its defects.
Given the nature of our primary goal (trustworthy systems), the project’s credibility is at least as critical an asset as its technology.
Current (technical) status
This compiler is able to take Potential source and convert it to AT&T-syntax x86-64 assembly. It can also provide type signatures for functions, and check that source is well-typed.
We do have code written in Potential which is compiling, linking, and executing correctly with C code.
Functionality should be added to the compiler that makes it easier to inspect code written in Potential. Type signatures, in particular, should be presented in a more human-readable form. The compiler should also be able to generate documentation in a style similar to haddock, but tailored towards the unique restrictions and needs placed on Potential (especially the need to cut down on the verbosity of the type signatures, and the need to generate documentation for functions which are produced by Template Haskell).
We are able to update data structures. Tests need to be written to verify that our accessor functions are working correctly, and that our arrays implementation is functioning correctly. It is likely that there are (easy to fix) bugs in all of these areas.
We need to improve test coverage and automate the execution of tests. Ideally the Potential compiler should be modified to help with both tasks.
There is no documentation to speak of. This made sense when the core of the language was still in heavy flux, but we are reaching a point where the core is stable enough that we should begin authoring documentation. We are probably not at a point to start writing tutorials, though.
Major (technical) action items
Support for static analyzers
We need a framework for enabling static analyzers to work with Potential. In the extreme, we could implement the Potential mnemonics in a type-class system, allowing some ambient context (read: monad) to provide implementations. Monad stacking could then allow static analyzers to inject extra tracking into existing mnemonics, as needed. This approach has the benefit of being composable, in the sense that orthogonal analyzers could be simultaneously used simply by putting more monad transformers onto the stack.
All mnemonics are already built on a stack of monads, so this change would probably be reasonably simple to implement.
Support for multiple hardware architectures
Potential is domain specific towards systems programming, not towards x86-64 programming. The current implementation, however, only supports x86-64. It would be nice to support two modes of programming: architecture dependent (currently ubiquitous) and architecture independent (currently absent).
The type system could track which functions are architecture independent and which are targeted to a particular architecture. This is in stark contrast to the way in which this same problem is solved in C via
Such a change could probably be implemented largely by adding functionality to the language, with (hopefully) isolated changes to the rest of the system. New data types will need to be introduced to model the notion of architecture (in)dependence, and rolling out support for these types will likely effect all parts of the language.
Potential already has something like an intermediate language, though its design is heavily influenced by x86-64 assembly. This language will need to be revised to support greater generality. It is possible that a transition to llvm would make sense, although it would be necessary to first investigate ways in which low-level operations (setting interrupt tables, for instance) could be implemented in such a design.
Implementation of the memory back-end
The memory system currently is able to model the allocation and manipulation of data structures, but in terms of implementation, there is no code which actually manages regions. We need to implement a low-level
free which can be used within the memory system. We also need to implement the data structures which will be used to manage regions.
Such a change will likely reveal modifications that need to be made to the types for the memory primitives, which right now are
Unmodeled x x. It is hoped that they will be
Composable x x, since this would have the least impact on the existing code of all possible scenarios.
Support for theorems
Currently all static behavior about systems in Potential are modeled at the level of (Haskell) types. This is very expressive, but is incomplete. We will likely find ourselves needing support for some type of theorem proving, particularly within trusted “security kernels” (such as the implementation of the memory system).
That is, the type system is able to promise type safety, and in some cases is able to promise that data structures have been initialized in certain ways (such as statically checking privilege levels for interrupt gates), but it is still very easy to introduce bugs by using the wrong register or wrong type of jump statement. It’s likely that the theoretical framework for fixing these problems has been forged; this is an area that really deserves some review.
Support for complex memory schemes
At present time Potential has no concept of paged memory. This is arguably the most complex and most critical problem facing Potential in the near future.
A solution to this problem will probably involve a distinction between pointers which are always valid and pointers whose validity depends on the page table. A trustworthy implementation may require the types of theorem proving alluded to above, particularly since there will probably be some constraints which are only known at run-time. Certainly there is a lot of work to do in this area.
Potential syntax and mnemonics are still in flux. Much of the functionality is present, but the presentation has major deficiencies.
To illustrate the point, let’s look at a function which computes the factorial of a given number.
module Ctests.Factorial.DoFactorial where import Potential _doFactorial = defun "_doFactorial" $ do isCode -- rdi is the arg -- we return in rax push rbx loadInt 0 rbx loadInt 0 rax compare rdi rbx (JZ allDone) (do loadInt 1 rax loadInt 1 rbx sjmp doFactorial1) doFactorial1 = defun "doFactorial1" $ do isCode compare rdi rbx (JZ allDone) (do mul rdi rax sub rbx rdi sjmp doFactorial1) allDone = defun "allDone" $ do isCode pop rbx ret
This can be compiled and linked against a C program as a function with signature
long doFactorial(long a), following the AMD64 ABI where
rax is the return register and
rdi is the first argument to the function.
This code illustrates that computation can be done in Potential, but the syntax has a long way to go. Certainly higher-level primitives can be added to simplify things; the introduction of named variables is also clearly missing. The
compare mnemonic is a clumsy replacement for
cmp. A lot of inventing really needs to take place to make the language easier to work in.
Relating to this area, while the language’s technical and philosophical goals have been articulated to some degree, its syntactic goals have not. Feedback in this area is desperately wanted, particularly on the competition between “low level access” and “high level syntax.”