- Haskell
- Basic formatting
- Write type signatures
- Write kind signatures
- Write Haddock
- Prefer pattern-matching
- Avoid punning
- Avoid ticks in function names
- Avoid multi-line parenthesized expressions
- Use deriving strategies
- Use Text
- Derive Show, define Pretty, use Pretty for human-readable output
- Use the package name as the root of the module name hierarchy
- Reading direction
- Avoid orphan instances except where they’re safe
- Make data strict by default, use nothunks for long-lived data structures
- Extensions
- Imports
- Libraries
- Cabal
The purpose of this document is to codify various aspects of how we write Haskell code as we contribute to Plutus.
The goal is twofold:
- To provide guidance in cases where it is not obvious what to do, or the choice is arbitrary.
- To help resolve disagreements between team members. Ideally these can be settled by referring to a recommendation in this document, or if not, then perhaps the resolution of the disagreement will create a new recommendation.
The contents of this document are “recommendations”, that is:
- They are more concrete than “principles” (like “write simple code”) because we want them to be clearly action-guiding.
- They are more vague than “rules” (like “never use function X”) because it’s usually impossible to be that precise, and so judgement is still required in applying them.
Use 2 spaces for indentation. Full-indent for where is recommended.
Bad:
f x = y
 where
  y = ...
Good:
f x = y
  where
    y = ...
Avoid lines over 100 characters. This is codified in .editorconfig.
Format Haskell source with stylish-haskell (config in .stylish-haskell.yaml).
This is enforced in the pre-commit hook and in CI.
Formatting with stylish-haskell must be a no-op.
Type signatures should have the double-colon on the same line as the name, separated by one space.
Bad:
foo
  :: A -> B
  -> C
Good:
foo :: A -> B -> C
Good:
foo ::
  A ->
  B ->
  C
Good:
foo ::
  forall a b c.
  ( C1 a
  , C2 b
  ) =>
  A ->
  B ->
  C
The indentation is arbitrary. The line length is longer than the “traditional” 80 characters, but anecdotally Haskell code tends toward longer lines, and breaking them up doesn’t help readability much.
Placing the double-colon on the same line makes it easier to find definitions. We don’t necessarily have HLS running all the time, and it may not work on every module.
stylish-haskell is a fairly modest formatter that doesn’t touch too much. At the moment we mostly use it for formatting imports!
In the past we tried ormolu, but its lack of support for multi-line comments was a deal-breaker.
Every top-level binding should have a type signature. Most other named bindings (e.g. let and where bindings) should also have type signatures.
Constraints should be uncurried.
Bad:
foo :: Eq a => Ord a => a -> a -> a
Good:
foo :: (Eq a, Ord a) => a -> a -> a
Haskell is traditionally lauded for its excellent type inference. People used to argue that this brought Haskell closer to dynamically-typed languages in ease of use: you don’t have to write type signatures, because the compiler will infer them for you.
However, not writing type signatures has a heavy maintainability cost.
If types are inferred, then the way a binding is used can affect the type which is inferred for it. That means that a mistake in using a binding can result in an error inside the binding (or elsewhere) due to the inferred type not matching what goes on in the RHS of the binding. More generally, errors can end up appearing in unexpected and counter-intuitive places.
Pinning down the type of a binding means that any errors relating to using that binding will occur at the use site, where they belong!
It’s a Haskell truism that the type forms part of the documentation of a binding. But that requires you to be able to see the type. Of course, in this day and age, we should all have an IDE that shows us the type on hover. But sometimes you’re stuck using vim. Or you’re reviewing the code on GitHub. Or the IDE is broken.
So do your colleagues a favor and just write it down.
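As a small illustration of the recommendation above, here is a sketch in which both the top-level binding and its where bindings carry signatures. The function mean and its local names are hypothetical and are not taken from the Plutus codebase.

```haskell
-- Hypothetical example: every named binding states its type, so a misuse
-- of 'total' or 'count' is reported where it happens, not somewhere else.
mean :: [Double] -> Double
mean xs = total / count
  where
    total :: Double
    total = sum xs

    count :: Double
    count = fromIntegral (length xs)
```

If count were accidentally used as an Int elsewhere, the error would point at that use rather than at the definition of count.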
Every type definition that has parameters which are not all of kind Type should have a kind signature using StandaloneKindSignatures.
Bad:
data Term tyname name uni fun a
Good:
type Term :: Type -> Type -> (Type -> Type) -> Type -> Type -> Type
data Term tyname name uni fun a
The reasoning is essentially the same as for type signatures. We’ve got used to GHC inferring all this, and in the past we didn’t even have the means to easily state kind signatures. But these days with people using fancier type-level machinery, and with better support for kind signatures, it’s time to just start writing them down.
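For a self-contained version of the same idea, the sketch below declares a hypothetical newtype Tagged whose first parameter is not of kind Type, together with the pragma and import that a standalone kind signature needs. The names Tagged and untag are illustrative only.

```haskell
{-# LANGUAGE StandaloneKindSignatures #-}

import Data.Kind (Type)

-- The first parameter is a type constructor, so its kind is written down
-- instead of being left for GHC to infer.
type Tagged :: (Type -> Type) -> Type -> Type
newtype Tagged f a = Tagged (f a)

-- Unwrap the underlying value.
untag :: Tagged f a -> f a
untag (Tagged x) = x
```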
Every top-level exported binding should have Haddock. Non-exported bindings should probably have Haddock too.
Put the module’s Haddock comment right above the module M where line, and below the pragmas.
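A brief sketch of what this can look like for a type, its fields, and a function. The Rectangle type and area function are hypothetical; the comment markers (-- | before a declaration, -- ^ after a field) are standard Haddock syntax.

```haskell
-- | A rectangle described by its side lengths.
data Rectangle = Rectangle
  { rectWidth  :: Double -- ^ Width, in arbitrary units.
  , rectHeight :: Double -- ^ Height, in the same units as the width.
  }

-- | Compute the area of a 'Rectangle'.
--
-- >>> area (Rectangle 2 3)
-- 6.0
area :: Rectangle -> Double
area r = rectWidth r * rectHeight r
```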
Prefer to use pattern matching where possible, unless it significantly complicates the code.
Bad:
data SortOrder = Ascending | Descending
  deriving Eq
sortWithOrder' :: Ord a => SortOrder -> [a] -> [a]
sortWithOrder' order = f . sort
  where
    f = if order == Ascending then id else reverse
Good:
sortWithOrder :: Ord a => SortOrder -> [a] -> [a]
sortWithOrder Ascending  = id . sort
sortWithOrder Descending = reverse . sort
Avoid using the same names for things at the term and type level, except for newtype constructors.
Generally avoid using ticks to distinguish function names. All this conveys is that it is “another” version of the function. Try expressing the difference in the function name, even if it makes it longer.
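As an illustration, suppose a codebase wants a variant of lookup that returns a fallback value. Rather than calling it lookup', the sketch below uses a hypothetical, longer name that says what is different about it.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical helper: the name states how it differs from 'lookup',
-- which a trailing tick would not.
lookupOrDefault :: Eq k => v -> k -> [(k, v)] -> v
lookupOrDefault def key = fromMaybe def . lookup key
```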
A parenthesized expression should not span multiple lines. Pull it out to a named binding, use $, or otherwise reorganize the code.
Bad:
foldr (\a acc -> let
    x = ...
  in a + x + acc) x xs
Good:
foldr meaningfulName x xs
  where
    meaningfulName :: ...
    meaningfulName a acc = let x = ... in a + x + acc
A parenthesis forces the user to keep a stack in their head to remember when the current "argument" finishes. Line length limits this to some degree, but if we allow line breaks then the amount of stack can become quite unwieldy.
This also explains why $ is good: since it indicates there will be no closing paren, there is no need for a stack (it’s the "tail call" of bracketing).
Always use deriving strategies.
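A minimal sketch of what explicit strategies look like. The types Amount and Colour are hypothetical; the stock and newtype keywords come from the DerivingStrategies extension, and deriving Num through the newtype strategy additionally needs GeneralizedNewtypeDeriving.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Show is derived structurally (stock); Eq, Ord and Num reuse the
-- instances of the wrapped Integer (newtype).
newtype Amount = Amount Integer
  deriving stock (Show)
  deriving newtype (Eq, Ord, Num)

-- Plain data types can only use the stock strategy for these classes.
data Colour = Red | Green | Blue
  deriving stock (Show, Eq, Ord, Enum, Bounded)
```

Writing the strategy out makes it obvious which instance is produced, which matters most for newtypes, where stock and newtype deriving can behave differently.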
Use Text instead of String unless you have a good reason not to.
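A short sketch of working with Text, assuming the text package and the OverloadedStrings extension recommended elsewhere in this document. The greet function is hypothetical.

```haskell
{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Data.Text qualified as Text

-- String literals are typed as Text thanks to OverloadedStrings.
greet :: Text -> Text
greet name = "Hello, " <> Text.strip name <> "!"
```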
Always derive Show, do not define it manually.
Always use Pretty for human-readable output, not Show.
Always define Pretty explicitly (when you need it). It’s okay to delegate to the Show instance if you think it’s good enough.
The derived version of Show is always useful as a way of seeing the explicit structure of a value as a Haskell value.
Defining Show manually can mean that this is no longer true, and you can’t do a better job than the derived version.
We always use Pretty for human-readable output, because it’s actually friendly to layout and the derived Show instance is not usually human friendly.
Therefore if you need to produce output for humans, define a Pretty instance.
This will typically need to be hand-written, unless it happens that you can defer to the Show instance, e.g. for simple enums Show can be fine since it just prints the constructor names.
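The sketch below shows both cases: an enum whose Pretty instance defers to the derived Show, and a hand-written instance. It assumes that Pretty is the class from the prettyprinter package (module Prettyprinter), which this document does not state explicitly; the types Direction and Point are hypothetical.

```haskell
{-# LANGUAGE DerivingStrategies #-}

import Prettyprinter (Pretty (pretty), comma, parens, viaShow, (<+>))

-- A simple enum: the derived Show just prints the constructor name,
-- so delegating to it is good enough.
data Direction = North | South
  deriving stock (Show)

instance Pretty Direction where
  pretty = viaShow

-- Show stays derived for debugging; Pretty is written by hand for humans.
data Point = Point Int Int
  deriving stock (Show)

instance Pretty Point where
  pretty (Point x y) = parens (pretty x <> comma <+> pretty y)
```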
If the package is foo-bar, then the modules should all be FooBar.X.
See “Naming conventions” here. We do it slightly differently (“FooBar” rather than “Foo.Bar”), but the main principle is the same.
Try to keep a single line mostly reading left-to-right or right-to-left.
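One way to read this recommendation, sketched with hypothetical functions: pick one direction per line, either composing right-to-left with (.) or piping left-to-right with (&) from Data.Function, and avoid mixing the two in the same expression.

```haskell
import Data.Function ((&))

-- Reads right-to-left: filter, then double, then sum.
sumOfDoubledEvens :: [Int] -> Int
sumOfDoubledEvens = sum . map (* 2) . filter even

-- The same pipeline, reading left-to-right.
sumOfDoubledEvensPiped :: [Int] -> Int
sumOfDoubledEvensPiped xs = xs & filter even & map (* 2) & sum
```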
Avoid orphan instances, but don’t worry about it if you can be sure that they’re safe.
Use StrictData for new code; make fields strict unless you have a good reason not to.
For data structures that might live for a long time, use nothunks to assert that they don’t contain unexpected thunks.
A painful lesson of Haskell in production is that space leaks really suck, are a huge pain to track down, and can originate in surprising locations. This suggests that it’s worth a bit of proactive paranoia: just make things strict as much as possible, in the hopes of squashing any nascent space leaks.
This may seem like overkill to you… until you’ve experienced debugging a space leak!
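To make the effect of StrictData concrete, here is a small sketch with a hypothetical Counter record. The helper strictFieldIsForced is also hypothetical: it builds a Counter with a failing field and reports whether evaluating the constructor already forces that field. It does not demonstrate nothunks.

```haskell
{-# LANGUAGE StrictData #-}

import Control.Exception (ErrorCall, evaluate, try)

-- With StrictData, both fields are strict without explicit bang annotations.
data Counter = Counter
  { counterHits  :: Int
  , counterTotal :: Int
  }

-- Evaluating the constructor to weak head normal form forces its strict
-- fields, so the 'error' call fires here instead of lingering as a thunk.
strictFieldIsForced :: IO Bool
strictFieldIsForced = do
  result <- try (evaluate (Counter 1 (error "field was forced")))
  pure $ case (result :: Either ErrorCall Counter) of
    Left _  -> True
    Right _ -> False
```

Without the StrictData pragma the same code would return False, because the second field would be stored as an unevaluated thunk.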
These are basically all fine and can be put in default-extensions.
- Anything in GHC2021. Once we have a GHC version that supports the GHC2021 language, we will likely switch to using it.
- LambdaCase: clear, helpful
- DerivingStrategies: always
- GADTs: well established, useful
- OverloadedStrings: essential when working with Text, which you should be
- NegativeLiterals
- DerivingVia: great
- RoleAnnotations: if you need it, you need it
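As a quick illustration of one item from this list, the sketch below uses LambdaCase to match on an argument without naming it. The describe function is hypothetical.

```haskell
{-# LANGUAGE LambdaCase #-}

-- \case matches directly on the function's argument.
describe :: Maybe Int -> String
describe = \case
  Nothing -> "nothing"
  Just n  -> "just " <> show n
```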
The following extensions are generally fine if you find that they’ll make your life much easier, but you probably don’t want to use them all the time.
- RecordWildCards
- TypeFamilies: often very useful, but can make things tricky. Think before using.
- DataKinds
- FunctionalDependencies
- ViewPatterns: can be very nice, can be a huge mess
- OverloadedLists: sometimes a lifesaver, not as indispensable as OverloadedStrings
If you find you have:
- A long explicit import list
- Several hiding declarations
Then either just import the module in its entirety, or qualify it. Usually if you are using hiding you will need to qualify it.
Bad:
import Control.Lens (first, … , _Right) hiding (ix, lens)
Good:
import Control.Lens qualified as Lens
When working inside a package that exports an “umbrella module”, avoid importing that module directly.
Use the strict versions of most data structures by default unless you have a good reason not to.
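A sketch combining the two recommendations above: a qualified import in the postfix style used in this document, and the strict variant of a container. It assumes the containers package; wordCounts is a hypothetical function.

```haskell
{-# LANGUAGE ImportQualifiedPost #-}

import Data.Map.Strict (Map)
import Data.Map.Strict qualified as Map

-- Data.Map.Strict evaluates values as they are inserted, so the counts
-- do not build up as chains of unevaluated additions.
wordCounts :: [String] -> Map String Int
wordCounts ws = Map.fromListWith (+) [(w, 1) | w <- ws]
```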
Use lenses where they allow a significant simplification of the code. For simple use cases just use normal record accessors.
Don’t use data-default, instead just define defaultX values for your X type.
Default is not terribly bad. It’s truly ad-hoc polymorphism: all you get is name reuse, you can’t (or shouldn’t) write a function that’s polymorphic over Default a. That’s fine, but it also means that the benefit is fairly marginal.
Additionally, just defining specific values is more flexible. If, say, you want multiple default values for different contexts, then that is straightforward with values but not with Default.
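A minimal sketch of the defaultX pattern, using a hypothetical Config type. It also shows the flexibility mentioned above: a second default for a different context is just another value.

```haskell
-- Hypothetical configuration record.
data Config = Config
  { configPort    :: Int
  , configVerbose :: Bool
  }

-- The general-purpose default.
defaultConfig :: Config
defaultConfig = Config
  { configPort = 8080
  , configVerbose = False
  }

-- A second default for another context, derived from the first.
defaultDebugConfig :: Config
defaultDebugConfig = defaultConfig { configVerbose = True }
```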
Put your commonly-used extensions in default-extensions rather than repeating them constantly.
It’s nice for files to be self-contained, but this is typically a fiction: you need to know about compilation flags from cabal files anyway. It saves a lot of typing to put the really essential stuff in the cabal file.
Haskell “languages” are basically a blessed set of extensions anyway, and people are fine putting those in the cabal file. A lot of what we’re currently doing is manually implementing the GHC2021 language!
Use the following set of warning flags:
-Wall
-Wnoncanonical-monad-instances
-Wincomplete-uni-patterns
-Wincomplete-record-updates
-Wredundant-constraints
-Widentities
-Wunused-packages
-Wmissing-deriving-strategies
Don’t set -Werror by default, only set it in CI builds.
Use a common stanza (usually called “lang”) to include a) the language (Haskell2010), b) the default-extensions, c) the default set of warnings.
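A sketch of how such a stanza might look in a .cabal file. The extension list is taken from the "basically all fine" extensions above and the flags from the warning list; the library section, the src directory, and the module name FooBar.X are hypothetical. Common stanzas require a cabal-version of 2.2 or later at the top of the file.

```cabal
-- Shared language settings, imported by every component.
common lang
  default-language:   Haskell2010
  default-extensions:
    DerivingStrategies
    DerivingVia
    GADTs
    LambdaCase
    NegativeLiterals
    OverloadedStrings
    RoleAnnotations
  ghc-options:
    -Wall -Wnoncanonical-monad-instances -Wincomplete-uni-patterns
    -Wincomplete-record-updates -Wredundant-constraints -Widentities
    -Wunused-packages -Wmissing-deriving-strategies

-- Hypothetical component that picks up the shared settings.
library
  import:          lang
  hs-source-dirs:  src
  exposed-modules: FooBar.X
  build-depends:   base
```

Note that -Werror is deliberately absent; per the recommendation above it would be passed only in CI builds.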
Use multiple public libraries judiciously. For now, only use them for additional libraries to be used in test code (“testlibs”).
Multiple public libraries are a very useful feature, but they’re not entirely mature yet. One place where they are invaluable is to export a “test library” containing code for testing the main library, without forcing the main library to depend on test libraries.
In due course we may want to use them more widely.
Use internal libraries where it is useful to enforce a clear separation of a “sub-package”.