How to Think About Type Systems
Pick the programming language you know best. I’ll pick JavaScript. Now imagine every value you could create in that language. Yes, every value.
In JavaScript you’d have numbers like 0, 1, -42, and 3.14. You’d have strings
like the empty string ""
or the classic "Hello, World!"
. You’d have objects
like {id: 7, name: "Kate"}
or {props: {...}, state: {...}}
if you’re a React
developer. You’d have functions like function add(a, b) { return a + b }
or
event => this.handleChange(event.target.value)
.
That’s a lot of values. An infinite amount of values.
This is the set of all values for your programming language.
In JavaScript, there’s nobody to stop you from calling a function with any value from the entire universe of values your programming language supports.
async function fetchUser(id) {
const response = await fetch(`https://example.org/api/user/${id}`);
if (response.status !== 200) {
throw new Error(`Unexpected status code: ${response.status}`);
}
return await response.json();
}
// Gotcha! This is an object.
fetchUser({
sneakySneaky: true,
});
Let’s switch to TypeScript and annotate our fetchUser()
function.
async function fetchUser(id: string) {
// ...
}
A type represents a set of values. Here we say the type of id
is a string.
That is, the id
may be any value in the set of all string values in your
programming language.
Likewise, if we annotated id
as a number type then id
may be any value in
the set of all number values in your programming language.
TypeScript gives us a feature not available in many other type systems. Union
types. With a union type, we may write number | string
which represents the
number
type or the string
type. A value with a type of number | string
may be any value in the set of all strings or any value in the set of all
numbers.
We may ask a type system the question: “is this value a member of my type’s set of values?” This allows us to compare individual values to types. Type systems also have the difficult task of comparing two types to each other.
Let’s think about some smaller sets. Let’s say we write a TypeScript type which can be any number zero through nine.
type Digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9;
Let’s say we write a second TypeScript type which can be either zero or one.
type BinaryDigit = 0 | 1;
We’ll then write a simple program with these types.
function isOdd(digit: Digit): boolean {
return digit % 2 == 1;
}
function toBoolean(digit: BinaryDigit): boolean {
return isOdd(digit);
}
The function isOdd()
will tell us if our digit is an odd number. While the
implementation of isOdd()
will work on any number we limit the type of its
parameter to Digit
. The function toBoolean()
will convert a BinaryDigit
into a boolean where 0
is false
and 1
is true
. It calls the isOdd()
function which expects a Digit
. But, BinaryDigit
is a different type than
Digit
! This program works because the set of values represented by the
BinaryDigit
type is a subset of the set of values represented by the Digit
type. Every value in BinaryDigit
also exists in Digit
.
In classic type system lingo, we would instead say that BinaryDigit
is a a
subtype of Digit
.
Usually, subtyping comes up in the context of classes.
class Animal {
// ...
}
class Dog extends Animal {
// ...
}
function doSomethingWithAnimal(animal: Animal) {
// ...
}
function doSomethingWithDog(dog: Dog) {
return doSomethingWithAnimal(dog);
}
Every instance of Dog
is also an instance of Animal
.
The thing is, a type system isn’t the only way to restrict the set of values for your variables.
A condition will work just as well!
if (x > 0) {
// `x` is a positive number!
}
Type systems will automatically narrow the set of possible values for your variables, but having a type system alone is not enough.
Let’s say you’re building an app where a user gets to pick a user name. Other
users will be able to reference them with an @
. User names should only contain
letters and numbers (or if you’re hardcore, UAX #31). You don’t want
spaces in user names!
If you type your user name as a string…
type UserName = string;
Your type is too broad. It contains the empty string (""
) or an entire
sentence ("The quick brown fox jumps over the lazy dog."
).
Some type systems will allow you to create custom data types. With a custom data
type, you can force the type to only be constructed in the module where the data
type is defined. That means you can have a custom UserName
data type and no
one outside of your UserName
module can create a new UserName
. (tangent:
how I do this in TypeScript)
You can put as many conditions on your UserName
as you’d like. Such as a
regular expression.
function createUserName(name) {
if (/^[a-zA-Z0-9_-]+$/.test(name)) {
return name;
} else {
throw new Error('Not a user name!');
}
}
Now you may use your type system to automatically remember an even narrower set of possible values then one already expressible in the types.
Deep End
If you like this way of thinking about type systems and you’d like to dive into the deep end then here’s some more thoughts for ya!
- In C, every value is logically a collection of bits. A 64-bit integer type, 64-bit float type, and a 64-bit pointer type all have the same set of values at runtime. Yet the type system treats them as different types. Under the intuition of type systems proposed by this post we can explain this using custom data types. Custom data types allow us to give the same set of values different semantic meaning.
- JavaScript type systems also have the intersection type operator.
A & B
. Which represents the values shared byA
’s set of values andB
’s set of values. Like a Venn diagram. One missing primitive set operation in JavaScript type systems is the difference operator. For example, if we call the difference operator/
thennumber / 10
is every number except for 10. Sure Flow has object difference and TypeScript has object difference and union difference, but there’s no true set difference operator. - Type Inference capabilities of different type systems affect how types
implicitly relate to each other. Common wisdom is that if you have
sophisticated type inference you can’t have implicit subtyping. TypeScript has
limited type inference capabilities but does support implicit subtyping. OCaml
has sophisticated type inference capabilities but only supports explicit
subtyping with the
<:
operator. Research like MLsub, however, shows a path to having implicit subtyping and sophisticated type inference. If you haven’t read the thesis, though, does not support invariant type parameters which may be inconvenient for programming styles which like mutability. - Dependent types are a thing, but I don’t know enough about them to talk knowledgeably.
Further Reading
- I highly recommend reading the Rust Virtual Structs proposal. Lots of interesting insight into code reuse and type systems all to serve an efficient runtime representation.
- The MLsub thesis describes an ML type system with implicit subtyping. It’s like the centaur (half person, half horse) equivalent of a type system.
- The “Extensible records with scoped labels” paper is an interesting read if you like object width-subtyping in TypeScript and Flow but you want ML inference.
- I gave a talk once about similar ideas.