Currently I’m looking at the “cannot move out of dereference of &-pointer” class of errors, which I personally dread the most and find difficult to deal with. I don’t have a good idea about a better general way to present that error, however. Instead, I would like to simply special-casing the error in a match expression, which is where I encounter it the most.
The problem with the way the error is currently presented in match expression is that (sometimes) it’s a half-way between “where” the value moves out and “where” we are attempting to move to. For example, from the following code snippet,
Notice that the caret and tilde span through the whole Some(num) pattern. It’s accurate, too, because here we are not allowed to move the value out of *opt, and one variant of *opt is Some(num). On the other hand, what we really want to tell to the user is that since num is capturing an owned pointer by value, it’s causing a move, and that’s essentially where the problem lies.
I believe giving the user both information will make the story about this error message clearer:
However, I’m not too fond of doing this because just injecting ref into the AST is more work than it looks. And writing a routine to generate a new arm seems overkill, especially when just telling the user to use ref num seems sufficient. In addition, I actually prefer the first note.
That said, in some cases the second way has the advantage of generating less notes. Take this code snippet as an example:
Recall from the previous post that this describes a SubSupConflict, meaning that lifetime #2 is a sub-lifetime of the lifetime of &x.y (which we are trying to infer), and the lifetime of &x.y is a sub-lifetime of lifetime #1. However, lifetime #2 is not necessarily a sub-lifetime of lifetime #1, and so we have to explicitly declare that it is. Note that the anonymous lifetime #1 came from &Foo in the function parameter, and anonymous lifetime #2 came from the return value &int. In this case, we indicate that they are the same lifetime by introducing a lifetime parameter (binding) 'a and give both the same binding:
123
fnbar<'a>(x:&'aFoo)->&'aint{&x.y}
Terminologies
One potential source of confusion when it comes to lifetime parameter is that it’s slightly different from function parameter. Here, it’s helpful to introduce a couple terminologies: free lifetime and bound lifetime.
The terms “free” and “bound” came from lambda calculus. For traditional function, a binding is analogous to a parameter, and other variables are considered free. For example, for some function fn foo(x: int) { y }, x would be a binding and y a free variable.
Free lifetime and bound lifetime slightly differ from their traditional function counterparts in the following ways:
Return value can get a lifetime binding.
If a function parameter or return value uses lifetimes, it gets lifetime bindings even when we don’t explicitly give it any. A function parameter or return value without a named binding may be given anonymous bindings of the form #n for some n (this is where the name “anonymous lifetime” came from). Note that just as you can write multiple lifetimes on a type declaration (for example, &'a &'b Foo<'c>), there may be multiple anonymous lifetimes associated with a single function parameter or return value.
When we typecheck the body of a function, we treat those lifetimes as free instead.
Thus, “free lifetime” and “bound lifetime” are not mutually exclusive; instead, they represent different concepts. The reason why we treat lifetimes from function parameters as free is because they may have arbitrary lifetimes depending on what are passed in (that’s why another way to view free lifetime is “one that’s unbounded above”). Also, Rust doesn’t attempt to infer lifetime of return value, so it’s given the same treatment.
What is the point of having lifetime binding, then? Essentially, it serves these two purposes:
To indicate that a group of function parameters and/or return value have the same lifetime
To name a lifetime in case of error (though this ends up being not very helpful when a binding is anonymous)
Case study continued
This section doesn’t describe anything new except giving a small peek inside the Rust compiler. Here is Rust’s representation of free lifetime:
Here, scope_id refers to the function block the lifetime is bounded to. BoundRegion refers to bound lifetime and has the following representation:
12345678910111213
pubenumBoundRegion{/// An anonymous region parameter for a given fn (&T)BrAnon(uint),/// Named region parameters for functions (a in &'a T)////// The def-id is needed to distinguish free regions in/// the event of shadowing.BrNamed(ast::DefId,ast::Ident),/// Fresh bound identifiers created during GLB computations.BrFresh(uint),}
From the function that we are studying,
123
fnbar(x:&Foo)->&int{&x.y}
the Rust compiler creates the free lifetimes with bindings BrAnon(0) and BrAnon(1) out of the function parameter and return value, respectively. For brevity, we will refer to the former free lifetime as free(a#0) and the latter free(a#1). Also, let 'v be the lifetime of &x.y, which we are trying to infer, and let a <= b be the constraint that lifetime a has to be a sub-lifetime of lifetime b.
As the compiler walks through this function, it creates four constraints. The two that are relevant are free(a#1) <= 'v and 'v <= free(a#0). At the beginning, it will go through all the constraints where 'v is the “bigger” lifetime, which in this case is free(a#1) <= 'v. At this point, it infers that 'v also has the lifetime free(a#1). Thus, it follows that free(a#1) <= free(a#0). However, since these lifetimes are free, we cannot be sure how “big” either is, and so it’s not necessarily the case that one is a sub-lifetime of the other. As the result, we get an error.
Improving the error message
From this case study, it occurred to me that all SubSupConflict sub_r <= ... <= 'v <= ... <= sup_r, where sub_r and sup_r are free lifetimes can use the error message “define the same lifetime parameter on r_sub and r_sup”.
This leads me to believe that I now have a strategy to make Idea #1 from my first post a reality. Namely, for functions with similar form to the one above, I want to make a concrete suggestion:
Check that sub_r and sup_r are both free lifetimes.
Check that at least one of sub_r or sup_r has anonymous binding.
Somehow identify that they come from a function declaration and obtain the AST of that function declaration.
Since the compiler appears to identify the first anonymous binding of a function with the number 0 and increment each subsequent one by 1, we may be able to use the same strategy to identify “where” to insert/replace the lifetime in the AST.
Use the Rust pretty printer to print the new function declaration.
Questions(s)
Here are some things I’m currently unclear about:
Suppose that sub_r and sup_r come from function declaration. Is sub_r always the lifetime of the return value?
Suppose that sub_r has a named binding 'a and sup_r has an anonymous binding, and we infer that sub_r and sup_r should have the same binding, would there be a case where giving sup_r a binding 'a would cause a new error?
My PR that implements ideas #2 and #3 of previous post was accepted last week, so earlier this week I set out to do idea #1. That is, I want to simplify the error message for the following code snippet:
1234
structFoo{y:int}fnbar(x:&Foo)->&int{&x.y}
It turns out that the error diagnostic for this case does not lie in the borrow checker but the region inference system (where “region” is synonymous to “lifetime”). Thus, I spent Monday and Tuesday reading the codes inside rustc::middle::typeck::infer. I felt quite down by the end of it, though, because I couldn’t figure out a straightforward way to detect the common pattern above, and I was at a loss of what to do. The purpose of this post is to sort out my thinking, console myself, and document some of what I have learned so far.
A brief description
The compiler’s documentation contains a nice description of how region inference system works. On the contrary, this description will be brief and omit many details. Its main purpose is to introduce some terminologies.
The basic problem is that many times the compiler has to infer the lifetime of certain expressions. When that happens, it creates a “region variable”. By contrast, a “concrete region” may be a lifetime associated with some lexical scope (e.g. block of a function) or a free lifetime (I don’t quite get what this means, but it appears to refer to a lifetime that’s not bounded above). What the compiler does is that as it walks through a function, it accumulates “constraints”, and then it tries to solve those constraints by the end of the function. A constraint has the form constraint(a, b), meaning that a is a subregion of (i.e., bounded by) b, where a and b may either be a region variable or a concrete region. The compiler would report the error if these constraints happen to conflict.
Types of error
When the compiler runs through these constraints and deduces region inference errors, it collects them and then reports them later. Region inference errors are categorized into three types, as described by the RegionResolutionError enum:
123456789101112131415161718192021222324
pubenumRegionResolutionError{/// `ConcreteFailure(o, a, b)`:////// `o` requires that `a <= b`, but this does not holdConcreteFailure(SubregionOrigin,Region,Region),/// `SubSupConflict(v, sub_origin, sub_r, sup_origin, sup_r)`:////// Could not infer a value for `v` because `sub_r <= v` (due to/// `sub_origin`) but `v <= sup_r` (due to `sup_origin`) and/// `sub_r <= sup_r` does not hold.SubSupConflict(RegionVariableOrigin,SubregionOrigin,Region,SubregionOrigin,Region),/// `SupSupConflict(v, origin1, r1, origin2, r2)`:////// Could not infer a value for `v` because `v <= r1` (due to/// `origin1`) and `v <= r2` (due to `origin2`) and/// `r1` and `r2` have no intersection.SupSupConflict(RegionVariableOrigin,SubregionOrigin,Region,SubregionOrigin,Region),}
As described by the comments, this is roughly what these errors correspond to:
ConcreteFailure - there is a constraint(a, b), where a and b are concrete regions, that does not hold
SubSupConflict - there are constraint(sub_r, v), constraint(v, sup_r), where sub_r and sup_r are concrete regions and v is a region variable. Since sub_r is a subregion of v and v is a subregion of sup_r, it follows that sub_r is a subregion of sup_r. However, that constraint is not satisfied.
SupSupConflict - there are constraint(v, r1) and constraint(v, r2). Since v is a subregion of both r1 and r2, they must overlap. However, they do not.
Recall that we create a region variable when we need to infer the lifetime of some expression. Here a RegionVariableOrigin is a type used to record why we created the region variable in the first place. On the other hand, SubregionOrigin records why we created the constraint. Thus, suppose some region variable v has the RegionVariableOrigin v_origin, then SubSupConflict(v_origin, sub_origin, sub_r, sup_origin, sup_r) encodes the following information:
v_origin - why the region variable v is created
sub_origin - why we created constraint(sub_r, v)
sup_origin - why we created constraint(v, sup_r)
Case study
Compiling the function at the beginning of this post gives us the following error:
It turns out that the error above falls into the SubSupConflict category. In general, SubSupConflict error message contains one error message, four notes, and has the following format (as an aside, SupSupConflict has a similar format):
The error message (+ notes) above has the following deficiencies:
It’s too long and intimidating
The description is fairly opaque
Even though it’s long, it does not even describe the problem completely
To elaborate on the last bullet point, the description of the problem is this: 1) sub_region is subregion of v, 2) v is subregion of sup_region, 3) thus, sub_region is subregion of sup_region, but that does not hold. As we can see, number 3 is missing.
Suggestion
My suggestion is to add a note to include 3. Adding another note, however, would make an already long error message even longer. Personally, I feel that since the second and fourth notes do not describe the problem directly, they are of secondary importance and should be removed. I would also swap the first and third notes and change the language a bit to make it a smoother reading experience.
Putting all the above together, we would have something as follows:
Instead of “is bounded by”, something like “is a sub-lifetime of” may be good, too.
Discussions and caveats
The example above turns out to be rather silly since anonymous lifetimes #1 and #2 seem to refer to the same block, and yet there’s conflict. I tried to investigate how this error arose in the first place by looking at the debug output, but there are too many details I do not understand as of now. I’ll try to enumerate some of them in a later section.
Also, I’m not sure if removing the second and fourth notes (about why the constraints are there) is a good idea. I personally wouldn’t miss them since I have never found them helpful, but someone more knowledgeable about Rust’s lifetime inference may. A solution to this would be to have a verbose option for the power users.
For long block, I would probably replace the current span note with the custom span note that I added in my PR from last week. For example, suppose our function is more than 6 lines long:
12345678
fnfoo(x:&Foo)->&int{2;3;4;5;6;&x.bar}
What the default span note does when displaying a span of more than 6 lines is to strip out all the remaining lines, which looks like this:
However, this does not give a good view of the whole scope of the lifetime. What my custom span note does is display the first and last lines, and blank out the middle (currently it also always add an arrow at the end, so I’ll have to modify it a bit). One added advantage is that it would make the error message takes less space:
Regrettably, all this is a far cry from giving a concrete feedback like “missing a lifetime parameter” or “you may need to insert a lifetime” (that said, one thing that I wonder lately is: are all SubSupConflict and SupSupConflict errors caused by missing lifetime parameter?), but I will need to study up more on lifetime, which seems to include a lot of subtle details. If possible, I would like to just suggest outright “perhaps you mean to declare fn bar<'a>(x: &'a Foo) -> &'a int?”
Things I still need to understand
This is the section where I get to wail like a baby and lament about all that is wrong with the world. The data structures are pretty well-commented, but since there are so many details, I end up getting confused. For example, this is what represents a region (comments removed):
What is a “region bound”, and once again, what exactly is a free region? The FreeRegion enum also has a BoundRegion associated with it: why is that the case?
This means that there is a ton of specific reasons for a constraint to be added. The ones that are marked with stars are those I’m not quite clear on yet. Also, it seems that this enum serves the dual purpose of indicating why a constraint is added and the cause of error (in particular, ReferenceOutlivesReferent and BindingTypeIsNotValidAtDecl look like they are for error reporting).
There are also many variants of the RegionVariableOrigin that I do not understand, but I think they will be clearer once I know what a region bound is.
Moving forward
I expect many of these questions will become clearer as I read the code more, but unfortunately it’s a slow process. I’m not quite clear on what to do now. Maybe I can start implementing the suggestions I made in this post.
This semester I have an opportunity to do an independent study on the Rust compiler, supervised by my professor David Evans. The particular topic I’m thinking of exploring right now is the Rust lifetime system. This blog will serve as a place where I can document my progress (unless I feel lazy about blogging and this becomes the last post). It will also (hopefully) serve the dual purpose of showing professor Evans that I’m not slacking off.
If I have to rate my current knowledge of Rust, on a scale of 0 (no clue) to 10 (mastery), I would put myself at a 3. Embarrassingly, one crucial topic that I have always had trouble dealing with is Rust lifetime. People often say that we should keep learning different languages to explore new ways of thinking. For Rust, the new thing it offers is making the programmer explicitly aware of the lifetime of objects. Thus, given my lack of understanding of lifetime, I have plenty of reasons to study up on it.
As such, if anything I write is incorrect, I would appreciate it if someone were to point it out.
My current plan is to 1) read up the documentation on lifetime and the borrow-checker code (which handles lifetime) and 2) improve the lifetime error messages. After going 1/4th of the way through the documentation middle::borrowck::gather_loans::doc, I feel that I understand it a little better now, and a few ideas have formed in my head.
Idea #1
This idea wasn’t thought of by me but given by Niko Matsakis and Patrick Walton. The idea is to detect common error patterns and suggest a fix. For example, currently, if we compile the following function:
It would be nice if in this particular case, we can just tell the user to introduce a lifetime parameter, or even better, give them something like: “perhaps you mean to declare fn bar<'a>(x: &'a Foo) -> &'a int?”
One feedback I should seek from others is: are all those notes necessary? Personally, my eyes just glance over them. I have never understood them. Perhaps there are other instances where they may be useful.
Idea #2
Here’s an idea that ezyang had. He posted this gist on IRC last month, though at the time I had no idea what he was doing.
When the Rust compiler reports an error because of a previous borrow, it may be helpful to note where that borrow ends. For example, currently the program
One case this would be helpful in is when the borrow ends earlier or later than the user expects, and so he has a chance to correct his misconception. On the other hand, it adds extra noise to the compiler, and the compiler is already pretty noisy.
Idea #3
I will begin this part with examples. First, let’s modify the program in Idea #2 a little bit. We will change the last line of the function so that z borrows from x mutably instead:
While these messages and the messages in Idea #2 are correct, they have two shortcomings:
A mutable borrow prevents both mutable and immutable borrows. The cause of these two errors are essentially the same, but currently it sounds like they are different. It is also a bit misleading to output “error: cannot borrow x as immutable because it is also borrowed as mutable” when it cannot even be borrowed as mutable.
They fail to convey one key point: a borrow of a variable subsequently restricts the usage of that variable in some ways until that borrow ends.
Thus, in this particular instance, I think we would be well served by outputting the same error message for both and elaborating the note:
1234567
test.rs:4:13:4:19error:cannotborrow`x`becauseitisalreadyborrowedasmutabletest.rs:4letz=&x// or &mut x^~~~~~test.rs:3:13:3:19note:previousborrowof`x`asmutableoccurshere;amutableborrowpreventssubsequentborrowormodificationofvariable`x`untiltheborrowends.test.rs:3lety=&mutx;^~~~~~
My only qualm is the wording does not convey explicitly that the restriction is placed on the variable only (perhaps “subsequent borrow or modification using variable x” would be better?)
To summarize Idea #3, I believe that borrow checker errors can be conveyed better by focusing on the restriction that the original borrow places rather than reporting the precise details at the error site.