Minor update on new compiler

Hi,

I’m currently in the process of converting the mx2 compiler from monkey1 to monkey2. It can’t *quite* compile itself yet, but I think it’s about 90% of the way there and I’m hoping to have it all up and running sometime next week.

In the process, I have cleaned up quite a few issues with mx2, some of which I knew about, some of which were news to me!

For starters, in old mx2 you couldn’t explicitly specify a generic function (which is an incredibly nasty phrase, but the idea is much simpler than it sounds. If you know a little about generics, the following should make some sort of sense…).

For example, take the simple generic ‘Min’ function:
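Something along these lines (a minimal sketch based on the Min&lt;T&gt;:T( x:T,y:T ) signature quoted in the comments; the exact library code may differ):

```monkey2
' A minimal sketch of a generic Min function.
Function Min<T>:T( x:T,y:T )
	If x<y Return x
	Return y
End
```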

In old mx2, the ‘version’ of Min used is entirely inferred from the parameters you call it with. For example, if you call Min( 10,20 ) the int version of Min will be used (ie: type ‘T’ will be ‘Int’ inside the body of Min) because 10 and 20 are ints. Ditto, if you call Min( 10.0,20.0 ) the float version will be used.

But things get a bit messy if the parameters are of different types, for example Min( 10,20.0 ) where one parameter is int and one is float. In this case, the compiler gets fatally confused because it can’t determine whether to use the int or float version of Min.

One solution is to ‘cast’ one of the parameters to the type of the other, but the clearest way is to just ‘select’ the version of Min you want to use up front, and you can now do this using Min&lt;Blah&gt;. For example, Min&lt;Float&gt;( 10,20.0 ) will use the float version of Min, regardless of what parameters you pass. The compiler won’t get confused because there is only one float version of Min to choose from, so it knows to convert both parameters to Float and life is good!
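The difference is easiest to see side by side (a sketch, assuming a generic Min declared as Min&lt;T&gt;:T( x:T,y:T )):

```monkey2
Local a:=Min( 10,20 )            ' ok: T inferred as Int
Local b:=Min( 10.0,20.0 )        ' ok: T inferred as Float
'Local c:=Min( 10,20.0 )         ' error in old mx2: is T Int or Float?
Local d:=Min<Float>( 10,20.0 )   ' ok: T explicitly Float, 10 converted to Float
```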

This feature effectively allows you to ‘pass a type’ to a function, and it becomes super useful when used inside other templates. But more on that later…

Another thing I wanted to have a go at (but wasn’t too sure I could do!) was solving the problem of how to resolve generic function overloads. For example, this will fail under old mx2:
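The example probably looked something like this (a sketch reconstructed from the Test&lt;T&gt;( t:T ) and Test&lt;T&gt;( t:T[] ) signatures discussed in the comments; the original snippet may have differed):

```monkey2
Function Test<T>( t:T )
	Print "Test( t:T )"
End

Function Test<T>( t:T[] )
	Print "Test( t:T[] )"
End

Function Main()
	Test( New Int[10] )	' old mx2: error - matches both overloads
End
```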

The problem is that the call to Test with an Int[] parameter is actually a viable match for *both* generic Test()s. In the first version, ‘T’ would be ‘Int[]’, while in the second ‘T’ would just be ‘Int’.

However, it ‘feels’ like the second version should be called – and indeed c++ does this. But I couldn’t get my head around an algorithm to decide what made it deterministically better than the first! I wasted a couple of days or more on this trying some very ugly stuff that involved counting ‘type depths’ and so on, but eventually gave up.

Anyway, I revisited this recently and it turns out the solution is remarkably simple (although perhaps not to explain) and I feel like kind of an idiot for not spotting it sooner.

The basic idea is to see if you can ‘pass’ function 1’s parameters to generic function 2. If you can, but can’t do the opposite (ie: pass function 2’s parameters to function 1), then function 1 is a better match.

For example, if we try to call Test(2) with Test(1)’s parameters, it won’t work because Test(2) wants a ‘T[]’ but we are just passing a ‘T’. However, the opposite DOES work, ie: we can pass a ‘T[]’ to a generic function that takes a ‘T’. So the second version is a better match, ie: the first version is ‘more generic’.

Note that this ordering is completely independent of the ‘real’ types involved, ie: the functions could be sorted beforehand into a best-&gt;worst order, although new mx2 doesn’t do that yet.
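The ordering test can be walked through concretely for the two Test overloads (a schematic sketch of the idea, not the actual compiler code):

```monkey2
' Schematic walkthrough of the ordering test:
'
'   Test<T>( t:T )     'version 1
'   Test<T>( t:T[] )   'version 2
'
' Can version 1's parameter 'T' be passed to version 2's 'T[]'?
'   No - a bare 'T' is not an array.
' Can version 2's parameter 'T[]' be passed to version 1's 'T'?
'   Yes - 'T' matches anything, including arrays.
'
' Version 2's parameters fit version 1 but not vice versa,
' so version 2 is more specific and wins the overload.
```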

Ok, that probably didn’t help at all (it’s pretty weird to pass generic types to generic functions!) but the upshot is that it works and works very well!

There are quite a few other cleanups and improvements in new mx2, but I think it’s probably time for a bit more ‘Soma’…

Bye,
Mark


7 thoughts to “Minor update on new compiler”

  1. I very much like that you can now tell the compiler which type version to call.

    but there should be an order or weight of precision to measure any loss, right? The compiler should be able to measure precision loss and choose the version that results in the least amount. So for your example the compiler should always pick min&lt;float&gt; when given min(10,20.0), no?

    ps: I left a couple posts in Monkey 2 Development that I wonder if you’ve seen?

    1. so for your example the compiler should always pick min&lt;float&gt; when given min(10,20.0), no?

      Actually, the original post misses something important here – the function Min<T>:T( x:T,y:T ) cannot match *any* call with different typed arguments, since x and y must be of the same type. This is implied by the fact that x and y are both of type ‘T’, and ‘T’ can only have one type!

      In the case of classes, ‘T’ should possibly pick the ‘most derived’, but I don’t think it’s a good idea for the compiler to silently prioritize numeric types. In some cases it’s safe, eg: ints can always safely be converted to doubles, but in others it’s not, eg: ints cannot always be safely converted to floats, because floats only have 24 bits of precision, while ints have 32. Floats have greater ‘magnitude’ but less ‘precision’.

      Basically, the rules for matching numeric types in monkey are pretty simple – either you have an exact match or you don’t, in which case you may need to cast/do something. And I have no problem with that. From my experiences with c++, the combination of ‘liberal’ implicit conversions and function overloading can lead to some seriously confusing situations!

      1. Can I assume this means you could get the “implicit result” mentioned if you use two generic arguments? If so, that’s fantastic. That would basically let you use ‘Min’ on any two variables whose types support comparison with each other. Makes me wonder why ‘Min’ should only take one type to begin with.
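        A purely hypothetical sketch of what that two-type version might look like – the name Min2 and the choice of Double as a common return type are inventions for illustration, not anything actually in mx2:

        ```monkey2
        ' Hypothetical: one type parameter per argument.
        ' Returning Double is an arbitrary choice - picking the
        ' 'right' common return type is exactly the hard part.
        Function Min2<X,Y>:Double( x:X,y:Y )
        	If x<y Return x
        	Return y
        End
        ```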

        Something to note about this strategy is that if ‘Floats’ and ‘Ints’ aren’t comparable without casting, then the same rules can apply as they do now, but you also get the benefit of a more versatile function.

        This not only stops the incompatible-type issue, but also makes the standard library more useful, so I hope you’ll consider it.

        By the way, does this kind of type inference apply to all generic functions? Does this also mean methods get the same treatment? I’m really liking the way this is working out, so I’m hoping these are all correct assumptions. As always, great work, Mark.

          1. Yes, wasn’t totally sure what you meant!

            This stuff is all really just to do with the way function overload resolution is done and doesn’t affect anything else – you can still compare ints and floats, for example.

  2. Why exactly does T[] work in place of T in an argument? Does it resolve to a memory address of the (empty) array? I’m glad it is working out, just confused by it, and it doesn’t feel like a deterministic test. Would a char[] argument be a workable replacement for a char?

    1. Why exactly does T[] work in place of T in an argument?

      Not 100% sure what you’re referring to here, but a generic argument of type ‘T’ can take *any* type, eg: an Int, a Float, an Int[], a ‘Thing’ object, a Thing[] etc.

      So if you have a generic function such as Test<T>( t:T ), this can be called with *any* type of argument. However, a generic function such as Test<T>( t:T[] ) can *only* be called with an array argument (however, the array elements can be of any type).

      Also note that monkey2’s generics are based on c++’s templates, so they are really just fancy ‘macros’ in a way! This means the compiler generates new versions of a generic function for each ‘set’ of type parameters. So things like ‘memory address’ and so on aren’t an issue – it’s just source code at this level.
