First Look at Java Valhalla: Flattening and Memory Alignment of Value Objects


emperor000@reddit

From what I have seen, the only reason that proposal is complicated is the syntax. Or do you just mean the actual implementation behind the syntax?

joemwangi@reddit (OP)

It’s not really the syntax that’s tricky, but the type model underneath. Since C# keeps value and reference types disjoint, the union proposal has to bridge them explicitly, handle boxing, nullability, and lifetime rules. In Java, the move toward unified types makes that boundary disappear, so the same idea looks simpler there.


emperor000@reddit

I don't know much about Java, but I don't think this is true.

C# already has something called "unified types", in that every type inherits from object. That is how boxing and unboxing can be done, and, from what I have seen, how it is being done for unions. But by "unified types" do you mean unions?

It also already has this "flattening and memory alignment of value objects", at least if "objects" there has an abstract meaning and doesn't imply actual objects.

This value keyword in Java seems like it is essentially just C#'s struct.

Either way, I don't see how this will make unions simpler in Java. However unions work, it will still have to manage both types of data.


joemwangi@reddit (OP)

Mainly the type system. The object inheritance in C# is a nominal unification, not a representational one. As you said, types share a common root in the type hierarchy, but their runtime representation is still split: value types live on the stack (or inline) and must be boxed to behave like references. By unified types I meant the model Valhalla is building in Java, where value and reference types share the same type system and the same representational semantics. A value class can be stored flattened inside another object or passed around without boxing, yet it’s still a class in the same hierarchy. So, as you point out, C#’s “everything inherits from object” unifies names, whereas Java’s upcoming model unifies behavior and representation.
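
For the Java side of the nominal-vs-representational point, today's Java (before Valhalla) behaves much like C#: an int joins the shared Object hierarchy only by being boxed into an Integer. A minimal sketch; BoxingDemo and its methods are hypothetical names used only for illustration:

```java
// Minimal sketch in today's Java (pre-Valhalla): a primitive int joins the
// shared Object hierarchy only by being boxed into a java.lang.Integer.
// BoxingDemo and its methods are hypothetical names used for illustration.
class BoxingDemo {

    // Views an int as an Object and reports the runtime class of the result.
    static Class<?> runtimeClassOf(int n) {
        Object o = n;         // autoboxing: compiled as Integer.valueOf(n)
        return o.getClass();  // the box is an ordinary heap object (possibly a cached one)
    }

    // Copies ints into an Object[]; every element becomes a reference to a box.
    static Object[] asObjects(int[] values) {
        Object[] boxes = new Object[values.length];
        for (int i = 0; i < values.length; i++) {
            boxes[i] = values[i]; // boxing on each store
        }
        return boxes;
    }

    public static void main(String[] args) {
        System.out.println(runtimeClassOf(42)); // class java.lang.Integer
        Object[] boxes = asObjects(new int[] {1, 2, 3});
        System.out.println(boxes[0] instanceof Integer); // true
    }
}
```

The int[] holds the values themselves, while the Object[] holds references to boxes; that gap is what Valhalla's value classes aim to close.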

In C#, flattening only applies within other structs or arrays of structs. Once a struct appears inside a class or interface, it’s boxed; the boundary between value and reference types stays firm. In Valhalla, flattening crosses that boundary. A value class field inside an ordinary class can be stored inline, with no boxing and no reference indirection. Arrays and, eventually, generics can also be specialized transparently, so List<Point> can be backed by a flat array of Points.
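
To make "backed by a flat array of Points" concrete, here is a hedged sketch in today's Java. A plain Point[] is an array of references to separately allocated objects; the hypothetical FlatPointArray below instead copies each point's fields inline into one int[]. This is manual flattening, not Valhalla's mechanism; with value classes the JVM is meant to choose such a layout itself.

```java
// Hypothetical sketch: manual flattening in today's Java. Each Point's x and y
// are stored contiguously in one int[] instead of as references to heap objects.
class FlatPointArray {

    record Point(int x, int y) {}

    private final int[] coords; // laid out as x0, y0, x1, y1, ...

    FlatPointArray(int size) {
        coords = new int[size * 2];
    }

    int size() {
        return coords.length / 2;
    }

    void set(int i, Point p) {
        coords[2 * i] = p.x();     // copy the fields in; no reference is kept
        coords[2 * i + 1] = p.y();
    }

    Point get(int i) {
        // Materialize a Point on read; escape analysis may avoid the allocation
        return new Point(coords[2 * i], coords[2 * i + 1]);
    }

    public static void main(String[] args) {
        FlatPointArray flat = new FlatPointArray(3);
        for (int i = 0; i < flat.size(); i++) {
            flat.set(i, new Point(i, i * 10));
        }
        System.out.println(flat.get(2)); // Point[x=2, y=20]
    }
}
```

The trade-off is that identity is lost: get returns a fresh Point each time, which is exactly why Valhalla restricts flattening to classes that give up identity.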

That’s why unions become simpler in Java’s model: there’s a single unified type system where identity, flattening, and nullability are runtime properties, not separate type categories. The same code can handle both “kinds of data” without bridging between ref- and value-worlds. You can see this in the current early-access build: once a value class grows beyond what the JVM can flatten atomically in the heap (about 64 bits for now), it is simply stored as a regular heap object. The semantics stay identical; only the representation changes.


emperor000@reddit

I think there's some misunderstanding here, possibly because you know Java and I don't, while I know C# and maybe you don't.

I think "boxed" probably has slightly different connotations in Java vs. C#.

Once a struct appears inside a class or interface, it’s boxed; the boundary between value and reference types stays firm.

In C# a value type inside of a reference type is not boxed, not in the way that "boxed" is normally used in C#. It's just part of the reference type that is stored on the heap.

In C#, flattening only applies within other structs or arrays of structs.

This is not true, unless we are also talking about different meanings of "flatten". I'm talking about the difference between List<int> and List<Integer>, with the latter being a reference to a collection of references to ints (i.e. boxed ints) and the former not being (previously?) possible because of Java's type system.

In C#, both are possible, but the second isn't really a thing because there's no reason to explicitly box a value type like that. The List<int> is always "flattened" in that it just contains an array of ints.

so List<Point> can be backed by a flat array of Points.

Right. And if Point is a struct (not a ref struct) in C#, then it is a flat array of Points.

The same code can handle both “kinds of data” without bridging between ref- and value-worlds.

So can C#. That is what "boxing"/"unboxing" refers to. The C# proposal(s) that I have seen for union types just use an object that can handle a reference type or a boxed value.
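
The "wrapper for object" shape can be sketched in Java terms too. Assume a hypothetical two-case union IntOrString (not the API of any real proposal): it stores a single Object, so the int case must be boxed on the way in and type-tested on the way out, which is the ref/value bridging being debated here.

```java
import java.util.Objects;

// Hypothetical sketch of a union that wraps a single Object.
// IntOrString is an illustrative name, not part of any C# or Java proposal.
final class IntOrString {

    private final Object value; // holds either a boxed Integer or a String

    private IntOrString(Object value) {
        this.value = value;
    }

    static IntOrString of(int i) {
        return new IntOrString(i); // the int case is boxed into an Integer here
    }

    static IntOrString of(String s) {
        return new IntOrString(Objects.requireNonNull(s));
    }

    boolean isInt() {
        return value instanceof Integer;
    }

    String describe() {
        // Dispatch on the runtime type of the stored Object
        if (value instanceof Integer i) {
            return "int " + i;
        }
        return "string " + value;
    }

    public static void main(String[] args) {
        System.out.println(IntOrString.of(7).describe());       // int 7
        System.out.println(IntOrString.of("seven").describe()); // string seven
    }
}
```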


joemwangi@reddit (OP)

True, I can see flattening happening inside classes in C# (apologies for my oversight), but there are still fundamental limitations, and my points still stand. I had to dig deeper and check whether the current inheritance and implicit rules apply in Java but not in C#.

For example:

interface Coord {}
value record PointRecord(int x, int y) implements Coord {}
Coord[] pointRecords = new PointRecord[size]; // runtime type is PointRecord[]; the JVM may flatten it

// init pointRecords
for (int i = 0; i < size; i++)
  pointRecords[i] = new PointRecord(...); // each element stored by value; layout is the JVM's choice

In C#, that would automatically box each element if PointRecord were a class; the array always stores references. But in Java with Valhalla, value record PointRecord(int x, int y) can be stored flattened inside the array, with no per-element heap allocation. The JVM chooses layout adaptively.

Future covariance rules, combined with generic reification, will also allow flattening to propagate through generic abstractions like List<Coord>, where PointRecord implements Coord, something C# can’t currently do without introducing boxing or copying at interface boundaries.

For union types, this split between value and reference representations is exactly why the C# proposal had to define multiple categories (union class, union struct, ref union struct, and ad hoc union). Each exists to patch a different combination of storage rules and generic behavior, whether the union’s cases are heap-based, inline, or require ref semantics.

Right. And if Point is a struct (not a ref struct) in C#, then it is a flat array of Points.

Yes! List<Point> in C# is indeed flattened, and you’re absolutely right! But only for that exact concrete type. Once you introduce abstraction, say List<Coord> or Coord[] where Point implements Coord, C# can’t preserve flattening; it has to box or disallow it.

In Valhalla, that’s exactly what changes. A PointRecord implementing Coord can still be stored flat in a Coord[] that was created as a PointRecord[], or in a List<Coord> (not now, no reification yet), since the runtime unifies the representation. Thus flattening isn’t tied to lexical type identity anymore; it survives abstraction.
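
The "no reification yet" caveat can be checked in current Java: generic type arguments are erased, so at runtime a List<String> and a List<Integer> are the same class and the list object does not know its element type. A minimal sketch; ErasureDemo is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: in today's Java, generics are erased, so the type argument
// of a List is not available to the runtime.
class ErasureDemo {

    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass(); // both are plain ArrayList
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // true
    }
}
```

Without that runtime type information, a generic container has nothing to specialize its storage on, which is why flattening through List<Coord> depends on future reification work.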


emperor000@reddit

In C#, that would automatically box each element if PointRecord were a class; the array always stores references. But in Java with Valhalla, value record PointRecord(int x, int y) can be stored flattened inside the array, with no per-element heap allocation. The JVM chooses layout adaptively.

Right, because that is what a class is. Just like in Java: if you didn't write value record, it would store references. In C# you would make PointRecord a struct. C#'s struct is, for all intents and purposes here and to my understanding, the equivalent of value record.

Future covariance rules, combined with generic reification, will also allow flattening to propagate through generic abstractions like List<Coord>, where PointRecord implements Coord, something C# can’t currently do without introducing boxing or copying at interface boundaries.

I don't think this is true. See here: https://learn.microsoft.com/en-us/dotnet/api/system.reflection.emit.opcodes.constrained?view=net-9.0&redirectedfrom=MSDN

For union types, this split between value and reference representations is exactly why the C# proposal had to define multiple categories (union class, union struct, ref union struct, and ad hoc union). Each exists to patch a different combination of storage rules and generic behavior, whether the union’s cases are heap-based, inline, or require ref semantics.

You must be talking about a different proposal to what I have seen. The proposal I am familiar with is basically a (struct) wrapper for object.

Yes! List<Point> in C# is indeed flattened, and you’re absolutely right! But only for that exact concrete type. Once you introduce abstraction, say List<Coord> or Coord[] where Point implements Coord, C# can’t preserve flattening; it has to box or disallow it.

Again, I don't think that is (completely?) true.


joemwangi@reddit (OP)

Right, because that is what a class is. Just like if in Java you didn't do value record it would store references. In C# you would make PointRecord a struct. C#'s struct is, for all intents and purposes here, to my understanding, the equivalent of value record.

I was supposed to say: if it were a struct and the LHS type is an abstraction such as an interface, each element is boxed on the heap. Java value classes still follow the RHS rule (the runtime type on the right-hand side), unless atomicity comes into play at 64 bits or more (for now). That is, Valhalla can keep flattening consistent when you view the value through its interface (Coord) or, in a future implementation, through a generic.
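
The "RHS rule" builds on something already true of Java arrays: an array's runtime component type is fixed by the new expression, even when the array is held in an interface-typed variable. That runtime type is what a Valhalla JVM could use to pick a flat layout. A minimal sketch with an ordinary record standing in for a value record (ArrayRuntimeTypeDemo is a hypothetical name):

```java
// Minimal sketch in today's Java (ordinary records, not value records): the
// array's runtime component type comes from the right-hand side "new".
class ArrayRuntimeTypeDemo {

    interface Coord {}

    record PointRecord(int x, int y) implements Coord {}

    static Class<?> componentTypeOf(Coord[] arr) {
        return arr.getClass().getComponentType();
    }

    public static void main(String[] args) {
        Coord[] viaSubtype = new PointRecord[4]; // declared Coord[], created as PointRecord[]
        Coord[] viaInterface = new Coord[4];     // declared and created as Coord[]

        System.out.println(componentTypeOf(viaSubtype).getSimpleName());   // PointRecord
        System.out.println(componentTypeOf(viaInterface).getSimpleName()); // Coord
    }
}
```

Only the first array carries a concrete element class; a Coord[] created as Coord[] can hold any implementation, so it has no single layout to flatten into.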

I don't think this is true. See here: https://learn.microsoft.com/en-us/dotnet/api/system.reflection.emit.opcodes.constrained?view=net-9.0&redirectedfrom=MSDN

But boxing still happens in some situations:

using System;

interface IPrintable { void Print(); }
struct P : IPrintable
{
    public void Print() => Console.WriteLine("struct");
}

class Demo
{
    static void Main()
    {
        CallGeneric(new P()); // constrained. call, no boxing
        CallInterface(new P()); // boxes to IPrintable
        CallObject(new P()); // boxes to object
    }

    static void CallGeneric<T>(T value) where T : IPrintable
        => value.Print();   // IL: constrained. !T, callvirt IPrintable::Print

    static void CallInterface(IPrintable p)
        => p.Print();       // IL: callvirt IPrintable::Print; a struct argument was already boxed at the call site

    static void CallObject<T>(T value)
    {
        object o = value;   // IL: box !T
        Console.WriteLine(o.ToString()); // IL: callvirt object::ToString
    }
}

You must be talking about a different proposal to what I have seen. The proposal I am familiar with is basically a (struct) wrapper for object.

Sure!

Again, I don't think that is (completely?) true.

Does this compile?

List<Point> pts = new();
List<Coord> coords = pts;

Coord[] arr = new Point[10];


emperor000@reddit

Okay, but that's because value classes aren't structs (which I didn't realize until I looked into it more; I had assumed that they were value types).

So your examples aren't really comparable between Java and C#, because the same stuff isn't going on. Point would need to be a class to be comparable. And then List<> would need to be covariant, which it isn't (but you could just use IEnumerable<Coord>, which is covariant).

And now that Point is a class, the last array example does compile because arrays support covariance.

I highly suspect that the above examples wouldn't compile in Java either, at least not without some kind of boxing/unboxing "magic". If it did, then how would Java handle something like this?

interface Vehicle { }

value class Car implements Vehicle
{

}

value class Truck implements Vehicle
{

}

List<Car> cars = new ArrayList<>();
List<Vehicle> vehicles = cars;

vehicles.add(new Truck());

The above won't compile with classes in C# either, because List<> isn't covariant. If Java Valhalla can handle this, then it must be doing some kind of boxing/unboxing or something similar that would have performance implications. The "best" thing I can come up with is that List<Vehicle> vehicles = cars; actually gets compiled into a copy, something like new List<Vehicle>(cars). That would certainly be neat, but it's still essentially "boxing".
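
For reference, this is how current Java treats the same scenario with ordinary classes: arrays are covariant and check each store at runtime (throwing ArrayStoreException), while List types are invariant, so List<Vehicle> vehicles = cars; is rejected at compile time. A wildcard view serves a similar read-only purpose to IEnumerable<Coord> in C#. A minimal sketch; it assumes nothing about how Valhalla value classes might change this, and VarianceDemo and its helpers are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of today's Java rules for the Car/Truck scenario,
// using ordinary records rather than Valhalla value classes.
class VarianceDemo {

    interface Vehicle {}
    record Car() implements Vehicle {}
    record Truck() implements Vehicle {}

    // Arrays are covariant: the assignment compiles, but a bad store fails at runtime.
    static boolean storeTruckIntoCarArray() {
        Vehicle[] vehicles = new Car[1];
        try {
            vehicles[0] = new Truck(); // checked against the runtime type Car[]
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    // Generics are invariant; a wildcard view compiles but rejects add(...) (except null).
    static int countVehicles(List<? extends Vehicle> vehicles) {
        return vehicles.size();
    }

    public static void main(String[] args) {
        System.out.println(storeTruckIntoCarArray()); // false: ArrayStoreException was thrown
        List<Car> cars = new ArrayList<>();
        cars.add(new Car());
        System.out.println(countVehicles(cars));      // 1
        // List<Vehicle> vehicles = cars;             // compile error: incompatible types
    }
}
```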


Willing_Row_5581@reddit

Which is actually quite an important distinction, for performance reasons.


joemwangi@reddit (OP)

Nope. Mutability prevents reliable scalarization because the JIT must assume aliasing: fields can change behind its back. That forces dependence on escape analysis and limits optimization to intraprocedural scopes. Without immutability, inter-method scalarization (or register-level promotion) simply can’t be guaranteed.
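
The aliasing argument can be illustrated at the language level with plain Java (this shows why a compiler must be conservative, not what any particular JIT actually emits). With a mutable class, two references may point to the same object, so a field read after a write through the other reference must be reloaded; with an immutable record there is no such write. MutablePoint, ImmutablePoint, and the helper methods are hypothetical names:

```java
// Minimal sketch of the aliasing problem with mutable vs immutable objects.
class AliasingDemo {

    static final class MutablePoint {
        int x;
        MutablePoint(int x) { this.x = x; }
    }

    record ImmutablePoint(int x) {
        ImmutablePoint withX(int newX) { return new ImmutablePoint(newX); }
    }

    // The compiler cannot assume p != q, so p.x must be re-read after the write.
    static int readAcrossWrite(MutablePoint p, MutablePoint q) {
        int before = p.x;
        q.x += 1;             // if q aliases p, this also changes p.x
        return p.x - before;  // 1 if aliased, 0 otherwise
    }

    // A record's field is final: "updating" yields a new object, so p.x() cannot change.
    static int readAcrossUpdate(ImmutablePoint p, ImmutablePoint q) {
        int before = p.x();
        q.withX(q.x() + 1);   // result discarded; p and q are untouched
        return p.x() - before; // always 0, even when p == q
    }

    public static void main(String[] args) {
        MutablePoint a = new MutablePoint(5);
        System.out.println(readAcrossWrite(a, a));                     // 1
        ImmutablePoint b = new ImmutablePoint(5);
        System.out.println(readAcrossUpdate(b, b));                    // 0
    }
}
```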


Willing_Row_5581@reddit

Well, I just happen to agree.