Handing over a TCP/UDP connection between client and server
Posted by servermeta_net@reddit | ExperiencedDevs | 26 comments
Let's say Alice wants to retrieve a resource from a large distributed system.
Alice connects to Server A, in Frankfurt, but that server is not holding the resource. However, it knows that Server B, in Amsterdam, has it. What's the smartest way to get Alice the resource she's looking for? Both servers and Alice are running a modern Linux distro, if it matters.
Here's what I thought:
- Server A could connect to Server B, retrieve the resource, and then pass it to Alice. This seems very inefficient.
- Server A answers Alice that it doesn't hold the resource but that Server B has it, so she can connect to that instead. Seems bad from a latency point of view.
Is there a way for Server A to hand over the TCP/UDP connection from Alice to Server A? What options do I have to handle this scenario?
PlanckEnergy@reddit
Either of the 2 bulleted options could work well, depending on the constraints of the project.
Option 1 (A retrieves the resource and passes it to Alice) is not necessarily inefficient, but if the same resource is requested many times from A, you probably want to cache it at A.
Option 2 (Alice gets redirected from A to B) incurs an extra round trip, but that may be acceptable as long as A usually has the resources that clients expect it to have.
Handing over Alice's TCP connection from A to B in a way that's invisible to Alice sounds very hard or impossible.
Perhaps you're looking for a CDN, Anycast, or some such technology? But don't discount Option 1 + caching.
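A rough sketch of what "Option 1 + caching" could look like at Server A, assuming an HTTP fetch from B and a naive in-memory cache (the peer URL and cache settings are invented for illustration):

```python
# Minimal sketch of Option 1: Server A pulls the resource from B and caches it.
import urllib.request
from functools import lru_cache

SERVER_B_URL = "http://server-b.example.internal"   # hypothetical address of Server B

@lru_cache(maxsize=1024)   # naive in-memory cache; a real system needs TTLs and eviction
def fetch_from_b(resource_id: str) -> bytes:
    with urllib.request.urlopen(f"{SERVER_B_URL}/resources/{resource_id}") as resp:
        return resp.read()

def handle_request(resource_id: str) -> bytes:
    # A serves from its cache when it can, otherwise pulls from B once and keeps
    # the copy so repeated requests don't pay the A <-> B hop again.
    return fetch_from_b(resource_id)
```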
servermeta_net@reddit (OP)
But wouldn't Option 1 still add a roundtrip? Nothing tells us that Server A <> Server B connection is any better than Alice <> Server B connection.
davvblack@reddit
Think of it this way: either the file is large, in which case Option 1 (proxying) consumes significant resources while the marginal cost of Option 2's extra round trip is vanishingly small, or the file is small, in which case proxying costs next to nothing. It's never both so big that you can't proxy it and so small that an extra round trip is a problem.
The extra round trip here is just a single HTTPS handshake and DNS lookup, and there are ways to amortize that too.
Itchy-Science-1792@reddit
oh my... where do I start to break this down:
davvblack@reddit
imagine a spherical router radiating packets in all directions.
Point is, the sub-resource is either small, in which case the proxy call doesn't add much (and you can keep the connection alive), or it's big, in which case the up-front cost of a handful of round trips gets amortized over the course of the download.
Ka1kin@reddit
In practice, 1 is common in tightly integrated systems that manage small values, like databases. The client is generally unaware of where in a cluster data might reside, and connects to any node which will coordinate their request.
This is especially useful if there may be more than one resource fetched in a request, or there's something complicated about the request, like a join or aggregation.
It's also a better solution when stuff can move around. The nodes in the cluster are usually made aware of that sort of thing, but the clients might not be.
Option 1 also doesn't require there to be a network route from the client to the server with the value, which can be useful: you can firewall off the stores.
With complex requests (e.g. sql queries) a lot of compute can be spent in the query layer itself, so having a dedicated tier of hosts that coordinate requests can improve scalability. Or maybe the coordinators enforce access controls, and the stores don't.
The second option is common in loosely coupled systems, like the web. An HTTP 302 is exactly this, and they can chain.
Some KV stores (Redis) do this because they assume that the client is keeping up with cluster topology changes (which requires a client library that maintains local state and periodically refreshes it, which makes implementing that client harder). Redis, in particular, is used in applications where a network round trip is a huge latency cost, so maintaining that client state is a big win.
As for handover, no. A TCP connection isn't a "thing". It's just a source and destination address, and a bit of internal state in the two OSes. And there are often side effects to establishing a connection (NAT port forwarding, for example), so there's no way to move a transport layer connection. In theory, you could have a separate session layer in your network stack (see the OSI seven-layer model), but in practice, basically no one does.
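To make the 302 version above concrete, here is a minimal sketch of Server A's side (the hostnames and the local-lookup helper are invented for illustration):

```python
# Server A answers 302 with B's address when it doesn't hold the resource.
from http.server import BaseHTTPRequestHandler, HTTPServer

SERVER_B = "http://server-b.example.internal"   # hypothetical address of the node holding the data

class RedirectingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = self.lookup_locally(self.path)
        if body is None:
            self.send_response(302)                        # "not here, ask B"
            self.send_header("Location", SERVER_B + self.path)
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def lookup_locally(self, path: str):                   # placeholder local store
        return None

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), RedirectingHandler).serve_forever()
```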
PlanckEnergy@reddit
I suppose! I mean server-to-server latency is generally lower than consumer-to-server, but not always. I take your point.
I don't think there's any way to avoid an extra round trip if A doesn't have Alice's resource.
Tman1677@reddit
The main pro of Option 2, and why I see it a lot more in production, is that it's far easier to generalize. Option 1 (with caching) is generally the better system if the service exposes only one API, but once you have multiple APIs you have to scale the machinery that forwards all of those requests. The obvious solution is to have the server forward along the exact same HTTP request, body and all, but that's a big security risk because you're opening up pass-through of untrusted data. You could maybe get away with not parsing the HTTP body and only parsing/wrapping the headers, but I'd honestly need to consult a security team, and I'm not sure they'd go for it. Generally, for something like this to be security compliant, you need to deserialize and validate all inputs at each layer.
Option 2, on the other hand, easily generalizes to an arbitrary number of APIs. You just standardize on a common error with the updated routing info attached, and any API that hits this issue can return it. The "client" (usually actually an FE service, since you don't want routing info to leave the datacenter) can then route accordingly.
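One way that standardized "moved" error could look (the field names and JSON shape are invented here, just to make the idea concrete):

```python
# Any API on any node can return this one shape; the FE service recognizes the
# shared error code and retries the same request against target_node.
import json

def moved_response(resource_id: str, target_node: str) -> bytes:
    return json.dumps({
        "error": "RESOURCE_MOVED",    # shared error code across all APIs
        "resource_id": resource_id,
        "target_node": target_node,   # routing info that stays inside the datacenter
    }).encode()
```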
Itchy-Science-1792@reddit
Funny, I have my name on a patent dealing exactly with this (in mobile communications context).
Realistically speaking you are looking at a bog standard redirect. You can't switch over TCP connections to a new endpoint (for very good reasons) and whatever you could do with UDP is going to be custom (and therefore expensive to implement and maintain).
Is there an actual business case that requires this or just idle curiosity?
servermeta_net@reddit (OP)
Business case. I'm building a database, it's already used in prod at my company
Ontological_Gap@reddit
You can't "hand over" a TCP/UDP connection to a different server because different servers have different addresses; you can either proxy (Option 1) or redirect (Option 2). "Handing over" an existing TCP/UDP connection to another host would require customizing both your networking hardware and your kernel's TCP, UDP, and IP stacks. Down this path lies madness.
miredalto@reddit
True for TCP. Not true for UDP. Look up "UDP hole punching".
Ontological_Gap@reddit
It's not quite hole punching, but it's close. You'd need to get the NAT involved by having it process a special message from Server A commanding it to move the NAT mapping over to Server B. That would have to be a custom protocol that you've taught your NAT hardware to understand. And the benefit would only be saving the host -> NAT leg of the latency (x2) once, which is almost certainly not worth it (and definitely madness). It is pretty much exactly what I was imagining tho.
Alive-Pressure7821@reddit
Numbers would help here.
If the resource is 1 GiB and you expect a "miss" 90% of the time, Server A responding not-found / redirecting to B would certainly be my choice (the redirect latency is insignificant compared to the transfer at typical network speeds, and A doesn't have to proxy large resources).
OTOH, if the resource is 100 bytes and the expected miss rate is 0.1%, Server A forwarding the request to B and relaying the response would make more sense, especially if the resource is cacheable by A.
The location of Alice relative to Frankfurt and Amsterdam matters too. The redirect goes via Alice, so it incurs that round-trip latency. But if Alice is close by, that isn't materially higher than the A -> B latency.
Using QUIC for the transport (from Alice) would minimize the connection establishment time (in all cases).
Finally, if latency is really so important, you could make dual simultaneous requests for the resource to both A and B (and cancel the second response if both have it). Up to you if this doubling up is worth it.
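A back-of-envelope model of the 1 GiB vs. 100-byte comparison above (the latency and bandwidth numbers are assumptions; only the shape of the tradeoff matters):

```python
def expected_extra_seconds(size_bytes, miss_rate, client_rtt=0.03, bandwidth_bps=100e6):
    transfer = size_bytes * 8 / bandwidth_bps
    redirect = miss_rate * client_rtt    # Option 2: one extra round trip via Alice on a miss
    proxy = miss_rate * transfer         # Option 1: the payload crosses the network an extra time on a miss
    return {"redirect": redirect, "proxy": proxy}

print(expected_extra_seconds(1 << 30, 0.9))   # 1 GiB at 90% miss rate: redirect wins easily
print(expected_extra_seconds(100, 0.001))     # 100 bytes at 0.1% miss rate: proxying costs almost nothing
```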
ProfBeaker@reddit
If you control Alice as well, another option is telling Alice ahead of time where things are. Basically make the client smart, instead of the server. This may or may not be practical, depending on the situation.
I.e., create a mapping that says which server has which data, distribute that to clients, and then the clients just call the right server.
This obviously hinges on your ability to create and maintain that mapping and to keep it tolerably small. There are all sorts of algorithms for this; consistent hashing is one.
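A tiny sketch of such a client-side mapping using rendezvous hashing, one of the many possible algorithms (node names here are placeholders):

```python
import hashlib

NODES = ["server-a.frankfurt.example", "server-b.amsterdam.example"]

def owner(resource_id: str) -> str:
    # Every client computes the same owner for a given key, so it can call the
    # right server directly instead of asking A first.
    return max(NODES, key=lambda node: hashlib.sha256(f"{node}:{resource_id}".encode()).hexdigest())

print(owner("invoice-42"))
```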
Historical_Leek_9849@reddit
Use Envoy proxy. You can create a cluster on each server that directs traffic either to the local instance or to the other server, based on health.
Local health is determined by whether the file exists. The non-local side needs an endpoint that reports whether the file exists, which Envoy will continuously health-check.
If the local server does not have the file, traffic is proxied to the other server.
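A sketch of the "does the file exist" health endpoint that an active health check could poll (the path and port are made up for illustration):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import os

RESOURCE_PATH = "/var/data/resource.bin"    # hypothetical location of the file

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            # Healthy only while the file is present locally.
            self.send_response(200 if os.path.exists(RESOURCE_PATH) else 503)
        else:
            self.send_response(404)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 9901), HealthHandler).serve_forever()
```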
jepperepper@reddit
Either A or Alice has to connect to B, so no savings on that connection.
Telling Alice to go to B frees up A but requires Alice to make another connection, possibly requiring wait time.
A getting the resource from B and passing it back to Alice ties up all 3 machines and still has all the wait time issues.
So you just balance those 2 options depending on the rest of your requirements. There's no right answer that I can think of.
danikov@reddit
I can think of 3 approaches:
- Alice connects to all servers at the start, so A can forward the request to B and B can immediately send the resource on the established connection.
- Have a gateway G that keeps a single connection to Alice but internally maintains connections to all nodes (or uses a different protocol), i.e. a dedicated proxy.
- Have A start proxying to Alice while also giving Alice the details for a direct line to B, then hand off once Alice establishes that direct line.
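A very rough sketch of the second idea, a gateway G that keeps one connection to Alice and relays bytes to whichever backend holds the resource (backend selection is a stub; addresses are invented and nothing here is production-grade):

```python
import socket
import threading

BACKENDS = {"default": ("server-b.example.internal", 9000)}   # hypothetical node addresses

def pick_backend(first_chunk: bytes):
    return BACKENDS["default"]        # a real gateway would parse the request and route on it

def pump(src: socket.socket, dst: socket.socket) -> None:
    # Copy bytes one way until the source closes, then close the other side.
    while chunk := src.recv(65536):
        dst.sendall(chunk)
    dst.close()

def handle(client: socket.socket) -> None:
    first = client.recv(65536)
    backend = socket.create_connection(pick_backend(first))
    backend.sendall(first)
    threading.Thread(target=pump, args=(client, backend), daemon=True).start()
    pump(backend, client)

if __name__ == "__main__":
    srv = socket.socket()
    srv.bind(("0.0.0.0", 8000))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
```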
Few_Wallaby_9128@reddit
Maybe I don't understand you, but how I would approach this is having the servers push to a shared front repository a unique id (hash) of each object together with the server id; then Alice always needs two calls.
As an optimization, you can have this repo forward the call to the actual server (it becomes one call then, but that makes the API a single point of contention/failure, which you can fix by providing a backplane distributed cache such as Redis).
You can also play with multiple levels of cascading domains feeding one way, i.e. states in the US, countries in Europe, or continents.
Or you can play with partitioning the repos; in this case, if you say split it into 4 parts, then Alice can query all four partitions in parallel. In this scenario, particularly if there are more partitions, UDP would come in handy.
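A toy version of that shared front repository, just a hash-to-server directory (everything here is invented for illustration):

```python
registry: dict[str, str] = {}           # object hash -> id of the server holding it

def publish(obj_hash: str, server_id: str) -> None:
    registry[obj_hash] = server_id      # each server pushes its objects' hashes here

def locate(obj_hash: str) -> str | None:
    return registry.get(obj_hash)       # Alice's first call; her second call fetches from that server

publish("sha256:abc123", "server-b.amsterdam")
print(locate("sha256:abc123"))          # -> "server-b.amsterdam"
```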
NotGoodSoftwareMaker@reddit
I don't have an exact answer, but you should bear in mind eventual consistency in distributed systems.
Server B may have the resource, but depending on hit rate your algorithm may need to compensate for this.
miredalto@reddit
The redirect seems by far the simplest, easiest and most reliable option absent some special requirements.
An interesting case though is where both the client and resource holder are behind NAT, and you want the intermediate server to help broker a direct connection between them. If so, look up "UDP hole punching".
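A toy illustration of the hole-punching idea, once both peers have learned each other's public endpoint from the intermediate server (the rendezvous step isn't shown, and real NATs vary):

```python
import socket

def punch(local_port: int, peer_addr: tuple) -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", local_port))
    for _ in range(5):
        sock.sendto(b"punch", peer_addr)   # outbound packets create/refresh the NAT mapping
    return sock                            # the peer's packets can now (often) reach us through it

# e.g. sock = punch(40000, ("203.0.113.7", 40001)); data, addr = sock.recvfrom(65536)
```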
teerre@reddit
"Smartest"? What does that mean?
The appropriate way to do this depends on the goal. What kind of latency do you need? What kind of data is it? What kind of uptime do you need? What happens if Server A's answer is wrong? Etc. Answering all of these questions will lead you to the natural solution. If it doesn't, then it doesn't matter; just pick one of the remaining options.
DrFloyd5@reddit
What Protocol?
servermeta_net@reddit (OP)
Self built protocol.
DrFloyd5@reddit
Then I would build a redirect into the protocol. Like HTTP 302.
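One hedged sketch of what a 302-style redirect could look like in a home-grown, length-prefixed protocol (the wire format here is invented; the OP's actual protocol is unknown):

```python
import json
import struct

def encode_frame(msg: dict) -> bytes:
    payload = json.dumps(msg).encode()
    return struct.pack("!I", len(payload)) + payload    # 4-byte length prefix + JSON body

def redirect_frame(key: str, host: str, port: int) -> bytes:
    # The protocol's equivalent of HTTP 302: "I don't have `key`; ask host:port."
    return encode_frame({"type": "REDIRECT", "key": key, "host": host, "port": port})

frame = redirect_frame("user:42", "server-b.amsterdam.example", 7000)
```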
nutrecht@reddit
Please mention crossposts.