Add Context and Endpoint classes to enable non-Communicator use-cases (#166)

This PR implements and closes #137. The new `Endpoint` and `Context`
classes expose the connection establishing functionality from
`Communicator`, which now is only responsible for tying together the
bootstrapper with a context.

The largest breaking change here is that
`Communicator.connectOnSetup(...)` now returns the `Connection` wrapped
inside a `NonblockingFuture`. This is because with the way `Context` is
implemented a `Connection` is now fully initialized on construction.

Some smaller breaking API changes from this change are that
`RegisteredMemory` no longer has a `rank()` function (as there maybe no
concept of rank), and similarly `Connection` has no `remoteRank()` and
`tag()` functions. The latter are replaced by `remoteRankOf` and `tagOf`
functions in `Communicator`.

A new `EndpointConfig` class is introduced to avoid duplication of the
IB configuration parameters in the APIs of `Context` and `Communicator`.
The usual usage pattern of just passing in a `Transport` still works due
to an implicit conversion into `EndpointConfig`.

Miscellaneous changes:
-Cleans up how the PIMPL pattern is applied by making both the `Impl`
struct and the `pimpl_` pointers private for all relevant classes in the
core API.
-Enables ctest to be run from the build root directory.
This commit is contained in:
Olli Saarikivi
2023-09-05 22:10:04 -07:00
committed by GitHub
parent 858e381829
commit 828be48b21
25 changed files with 626 additions and 327 deletions

View File

@@ -47,14 +47,16 @@ def setup_connections(comm, rank, world_size, element_size, proxy_service):
remote_memories.append(remote_mem)
comm.setup()
connections = [conn.get() for conn in connections]
# Create simple proxy channels
for i, conn in enumerate(connections):
proxy_channel = mscclpp.SimpleProxyChannel(
proxy_service.proxy_channel(proxy_service.build_and_add_semaphore(conn)),
proxy_service.proxy_channel(proxy_service.build_and_add_semaphore(comm, conn)),
proxy_service.add_memory(remote_memories[i].get()),
proxy_service.add_memory(reg_mem),
)
simple_proxy_channels.append(mscclpp.device_handle(proxy_channel))
simple_proxy_channels.append(proxy_channel.device_handle())
comm.setup()
# Create sm channels
@@ -66,7 +68,7 @@ def setup_connections(comm, rank, world_size, element_size, proxy_service):
for i, conn in enumerate(sm_semaphores):
sm_chan = mscclpp.SmChannel(sm_semaphores[i], remote_memories[i].get(), ptr)
sm_channels.append(sm_chan)
return simple_proxy_channels, [mscclpp.device_handle(sm_chan) for sm_chan in sm_channels]
return simple_proxy_channels, [sm_chan.device_handle() for sm_chan in sm_channels]
def run(rank, args):