Collective communication means all processes within a communicator call the same routine. Portable applications should assume that collective routines include a global synchronization.

The following simple code fragment employs four basic collective routines to manipulate a statically partitioned regular domain (one-dimensional in this case). The full domain length is broadcast from a root process to all others. The initial dataset is distributed (scattered) among the processes. After each compute step, a global maximum is determined for use by the root. The root then gathers the final dataset.

~~~
#include <mpi.h>

{
    int i;
    int myrank;
    int size;
    int root;
    int full_domain_length;
    int sub_domain_length;
    double global_max;
    double local_max;
    double *full_domain = NULL;    /* allocated only at the root */
    double *sub_domain;

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    root = 0;

    /*
     * Root obtains full domain and broadcasts its length.
     */
    if (myrank == root) {
        get_full_domain(&full_domain, &full_domain_length);
    }

    MPI_Bcast(&full_domain_length, 1, MPI_INT, root, MPI_COMM_WORLD);

    /*
     * Allocate subdomain memory.
     * Scatter the initial dataset among the processes.
     */
    sub_domain_length = full_domain_length / size;
    sub_domain = (double *) malloc(sub_domain_length * sizeof(double));

    MPI_Scatter(full_domain, sub_domain_length, MPI_DOUBLE,
                sub_domain, sub_domain_length, MPI_DOUBLE,
                root, MPI_COMM_WORLD);

    /*
     * Loop computing and determining max values.
     */
    for (i = 0; i < NSTEPS; ++i) {
        compute(sub_domain, sub_domain_length, &local_max);

        MPI_Reduce(&local_max, &global_max, 1, MPI_DOUBLE,
                   MPI_MAX, root, MPI_COMM_WORLD);
    }

    /*
     * Gather final dataset.
     */
    MPI_Gather(sub_domain, sub_domain_length, MPI_DOUBLE,
               full_domain, sub_domain_length, MPI_DOUBLE,
               root, MPI_COMM_WORLD);
}
~~~

### Broadcast

~~~
MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
          int root, MPI_Comm comm);
~~~

All processes use the same count, datatype, root, and communicator. Before the operation, the root buffer contains a message. After the operation, all buffers contain the message from the root process.

### Scatter

~~~
MPI_Scatter(void *sndbuf, int sndcnt, MPI_Datatype sndtype,
            void *rcvbuf, int rcvcnt, MPI_Datatype rcvtype,
            int root, MPI_Comm comm);
~~~

All processes use the same send and receive counts, datatypes, root, and communicator. Before the operation, the root send buffer contains a message of length `sndcnt * N`, where N is the number of processes. After the operation, the message is divided equally and dispersed to all processes (including the root) following rank order.

### Reduce

~~~
MPI_Reduce(void *sndbuf, void *rcvbuf, int count,
           MPI_Datatype datatype, MPI_Op op,
           int root, MPI_Comm comm);
~~~

All processes use the same count, datatype, reduction operation, root, and communicator. After the operation, the root process has in its receive buffer the result of the pair-wise reduction of the send buffers of all processes, including its own. MPI predefines reduction operations, including: MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND, MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR.

### Gather

~~~
MPI_Gather(void *sndbuf, int sndcnt, MPI_Datatype sndtype,
           void *rcvbuf, int rcvcnt, MPI_Datatype rcvtype,
           int root, MPI_Comm comm);
~~~

All processes use the same send and receive counts, datatypes, root, and communicator.
This routine is the reverse of MPI_Scatter(): after the operation, the root process has in its receive buffer the concatenation of the send buffers of all processes (including its own), with a total message length of `rcvcnt * N`, where N is the number of processes. The message is gathered following rank order.
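To make the rank-order semantics concrete, here is a minimal sketch (not part of the original text) in which every process contributes its own rank; after the call, the root's receive buffer holds the ranks 0 through N-1 in order. The `MAX_PROCS` bound is an assumption made for this example.

~~~
#include <mpi.h>
#include <stdio.h>

#define MAX_PROCS 64 /* assumed upper bound on process count for this sketch */

int main(int argc, char **argv)
{
    int myrank, size, i;
    int ranks[MAX_PROCS];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /*
     * Each process sends one int (its rank). The root's receive buffer
     * ends up holding 0, 1, ..., size-1, concatenated in rank order.
     * Note that rcvcnt is the count received from EACH process, not the total.
     */
    MPI_Gather(&myrank, 1, MPI_INT, ranks, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (myrank == 0) {
        for (i = 0; i < size; ++i) {
            printf("slot %d holds rank %d\n", i, ranks[i]);
        }
    }

    MPI_Finalize();
    return 0;
}
~~~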
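The predefined reduction operations listed under MPI_Reduce() can be exercised the same way. The following sketch, again invented for illustration, computes both the sum and the maximum of all ranks at the root:

~~~
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int myrank, size, sum, max;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Every process contributes its rank; rank 0 receives the results. */
    MPI_Reduce(&myrank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(&myrank, &max, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);

    if (myrank == 0) {
        printf("sum of ranks = %d, max rank = %d (of %d processes)\n",
               sum, max, size);
    }

    MPI_Finalize();
    return 0;
}
~~~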
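Finally, the opening fragment can be turned into a complete, compilable program. The version below is a sketch under stated assumptions: `get_full_domain()`, `compute()`, and `NSTEPS` are stand-ins invented here (the original fragment does not define them), the domain length is chosen to divide evenly among the processes (as MPI_Scatter with equal counts requires), and error checking is omitted for brevity.

~~~
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NSTEPS 10 /* hypothetical step count */

/* Stand-in for get_full_domain(): allocate and fill the full domain. */
static void get_full_domain(double **domain, int *length)
{
    int i;
    *length = 64; /* assumed divisible by the number of processes */
    *domain = malloc(*length * sizeof(double));
    for (i = 0; i < *length; ++i) {
        (*domain)[i] = (double) i;
    }
}

/* Stand-in for compute(): scale the subdomain and report its local maximum. */
static void compute(double *sub, int len, double *local_max)
{
    int i;
    *local_max = sub[0];
    for (i = 0; i < len; ++i) {
        sub[i] *= 1.01;
        if (sub[i] > *local_max) {
            *local_max = sub[i];
        }
    }
}

int main(int argc, char **argv)
{
    int i, myrank, size, root = 0;
    int full_domain_length = 0, sub_domain_length;
    double global_max = 0.0, local_max;
    double *full_domain = NULL; /* allocated only at the root */
    double *sub_domain;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Root obtains the full domain and broadcasts its length. */
    if (myrank == root) {
        get_full_domain(&full_domain, &full_domain_length);
    }
    MPI_Bcast(&full_domain_length, 1, MPI_INT, root, MPI_COMM_WORLD);

    /* Scatter the initial dataset in equal pieces, in rank order. */
    sub_domain_length = full_domain_length / size;
    sub_domain = malloc(sub_domain_length * sizeof(double));
    MPI_Scatter(full_domain, sub_domain_length, MPI_DOUBLE,
                sub_domain, sub_domain_length, MPI_DOUBLE,
                root, MPI_COMM_WORLD);

    /* Compute, reducing a global maximum to the root at each step. */
    for (i = 0; i < NSTEPS; ++i) {
        compute(sub_domain, sub_domain_length, &local_max);
        MPI_Reduce(&local_max, &global_max, 1, MPI_DOUBLE,
                   MPI_MAX, root, MPI_COMM_WORLD);
    }

    /* Gather the final dataset back into the root's full domain. */
    MPI_Gather(sub_domain, sub_domain_length, MPI_DOUBLE,
               full_domain, sub_domain_length, MPI_DOUBLE,
               root, MPI_COMM_WORLD);

    if (myrank == root) {
        printf("global max after %d steps: %f\n", NSTEPS, global_max);
        free(full_domain);
    }
    free(sub_domain);

    MPI_Finalize();
    return 0;
}
~~~

With an MPI implementation such as Open MPI or MPICH, this should compile with `mpicc` and run under `mpirun` (e.g., with 4 processes).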