Efficient synchronization mechanisms for scalable GPU architectures