Compiling MESONH on Apple Silicon 2 : arm64

3 minute de lecture

Mis à jour :

Recently, I’ve been trying to help someone to install the Meso-NH atmospheric model on a new Macbook Air M1. In this post, I try to document how I partially succeeded. The complete x86_64 story has been documented earlier, here I’m interested with the native arm64 version.

In three words : easy to compile, there’s a strange bug during the execution of any of the produced binaries (Illegal instruction) somewhere in hwloc that might be due to some mis-alignment. I did not resolve this issue yet but found a way to circumvent it by using mpirun, which enables to run Meso-NH natively on the M1.

Compiling Meso-NH for arm64

Easy peasy, just install homebrew, git and open-mpi. Make sure to make the symbolic links to gcc, g++ and gfortran if needed, as described in the x86_64 post. Configure, make, that’s it !

Running Meso-NH natively on BigSur on a M1

As mentioned earlier, there’s an issue in the device discovery launched by hwloc which raises a SIGILL, here’s my backtrace

➜  src git:(MNH-54-branch) ✗ lldb ./dir_obj-LXgfortran-R8I4-MNH-V5-4-4-MPIAUTO-DEBUG/MASTER/DIAG                    

(lldb) target create "./dir_obj-LXgfortran-R8I4-MNH-V5-4-4-MPIAUTO-DEBUG/MASTER/DIAG"
r
bt
Current executable set to '/Users/UQAM/Code/MESONH/MNH-V5-4-2/src/dir_obj-LXgfortran-R8I4-MNH-V5-4-4-MPIAUTO-DEBUG/MASTER/DIAG' (arm64).
(lldb) r
Process 68887 launched: '/Users/UQAM/Code/MESONH/MNH-V5-4-2/src/dir_obj-LXgfortran-R8I4-MNH-V5-4-4-MPIAUTO-DEBUG/MASTER/DIAG' (arm64)
bt
2021-07-03 22:16:43.524938-0400 orted[68890:6584537] +[MTLIOAccelDevice registerDevices]: Zero Metal services found
2021-07-03 22:16:43.525970-0400 orted[68890:6584537] [CL_INVALID_OPERATION] : OpenCL Error : Failed to retrieve device information! Invalid enumerated value!

2021-07-03 22:16:43.525981-0400 orted[68890:6584537] [CL_INVALID_OPERATION] : OpenCL Error : Failed to retrieve device information! Invalid enumerated value!

2021-07-03 22:16:43.525986-0400 orted[68890:6584537] [CL_INVALID_OPERATION] : OpenCL Error : Failed to retrieve device information! Invalid enumerated value!

Process 68887 stopped
* thread #1, queue = 'com.Metal.DeviceDispatch', stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e220820)
    frame #0: 0x0000000117b38e94 AGXMetal13_3`___lldb_unnamed_symbol3036$$AGXMetal13_3 + 160
AGXMetal13_3`___lldb_unnamed_symbol3036$$AGXMetal13_3:
->  0x117b38e94 <+160>: fmul   s0, s1, s2
    0x117b38e98 <+164>: fcmp   s0, #0.0
    0x117b38e9c <+168>: mov    w13, #0x44600000
    0x117b38ea0 <+172>: fmov   s1, w13
Target 0: (DIAG) stopped.
(lldb) bt
* thread #1, queue = 'com.Metal.DeviceDispatch', stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e220820)
  * frame #0: 0x0000000117b38e94 AGXMetal13_3`___lldb_unnamed_symbol3036$$AGXMetal13_3 + 160
    frame #1: 0x0000000117967754 AGXMetal13_3`___lldb_unnamed_symbol484$$AGXMetal13_3 + 1796
    frame #2: 0x000000011796ae38 AGXMetal13_3`___lldb_unnamed_symbol507$$AGXMetal13_3 + 52
    frame #3: 0x0000000192b7cf1c Metal`-[MTLIOAccelService initWithAcceleratorPort:] + 400
    frame #4: 0x0000000192b7cd54 Metal`+[MTLIOAccelService registerService:] + 160
    frame #5: 0x000000018ad43ec0 libdispatch.dylib`_dispatch_client_callout + 20
    frame #6: 0x000000018ad52fb4 libdispatch.dylib`_dispatch_lane_barrier_sync_invoke_and_complete + 60
    frame #7: 0x0000000192c2cff0 Metal`MTLIOAccelServiceRegisterService + 108
    frame #8: 0x0000000192b7cb4c Metal`+[MTLIOAccelDevice registerDevices] + 264
    frame #9: 0x0000000192ba53d0 Metal`invocation function for block in MTLDeviceArrayInitialize() + 1284
    frame #10: 0x000000018ad43ec0 libdispatch.dylib`_dispatch_client_callout + 20
    frame #11: 0x000000018ad45758 libdispatch.dylib`_dispatch_once_callout + 32
    frame #12: 0x0000000192b7c904 Metal`MTLCopyAllDevices + 244
    frame #13: 0x00000001178b6598 AppleMetalOpenGLRenderer`GLDDeviceRec::initWithDisplayMask(unsigned int) + 140
    frame #14: 0x00000001178bc734 AppleMetalOpenGLRenderer`gldCreateDevice + 216
    frame #15: 0x00000001ccefd1c4 libGFXShared.dylib`gfxInitializeLibrary + 1920
    frame #16: 0x00000001cd2adce0 OpenCL`___lldb_unnamed_symbol731$$OpenCL + 432
    frame #17: 0x000000018aeef608 libsystem_pthread.dylib`__pthread_once_handler + 80
    frame #18: 0x000000018af3c04c libsystem_platform.dylib`_os_once_callout + 32
    frame #19: 0x000000018aeef59c libsystem_pthread.dylib`pthread_once + 100
    frame #20: 0x00000001cd2adab4 OpenCL`___lldb_unnamed_symbol728$$OpenCL + 116
    frame #21: 0x00000001cd27f8ec OpenCL`clGetDeviceIDs + 168
    frame #22: 0x0000000110202548 libhwloc.0.dylib`hwloc_opencl_discover(backend=<unavailable>, dstatus=<unavailable>) at topology-opencl.c:86:13
    frame #23: 0x00000001101da5d4 libhwloc.0.dylib`hwloc_discover_by_phase(dstatus=0x000000016fdfd510, phasename=<unavailable>, topology=<unavailable>, topology=<unavailable>) at topology.c:3290:5
    frame #24: 0x00000001101e1fc8 libhwloc.0.dylib`hwloc_topology_load(topology=0x0000000110a18730) at topology.c:3493:5
    frame #25: 0x00000001107154dc libopen-pal.40.dylib`opal_hwloc_base_get_topology + 3756
    frame #26: 0x000000011064f520 libopen-rte.40.dylib`orte_ess_base_proc_binding + 3484
    frame #27: 0x0000000110d0705c mca_ess_singleton.so`rte_init + 5192
    frame #28: 0x00000001106814c8 libopen-rte.40.dylib`orte_init + 676
    frame #29: 0x00000001103e7264 libmpi.40.dylib`ompi_mpi_init + 912
    frame #30: 0x0000000110398fc4 libmpi.40.dylib`PMPI_Init + 120
    frame #31: 0x00000001102bdf3c libmpi_mpifh.40.dylib`mpi_init_ + 40
    frame #32: 0x0000000106106b0c DIAG`__mode_mnh_world_MOD_init_nmnh_comm_world at spll_mode_mnh_world.f90:46:30
    frame #33: 0x000000010568460c DIAG`__mode_io_ll_MOD_initio_ll at spll_mode_io_ll.f90:90:35
    frame #34: 0x0000000100002d50 DIAG`MAIN__ at spll_diag.f90:246:16
    frame #35: 0x000000010000cdb8 DIAG`main at spll_diag.f90:94:4
    frame #36: 0x000000018af11450 libdyld.dylib`start + 4
(lldb) 

I spent some time in order to see what was going on but was’t able, yet, to understand the problem. However, I discovered that if you launch the same binary using mpirun, the calls to hwloc are not exactly similar and circumvent the problem, making it possible to run correctly !

Thanks

I’d like to thank FXCoudert which helped me prune some branches related to gfortran and the M1 ecosystem.