Now, how will they behave together? (I mean Pi4 + Tinker)

For compatibility, I compiled the same version of Open MPI (openmpi-4.0.1) on both boards, recompiled the HPL benchmark against the new Open MPI, joined both boards into a small MPI cluster over a 1 Gbps LAN, assigned IPs, and fired up the benchmark. Working as a pair, the total performance is...

10.9 Gflops
(similar to a 10-year-old AMD laptop I had ages ago).

Still, there is an improvement vs. each board working separately.

Here I launched xhpl with mpirun using a hostfile that lists all the hosts (really just two, pi4 and tinker, each repeated as many times as it has CPU cores). Obviously, /etc/hosts on both boards must list their IPs so they can find each other.
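For anyone who wants to reproduce the hostfile, here is a small Python sketch that generates one. It is not the file I actually used; the core counts (4 per board) are an assumption chosen to match -np 8, and the helper name build_hostfile is made up for illustration. Open MPI also accepts the shorter "hostname slots=N" form, which the compact flag emits.

```python
def build_hostfile(cores_per_host, compact=False):
    """Return Open MPI hostfile text.

    By default each hostname is repeated once per core; with
    compact=True the equivalent 'host slots=N' form is used instead.
    """
    lines = []
    for host, cores in cores_per_host.items():
        if compact:
            lines.append(f"{host} slots={cores}")
        else:
            lines.extend([host] * cores)
    return "\n".join(lines) + "\n"


# Assumption: 4 cores on each board (4 + 4 = 8, matching -np 8).
hosts = {"pi4": 4, "tinker": 4}

hostfile_text = build_hostfile(hosts)
print(hostfile_text, end="")
```

Saving the printed text as hosts.txt next to xhpl should give mpirun something to pass to -hostfile.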

pi@tinker:~/HPL$ /opt/openmpi4/bin/mpirun -np 8 -hostfile hosts.txt xhpl
================================================================================

HPLinpack 2.3 -- High-Performance Linpack benchmark -- December 2, 2018

Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK

Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK

Modified by Julien Langou, University of Colorado Denver

================================================================================

An explanation of the input/output parameters follows:

T/V : Wall time / encoded variant.

N : The order of the coefficient matrix A.

NB : The partitioning blocking factor.

P : The number of process rows.

Q : The number of process columns.

Time : Time in seconds to solve the linear system.

Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N : 20352

NB : 192

PMAP : Row-major process mapping

P : 2

Q : 4

PFACT : Right

NBMIN : 4

NDIV : 2

RFACT : Crout

BCAST : 1ringM

DEPTH : 1

SWAP : Mix (threshold = 64)

L1 : transposed form

U : transposed form

EQUIL : yes

ALIGN : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.

- The following scaled residual check will be computed:

||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )

- The relative machine precision (eps) is taken to be 1.110223e-16

- Computational tests pass if scaled residuals are less than 16.0

================================================================================

T/V N NB P Q Time Gflops

--------------------------------------------------------------------------------

WR11C2R4 20352 192 2 4 515.90 1.0895e+01
HPL_pdgesv() start time Mon Jul 15 19:23:35 2019

HPL_pdgesv() end time Mon Jul 15 19:32:11 2019

--------------------------------------------------------------------------------

||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 9.58007276e-04 ...... PASSED

================================================================================

Finished 1 tests with the following results:

1 tests completed and passed residual checks,

0 tests completed and failed residual checks,

0 tests skipped because of illegal input values.

--------------------------------------------------------------------------------

End of Tests.

================================================================================
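As a sanity check on the number above, the Gflops figure can be recomputed from N and the wall time. This is a sketch that assumes the usual HPL operation count of 2/3·N^3 + 3/2·N^2 floating-point operations for the solve; the function name hpl_gflops is mine, not part of HPL.

```python
def hpl_gflops(n, seconds):
    """Estimate the HPL rate in Gflops from problem size and wall time.

    Assumes the standard operation count 2/3*N^3 + 3/2*N^2.
    """
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


# Values taken from the run above.
n, p, q, seconds = 20352, 2, 4, 515.90

rate = hpl_gflops(n, seconds)
ranks = p * q  # the P x Q process grid must match mpirun -np

print(f"{rate:.2f} Gflops on {ranks} MPI ranks")
```

The result agrees with the reported 1.0895e+01 to within rounding, and the 2 x 4 process grid accounts for the 8 ranks requested with -np 8.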