Does anybody have suggestions on how to add a second Gigabit Ethernet connection on a BPI-M3?
I would use the BPI-M3 in a 4/8/16-node Beowulf cluster. The head node needs 2 connections … one to the internal network of the worker nodes and one to the LAN …
The A83T only supports one Ethernet port. If we try to add a second LAN port, we must add an external switch or router chip on the board, but all the switch chips support 5 RJ45 ports, so it would become the same as the BPI-R1 :)
We are now working on a new product, called BPI-R2, but it will not use an Allwinner chip on board.
Use a Gigabit Ethernet USB dongle, or use another device as the head node.
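As a rough sketch, on a Debian-style image the head node's two interfaces could be set up like this (assuming the onboard NIC shows up as eth0 facing the LAN and the USB dongle as eth1 facing the workers; interface names and addresses are assumptions, check yours with 'ip link'):

    # /etc/network/interfaces -- hypothetical head-node setup
    auto eth0
    iface eth0 inet dhcp            # onboard NIC, connected to the LAN

    auto eth1
    iface eth1 inet static          # USB GbE dongle, internal cluster network
        address 10.0.0.1
        netmask 255.255.255.0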
Thanks! I was thinking about it but wanted something connected directly … from what I understand that is impossible with the BPI-M3 … so USB it will be! I really like the BPI-M3 for its speed and memory, but mostly the L1 and L2 caches … L1 at 512KB is close to my heart
What is the time frame for the BPI-R2? I have about 3 months before I buy the products lol
And what kind of workload do you plan to run on the boards then? Just curious.
Mostly for physics … Monte Carlo simulation … finite elements … statistical analysis … the usual!
And why not drop the slow-cluster approach and do this with an NVIDIA Jetson TK1, using GPGPU acceleration on its 192 CUDA cores, at a fraction of the cost/consumption?
When the A83T has to run multithreaded workloads it needs a lot of power, and heat dissipation becomes a problem: http://linux-sunxi.org/Banana_Pi_M3#Sudden_shut_offs_.2F_maximum_consumption_.2F_cooling_vs._consumption
Twice the price … what does "2 GB x16 Memory with 64-bit Width" mean … 32GB? Expandable using 2GB sticks up to 16? If so … wow! OK, at twice the price, I'll take that lol
And General Purpose GPU is new to me … this is when I notice that I am only a physicist … not a computer scientist! I come from C, then Visual Basic (3 to 6) (a fast prototyping platform), and now Python (just learning … the last 3 months … in my spare time … cool language by the way).
The goal of my project is to learn how to build a Beowulf cluster (on a local network, distributed across all available machines on the network, or as a standalone cluster, my ultimate goal).
I am therefore learning about distributed processing in R, Python, Fortran and C. I am now looking at CUDA because of your comment (I even see CUDA for Python)! I know that R and Python are interpreted languages but they are great for prototyping algorithms … I am even thinking of writing my own load balancing and message passing interface … but I digress … (a minimal sketch of the kind of distributed run I mean follows below)
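As an illustration, here is a minimal sketch of a distributed Monte Carlo estimate of pi in Python, assuming mpi4py is installed on every node (the hostfile name and node count in the run command are assumptions about the setup):

    # monte_pi.py - minimal sketch of a distributed Monte Carlo estimate of pi
    # (assumes mpi4py is installed on every node; hostfile name is hypothetical)
    # run with e.g.: mpirun -np 4 --hostfile hosts python monte_pi.py
    import random
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()      # this process's id
    size = comm.Get_size()      # total number of processes

    samples = 1000000           # samples drawn per process
    random.seed(rank)           # a different random stream on each process

    hits = 0
    for _ in range(samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1

    total = comm.reduce(hits, op=MPI.SUM, root=0)   # collect on the head node
    if rank == 0:
        print("pi is approximately", 4.0 * total / (samples * size))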
On the heat problem, I did read that when not using peripherals it is OK … since I don't plan to use any on the standalone Beowulf cluster, I should be fine … only RAM, SD card and network (OK, each one plugged into a KVM switch). I liked the L1 of 512KB of fast memory … this is where the GPGPU might be of importance, but … I lack knowledge
All this started when I bought a Raspberry Pi 2 … and programmed the GPIO in Python … built my own security system … I was impressed!
So now I am learning that SoCs are very interesting, I just don't know much … I do see a fruit bowl (Raspberry, Banana, Orange lol) and help is always welcome!
If you want to learn clustering I would start with a couple of cheap Orange Pi One or PC boards. Or wait for the first reviews of the Banana Pi M2+ (though there's not much to test, since it's more or less a clone of the Orange Pi Plus regarding hardware settings and pin mappings; you get Gbit Ethernet and eMMC as with the M3, and since the H3 is pretty well supported by 3rd-party distros you don't have to use the somewhat… horrible… OS images provided here).
Regarding the M3, heat and peripherals, you should read again. If you improve heat dissipation (and you'll need a strong fan + heatsinks) then the M3 will show superior performance, and you also have to take care that you can provide ~10W per board.
If your cluster workloads are CUDA capable, then relying on slow ARM clusters is just a bad joke (if you really need performance and do not only want to learn building clusters). A cheap PC with one single rather cheap NVIDIA GTX 770 card comes with 1536 CUDA-capable cores running at 1GHz+. You would need a few hundred Bananas, network switches like crazy, liquid cooling and a bunch of fat PSUs… and still wouldn't be faster.
Clustering is nice for a few use cases, but if your problems are 'GPGPU capable' then it's a bad idea. Or the other way around: using a server mainboard with a couple of fat NVIDIA GPUs is always the better idea.
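To make 'GPGPU capable' concrete: it means the work splits into many independent per-element operations. A minimal sketch in Python using Numba's CUDA support (assuming the numba package and a CUDA-capable NVIDIA card are present; the file name and sizes are illustrative):

    # vector_add.py - minimal sketch of a 'GPGPU capable' problem: the same
    # independent operation applied to millions of elements in parallel.
    # (assumes the numba package and an NVIDIA GPU with CUDA drivers)
    import numpy as np
    from numba import cuda

    @cuda.jit
    def vector_add(a, b, out):
        i = cuda.grid(1)        # this thread's global index
        if i < out.size:        # guard threads past the end of the array
            out[i] = a[i] + b[i]

    n = 1024 * 1024
    a = np.random.rand(n).astype(np.float32)
    b = np.random.rand(n).astype(np.float32)
    out = np.zeros_like(a)

    threads_per_block = 256
    blocks = (n + threads_per_block - 1) // threads_per_block
    vector_add[blocks, threads_per_block](a, b, out)   # one GPU thread per element

    print(out[:4], (a + b)[:4])   # the two should match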
Looking at the Jetson TX1 module … 4 of them would make a great Beowulf cluster for about 2K (1024 CUDA cores) … I kind of like the MPCore processor and the fact that Linux comes pre-flashed …
The pricing on the GTX 770 card is too high to fit into the relative budget (between 2K and 3K).
Harnessing the computing power of things lying around is also a goal … so clustering is a goal in itself; this is more about feasibility than performance … though I am looking for the most performance at the target price!
I get 4 GTX 770 cards for 1K. And I don't understand why you're now talking about the TX1 when I mentioned the TK1 instead (both have nearly the same count of CUDA cores but the former costs nearly 3 times more than the latter): http://elinux.org/Jetson
Anyway: if I were you, the last device I would take into account for clustering is the M3. Worst software and support of all the products here, only shining when looking at specs. If I wanted to learn clustering I would choose cheap H3 devices ATM; if it's about energy-efficient stuff running on ARM cores I would choose ARMv8 (Cortex-A57 or A72), since with optimised code they are magnitudes faster than the octa-core A83T used in the M3. And if the problem is GPGPU-ready I would get a mainboard with 4 x16 PCIe slots and benefit from 6000 CUDA processors doing the job with less latency and fewer interconnection problems, for a fraction of the cost of any 'cluster' approach showing the same performance.
Beowulf clustering was a hot thing… probably 20 years ago.
We have finished the BPI-R2 schematic design and need to do the layout now. The main chip is from MTK; once we are ready, we will let you know.
So this time you copied Benn Huang's design https://mqmaker.com/product/witi-board/ and the schematic can already be found here?
I also ask you: please stop complaining in every possible thread.
You don't even know what we are doing, yet you say so much about this and call it a "copy". Please just keep an objective attitude.
It might help to know German … I don't … and with the GTX 770 you need a computer … and with that we are far from the very old Beowulf cluster concept …
I finished physics in 1989 (27 years ago) and I was dreaming of building one at the time … forgot about that, and now … I am finding it again!
It is a question of:
1- harnessing the compute power of the 7 systems lying around my house (a project in itself): 4 Raspberry Pi 2 boards, all running some variation of Linux (2 of them are security systems hooked to doors and PIR sensors, 2 others are for research/fun), 2 Linux tower systems (one for development, one for my CNC), and lastly my portable PC, all on my Gb network; and
2- building a Beowulf cluster with the cheaper systems available now ...
Now, imagine a Beowulf system built with super-nodes … like the TX1 (thanks for the GPU heads-up) … at around 2 to 3K … please concentrate your suggestions in that direction … and be happy! At 53, bitching mode is off … I do appreciate concerns about things, but I like to stay positive …
I talked about the TX1 because I found it … and I liked it … it was another possibility and it fared better than the TK1 …
The TX1 has 256 cores, the TK1 has 192 … OK, that is close … the TX1 has 4 GB RAM, the TK1 has 2 … OK, that is not close …
The TX1 is Cortex-A57 while the TK1 is Cortex-A15 … and the TX1 is MPCore; I have no idea about the TK1 …
and the TX1 comes preloaded with Linux … that is a plus for me
I looked at Anaconda and … wow!
I’ll add a 64GB SD card (or microSD, depending on what I buy) to each of them, to have more local storage close to each processor. The cluster will also have a 3TB private cloud hard disk (I have not selected it yet … I do have one at home and every computer mounts it all right).
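One common way to share that disk with every node is NFS; a minimal sketch, assuming the disk is exported from the head node at 10.0.0.1 under /srv/cluster (the address and paths are assumptions):

    # on each worker node, one line in /etc/fstab (hypothetical address and paths)
    10.0.0.1:/srv/cluster  /mnt/cluster  nfs  defaults  0  0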
On the fun part, have you looked at BlueDBM from MIT? You have got to love that