BPI-M3 Heatsink


(David Coles-Dobay) #1

Here are some images of the aluminum heat sink I designed for the BPI-M3.

It’s machined from 13 mm 6061-T651 alloy. The bottom surface is height-matched to the components on the board. I made two of them and will be making a rubber mold and maquette to send to BPI proper to get a quote on manufacture.

Running all cores at 1.8 GHz under full load, temps stay below 50 °C.

Without a fan, temps stay below 70 °C.

I did not draw these; I mapped the board using a Renishaw probe on a Milltronics RH30 CNC mill. Next time I am at the lab I will upload the G-code from the mill.


#2

What about real loads?

sudo apt-get -f -qq -y install libcurl4-gnutls-dev
wget http://downloads.sourceforge.net/project/cpuminer/pooler-cpuminer-2.4.5.tar.gz
tar xf pooler-cpuminer-2.4.5.tar.gz && rm pooler-cpuminer-2.4.5.tar.gz
cd cpuminer-2.4.5/
./configure CFLAGS="-O3 -mfpu=neon"
make
sudo make install

A cheap $0.50 heatsink from AliExpress combined with a cheap $1 5 V fan connected to the pin headers:

https://i.imgur.com/xZGQroh.jpg

charles@bananapim3:~/cpuburn-arm$ minerd --benchmark
[2017-06-02 11:12:34] Binding thread 1 to cpu 1
[2017-06-02 11:12:34] Binding thread 3 to cpu 3
[2017-06-02 11:12:34] Binding thread 0 to cpu 0
[2017-06-02 11:12:34] Binding thread 5 to cpu 5
[2017-06-02 11:12:34] Binding thread 2 to cpu 2
[2017-06-02 11:12:34] Binding thread 7 to cpu 7
[2017-06-02 11:12:34] Binding thread 6 to cpu 6
[2017-06-02 11:12:34] 8 miner threads started, using 'scrypt' algorithm.
[2017-06-02 11:12:34] Binding thread 4 to cpu 4
[2017-06-02 11:12:40] thread 5: 4098 hashes, 0.64 khash/s
[2017-06-02 11:12:40] thread 0: 4098 hashes, 0.64 khash/s
[2017-06-02 11:12:40] thread 4: 4098 hashes, 0.64 khash/s
[2017-06-02 11:12:41] thread 7: 4098 hashes, 0.64 khash/s
[2017-06-02 11:12:41] thread 3: 4098 hashes, 0.64 khash/s
[2017-06-02 11:12:41] thread 1: 4098 hashes, 0.63 khash/s
[2017-06-02 11:12:41] thread 6: 4098 hashes, 0.62 khash/s
[2017-06-02 11:12:41] thread 2: 4098 hashes, 0.61 khash/s
[2017-06-02 11:12:44] thread 3: 2550 hashes, 0.64 khash/s
[2017-06-02 11:12:45] thread 7: 2550 hashes, 0.64 khash/s
[2017-06-02 11:12:45] Total: 5.06 khash/s
[2017-06-02 11:12:45] thread 2: 2454 hashes, 0.64 khash/s
[2017-06-02 11:12:45] thread 6: 2475 hashes, 0.64 khash/s
[2017-06-02 11:12:45] thread 1: 2508 hashes, 0.59 khash/s
[2017-06-02 11:12:45] thread 5: 3210 hashes, 0.64 khash/s
[2017-06-02 11:12:45] thread 0: 3201 hashes, 0.64 khash/s
[2017-06-02 11:12:45] thread 4: 3201 hashes, 0.64 khash/s
[2017-06-02 11:12:45] thread 3: 642 hashes, 0.64 khash/s
[2017-06-02 11:12:49] thread 7: 3198 hashes, 0.64 khash/s
[2017-06-02 11:12:49] Total: 5.08 khash/s
[2017-06-02 11:12:50] thread 1: 2961 hashes, 0.63 khash/s
[2017-06-02 11:12:50] thread 6: 3192 hashes, 0.63 khash/s
[2017-06-02 11:12:50] thread 2: 3210 hashes, 0.63 khash/s
[2017-06-02 11:12:50] thread 5: 3222 hashes, 0.64 khash/s
[2017-06-02 11:12:50] thread 3: 3192 hashes, 0.64 khash/s
[2017-06-02 11:12:50] thread 0: 3210 hashes, 0.64 khash/s
[2017-06-02 11:12:50] thread 4: 3210 hashes, 0.64 khash/s
[2017-06-02 11:12:51] thread 7: 642 hashes, 0.63 khash/s
[2017-06-02 11:12:51] Total: 5.08 khash/s
[2017-06-02 11:12:54] thread 7: 2529 hashes, 0.64 khash/s
[2017-06-02 11:12:54] Total: 5.09 khash/s
[2017-06-02 11:12:55] thread 1: 3138 hashes, 0.63 khash/s
[2017-06-02 11:12:55] thread 6: 3159 hashes, 0.63 khash/s
[2017-06-02 11:12:55] thread 2: 3153 hashes, 0.61 khash/s
[2017-06-02 11:12:55] thread 5: 3219 hashes, 0.64 khash/s
[2017-06-02 11:12:55] thread 3: 3198 hashes, 0.64 khash/s
[2017-06-02 11:12:55] thread 7: 639 hashes, 0.64 khash/s
[2017-06-02 11:12:55] Total: 5.07 khash/s
[2017-06-02 11:12:55] thread 0: 3204 hashes, 0.64 khash/s
[2017-06-02 11:12:55] thread 4: 3204 hashes, 0.64 khash/s
[2017-06-02 11:13:00] thread 1: 3138 hashes, 0.63 khash/s
[2017-06-02 11:13:00] thread 6: 3138 hashes, 0.63 khash/s
[2017-06-02 11:13:00] thread 2: 3075 hashes, 0.63 khash/s
[2017-06-02 11:13:00] thread 5: 3219 hashes, 0.64 khash/s
[2017-06-02 11:13:00] thread 7: 3201 hashes, 0.64 khash/s
[2017-06-02 11:13:00] Total: 5.09 khash/s
[2017-06-02 11:13:00] thread 0: 3207 hashes, 0.64 khash/s
[2017-06-02 11:13:00] thread 4: 3210 hashes, 0.64 khash/s
[2017-06-02 11:13:01] thread 3: 3210 hashes, 0.64 khash/s
[2017-06-02 11:13:05] thread 3: 2556 hashes, 0.63 khash/s
[2017-06-02 11:13:05] thread 1: 3135 hashes, 0.62 khash/s
[2017-06-02 11:13:05] thread 6: 3144 hashes, 0.62 khash/s
[2017-06-02 11:13:05] thread 2: 3156 hashes, 0.62 khash/s
[2017-06-02 11:13:05] thread 5: 3216 hashes, 0.64 khash/s
[2017-06-02 11:13:05] thread 7: 3195 hashes, 0.64 khash/s
[2017-06-02 11:13:05] Total: 5.06 khash/s
[2017-06-02 11:13:06] thread 4: 3207 hashes, 0.64 khash/s
[2017-06-02 11:13:06] thread 0: 3207 hashes, 0.64 khash/s
^C
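The periodic Total lines in the log above can be averaged into a single figure that is easier to compare between cooling setups. A minimal sketch, assuming the benchmark output was saved to a file and that the Total lines look exactly like the ones shown; the file name minerd.log and the function name mean_total are made up for illustration:

```shell
#!/bin/sh
# Average the "Total: X khash/s" lines of a saved minerd --benchmark log.
# Assumption: the log format matches the output shown above, so the number
# is always the second-to-last field on a Total line.
mean_total() {
    awk '/Total:/ { sum += $(NF - 1); n++ }
         END { if (n) printf "%.2f\n", sum / n; else print "no Total lines" }' "$1"
}

# Example (hypothetical file name):
#   minerd --benchmark 2>&1 | tee minerd.log    # stop with Ctrl-C
#   mean_total minerd.log
```

A mean that drops noticeably between the first and the last minutes of a long run would suggest the SoC is throttling.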

My other M3 (also 8 cores, but Cortex-A53 instead of the slow A7) scores almost twice as much when using the FriendlyARM-provided fansink: http://www.friendlyarm.com/index.php?route=product/product&path=69&product_id=130

BTW, you know that running the stock irqbalance daemon is a bit useless or even stupid on ARM?


(David Coles-Dobay) #3

Real?

Look at the code for bitcoin mining and the like. You take the truncation of the cube roots of the first thirty two prime numbers and compare them against the message.

I have run 400 S1 Antminers (75 kW) producing 2.4 billion transactions a day. Each S1 Antminer has 64 ASIC chips.

The ARM architecture is not even suited to doing truncated cube roots; an APU or ASIC is better suited.

I am actually using the chip for productive work. What you see in the screenshot is make running with the -j 8 option, which will fully utilize every aspect of the compiled kernel. The code mix is typical of real-world use and therefore a better benchmark.

I am compiling a proper GCC toolchain to use with the arm7l series processors. I have been thinking about calling the distro arm7l-autosevo-linux-gnu, since no one else that I have found is doing it. They have a different function set and math libraries than the Cortex A8 or Cortex A6 that most of the Pi stuff is using. Looking into the sunxi, Linaro, and EABI toolchains, they are lacking software for several core advantages that the chip has.

As far as calling NEON from the bit miner: it will surely load up the registers but won’t do much work, as the current 3.4 kernel does not have the exact version of the libraries, just the generic ones. There are two sets of NEON libraries, one for 32-bit and one for 64-bit; while this processor is 32-bit, it has the vector unit that allows the execution of 64-bit code. The current 3.4 kernel implementation is lacking that. Also, there are other optimized hardware calls that the Cortex up to 6 does not have and the default toolchain ignores.

The toolchains match on the other subsets of ARMv7 like the arm7M and ARM7TDMI, but config.css and arm-mode.def are missing the correct target, so it instead builds a generic ARM. The Linaro toolchain is inclusive, but its GCC is version 4.9 and there is a lot more built on 6.3 and above. So I am going for the master branch of GCC to be up to date.

Also look at the bit miner code. It is very obvious what they are doing: it is simply iteration of hard-coded values against an unknown.

As a side note, the whole bit currency thing is a ploy to move you into bank-owned digital currency and remove cash from the economy so that it can be centrally controlled. SHA256 is based on a random seed that is provided by /dev/rand. This function of computing was provided by my old military unit and is only a 16-bit sequence from the Mersenne prime list of primes between 0 and 1. The fact that we have truncation of the 32 rational base sets means that instead of the theoretical 2.23x10^43 possible combinations there are only 2^256x2^32-977 possible combinations for any message. Brute force is only one way to decrypt the SHA256 package; the other is to just use the available key sets.

http://csrc.nist.gov/groups/STM/cavp/

As far as your cooling solution … it’s good for about 2 watts. You actually need to move the entire heat load of 20 W to enough dissipative area to allow 20 LFM of air flow to warm, pulling the heat along with it.

You should set aside what you have and try using an old northbridge cooler. I am sure you can hacksaw an L shape to cover the RAM, processor and data chip. With the setup you have, the device will not perform with any version of any testing.

Also, I noticed from the photo that you have rev1 of the board. You for sure are not going to do much without using the 5 V header for power. The voltage regulator that you are plugged into is limited to 9 W (11 W peak) before the PMIC sends throttle signals to the chip. Get a proper power supply and cooler set and you will see improvement.


(David Coles-Dobay) #4

I have not compiled any software; this is all sunxi provided, and I agree. I am running the sunxi BPI-provided MATE image on the compiler.

My new kernel and repo will have only the correct stuff.


#5

There’s not a single ‘sunxi provided’ OS image, since most developers simply avoid Sinovoip stuff. So you mean you took one of the shitty @sinovoip OS images and tried to strip it down until it does not suck that much any more?

What about your minerd --benchmark results? How does your expensive milling adventure perform?


(David Coles-Dobay) #6

Minerd benchmarks, wha ha ha ha. Before the doubling in June of 2015 I built a 400-unit S1 Antminer farm. We did about 30 coins per day at 2.4 billion transactions. Not a miner but a feed source for many of the American mining pools.

OK, to keep it simple: there is an Antminer USB device with an ASIC called the Antminer U2; it does 2.0 GH/s. There was the S1 Antminer at 200 GH/s; we used 400 of these units together as a pool seed. That’s 80,000 GH/s, or 80 TH/s.
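The pool arithmetic above can be checked with plain shell integer arithmetic. The figures are the ones quoted in this post; the variable names are only for illustration:

```shell
#!/bin/sh
# Figures as quoted in the post above.
units=400          # number of S1 Antminers in the farm
ghs_per_unit=200   # GH/s per S1, as stated in the post

total_ghs=$((units * ghs_per_unit))   # combined rate in GH/s
total_ths=$((total_ghs / 1000))       # same rate in TH/s
echo "${total_ghs} GH/s = ${total_ths} TH/s"   # prints "80000 GH/s = 80 TH/s"
```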

So, Charles, an ice sled does excellently on snow slopes but kind of sucks at the beach. The ARM processor is completely the wrong tool for bitcoin mining.

Compiling software is actual work and this M3 does pretty well. https://openbenchmarking.org/s/Compiler Benchmarking with bitcoin really is not.

The best Litecoin miners I saw before the sweep were the R9 AMD video cards; I built 4 boxes with 4 cards each. The APU is more suited to coin mining than a CPU or RISC CPU.

Like I said, you need an actual heat sink and actual power for the M3 board to work correctly.


(David Coles-Dobay) #7

My real interest in the processor is for a completely different application. I need a packet-altering board. Tilera was sold to Mellanox, so the 72-core Tilera GX network processor became less cost effective. I need to support a Tor-like backhaul that will have a class A subnet behind it. I am currently using Xeon Phi processors, but the cost per card just quadrupled and the software has to be compiled on the Intel IDE, which does not have the core functionality I need. ARM is actually different than a conventional x86 PC. ArmK, the Intel version, is all back level on versions to meet the Red Hat Enterprise release schedule. The Tileras are about 28K USD a board and the software is now stuck in back-rev hell. http://www.mellanox.com/repository/solutions/tile-scm/

These A83T processors have great potential for my application. I can populate 8 on a PCI-E card and network them with 9600 Mb/s network PCI links like the Xeon. This is kind of what I am working on. http://www.advantech.com/products/half-length_pcie_card1/dsp-8684/mod_7fa46fcd-ca8d-4f3d-808f-419cf50119aa

This is a 60-core Xeon Phi at 1 GHz with a 160X radiator attached for cooling. It gives 120 threads per card at about 4.5K USD per card.


(David Coles-Dobay) #8

Almost all of the compiled ARM stuff I have seen was built on one of three toolchains: sunxi, EABI and Linaro. The misconfiguration is in the fact that GCC does not recognize this particular chip by its ident. On the sunxi toolchain it defaults to ARM Cortex A5; on Linaro it defaults to Cortex A6. It is partly because the chip is outside of the official ARM group’s released chips, and partly, I think, due to all the bitching and whining about it causing a lot of animosity, when the group may have been better served by just doing what I am doing: build a separate toolchain and repo. Here is the listing of codes for ARM7 that the compiler recognizes; no Arm7L on the list.


#9

minerd --benchmark is able to show the efficiency of heat dissipation when throttling is likely to occur (it’s easy to understand this). Since you fail to provide numbers, I guess your expensive milling adventure performs as poorly as my cheap setup. Thanks for providing walls of text without any content.

Wish you good luck.
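One way to see whether throttling actually occurs during such a run is to log temperature and CPU clock next to the benchmark. A rough sketch, assuming the kernel exposes the usual Linux sysfs thermal and cpufreq nodes; the vendor 3.4 kernel discussed in this thread may use different paths or units, and the names TEMP_NODE, FREQ_NODE, sample and monitor are made up for illustration:

```shell
#!/bin/sh
# Log SoC temperature and CPU0 clock once per second while a benchmark runs.
# Assumption: the sysfs paths below exist on the running kernel; override
# TEMP_NODE / FREQ_NODE if the board exposes them elsewhere.
TEMP_NODE=${TEMP_NODE:-/sys/class/thermal/thermal_zone0/temp}
FREQ_NODE=${FREQ_NODE:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq}

sample() {
    # Prints "<epoch seconds> <raw temperature value> <frequency in kHz>"
    printf '%s %s %s\n' "$(date +%s)" "$(cat "$TEMP_NODE")" "$(cat "$FREQ_NODE")"
}

monitor() {
    # $1 = number of samples to take, one per second
    i=0
    while [ "$i" -lt "$1" ]; do
        sample
        sleep 1
        i=$((i + 1))
    done
}

# Example: monitor 60 > throttle.log &    then start minerd --benchmark
```

If the logged frequency falls while the temperature sits at a ceiling, the heatsink is not keeping up; a flat frequency at the maximum clock means no throttling took place.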


(David Coles-Dobay) #10

You know, Charles, I believe that open forums are plagued by the descent into argument.

I see what you have and what you believe; they are indicative of a lowly troll.

I provide you with real information to allow you to learn where your assumption is wrong.

Your 50-cent investment in solving your issues does not afford you the expertise to say anything at all.

As for performance benchmarks, using a bitcoin script as a benchmark tool shows you as a complete and utter fool. It is absolutely hilarious. Do you even know any code?

Your demand that someone else do the same as you is just freezing ridiculous.

I think that you must have bought a cheap board and failed to do your dream and now hate the fact that anyone is pleased with the same board.

My walls of text are because my thoughts are more complex and deeper than the 140 character limit of your attention. They call them tweets because they are noises from bird brains.


#11

You’re funny. So much text and zero understanding. minerd is able to show the efficiency of heat dissipation when throttling occurs. In numbers. Since you don’t provide these numbers it’s easy: your heatsink is as inefficient as mine :slight_smile:

Again: good luck! You’ll have a great time here talking to yourself. Enjoy!


(David Coles-Dobay) #12

Charles, you must be a complete and absolute troll.

If you read all of your posts and look at the childlike way you attempted to cool your board, you come to the conclusion that you really have no idea what you are talking about.

Your bullcrap bully technique of picking out one inane little thing and harping on it is obvious.

No, you are a complete and utter moron who is trolling people. You are that lowly piece of crap, a forum troll.

If you knew even the slightest thing about bit mining or code you would know that bitcoin mining is not a valid benchmark at all. And no, I will not be showing bitcoin mining stats for any device not specifically designed for the SHA256 algorithm.

When I finish the kernel toolchain and build the kernel correctly I will post all the results.

CHARLES=TROLL my last word to you.


#13

You’re so funny. So much text and insults all just to compensate for you not being able to show how efficient your heatsink is (since it’s not :slight_smile: ).


(David Coles-Dobay) #14

CHARLES = TROLL

Every post from Charles is adversarial trolling.


#15

Would you have such a ‘Cooler’ for me too? :wink:

It looks perfect. Is it alu?

Greetings from Austria


(David Coles-Dobay) #16

They are expensive for me to make (200+ USD). I can if you have the budget. My lab is not set up for production, only prototyping, so it is very costly for small items.

I am forwarding the design to Sinovoip to find Asian casting at much lower cost.

If you have access to a 3D printer you may want to form a bracket that will hold a copper northbridge cooler. A cooler like this one will work nicely as well. http://www.acousticpc.com/swiftech_mcx159cu_all_cooper_universal_north_bridge_chipset_cooler.html


(David Coles-Dobay) #17

I posted REAL benchmarks and Results.


#18

Only +200 USD, that’s a bargain! Especially for the ‘performance gain’!

OK, with your super cheap heatsink you got sysbench 20000 results of less than 52 seconds, or with the other test a little above 53 seconds (what’s that ondemand/performance thing?). Now let’s google for ‘sysbench 20000 banana pi m3’:

  • first hit: ‘$0.5 heatsink that performs ok’ and ‘sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=8’ in less than 53 seconds
  • second hit: the same guy asking @sinovoip but to no avail
  • third hit: without heatsink it’s 92 seconds

Impressive results! For just $200 REAL performance improves by unbelievable 75 percent compared to no heatsink. And when comparing to a heatsink that costs 400 times less it’s at least an improvement of ZERO percent. REAL benchmarks and not FAKED ones as this cpuminer stuff above!
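The percentages above follow from the run times quoted in this post. A rough sketch using awk for the floating-point division; the function name speedup_pct is made up for illustration, and the inputs are the rounded figures from the post (92 s without a heatsink, about 53 s with the $0.50 heatsink, about 52 s with the milled one), so the results are approximate:

```shell
#!/bin/sh
# Percentage by which throughput improves when the sysbench run time
# drops from $1 seconds to $2 seconds.
speedup_pct() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.0f\n", (slow / fast - 1) * 100 }'
}

speedup_pct 92 52   # no heatsink vs. milled heatsink: prints 77
speedup_pct 53 52   # $0.50 heatsink vs. milled heatsink: prints 2
```

With these rounded inputs the milled heatsink comes out roughly 77 percent ahead of a bare SoC and about 2 percent ahead of the cheap heatsink, which is within the spread one would expect between repeated runs.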

Good luck!


(David Coles-Dobay) #19

Let’s see your numbers with my provided software.

You are proving my point for me…

I am not currently selling the heat sinks, as you can see. I am providing the design to Sinovoip Inc. through Ammorexue. The goal is to get them made inexpensively in Asia.


#20

You’re so funny. You provide REAL benchmarks using sysbench 20000, get a result of ~52 seconds and don’t understand the meaning of other people without your AMAZING or let’s say REAL heatsink scoring 53 seconds with just an average and cheap heatsink?

Good luck!