Hacker News
Online Labs – ARM servers in the cloud (labs.online.net)
79 points by Remiii 4264 days ago
15 comments

I haven't used Online.net in a while but they're on Twitter, on IRC, they have a forum based on Discourse[1], now this. It's like a fresher OVH. French, too.

The servers have four logical processors like this:

    processor       : 0
    model name      : ARMv7 Processor rev 2 (v7l)
    Features        : half thumb fastmult vfp edsp thumbee fpv3 tls idiva idivt vfpd32 lpae 
    CPU implementer : 0x56
    CPU architecture: 7
    CPU variant     : 0x2
    CPU part        : 0x584
    CPU revision    : 2
[1] https://community.cloud.online.net/
Thanks for posting this info. The cpuinfo matches the Marvell Armada XP.
Oh, yeah, I forgot to append that

    Hardware	: Marvell Armada 370/XP (Device Tree)
    Revision	: 0000
    Serial	: 0000000000000000
dmesg output mentions Armada XP pinctrl and xor engine too.

  free -m                                                                       
               total       used       free     shared    buffers     cached                             
  Mem:          2020         97       1923          0          6         40                             
  -/+ buffers/cache:         49       1970                                                              
  Swap:            0          0          0
Server seems to be located in France

  {
    "as": "AS12876 ONLINE S.A.S.",
    "city": "",
    "country": "France",
    "countryCode": "FR",
    "isp": "Tiscali France",
    "lat": 48.86,
    "lon": 2.35,
    "org": "Tiscali France",
    "query": "212.47.232.90",
    "region": "",
    "regionName": "",
    "status": "success",
    "timezone": "Europe/Paris",
    "zip": ""
  }
I think the big use case for ARM in datacenters, over the next few years, is for servers whose CPU usage is very low today--they're consistently network-bound or they just act as a relatively dumb interface to RAM or disk (memcached, some distributed DBs, some dumb proxies). Baidu uses ARM for cloud storage, Facebook used AMD servers for memcached despite their lagging Intel on speed. Basically, you look elsewhere when a Xeon is too much.

Someday comes a point where apps that actually are compute-bound might want to use more, slower cores for power/density/cost/etc.--I just don't think that cutover is tomorrow for the kind of apps (most of) you or I work on.

This is a Marvell-designed core that looks slower than the Cortex-A15-based Tegra K1 in a Chromebook (results posted elsewhere in the comments; it could be a clock-speed issue, not anything inherent to the core designs). Further out, there are some 64-bit ARM cores coming (Cortex-A57, X-Gene, Project Denver, though that may not wind up in servers), as well as process bumps (like TSMC 20nm). Related, check out http://www.anandtech.com/show/8580/hp-appliedmicro-and-ti-br... if you haven't. Of course, Intel isn't sleeping, and low-power x86 chips will improve, too; there will be 14nm versions of the Atom-based Xeons someday. As ever, fun times.

This Marvell SoC has 16 SerDes lanes integrated, which can be partitioned into Gigabit Ethernet, SATA, or PCIe in any number of ways.

http://www.marvell.com/embedded-processors/armada-xp/

Along with the low power draw, that makes it very interesting.

I can see someone putting 64 or 128 of them in a 1U chassis. This might be an interesting low-cost system for someone who needs to simulcast live video streams to millions of users.

"64 quad-core CPUs, each with 16 integrated GigE ports, fanning out live video streams and potentially fitting inside a 1U chassis"

128 Armada cores in 1U is nearly impossible due to cooling issues.
Maybe not for Armada, but for most cell phone ARM CPUs, the power draw is 1-2 watts at max frequency.

If that holds true, 128 SoCs in 1U could be 200-300 watts. That can truly compete with x86 for "some" applications.

Actually 3-5 W for the current gen. And you're not counting all the other things that are necessary for a practical system: PCI-E connectors, RAM, Ethernet PHYs, etc.

P.S. I believe in 64 in 2U, though.
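The back-of-the-envelope power numbers in this sub-thread are easy to reproduce. What follows is only a sketch: the per-SoC wattages are the commenters' estimates rather than measurements, the function name `chassis_power_watts` is made up for illustration, and RAM, PHYs, and everything else on the board are deliberately left out.

```python
def chassis_power_watts(soc_count, watts_per_soc_range):
    """Return (low, high) total SoC power in watts, ignoring RAM, PHYs, fans, etc."""
    low, high = watts_per_soc_range
    return soc_count * low, soc_count * high

# Estimates quoted in the thread (not measurements).
phone_class = chassis_power_watts(128, (1, 2))   # 128 SoCs at 1-2 W each
current_gen = chassis_power_watts(128, (3, 5))   # 128 SoCs at 3-5 W each
half_density = chassis_power_watts(64, (3, 5))   # 64 SoCs at 3-5 W each

print(phone_class, current_gen, half_density)
```

At 1-2 W each, 128 SoCs come to 128-256 W; at 3-5 W the same count needs 384-640 W before anything else on the board is counted.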

More data:

  ubuntu@c1-10-1-2-29:~$ dd if=/dev/zero of=test1 bs=1M count=512                                                               
  512+0 records in                                                                                                              
  512+0 records out                                                                                                             
  536870912 bytes (537 MB) copied, 5.43834 s, 98.7 MB/s

  ubuntu@c1-10-1-2-29:~$ dd if=test1 of=test2 bs=1M count=512                                                                   
  512+0 records in                                                                                                              
  512+0 records out                                                                                                             
  536870912 bytes (537 MB) copied, 5.869 s, 91.5 MB/s

  ubuntu@c1-10-1-2-29:~$ dd if=test2 of=/dev/null bs=1M count=512                                                               
  512+0 records in                                                                                                              
  512+0 records out                                                                                                             
  536870912 bytes (537 MB) copied, 0.60429 s, 888 MB/s
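As a sanity check, the rates dd prints can be recomputed from the byte count and elapsed time. This is a minimal sketch; the helper name `throughput_mb_s` is hypothetical, and it assumes dd reports decimal megabytes (10^6 bytes), which is what the figures above are consistent with.

```python
SIZE_BYTES = 536870912  # 512 blocks of 1 MiB, as reported by dd

def throughput_mb_s(size_bytes, seconds):
    """Throughput in decimal megabytes per second (1 MB = 10**6 bytes)."""
    return size_bytes / seconds / 1e6

write_rate = throughput_mb_s(SIZE_BYTES, 5.43834)  # /dev/zero -> test1
copy_rate = throughput_mb_s(SIZE_BYTES, 5.869)     # test1 -> test2
read_rate = throughput_mb_s(SIZE_BYTES, 0.60429)   # test2 -> /dev/null

print(f"{write_rate:.1f} {copy_rate:.1f} {read_rate:.0f}")
```

The read figure is probably inflated by the page cache, since the 512 MB file had just been written and fits in the 2 GB of RAM shown earlier; that is an assumption, not something the output proves.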
Not impressive at all. Performance is no different from the RK3188. Disk speed is not impressive either. Not a good example of an ARM server.

I wonder what advantages this product has, if any?

                          | Online Net   | RK3188
  Coremark (single)       | 3288.391976  | 4745.333755
  Coremark (dual cpu)     | 6579.488445  | 8505.209441
  Coremark (4-cpu)        | 12985.958932 | 13930.001741
  dhrystones              | 3204101.2    | 5810575.0
  linpack_dp              | 96396.624    | 280673.07
  linpack_sp              | 141666.344   | 286485.812
  nbench ASSIGNMENT       | 5.4852       | 11.569
  nbench BITFIELD         | 1.8627e+08   | 3.1242e+08
  nbench FOURIER          | 4346.6       | 9248.2
  nbench FP EMULATION     | 94.734       | 143.8
  nbench HUFFMAN          | 1035         | 1444.5
  nbench IDEA             | 2035.4       | 1963.3
  nbench LU DECOMPOSITION | 183.26       | 459.03
  nbench NEURAL NET       | 8.0897       | 13.392
  nbench NUMERIC SORT     | 573.47       | 733.99
  nbench STRING SORT      | 56.463       | 108.94
  scimark Composite Score | 113.53       | 230.20
  scimark FFT             | 121.34       | 199.90
  scimark LU              | 92.63        | 279.92
  scimark MonteCarlo      | 64.20        | 81.53
  scimark SOR             | 191.28       | 420.49
  scimark Sparse matmult  | 98.23        | 169.17
  stream Add              | 1239.3447    | 1615.2325
  stream Copy             | 1168.6045    | 1147.0347
  stream Scale            | 926.4318     | 1599.6019
  stream Triad            | 1066.3372    | 1528.7064
  stream_omp Add          | 3159.7990    | 1271.3516
  stream_omp Copy         | 2603.3387    | 1193.8474
  stream_omp Scale        | 2372.8052    | 1653.1527
  stream_omp Triad        | 2595.0168    | 1245.1069
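To make the table easier to read, the scores can be turned into ratios. The sketch below uses just four rows copied from the table and assumes higher is better for each of them (true for Coremark and stream as far as I know); the dictionary layout is only for illustration.

```python
# (Online Net, RK3188) scores copied from a few rows of the table above.
scores = {
    "Coremark (single)": (3288.391976, 4745.333755),
    "Coremark (4-cpu)": (12985.958932, 13930.001741),
    "stream Triad": (1066.3372, 1528.7064),
    "stream_omp Triad": (2595.0168, 1245.1069),
}

# A ratio above 1 means the Online Net server scored higher than the RK3188.
ratios = {name: online / rk for name, (online, rk) in scores.items()}

for name, ratio in ratios.items():
    print(f"{name:20s} {ratio:.2f}x")
```

On these rows the Online Net server trails on single-threaded work but roughly doubles the RK3188 on the multi-threaded stream test.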
I wonder if the development of this idea will be hampered by the ARM architecture, or if on the contrary it will boost ARM support among developers. They claim they can get a better density of instances with physical ARM chips than with virtualized x64 instances, and still use less power. If this takes off it could be amazing.
> Follow @online_en on Twitter and send us a direct message with your email.

Don't they have to follow YOU to be able to do that?

Yes. They won't be getting anybody sending them a direct message.
Just tweet them and they'll follow you.
Thanks, used IRC, preferred that anyway, was just curious if I missed something regarding twitter.
For a while you could configure your account to allow DMs from people you don't follow. That was removed, but I suspect it may still exist for big companies/verified accounts.
Something that has hopefully not been overlooked for the paid version: an ability to restart from the control panel (or whatever). After issuing the shutdown command as root I thought refreshing the page might bring it back up, but alas I was locked out for the rest of the 15 minutes.
Argh.. 15 minutes wasn't enough to download openjdk7 and run some tests :-(. Can anyone invite me?
This is really cool. The price is obviously going to be a question. The other thing I wonder is if this is more or less reliable than a VPS, in case of hardware failure.
Their next generation should include a SoC with a GPU. GPUs make these little processors even more interesting.
Isn't the whole point of "the cloud" to abstract away the specific hardware you're using?
Nope, the cloud is about automation and self-service.

If the provider tells you what you're getting you can always ignore that information if you don't care. But some customers care about hardware specifics.

Yes if you're utilizing a service built in "the cloud". No if you're implementing said service.
You can access the service preview: ask for an invitation on Twitter @online_en or on irc.online.net #onlinelabs

    sudo rm / -rf --no-preserve-root
but it's not as fun as it sounds
Nice option, how much compute power do these ARM servers offer though?
Really silly test

$10/mo droplet

  $ sysbench --test=cpu --cpu-max-prime=2000 run
  sysbench 0.4.12:  multi-threaded system evaluation benchmark

  Running the test with following options:
  Number of threads: 1

  Doing CPU performance benchmark

  Threads started!
  Done.

  Maximum prime number checked in CPU test: 2000


  Test execution summary:
      total time:                          1.5297s
      total number of events:              10000
      total time taken by event execution: 1.5219
      per-request statistics:
           min:                                  0.14ms
           avg:                                  0.15ms
           max:                                  4.68ms
           approx.  95 percentile:               0.16ms

  Threads fairness:
      events (avg/stddev):           10000.0000/0.00
      execution time (avg/stddev):   1.5219/0.00
C1

  $ sysbench --test=cpu --cpu-max-prime=2000 run
  sysbench 0.4.12:  multi-threaded system evaluation benchmark

  Running the test with following options:
  Number of threads: 1

  Doing CPU performance benchmark

  Threads started!
  Done.

  Maximum prime number checked in CPU test: 2000


  Test execution summary:
      total time:                          27.0053s
      total number of events:              10000
      total time taken by event execution: 26.9926
      per-request statistics:
           min:                                  2.69ms
           avg:                                  2.70ms
           max:                                  2.84ms
           approx.  95 percentile:               2.72ms

  Threads fairness:
      events (avg/stddev):           10000.0000/0.00
      execution time (avg/stddev):   26.9926/0.00
For comparison, Cortex-A15-based (32-bit Tegra K1) Acer Chromebook 13:

    sysbench 0.4.12:  multi-threaded system evaluation benchmark
    
    Running the test with following options:
    Number of threads: 1
    
    Doing CPU performance benchmark
    
    Threads started!
    Done.
    
    Maximum prime number checked in CPU test: 2000
    
    
    Test execution summary:
        total time:                          8.8170s
        total number of events:              10000
        total time taken by event execution: 8.8083
        per-request statistics:
             min:                                  0.83ms
             avg:                                  0.88ms
             max:                                 21.43ms
             approx.  95 percentile:               0.95ms
    
    Threads fairness:
        events (avg/stddev):           10000.0000/0.00
        execution time (avg/stddev):   8.8083/0.00
Total time is 2.4926s with --num-threads=4.
Not that it'd cover for the difference but note that you're only using one of the 4 threads of the C1.

If I'm not mistaken you only have 1 CPU for a droplet at this price.

  ubuntu@c1-10-1-18-157:~$ sysbench --test=cpu --cpu-max-prime=2000 --num-threads=4 run                                                                           
  sysbench 0.4.12:  multi-threaded system evaluation benchmark                    
                                                                                
  Running the test with following options:                                        
  Number of threads: 4                                                            
                                                                                
  Doing CPU performance benchmark                                                 
                                                                                
  Threads started!                                                                
  Done.                                                                           
                                                                                
  Maximum prime number checked in CPU test: 2000                                  
                                                                                
                                                                                
  Test execution summary:                                                         
      total time:                          6.7674s                                
      total number of events:              10000                                  
      total time taken by event execution: 27.0485                                
      per-request statistics:                                                     
           min:                                  2.69ms                           
           avg:                                  2.70ms                           
           max:                                  7.00ms                           
           approx.  95 percentile:               2.70ms                           
                                                                                
  Threads fairness:                                                               
      events (avg/stddev):           2500.0000/17.36                              
      execution time (avg/stddev):   6.7621/0.00
The ARM cores are also dedicated, while the one on DigitalOcean is shared.
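Putting the sysbench totals from this thread side by side gives a rough idea of the gaps. This is just a sketch over the numbers posted above; the variable names are made up, and the caveats already raised (shared vs. dedicated cores, a deliberately silly test) still apply.

```python
# Total times in seconds from the sysbench runs posted in this thread.
droplet_1t = 1.5297   # $10/mo droplet, 1 thread
c1_1t = 27.0053       # C1, 1 thread
c1_4t = 6.7674        # C1, 4 threads
tegra_1t = 8.8170     # Tegra K1 Chromebook, 1 thread
tegra_4t = 2.4926     # Tegra K1 Chromebook, 4 threads

c1_scaling = c1_1t / c1_4t              # speedup going from 1 to 4 threads on the C1
tegra_scaling = tegra_1t / tegra_4t     # same for the Tegra K1
c1_vs_droplet_1t = c1_1t / droplet_1t   # how many times slower, single thread
c1_4t_vs_droplet = c1_4t / droplet_1t   # all four C1 cores vs. one droplet vCPU

print(f"{c1_scaling:.2f} {tegra_scaling:.2f} {c1_vs_droplet_1t:.1f} {c1_4t_vs_droplet:.1f}")
```

The C1 scales almost linearly across its four cores (about 4x), but even with all four busy it takes roughly 4.4 times as long as the single-threaded droplet run.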
Not a scientific measure by ANY measure, but a similar core I googled appears to kick out about 200 bogomips whereas a virtual Xeon E5-2690 v2 core on one of my machines knocks out 5984 bogomips.

I have 20 of those Xeon cores and 128GB of RAM in a 2U.

Going by the ratio of bogomips, you'd have to fit 598 of those ARM cores in a 2U to get the same bogomips.

Like I said this isn't even slightly scientific but is at least interesting trivia.
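For anyone who wants to check the arithmetic, here is the calculation spelled out. It is a sketch using only the figures from the comment above; the constant names are mine.

```python
XEON_CORES = 20        # cores in the 2U box described above
XEON_BOGOMIPS = 5984   # per virtual Xeon E5-2690 v2 core
ARM_BOGOMIPS = 200     # per ARM core, the googled figure

total_xeon = XEON_CORES * XEON_BOGOMIPS       # bogomips for the whole 2U box
arm_cores_needed = total_xeon / ARM_BOGOMIPS  # ARM cores needed to match it

print(total_xeon, arm_cores_needed)
```

That comes out to 119680 bogomips for the box and 598.4 ARM cores to match it.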

ARMv7 means they are probably using 32-bit Cortex A9 processors. Those are quite old, and probably on a 40nm process. The state of the art right now are these from Applied Micro:

http://www8.hp.com/us/en/products/proliant-servers/product-d...

AMD will enter the market soon, too, but I think Applied Micro will hold its first mover advantage with its 3rd gen chips coming next year.

http://www.eetimes.com/document.asp?doc_id=1324104

I found another source that said it's around 1200 bogomips for a single core. That would mean you only need 5 times more cores, which is far from being an issue: 100 cores, which means only 25 processors.
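The same estimate with the 1200 bogomips figure, as a quick sketch. The four cores per processor is an assumption based on the quad-core cpuinfo posted earlier in the thread.

```python
import math

target_bogomips = 20 * 5984  # the 20-core Xeon box from the parent comment
arm_bogomips = 1200          # per-core figure from the other source
cores_per_soc = 4            # assumption: quad-core SoC like the one in the C1

cores_needed = math.ceil(target_bogomips / arm_bogomips)
socs_needed = math.ceil(cores_needed / cores_per_soc)

print(cores_needed, socs_needed)
```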
Interesting!

However if you want one fast core, you're screwed :)

Yes, but considering that newer CPUs do not have increasing frequencies, I guess you are more or less doomed to scale horizontally, and not vertically anymore.
Thanks for the trivia. Just to continue on this path, how many virtualized instances can you put on your machine?
Not tried to push it but it has 20 Windows Server 2012 R2 instances running on it at the moment, all with 8GB of memory (this is overcommitted dynamic memory). Disk is on a SAN larger than my kitchen. I spun up a Linux VM quickly to do a bogomips on :)

I can probably push 40 of those onto it without it bending too terribly. If I knock the RAM down to 2GB an instance I could probably quite happily get 64-100 on it in theory. I think memory bandwidth might kill it before CPU does.

We have two almost full (18 each) 42U racks of those machines (bar switches) so across the 720 E5 cores with 4.6TiB of RAM there is about 4.3 million bogomips.

Fun :)

(most of this is corporate fileservers, exchange, AD, various crappy apps, network appliances, web servers, SQL servers and idles at around 20% in use). If it all went off you'd need earplugs and fireman's equipment.
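The rack totals above follow directly from the per-machine figures. A small sketch, using only numbers from this comment chain and variable names of my own choosing:

```python
racks = 2
machines_per_rack = 18
cores_per_machine = 20
bogomips_per_core = 5984

machines = racks * machines_per_rack
cores = machines * cores_per_machine
total_bogomips = cores * bogomips_per_core

print(machines, cores, total_bogomips)
```

That gives 36 machines, 720 cores, and 4308480 bogomips, which matches the "about 4.3 million" quoted.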