by jperkin 4234 days ago
Briefly:

* Between Solaris 10 and Solaris 11, Oracle bought Sun and killed the open source efforts known as OpenSolaris. illumos started from the final open source bits of what eventually became Solaris 11 and has now significantly diverged. Solaris is effectively dead; illumos is very much alive.

* There are a number of illumos "forks", of which SmartOS is ours, but all forks still share the common illumos code (we merge daily) and contribute heavily back to the common base. Each fork may contain features which aren't yet ready for merging back, e.g. our work to port KVM[1] to SmartOS is not part of illumos yet, but other distributions such as OmniOS[2] have taken that work and integrated it.

* SmartOS is a minimal distribution: we have removed a lot of parts (desktop, shared storage, etc.) which do not fit our explicit design goals, and added tooling around virtualisation. You boot from USB/CD/PXE into a minimal live-image hypervisor known as the Global Zone[3], and then perform work in zones which are backed by local storage. To upgrade, you simply replace the USB image with a newer platform and reboot into the new live image.

* As for software, we provide userland built from pkgsrc[4], which gives you access to over 13,000 packages available under /opt/local, allowing you to use both the SmartOS tools and any third party software you may need (e.g. GNU stuff). There is even full desktop stuff provided, should you want to use it[5].

* In Manta, when running a job you are basically running in a zone with as many pkgsrc packages pre-installed as we can manage (currently nearly 9,000), so the chances are that the software you need is available. If not, you can easily build it yourself and then store it back in Manta to use later as an asset[6] for your jobs.

* There is definitely a free tier for Joyent zones (search for "free" on https://www.joyent.com/products/public-cloud/pricing). I thought there was also an additional free tier for Manta but I can't see it right now. However, it will only cost a few cents to do some basic tests in Manta and get a feel for what it can do.

I primarily work on pkgsrc, but I know a number of other engineers will be reading this thread and can comment in more depth on SDC/Manta, so feel free to ask any more questions or pop onto #smartos or #manta on Freenode IRC.

Thanks.

[1] http://dtrace.org/blogs/bmc/2011/08/15/kvm-on-illumos/

[2] http://omnios.omniti.com/

[3] http://www.perkin.org.uk/posts/smartos-and-the-global-zone.h...

[4] http://pkgsrc.joyent.com/

[5] https://twitter.com/jperkin/status/348506063336783872

[6] https://apidocs.joyent.com/manta/jobs-reference.html#assets-...


Ok, thank you and others very much for your detailed info. I did use OpenSolaris for a brief time, but kind of lost track after Oracle came and took it.

Regarding the free tier, I have now found it, but it is only mentioned on the last line, after all the paid plans and the part about "Contact sales if your requirements exceed everything we've planned a price for" ;)

About Manta, I'd like to know what kind of data objects it supports. For example, could I have a matrix of time series data and make quick selections and sorts, in near real time?

Docs talked about hierarchical storage and search, which sounds really useful, too, but what is search in this context?

Manta objects are effectively just files stored on ZFS, so you can write anything you like. Maybe this example session will help:

* Perform a directory listing of a tmp directory in my public area. mls(1) is the ls(1) equivalent for listing Manta directories.

  $ mls -l /jperkin/public/tmp
  -rwxr-xr-x 1 jperkin        540751 Oct 23 14:19 bbc.png
  -rwxr-xr-x 1 jperkin         27237 Dec 13  2013 libreoffice.tar.gz
  -rwxr-xr-x 1 jperkin        132079 Oct 25 02:01 lx64.png
  -rwxr-xr-x 1 jperkin       2397256 Jul 09  2013 nas-workdir.tar.gz
  -rwxr-xr-x 1 jperkin       1626181 Jul 10  2013 nas-workdir64.tar.gz
* Log into a Manta zone using bbc.png as my input file. This creates a zone and maps in my file, which is stored on the same machine (you always operate on the host where your data is stored). mlogin(1) is a nice way to prototype jobs in an interactive session, and once you have it working correctly you can use mjob(1) to run it automatically.

  $ mlogin /jperkin/public/tmp/bbc.png
   * created interactive job -- f1a2e579-34f8-4dd4-da19-db33954a0772
   * waiting for session... | established
  jperkin@manta #
* At this point it's just Unix, so I can run any command I like on the file (which has been mapped in under /manta):

  jperkin@manta # uname -a
  SunOS 0ae1c6ec-d47a-455c-9dd6-97eec16da31b 5.11 joyent_20140628T000418Z i86pc i386 i86pc Solaris

  jperkin@manta # ls -l /manta/jperkin/public/tmp/bbc.png
  -rw-r--r-- 1 root root 540751 Nov  7 11:41 /manta/jperkin/public/tmp/bbc.png

  jperkin@manta # file /manta/jperkin/public/tmp/bbc.png
  /manta/jperkin/public/tmp/bbc.png: PNG image data, 1680 x 940, 8-bit/color RGBA, non-interlaced
* Note that only the file I chose has been mapped in:

  jperkin@manta # find /manta
  /manta
  /manta/jperkin
  /manta/jperkin/public
  /manta/jperkin/public/tmp
  /manta/jperkin/public/tmp/bbc.png
* Let's convert it to a JPEG using convert(1) from ImageMagick and store it back into Manta in the same directory using mput(1):

  jperkin@manta # convert /manta/jperkin/public/tmp/bbc.png /var/tmp/bbc.jpg

  jperkin@manta # ls -l /var/tmp/bbc.jpg
  -rw-r--r-- 1 root root 504004 Nov  7 11:46 /var/tmp/bbc.jpg

  jperkin@manta # mput -f /var/tmp/bbc.jpg /jperkin/public/tmp/
  /jperkin/public/tmp/bbc.jpg    [======================================================>] 100% 492.19KB
* This file is now available at https://us-east.manta.joyent.com/jperkin/public/tmp/bbc.jpg and ready for further Manta jobs.

Of course this is a simple and contrived example; the real power of Manta comes when you have, say, 1,000,000 log files stored under a particular path and want to grep them all for a particular string. To do that you'd do something like:

  $ mfind /jperkin/public/logs -n "access_log.*.gz" | mjob create -o -m "gzcat" -m "grep something" -r "cat"
This will scale to whatever size your Manta cluster is, e.g. if you have 10 hosts then the log files will be split up across those hosts and each host will spin up multiple zones to run "gzcat | grep" on the local data, before a final "cat" reduce job is used to collate the results from each map job.
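
If it helps to see the shape of that job without a Manta account, here is a rough local sketch of the same map and reduce phases in plain shell. It is only an illustration under assumptions: map_reduce_grep is a made-up helper name and not part of the Manta CLI, it uses "gzip -dc" as the portable spelling of gzcat, and it walks the files one after another on a single machine, whereas Manta would run the map tasks in parallel on the hosts that store the objects.

```shell
#!/bin/sh
# Local stand-in for the Manta job above; no Manta commands are used here.
# map_reduce_grep is a hypothetical helper name chosen for this sketch.
# Usage: map_reduce_grep PATTERN FILE.gz...
map_reduce_grep() {
    pattern=$1
    shift
    for f in "$@"; do
        # Map: decompress one file and keep only the matching lines,
        # mirroring the two map phases -m "gzcat" and -m "grep something".
        gzip -dc "$f" | grep -- "$pattern"
    done | cat    # Reduce: a single "cat" collates the output of every map task.
}
```

Something like "map_reduce_grep something logs/access_log.*.gz" would then print the matching lines from every log file in the order the files were given.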

Wow, thank you, again. I definitely have to take a look at this. My use cases tend to vary so much that creating a Hadoop-like system would require too much custom coding.

I wonder if it is possible to have compression and de-duplication, so that there could be a one big base dataset and lots of containers that only add what new data they generate.

Anyhow, looking at this it feels really approachable. What I have in mind are quick-and-dirty data-sciency scripts for ad hoc use cases, like diffing structured files and combing over matrix data.

http://www.linux-kvm.org/wiki/images/7/71/2011-forum-porting...

Your work on porting KVM to SmartOS is amazing, I don't think this accomplishment can be easily overstated.