Manta objects are effectively just files stored on ZFS, so you can write anything you like. Maybe this example session will help:

* Perform a directory listing of a tmp directory in my public area. mls(1) is the ls(1) equivalent for listing Manta directories.
$ mls -l /jperkin/public/tmp
-rwxr-xr-x 1 jperkin 540751 Oct 23 14:19 bbc.png
-rwxr-xr-x 1 jperkin 27237 Dec 13 2013 libreoffice.tar.gz
-rwxr-xr-x 1 jperkin 132079 Oct 25 02:01 lx64.png
-rwxr-xr-x 1 jperkin 2397256 Jul 09 2013 nas-workdir.tar.gz
-rwxr-xr-x 1 jperkin 1626181 Jul 10 2013 nas-workdir64.tar.gz
* Log into a Manta zone using bbc.png as my input file. This creates a zone and maps in my file, which is stored on the same machine (you always operate on the host where your data is stored). mlogin(1) is a nice way to prototype jobs in an interactive session, and once you have it working correctly you can use mjob(1) to run it automatically.
$ mlogin /jperkin/public/tmp/bbc.png
* created interactive job -- f1a2e579-34f8-4dd4-da19-db33954a0772
* waiting for session... | established
jperkin@manta #
* At this point it's just Unix, so I can run any command I like on the file (which has been mapped in under /manta):
jperkin@manta # uname -a
SunOS 0ae1c6ec-d47a-455c-9dd6-97eec16da31b 5.11 joyent_20140628T000418Z i86pc i386 i86pc Solaris
jperkin@manta # ls -l /manta/jperkin/public/tmp/bbc.png
-rw-r--r-- 1 root root 540751 Nov 7 11:41 /manta/jperkin/public/tmp/bbc.png
jperkin@manta # file /manta/jperkin/public/tmp/bbc.png
/manta/jperkin/public/tmp/bbc.png: PNG image data, 1680 x 940, 8-bit/color RGBA, non-interlaced
* Note that only the file I chose has been mapped in:
jperkin@manta # find /manta
/manta
/manta/jperkin
/manta/jperkin/public
/manta/jperkin/public/tmp
/manta/jperkin/public/tmp/bbc.png
* Let's convert it to a JPEG using convert(1) from ImageMagick and store it back into Manta in the same directory using mput(1):
jperkin@manta # convert /manta/jperkin/public/tmp/bbc.png /var/tmp/bbc.jpg
jperkin@manta # ls -l /var/tmp/bbc.jpg
-rw-r--r-- 1 root root 504004 Nov 7 11:46 /var/tmp/bbc.jpg
jperkin@manta # mput -f /var/tmp/bbc.jpg /jperkin/public/tmp/
/jperkin/public/tmp/bbc.jpg [======================================================>] 100% 492.19KB
* This file is now available at https://us-east.manta.joyent.com/jperkin/public/tmp/bbc.jpg and ready for further Manta jobs.

Of course, this is a simple and contrived example. The real power of Manta comes when you have, say, 1,000,000 log files stored under a particular path and want to grep them all for a particular string. To do that you'd do something like:
$ mfind /jperkin/public/logs -n "access_log.*.gz" | mjob create -o -m "gzcat" -m "grep something" -r "cat"
This will scale to whatever size your Manta cluster is. For example, if you have 10 hosts, the log files will be split up across those hosts, and each host will spin up multiple zones to run "gzcat | grep" on its local data, before a final "cat" reduce job is used to collate the results from each map job.
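
If it helps to see the shape of that job without a Manta account, here is a rough local sketch of the same map-then-reduce flow in plain sh. It is only an analogue: nothing here talks to Manta, everything runs serially on one machine, and the names map_one and run_job are made up for illustration. I've used "gzip -dc" in place of gzcat since gzcat isn't installed everywhere.

```shell
#!/bin/sh
# Local stand-in for:
#   mjob create -o -m "gzcat" -m "grep something" -r "cat"
# Hypothetical helper names; this does not use any Manta tooling.

# map_one FILE PATTERN
# What the chained map phases do to a single input object:
# decompress it, then keep only the matching lines.
map_one() {
    gzip -dc "$1" | grep "$2"
}

# run_job PATTERN FILE...
# Run the map step once per input file, keeping each result separate,
# then "reduce" by concatenating all of the map outputs with cat.
run_job() {
    pattern=$1
    shift
    outdir=$(mktemp -d)
    i=0
    for obj in "$@"; do
        i=$((i + 1))
        # grep exits non-zero when a file has no matches; that is
        # not an error for this job, so ignore the exit status.
        map_one "$obj" "$pattern" > "$outdir/map.$i" || true
    done
    # Reduce phase: collate every map output into one stream.
    cat "$outdir"/map.*
    rm -rf "$outdir"
}
```

With those functions loaded you'd call something like: run_job something /some/dir/access_log.*.gz (the path is just a placeholder). The difference on Manta is that each map step runs in its own zone on the host that already holds the object, so the per-file work runs in parallel instead of in a loop.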
I wonder if it is possible to have compression and de-duplication, so that there could be one big base dataset and lots of containers that only add the new data they generate.
Anyhow, looking at this it feels really approachable. What I have in mind are quick-and-dirty data-sciency scripts for ad hoc use cases, like diffing structured files and combing over matrix data.