by nerdponx 3175 days ago
these functions' documentation is either buried in a long manual

This is a problem with lots of feature-rich software, even with meticulously-documented APIs. What we need is reverse-indexed documentation. That is, an extensive API reference is only useful for someone who already knows what functions are in the API and just needs to remember how to use them. But even the most thorough API reference does nothing to promote discovering new functionality. This is often left to the authors, who then have to go about writing a User's Guide that gradually explains concepts, idioms, etc. in prose.

Thorough User's Guides are rare because they are tough to write, and even tougher to write well. Users don't often have the time to read through potentially hundreds of pages of prose to find what they're looking for. We need a better way to let users search or browse for concepts, and then be given a list of the functions that implement each concept.

That is, in addition to documentation like:

    size_t strlen(const char * s);
      RETURN: Length of string s.

    size_t strnlen(const char * s, size_t maxlen);
      RETURN: Length of string s, or maxlen (whichever is smaller).
      NOTE: Stops reading after maxlen.

    char * stpcpy(char * dst, const char * src);
      Copy src to dst.
      RETURN: pointer to trailing '\0' of dst.
      NOTE: Undefined behavior if dst and src overlap.

    char * stpncpy(char * dst, const char * src, size_t len);
      Copy up to len bytes from src to dst.
      RETURN: pointer to trailing '\0' of dst, or dst[len] if no trailing NUL.
      NOTE: Undefined behavior if dst and src overlap.

    char * strcpy(char * dst, const char * src);
      Copy src to dst.
      RETURN: dst.
      NOTE: Undefined behavior if dst and src overlap.

    char * strncpy(char * dst, const char * src, size_t len);
      Copy up to len bytes from src to dst.
      RETURN: dst.
      NOTE: Undefined behavior if dst and src overlap.
We also need to be able to "tag" functions. So we might have the following tags that allow us to search for concepts:

    strcpy
      TAGS: "concept":"data type":"text", "concept":"attribute":"length",
            ".input":"array", ".input":"char", ".input":"pointer",
            ".return":"string", ".return":"pointer"
    strncpy
      TAGS: "concept":"data type":"text", "concept":"attribute":"length",
            "concept":"data type":"array":"max-length operator",
            ".input":"array", ".input":"char", ".input":"pointer",
            ".return":"array", ".return":"pointer"
And a tag browsing page that looks like

    concept
    └─ data type
       └─ text
       └─ array (conceptual)
          └─ max-length operator
    └─ attribute
       └─ length / size
    data structure
    └─ char
    └─ array (implementation)
    └─ pointer
    
Which the user could then scan, and identify keywords to search for:

    '"concept":"data type":"text" AND "concept":"attribute":"length"'
And be given "strlen" and "strnlen" as the top two hits, followed by "wcslen" and "wcsnlen".
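
To make the lookup concrete, here is a minimal sketch of the reverse index in Python. The tag strings are flattened to colon-separated paths, and the tag assignments (including the "concept:operation:copy" tag) are my own illustrative assumptions, not a real tagging scheme for libc.

```python
from collections import defaultdict

# Illustrative tags only: these assignments are assumptions made for the
# example, not an existing tagging scheme.
TAGS = {
    "strlen": {"concept:data type:text", "concept:attribute:length"},
    "strnlen": {
        "concept:data type:text",
        "concept:attribute:length",
        "concept:data type:array:max-length operator",
    },
    "strcpy": {"concept:data type:text", "concept:operation:copy"},
    "strncpy": {
        "concept:data type:text",
        "concept:operation:copy",
        "concept:data type:array:max-length operator",
    },
}


def build_index(tags_by_function):
    """Invert function -> tags into tag -> set of functions."""
    index = defaultdict(set)
    for function, tags in tags_by_function.items():
        for tag in tags:
            index[tag].add(function)
    return index


def search_all(index, *tags):
    """Return the functions that carry every one of the given tags (AND)."""
    matches = [index.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*matches)) if matches else []


index = build_index(TAGS)
print(search_all(index, "concept:data type:text", "concept:attribute:length"))
# prints ['strlen', 'strnlen']
```

Ranking the hits (so that "strlen" comes before "wcslen", say) would need something extra on top of this, such as counting how many non-requested tags each function carries.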

It seems like a PITA at first, but I'm pretty sure tagging functions in their docstrings is easier than writing a whole new User's Guide.

3 comments

One great approach is the Hoogle search engine for Haskell [1]. The idea is that you search by type instead of by name. So if you were looking for a function that takes an item and returns a list with n copies of that item, you would search for `a -> Int -> [a]`, which would give you back replicate.

Looks like the Nix expression language is untyped, so this wouldn't work directly, but maybe adding a rough type signature in the docstring would get some of those benefits (and it should be a bit better for discoverability, since you wouldn't need to guess the same tags/concepts the author chose).

[1] - https://www.haskell.org/hoogle/
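
A rough sketch of what searching over docstring signatures could look like, in Python. This is nowhere near what Hoogle actually does: it only splits flat signatures on "->" and ignores argument order, and the little index below is a hypothetical, simplified set of signatures for illustration.

```python
# Hypothetical index: function name -> rough signature taken from a docstring.
# The signatures are simplified, list-specialised forms.
SIGNATURES = {
    "replicate": "Int -> a -> [a]",
    "take": "Int -> [a] -> [a]",
    "length": "[a] -> Int",
    "repeat": "a -> [a]",
}


def parse(signature):
    """Split a flat signature into (sorted argument types, result type).

    Sorting the arguments makes the match independent of argument order.
    Nested function types such as (a -> b) -> [a] are not handled.
    """
    parts = [part.strip() for part in signature.split("->")]
    return sorted(parts[:-1]), parts[-1]


def search(query):
    """Return the names whose signature matches the query up to argument order."""
    wanted = parse(query)
    return [name for name, sig in SIGNATURES.items() if parse(sig) == wanted]


print(search("a -> Int -> [a]"))
# prints ['replicate']
```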

Some Smalltalks (Squeak and Pharo) have a wonderful tool called a "Method Finder".

The way it works is you write a sequence of arguments and an expected result, and it suggests a method to call.

For example, I just now tried it, entering

  3. 4. 7
in the input. It suggested the following methods:

  3 + 4 --> 7
  3 bitOr: 4 --> 7
  3 bitXor: 4 --> 7
Clicking on any of these opens a browser on the class and method concerned.

Another example: input of

  'hello world'. #('hello' 'world')
yielded the single suggestion

  'hello world' substrings --> #('hello' 'world')
Another:

  '  abc  '. 'abc'
yields

  '  abc  ' asLegalSelector --> 'abc'
  '  abc  ' withBlanksTrimmed --> 'abc'

It's a total hack, of course, but none the less effective or useful for that.

It has a list of methods marked "safe to experiment with", and simply tries them out.

It gets a big boost from being able to evaluate the receiver (the first in the input list) to a concrete object, and then only consider methods on that object's class.
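
The core idea fits in a few lines of Python. This is only a sketch of the approach as described here, not the Squeak or Pharo implementation, and the whitelist of "safe" methods is a hypothetical one I picked for the example.

```python
# Hypothetical whitelist of side-effect-free methods per class, standing in
# for the "safe to experiment with" list.
SAFE_METHODS = {
    int: ["__add__", "__sub__", "__mul__", "__or__", "__xor__", "__and__"],
    str: ["strip", "split", "upper", "lower", "title"],
}


def find_methods(receiver, args, expected):
    """Try each whitelisted method of the receiver's class; keep those
    whose result equals the expected value."""
    hits = []
    for name in SAFE_METHODS.get(type(receiver), []):
        try:
            result = getattr(receiver, name)(*args)
        except Exception:
            continue  # wrong arity or argument type: not a candidate
        if result == expected:
            hits.append(name)
    return hits


print(find_methods(3, [4], 7))
# prints ['__add__', '__or__', '__xor__']
print(find_methods("  abc  ", [], "abc"))
# prints ['strip']
print(find_methods("hello world", [], ["hello", "world"]))
# prints ['split']
```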

I worked with Visual Smalltalk for years, and early on I created my own tool that found all methods whose source code contained a given string, optionally with wildcards. It was surprisingly effective because it integrated easily into the Smalltalk browser windows: rather than printing out the set of methods found, it opened a list browser containing them, so I could then easily browse their senders, the methods they called, and so on.

So how did you know what to search for? A simple case was to look at the ready-built application's GUI, which almost always contains some strings. So if you wanted to change something in the system that would result in a change to its GUI, you just searched for source code containing some of the words you saw in the GUI.
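
For comparison, roughly the same kind of search can be sketched in Python with the standard `inspect` and `fnmatch` modules. This is an analogy under the assumption that the module's source files are available at runtime; it is not how the Smalltalk tool was built, and `find_functions` is a name made up for the example.

```python
import fnmatch
import inspect
import textwrap


def find_functions(module, pattern):
    """Return names of the module's top-level functions whose source
    matches a shell-style wildcard pattern."""
    hits = []
    for name, func in inspect.getmembers(module, inspect.isfunction):
        try:
            source = inspect.getsource(func)
        except (OSError, TypeError):
            continue  # source not available for this function
        if fnmatch.fnmatchcase(source, pattern):
            hits.append(name)
    return hits


# Which top-level functions in textwrap construct a TextWrapper?
print(find_functions(textwrap, "*TextWrapper(*"))
```

What this lacks is the part that made the original tool effective: the results are just printed, not opened in a browser where you can go on to look up senders and callees.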

That's pretty cool. Is the technique for this documented somewhere, or do I have to read the Squeak or Pharo source?
I've looked into it a bit. The VisualWorks Smalltalk documentation gave me an idea [1].

In Smalltalk everything is an object, including numbers, characters, etc. The standard method lookup searches from the innermost class's methods to the outermost class's methods.

The method finder could simply iteratively go through all applicable methods that return the class requested and match results.

[1]: http://esug.org/data/Old/vw-tutorials/vw25/vw25ug.pdf (page 42 method-lookup)

I don't recall reading about it anywhere, but I did once look at the implementation (it's wonderful having a reflective system where a single click gets you source of anything onscreen), and it's very straightforward but also kind of gross.

One neat trick I'd forgotten is that if it doesn't find anything interesting with the inputs in the order written, it permutes them and tries again!
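
In Python terms, the permuting fallback might look like the sketch below. It is a guess at the general shape, not a port of the real code, and the table of candidate functions is a hypothetical stand-in for the safe-method list.

```python
import operator
from itertools import permutations

# Hypothetical candidates standing in for the "safe" method list.
SAFE_FUNCTIONS = {
    "sub": operator.sub,
    "pow": operator.pow,
    "floordiv": operator.floordiv,
}


def matches(values, expected):
    """Return (name, values) for every candidate that maps values to expected."""
    hits = []
    for name, func in SAFE_FUNCTIONS.items():
        try:
            if func(*values) == expected:
                hits.append((name, values))
        except Exception:
            continue  # e.g. division by zero or wrong number of arguments
    return hits


def find_with_permutations(inputs, expected):
    """Try the inputs as written; only if nothing matches, try other orders."""
    original = tuple(inputs)
    hits = matches(original, expected)
    if hits:
        return hits
    seen = {original}
    for order in permutations(original):
        if order in seen:
            continue
        seen.add(order)
        hits.extend(matches(order, expected))
    return hits


print(find_with_permutations([2, 10], 8))
# prints [('sub', (10, 2))]
```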

This just looks like you are missing things like classes, namespaces, etc. in C. Classes, namespaces, etc. are a natural way of making an API discoverable (among other things). For example, if you want to know how to search for something in a string, just look at the methods exposed by string.
Then you just pass the buck over to classes and namespaces. You have the same fundamental problem in Ruby and Python, perhaps even worse for lack of typed function signatures.
I think this is a great idea, and something that could hugely improve semi-automated documentation sites, man pages, etc.

I also think that you could get a lot of people to agree that it is a great idea, and STILL have a lot of trouble enforcing it in a project without very rigid code review policies and very good linting rules.