by oxygen_crisis 697 days ago
> An interview question for a position that requires knowledge of bash/linux/stuff could be:

> What if you're ssh'd into a machine, you're in your trusty bash shell, but unfortunately you cannot spawn any new processes because literally all other pids are taken. What do you do?

I'd look in the /proc/[pid]/ filesystem for visibility into what processes are exhausting the PID space.

`kill` is a shell builtin in bash, so you don't have to rely on forking a new process like /bin/kill. If you can find the parent process whose children are exhausting PIDs, you're well on your way to stopping it and getting a handle on things again.

And I'll be darned, this script parses /proc. No | pipes or $( .. ) substitutions that would need to spawn another bash subshell process either. Pretty clean.
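
As a rough sketch of that approach (not the script's actual code), the function below uses only bash builtins to count children per parent PID by reading the `PPid:` line of each `/proc/[pid]/status`. The name `count_children` and the optional root-directory argument are assumptions added for illustration, and it needs bash 4 or later for associative arrays.

```shell
# count_children [procroot]
# Prints "<child count> <parent pid>" lines using builtins only (no forks).
# Hypothetical helper; procroot defaults to /proc and exists so the function
# can be pointed at a test directory.
count_children() {
  local root=${1:-/proc} dir line ppid
  local -A kids=()
  for dir in "$root"/[0-9]*; do
    [[ -r $dir/status ]] || continue
    # Scan the status file for the "PPid:" line; a process may vanish
    # between the glob and the open, so errors are discarded.
    while IFS= read -r line; do
      if [[ $line == PPid:* ]]; then
        ppid=${line#PPid:}
        ppid=${ppid//[[:space:]]/}   # strip the tab after the colon
        kids[$ppid]=$(( ${kids[$ppid]:-0} + 1 ))
        break
      fi
    done 2>/dev/null < "$dir/status"
  done
  for ppid in "${!kids[@]}"; do
    printf '%s %s\n' "${kids[$ppid]}" "$ppid"
  done
}
```

The output is unsorted because `sort` would need a fork. Once the parent with the largest count is identified, the builtin can signal it, for example with `kill -STOP <pid>` to freeze it before deciding what to do next.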

7 comments

My answer in an interview was “exec Python”. Then you can call all the posix functions you need without launching separate commands.

This went over quite well.

It's interesting that mainstream Unix shells do not have a syscall function. That would be very useful.
Install this in production and you're almost guaranteed to hear from the author in his official capacity :P
> Here is what people have been saying about ctypes.sh:

"that's disgusting"

"this has got to stop"

"you've gone too far with this"

"is this a joke?"

"I never knew the c could stand for Cthulhu."

This made me giggle.

And after the exec, if they asked me to parse a Python expression, I'd type "eval(expr)".

It's funny, because at university, you would be assessed (perhaps) on such a question, and you would not be allowed to use these things! And yet, in "real life", this is exactly how you'd go about accomplishing the task.
Or even:

import code

code.interact()

# https://docs.python.org/3/library/code.html

Heh! But for real, though. Then you have a repl with access to all the functions in the os module. You can glob files to iterate over /proc. You can send signals. You can open network connections. As far as emergency shells go, you could do far, far worse.

Edit: also, almost all valid JSON is valid Python (the exceptions are the literals `true`, `false`, and `null`, which Python does not define). Do not `eval(input_data)` in prod or I will haunt you. But, in an emergency…

Oh, I know about the security issues with eval.

My example was just as a joke.

For real use, I would only use it with my own trusted input.

I mean, realistically speaking: if I can do `foo = <paste>`, check `type(foo)`, and output foo again to double-check what the REPL thinks foo contains, then I'm pretty safe to `eval(foo)`.

Sure, you could fake it with custom objects and all of that, but not when I'm pasting a string value into a REPL. If you had hijacked my workstation, shell or the remote python to the point you can exploit that... Yeah. I don't think you'd need me as a user then anymore.

I'd probably just reboot the machine, honestly. You'll be back up and running faster than spending time in a hobbled environment hunting down and killing the parent processes. And if you're out of PIDs probably a lot of other things are in a bad state. Just start clean.
About as minimal as you can get with pids and command names:

ps(){ (cd /proc;for i in [0-9]*;do echo $i: $(tr '\0' ' ' < $i/cmdline);done); }

That forks twice for every iteration of the loop, though: once for the subshell, and again to run `tr`.
Fix:

  ps() { for i in /proc/[0-9]*; do readarray -d '' -t cmdline < "$i/cmdline"; printf "%s: %s\n" "${i#/proc/}" "${cmdline[*]}"; done; }
> I'd look in the /proc/[pid]/ filesystem for visibility into what processes are exhausting the PID space.

From the source code:

    # so initially i was hoping you could get everything from /proc/<pid>/status
    # because it's easy to parse (in most cases) but apparently you can't get
    # things like the cpu% :(
You can calculate a CPU% from the tick counters in /proc/[pid]/stat (the utime and stime fields, plus cutime and cstime for waited-for children). I did it once in a script after spending considerable time reading the proc manual page.
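
For illustration, here is a minimal sketch of that calculation using only bash builtins. It assumes the field layout documented in proc(5), where utime and stime are fields 14 and 15 of `/proc/[pid]/stat` and are measured in clock ticks. It also assumes 100 ticks per second as a default, since querying the real value with `getconf CLK_TCK` would need a fork. The function names `cpu_ticks` and `cpu_percent` are hypothetical.

```shell
# cpu_ticks <statfile>
# Prints utime + stime (in clock ticks) from a /proc/[pid]/stat file.
cpu_ticks() {
  local statfile=$1 line rest
  local -a f
  IFS= read -r line < "$statfile" || return 1
  # The command name (field 2) is wrapped in parentheses and may itself
  # contain spaces or ")", so drop everything through the LAST ") ".
  rest=${line##*) }
  read -r -a f <<< "$rest"
  # rest starts at field 3 (state), so field 14 is index 11 and
  # field 15 is index 12.
  printf '%s\n' "$(( f[11] + f[12] ))"
}

# cpu_percent <ticks_before> <ticks_after> <seconds> [ticks_per_second]
# Integer CPU% over the sampling interval; ticks_per_second defaults to
# an ASSUMED 100.
cpu_percent() {
  local before=$1 after=$2 seconds=$3 hz=${4:-100}
  printf '%s\n' "$(( (after - before) * 100 / (hz * seconds) ))"
}
```

Sampling `cpu_ticks` twice for the same PID, a known number of seconds apart, and passing both values to `cpu_percent` gives a usage figure for that interval. Values above 100 are possible for multithreaded processes.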
Specifically, the issue here was that it's scattered across `/proc/<pid>/stat{,us}`, and then for some of the information you have to look in `/proc` itself for things like the major number to driver mapping (for figuring out which TTY something is running on).

Realistically you can get a useful `ps` by catting/grepping `/proc/<pid>/status` for all the processes, but the goal here was to replicate exactly the output of procps `ps aux`, except for the bugs in column alignment, which she fixed intentionally.

Re sub processes, genuinely curious, how do

   [[ $cmdline ]] && exec {cmdline}>&-
and

  exec {cmdline}< "$dir"/cmdline || continue
work?
This is actually in the POSIX standard for the shell.

"The redirection operator:

  [n]>&word

"shall duplicate one output file descriptor from another, or shall close one. If word evaluates to one or more digits, the file descriptor denoted by n, or standard output if n is not specified, shall be made to be a copy of the file descriptor denoted by word; if the digits in word do not represent a file descriptor already open for output, a redirection error shall result; see Consequences of Shell Errors. If word evaluates to '-', file descriptor n, or standard output if n is not specified, is closed. Attempts to close a file descriptor that is not open shall not constitute an error. If word evaluates to something else, the behavior is unspecified."

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V...

[[ is a keyword, and exec is a builtin. With the {name}< syntax, exec opens a file descriptor and assigns its numerical value to $name, and {name}>&- closes it.
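
To make the two redirections concrete, here is a small sketch, assuming bash 4.1 or later (where the `{name}` form is available). The function name `read_first_line` is made up for the example. It opens a file on a descriptor that bash picks, reads one line with the `read` builtin, and closes the descriptor again, all without forking.

```shell
# read_first_line <file>
# Hypothetical helper showing {name}< and {name}>&- with exec.
read_first_line() {
  local fd line
  # Open the file for reading; bash picks a free descriptor number
  # and stores it in $fd.
  exec {fd}< "$1" || return 1
  # Read one line from that descriptor using the read builtin.
  IFS= read -r -u "$fd" line
  # Close the descriptor so it does not leak across loop iterations.
  exec {fd}>&-
  printf '%s\n' "$line"
}
```

In a loop over thousands of `/proc` entries, closing each descriptor matters because every open one counts against the shell's file descriptor limit.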
> I'd look in the /proc/[pid]/ filesystem

    cd /proc
    echo *
Argument list too long

$

'echo' is a shell builtin. argv[] length restrictions only apply to exec. it's the same reason the script works: it uses more or less the same technique, only in a 'for' loop, which is also a builtin.

even if it were an issue.. say on a terminal without working scrollback.. you can just as easily:

    echo 1*
and so forth.
echo 1*; echo 2*; ...

Break it into tenths (ninths, maybe, with no leading zeroes?), or finer granularity if necessary.
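
A sketch of that chunking idea, assuming bash: the loop below globs one leading digit at a time and prints each PID with the `printf` builtin, so nothing is exec'd and no argument-length limit comes into play. The function name `list_pids` and its optional root-directory argument are illustrative assumptions.

```shell
# list_pids [procroot]
# Prints numeric directory names under procroot (default /proc), one per
# line, globbing by leading digit. PIDs have no leading zeroes, so the
# digits 1-9 cover everything.
list_pids() {
  local root=${1:-/proc} digit entry
  for digit in 1 2 3 4 5 6 7 8 9; do
    for entry in "$root/$digit"*; do
      # Skip unmatched globs and anything that is not an all-digit directory.
      [[ -d $entry && ${entry##*/} != *[!0-9]* ]] || continue
      printf '%s\n' "${entry##*/}"
    done
  done
}
```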

The argument list isn't nearly as constrained as it was a decade ago. "echo {00000001..10000000}" works in bash on most modern distros where shells on earlier systems would have choked on a tiny ARG_MAX.

That was my first idea too, slightly hindered by the fact I couldn't remember where it actually stored that on the fs.

Second idea was `sudo reboot now`

sudo forks at least once (bash spawns /usr/bin/sudo), but also will fork to execute the command if logging is enabled (see the manual page for sudo(8)).

you can `exec sudo` but this will hose you if it tries to fork (because now you've lost your bash).

ssh back in as root then `restart now`
If you're out of pids, you can't ssh back in (though this raises the question of how you ssh'd in in the first place). And hopefully you have root ssh logins disabled.

But I think a prerequisite is that you already have a root shell; some systems don't allow accessing all of /proc unless you're root, and if you figure out what process is exhausting all your pids and want to kill it, you probably need to be root to do that, unless you're very lucky and that process happens to be running under your regular user account.

At any rate, you'd need to `exec restart now`, because just `restart now` would try to fork. (Also, there's no `restart` command; I think you meant `reboot`, and it doesn't need arguments. `shutdown -r now` would also do it.)

Would exiting the ssh session not free up the pid again? Also yes, I meant `reboot` not `restart`, and I always forget it's only shutdown that needs the `now`, not reboot.
not if something is gobbling up PIDs. that's literally the hypothetical, which you have completely ignored