
by 1718627440 264 days ago

> Huh, don't think I've heard that commitment before. Do you mean that the GCC devs intend for -fanalyzer to (eventually?) guarantee catching all exploitable UB (which would be... ambitious, to say the least), or that -fanalyzer is a best-effort analysis? The docs currently state the latter more or less ("It is neither sound nor complete: it can have false positives and false negatives.")

Both, actually. Any UB exploits not caught by -fanalyzer would need to be disabled. However, I can't find a reference for this, so maybe my memory is deceiving me.

When writing Frama-C what I was thinking of was actually PVS-Studio (https://pvs-studio.com/), as this can also be used by students. It's also more of a standalone linter.

>> It is up to the programmer to ensure that the lifetime of a `dependent` reference is contained within the lifetime of the corresponding `owned` reference.

Yes, this is an escape hatch, when the pointer shenanigans can't be fully described. But I heard Rust also has those. If you want your program to be described by ownership semantics, you will make use of this less and less.

> named lifetimes support

I don't know enough Rust, but from what I read at https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html, yes it does. Specifying that the 'lifetime' of the return value corresponds to a parameter happens with the `returned` annotation. The cool thing in Splint is that you can describe the lifetime of param1.foo.bar[0..42]. It also has several kinds of 'lifetimes' (allocated, readable and writable), which is useful for representing uninitialized memory: after a function call, some storage can be newly uninitialized that wasn't before the call. You also can combine this with parameters, so you can say that param1.baz[0..param2] is writable and param1.baz[0..param3] is readable and also that readable param1.baz[0..X] and writable param1.baz[0..Y] always means that X > Y.

It doesn't use the term 'lifetime', but talks about owned, allocated, initialized, readable and writable memory. In addition, it also supports adding other properties, so much more than 'lifetimes' can be tracked. The manual shows as an example how it can be used to track variables that are tainted by user input (10.1). What I think is missing, though, are conditionals on the return value.

How much of these features can be written in Rust? (Honest question)

> view struct support

I don't know really what these are. Maybe I already described that above?


aw1621107 264 days ago

> Both actually.

That would be quite the surprise to me. Quite unfortunate that you can't find a source given what the -fanalyzer docs currently say.

> Any UB exploits not caught by -fanalyzer would need to be disabled.

I'd be curious as to the hypothetical performance impact of this, as well as the amount of work it'd take to make -fanalyzer reliable enough.

> When writing Frama-C what I was thinking of was actually PVS-Studio (https://pvs-studio.com/), as this can also be used by students. It's also more of a standalone linter.

Ah, that's quite different.

> Yes, this is an escape hatch, when the pointer shenanigans can't be fully described.

Ah, that's fair. Bit different than what Rust offers, but what you say makes sense.

> But I heard Rust also has those.

Sort of yes, sort of no. Rust has an escape hatch in `unsafe`, but it technically doesn't disable any checks - for example, the borrow checker will check the validity of references, bounds checks will continue to be inserted, etc. regardless of whether you're in an `unsafe` block or not. What it does instead is to give you the ability to perform `unsafe` operations, which for borrow checker-related shenanigans would typically involve dealing with pointers (to be specific, dereferencing them since some other pointer operations are considered safe) since the borrow checker doesn't check pointers.
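To make the distinction concrete, here is a minimal sketch (the helper `read_through_raw` is hypothetical, made up for illustration): creating a raw pointer is safe, dereferencing it needs `unsafe`, and ordinary references are checked as usual either way.

```rust
// Hypothetical helper, not from the thread: reads an i32 through a raw pointer.
// The caller must guarantee that `ptr` is non-null, aligned, and points to a live i32.
fn read_through_raw(ptr: *const i32) -> i32 {
    // Dereferencing a raw pointer is only permitted inside `unsafe`.
    unsafe { *ptr }
}

fn main() {
    let mut value = 41;

    // Creating a raw pointer is a safe operation.
    let raw: *mut i32 = &mut value;

    // Writing through it requires `unsafe`; the borrow checker does not track `raw`.
    unsafe {
        *raw += 1;
    }

    // References stay fully checked, inside or outside `unsafe` blocks.
    let shared: &i32 = &value;
    println!("{}", read_through_raw(shared)); // prints 42
}
```

Nothing in the `unsafe` block relaxes the rules for `shared`; the block only unlocks the raw-pointer dereference.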

> I don't know enough Rust, but from what I read at https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html, yes it does. Specifying that the 'lifetime' of the return value corresponds to a parameter happens with the `returned` annotation.

That's one of the things named lifetimes let you do, but named lifetimes are flexible enough for more than just that use case, from slightly more complex things like dealing with multiple independent lifetimes at the same time (for example, returning two references instead of just one, or structs/functions using multiple lifetimes), to more obscure stuff like dealing with higher-ranked trait bounds [1].
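As a rough sketch of the "two references, two independent lifetimes" case (the function `first_words` is a made-up example, not something from the Rust book or the Splint manual):

```rust
// Hypothetical example: each output is tied to exactly one input.
fn first_words<'a, 'b>(left: &'a str, right: &'b str) -> (&'a str, &'b str) {
    (
        left.split_whitespace().next().unwrap_or(""),
        right.split_whitespace().next().unwrap_or(""),
    )
}

fn main() {
    let long_lived = String::from("hello world");
    let kept;
    {
        let short_lived = String::from("goodbye moon");
        let (a, b) = first_words(&long_lived, &short_lived);
        println!("{a} {b}");
        // `a` borrows only from `long_lived`, so it may outlive `short_lived`.
        kept = a;
    }
    println!("{kept}");
}
```

If both parameters shared a single lifetime, the borrow checker would have to assume `kept` might point into `short_lived` and would reject the final `println!`.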

Kind of a tangent, but the manual is a bit unclear to me as to what `returned` is capable of. It states "The returned annotation denotes a parameter that may be aliased by the return value.", but that's immediately followed by "Splint checks the call assuming the result may be an alias to the returned parameter." And later in the example it states "Because of the `returned` qualifier, Splint assumes the result of `intSet_insert` is the same storage as its first parameter, in this case the storage returned by `intSet_new`."

Does `returned` require that the annotated parameter correspond exactly to the return value, or can a "subset" of the parameter be returned? In more concrete terms, would something like the following be accepted (assuming the access is in bounds, of course)?

    int *get_fifth(/*@returned@*/ int *slice) {
        return &slice[4];
    }

What about instances where the lifetime of the output may be tied to multiple parameters? For example, consider the following Rust function:

    fn max<'a>(a: &'a i32, b: &'a i32) -> &'a i32 {
        if a > b { a } else { b }
    }
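For reference, a small call-site sketch of what that signature implies (the variable names are arbitrary): because the result may point at either argument, it is only usable while both are alive.

```rust
fn max<'a>(a: &'a i32, b: &'a i32) -> &'a i32 {
    if a > b { a } else { b }
}

fn main() {
    let x = 4;
    let result;
    {
        let y = 6;
        let m = max(&x, &y);
        // Copy the value out while both inputs are still alive.
        result = *m;
        // Keeping `m` itself past this block would be rejected,
        // because `m` might point at `y`.
    }
    println!("{result}"); // prints 6
}
```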

My naive attempt at a translation to Splint would be:

    int *max(/*@returned@*/ int *a, /*@returned@*/ int *b) {
        if (*a > *b) { return a; } else { return b; }
    }

Would that be accepted as well?

In general, though, it does seem that Splint is able to describe many common patterns the borrow checker is also able to cover. I suspect the differences (in both directions) are probably only going to emerge for more complex use cases.

> You also can combine this with parameters, so you can say that param1.baz[0..param2] is writable and param1.baz[0..param3] is readable and also that readable param1.baz[0..X] and writable param1.baz[0..Y] always means that X > Y.

To be honest I don't quite follow, but I would guess that Rust isn't capable of anything similar since there isn't a way to describe properties for a subset of a slice in signatures.
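The closest thing I can think of in safe Rust is splitting the slice at runtime rather than describing the ranges in the signature. A sketch, assuming a hypothetical helper `fill_prefix` that hands back a read-only prefix and a writable remainder:

```rust
/// Hypothetical helper: copies as much of `src` as fits into the front of `buf`,
/// then returns (read-only filled prefix, still-writable remainder).
fn fill_prefix<'a>(buf: &'a mut [u8], src: &[u8]) -> (&'a [u8], &'a mut [u8]) {
    let n = src.len().min(buf.len());
    // `split_at_mut` yields two non-overlapping mutable views of `buf`.
    let (head, tail) = buf.split_at_mut(n);
    head.copy_from_slice(&src[..n]);
    // Downgrade the prefix to a shared reference; the tail stays mutable.
    (&*head, tail)
}

fn main() {
    let mut buf = [0u8; 8];
    let (readable, writable) = fill_prefix(&mut buf, b"abc");
    writable[0] = b'!';
    println!("{:?} {}", readable, writable.len());
}
```

The split point is a runtime value here, so unlike the Splint clauses nothing about the index ranges is visible in the function's type.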

> How much of these features can be written in Rust? (Honest question)

Assuming I'm understanding these correctly:

- owned: Typically represented via Box<T> or plain non-reference types.

- allocated, initialized, readable, writable: I believe these are generally handled via MaybeUninit [0] because everything is otherwise assumed to be properly initialized. Readable/writable might need &/&mut on top of MaybeUninit.

- Other properties: Might depend on the exact properties, but I think stuff like tainted input would usually be represented directly in the type system - in this example, taintedness would be types and transitions would be functions. There are at least two ways to implement that that I can think of right now. The first is via a simple newtype (might have syntax errors, but the gist should be clear):

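    struct SafeUserString(String);
    struct UnsafeUserString(String);
    enum ValidationError { InvalidChar }
    // Consumes the unvalidated string; only a successful check yields the safe type.
    fn validate(s: UnsafeUserString) -> Result<SafeUserString, ValidationError> {
        if s.0.contains("/") {
            Err(ValidationError::InvalidChar)
        } else {
            Ok(SafeUserString(s.0))
        }
    }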

The other is via constrained type parameters + PhantomData:

    use std::marker::PhantomData;
    trait Safety {}
    struct Safe;
    impl Safety for Safe {}
    struct Unsafe;
    impl Safety for Unsafe {}
    // PhantomData records `S` in the type without storing a value of it.
    struct UserString<S: Safety> { data: String, _safety: PhantomData<S> }
    fn validate(input: UserString<Unsafe>) -> Result<UserString<Safe>, ValidationError> {
        // ...
    }
    fn op_that_does_not_care_about_safety<S: Safety>(input: UserString<S>) -> UserString<S> {
        // ...
    }
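A fuller sketch of the second variant that should compile as-is. The names `from_user`, `trim`, and `open_file`, as well as the "no slashes" rule, are placeholders I made up to have something to run:

```rust
use std::marker::PhantomData;

trait Safety {}
struct Safe;
impl Safety for Safe {}
struct Unsafe;
impl Safety for Unsafe {}

#[derive(Debug, PartialEq)]
enum ValidationError {
    InvalidChar,
}

struct UserString<S: Safety> {
    data: String,
    // Zero-sized marker that ties the taint state to the type.
    _safety: PhantomData<S>,
}

impl UserString<Unsafe> {
    // Hypothetical constructor: everything from the user starts out tainted.
    fn from_user(data: &str) -> Self {
        UserString { data: data.to_string(), _safety: PhantomData }
    }
}

// The only way to obtain a UserString<Safe> is to pass validation.
fn validate(input: UserString<Unsafe>) -> Result<UserString<Safe>, ValidationError> {
    if input.data.contains('/') {
        Err(ValidationError::InvalidChar)
    } else {
        Ok(UserString { data: input.data, _safety: PhantomData })
    }
}

// Works for either state and preserves it.
fn trim<S: Safety>(input: UserString<S>) -> UserString<S> {
    UserString { data: input.data.trim().to_string(), _safety: PhantomData }
}

// Hypothetical sink that only accepts validated input.
fn open_file(name: &UserString<Safe>) -> String {
    format!("opening {}", name.data)
}

fn main() {
    let raw = trim(UserString::from_user("  report.txt "));
    // open_file(&raw); // would not compile: expected UserString<Safe>
    match validate(raw) {
        Ok(safe) => println!("{}", open_file(&safe)),
        Err(e) => println!("rejected: {:?}", e),
    }
}
```

The taint state costs nothing at runtime; it exists purely so the compiler can refuse to pass an unvalidated string to `open_file`.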

> I don't know really what these are. Maybe I already described that above?

Basically a struct that contains a reference into data something else owns. Something like:

    struct ByteSlice<'a> {
        slice: &'a [u8]
    }

Similar patterns are quite common for iterator structs as well.
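For the iterator case, a minimal sketch (the `Words` type is hypothetical) where both the iterator and the items it yields borrow from the same underlying string:

```rust
// Hypothetical iterator that yields whitespace-separated words without copying.
struct Words<'a> {
    rest: &'a str,
}

impl<'a> Iterator for Words<'a> {
    // Items borrow from the original string, not from the iterator itself.
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let trimmed = self.rest.trim_start();
        if trimmed.is_empty() {
            return None;
        }
        // Byte index of the first whitespace character, or the end of the string.
        let end = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        let (word, rest) = trimmed.split_at(end);
        self.rest = rest;
        Some(word)
    }
}

fn main() {
    let text = String::from("  zero copy  views ");
    let words: Vec<&str> = Words { rest: text.as_str() }.collect();
    println!("{:?}", words);
}
```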

[0]: https://doc.rust-lang.org/beta/std/mem/union.MaybeUninit.htm...

[1]: https://doc.rust-lang.org/nomicon/hrtb.html


1718627440 263 days ago

> That's one of the things named lifetimes let you do, but named lifetimes are flexible enough for more than just that use case, from slightly more complex things like dealing with multiple independent lifetimes at the same time (for example, returning two references instead of just one, or structs/functions using multiple lifetimes), to more obscure stuff like dealing with higher-ranked trait bounds [1].

In Splint you would describe something as `special` and then describe the individual elements.

> Kind of a tangent, but the manual is a bit unclear to me as to what `returned` is capable of.

Aliasing is mutual, and storage refers to the region in which an object lives. It doesn't refer to exact pointer values.

> Would that be accepted as well?

    $ cat test.c

    int *
    max (/*@returned@*/ int * a, /*@returned@*/ int * b) {
     if (*a > *b) return a; else return b;
    }
    
    int
    main (void)
    {
     int * a = malloc (sizeof *a);
     int * b = malloc (sizeof *b);
     int * c;
    
     if (!a || !b) abort ();
    
     *a = 4;
     *b = 6;
    
     c = max (a, b);
    
     free (a);
     free (b);
     free (c);
     return 0;
    }

    $ splint test.c

    Splint 3.1.1 --- 05 Jan 2023

    Finished checking --- 2 code warnings
    test.c: (in function main)
    test.c:22:8: Dead storage c passed as out parameter to free: c
      Memory is used after it has been released (either by passing as an only param
      or assigning to an only global). (Use -usereleased to inhibit warning)
       test.c:20:8: Storage c released
    test.c:2:1: Function exported but not used outside test: max
      A declaration is exported, but not used outside this module. Declaration can
      use static qualifier. (Use -exportlocal to inhibit warning)
       test.c:4:1: Definition of max


aw1621107 263 days ago

> In Splint you would describe something as `special` and then describe the individual elements.

Interesting. That does seem to provide more flexibility, though based on what the manual says I still feel like it's not quite to the same level as named lifetimes since `special` looks like it revolves around allocation/initialization?

At least based on a quick skim of the manual, structs with non-owning pointers still seem like a potential difference, since from what I can tell struct field checks are either for ownership (/*@only@*/ for fields by default), initialization/allocation (`partial`, state clauses), or require overhead (/*@refs@*/). There's nothing quite like "the data here will live for some arbitrary lifetime(s) dictated by the use context".

> Aliasing is mutual, and storage refers to the region in which an object lives. It doesn't refer to exact pointer values.

I... think that answers my question? Some quick tests seem to bear that out as well.

> $ cat test.c

So I think I'm a bit dim and for some reason I thought Splint was not free. Sorry for the bother! I could have tried things out myself this entire time!

Good to see that /*@returned@*/ works like I hoped at least.


1718627440 263 days ago

> At least based on a quick skim of the manual, structs with non-owning pointers still seem like a potential difference, since from what I can tell struct field checks are either for ownership (/*@only@*/ for fields by default), initialization/allocation (`partial`, state clauses), or require overhead (/*@refs@*/). There's nothing quite like "the data here will live for some arbitrary lifetime(s) dictated by the use context".

Then I don't understand (your/Rust's) understanding of lifetimes. As to my understanding, the lifetime of an object is bound by the lifetime of the underlying allocation and is the time during which the storage of the allocation is initialized without interruption.

> I thought Splint was not free

It is free; Debian packages it, for example. I recompiled it myself, though, since the Debian version has some bugs.


aw1621107 263 days ago

> As to my understanding, the lifetime of an object is bound by the lifetime of the underlying allocation and is the time during which the storage of the allocation is initialized without interruption.

I think that's more or less the same definition used by Rust. It's just that the borrow checker gives you some more options to use/manipulate lifetimes.

Not entirely sure this will help, but consider this type from earlier:

    struct ByteSlice<'a> {
        slice: &'a [u8],
    }

This describes a struct containing a non-owning reference to a slice of bytes where the reference has some "placeholder" lifetime 'a (where 'a could be checked at the point of use and in a more abstract way without necessarily knowing about an actual underlying object). This could be handy if you need to refer to multiple subsets of a whole and don't want to make a copy - say, if you were writing a zero-copy parser. The borrow checker will ensure that the underlying data will always be valid for as long as this struct is used.
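A sketch of the zero-copy idea, assuming a made-up `split_fields` function that cuts an input buffer on a separator byte and returns views into it:

```rust
struct ByteSlice<'a> {
    slice: &'a [u8],
}

/// Hypothetical zero-copy splitter: every returned ByteSlice points into `input`.
fn split_fields<'a>(input: &'a [u8], sep: u8) -> Vec<ByteSlice<'a>> {
    input
        .split(|&b| b == sep)
        .map(|slice| ByteSlice { slice })
        .collect()
}

fn main() {
    let data = b"key=value".to_vec();
    let fields = split_fields(&data, b'=');
    // drop(data); // rejected here: `fields` still borrows from `data`
    for field in &fields {
        println!("{}", String::from_utf8_lossy(field.slice));
    }
}
```

No bytes are copied into `fields`; the commented-out `drop` is the kind of mistake the lifetime parameter lets the compiler catch.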

I didn't see an obvious analogous annotation in the Splint user manual that lets it check this kind of construct. This struct doesn't own the data it references, so /*@only@*/ doesn't apply. The reference itself as well as the data it references will be initialized since `MaybeUninit` isn't involved, so `partial` and state clauses don't apply. And no reference counting is involved, so /*@refs@*/ doesn't apply either. `dependent` seems like it might fit, but that isn't checked.

The names also let you be more specific with your lifetimes. For example, you could create a struct with references with two different but overlapping lifetimes (though I think such cases would be rare in practice):

    struct ByteSlices<'a, 'b> {
        first: &'a [u8],
        second: &'b [u8],
    }
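A sketch of where the second name matters (the methods `into_first` and `total_len` are hypothetical additions): a reference taken out of `first` can outlive whatever `second` borrowed from.

```rust
struct ByteSlices<'a, 'b> {
    first: &'a [u8],
    second: &'b [u8],
}

impl<'a, 'b> ByteSlices<'a, 'b> {
    // The result is tied to 'a only, so it is independent of 'b.
    fn into_first(self) -> &'a [u8] {
        self.first
    }

    fn total_len(&self) -> usize {
        self.first.len() + self.second.len()
    }
}

fn main() {
    let header = vec![1u8, 2, 3];
    let kept;
    {
        let body = vec![4u8, 5];
        let both = ByteSlices { first: &header, second: &body };
        println!("{}", both.total_len());
        kept = both.into_first();
    } // `body` is dropped here; `kept` only borrows from `header`
    println!("{:?}", kept);
}
```

With a single lifetime parameter for both fields, `kept` would be limited to the shorter of the two borrows and the last line would not compile.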
