by crote 662 days ago
Very impressive!

It would be interesting to see a short writeup of what kind of magic was required to achieve this, as there have been multiple failed attempts before this.

I'm also curious about the performance boost from 2.81Mbit/link failure at 150MHz to 65.4Mbit/31.4Mbit at 200MHz. That doesn't sound like basic processor bottlenecks, but rather some kind of catastrophic breakdown at a lower level? Does it just occasionally completely fail to lock onto an incoming clock signal or something?


I did some further investigating - it's apparently due to not having enough setup time on the RX PIO SM. Even though the PIO clocking is fixed at 100 MHz, there are CRC errors at the lower system clocks. I tried changing the delay in the PIO instruction that starts the RX sampling, but that only made things worse (as expected). Also tried disabling the synchronizers with no improvement.

Hmm, interesting. Am I understanding it correctly that you're doing some kind of reset on the RX PIO from regular C code, and the time for "RX finish -> interrupt CPU -> reset RX PIO" is longer than the gap between packets?

If so, might it be possible to use two RX PIOs, automatically starting the next one via inter-PIO IRQ when a packet is finished? That'd give you an entire packet receive time to reset the original PIO, which should be plenty.

Nothing nearly so complex. Here's the code in question:

  .wrap_target
      irq set 0         ; Signal end of active packet
  start:
      wait 1 pin 2      ; Wait for CRS_DV assertion
      wait 1 pin 0      ; Wait for RX<0> to assert, signalling preamble start
      wait 1 pin 1 [2]  ; Wait for Start of Frame Delimiter, align to sample clk
  sample:
      in pins, 2        ; accumulate di-bits
      jmp pin, sample   ; as long as CRS_DV is asserted
  .wrap

It's run at a fixed 100 MHz, regardless of system clock speed, by setting the PIO clock divider so that the PIO executes at a fraction of the system clock rate. So, for a 300 MHz system clock, the PIO is clocked once every three system clocks. I'm speculating that the extra two clocks (at 300 MHz) allow more setup time for the PIO inputs. The [2] above adds two extra PIO clock cycles of delay before the next instruction executes. I tried changing this from zero to three at a 100 MHz system clock (i.e. a PIO clock divisor of one), and wasn't able to fix the problem. Though it should be noted that the LAN8742 isn't a very forgiving chip - I've seen RX Data Valid (DV) go metastable when the TX clock is interrupted/changed, so another pass through might be worthwhile.
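
To put rough numbers on the timing described above, here is a small C sketch. It is not taken from the actual driver: the helper names and constants are made up for illustration, and it assumes the standard 50 MHz RMII reference clock and one PIO cycle per instruction when no delay is specified. It only does the arithmetic for the clock divider, the two-instruction sample loop, and the [2] delay.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants (assumptions, not from the original code). */
#define PIO_HZ             100000000u /* fixed PIO execution rate from the post */
#define RMII_REF_CLK_HZ     50000000u /* standard RMII reference clock (assumed) */
#define SAMPLE_LOOP_CYCLES  2u        /* "in pins, 2" + "jmp pin, sample" */
#define SFD_DELAY_CYCLES    2u        /* the [2] on the SFD wait instruction */

/* Divider that makes a state machine run at pio_hz from a sys_hz system
 * clock. In the Pico SDK a value like this would typically be handed to
 * sm_config_set_clkdiv(); here it is only computed. */
static double pio_clock_divider(uint32_t sys_hz, uint32_t pio_hz)
{
    assert(pio_hz > 0 && sys_hz >= pio_hz);
    return (double)sys_hz / (double)pio_hz;
}

/* Duration in nanoseconds of a number of PIO cycles at pio_hz. */
static double pio_cycles_to_ns(uint32_t cycles, uint32_t pio_hz)
{
    assert(pio_hz > 0);
    return (double)cycles * 1e9 / (double)pio_hz;
}

/* Period in nanoseconds of one reference clock cycle (one di-bit on RMII). */
static double ref_clk_period_ns(uint32_t ref_hz)
{
    assert(ref_hz > 0);
    return 1e9 / (double)ref_hz;
}
```

With these helpers, pio_clock_divider(300000000u, PIO_HZ) gives a divider of 3 (one PIO cycle every three system clocks), and 1.5 at a 150 MHz system clock. The two-instruction sample loop takes pio_cycles_to_ns(SAMPLE_LOOP_CYCLES, PIO_HZ), i.e. 20 ns, which matches ref_clk_period_ns(RMII_REF_CLK_HZ), so each pass through the loop picks up one di-bit. The [2] delay likewise shifts the first sample by 20 ns, which is a whole di-bit period.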

BTW, Sandeep's original code clocked the RX PIO SM at 50 MHz, pushing all the samples to the output FIFO, and relied on the processor getting interrupted at the falling edge of DV to figure out what samples constituted a packet.