Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bus Priority on the RP2350 vs RP2040 #2123

Open
XZCE opened this issue Dec 2, 2024 · 22 comments
Open

Bus Priority on the RP2350 vs RP2040 #2123

XZCE opened this issue Dec 2, 2024 · 22 comments
Assignees

Comments

@XZCE
Copy link

XZCE commented Dec 2, 2024

In my Pico W project (RP2040) I use the following code to make sure the 2nd CPU doesn't get stalled (it's serving the hardware bus of an old retro computer):

bus_ctrl_hw->priority = BUSCTRL_BUS_PRIORITY_PROC1_BITS;

I've obtained some Pico 2Ws (RP2350), compiled my project for them, and I've tracked down CPU0 being so sluggish that WIFI isn't working reliably (LED isn't reliable, connections aren't either) and SD Card reads are very slow.. to this statement.

My question is, is this no longer an appropriate thing to use on the RP2350 when expecting WIFI to work? My thoughts are that the M0+ didn't have enough grunt to soak up resources to a point of failure, but maybe the M33's do?

Thanks for any advice.

@XZCE
Copy link
Author

XZCE commented Dec 3, 2024

Remove the priority setting to get the Pico 2W's LED blinking again...

#include "pico/stdlib.h"
#include "pico/cyw43_arch.h"
#include "hardware/gpio.h"
#include "hardware/structs/bus_ctrl.h"

void setup () {
    pinMode (LED_BUILTIN, OUTPUT);
}

void loop () {
    delay (200);
    digitalWrite (LED_BUILTIN, HIGH);
    delay (200);
    digitalWrite (LED_BUILTIN, LOW);
}

void setup1 () {
    // Put this CPU(1) into high priority bus mode
    bus_ctrl_hw->priority = BUSCTRL_BUS_PRIORITY_PROC1_BITS;
}

volatile uint32_t data [8];

void loop1 () {
    while (1) {
        data [0] = sio_hw->gpio_in;
        data [1] = sio_hw->gpio_in;
        data [2] = sio_hw->gpio_in;
        data [3] = sio_hw->gpio_in;
        data [4] = sio_hw->gpio_in;
        data [5] = sio_hw->gpio_in;
        data [6] = sio_hw->gpio_in;
        data [7] = sio_hw->gpio_in;
    }
}

@lurch
Copy link
Contributor

lurch commented Dec 3, 2024

Based on the setup() and loop() function names above I'm guessing that you're using one of the Arduino wrappers?
Please recreate this using "just" pico-sdk as that'll make it much easier for us to investigate.

@XZCE
Copy link
Author

XZCE commented Dec 4, 2024

Yes mate, I am. Here's the SDK version, archive attached.
blink.zip

@lurch lurch transferred this issue from raspberrypi/pico-feedback Dec 4, 2024
@peterharperuk
Copy link
Contributor

I didn't even know this was a thing. @kilograham @Wren6991 Can you fiddle with bus priorities on RP2350 "to make sure the 2nd CPU doesn't get stalled"

@peterharperuk
Copy link
Contributor

@XZCE for my own curiosity, how could the 2nd CPU get stalled? What is it doing with the hardware bus of the retro computer?

@kilograham
Copy link
Contributor

I mean i can imagine that core 1 could stall out core 0, since the M33 can access memory every cycle, but you'd have to be doing a lot of memory access; I guess SIO is bus accessed on RP2350 which makes a difference in the example above, though I'm still surprised you get complete starvation.

Might be worth looking at the bus performance counters

@XZCE
Copy link
Author

XZCE commented Dec 5, 2024

@peterharperuk one of the CPUs stalls when they both go to access the same resource at the same time. The timing on my Pico W project is very tight though, and I can't afford for CPU1 to stall (at least not by much) otherwise there will be memory errors (I have a forum thread full of memory error photos from other users - which are now all fixed).

To describe what it's doing on the bus of the retro computer, it's spinning on sio_hw->gpio_in looking for various signals to change, so I can change state in my finite state machine. Each state then spins again the same way, looking for the same or maybe a different signal to change, to enter the next state. So it is doing something similar to the example above, but it's not complete starvation - CPU0 moves along, just slowly.

My thought on this is, I know the M33 is faster, so it might seem like I don't need to set priority anymore on the Pico 2W, but I won't be able to guarantee it always works. My biggest consumer of cycles is the 8MB PSRAM on my device too, which is running at 125MHz (near its top speed) but still, that only leaves a small amount of wriggle room for being in the right state and calculating the command+data to send it. The priority setting is the thing that's made it work for me.

@lurch
Copy link
Contributor

lurch commented Dec 5, 2024

I'm afraid that it's not something that I know much about, but if your code is so performance-critical I wonder if you might need to use a custom linker script (or similar) to ensure that the variables being used by CPU0 are in a different memory bank to the variables being used by CPU1? So that you get to take full advantage of RP2350's crossbar bus architecture, rather than both CPUs trying to fight for access to the same memory bank.

@XZCE
Copy link
Author

XZCE commented Dec 5, 2024

Ah, yeah, I'm already doing that. The RP2040 version has most CPU1 things in SCRATCHX+SCRATCHY (code, global variables and stack, with some overspill of code [mostly local inits of loop1] into the last 64KB block before scratch). The RP2350 version (which needs a newer GCC, and unfortunately a larger space due to wider asm instructions) spills a little more into that last 64KB block, but not too much more,.. My understanding is, the 2 x 256KB spaces are "striped" across different blocks too (on the RP2350's 8) which has been mentioned as being a way to make this sort of thing work... But, no, that doesn't help, unfortunately.

Actually, the RP2350 doesn't have non-striped RAM, which would be my next move, so I can't take it past where I'm currently at.

@XZCE
Copy link
Author

XZCE commented Dec 5, 2024

It occurs to me, on the RP2350, CPU0's stack is being set in the last 256KB block (in my linker script) just before the overspill from what I have going on in CPU1. I should try to move it into the 1st 256KB block and see if that helps. It's going to be a game of twister, I can tell...

But it's late here, and I've got work tomorrow.

@XZCE
Copy link
Author

XZCE commented Dec 7, 2024

Well, my project isn't working on the new chip with things moved into their isolated (as much as can be - the two CPUs need to access some common data) data areas. I don't really expect it to though, with the round-robin algorithm for stalling a CPU I'm going to lose a random number of cycles so it's no longer deterministic.

Just a heads-up though, as I keep the priority setting in the posted blink example.. I've changed it to use:

volatile uint32_t __scratch_x("data") data [8];
void __scratch_x("proc1Entry") proc1Entry() {

Which, as far as I know, puts all of CPU1 into its own little 4KB isolated RAM block - and it still doesn't blink (at least most of the time, I think I saw it working twice, but re-powering the device puts an end to that). I've checked the .map file, it's got everything in its right place.

@Wren6991
Copy link
Contributor

My biggest consumer of cycles is the 8MB PSRAM on my device too, which is running at 125MHz (near its top speed)

I'm a little confused because there is lots of discussion around moving things around in internal RAM, but it sounds like you are also heavily dependent on the performance of the QSPI bus.

Are you using PSRAM from core 1? If so I would guess you may be starving core 0 of flash access by boosting core 1's priority. QSPI is much more likely to be the bottleneck than internal RAM.

@Wren6991
Copy link
Contributor

RP2350 has a "stalled cycles" performance event for each arbiter, which might be useful to confirm where the stalls are coming from

@XZCE
Copy link
Author

XZCE commented Dec 10, 2024

I'm a little confused because there is lots of discussion around moving things around in internal RAM, but it sounds like you are also heavily dependent on the performance of the QSPI bus.

Are you using PSRAM from core 1? If so I would guess you may be starving core 0 of flash access by boosting core 1's priority. QSPI is much more likely to be the bottleneck than internal RAM.

My project uses PIO to access my PSRAM chip, as it was written for the RP2040 chip, not the RP2350.

Nevertheless, the (broken) blink example here isn't using anything, much, other than the bus priority setting for CPU1.

Do you have any other test cases I can try, which exercise the bus priority setting? It works OK on the RP2040 for me, but completely ruins CPU0 on the RP2350 for anything other than "register to register" instructions (edit: running on CPU1).

@XZCE
Copy link
Author

XZCE commented Dec 12, 2024

I have no idea what they mean, but these are the performance counter values:

/**
 * Copyright (c) 2020 Raspberry Pi (Trading) Ltd.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

#include "pico/stdlib.h"

// Pico W devices use a GPIO on the WIFI chip for the LED,
// so when building for Pico W, CYW43_WL_GPIO_LED_PIN will be defined
#ifdef CYW43_WL_GPIO_LED_PIN
#include "pico/cyw43_arch.h"
#endif

#include "pico/multicore.h"
#include "hardware/gpio.h"
#include "hardware/structs/bus_ctrl.h"
#include "hardware/structs/systick.h"

#ifndef LED_DELAY_MS
#define LED_DELAY_MS 250
#endif

// Perform initialisation
int pico_led_init(void) {
#if defined(PICO_DEFAULT_LED_PIN)
    // A device like Pico that uses a GPIO for the LED will define PICO_DEFAULT_LED_PIN
    // so we can use normal GPIO functionality to turn the led on and off
    gpio_init(PICO_DEFAULT_LED_PIN);
    gpio_set_dir(PICO_DEFAULT_LED_PIN, GPIO_OUT);
    return PICO_OK;
#elif defined(CYW43_WL_GPIO_LED_PIN)
    // For Pico W devices we need to initialise the driver etc
    return cyw43_arch_init();
#endif
}

// Turn the led on or off
void pico_set_led(bool led_on) {
#if defined(PICO_DEFAULT_LED_PIN)
    // Just set the GPIO on or off
    gpio_put(PICO_DEFAULT_LED_PIN, led_on);
#elif defined(CYW43_WL_GPIO_LED_PIN)
    // Ask the wifi "driver" to set the GPIO on or off
    cyw43_arch_gpio_put(CYW43_WL_GPIO_LED_PIN, led_on);
#endif
}

volatile uint32_t __scratch_x("data") data [8];

void __scratch_x("proc1Entry") proc1Entry() {
    // Put this CPU(1) into high priority bus mode
    bus_ctrl_hw->priority = BUSCTRL_BUS_PRIORITY_PROC1_BITS;
    while (1) {
        data [0] = sio_hw->gpio_in;
        data [1] = sio_hw->gpio_in;
        data [2] = sio_hw->gpio_in;
        data [3] = sio_hw->gpio_in;
        data [4] = sio_hw->gpio_in;
        data [5] = sio_hw->gpio_in;
        data [6] = sio_hw->gpio_in;
        data [7] = sio_hw->gpio_in;
    }
}

#define pause() {systick_hw->cvr = 0; (void)systick_hw->cvr; while (systick_hw->cvr > 0) {}}

int main() {
    int i = 0;
    stdio_init_all();
    printf ("Starting\n");
    systick_hw->csr = 0x5;
    systick_hw->rvr = 0x00FFFFFF;
    multicore_launch_core1(proc1Entry);
    int rc = pico_led_init();
    hard_assert(rc == PICO_OK);
    bus_ctrl_hw->perfctr_en = 0;
    while (i <= 0x43) {
        bus_ctrl_hw->counter [0].sel = i;
        bus_ctrl_hw->counter [0].value = 0;
        bus_ctrl_hw->perfctr_en = 1;
        pause();
        bus_ctrl_hw->perfctr_en = 0;
        if (bus_ctrl_hw->counter [0].value)
            printf ("0x%02X=%d\n", i, bus_ctrl_hw->counter [0].value);
        i++;
    }
    while (true) {
        pico_set_led(true);
        sleep_ms(LED_DELAY_MS);
        pico_set_led(false);
        sleep_ms(LED_DELAY_MS);
    }
}


/* *** OUTPUT ***
Starting
0x03=7456545  (siob_proc1_access)
0x08=3        (apb_stall_upstream)
0x09=3        (apb_stall_downstream)
0x0B=1        (apb_access)
0x14=932068   (sram8_stall_upstream)
0x16=932068   (sram8_access_contested)
0x17=16777215 (sram8_access)
0x3B=6710887  (xip_main1_access)
0x3F=6710888  (xip_main0_access)
[CYW43] HT not ready
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] HT not ready
[CYW43] timeout waiting for ALP to be set
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] HT not ready
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] HT not ready
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] HT not ready
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] HT not ready
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] HT not ready
[CYW43] Failed to start CYW43
[CYW43] Failed to start CYW43
[CYW43] Failed to start CYW43
*/

@XZCE
Copy link
Author

XZCE commented Dec 12, 2024

The numbers are different when I comment out the set bus priority statement (and it blinks):

Starting
0x03=7456545
0x08=3
0x09=3
0x0B=1
0x14=7456544
0x16=7456545
0x17=16777215
0x3B=6710888
0x3F=6710887

@Memotech-Bill
Copy link

I am not sure what the implication is, but I found this issue by searching on "HT not ready".

My case is PicoBB. I am finding that for the Pico2 GUI build, which is using the ScanVideo library to produce VGA output on Core 1, and Core 0 running the BBC BASIC interpreter attempting to open a WiFi connection I am obtaining the following diagnostic messages:

[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not up
[CYW43] HT not ready

and the WiFi is failing to connect.

To produce the failing executable:

git clone --recurse-submodules https://github.com/Memotech-Bill/PicoBB.git
cd PicoBB/bin/pico
make BOARD=pico2_w ADDON=vgaboard_cut

Install the software on a Pico2 W on a VGA Demo Board with:

  • VGA Monitor attached
  • Keyboard on Pico USB connector
  • Power applied to the VGA board USB connector
  • Serial terminal (for diagnostics) to VGA board serial connections.

Load "bbcbasic+filesystem_gui_pico2_w_vgaboard_cut.uf2" into the pico2 W. To demonstrate failure, enter the following commands on the Keyboard attached to the pico2 W:

LOAD mysqldem
RUN

I still need to add further diagnostics to investigate.

@Memotech-Bill
Copy link

I managed to enable the CYW43 driver verbose diagnostics.

I hope that this is meaningful to someone.

test.txt

@Memotech-Bill
Copy link

A few more diagnostics

test.txt

@XZCE
Copy link
Author

XZCE commented Dec 28, 2024

The "ScanVideo library" does have an option to configure the bus priority, so I'd try to "configure it not to" and see if it works any better.

I've been knocked out with the flu the past 2 weeks, but it would be good to get an update from the Pi guys soon on this. The last tests I had done show, with priority not being set, the RP2350 is slightly worse for CPU1 timing in my project, so I'm kind of stuck here. Is this going to end up being a note in the errata?

@Memotech-Bill
Copy link

The "ScanVideo library" does have an option to configure the bus priority, so I'd try to "configure it not to" and see if it works any better.

I have now tried with both PICO_SCANVIDEO_ADJUST_BUS_PRIORITY=0 (which I think is default) and PICO_SCANVIDEO_ADJUST_BUS_PRIORITY=1. It does not make any difference.

I don't know whether or not my issue is one of bus priority. As mentioned earlier, I posted here because this was the only place I found a mention of the "HT not ready" message.

@Memotech-Bill
Copy link

I no longer think my issue is the same as the OP. I have therefore opened a separate issue.

I apologise for intruding on this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants