-
Notifications
You must be signed in to change notification settings - Fork 996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bus Priority on the RP2350 vs RP2040 #2123
Comments
Remove the priority setting to get the Pico 2W's LED blinking again... #include "pico/stdlib.h"
#include "pico/cyw43_arch.h"
#include "hardware/gpio.h"
#include "hardware/structs/bus_ctrl.h"
void setup () {
pinMode (LED_BUILTIN, OUTPUT);
}
void loop () {
delay (200);
digitalWrite (LED_BUILTIN, HIGH);
delay (200);
digitalWrite (LED_BUILTIN, LOW);
}
void setup1 () {
// Put this CPU(1) into high priority bus mode
bus_ctrl_hw->priority = BUSCTRL_BUS_PRIORITY_PROC1_BITS;
}
volatile uint32_t data [8];
void loop1 () {
while (1) {
data [0] = sio_hw->gpio_in;
data [1] = sio_hw->gpio_in;
data [2] = sio_hw->gpio_in;
data [3] = sio_hw->gpio_in;
data [4] = sio_hw->gpio_in;
data [5] = sio_hw->gpio_in;
data [6] = sio_hw->gpio_in;
data [7] = sio_hw->gpio_in;
}
} |
Based on the |
Yes mate, I am. Here's the SDK version, archive attached. |
I didn't even know this was a thing. @kilograham @Wren6991 Can you fiddle with bus priorities on RP2350 "to make sure the 2nd CPU doesn't get stalled" |
@XZCE for my own curiosity, how could the 2nd CPU get stalled? What is it doing with the hardware bus of the retro computer? |
I mean i can imagine that core 1 could stall out core 0, since the M33 can access memory every cycle, but you'd have to be doing a lot of memory access; I guess SIO is bus accessed on RP2350 which makes a difference in the example above, though I'm still surprised you get complete starvation. Might be worth looking at the bus performance counters |
@peterharperuk one of the CPUs stalls when they both go to access the same resource at the same time. The timing on my Pico W project is very tight though, and I can't afford for CPU1 to stall (at least not by much) otherwise there will be memory errors (I have a forum thread full of memory error photos from other users - which are now all fixed). To describe what it's doing on the bus of the retro computer, it's spinning on sio_hw->gpio_in looking for various signals to change, so I can change state in my finite state machine. Each state then spins again the same way, looking for the same or maybe a different signal to change, to enter the next state. So it is doing something similar to the example above, but it's not complete starvation - CPU0 moves along, just slowly. My thought on this is, I know the M33 is faster, so it might seem like I don't need to set priority anymore on the Pico 2W, but I won't be able to guarantee it always works. My biggest consumer of cycles is the 8MB PSRAM on my device too, which is running at 125MHz (near its top speed) but still, that only leaves a small amount of wriggle room for being in the right state and calculating the command+data to send it. The priority setting is the thing that's made it work for me. |
I'm afraid that it's not something that I know much about, but if your code is so performance-critical I wonder if you might need to use a custom linker script (or similar) to ensure that the variables being used by CPU0 are in a different memory bank to the variables being used by CPU1? So that you get to take full advantage of RP2350's crossbar bus architecture, rather than both CPUs trying to fight for access to the same memory bank. |
Ah, yeah, I'm already doing that. The RP2040 version has most CPU1 things in SCRATCHX+SCRATCHY (code, global variables and stack, with some overspill of code [mostly local inits of loop1] into the last 64KB block before scratch). The RP2350 version (which needs a newer GCC, and unfortunately a larger space due to wider asm instructions) spills a little more into that last 64KB block, but not too much more,.. My understanding is, the 2 x 256KB spaces are "striped" across different blocks too (on the RP2350's 8) which has been mentioned as being a way to make this sort of thing work... But, no, that doesn't help, unfortunately. Actually, the RP2350 doesn't have non-striped RAM, which would be my next move, so I can't take it past where I'm currently at. |
It occurs to me, on the RP2350, CPU0's stack is being set in the last 256KB block (in my linker script) just before the overspill from what I have going on in CPU1. I should try to move it into the 1st 256KB block and see if that helps. It's going to be a game of twister, I can tell... But it's late here, and I've got work tomorrow. |
Well, my project isn't working on the new chip with things moved into their isolated (as much as can be - the two CPUs need to access some common data) data areas. I don't really expect it to though, with the round-robin algorithm for stalling a CPU I'm going to lose a random number of cycles so it's no longer deterministic. Just a heads-up though, as I keep the priority setting in the posted blink example.. I've changed it to use: volatile uint32_t __scratch_x("data") data [8]; Which, as far as I know, puts all of CPU1 into its own little 4KB isolated RAM block - and it still doesn't blink (at least most of the time, I think I saw it working twice, but re-powering the device puts an end to that). I've checked the .map file, it's got everything in its right place. |
I'm a little confused because there is lots of discussion around moving things around in internal RAM, but it sounds like you are also heavily dependent on the performance of the QSPI bus. Are you using PSRAM from core 1? If so I would guess you may be starving core 0 of flash access by boosting core 1's priority. QSPI is much more likely to be the bottleneck than internal RAM. |
RP2350 has a "stalled cycles" performance event for each arbiter, which might be useful to confirm where the stalls are coming from |
My project uses PIO to access my PSRAM chip, as it was written for the RP2040 chip, not the RP2350. Nevertheless, the (broken) blink example here isn't using anything, much, other than the bus priority setting for CPU1. Do you have any other test cases I can try, which exercise the bus priority setting? It works OK on the RP2040 for me, but completely ruins CPU0 on the RP2350 for anything other than "register to register" instructions (edit: running on CPU1). |
I have no idea what they mean, but these are the performance counter values: /**
* Copyright (c) 2020 Raspberry Pi (Trading) Ltd.
*
* SPDX-License-Identifier: BSD-3-Clause
*/
#include "pico/stdlib.h"
// Pico W devices use a GPIO on the WIFI chip for the LED,
// so when building for Pico W, CYW43_WL_GPIO_LED_PIN will be defined
#ifdef CYW43_WL_GPIO_LED_PIN
#include "pico/cyw43_arch.h"
#endif
#include "pico/multicore.h"
#include "hardware/gpio.h"
#include "hardware/structs/bus_ctrl.h"
#include "hardware/structs/systick.h"
#ifndef LED_DELAY_MS
#define LED_DELAY_MS 250
#endif
// Perform initialisation
int pico_led_init(void) {
#if defined(PICO_DEFAULT_LED_PIN)
// A device like Pico that uses a GPIO for the LED will define PICO_DEFAULT_LED_PIN
// so we can use normal GPIO functionality to turn the led on and off
gpio_init(PICO_DEFAULT_LED_PIN);
gpio_set_dir(PICO_DEFAULT_LED_PIN, GPIO_OUT);
return PICO_OK;
#elif defined(CYW43_WL_GPIO_LED_PIN)
// For Pico W devices we need to initialise the driver etc
return cyw43_arch_init();
#endif
}
// Turn the led on or off
void pico_set_led(bool led_on) {
#if defined(PICO_DEFAULT_LED_PIN)
// Just set the GPIO on or off
gpio_put(PICO_DEFAULT_LED_PIN, led_on);
#elif defined(CYW43_WL_GPIO_LED_PIN)
// Ask the wifi "driver" to set the GPIO on or off
cyw43_arch_gpio_put(CYW43_WL_GPIO_LED_PIN, led_on);
#endif
}
volatile uint32_t __scratch_x("data") data [8];
void __scratch_x("proc1Entry") proc1Entry() {
// Put this CPU(1) into high priority bus mode
bus_ctrl_hw->priority = BUSCTRL_BUS_PRIORITY_PROC1_BITS;
while (1) {
data [0] = sio_hw->gpio_in;
data [1] = sio_hw->gpio_in;
data [2] = sio_hw->gpio_in;
data [3] = sio_hw->gpio_in;
data [4] = sio_hw->gpio_in;
data [5] = sio_hw->gpio_in;
data [6] = sio_hw->gpio_in;
data [7] = sio_hw->gpio_in;
}
}
#define pause() {systick_hw->cvr = 0; (void)systick_hw->cvr; while (systick_hw->cvr > 0) {}}
int main() {
int i = 0;
stdio_init_all();
printf ("Starting\n");
systick_hw->csr = 0x5;
systick_hw->rvr = 0x00FFFFFF;
multicore_launch_core1(proc1Entry);
int rc = pico_led_init();
hard_assert(rc == PICO_OK);
bus_ctrl_hw->perfctr_en = 0;
while (i <= 0x43) {
bus_ctrl_hw->counter [0].sel = i;
bus_ctrl_hw->counter [0].value = 0;
bus_ctrl_hw->perfctr_en = 1;
pause();
bus_ctrl_hw->perfctr_en = 0;
if (bus_ctrl_hw->counter [0].value)
printf ("0x%02X=%d\n", i, bus_ctrl_hw->counter [0].value);
i++;
}
while (true) {
pico_set_led(true);
sleep_ms(LED_DELAY_MS);
pico_set_led(false);
sleep_ms(LED_DELAY_MS);
}
}
/* *** OUTPUT ***
Starting
0x03=7456545 (siob_proc1_access)
0x08=3 (apb_stall_upstream)
0x09=3 (apb_stall_downstream)
0x0B=1 (apb_access)
0x14=932068 (sram8_stall_upstream)
0x16=932068 (sram8_access_contested)
0x17=16777215 (sram8_access)
0x3B=6710887 (xip_main1_access)
0x3F=6710888 (xip_main0_access)
[CYW43] HT not ready
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] HT not ready
[CYW43] timeout waiting for ALP to be set
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] HT not ready
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] HT not ready
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] HT not ready
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] HT not ready
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] core not in reset
[CYW43] HT not ready
[CYW43] Failed to start CYW43
[CYW43] Failed to start CYW43
[CYW43] Failed to start CYW43
*/ |
The numbers are different when I comment out the set bus priority statement (and it blinks): Starting |
I am not sure what the implication is, but I found this issue by searching on "HT not ready". My case is PicoBB. I am finding that for the Pico2 GUI build, which is using the ScanVideo library to produce VGA output on Core 1, and Core 0 running the BBC BASIC interpreter attempting to open a WiFi connection I am obtaining the following diagnostic messages:
and the WiFi is failing to connect. To produce the failing executable:
Install the software on a Pico2 W on a VGA Demo Board with:
Load "bbcbasic+filesystem_gui_pico2_w_vgaboard_cut.uf2" into the pico2 W. To demonstrate failure, enter the following commands on the Keyboard attached to the pico2 W:
I still need to add further diagnostics to investigate. |
I managed to enable the CYW43 driver verbose diagnostics. I hope that this is meaningful to someone. |
A few more diagnostics |
The "ScanVideo library" does have an option to configure the bus priority, so I'd try to "configure it not to" and see if it works any better. I've been knocked out with the flu the past 2 weeks, but it would be good to get an update from the Pi guys soon on this. The last tests I had done show, with priority not being set, the RP2350 is slightly worse for CPU1 timing in my project, so I'm kind of stuck here. Is this going to end up being a note in the errata? |
I have now tried with both PICO_SCANVIDEO_ADJUST_BUS_PRIORITY=0 (which I think is default) and PICO_SCANVIDEO_ADJUST_BUS_PRIORITY=1. It does not make any difference. I don't know whether or not my issue is one of bus priority. As mentioned earlier, I posted here because this was the only place I found a mention of the "HT not ready" message. |
I no longer think my issue is the same as the OP. I have therefore opened a separate issue. I apologise for intruding on this one. |
In my Pico W project (RP2040) I use the following code to make sure the 2nd CPU doesn't get stalled (it's serving the hardware bus of an old retro computer):
bus_ctrl_hw->priority = BUSCTRL_BUS_PRIORITY_PROC1_BITS;
I've obtained some Pico 2Ws (RP2350), compiled my project for them, and I've tracked down CPU0 being so sluggish that WIFI isn't working reliably (LED isn't reliable, connections aren't either) and SD Card reads are very slow.. to this statement.
My question is, is this no longer an appropriate thing to use on the RP2350 when expecting WIFI to work? My thoughts are that the M0+ didn't have enough grunt to soak up resources to a point of failure, but maybe the M33's do?
Thanks for any advice.
The text was updated successfully, but these errors were encountered: