'Service Bus Namespace' Continues Running Even After $ make tre-stop
#3953
Comments
It's not possible to temporarily stop the Service Bus (or suspend the billing) without deletion. Thread below when I posed a similar question: |
Thank you @jonnyry! I deleted the 'Service Bus Namespace', but this resulted in abnormal activity in the 'Log Analytics workspace' and a lot of data ingestion, which caused a higher cost than the Service Bus Namespace itself. Is there a way to prevent abnormal activity after removing the Service Bus Namespace? @marrobi |
@BiologyGeek I guess the logs are coming from the API web app and resource processor VMSS. So if you stop both of them, as per the other issue you raised, stopping the web app won't save money, but would hopefully stop these errors being logged. |
Perhaps someone could look at using a Standard SKU Service Bus, with a config option for users who don't require the Service Bus to be on a private network - for example for development purposes. |
Would it be possible to switch it out for one of the other (less expensive) queue/event type Azure services... Queue Storage, Event Grid, Event Hubs... are there features/characteristics of the Service Bus that we specifically require? |
I think it's session support. @damoodamoo @tamirkamara may be able to advise. |
we do require session support for ordered delivery unfortunately. think that's also part of standard SKU though, so a 'dev' switch to allow it to be deployed in standard would probably be the best bet... |
Thanks @marrobi @damoodamoo I'm guessing some additional network configuration would be required once the Service Bus SKU was switched to Standard, since the private endpoints & VNET integration are no longer available...
Would that do it? In terms of locking down the source IPs/subnets, am I right in thinking the following components connect to the Service Bus?:
|
I think that's it. Re the firewall rules, you can do it in an ARM template ( https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-ip-filtering#use-resource-manager-template ), so if not supported in TF, would think can do it using AzAPI provider. |
Thanks. Re opening the Azure Firewall outbound to the service bus FQDN, is looking trickier than I first thought -
Attempted to use a network rule with a Service Tag instead of FQDN:
A network rule on IP does work, however I imagine the IP will not stay the same for long. Not sure there's any easy solution to this one! |
Is the purpose of this PR to reduce costs when NOT in production? If so then does the IP filtering matter? As long as the non-VNet Service Bus is only enabled by a clear flag. |
Yes correct - to reduce the cost of dev/test instances. Production would use premium SKUs (private endpoints/VNET integration etc).
I suppose not (or less so anyway). The firewall still needs opening to allow traffic out to the service bus public IP - and would be preferable if it wasn't open to any destination. |
We already do this for local dev:
I'm not sure it matters if it's open to the internet for dev purposes? |
adding my £0.02, I don't think we need the Service Bus at all. It's just a FIFO queue, and there are much cheaper ways to implement that than a premium tier Service Bus, especially given that the traffic is so low that performance will never be an issue. I also don't think it's a good idea to have different architectural flavours in dev/test vs. production, that's asking for trouble. So I'd like to see this expense removed from the production instance(s), not just dev or test. |
@TonyWildish-BH There probably are other ways, but as with everything there is a time and effort to implement vs the actual cost of using the managed offering. Maybe you can suggest a design and submit a PR? (agree test and prod should be consistent, but for dev, less so - we often develop using local compute for the API, resource processor etc so we can debug and have a shorter dev loop) |
@TonyWildish-BH We use Service Bus with sessions for guaranteed ordered delivery. This is required when multiple operations stack up against a single resource, and there are multiple nodes/threads servicing those requests. I'm not hearing that the cost is really a factor in production, so if it's a case of saving costs in a dev then implementing a switch to use Standard SKU and skip a few PEs sounds pretty reasonable to me. It's a pain to pay so much more for private networking, but it's definitely a requirement for most prod workloads. |
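To illustrate why sessions matter when several nodes/threads service requests, here is a minimal sketch in plain Python. It does not use the Service Bus SDK; `process_with_sessions` and the sample resource names are hypothetical, and pinning each session to one worker is only an analogy for a receiver holding a session lock. The point it shows is that operations stacked up against one resource are handled in the order they were sent, even with multiple workers running.

```python
import queue
import threading
from collections import defaultdict


def process_with_sessions(messages, worker_count=3):
    """Handle (session_id, payload) pairs with several worker threads.

    Assumption for illustration: each session is owned by exactly one
    worker, so messages sharing a session_id keep their send order.
    """
    inboxes = [queue.Queue() for _ in range(worker_count)]  # one FIFO per worker
    handled = defaultdict(list)
    lock = threading.Lock()

    def worker(inbox):
        while True:
            item = inbox.get()
            if item is None:  # sentinel: no more work for this worker
                break
            session_id, payload = item
            with lock:
                handled[session_id].append(payload)

    threads = [threading.Thread(target=worker, args=(q,)) for q in inboxes]
    for t in threads:
        t.start()

    session_owner = {}
    for session_id, payload in messages:
        # The first message of a session picks its owner; later ones follow it.
        owner = session_owner.setdefault(session_id, len(session_owner) % worker_count)
        inboxes[owner].put((session_id, payload))

    for q in inboxes:
        q.put(None)
    for t in threads:
        t.join()
    return dict(handled)


if __name__ == "__main__":
    ops = [("vm-1", "install"), ("vm-2", "install"), ("vm-1", "update"), ("vm-1", "delete")]
    print(process_with_sessions(ops))
```

Without the per-session ownership, two workers could pick up "install" and "delete" for the same resource at the same time, which is the situation sessions are there to prevent.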
@TonyWildish-BH yes I've recently come to that conclusion also. An enterprise message queue seems unnecessary (and costly) for tens or hundreds of messages a day. I've parked trying to refactor the Service Bus to use a Standard SKU for dev/test as there are too many gnarly changes required to make it work - as you say, asking for trouble when your dev/test flavour is that different from prod. @damoodamoo unfortunately it's more than just removing a few PEs. Here are the key issues:
AzureTRE/core/terraform/servicebus.tf Lines 43 to 45 in 9e49ed6
Service Bus Standard SKU won't cope with these. |
thanks for the quick feedback. We've got it on our backlog to do something about the Service Bus, but it's not risen far enough up the stack yet, probably in a couple of months. Will be happy to post more details here when we get there. |
@jonnyry - thanks for the comments there, was a bunch of stuff i'd not realised. |
This might be another option to reduce dev costs when available: Azure/azure-service-bus#223 (comment) |
OK useful to know. Summarising potential options (and adding a few new ones) - please add any more to the list you have!
1. Resolve issues (above) in Service Bus SKU Standard: See above for issues encountered when attempting to change Service Bus to SKU Standard.
2. Delete the Service Bus on TRE stop: Previously proposed by @marrobi - alter the TRE stop/start scripts to terraform destroy and apply the Service Bus and directly associated resources (PEs etc).
3. Integrate the Service Bus emulator: Wait for Service Bus emulator (see above) and integrate that in dev/test. Currently estimated for end of 2024.
4. Use different queue product: Swap out Service Bus for a different queue product, e.g. RabbitMQ.
5. Use other Azure queue technology: Consider using one of the other queuing products in Azure and determine whether its behaviour meets the needs of the TRE - FIFO & large messages. Can we augment/workaround any limitations in behaviour (e.g. no large messages/lack of FIFO)?
6. SQL queue library: Given the relatively low message throughput consider a queue library on top of a SQL instance, e.g. https://pypi.org/project/pq/ |
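As a rough idea of what option 6 could look like, here is a minimal FIFO queue on top of a SQL table. It is a sketch only: it uses the standard library `sqlite3` module rather than the pq library linked above, the `SqlFifoQueue` class and table name are hypothetical, and it assumes a single consumer on a single connection. Multiple competing consumers would need proper row locking, and per-resource ordering (what sessions give us today) would need to be layered on top.

```python
import sqlite3


class SqlFifoQueue:
    """Hypothetical FIFO queue backed by one SQL table (single consumer)."""

    def __init__(self, conn):
        self.conn = conn
        # AUTOINCREMENT ids only ever grow, so ordering by id gives send order.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        conn.commit()

    def put(self, body):
        # 'with conn' commits on success and rolls back on error.
        with self.conn:
            self.conn.execute("INSERT INTO queue (body) VALUES (?)", (body,))

    def get(self):
        """Remove and return the oldest message, or None if the queue is empty."""
        with self.conn:
            row = self.conn.execute(
                "SELECT id, body FROM queue ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            self.conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
            return row[1]


if __name__ == "__main__":
    q = SqlFifoQueue(sqlite3.connect(":memory:"))
    q.put("deploy workspace")
    q.put("delete workspace")
    print(q.get(), "|", q.get(), "|", q.get())
```

The select and delete run in one transaction, so a message is either fully dequeued or left in place if something fails part way through.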
@marrobi wondering what your thoughts are on replacing the Service Bus for a containerised version of RabbitMQ? Would this be accepted as a PR? |
For dev purposes as an option or completely? What compute would it run on? @damoodamoo thoughts? |
Dev and production. Not looked closely at the compute, but something like Azure Container Apps. |
It could go on the resource processor VM. We used a VMSS as we expected to scale out to multiple instances, but have never seen the need; the TF deployments tend to be low CPU. FWIW, we could also put the API on it rather than the web app, and use something like Portainer to manage it. My only worry is that in production it would be a single instance and would need to be "supported" by the user rather than being a managed service. |
Another option for RabbitMQ is to use the service from the Azure Marketplace. Trade some of the savings for better maintenance etc. |
Thanks @TonyWildish-BH. Marketplace offers are a challenge for us, as we would need a credit card associated with the subscription to pay the vendor. That isn't an option for us internally, so we couldn't deploy the marketplace option. |
I'd have a concern that in order to reduce cost, we're increasing complexity. What is the target running cost of a TRE in Prod or Dev that we're trying to get to? |
Guys, as RabbitMQ is open source, is it really necessary to use marketplace offers? |
that depends entirely on who you want to pay for the maintenance. You can DIY, or you can use Marketplace and let someone else do it for you. I prefer the latter, we don't have bandwidth to be sysadmins in the cloud as well. |
I like that idea, although given the resource processor is a VM scale set, does it have a stable IP / hostname? |
May be of interest - https://github.com/Azure/azure-service-bus-emulator-installer |
@jonnyry @TonyWildish-BH I had a brief look at the emulator for development/test scenarios. It was looking promising until I got onto Event Grid subscriptions with the airlock. Event Grid is heavily used by the airlock, and potentially by future scenarios. These integrations require a Service Bus resource ID, as per https://learn.microsoft.com/en-us/azure/event-grid/handler-service-bus, which the emulator does not have. If anyone has a suggestion it would be good to hear, but at the moment, moving away from Azure PaaS resources will require a lot of DIY effort in stringing things together. |
This should reduce the Service Bus cost to under c. £10 a month - #4256. I would only recommend it for development purposes, and it isn't fully tested. A couple of bits to finish off. @jonnyry @TonyWildish-BH is this something that would be valuable? |
Actually just seen this - #3953 (comment) :-D |
@marrobi very much so :-) though few gnarly challenges to solve above. |
Here's my original attempt if it's any use: https://github.com/nwsde/nwsde-azuretre/commits/jr/54-service-bus-sku/ |
Indeed, I don’t like the emulator approach if it means dev and production are different. And it still doesn’t solve the problem that production just costs too much.
On Jan 6, 2025, at 5:11 PM, Marcus Robinson ***@***.***> wrote:
Actually just seen this - #3953 (comment) :-D
|
Hello team,
Is it expected behavior for the 'Service Bus Namespace' to keep running even after executing the
$ make tre-stop
command? This screenshot was captured after running the
$ make tre-stop
command: [screenshot not included]
Given that the Premium tier of this service is not inexpensive, is there a way to turn it off or disable it when not needed?