You'll struggle to find an 'app' or something that just does what you're looking for. It's a bit time-consuming, but not too complicated.
As 2klearz said, you're best off extracting the audio with FFmpeg first so you're not trying to handle the whole file.
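If you'd rather script the extraction step than type it out each time, here's a minimal sketch that only builds the ffmpeg command. It assumes you want uncompressed 16-bit WAV for Audacity; `extract_audio_cmd` and the file names are made up for illustration. `-vn` drops the video and `-acodec pcm_s16le` writes plain WAV.

```python
import shlex


def extract_audio_cmd(video_path, wav_path):
    """Build an ffmpeg command that pulls the audio out of a video as WAV.

    Hypothetical helper: it only builds the argument list, it doesn't run it.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # the original video file
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # uncompressed 16-bit WAV, easy for Audacity
        wav_path,
    ]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(cmd, check=True) to run it.
    print(shlex.join(extract_audio_cmd("input.mp4", "audio.wav")))
```

Building the command as a list (rather than one string) avoids quoting problems with file names that contain spaces.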
I've played around with a few online AI tools. Most are designed for music and separating out the different instruments.
There are some AI plugins for Audacity (Intel's OpenVINO ones) that, while basic, I've found work quite well.
It breaks the audio down into vocals, bass, drums and melody. Just mute or delete the tracks you don't need. Note that trying to separate actual sung vocals from moans or shouts is a whole different level.
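Muting or deleting tracks works because Audacity mixes whatever tracks are left into the exported file. Audacity does this for you, so the following is only a rough illustration of what that mix amounts to. It assumes the stems you keep were saved as separate 16-bit WAV files of the same length and format; `mix_stems` is a hypothetical helper, not part of Audacity or the plugins.

```python
import array
import sys
import wave


def mix_stems(stem_paths, out_path):
    """Sum the kept 16-bit PCM WAV stems sample by sample into one file.

    Hypothetical helper. Assumes all stems share channel count, sample rate
    and length, as stems separated from the same track normally do.
    """
    if not stem_paths:
        raise ValueError("need at least one stem to keep")
    mixed = None
    params = None
    for path in stem_paths:
        with wave.open(path, "rb") as w:
            if w.getsampwidth() != 2:
                raise ValueError(f"{path}: only 16-bit PCM is handled here")
            if params is None:
                params = w.getparams()
            samples = array.array("h", w.readframes(w.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV sample data is little-endian
        if mixed is None:
            mixed = [0] * len(samples)
        elif len(samples) != len(mixed):
            raise ValueError(f"{path}: stem length differs from the others")
        for i, s in enumerate(samples):
            mixed[i] += s
    # Clip to the 16-bit range so loud overlaps don't wrap around.
    out = array.array("h", (max(-32768, min(32767, s)) for s in mixed))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(out_path, "wb") as w:
        w.setparams(params)
        w.writeframes(out.tobytes())
```

"Muting" a stem just means leaving it out of `stem_paths`; everything else gets summed.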
Extract the audio with FFmpeg > Import it into Audacity > Split it into tracks > Edit/delete as needed > Export the audio > Re-encode with FFmpeg.
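For the last step, one option (assuming you only changed the audio) is to copy the original video stream untouched and re-encode just the edited audio, so the picture loses no quality. This sketch builds that ffmpeg command; `remux_cmd`, the file names and the AAC default are assumptions for illustration (AAC suits MP4, other containers may want a different codec).

```python
import shlex


def remux_cmd(video_path, edited_audio_path, out_path, audio_codec="aac"):
    """Build an ffmpeg command that pairs the original video with edited audio.

    Hypothetical helper: it only builds the argument list, it doesn't run it.
    """
    return [
        "ffmpeg",
        "-y",
        "-i", video_path,         # input 0: the original video
        "-i", edited_audio_path,  # input 1: the audio exported from Audacity
        "-map", "0:v",            # take the video stream(s) from input 0
        "-map", "1:a",            # take the audio stream(s) from input 1
        "-c:v", "copy",           # copy video as-is, no re-encode
        "-c:a", audio_codec,      # re-encode only the audio
        "-shortest",              # stop at the end of the shorter stream
        out_path,
    ]


if __name__ == "__main__":
    print(shlex.join(remux_cmd("input.mp4", "edited.wav", "output.mp4")))
```

If you'd rather re-encode the video as well, swap `copy` for an encoder of your choice, at the cost of time and some quality.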
Someone much, much smarter than me could probably script a way to automate all of this with FFmpeg and the OpenVINO tools. Way above my head.