I'm actually using WD14 captioning now. I got better results with this method after following the instructions in
the pkmngotrnr tutorial. As for the other settings, I changed a few things, mainly to make it work on my hardware, since I only have 8 GB of VRAM.
Besides captioning, the most important thing is the dataset. I tried to choose better pictures and did not resize them to 512x512. The results improved a lot just from that.
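
If it helps anyone, here is a rough sketch of a sanity check you could run on the dataset folder after captioning. It is not from the tutorial. It assumes the tagger wrote one caption file per image, with the same base name, a `.txt` extension, and comma-separated tags; the function name and the extension list are just my own placeholders, so adjust them if your setup differs.

```python
from pathlib import Path

# Assumption: these are the image types in the dataset; extend as needed.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def find_caption_problems(dataset_dir, caption_ext=".txt"):
    """Return (missing, empty) lists of image file names.

    missing: images with no caption file next to them.
    empty:   images whose caption file contains no tags.
    Assumes captions share the image's base name (hypothetical layout).
    """
    missing, empty = [], []
    for image in sorted(Path(dataset_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue  # skip caption files and anything that is not an image
        caption = image.with_suffix(caption_ext)
        if not caption.is_file():
            missing.append(image.name)
            continue
        # Split on commas and drop surrounding whitespace from each tag.
        tags = [t.strip() for t in caption.read_text(encoding="utf-8").split(",")]
        if not any(tags):
            empty.append(image.name)
    return missing, empty
```

Calling `find_caption_problems("path/to/your/dataset")` returns two lists, so you can see at a glance which pictures still need a caption before you start training. It only reads files and never resizes or modifies the images.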