I've been trying to train my own LoRAs over the last few weeks and finally managed to get good results, so I decided to share them with you guys.
Before I start, I just want to say that I'm not a pro at this stuff and I don't fully understand all the technical details behind it; I just followed a bunch of tutorials and read some articles to get here. This tutorial is also made to create LyCORIS instead of traditional LoRAs. I did some tests and believe that for real people LyCORIS is the best option.
Anyway, let's start. If you want to train your own models you need a GPU with at least 6 GB of VRAM.
1. Installing kohya_ss
This is easier to understand with a video tutorial, so I'm gonna leave this link here; follow the steps for the installation. I used this tutorial to install kohya_ss on my computer:
Btw, most of the stuff in this guide is already covered in that video, with some tweaks I made, so you can follow along with my tutorial if that makes it easier to understand.
2. Preparing the dataset
You should have at least 15 pictures of the subject, but the more pictures the better. Most images should focus on the face, but if you want the AI to also learn the body shape of the person, you should get some full body pictures too, so you won't have to prompt for the body when you use the LoRA. Remember to always use high quality pictures because they'll make the results better. I mostly use DuckDuckGo to find the images since it's easier to download from there compared to Google Images; imagefap(dot)com also has many high quality pictures, so you can use that website to find more images.
After downloading your pictures you should resize them. In SD1.5 most models are trained at 512x512 resolution. You can use 768x768 for better results, but it takes more time, especially on a low VRAM GPU. I tested both resolutions and didn't see much difference, so I'm using 512x512.
To resize your images, go to birme.net, upload the images there, and crop to the desired resolution. Remember to always keep the focus on the face of the subject. Here is an example of how I cropped some images so you can do the same:
After cropping the images, click on the SAVE AS ZIP button and download the results.
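If you'd rather script the cropping instead of using birme.net, here's a minimal sketch with Pillow. Note this is a plain center crop, not face-aware like cropping by hand, so double-check that faces aren't cut off:

```python
# Center-crop an image to a square and resize it to 512x512.
# Requires Pillow (pip install Pillow).
from PIL import Image


def center_crop_square(img: Image.Image, size: int = 512) -> Image.Image:
    """Crop the largest centered square, then resize to size x size."""
    w, h = img.size
    side = min(w, h)
    left = (w - side) // 2
    top = (h - side) // 2
    square = img.crop((left, top, left + side, top + side))
    return square.resize((size, size), Image.LANCZOS)


# Example of applying it to a whole folder (paths are illustrative):
# import os
# for name in os.listdir("raw"):
#     img = Image.open(os.path.join("raw", name)).convert("RGB")
#     center_crop_square(img).save(os.path.join("cropped", name))
```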
3. Folder preparation
You should create 3 folders for the training: one for the images, one for the resulting model, and one for the logs. This is how I create them:
Inside the image folder, create a new folder and unzip the cropped pictures there; we will come back to that folder later.
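If you want to create the whole layout in one go, a quick sketch (the folder names here are just placeholders matching my setup; use whatever names you like):

```python
# Create the training folder layout: image, model and log folders,
# plus a subfolder inside "image" for the cropped pictures.
import os

base = "lora_training"
for sub in ("image", "model", "log"):
    os.makedirs(os.path.join(base, sub), exist_ok=True)

# The cropped pictures get unzipped into this subfolder;
# we'll rename it later to set the repeat count.
os.makedirs(os.path.join(base, "image", "subject"), exist_ok=True)
```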
4. Captioning the images
This is one of the most important steps in the training: captioning tells the AI what to look for when training on the pictures. It's very easy to do since kohya_ss has a captioning tool in the Utilities tab. For realistic models I prefer using BLIP Captioning and adding “photo of” as the prefix, since it will be at the beginning of almost every caption. Here are my settings for the captioning in kohya_ss:
The next step is checking the captions and editing them. You should use a token to trigger the LoRA when you use it in Stable Diffusion; a token is a word that will be associated with the LoRA you trained. Avoid using the name of the subject, since Stable Diffusion may already know it from its training data and that will mess up the end result, so it's best to use random characters that have no meaning. I'm using “zwx” as a token and had no problem with it. Always write “woman” or “man” (depending on the subject) after the token, so the AI will understand that “zwx” is that person. Here's an example of a picture with the caption that I used for it:
Avoid describing things like hair or eye color, because that will make them harder to change with prompts when you use the LoRA in SD; besides, the AI will learn what they look like from the pictures. Try focusing on clothes, facial expressions and poses when captioning.
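Editing every caption by hand gets tedious, so here's a small sketch that inserts the token into BLIP captions automatically. It assumes BLIP wrote one .txt caption per image starting with "photo of", and the token/paths are just my examples:

```python
# Insert the trigger token right after the "photo of" prefix,
# e.g. "photo of a woman smiling" -> "photo of zwx woman smiling".
import re

TOKEN = "zwx"


def add_token(caption: str, token: str = TOKEN) -> str:
    """Add the trigger token after 'photo of'; skip if already present."""
    if token in caption:
        return caption
    return re.sub(r"\bphoto of (a |an )?", f"photo of {token} ", caption, count=1)


# Applying it to every caption file in the dataset folder
# (path is illustrative -- point it at your own image subfolder):
# import glob
# for path in glob.glob("lora_training/image/subject/*.txt"):
#     with open(path, encoding="utf-8") as f:
#         text = f.read().strip()
#     with open(path, "w", encoding="utf-8") as f:
#         f.write(add_token(text))
```

You'd still want to skim the results afterwards to fix anything BLIP got wrong.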
5. Training setup
Like I said before, I'm no expert on this; I just followed a bunch of tutorials and found a setup that works well. Bear in mind that I'm training a LoCon LyCORIS instead of a traditional LoRA, so the training parameters are different.
Your first step is choosing the base model to train on. I use Realistic Vision V5.1 since I got the best results with it, and it's very flexible so I can use the LoRA on other models as well. You can download it here:
To select the model, go to the LoRA tab in kohya_ss > Training > Source Model > Model Quick Pick > custom, and select the model from the folder you downloaded it to. If you already use it in SD, you can select the file from the A1111 models folder. The Source Model tab should look like this after you select the model:
For the next step, go to the Folders tab and select the folders you created earlier. The Model output name sets the name of the output file; I like to use the name of the subject I'm training plus the initials of the base model so it's easier to find in A1111. While it's not required, I recommend writing the token you used in the Training comment field so you know what triggers the model; in my case it is “zwx woman”. The Folders tab should look like this:
Before we move on to the last step, we'll go back to the folder where we extracted the cropped pictures. That folder will obviously be used for training, but we have to tell kohya_ss how many repeats of each image to train. I use a batch size of 2 in my training, which means kohya_ss trains two pictures at once, with 4 epochs; for more info on that, see this article: rentry(dot)org/59xed3#number-of-stepsepochs. Since I'm not very good at maths, I use a very nice spreadsheet that does the calculation for me, you can download it here: . Through my tests I think that 3000 training steps is the sweet spot for real people, so in this case I'll train each image 38 times based on the spreadsheet's calculation.
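If you'd rather do the spreadsheet's math yourself, a minimal sketch — assuming total steps ≈ images × repeats × epochs (whether your trainer also divides by batch size varies, so double-check against the step count kohya_ss reports):

```python
# Compute how many repeats per image are needed to hit a target
# step count, given the image count and number of epochs.
import math


def repeats_per_image(target_steps: int, num_images: int, epochs: int) -> int:
    """Repeats so that num_images * repeats * epochs ~= target_steps."""
    return math.ceil(target_steps / (num_images * epochs))


# With ~20 images, 4 epochs and a 3000-step target:
print(repeats_per_image(3000, 20, 4))  # -> 38
```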
Back in the image folder, rename the subfolder to xx_whatevernameyouwant (xx is the number of repeats per image, 20 for example). kohya_ss will then understand that it needs to train each image that many times. Change the number appropriately for the number of images you have. In my case I renamed the folder like this:
The last step is the technical stuff, so I'll just leave the setup here. Paste it into Notepad and save it as a .json file. To use it, click on Open in the Configuration File menu and it will load everything; just change the folders to the ones you will be using for training.
If you set up everything correctly, click on the Start training button and wait for it to finish. I have an RTX 3070, and a training run of 3000 steps takes about 30 minutes with this configuration.
After the training finishes, go to the model folder you created, select the .safetensors files there and move them to the Loras folder in A1111. If you followed my instructions you'll have 4 files, each one representing an epoch. Usually the best results come from the 03 file or the file without numbers, since those were trained for longer; most of the time you will use the file with no numbers, but you can test all of them in SD to see which one is better.
These are the results for the LoRA that I trained. I couldn't include the prompts because of the character limit, but I used the models epiCRealism Pure Evolution V5, Comic Babes and ReV Animated with the 3D animation LoRA; you can find all of them on Civitai. My prompts were variations of the ones used in the demos for these models. Remember to always use hires fix and ADetailer to fix faces.
If you want to test this LoRA, I uploaded it at this link:
Feel free to message me if you have any questions.