LoRA vs LoCON vs LoHA vs LoKR
LoRA (aka LoRA-LierLa)
Low-Rank Adaptation. The core technology that started it all. In short, imagine the model as a cake with a lot of layers (let's say, 22). Instead of re-baking the whole cake, you separate the layers and slip your own filling in between each of them - getting a 'better' cake without rebuilding it from scratch.
LoRA-LierLa essentially has only two settings to specify.
- Network Dimension Count - how big your own 'layers' between the 'cake layers' will be. The cake's own layers have a size of 4096, so there is absolutely no point in making yours bigger than that. More than that, you will usually WANT to keep it very low. Think of it as the flavouring: add too many strawberries and you are no longer eating cake, only strawberries. Give the network too many dimensions to work with and it will simply memorize your dataset and refuse to produce anything else. Too few, however, may drop crucial details. It is recommended to start with 8 to 16 dimensions and adjust up or down by half of the current value (so 16 will be either decreased or increased by 8 → become 8 or 24). Don't make the network smaller than 4 or bigger than 256 dimensions - the former is practically no different from 0; the latter overtrains frequently.
- Network Dimension Alpha Value - a measure to prevent the LoRA from collapsing into zeroes (and therefore producing black images). While training, the weights of individual layers may become so small that they simply get rounded to zero, which results in no training at all. To prevent this, the weights are scaled by alpha divided by the rank (dimension count), so the numbers actually stored stay large enough to survive rounding. If you set the value to 1, the multiplier is 1/dim, so the stored weights have to be roughly as large as your dimension count (this should help capture every single change, but takes more effort to train). If you set the value to half of your dimension, the multiplier is 0.5 (preventing most collapses, though probably not all of them, while saving you time). And if you set the value equal to your dimension count, the multiplier is 1, which disables Alpha entirely - giving you a high risk of losing progress in certain situations. Most resources recommend starting with dim/2 (so, if your Dimension Count = 16, set Alpha to 8). See the sketch right after this list for how dimension and alpha interact.
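To make the dimension/alpha interplay concrete, here is a minimal PyTorch sketch of a single LoRA-adapted linear layer. The names (`base_weight`, `down`, `up`) and shapes are illustrative only, not kohya's actual internals:

```python
import torch

# One frozen layer of the 'cake' plus its LoRA 'filling'.
in_features, out_features = 4096, 4096    # the layer size mentioned above
rank, alpha = 16, 8                       # network_dim = 16, network_alpha = 8

base_weight = torch.randn(out_features, in_features)   # frozen original weights
down = torch.randn(rank, in_features) * 0.01           # trained 'down' projection
up = torch.zeros(out_features, rank)                   # trained 'up' projection, starts at zero

scale = alpha / rank                                   # 8 / 16 = 0.5
delta = (up @ down) * scale                            # the low-rank update
effective_weight = base_weight + delta                 # what inference actually uses
```

The single `scale = alpha / rank` line is the whole Alpha story: alpha = 1 gives a multiplier of 1/16, alpha = 8 gives 0.5, and alpha = 16 gives 1 (Alpha effectively off).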
LoCON (aka LoRA-C3Lier, aka LoRA w/support for CONvolutional layers)
A direct upgrade to LoRA-LierLa. Instead of inserting a single layer between the cake layers, you insert three of them. Like the 'oreo' meme, but 'orerereo' :D
Since you control and train more than one layer, you get better control over details and accuracy, and therefore higher-quality resulting LoRAs - which is why LoCON is advised. Also, the advised maximum is now 64 dimensions instead of 256.
It also adds two more settings for you to play with:
- Conv 2D 3x3 Dimension Count - how big the 'middle' layer between the Network Dimension layers is. Works the same way as Network Dimension Count. Recommended value: half of your Network Dimension Count (so if you have network_dim=16, set conv_dim to 8). If you don't set it, it defaults to the network dimension size. If you set it to 0, it becomes plain LoRA-LierLa.
- Conv 2D 3x3 Alpha Value - same as the Network version. Since conv_dim is usually quite small, it is recommended to keep this value at 1. If you don't set it, it defaults to the Network Dimension Alpha value. (See the sketch right after this list.)
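As a rough sketch of what the conv settings control, here is one common way a 3x3 convolution gets a low-rank companion (a 3x3 'down' conv into conv_dim channels followed by a 1x1 'up' conv). The channel counts are made up for illustration, and this is a concept sketch rather than LyCORIS's exact code:

```python
import torch
import torch.nn as nn

in_ch, out_ch = 320, 320              # illustrative channel counts
conv_dim, conv_alpha = 8, 1           # half of network_dim = 16, alpha kept at 1

base_conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)           # frozen
down = nn.Conv2d(in_ch, conv_dim, kernel_size=3, padding=1, bias=False)  # trained
up = nn.Conv2d(conv_dim, out_ch, kernel_size=1, bias=False)              # trained
nn.init.zeros_(up.weight)             # start as a no-op, same trick as the linear case

scale = conv_alpha / conv_dim         # 1 / 8

def adapted_conv(x: torch.Tensor) -> torch.Tensor:
    # Base output plus the low-rank convolutional update.
    return base_conv(x) + up(down(x)) * scale
```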
You should use LoCON when you need something exact - like a character. It also works for other things, though; it just tends to 'copy' rather than 'generalize'.
LoHA (LOw-rank HAdamard product)
Essentially, LoHA is two separately trained networks that, when combined, cover a much wider field of work. If we're still talking about cake, LoHA is not the icing between the cake layers but a trained pair of hands that puts the icing there. You train the two hands separately, but working in tandem they produce many different cakes that are still generally the same in flavour.
Because LoHA multiplies its two low-rank matrices together element by element, the effective rank of the result is the square of the rank you set. So Dimension = 8 acts like a LoCON with dimension 64. Therefore, there is absolutely no point in setting the Dimension higher than 64, since 64 squared is 4096.
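You can check the 'rank squares itself' claim directly by building a LoHA-style delta as the elementwise (Hadamard) product of two low-rank matrices; a small NumPy sketch with illustrative sizes:

```python
import numpy as np

out_features, in_features, r = 256, 256, 8    # illustrative sizes, r = Dimension

# Two separately trained low-rank 'hands' (each has rank <= 8).
w1 = np.random.randn(out_features, r) @ np.random.randn(r, in_features)
w2 = np.random.randn(out_features, r) @ np.random.randn(r, in_features)

delta = w1 * w2                               # Hadamard product, NOT a matrix product

print(np.linalg.matrix_rank(w1))              # 8
print(np.linalg.matrix_rank(delta))           # up to 64: the effective 'squared' rank
```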
You should try LoHA when you want to 'generalize' something - for example, a drawing style or some form of concept/pose.
It accepts the same parameters as LoCON - network_dim, network_alpha, conv_dim and conv_alpha.
You are most likely to end up loving LoHAs everywhere, not 'just for styles'. The reasoning is quite simple - it 'generalizes' instead of 'replicating', and therefore allows easier modification of the trained subject - adding new colors, poses, etc. A LoHA with dim 8 hits an excellent sweet spot.
LoKR (LOw-rank KRonecker product)
Basically the same idea as LoHA, but an attempt to make it more efficient by factoring the weights as a Kronecker product instead (sketched after the list below).
In addition to every setting from LoHA, it introduces a couple of changes and one new setting:
- Kronecker Factor: a value from -1 to 8. A smaller factor results in a smaller file; a larger factor behaves more like a regular LoRA.
- You can set the Dimension to 10000 or more. This prevents the second block from being decomposed and disables the Alpha value for that block as well.
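To see why the Kronecker factoring saves so many parameters, here is a small NumPy sketch of a LoKR-style delta for a 4096x4096 layer; the 64x64 split and rank are picked purely for illustration, while the real factorization is chosen from the Kronecker Factor setting:

```python
import numpy as np

f = 64                                      # illustrative block size
r = 8                                       # rank used to decompose the second block

c = np.random.randn(f, f)                   # first block, kept dense
b = np.random.randn(f, r) @ np.random.randn(r, f)   # second block, low-rank decomposed

delta = np.kron(c, b)                       # Kronecker product -> full 4096 x 4096 delta

full_params = 4096 * 4096                   # ~16.8M values for a dense update
lokr_params = f * f + 2 * f * r             # 4096 + 1024 = 5120 stored values
print(delta.shape, full_params, lokr_params)
```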
LoKR is harder to transfer between different models; if you trained it on base Stable Diffusion, you might not be able to run it on, for example, furry models.
Low LR vs High LR
High LR (0.0001 '1e-4' or higher)
Algorithms like LoCON need much less training time at this rate, but you might really miss out on some details. LoHA or LoKR may overtrain pretty quickly.
Low LR (0.00001 '1e-5' or lower)
Gives you a much better result, but requires more time AND may fail to find an optimal point if it falls into a pitfall (a local minimum). Note: when training on larger images (such as 640x640 or 768x768), you REALLY want to lower your LR.
Low Tag Dropout Rate vs High Tag Dropout Rate
Low Tag Dropout Rate (10% or less)
Requires you to write much more heavily tagged prompts when using the resulting LoRA, or it won't work.
High Tag Dropout Rate (20% or more)
Will likely bake the style into the LoRA, so if you are training a character or a concept, you might want to stay away from high dropout rates. However, a high dropout does give you more consistent results on SKS-less prompts when training a style, that's for sure.
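For context, tag dropout simply removes a random subset of comma-separated tags from each caption on every pass over the dataset. A minimal sketch of the idea (not kohya's actual code):

```python
import random

def drop_tags(caption: str, dropout_rate: float = 0.2) -> str:
    # Drop each comma-separated tag with probability dropout_rate.
    tags = [t.strip() for t in caption.split(",")]
    kept = [t for t in tags if random.random() >= dropout_rate]
    return ", ".join(kept) if kept else caption   # never return an empty caption

print(drop_tags("sks, 1girl, red hair, smiling, outdoors", dropout_rate=0.2))
```

The more often a tag goes missing, the more of its content gets absorbed into whatever remains in the caption (including the trigger word), which is exactly why high dropout bleeds style into the LoRA.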
AdamW8bit vs Lion8bit
AdamW8bit
Very good starting point. Generally good for small datasets (30-100 images). The lower the LR, the better it performs.
Lion8bit
An evolved AdamW. It reaches optima extremely quickly, so it either requires few epochs OR huge datasets (1000+ images). It seems to practically ignore the Learning Rate - it got overtrained for me even when I set it to 1e-6.
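Both optimizers come from the bitsandbytes package; here is a minimal sketch of swapping between them, assuming a recent bitsandbytes release that ships `Lion8bit` and using a placeholder module in place of the real network:

```python
import bitsandbytes as bnb
import torch.nn as nn

model = nn.Linear(768, 768)   # placeholder for the actual trainable LoRA parameters

# AdamW8bit: the safe starting point; pairs well with lower learning rates.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4)

# Lion8bit: converges much faster, so use a noticeably lower LR and/or fewer epochs.
optimizer = bnb.optim.Lion8bit(model.parameters(), lr=1e-5)
```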