• j4k3@lemmy.world
    1 month ago

    SD3 is super powerful under the hood. The tools provided in ComfyUI are just a start: there is a ton of code running that thing and a ton of potential to modify its behavior. I suspect these are image-to-image composites, but with SD3 there are 16 layers. I have no idea what half the stuff they are doing is. Like with the Google T5-XXL LLM, PyTorch is used to swap out a whole layer: not a custom-trained model, not a LoRA fine-tune layer, but an entire layer swapped wholesale. I don't even know where to start dissecting what is going on in that paradigm. From what I've seen, the example tools for SD3 certainly can't do this, but my intuition says there is a way within that toolchain.
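    For anyone curious what "swapping a whole layer" mechanically looks like in PyTorch, here is a minimal sketch. This is not the real T5-XXL or anyone's actual pipeline; `TinyEncoder` and all its sizes are invented stand-ins, and in practice the replacement layer would be loaded from another fine-tuned checkpoint rather than freshly initialized:

    ```python
    import torch
    import torch.nn as nn

    # Toy stand-in for a stack of transformer blocks (hypothetical sizes).
    class TinyEncoder(nn.Module):
        def __init__(self, num_layers=4, dim=32):
            super().__init__()
            self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

        def forward(self, x):
            for layer in self.layers:
                x = torch.relu(layer(x))
            return x

    model = TinyEncoder()

    # Swapping a whole layer: assign a new module into the ModuleList by index.
    # In a real workflow this module would come from a different checkpoint.
    replacement = nn.Linear(32, 32)
    model.layers[2] = replacement

    assert model.layers[2] is replacement
    out = model(torch.randn(1, 32))
    print(out.shape)  # torch.Size([1, 32])
    ```

    The point is just that `nn.ModuleList` entries are ordinary attributes, so a whole block can be replaced in one assignment without any LoRA adapters or retraining of the rest of the stack.
    
    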

    What I have done is follow Two Minute Papers on YT by Dr. Károly Zsolnai-Fehér, who is a researcher in this space. From my usual high-level view, I'm aware of the general limitations in complexity. There is a series of three images this person has uploaded where something just didn't feel quite right to me. This was the third, and the one where I chose to say something, mostly because it was less obvious than the other two. There is also a coincidental timing between this person and another account I find curious, but that is an aside. I don't much care what they are doing or why; if anything, assuming I'm correct, I admire them. But it is still highly speculative.

    Anyways, generative AI still struggles with complex environmental reflected light, and especially color. Each of the three images looks like I am in my old photo studio and placed a softbox to the side. It makes the subject pop subtly. It almost looks like a green-screen-style setup, though not as extreme as even a high-quality one. There is an unnatural, monochrome-like consistency to the reflections.

    I did a lot of low-light product photography in a makeshift studio, and spent a lot of time playing with this kind of lighting for accents and hair lights. There is a familiar artificial-lighting quality here, the sort of thing that should be easy to train into a model with captions. I expect that simplicity is still present in accessible generative models.

    If really good image search were still possible, I bet this background exists somewhere obscure on the internet with a different subject entirely. I could easily be wrong. This was just a stack of three images that all felt around 55-60% likely to be diffusion AI to me. Calling it out is a fun puzzle game now. I would not bet the farm, but I might wager a coffee that it's a gen.