Without the heavy processing overhead of older models, this new imaging model creates images in seconds while running on mainstream GPUs.
Stability AI has released Stable Diffusion, its text-to-image generation model, to researchers and the public. The model runs on mainstream consumer GPUs and generates 512×512-pixel images in seconds.
Because it avoids the heavy processing overhead of previous models, it dramatically speeds up image generation. The model is released under a Creative ML OpenRAIL-M license that permits both commercial and non-commercial use, and the software package includes a safety classifier so that users can suppress unwanted outputs.
Researchers and commercial users are encouraged to provide feedback on the image model and to flag discrepancies between prompts and the final images. The organization notes that the models were trained on image-text pairs scraped broadly from the internet and may therefore reflect some biases. With this feedback, the team is confident it can improve the model to reduce, and eventually eliminate, those biases.
Team plans future datasets to expand generation options
The release also lays the groundwork for future datasets and projects expected at a later date, and will provide the basis for an open synthetic dataset for research. The team will continue to share updates as it refines new models, and it is still accepting collaborators to troubleshoot remaining issues and refine the output.
The ultimate goal is to reduce the processing required to create models and to allow more developers to take advantage of image generation for their projects. Patrick Esser of Runway and Robin Rombach of the Machine Vision & Learning research group at LMU Munich (formerly the CompVis lab at Heidelberg University) paved the way for the release, building on their earlier work on latent diffusion models presented at CVPR 2022. The communities at EleutherAI and LAION, along with Stability AI's generative AI team, also offered full support.