Advanced AI Art Techniques: ControlNet, LoRA, Inpainting & Professional Workflows

Move beyond basic prompting. This guide covers the advanced tools and techniques that separate casual AI art users from professionals: ControlNet for precise composition, img2img for iterative refinement, inpainting for surgical edits, LoRA training for custom styles, and production-grade upscaling pipelines.

Quick Answer: What Are Advanced AI Art Techniques?

Advanced AI art techniques go beyond text prompting to give you precise control over composition, style, and detail. Key techniques include: ControlNet for structural guidance using edge detection, depth maps, and pose estimation; img2img for transforming existing images; inpainting for selectively editing regions; LoRA training for teaching the AI custom styles or characters; seed control for reproducible variations; multi-pass generation for layered refinement; and professional upscaling using Real-ESRGAN or Topaz Gigapixel for print-ready output.

ControlNet Deep Dive

ControlNet is the single most important advancement in AI art for people who want precise control over their outputs. It works as an extension to Stable Diffusion that adds a parallel neural network, allowing you to condition image generation on spatial information extracted from reference images. In practical terms, it lets you define the exact structure, pose, or layout of your image while still using prompts to control style and content.

How ControlNet Works

ControlNet uses preprocessors to extract specific types of structural information from a reference image, then feeds that information into the generation process alongside your text prompt. The result is an image that follows the spatial structure of your reference but renders with the content and style described in your prompt.

Key Preprocessors and When to Use Them

  • Canny Edge Detection: Extracts sharp edge outlines from an image. Best for: architectural scenes, mechanical objects, precise structural reproduction. Use when you want the exact silhouette and contours of your reference preserved.
  • Depth Map: Estimates the distance of each pixel from the camera, creating a grayscale depth image. Best for: landscape compositions, maintaining spatial relationships, creating parallax-like depth. Use when you care about foreground/background separation but not exact edges.
  • OpenPose: Detects human body poses and outputs a skeleton diagram. Best for: character art with specific poses, group compositions, action scenes. Use when you need characters in exact positions but want creative freedom in their appearance.
  • Lineart: Extracts clean line drawings from images. Best for: converting sketches to finished art, maintaining illustrative structure. Use when working from hand-drawn input or when you want clean, defined shapes.
  • Scribble: Accepts rough, hand-drawn scribbles as spatial guidance. Best for: rapid prototyping from rough sketches. Use when you have a loose concept and want the AI to interpret it with artistic freedom.
  • Tile: Preserves texture and detail patterns from a reference while allowing stylistic changes. Best for: texture transfer, maintaining fine detail consistency during upscaling. Use for upscale workflows and texture-preservation tasks.
  • IP-Adapter: Strictly a separate adapter rather than a ControlNet preprocessor, though most UIs group it with ControlNet. Uses a reference image to guide the overall style and composition without structural extraction. Best for: style transfer, mood matching, visual consistency across a series.

ControlNet Configuration

Two critical parameters control how strongly ControlNet influences your output:

  • Control Weight (0.0-2.0): How much influence the ControlNet has. Default is 1.0. Lower values (0.3-0.7) give the AI more creative freedom; higher values (1.0-1.5) enforce stricter adherence to the reference structure. Going above 1.5 often causes artifacts.
  • Guidance Start/End (0.0-1.0): When during the denoising process ControlNet activates. Starting at 0.0 and ending at 1.0 (default) applies influence throughout. Setting end to 0.5 lets ControlNet define the rough structure early, then allows the AI to freely add detail in the second half of generation. This produces cleaner, more natural results for many use cases.
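The guidance start/end behavior can be sketched as a simple gate over the denoising schedule. The helper below is illustrative only (function names are assumptions, not any tool's API); real UIs such as AUTOMATIC1111 implement this gating internally.

```python
# Sketch of how guidance start/end gate ControlNet during denoising.
# Hypothetical helper; illustrates the logic, not a real tool's API.

def controlnet_active(step: int, total_steps: int,
                      guidance_start: float, guidance_end: float) -> bool:
    """Return True if ControlNet influences this denoising step."""
    fraction = step / total_steps  # 0.0 at the first step, approaching 1.0 at the end
    return guidance_start <= fraction <= guidance_end

# With guidance_end=0.5 over 20 steps, ControlNet shapes only the first half:
active_steps = [s for s in range(20) if controlnet_active(s, 20, 0.0, 0.5)]
print(active_steps)  # steps 0 through 10: structure fixed early, detail free later
```

This makes the tradeoff concrete: a lower guidance end hands the later (detail-rendering) steps entirely to the prompt.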

Advanced ControlNet Prompt Example

masterful oil portrait of a Renaissance noblewoman, rich velvet dress, pearl jewelry, chiaroscuro lighting, warm candlelight, Rembrandt style, museum quality, ultra-detailed brushwork, 8K --controlnet canny --weight 0.85 --guidance_end 0.7

This prompt uses Canny edge detection from a reference photo to maintain the subject's exact pose and proportions while completely transforming the style into a Renaissance oil painting. The guidance end of 0.7 allows the AI to freely handle fine detail rendering in the final 30% of the generation process.

img2img Workflows

Image-to-image (img2img) generation takes an existing image as a starting point and transforms it according to your prompt. Unlike txt2img (which starts from pure noise), img2img begins with the structure, colors, and composition of your input image and modifies it to varying degrees based on the denoising strength you set.

Denoising Strength: The Critical Parameter

Denoising strength (0.0-1.0) controls how much the AI changes the input image:

  • 0.1-0.3 (Low): Subtle changes. Colors shift, minor details change, but the overall composition and structure remain nearly identical. Good for color correction, subtle style adjustments, and gentle refinement.
  • 0.4-0.6 (Medium): Significant transformation. Structure is partially preserved, but the AI has room to reinterpret elements. Ideal for style transfer and moderate composition changes.
  • 0.7-0.9 (High): Major transformation. Only the broad composition and color palette of the original are preserved. The AI aggressively reinterprets the image. Good for converting rough sketches into finished art.
  • 1.0: Complete regeneration. The input image has virtually no influence — this is functionally equivalent to txt2img with the same seed.
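One concrete way to see why denoising strength behaves this way: following the convention used by diffusers-style img2img pipelines, only the last fraction of the noise schedule proportional to the strength is actually run, so low strength means few denoising steps and small changes.

```python
# How denoising strength maps to work actually done in an img2img pass,
# per the common convention of running only the final `strength` fraction
# of the noise schedule.

def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually executed in an img2img pass."""
    return min(int(num_inference_steps * strength), num_inference_steps)

for strength in (0.2, 0.5, 0.75, 1.0):
    print(strength, effective_steps(30, strength))
# 0.2 -> 6 steps (subtle), 0.5 -> 15, 0.75 -> 22, 1.0 -> 30 (full regeneration)
```

At strength 1.0 the full schedule runs from pure noise, which is why it is functionally txt2img.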

Practical img2img Workflows

Sketch to Finished Art

Draw a rough sketch (even on paper, photographed with your phone), use it as the img2img input with denoising strength 0.6-0.8, and provide a detailed prompt describing the final desired style. The AI preserves your composition while rendering it as finished artwork.

[img2img from rough pencil sketch, denoising: 0.75] detailed fantasy landscape, towering crystal spires rising from a misty valley, bioluminescent flora, ethereal twilight atmosphere, concept art, digital painting, vivid colors, highly detailed, artstation trending

Style Transfer

Take a photograph and convert it to a completely different artistic style while preserving the composition. Use denoising strength 0.45-0.65.

[img2img from photograph, denoising: 0.55] Studio Ghibli style animation background, hand-painted, soft watercolor textures, warm pastel colors, whimsical atmosphere, Hayao Miyazaki inspired, detailed environment art

Iterative Refinement

Use the output of one generation as the input for the next. Each pass refines detail and consistency. Start with high denoising (0.7) for the first pass, reduce to 0.3-0.4 for subsequent passes. This builds complexity layer by layer while maintaining coherence.

[img2img iterative pass 3, denoising: 0.35] same scene, enhanced detail on foreground foliage, sharper textures on stone architecture, improved water reflections, maintain existing color palette and composition, 8K ultra-detailed
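The iterative-refinement loop above can be sketched as a denoising schedule: high strength on the first pass, then a fixed low value for every refinement pass. `generate_img2img` is a stand-in name for whatever img2img call your tool exposes, not a real API.

```python
# Minimal sketch of iterative img2img refinement. `generate_img2img` is
# hypothetical -- substitute your tool's actual img2img call.

def refinement_schedule(passes: int, first: float = 0.7,
                        later: float = 0.35) -> list[float]:
    """High denoising on pass 1, low denoising on all later passes."""
    return [first] + [later] * (passes - 1)

def refine(image, prompt, passes=3):
    for strength in refinement_schedule(passes):
        image = generate_img2img(image, prompt, denoising_strength=strength)  # hypothetical call
    return image

print(refinement_schedule(3))  # [0.7, 0.35, 0.35]
```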

Inpainting and Outpainting

Inpainting and outpainting are surgical editing tools that let you modify specific parts of an image without regenerating the whole thing. They are essential for professional-quality output where you need to fix details, add elements, or extend compositions.

Inpainting: Selective Regeneration

Inpainting works by masking (painting over) the area you want to change, then generating new content to fill that masked area while blending seamlessly with the surrounding image. The unmasked areas remain completely untouched.

Key Inpainting Settings

  • Mask blur (0-64 pixels): How much the mask edges are softened. Higher values create smoother blending between old and new content. For most work, 4-12 pixels is effective.
  • Inpaint area: "Whole picture" processes the entire image at the set resolution (better context understanding). "Only masked" processes just the masked area at full resolution (better detail in the repainted region). Use "only masked" for small detail fixes; "whole picture" for larger compositional changes.
  • Masked content: "Original" initializes the masked area from the existing image content. "Latent noise" starts from random noise. Use "original" for refinements (fixing hands, changing expressions) and "latent noise" when you want completely new content in the masked area.
  • Denoising strength: For inpainting, 0.4-0.7 works best for most corrections. Lower values preserve more of the original masked content; higher values allow more dramatic changes.
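What mask blur actually does at composite time can be shown in a 1-D sketch: a softened mask mixes newly generated pixels into the original instead of hard-cutting them in. This is a simplified box blur for illustration; real tools apply a Gaussian blur over the configured pixel radius.

```python
# 1-D sketch of mask blur and compositing in inpainting. Simplified:
# real tools use Gaussian blur, and operate on 2-D latents/pixels.

def blur_mask(mask: list[float], radius: int) -> list[float]:
    """Simple box blur standing in for the `mask blur` setting."""
    out = []
    for i in range(len(mask)):
        window = mask[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def composite(original, generated, mask):
    """Per-pixel blend: soft mask edges blend old and new content smoothly."""
    return [o * (1 - m) + g * m for o, g, m in zip(original, generated, mask)]

hard = [0, 0, 1, 1, 1, 0, 0]        # binary inpaint mask
soft = blur_mask(hard, radius=1)     # softened edges (~0.33 at the seam)
print(composite([10] * 7, [90] * 7, soft))
```

Higher blur radii widen the transition band, which is why larger values give smoother blending.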

Inpainting Prompt Example

[Inpaint mask over hands, denoising: 0.55, mask blur: 8, only masked area, original content] beautifully detailed hands, correct anatomy, five fingers per hand, natural pose, matching skin tone and lighting of surrounding image, photorealistic, sharp detail

Outpainting: Extending the Canvas

Outpainting expands your image beyond its original borders. The AI generates new content that seamlessly continues the existing scene in any direction. This is invaluable for adjusting aspect ratios, adding environmental context, or creating panoramic versions of existing images.

Outpainting Best Practices

  • Extend in small increments (128-256 pixels at a time) for the most seamless results.
  • Include generous overlap between the existing image and the extension area so the AI has strong context for blending.
  • Use a prompt that describes the overall scene and specifically the content you expect in the extended area.
  • Match the denoising strength to the complexity of the extension. Simple sky or ground extensions work well at 0.5-0.7. Complex scene continuations may need 0.7-0.85.

[Outpaint 256px to the right] continuation of the forest path, same autumn foliage and lighting, trees becoming denser, soft fog in the distance, maintaining consistent perspective, golden hour light filtering through canopy, photorealistic

LoRA Training Basics

LoRA (Low-Rank Adaptation) is a fine-tuning technique that lets you teach a Stable Diffusion model new concepts without retraining the entire model. A LoRA adds a small set of trainable parameters (typically 10-200MB) that modify how the base model behaves when you use specific trigger words. This is how you create consistent characters, replicate specific art styles, or teach the model concepts it does not know.

What You Can Train a LoRA For

  • Style LoRA: Captures a specific artistic style from 15-30 reference images. Once trained, any prompt combined with your trigger word will render in that style.
  • Character LoRA: Learns the visual appearance of a specific character (real or fictional) from 10-20 reference images at different angles and lighting conditions. Enables consistent character reproduction across many different scenes and poses.
  • Concept LoRA: Teaches the model a specific object, texture, pattern, or visual concept it does not handle well natively. Useful for niche subjects, branded elements, or unusual visual concepts.

Preparing Your Training Dataset

Data quality is the most important factor in LoRA training. Follow these guidelines:

  • Image count: 15-30 images for styles, 10-20 for characters, 20-40 for complex concepts.
  • Resolution: All images should be at least 512x512 pixels. 768x768 or 1024x1024 is better. Crop and resize consistently.
  • Variety: For characters, include multiple angles, expressions, lighting conditions, and backgrounds. For styles, include a range of subjects rendered in the target style.
  • Consistency: Remove any images that do not clearly represent the target concept. One off-topic image can skew the entire training.
  • Captioning: Each training image needs a text caption describing what is in it. Use BLIP or WD Tagger for automatic captioning, then manually review and correct. Include your chosen trigger word in every caption.

Key Training Parameters

  • Learning rate: Start at 1e-4 for most LoRAs. Lower (5e-5) if you are overfitting, higher (2e-4) if the LoRA is not learning effectively.
  • Training steps: Typically 1500-3000 steps for a style LoRA with 20 images. Total steps = (images x repeats x epochs) / batch size, so tune the repeats multiplier to land in the target range.
  • Network rank (dim): Controls the LoRA's capacity. Rank 16-32 works for most use cases. Higher ranks (64-128) capture more detail but risk overfitting and produce larger files.
  • Network alpha: Scaling factor, typically set equal to the rank or half the rank. Alpha = rank is a safe default.
  • Optimizer: AdamW8bit or Prodigy. Prodigy automatically adjusts the learning rate and is increasingly the recommended choice.
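The training-length arithmetic above is worth making explicit, since it is the main lever for hitting the 1500-3000 step target. This follows the step-counting convention used by kohya-style trainers; the helper name is an assumption.

```python
# Step-count math for LoRA training, kohya-style convention:
# optimizer steps scale with images, repeats, and epochs, divided by batch size.

def total_steps(images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    return (images * repeats * epochs) // batch_size

# 20 style images, 10 repeats, 10 epochs, batch size 1 -> 2000 steps,
# inside the suggested 1500-3000 range.
print(total_steps(20, 10, 10))  # 2000
print(total_steps(20, 10, 10, batch_size=2))  # 1000 -- raise repeats or epochs
```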

Using Your Trained LoRA

a majestic dragon perched on a mountain peak at sunset, dramatic storm clouds, volumetric lighting, epic fantasy scene <lora:my_custom_style:0.8>

The <lora:name:weight> syntax activates your LoRA with a given strength. Weight 0.6-0.9 is typical. Lower weights blend more subtly; higher weights apply the style more aggressively. You can combine multiple LoRAs in a single prompt by including multiple lora tags, though total combined weight should generally stay under 1.5 to avoid artifacts.
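A small checker makes the combined-weight guideline easy to enforce: extract every `<lora:name:weight>` tag from a prompt and flag totals over 1.5. The regex and helper are illustrative, not part of any tool's API.

```python
# Parse <lora:name:weight> tags from a prompt and check combined weight
# against the ~1.5 guideline. Illustrative helper, not a real tool's API.
import re

LORA_TAG = re.compile(r"<lora:([\w.-]+):([\d.]+)>")

def lora_weights(prompt: str) -> dict[str, float]:
    return {name: float(w) for name, w in LORA_TAG.findall(prompt)}

prompt = ("a majestic dragon at sunset <lora:my_custom_style:0.8> "
          "<lora:detail_tweaker:0.9>")
weights = lora_weights(prompt)
print(weights)  # combined weight 1.7 -- over the 1.5 guideline

if sum(weights.values()) > 1.5:
    print("warning: combined LoRA weight may cause artifacts")
```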

Seed Control and Variation

Every AI image generation starts from a random noise pattern, and the seed is the number that determines this starting pattern. The same seed + same prompt + same settings will always produce the same image. Understanding seed control unlocks reproducibility and systematic variation.

Why Seeds Matter

  • Reproducibility: Found an image you like? Record the seed and you can regenerate it exactly. Change the prompt slightly while keeping the same seed, and the overall composition will remain similar while incorporating your changes.
  • A/B Testing: Lock the seed and change one variable at a time (style, lighting, mood). This isolates the effect of each change, letting you learn exactly what each keyword does.
  • Batch Consistency: When creating a series of related images (story panels, product variants), using related seeds or the same seed with slight prompt variations helps maintain visual coherence.

Variation Seeds

Variation seeds (also called subseeds) let you explore controlled departures from a base composition. The variation strength (0.0-1.0) determines how far the result deviates from the original seed's composition:

  • 0.0-0.1: Nearly identical to the original. Micro-variations in fine detail.
  • 0.1-0.3: Same composition, noticeably different details. Good for finding the best version of a concept.
  • 0.3-0.5: Recognizably related but with significant differences. Good for creating series with visual family resemblance.
  • 0.5+: Substantially different. Only broad compositional echoes of the original remain.

ethereal forest temple, ancient stone columns wrapped in luminous vines, soft beam of light through canopy, fantasy environment, detailed matte painting, 4K --seed 42 --subseed 100 --subseed_strength 0.2
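Under the hood, A1111-style variation seeds work by spherically interpolating (slerp) between the noise generated from the base seed and the noise from the subseed, with variation strength as the mixing factor. The sketch below illustrates this on plain vectors; the `noise` helper is a stand-in for the real latent Gaussian noise.

```python
# Sketch of variation-seed mixing: slerp between base-seed noise and
# subseed noise. `noise` is a toy stand-in for latent Gaussian noise.
import math
import random

def noise(seed: int, n: int = 4) -> list[float]:
    rng = random.Random(seed)  # deterministic: same seed -> same "noise"
    return [rng.gauss(0, 1) for _ in range(n)]

def slerp(a: list[float], b: list[float], t: float) -> list[float]:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    omega = math.acos(max(-1.0, min(1.0, dot / norm)))
    sa, sb = math.sin((1 - t) * omega), math.sin(t * omega)
    return [(sa * x + sb * y) / math.sin(omega) for x, y in zip(a, b)]

base, variant = noise(42), noise(100)
low = slerp(base, variant, 0.0)   # strength 0.0: (numerically) the seed-42 noise
mixed = slerp(base, variant, 0.2)  # strength 0.2: mostly seed 42, a little seed 100
```

At strength 0.0 you recover the base seed's noise; at 1.0 you get pure subseed noise, which is why low strengths produce near-identical images.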

Multi-Pass Generation

Multi-pass generation is the practice of building an image through sequential stages, where each pass refines or adds to the previous result. This technique produces significantly more detailed and coherent images than any single generation pass can achieve, and it is how most professional AI artists work.

The Three-Pass Workflow

Pass 1: Composition (txt2img)

Generate at a moderate resolution (512x768 or 768x1024) with your full prompt. Focus on getting the right composition, subject placement, and overall color scheme. Do not worry about fine detail yet — it will be built in later passes. Run multiple seeds until you find a composition you like.

[Pass 1 - txt2img at 768x1024] a vast underwater city with bioluminescent coral architecture, schools of exotic fish, shafts of light from the surface above, deep ocean blue and turquoise palette, fantasy concept art, wide establishing shot

Pass 2: Detail Enhancement (img2img)

Take the best output from Pass 1 and run it through img2img at higher resolution (1024x1536 or higher) with denoising strength 0.35-0.5. Refine the prompt to emphasize detail you want enhanced. This pass adds texture, sharpens features, and fills in fine detail while preserving the composition you chose.

[Pass 2 - img2img at 1536x2048, denoising: 0.4] same underwater city, enhanced architectural detail on coral buildings, intricate window patterns, visible fish scales and fin detail, light caustic patterns on surfaces, enhanced depth and atmosphere, ultra-detailed, 8K

Pass 3: Regional Refinement (Inpainting)

Identify any areas that need correction or enhancement. Use inpainting to selectively regenerate problematic regions: fix faces, correct anatomy, add missing detail, or adjust elements that feel off. This is where you bring the image from "good" to "excellent."

[Pass 3 - Inpaint focal building, denoising: 0.5] ornate coral palace with detailed arched windows, bioluminescent veins running through translucent walls, jellyfish lanterns floating at entrances, intricate carved columns, matching surrounding ocean lighting

Advanced Multi-Pass Techniques

  • ControlNet chaining: Extract a depth map or edge map from Pass 1 output and use it as ControlNet input for Pass 2. This locks the composition while allowing the AI to completely re-render the detail.
  • Resolution stepping: Generate at 512x, refine at 1024x, final pass at 2048x. Each resolution step captures detail the previous step could not represent.
  • Style pass: After composition and detail are finalized, do a final img2img pass at low denoising (0.15-0.25) with a style-focused prompt to unify the aesthetic of the entire image.
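The resolution-stepping idea reduces to a tiny schedule helper: double resolution each pass until the target is reached. The function name is an assumption for illustration.

```python
# Resolution stepping: each pass doubles resolution until the target,
# capturing detail the previous resolution could not represent.

def resolution_steps(start: int, target: int) -> list[int]:
    steps = [start]
    while steps[-1] < target:
        steps.append(min(steps[-1] * 2, target))
    return steps

print(resolution_steps(512, 2048))  # [512, 1024, 2048]
```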

Upscaling Pipelines

AI art tools typically generate images at 512x512 to 1024x1024 pixels. For print, large-format display, or commercial use, you need to upscale to much higher resolutions. The right upscaling pipeline makes the difference between a pixelated mess and a crisp, detail-rich final image.

Real-ESRGAN

Real-ESRGAN is the most widely used free AI upscaler. It uses a generative adversarial network to intelligently add detail while increasing resolution, rather than simply interpolating pixels.

  • 4x-UltraSharp: The best general-purpose model. Produces crisp, detailed results for most AI art styles. Excellent at recovering texture detail.
  • 4x-AnimeSharp: Specialized for anime and cartoon art. Produces cleaner lines and flatter color regions appropriate for cel-shaded styles.
  • ESRGAN-4x: The original model. Still effective but generally outperformed by UltraSharp for photorealistic content.

Best practice: Upscale 2x twice rather than 4x once. Two passes of 2x upscaling produce cleaner results than a single 4x pass, with fewer hallucinated artifacts.

Topaz Gigapixel AI

Topaz is the premium commercial option and currently produces the highest quality upscaling results, especially for faces, fine text, and complex textures. Key advantages:

  • Superior face recovery — facial features remain sharp and natural at extreme upscale ratios.
  • Intelligent detail synthesis that matches the style of the original image.
  • Batch processing for high-volume workflows.
  • Supports up to 6x upscaling in a single pass.

Best practice: Use Topaz's "AI model" selection to match your content type (Standard, High Fidelity, Art & CG). The Art & CG model is specifically tuned for AI-generated artwork.

Stable Diffusion Hires Fix (Tiled Upscaling)

For Stable Diffusion users, the hires fix with tiled upscaling is a unique approach: it upscales the image while simultaneously running a second generation pass, adding AI-generated detail that is consistent with the original prompt. This produces the most prompt-faithful upscaled images because the AI is actively generating detail rather than just interpolating.

  • Set hires upscaler to ESRGAN_4x or SwinIR.
  • Hires denoising strength: 0.3-0.5. Lower preserves more of the original; higher regenerates more detail.
  • Enable tiled VAE if you encounter memory limits at high resolutions.

Recommended Pipeline for Print-Ready Output

  1. Generate at 1024x1024 (or native resolution of your model).
  2. Run img2img refinement pass at 1.5x resolution (denoising 0.35).
  3. Inpaint any problem areas.
  4. Upscale 2x with Real-ESRGAN UltraSharp.
  5. Upscale another 2x with Topaz Gigapixel (or second Real-ESRGAN pass).
  6. Final result: roughly 6144x6144 from a 1024x1024 start (anywhere from 4096x4096 to 8192x8192 depending on base resolution and pass ratios) at print quality.
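The resolution bookkeeping for this pipeline is simple multiplication, which makes it easy to plan backward from a target print size. The helper below just chains the stage ratios; the exact final figure depends on your base resolution.

```python
# Resolution through the print pipeline: base, then each stage's scale ratio.

def pipeline_resolution(base: int, ratios: list[float]) -> list[int]:
    sizes = [base]
    for r in ratios:
        sizes.append(int(sizes[-1] * r))
    return sizes

# 1024 base -> 1.5x img2img refine -> 2x ESRGAN -> 2x Topaz/ESRGAN
print(pipeline_resolution(1024, [1.5, 2, 2]))  # [1024, 1536, 3072, 6144]
```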

Professional Post-Processing

Post-processing is the final stage that transforms a raw AI generation into a polished, professional image. Even the best AI output benefits from targeted adjustments. Here is the professional post-processing workflow used by serious AI artists.

Color Grading

AI-generated images often have slightly off color balance or lack the color unity of professionally graded work. In Photoshop, Lightroom, or the free GIMP/Photopea alternatives:

  • Adjust white balance to match your intended mood (warmer for golden hour, cooler for twilight).
  • Use curves adjustments to set black and white points, establishing proper contrast range.
  • Apply a subtle color grade using HSL sliders or a LUT to unify the palette.
  • Boost or reduce saturation selectively — AI images sometimes oversaturate certain hues.

Sharpening and Noise

  • Apply subtle sharpening (unsharp mask or high-pass filter) to the focal point of the image.
  • Add a light film grain layer to reduce the "too clean" digital feel that AI art sometimes has. This is especially effective for photorealistic styles.
  • For images that will be printed, apply output sharpening calibrated to your print medium (matte paper needs more sharpening than glossy).

Compositing and Touch-Ups

  • Use clone stamp or healing brush to fix small artifacts the AI missed.
  • Add lens effects (vignetting, chromatic aberration, lens flare) for photorealistic images to sell the "captured by a real camera" illusion.
  • Consider adding depth-of-field blur to background elements to guide the viewer's eye to your focal point.
  • For compositing multiple AI-generated elements, match lighting direction and color temperature between layers. Inconsistent lighting is the fastest way to break the illusion.

Export Settings

  • Web/Social: JPEG at quality 85-92, sRGB color space, 72 DPI. Target long edge under 2048px for fast loading.
  • Print: TIFF or PNG, Adobe RGB or ProPhoto RGB color space, 300 DPI at final print size. Embed the color profile.
  • Portfolio/Archive: PNG lossless, full resolution, with metadata preserved. Always keep a lossless archive copy of your best work.
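Whether an export is big enough for a given print comes down to the DPI arithmetic above: pixels needed on each edge = print inches x 300. A quick check, with the function name as an assumption:

```python
# Is an image large enough to print at the target size? Each edge needs
# at least (inches x DPI) pixels; 300 DPI is the print standard above.

def print_ready(width_px: int, height_px: int,
                width_in: float, height_in: float, dpi: int = 300) -> bool:
    return width_px >= width_in * dpi and height_px >= height_in * dpi

print(print_ready(6144, 6144, 20, 20))  # True: 20 in x 300 DPI = 6000 px needed
print(print_ready(4096, 4096, 16, 16))  # False: 16 in x 300 DPI = 4800 px needed
```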

Frequently Asked Questions

What is ControlNet?

ControlNet is a neural network architecture that adds precise spatial control to Stable Diffusion. It extracts structural information from a reference image (edges, depth, pose, etc.) using preprocessors, then uses that structure to guide the generation process. This lets you control the exact composition, pose, and layout of your output while still using text prompts for style and content. It works as an extension installed alongside your Stable Diffusion setup.

What is a LoRA?

LoRA (Low-Rank Adaptation) is a lightweight fine-tuning method that lets you teach an AI model new concepts, styles, or characters without retraining the entire model. A LoRA file is typically 10-200MB (compared to full model checkpoints at 2-7GB) and can be combined with base models to add specific capabilities. You activate a LoRA by including <lora:name:weight> in your prompt. Thousands of community-created LoRAs are available on Civitai.

How do I fix bad hands in AI-generated images?

Use inpainting to selectively regenerate just the hands area with a detailed prompt specifying "detailed hands, correct anatomy, five fingers." Use a moderate denoising strength (0.4-0.6) and higher CFG scale. ControlNet with OpenPose can also help by providing correct hand pose reference. Adding "malformed hands, extra fingers, mutated hands" to your negative prompt helps during initial generation. Newer model versions (SDXL, SD3, Flux) have significantly improved hand generation compared to earlier models.

What is the difference between inpainting and outpainting?

Inpainting regenerates selected areas within an existing image — you mask the area you want to change and the AI fills it with new content matching your prompt while preserving everything outside the mask. Outpainting extends the image beyond its original borders, generating new content that seamlessly continues the scene in any direction. Both are essential for professional workflows where you need precise control over specific regions of your image.

Which upscaler should I use?

For most AI art, Real-ESRGAN (specifically the 4x-UltraSharp model) offers the best balance of quality and speed, and it is free. Topaz Gigapixel AI is the premium option with superior detail recovery, especially for faces and textures. For anime-style art, Real-ESRGAN-anime provides specialized upscaling. Within Stable Diffusion, the hires fix with tiled upscaling is effective for maintaining prompt-consistent detail during the upscale process.

How many images do I need to train a LoRA?

For a style LoRA, 15-30 high-quality representative images is a good starting point. For a character LoRA, 10-20 images showing the character from different angles and lighting conditions works well. Quality matters far more than quantity — carefully curated, consistent images will produce better LoRAs than hundreds of random images. Training typically takes 30-90 minutes on a modern GPU with 8GB+ VRAM.