Analyzers

Analyzers return detailed, structured insights about images: depth maps, color palettes, composition scores, and scene classifications.

DepthAnalyzer

Monocular depth estimation using transformer-based models.

Models

| Model | Description |
| --- | --- |
| depth-anything-v2-small | Fast, good accuracy (default) |
| depth-anything-v2-base | Better accuracy, slower |
| midas-small | Intel MiDaS hybrid model |

Basic Usage

from PIL import Image
from lilith_content_understanding import DepthAnalyzer

analyzer = DepthAnalyzer()
image = Image.open("photo.jpg")

result = analyzer.estimate(image)

# Inspect depth map size and original depth range
print(f"Size: {result.width}x{result.height}")
print(f"Depth range: {result.min_depth:.2f} to {result.max_depth:.2f}")

# Save visualization
result.save_visualization("depth.png", colormap="magma")

# Get PIL Image
depth_image = result.to_pil(colormap="viridis")

# Query specific point
depth_at_center = result.get_depth_at(0.5, 0.5)  # Normalized coords

# Get foreground mask
foreground = result.get_foreground_mask(threshold=0.3)

# Segment into depth layers
layers = result.get_depth_layers(num_layers=3)

Configuration

analyzer = DepthAnalyzer(
    model_name="depth-anything-v2-small",  # Model to use
    device="cuda",  # Force GPU
)

DepthResult Fields

| Field | Type | Description |
| --- | --- | --- |
| depth_map | NDArray[float32] | 2D array of normalized depths (0-1) |
| width | int | Depth map width |
| height | int | Depth map height |
| min_depth | float | Original minimum depth value |
| max_depth | float | Original maximum depth value |
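
The fields above suggest that depth_map holds normalized values while min_depth and max_depth keep the original range. The sketch below shows how the two could be related, and how normalized coordinates like those passed to get_depth_at could map to pixel indices. It assumes linear min-max normalization, which this page does not confirm, and both helper functions are hypothetical and not part of the package.

```python
def denormalize_depth(normalized: float, min_depth: float, max_depth: float) -> float:
    """Map a normalized depth in [0, 1] back to the original range.

    Assumes linear min-max normalization (an assumption, not documented behavior).
    """
    return min_depth + normalized * (max_depth - min_depth)


def normalized_to_pixel(x: float, y: float, width: int, height: int) -> tuple[int, int]:
    """Convert normalized (x, y) coordinates in [0, 1] to (column, row) indices."""
    # Clamp so that x == 1.0 or y == 1.0 still lands on the last pixel.
    col = min(int(x * width), width - 1)
    row = min(int(y * height), height - 1)
    return col, row


# Made-up values: a normalized depth of 0.5 in a map whose original
# range was 2.0 to 10.0, queried at the center of a 640x480 map.
print(denormalize_depth(0.5, 2.0, 10.0))
print(normalized_to_pixel(0.5, 0.5, 640, 480))
```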

Colormaps

Available colormaps for visualization:

  • magma (default) - Good for depth
  • viridis - Perceptually uniform
  • plasma - High contrast
  • inferno - Alternative to magma

ColorAnalyzer

Color palette extraction using k-means clustering.

Basic Usage

from PIL import Image
from lilith_content_understanding import ColorAnalyzer

analyzer = ColorAnalyzer()
image = Image.open("photo.jpg")

result = analyzer.extract_palette(image, num_colors=5)

# Get colors
print(f"Hex colors: {result.hex_colors}")
print(f"Dominant: {result.dominant_color.hex}")

# Color analysis
print(f"Harmony: {result.harmony_type}")
print(f"Mood: {result.mood}")
print(f"Avg saturation: {result.average_saturation:.1f}%")
print(f"Avg lightness: {result.average_lightness:.1f}%")

# Individual colors
for color in result.colors:
    print(f"{color.name}: {color.hex} ({color.percentage:.1f}%)")
    print(f"  RGB: {color.rgb}")
    print(f"  HSL: H={color.hsl[0]:.0f} S={color.hsl[1]:.0f} L={color.hsl[2]:.0f}")

# Generate CSS gradient
css = result.to_css_gradient()

# Create swatch image
swatch = result.to_swatch_image(width=500, height=100)
swatch.save("palette.png")

Configuration

analyzer = ColorAnalyzer(
    resize_max=200,       # Resize for faster analysis
    min_saturation=0.05,  # Filter out grays
)
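
To illustrate what a saturation threshold such as min_saturation might do, the sketch below converts RGB colors to HSL with the standard library's colorsys module and drops near-gray colors. It is an illustration only: rgb_to_hsl and filter_grays are hypothetical helpers, and the library's actual filtering may differ.

```python
import colorsys


def rgb_to_hsl(rgb: tuple[int, int, int]) -> tuple[float, float, float]:
    """Convert 8-bit RGB to (hue in degrees, saturation %, lightness %)."""
    r, g, b = (channel / 255.0 for channel in rgb)
    # colorsys returns (hue, lightness, saturation), each in [0, 1].
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return h * 360.0, s * 100.0, l * 100.0


def filter_grays(colors, min_saturation=0.05):
    """Keep colors whose HSL saturation (0-1 scale) is at least min_saturation."""
    kept = []
    for rgb in colors:
        _, saturation_pct, _ = rgb_to_hsl(rgb)
        if saturation_pct / 100.0 >= min_saturation:
            kept.append(rgb)
    return kept


colors = [(255, 0, 0), (128, 128, 128), (30, 60, 200)]
print(filter_grays(colors))  # the pure gray (128, 128, 128) is dropped
```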

Color Harmony Types

| Type | Description |
| --- | --- |
| monochromatic | Single hue variations |
| complementary | Opposite hues |
| analogous | Adjacent hues |
| triadic | Three equidistant hues |
| split-complementary | Complement + neighbors |
| compound | Complex relationship |

Mood Detection

| Mood | Characteristics |
| --- | --- |
| airy | Light, low saturation |
| dark | Low lightness |
| vibrant | High saturation |
| muted | Low saturation |
| warm | Red/orange hues |
| cool | Blue hues |
| energetic | Yellow/green hues |
| natural | Green hues |
| neutral | No dominant character |

Palette Comparison

# image1 and image2 are PIL images; analyzer is a ColorAnalyzer instance
palette1 = analyzer.extract_palette(image1)
palette2 = analyzer.extract_palette(image2)

similarity = analyzer.compare_palettes(palette1, palette2)
print(f"Overall: {similarity['overall']:.1%}")
print(f"Hue: {similarity['hue']:.1%}")
print(f"Saturation: {similarity['saturation']:.1%}")
print(f"Lightness: {similarity['lightness']:.1%}")
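
Hue is an angle, so any hue comparison has to wrap around at 360 degrees: hues of 350 and 10 are 20 degrees apart, not 340. The sketch below shows one way a hue similarity in the 0-1 range could be computed. Both functions are hypothetical; this is not a description of how compare_palettes works internally.

```python
def hue_distance(h1: float, h2: float) -> float:
    """Shortest angular distance between two hues, in degrees (0-180)."""
    diff = abs(h1 - h2) % 360.0
    return min(diff, 360.0 - diff)


def hue_similarity(h1: float, h2: float) -> float:
    """Map hue distance to a score: 1.0 for identical hues, 0.0 for opposite hues."""
    return 1.0 - hue_distance(h1, h2) / 180.0


print(hue_distance(350, 10))    # wraps around 360 instead of returning 340
print(hue_similarity(0, 180))   # opposite hues
```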

CompositionAnalyzer

Analyzes image composition: rule of thirds, symmetry, balance, complexity, negative space, and focal points.

Basic Usage

from PIL import Image
from lilith_content_understanding import CompositionAnalyzer

analyzer = CompositionAnalyzer()
image = Image.open("photo.jpg")

result = analyzer.analyze(image)

# Composition scores
print(f"Rule of thirds: {result.rule_of_thirds_score:.2f}")
print(f"Horizontal symmetry: {result.symmetry_score:.2f}")
print(f"Vertical symmetry: {result.vertical_symmetry_score:.2f}")
print(f"Balance: {result.balance_score:.2f} ({result.balance_type})")

# Visual complexity
print(f"Complexity: {result.complexity_score:.2f}")
print(f"Negative space: {result.negative_space_ratio:.1%}")

# Visual weight center
x, y = result.visual_weight_center
print(f"Weight center: ({x:.2f}, {y:.2f})")

# Focal points
for fp in result.focal_points:
    print(f"Focal point at ({fp.x:.2f}, {fp.y:.2f})")
    print(f"  Strength: {fp.strength:.2f}")
    print(f"  Quadrant: {fp.quadrant}")
    print(f"  On thirds: {fp.on_thirds_intersection}")

# Improvement suggestions
for suggestion in result.suggestions:
    print(f"- {suggestion}")

# Quick check
print(f"Well composed: {result.is_well_composed}")
print(f"Primary focal point: {result.primary_focal_point}")

Configuration

analyzer = CompositionAnalyzer(
    resize_max=400,  # Resize for faster analysis
)

Composition Scores

| Score | Description | Good Value |
| --- | --- | --- |
| rule_of_thirds_score | Subject alignment with thirds | > 0.6 |
| symmetry_score | Horizontal symmetry | > 0.7 |
| balance_score | Visual weight distribution | > 0.7 |
| complexity_score | Visual complexity (0=simple) | 0.3-0.7 |
| negative_space_ratio | Empty area ratio | 0.2-0.5 |
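
The "good value" ranges in the table can be checked programmatically. The sketch below works on a plain dictionary of scores so that it runs without the library. GOOD_RANGES, STRICT_LOWER, and check_scores are hypothetical helpers, the upper bound of 1.0 assumes scores lie in the 0-1 range, and the thresholds are guidelines copied from the table rather than hard rules.

```python
# (low, high) bounds taken from the Composition Scores table.
# Entries listed as "> x" get an upper bound of 1.0 (assumes 0-1 scores).
GOOD_RANGES = {
    "rule_of_thirds_score": (0.6, 1.0),
    "symmetry_score": (0.7, 1.0),
    "balance_score": (0.7, 1.0),
    "complexity_score": (0.3, 0.7),
    "negative_space_ratio": (0.2, 0.5),
}

# Scores whose table entry is a strict "greater than" threshold.
STRICT_LOWER = {"rule_of_thirds_score", "symmetry_score", "balance_score"}


def check_scores(scores: dict) -> dict:
    """Return {score_name: True/False} for every known score present in `scores`."""
    results = {}
    for name, (low, high) in GOOD_RANGES.items():
        if name not in scores:
            continue
        value = scores[name]
        above_low = value > low if name in STRICT_LOWER else value >= low
        results[name] = above_low and value <= high
    return results


example = {"rule_of_thirds_score": 0.72, "symmetry_score": 0.4, "complexity_score": 0.5}
print(check_scores(example))
```

With a real result object, the dictionary could presumably be built with `{name: getattr(result, name) for name in GOOD_RANGES}`, assuming the attributes exist under those names as in the usage example.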

Balance Types

| Type | Description |
| --- | --- |
| symmetric | Even weight distribution |
| asymmetric | Intentionally uneven but balanced |
| unbalanced | Poor weight distribution |

SceneClassifier

Classifies scene type using CLIP zero-shot classification.

Models

| Model | Description |
| --- | --- |
| clip-vit-base | Fast, good accuracy (default) |
| clip-vit-large | Better accuracy, slower |

Basic Usage

from PIL import Image
from lilith_content_understanding import SceneClassifier

classifier = SceneClassifier()
image = Image.open("photo.jpg")

result = classifier.classify(image)

# Scene type
print(f"Scene: {result.scene_type} ({result.scene_confidence:.1%})")
print(f"Environment: {result.environment}")

# Context (outdoor only)
if result.is_outdoor:
    print(f"Time of day: {result.time_of_day}")
    print(f"Weather: {result.weather}")

# Tags and suggestions
print(f"Tags: {result.tags}")
print(f"Suggested styles: {result.suggested_styles}")

# All scores
for scene, score in sorted(result.all_scores.items(), key=lambda x: -x[1]):
    print(f"  {scene}: {score:.1%}")

Configuration

classifier = SceneClassifier(
    model_name="clip-vit-base",  # CLIP model
    device="cuda",  # Force GPU
)

Scene Categories

| Category | Examples |
| --- | --- |
| portrait | Headshots, selfies, people |
| landscape | Mountains, valleys, vistas |
| urban | Cities, streets, architecture |
| interior | Rooms, indoor spaces |
| nature | Forests, gardens, plants |
| water | Ocean, lakes, rivers |
| sky | Clouds, sunsets, stars |
| food | Meals, dishes, cooking |
| animal | Pets, wildlife |
| abstract | Patterns, textures |
| fantasy | Magical, mythical |
| scifi | Futuristic, space |
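
Zero-shot classification with CLIP generally compares an image embedding against one text embedding per category and turns the resulting similarities into probabilities with a softmax. The sketch below shows only that final step on made-up similarity values. The numbers and the softmax and top_scene helpers are illustrative assumptions, not the classifier's actual internals.

```python
import math


def softmax(logits: dict) -> dict:
    """Convert per-label scores into probabilities that sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits.values())
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def top_scene(logits: dict) -> tuple:
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    label = max(probs, key=probs.get)
    return label, probs[label]


# Made-up image-to-text similarity scores for three categories.
similarities = {"portrait": 21.0, "landscape": 26.5, "urban": 23.0}
scene, confidence = top_scene(similarities)
print(f"{scene}: {confidence:.1%}")
```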

Environment Detection

| Environment | Description |
| --- | --- |
| outdoor | Outside scenes |
| indoor | Interior spaces |
| studio | Controlled studio setting |

Time of Day (Outdoor)

  • day - Daytime
  • night - Nighttime
  • sunset - Sunset/dusk
  • sunrise - Sunrise/dawn

Weather (Outdoor)

  • sunny - Clear, bright
  • cloudy - Overcast
  • rainy - Rain, storms
  • snowy - Snow, winter
  • foggy - Fog, mist

Performance Tips

GPU Acceleration

All analyzers auto-detect CUDA. You can check the result or force a device:

analyzer = DepthAnalyzer()  # Device is auto-detected
print(f"GPU enabled: {analyzer.is_gpu_enabled}")

analyzer = DepthAnalyzer(device="cuda")  # Force GPU
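
Device auto-detection in PyTorch-based code is commonly written along the lines below. This is a sketch under the assumption that the analyzers run on PyTorch, which this page does not state. pick_device is a hypothetical helper, not a library function, and it falls back to the CPU when PyTorch is not installed.

```python
def pick_device(preferred=None):
    """Return the requested device, or "cuda" when available, else "cpu"."""
    if preferred is not None:
        return preferred
    try:
        import torch  # optional dependency in this sketch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())       # auto-detected
print(pick_device("cpu"))  # explicit choice wins
```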

Lazy Loading

Models load on first use:

analyzer = DepthAnalyzer()  # Fast: no model is loaded yet
result = analyzer.estimate(image)  # Model loads here, on the first call
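
The lazy-loading pattern can be sketched with a small class that defers an expensive load until the first call. LazyModel and load_doubler are hypothetical stand-ins that illustrate the pattern; they are not the library's implementation.

```python
class LazyModel:
    """Wraps a loader function and calls it only when the model is first needed."""

    def __init__(self, loader):
        self._loader = loader
        self._model = None
        self.load_count = 0  # tracks how often the loader actually ran

    @property
    def model(self):
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model

    def predict(self, value):
        return self.model(value)


def load_doubler():
    """Stand-in for an expensive model load; returns a trivial callable."""
    return lambda v: v * 2


lazy = LazyModel(load_doubler)   # construction is cheap
print(lazy.load_count)           # nothing loaded yet
print(lazy.predict(21))          # first call triggers the load
print(lazy.predict(4))           # later calls reuse the loaded model
print(lazy.load_count)
```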

Resize for Speed

Analyzers resize images internally, and resize_max lets you control the working size:

# Smaller = faster, less accurate
analyzer = ColorAnalyzer(resize_max=100)
analyzer = CompositionAnalyzer(resize_max=200)
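
To show why a smaller resize_max is faster, the sketch below computes the working size of an image whose longest side is capped while the aspect ratio is preserved. That resize_max bounds the longest side is an assumption, and fit_within is a hypothetical helper.

```python
def fit_within(width: int, height: int, resize_max: int) -> tuple[int, int]:
    """Scale (width, height) down so the longest side is at most resize_max."""
    longest = max(width, height)
    if longest <= resize_max:
        return width, height  # already small enough, no upscaling
    scale = resize_max / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


original = (4000, 3000)
resized = fit_within(*original, resize_max=200)
print(resized)
# Pixel count drops from width*height of the original to that of the resized image.
print(original[0] * original[1], "->", resized[0] * resized[1])
```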

Health Checks

info = analyzer.get_info()
print(info)