+Main technical risks
+1. Real-ESRGAN first, on every image, is your biggest quality risk
+
+Running every image through:
+
+4× ESRGAN
+then downscale back to original size
+
+can definitely improve some photos, but it can also introduce:
+
+hallucinated texture
+crispy foliage
+waxy skin after interaction with later steps
+fake edge detail
+zippering around fine geometry
+over-defined JPEG blocks on already compressed Fuji JPEGs
+
+This is the part I would treat as conditionally applied, not universal.
+
+My recommendation:
+
+gate ESRGAN based on image characteristics
+or at least use different strength paths for portrait vs landscape vs night
+
+Examples:
+
+portraits: maybe skip global ESRGAN or use a weaker path
+night/high-ISO: be careful, because ESRGAN can turn noise into invented detail
+landscapes/architecture: often benefit the most
+
Right now the pipeline assumes “upscale to 16K, then downscale” is always a win. It often is not.
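As a sketch of such a gate (the stat names and thresholds are illustrative starting points, not tuned values):

```python
def esrgan_policy(scene: str, noise_sigma: float, face_area_pct: float) -> str:
    """Return 'full', 'weak', or 'skip' for the 4x ESRGAN stage.

    Thresholds are illustrative, not measured.
    """
    if noise_sigma > 12.0:        # high-ISO/night: noise becomes invented detail
        return "skip"
    if face_area_pct > 15.0:      # portrait-dominated frame: use a weaker path
        return "weak"
    if scene in ("landscape", "architecture"):
        return "full"             # these scenes usually benefit the most
    return "weak"
```

The point is not these exact numbers but that the decision becomes explicit, loggable, and tunable.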
+
+2. CodeFormer after global enhancement can amplify inconsistency
+
+CodeFormer is useful, but it can produce faces that look slightly detached from the rest of the frame if the global pipeline has already altered texture and local contrast.
+
+Potential issues:
+
+face crops look cleaner than surrounding skin/neck/hair
+restored face sharpness conflicts with depth blur/sharpen later
+multiple faces in one frame may get uneven treatment
+
+Things to consider:
+
+apply CodeFormer only when face size exceeds a threshold
+use a lower-strength/fidelity profile depending on scene
+skip CodeFormer for distant faces
+log face count and face bounding-box size into metadata
+
+That would make the workflow easier to debug when faces look “too AI.”
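A minimal sketch of face-size gating, assuming a hypothetical upstream detector that yields pixel bounding boxes:

```python
def faces_to_restore(face_boxes, frame_w, frame_h, min_frac=0.02):
    """Select faces large enough to be worth CodeFormer restoration.

    face_boxes: list of (x, y, w, h) in pixels from any face detector
    (hypothetical upstream output). min_frac is the minimum face area
    as a fraction of the frame; 2% is an illustrative default.
    """
    frame_area = frame_w * frame_h
    selected = [b for b in face_boxes if (b[2] * b[3]) / frame_area >= min_frac]
    # What a metadata sidecar would record for later debugging
    meta = {"face_count": len(face_boxes), "restored_faces": len(selected)}
    return selected, meta
```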
+
+3. Scene classification using 8 CLIP prompts is clever but brittle
+
+This is a nice lightweight idea, but it is likely the weakest decision point in the pipeline because eight prompts force coarse categorization.
+
+Possible failure cases:
+
+beach sunset might oscillate between beach, golden_hour, and landscape
+indoor portraits near a window may flip to portrait or indoor
+urban night scenes may misclassify between street and night
+cloudy mountain lake might be overcast vs landscape
+
+Because your grade profile changes exposure/contrast/saturation/detail/denoise, a wrong label can materially alter the image.
+
+Better approach:
+
+store the full prompt score distribution, not just argmax
+use top-2 or top-3 labels
+blend profiles based on confidence instead of hard-switching
+
+For example:
+
+60% landscape + 40% golden_hour
+instead of forcing one profile
+
+That would reduce sudden profile mistakes.
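A sketch of confidence-weighted blending over the top-k labels (the profile parameter names are illustrative):

```python
def blend_profiles(scores: dict, profiles: dict, top_k: int = 2) -> dict:
    """Blend grading profiles by normalized top-k CLIP scores.

    scores: label -> raw CLIP similarity; profiles: label -> parameter dict.
    """
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(s for _, s in top)
    weights = {label: s / total for label, s in top}
    keys = profiles[top[0][0]].keys()
    # Weighted average of each numeric profile parameter
    return {k: sum(w * profiles[lbl][k] for lbl, w in weights.items()) for k in keys}
```

With scores of 0.6 landscape and 0.4 golden_hour, a saturation of 1.2 and 1.0 respectively blends to 1.12 instead of hard-switching to 1.2.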
+
+4. CPU image ops at 4K are fine, but not yet optimized as a pipeline
+
+Your CPU-bound stages are sensible, but there are some efficiency concerns:
+
+guidedFilter and morphology/blur passes at full 4K are not trivial
+ImageScaleBy 16K → 4K on CPU may be heavier than it looks
+repeated color-space conversions and full-frame copies can become memory-bandwidth bound
+if you later parallelize multiple photos, CPU becomes the bottleneck before GPU memory does
+
This matters because processing time is already 40–50 s per photo, and if you batch more aggressively you may saturate the host CPU.
+
+I would especially watch:
+
+OpenCV allocations
+Python ↔ tensor conversion overhead inside custom nodes
+whether large intermediate tensors are duplicated unnecessarily

5. Polling /history/<prompt_id> every 2s is workable but not ideal
+
+It is acceptable, but it is a weak point operationally.
+
+Risks:
+
+stale/incomplete history states
+long-run prompt ambiguity if ComfyUI restarts
+polling delay adds latency
+harder recovery when output partially exists but metadata doesn’t
+
+If ComfyUI or your wrapper supports websocket progress or event-driven status, that would be better. If not, I would at least strengthen state validation:
+
+ensure expected output files exist and are complete
+ensure metadata JSON corresponds to the same prefix
+distinguish timeout from partial success
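The wrapper is Ruby, but the decision shape can be sketched in Python (paths and state names are illustrative):

```python
import json
import os

def classify_result(png_path: str, meta_path: str, timed_out: bool) -> str:
    """Distinguish complete, partial, and timed-out runs from on-disk state."""
    png_ok = os.path.exists(png_path) and os.path.getsize(png_path) > 0
    meta_ok = False
    if os.path.exists(meta_path):
        try:
            with open(meta_path) as f:
                json.load(f)          # malformed JSON counts as missing
            meta_ok = True
        except json.JSONDecodeError:
            pass
    if png_ok and meta_ok:
        return "complete"
    if png_ok or meta_ok:
        return "partial"              # output exists but metadata does not, or vice versa
    return "timeout" if timed_out else "missing"
```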

Biggest architectural improvement opportunities
+1. Add conditional routing, not one fixed pipeline for every photo
+
+Right now the graph is elegant, but it is still mostly single-path.
+
+A more robust system would route based on detected attributes:
+
+no faces → skip CodeFormer
+little/no sky → skip SkyEnhance
+low confidence scene label → use default conservative grade
+low-detail or noisy photo → reduce or skip ESRGAN
+already high-contrast/high-saturation image → apply weaker grade
+
+That would reduce over-processing and save time.
+
+2. Move from hardcoded profiles to measured image statistics
+
+Your scene profiles are sensible, but they are still hand-tuned guesses.
+
+A stronger next step would be to incorporate measured stats such as:
+
+luminance histogram
+highlight clipping ratio
+shadow floor occupancy
+saturation percentile
+edge density
+noise estimate
+face area percentage
+sky coverage
+
+Then use those stats to modulate:
+
+exposure
+saturation
+detail multiplier
+denoise
+background blur
+
+That would make the pipeline more adaptive and less prompt-dependent.
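A minimal sketch of such a preflight stats pass with NumPy (the luminance weights are the standard Rec. 709 coefficients; field names and clip thresholds are illustrative):

```python
import numpy as np

def frame_stats(rgb: np.ndarray) -> dict:
    """Measured statistics for policy decisions, on a float RGB image in [0, 1]."""
    lum = 0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1] + 0.0722 * rgb[..., 2]
    sat = rgb.max(axis=-1) - rgb.min(axis=-1)   # cheap HSV-like saturation proxy
    gy, gx = np.gradient(lum)
    return {
        "mean_luminance": float(lum.mean()),
        "highlight_clip_pct": float((lum > 0.98).mean() * 100),
        "shadow_floor_pct": float((lum < 0.02).mean() * 100),
        "saturation_p90": float(np.percentile(sat, 90)),
        "edge_density": float(np.hypot(gx, gy).mean()),
    }
```

These are cheap at 4K and give every later stage something measured to condition on.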
+
+3. Preserve and restore metadata more deliberately
+
+You correctly bake orientation before upload because ComfyUI strips EXIF. Good.
+
+But converting final PNG to JPEG without explicit metadata handling means you may be losing:
+
+original EXIF fields
+capture time
+lens/camera info
+ICC profile
+GPS if present
+copyright/author data
+
+That may be fine, but if the intent is “enhanced derivative of original photo,” I would consider:
+
+copying selected EXIF fields from source to final JPEG
+preserving or explicitly assigning ICC profile
+adding software tag / processing note
+optionally stripping privacy-sensitive fields by choice, not by accident
+
Color profile handling is especially important. “No color corrections” is not the same as “color managed.”
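A sketch with Pillow, which can carry the raw EXIF blob and ICC profile from the source onto the final JPEG (selective field handling, such as stripping GPS, would need piexif or exiftool instead):

```python
from PIL import Image

def save_with_source_metadata(enhanced_png: str, source_jpg: str,
                              dest_jpg: str, quality: int = 92) -> None:
    """Copy EXIF and ICC profile from the source photo onto the final JPEG."""
    src = Image.open(source_jpg)
    out = Image.open(enhanced_png).convert("RGB")
    kwargs = {}
    if src.info.get("exif"):
        kwargs["exif"] = src.info["exif"]              # raw EXIF blob, copied whole
    if src.info.get("icc_profile"):
        kwargs["icc_profile"] = src.info["icc_profile"]
    out.save(dest_jpg, "JPEG", quality=quality, **kwargs)
```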
+
+4. Add resumability per stage, not just per photo
+
+Your manifest marks a photo done after full completion, which is good, but partial reruns still require redoing all remote processing for failed photos.
+
+You could get stronger resilience with stage-aware artifacts:
+
+oriented temp exists
+upload completed
+prompt submitted
+output downloaded
+JPEG written
+metadata written
+
+That might be too much overhead for a personal workflow, but even just logging prompt_id per source photo would help a lot with crash recovery.
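Even the lightweight version can be sketched as a per-photo manifest entry (the shape here is hypothetical, not your existing format):

```python
STAGES = ["oriented", "uploaded", "submitted", "downloaded", "jpeg", "metadata"]

def mark_stage(manifest, photo, stage, prompt_id=None):
    """Record per-stage progress (and prompt_id) for one photo."""
    entry = manifest.setdefault(photo, {"stages": {}})
    entry["stages"][stage] = True
    if prompt_id is not None:
        entry["prompt_id"] = prompt_id
    # A photo is "done" only when every stage artifact has been recorded
    entry["done"] = all(entry["stages"].get(s) for s in STAGES)
    return manifest
```

On crash recovery, a photo with `prompt_id` set but `downloaded` unset can be re-polled instead of re-submitted.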
+
+5. Treat JPEG as an output format decision, not a fixed end state
+
+JPEG quality 92 is reasonable, but for some images:
+
+foliage
+gradients in skies
+deep edits after enhancement
+
+JPEG may reintroduce artifacts after all that expensive work.
+
+Consider:
+
+archival output as PNG or TIFF
+delivery output as JPEG
+optional WebP/AVIF for web usage
+
+Even if you keep JPEG as primary, having a “master enhanced output” option would be useful.
+
+Specific comments on the custom stages
+AdaptivePhotoGrade
+
+This is the most promising custom logic in the workflow.
+
+Good:
+
+exposure in linear light
+contrast and saturation as explicit steps
+detail/base decomposition
+per-scene profiles
+
+Concerns:
+
+gamma 2.2 approximation is simple, but true sRGB transfer is not exactly 2.2
+clipping highlights at 1.0 can lose recoverable rolloff smoothness
+HSV saturation edits can behave poorly in skin tones and near highlights
+fixed midpoint contrast around 0.5 is simple but not content-aware
+
+If you keep evolving it, the next quality wins will likely come from:
+
+proper sRGB transfer functions
+luminance-aware saturation
+highlight/shadow selective controls
+local contrast constrained by noise estimate
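For reference, the exact sRGB transfer pair (IEC 61966-2-1) that would replace the gamma-2.2 approximation:

```python
import numpy as np

def srgb_to_linear(v: np.ndarray) -> np.ndarray:
    """Exact sRGB EOTF: linear segment near black, then a 2.4 power curve."""
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(v: np.ndarray) -> np.ndarray:
    """Inverse sRGB transfer; clip only once, at the very end of the grade."""
    return np.where(v <= 0.0031308, v * 12.92, 1.055 * v ** (1 / 2.4) - 0.055)
```

The difference from plain 2.2 is largest in the shadows, which is exactly where denoise and contrast decisions are most sensitive.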

SkyEnhance
+
+Clever and cheap. Good for a CPU stage.
+
+Risks:
+
+blue clothing, windows, water, reflective buildings, and tinted glass can get caught
+sunset banding or haloing near trees/buildings
vertical prior helps, but can still fail on mountains or compositions where the sky sits low in the frame
+
+I would recommend logging:
+
+sky coverage %
+mean mask confidence
+whether sky enhancement was effectively skipped
+
+And maybe auto-disable when coverage is too low or too fragmented.
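A sketch of that logging plus a coverage gate (thresholds are illustrative; a fragmentation check would need connected-component labeling on top):

```python
import numpy as np

def sky_gate(mask: np.ndarray, min_coverage_pct: float = 5.0,
             min_confidence: float = 0.4):
    """Decide whether SkyEnhance should run, and what to log.

    mask: float sky-probability map in [0, 1].
    """
    binary = mask > 0.5
    coverage_pct = float(binary.mean() * 100)
    mean_conf = float(mask[binary].mean()) if binary.any() else 0.0
    enabled = coverage_pct >= min_coverage_pct and mean_conf >= min_confidence
    log = {"sky_coverage_pct": round(coverage_pct, 1),
           "sky_mask_confidence": round(mean_conf, 3),
           "sky_enhance_applied": enabled}
    return enabled, log
```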
+
+DepthSelectiveSharpen
+
+This is an interesting stage, but also easy to overdo.
+
+Pros:
+
+more photographic than simple global sharpening
+can add subject separation
+
+Risks:
+
+relative depth is not segmentation
+hair, glasses, transparent objects, fences, and fine branches can create messy transitions
+background blur on an already naturally focused image may look synthetic
+blur-plus-sharpen in one stage can produce “smartphone portrait mode” artifacts
+
+I would strongly consider making this more conservative:
+
+lower default blur
+maybe sharpen foreground only, without explicit background blur
+or gate blur by scene type and depth confidence
+
+For many photos, foreground sharpening alone may be enough.
+
+Performance review
+
+Your breakdown is believable.
+
+The biggest performance cost drivers are probably:
+
+ESRGAN 4× inference
+memory movement around the 16K intermediate
+downscale from 16K to 4K
+Depth Anything inference
+
+This means the obvious speed/quality tradeoff lever is:
+
+reducing or conditionally skipping the 4× path
+
+That one decision could cut runtime materially.
+
+If you want better throughput later, likely gains are:
+
+batch submission queue with bounded concurrency
+reuse loaded models across jobs, which ComfyUI already helps with
+avoid oversized intermediates when not needed
+possibly move some CPU image ops to GPU if they become limiting
+
+But honestly, for 45 photos, the current runtime is already acceptable.
+
+Operational review
+
+This is better than average for reliability, but I would still tighten a few things:
+
+Add stronger failure modes
+
+Include distinct handling for:
+
+upload success but prompt submission failure
+prompt accepted but no output file
+PNG written but metadata JSON missing
+metadata JSON malformed
+partial download
+ComfyUI restart mid-run

Record more run provenance
+
+In the metadata or manifest, store:
+
+prompt_id
+workflow hash
+custom node git commit or file hash
+model file hashes
+processing duration per stage if possible
+
+That will make future tuning much easier.
+
+Be careful with cache busting
+
+The random prefix solves ComfyUI cache problems, which is practical. But it also removes some chance of safe reuse. A better long-term version is:
+
+cache key derived from source hash + workflow hash + relevant params
+
+Then you can intentionally reuse identical results.
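A sketch of such a content-derived cache key:

```python
import hashlib
import json

def cache_key(source_path: str, workflow_json: dict, params: dict) -> str:
    """Deterministic key from source bytes + workflow + relevant params.

    Unlike a random prefix, identical inputs map to the same key,
    so results can be reused intentionally.
    """
    h = hashlib.sha256()
    with open(source_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    h.update(json.dumps(workflow_json, sort_keys=True).encode())
    h.update(json.dumps(params, sort_keys=True).encode())
    return h.hexdigest()[:16]
```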
+
+Best improvements to do next
+
+If I were prioritizing, I would do these in order:
+
+make ESRGAN conditional
+store CLIP confidence distribution and blend scene profiles
+make depth stage more conservative
+preserve ICC/selected EXIF metadata
+improve failure recovery and provenance logging
+
+Those would give the best quality/reliability gains without rewriting the whole system.
+
+Bottom line
+
+This is a well-designed, thoughtful workflow with a good systems mindset behind it. The main weakness is not architecture — it is over-processing risk from always-on enhancement stages, especially:
+
+full-image ESRGAN
+hard scene-class selection
+depth-based blur/sharpen
+
+So my verdict is:
+
+architecture: strong
+operational design: strong
+quality strategy: promising, but should become more conditional and confidence-aware
+performance: acceptable for the workload
+next maturity step: adaptive routing and stronger provenance
+
+If you want, I can turn this into a more formal engineering review with sections for “keep / change / risks / recommended roadmap,” or I can propose a v2 pipeline that should improve both quality and speed.
+
+yes:
+
+# Photo Enhancement Workflow Review
+
+## Executive summary
+
+This workflow is well designed. It has a strong separation between orchestration, inference, deterministic image processing, and human review. The system is practical, understandable, and already close to a small production pipeline for batch photo enhancement.
+
+The main issue is not the infrastructure or code shape. The main issue is **quality control under an always-on enhancement strategy**. Several expensive stages are applied to every image, even though their benefit is scene-dependent and sometimes negative. The biggest gains now will come from making the workflow **conditional, confidence-aware, and slightly more conservative**.
+
+---
+
+# What should stay
+
+## 1. Ruby as the control plane
+
+This is a good choice.
+
+It gives you:
+
+* clean batch orchestration
+* simple manifest handling
+* file lifecycle control
+* easy VM lifecycle integration
+* a place to keep business logic out of ComfyUI
+
+## 2. ComfyUI as the execution graph
+
+Also a good choice.
+
+It gives you:
+
+* model reuse
+* visual graph structure
+* easy injection of runtime parameters
+* modular custom node expansion
+
+## 3. Metadata sidecar generation
+
+This is one of the strongest parts of the system.
+
+The `_e.md` and JSON sidecars make the workflow:
+
+* debuggable
+* reviewable
+* reproducible
+* easier to tune later
+
+## 4. Human review tool
+
+The comparison tool is exactly the right final step. Enhancement pipelines often fail because they assume “processed” means “better.” Yours does not.
+
+## 5. EXIF orientation bake before upload
+
+Correct and necessary. Good defensive engineering.
+
+---
+
+# What should change
+
+## 1. Stop treating enhancement as a single fixed path
+
+Right now the graph is elegant, but too uniform. The workflow should become a **decision tree**, not a single mandatory sequence.
+
+Some stages should be optional:
+
+* Real-ESRGAN
+* CodeFormer
+* SkyEnhance
+* DepthSelectiveSharpen
+* grading strength inside AdaptivePhotoGrade
+
+## 2. Make Real-ESRGAN conditional
+
+This is the highest-priority change.
+
+Current risks:
+
+* synthetic texture
+* over-crisp foliage
+* JPEG artifact amplification
+* invented microdetail
+* unnatural skin/hair
+
+### Recommendation
+
+Use ESRGAN only when:
+
+* high detail scenes (landscape, architecture)
+* strong edge density
+* visible softness or compression
+
+Avoid or weaken for:
+
+* portraits
+* night/high ISO
+* already sharp JPEGs
+
+## 3. Replace hard scene labels with blended grading
+
+Current approach uses argmax from CLIP.
+
+Problem: scenes are often mixed.
+
+### Recommendation
+
+* keep top 2–3 scene scores
+* normalize
+* blend profile parameters
+
+Example:
+
+* 0.55 landscape
+* 0.35 golden_hour
+* 0.10 overcast
+
+Blend exposure, contrast, saturation, detail, denoise.
+
+## 4. Make depth processing more conservative
+
+Default behavior should be:
+
+* foreground sharpening only
+* no background blur by default
+
+Enable blur only when:
+
+* strong subject separation
+* portrait-like composition
+
+## 5. Preserve metadata intentionally
+
+Current pipeline likely loses:
+
+* EXIF
+* ICC profile
+
+### Recommendation
+
+Preserve or explicitly manage:
+
+* capture timestamp
+* camera/lens info
+* ICC profile
+* add processing metadata
+
+---
+
+# Main risks
+
+## Quality risks
+
+### Over-processing
+
+Stacked enhancements may lead to synthetic look.
+
+### Face inconsistency
+
+CodeFormer may produce mismatch with surrounding regions.
+
+### Masking errors
+
+Sky and depth masks may:
+
+* misclassify regions
+* create halos
+
+## Operational risks
+
+### Partial success ambiguity
+
+Need stronger validation for:
+
+* missing metadata
+* partial downloads
+
+### Weak provenance
+
+Should log:
+
+* prompt_id
+* workflow hash
+* model versions
+
+### CPU bottleneck
+
+Potential hotspots:
+
+* large rescaling
+* guided filtering
+* morphology operations
+
+---
+
+# Performance review
+
+## Current state
+
+~40–50s/photo is acceptable.
+
+## Main optimization lever
+
+Make ESRGAN conditional.
+
+## Secondary lever
+
+Skip unnecessary stages when not needed.
+
+---
+
+# Recommended v2 architecture
+
+## Goal
+
+Make workflow adaptive.
+
+## Pipeline
+
+### Stage 0 — Preflight analysis
+
+Compute:
+
+* brightness histogram
+* saturation
+* edge density
+* noise estimate
+* face stats
+* sky coverage
+* CLIP scores
+
+### Stage 1 — Policy selection
+
+Decide:
+
+* ESRGAN mode
+* CodeFormer usage
+* grading blend
+* sky enhance on/off
+* depth mode
+
+### Stage 2 — Enhancement
+
+Run only selected stages.
+
+### Stage 3 — Output + metadata
+
+Include:
+
+* policy decisions
+* confidence scores
+* timings
+
+---
+
+# Example metadata (v2)
+
+```json
+{
+ "workflow_version": "photo-enhance-v2",
+ "analysis": {
+ "scene_scores": {
+ "landscape": 0.51,
+ "golden_hour": 0.28
+ },
+ "face_count": 1,
+ "sky_coverage_pct": 23.4
+ },
+ "policy": {
+ "esrgan_mode": "weak",
+ "depth_mode": "sharpen_only"
+ }
+}
+```
+
+---
+
+# Roadmap
+
+## Phase 1
+
+* conditional ESRGAN
+* blended scene grading
+* disable background blur default
+* preserve metadata
+
+## Phase 2
+
+* preflight analysis
+* gating logic for faces and sky
+* improved logging
+
+## Phase 3
+
+* better color handling (true sRGB)
+* noise-aware detail
+* improved saturation logic
+
+---
+
+# Final verdict
+
+## Strengths
+
+* strong architecture
+* practical workflow
+* good separation of concerns
+
+## Weakness
+
+* over-processing risk from always-on stages
+
+## Key improvement
+
+Move from fixed pipeline → adaptive pipeline
+
+This will improve both quality and performance significantly.
+