# Main technical risks

## 1. Real-ESRGAN first, on every image, is your biggest quality risk

Running every image through 4× ESRGAN and then downscaling back to the original size can improve some photos, but it can also introduce:

* hallucinated texture
* crispy foliage
* waxy skin after interaction with later steps
* fake edge detail
* zippering around fine geometry
* over-defined JPEG blocks on already-compressed Fuji JPEGs

This is the stage I would treat as conditionally applied, not universal.

My recommendation: gate ESRGAN based on image characteristics, or at least use different strength paths for portrait vs landscape vs night. For example:

* portraits: skip global ESRGAN or use a weaker path
* night/high-ISO: be careful, because ESRGAN can turn noise into invented detail
* landscapes/architecture: often benefit the most

Right now the pipeline assumes "seen at 16K then downscaled" is always a win. It often is not.

## 2. CodeFormer after global enhancement can amplify inconsistency

CodeFormer is useful, but it can produce faces that look slightly detached from the rest of the frame if the global pipeline has already altered texture and local contrast.

Potential issues:

* face crops look cleaner than the surrounding skin/neck/hair
* restored face sharpness conflicts with the depth blur/sharpen applied later
* multiple faces in one frame may get uneven treatment

Things to consider:

* apply CodeFormer only when face size exceeds a threshold
* use a lower-strength/fidelity profile depending on the scene
* skip CodeFormer for distant faces
* log face count and face bounding-box size into metadata

That would make the workflow easier to debug when faces look "too AI."

## 3. Scene classification using 8 CLIP prompts is clever but brittle

This is a nice lightweight idea, but it is likely the weakest decision point in the pipeline, because eight prompts force coarse categorization.

Possible failure cases:

* a beach sunset might oscillate between beach, golden_hour, and landscape
* indoor portraits near a window may flip between portrait and indoor
* urban night scenes may misclassify between street and night
* a cloudy mountain lake might split between overcast and landscape

Because your grade profile changes exposure/contrast/saturation/detail/denoise, a wrong label can materially alter the image.

A better approach:

* store the full prompt score distribution, not just the argmax
* use the top-2 or top-3 labels
* blend profiles based on confidence instead of hard-switching

For example, 60% landscape + 40% golden_hour instead of forcing one profile. That would reduce sudden profile mistakes.
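As a concrete illustration, here is a minimal sketch of confidence-weighted blending, assuming the full CLIP score distribution is available as a dict. The profile names and parameter fields are illustrative, not your node's actual schema:

```python
# Illustrative profiles; parameter names do not match the real node's schema.
PROFILES = {
    "landscape":   {"exposure": 0.10, "contrast": 1.15, "saturation": 1.20, "detail": 1.30},
    "golden_hour": {"exposure": 0.05, "contrast": 1.05, "saturation": 1.30, "detail": 1.10},
    "overcast":    {"exposure": 0.20, "contrast": 1.20, "saturation": 1.10, "detail": 1.20},
}

def blended_profile(scene_scores, top_k=3):
    # Keep the top-k labels and renormalize their scores into weights.
    top = sorted(scene_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(score for _, score in top)
    weights = {label: score / total for label, score in top}
    # Weighted average of each grading parameter across the winning profiles.
    blended = {}
    for label, weight in weights.items():
        for param, value in PROFILES[label].items():
            blended[param] = blended.get(param, 0.0) + weight * value
    return blended

# 60% landscape + 40% golden_hour instead of a hard argmax switch:
print(blended_profile({"landscape": 0.6, "golden_hour": 0.4, "overcast": 0.0}, top_k=2))
```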
## 4. CPU image ops at 4K are fine, but not yet optimized as a pipeline

Your CPU-bound stages are sensible, but there are some efficiency concerns:

* guidedFilter and morphology/blur passes at full 4K are not trivial
* ImageScaleBy 16K → 4K on CPU may be heavier than it looks
* repeated color-space conversions and full-frame copies can become memory-bandwidth bound
* if you later parallelize multiple photos, the CPU becomes the bottleneck before GPU memory does

This matters because your throughput is already 40–50s/photo, and if you batch more aggressively you may saturate the host CPU.

I would especially watch:

* OpenCV allocations
* Python ↔ tensor conversion overhead inside custom nodes
* whether large intermediate tensors are duplicated unnecessarily

## 5. Polling /history/<prompt_id> every 2s is workable but not ideal

It is acceptable, but it is a weak point operationally.

Risks:

* stale/incomplete history states
* long-run prompt ambiguity if ComfyUI restarts
* polling delay adds latency
* harder recovery when output partially exists but metadata doesn't

If ComfyUI or your wrapper supports websocket progress or event-driven status, that would be better. If not, I would at least strengthen state validation:

* ensure the expected output files exist and are complete
* ensure the metadata JSON corresponds to the same prefix
* distinguish timeout from partial success

# Biggest architectural improvement opportunities

## 1. Add conditional routing, not one fixed pipeline for every photo

Right now the graph is elegant, but it is still mostly single-path.

A more robust system would route based on detected attributes:

* no faces → skip CodeFormer
* little/no sky → skip SkyEnhance
* low-confidence scene label → use a default conservative grade
* low-detail or noisy photo → reduce or skip ESRGAN
* already high-contrast/high-saturation image → apply a weaker grade

That would reduce over-processing and save time.

## 2. Move from hardcoded profiles to measured image statistics

Your scene profiles are sensible, but they are still hand-tuned guesses.

A stronger next step would be to incorporate measured stats such as:

* luminance histogram
* highlight clipping ratio
* shadow floor occupancy
* saturation percentile
* edge density
* noise estimate
* face area percentage
* sky coverage

Then use those stats to modulate exposure, saturation, detail multiplier, denoise, and background blur. That would make the pipeline more adaptive and less prompt-dependent.

## 3. Preserve and restore metadata more deliberately

You correctly bake orientation before upload because ComfyUI strips EXIF. Good.

But converting the final PNG to JPEG without explicit metadata handling means you may be losing:

* original EXIF fields
* capture time
* lens/camera info
* ICC profile
* GPS if present
* copyright/author data

That may be fine, but if the intent is "enhanced derivative of the original photo," I would consider:

* copying selected EXIF fields from source to final JPEG
* preserving or explicitly assigning an ICC profile
* adding a software tag / processing note
* optionally stripping privacy-sensitive fields by choice, not by accident

Color profile handling is especially important. "No colour corrections" is not the same as "color managed."
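A minimal Pillow sketch of that carry-over, assuming the source is a JPEG with EXIF and an ICC profile embedded; field-level filtering (for example, dropping GPS) would need something like piexif:

```python
# A sketch of deliberate metadata carry-over using Pillow. Note: if orientation
# was baked into the pixels earlier, the copied EXIF Orientation tag should be
# reset as well (omitted here; a library like piexif can edit single fields).
from PIL import Image

def save_with_source_metadata(source_path, enhanced_path, out_path, quality=92):
    with Image.open(source_path) as src:
        exif_bytes = src.info.get("exif")          # raw EXIF blob, if present
        icc_profile = src.info.get("icc_profile")  # embedded ICC profile, if present

    with Image.open(enhanced_path) as enhanced:
        save_kwargs = {"quality": quality}
        if exif_bytes:
            save_kwargs["exif"] = exif_bytes
        if icc_profile:
            save_kwargs["icc_profile"] = icc_profile
        enhanced.convert("RGB").save(out_path, "JPEG", **save_kwargs)
```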
## 4. Add resumability per stage, not just per photo

Your manifest marks a photo done after full completion, which is good, but partial reruns still require redoing all remote processing for failed photos.

You could get stronger resilience with stage-aware artifacts:

* oriented temp exists
* upload completed
* prompt submitted
* output downloaded
* JPEG written
* metadata written

That might be too much overhead for a personal workflow, but even just logging the prompt_id per source photo would help a lot with crash recovery.

## 5. Treat JPEG as an output format decision, not a fixed end state

JPEG quality 92 is reasonable, but for some images:

* foliage
* gradients in skies
* deep edits after enhancement

JPEG may reintroduce artifacts after all that expensive work.

Consider:

* archival output as PNG or TIFF
* delivery output as JPEG
* optional WebP/AVIF for web usage

Even if you keep JPEG as primary, having a "master enhanced output" option would be useful.

# Specific comments on the custom stages

## AdaptivePhotoGrade

This is the most promising custom logic in the workflow.

Good:

* exposure in linear light
* contrast and saturation as explicit steps
* detail/base decomposition
* per-scene profiles

Concerns:

* the gamma 2.2 approximation is simple, but the true sRGB transfer is not exactly 2.2
* clipping highlights at 1.0 can lose recoverable rolloff smoothness
* HSV saturation edits can behave poorly on skin tones and near highlights
* a fixed contrast midpoint around 0.5 is simple but not content-aware

If you keep evolving it, the next quality wins will likely come from:

* proper sRGB transfer functions
* luminance-aware saturation
* selective highlight/shadow controls
* local contrast constrained by the noise estimate

## SkyEnhance

Clever and cheap. Good for a CPU stage.

Risks:

* blue clothing, windows, water, reflective buildings, and tinted glass can get caught
* sunset banding or haloing near trees/buildings
* the vertical prior helps, but can still fail on mountains or upside-weighted compositions

I would recommend logging:

* sky coverage %
* mean mask confidence
* whether sky enhancement was effectively skipped

And maybe auto-disable it when coverage is too low or too fragmented.

## DepthSelectiveSharpen

This is an interesting stage, but also easy to overdo.

Pros:

* more photographic than simple global sharpening
* can add subject separation

Risks:

* relative depth is not segmentation
* hair, glasses, transparent objects, fences, and fine branches can create messy transitions
* background blur on an already naturally focused image may look synthetic
* blur-plus-sharpen in one stage can produce "smartphone portrait mode" artifacts

I would strongly consider making this more conservative:

* lower the default blur
* maybe sharpen the foreground only, without explicit background blur
* or gate blur by scene type and depth confidence

For many photos, foreground sharpening alone may be enough.

# Performance review

Your breakdown is believable.

The biggest performance cost drivers are probably:

* ESRGAN 4× inference
* memory movement around the 16K intermediate
* the downscale from 16K to 4K
* Depth Anything inference

This means the obvious speed/quality tradeoff lever is reducing or conditionally skipping the 4× path. That one decision could cut runtime materially.

If you want better throughput later, likely gains are:

* a batch submission queue with bounded concurrency
* reusing loaded models across jobs, which ComfyUI already helps with
* avoiding oversized intermediates when not needed
* possibly moving some CPU image ops to GPU if they become limiting

But honestly, for 45 photos, the current runtime is already acceptable.

# Operational review

This is better than average for reliability, but I would still tighten a few things.

## Add stronger failure modes

Include distinct handling for:

* upload success but prompt submission failure
* prompt accepted but no output file
* PNG written but metadata JSON missing
* metadata JSON malformed
* partial download
* ComfyUI restart mid-run

## Record more run provenance

In the metadata or manifest, store:

* prompt_id
* workflow hash
* custom node git commit or file hash
* model file hashes
* processing duration per stage, if possible

That will make future tuning much easier.

## Be careful with cache busting

The random prefix solves ComfyUI cache problems, which is practical. But it also removes any chance of safe reuse. A better long-term version is a cache key derived from the source hash, the workflow hash, and the relevant parameters. Then you can intentionally reuse identical results.
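Shown in Python for brevity (the Ruby control plane would do the equivalent), a minimal sketch of such a key:

```python
# A sketch of a deterministic cache key: SHA-256 over the source bytes plus
# canonicalized workflow JSON and per-image parameters.
import hashlib
import json

def cache_key(source_path, workflow_json, params):
    digest = hashlib.sha256()
    with open(source_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    # sort_keys makes the key stable regardless of dict insertion order.
    digest.update(json.dumps(workflow_json, sort_keys=True).encode())
    digest.update(json.dumps(params, sort_keys=True).encode())
    return digest.hexdigest()[:16]
```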
# Best improvements to do next

If I were prioritizing, I would do these in order:

1. make ESRGAN conditional
2. store the CLIP confidence distribution and blend scene profiles
3. make the depth stage more conservative
4. preserve ICC and selected EXIF metadata
5. improve failure recovery and provenance logging

Those would give the best quality/reliability gains without rewriting the whole system.

# Bottom line

This is a well-designed, thoughtful workflow with a good systems mindset behind it. The main weakness is not architecture; it is over-processing risk from always-on enhancement stages, especially:

* full-image ESRGAN
* hard scene-class selection
* depth-based blur/sharpen

So my verdict is:

* architecture: strong
* operational design: strong
* quality strategy: promising, but should become more conditional and confidence-aware
* performance: acceptable for the workload
* next maturity step: adaptive routing and stronger provenance

If you want, I can turn this into a more formal engineering review with sections for "keep / change / risks / recommended roadmap," or I can propose a v2 pipeline that should improve both quality and speed.

yes:

# Photo Enhancement Workflow Review

## Executive summary

This workflow is well designed. It has a strong separation between orchestration, inference, deterministic image processing, and human review. The system is practical, understandable, and already close to a small production pipeline for batch photo enhancement.

The main issue is not the infrastructure or code shape. The main issue is **quality control under an always-on enhancement strategy**. Several expensive stages are applied to every image, even though their benefit is scene-dependent and sometimes negative. The biggest gains now will come from making the workflow **conditional, confidence-aware, and slightly more conservative**.

---

# What should stay

## 1. Ruby as the control plane

This is a good choice. It gives you:

* clean batch orchestration
* simple manifest handling
* file lifecycle control
* easy VM lifecycle integration
* a place to keep business logic out of ComfyUI

## 2. ComfyUI as the execution graph

Also a good choice. It gives you:

* model reuse
* visual graph structure
* easy injection of runtime parameters
* modular custom node expansion

## 3. Metadata sidecar generation

This is one of the strongest parts of the system.

The `_e.md` and JSON sidecars make the workflow:

* debuggable
* reviewable
* reproducible
* easier to tune later

## 4. Human review tool

The comparison tool is exactly the right final step. Enhancement pipelines often fail because they assume "processed" means "better." Yours does not.

## 5. EXIF orientation bake before upload

Correct and necessary. Good defensive engineering.

---

# What should change

## 1. Stop treating enhancement as a single fixed path

Right now the graph is elegant, but too uniform. The workflow should become a **decision tree**, not a single mandatory sequence.

Some stages should be optional (see the routing sketch after this list):

* Real-ESRGAN
* CodeFormer
* SkyEnhance
* DepthSelectiveSharpen
* grading strength inside AdaptivePhotoGrade
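A minimal sketch of what that decision tree could look like, assuming a preflight `analysis` dict like the Stage 0 sketch later in this review. Every field name and threshold here is an illustrative starting point, not a tuned value:

```python
from dataclasses import dataclass

@dataclass
class Policy:
    esrgan_mode: str      # "off" | "weak" | "full"
    use_codeformer: bool
    use_sky_enhance: bool
    depth_mode: str       # "off" | "sharpen_only" | "sharpen_and_blur"

def select_policy(analysis):
    portrait = analysis["face_area_pct"] > 5.0
    noisy = analysis["noise_estimate"] > 0.02
    detailed = analysis["edge_density"] > 0.10
    return Policy(
        # Skip ESRGAN where it is most likely to invent detail.
        esrgan_mode="off" if (portrait or noisy) else ("full" if detailed else "weak"),
        # Restore faces only when present and large enough to matter.
        use_codeformer=analysis["face_count"] > 0 and analysis["max_face_px"] > 96,
        use_sky_enhance=analysis["sky_coverage_pct"] > 10.0,
        depth_mode="sharpen_only",  # background blur stays opt-in
    )
```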
## 2. Make Real-ESRGAN conditional

This is the highest-priority change.

Current risks:

* synthetic texture
* over-crisp foliage
* JPEG artifact amplification
* invented microdetail
* unnatural skin/hair

### Recommendation

Use ESRGAN only for:

* high-detail scenes (landscape, architecture)
* strong edge density
* visible softness or compression

Avoid or weaken it for:

* portraits
* night/high ISO
* already sharp JPEGs

## 3. Replace hard scene labels with blended grading

The current approach uses the argmax of the CLIP scores. The problem: scenes are often mixed.

### Recommendation

* keep the top 2–3 scene scores
* normalize them
* blend the profile parameters

Example:

* 0.55 landscape
* 0.35 golden_hour
* 0.10 overcast

Blend exposure, contrast, saturation, detail, and denoise accordingly.

## 4. Make depth processing more conservative

The default behavior should be:

* foreground sharpening only
* no background blur by default

Enable blur only when there is:

* strong subject separation
* a portrait-like composition

## 5. Preserve metadata intentionally

The current pipeline likely loses:

* EXIF
* the ICC profile

### Recommendation

Preserve or explicitly manage:

* capture timestamp
* camera/lens info
* ICC profile

and add processing metadata of your own.

---

# Main risks

## Quality risks

### Over-processing

Stacked enhancements may lead to a synthetic look.

### Face inconsistency

CodeFormer may produce a mismatch with surrounding regions.

### Masking errors

Sky and depth masks may:

* misclassify regions
* create halos

## Operational risks

### Partial success ambiguity

Needs stronger validation for:

* missing metadata
* partial downloads

### Weak provenance

Should log:

* prompt_id
* workflow hash
* model versions

### CPU bottleneck

Potential hotspots:

* large rescaling
* guided filtering
* morphology operations

---

# Performance review

## Current state

~40–50s/photo is acceptable.

## Main optimization lever

Make ESRGAN conditional.

## Secondary lever

Skip unnecessary stages when they are not needed.

---

# Recommended v2 architecture

## Goal

Make the workflow adaptive.

## Pipeline

### Stage 0 — Preflight analysis

Compute (see the sketch after this pipeline):

* brightness histogram
* saturation
* edge density
* noise estimate
* face stats
* sky coverage
* CLIP scores

### Stage 1 — Policy selection

Decide:

* ESRGAN mode
* CodeFormer usage
* grading blend
* sky enhance on/off
* depth mode

### Stage 2 — Enhancement

Run only the selected stages.

### Stage 3 — Output + metadata

Include:

* policy decisions
* confidence scores
* timings
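A minimal OpenCV/NumPy sketch of the Stage 0 statistics. Face, sky, and CLIP stats would come from their own detectors and are omitted; the thresholds that consume these numbers would be tuned against real photos:

```python
import cv2
import numpy as np

def preflight_stats(path):
    bgr = cv2.imread(path)
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    lum = gray.astype(np.float32) / 255.0
    return {
        "mean_luminance": float(lum.mean()),
        # Fraction of pixels at or near highlight clipping.
        "highlight_clip_pct": float((lum > 0.98).mean() * 100),
        # Occupancy of the deep shadow floor.
        "shadow_floor_pct": float((lum < 0.02).mean() * 100),
        "saturation_p90": float(np.percentile(hsv[..., 1] / 255.0, 90)),
        # Edge density as the fraction of Canny edge pixels.
        "edge_density": float((cv2.Canny(gray, 100, 200) > 0).mean()),
        # Crude noise proxy: median absolute Laplacian response.
        "noise_estimate": float(np.median(np.abs(cv2.Laplacian(lum, cv2.CV_32F)))),
    }
```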
---

# Example metadata (v2)

```json
{
  "workflow_version": "photo-enhance-v2",
  "analysis": {
    "scene_scores": {
      "landscape": 0.51,
      "golden_hour": 0.28
    },
    "face_count": 1,
    "sky_coverage_pct": 23.4
  },
  "policy": {
    "esrgan_mode": "weak",
    "depth_mode": "sharpen_only"
  }
}
```

---

# Roadmap

## Phase 1

* conditional ESRGAN
* blended scene grading
* background blur disabled by default
* metadata preserved

## Phase 2

* preflight analysis
* gating logic for faces and sky
* improved logging

## Phase 3

* better color handling (true sRGB; see the sketch after this roadmap)
* noise-aware detail
* improved saturation logic
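For the Phase 3 color work, a minimal sketch of the piecewise sRGB transfer functions (IEC 61966-2-1) that would replace the gamma-2.2 approximation:

```python
import numpy as np

def srgb_to_linear(v):
    # Linear segment below 0.04045, power segment above (inputs in [0, 1]).
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(v):
    return np.where(v <= 0.0031308, v * 12.92, 1.055 * np.power(v, 1 / 2.4) - 0.055)
```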
---

# Final verdict

## Strengths

* strong architecture
* practical workflow
* good separation of concerns

## Weakness

* over-processing risk from always-on stages

## Key improvement

Move from fixed pipeline → adaptive pipeline.

This will improve both quality and performance significantly.