Main technical risks
1. Real-ESRGAN first, on every image, is your biggest quality risk

Running every image through:

4× ESRGAN
then downscale back to original size

can definitely improve some photos, but it can also introduce:

hallucinated texture
crispy foliage
waxy skin after interaction with later steps
fake edge detail
zippering around fine geometry
over-defined JPEG blocks on already compressed Fuji JPEGs

This is the part I would treat as conditionally applied, not universal.

My recommendation:

gate ESRGAN based on image characteristics
or at least use different strength paths for portrait vs landscape vs night

Examples:

portraits: maybe skip global ESRGAN or use a weaker path
night/high-ISO: be careful, because ESRGAN can turn noise into invented detail
landscapes/architecture: often benefit the most

Right now the pipeline assumes “upscale to 16K, then downscale back” is always a win. It often is not.
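
As a sketch, the gating could be a small pure function at the policy layer. Everything here is illustrative, not taken from your pipeline: the label names, the noise threshold, and the three-mode output are all assumptions to be tuned.

```python
# Hypothetical ESRGAN gate: pick a strength path from scene label, a noise
# estimate, and face presence. All thresholds are guesses, not measured values.

def esrgan_mode(scene: str, noise_sigma: float, face_count: int) -> str:
    """Return 'full', 'weak', or 'skip' for the 4x Real-ESRGAN stage."""
    if face_count > 0 and scene == "portrait":
        return "skip"            # portraits: leave detail work to the face stage
    if noise_sigma > 8.0:
        return "skip"            # high-ISO/night: upscaling turns noise into invented detail
    if scene in ("landscape", "architecture"):
        return "full"            # the scenes that usually benefit the most
    return "weak"                # conservative default for mixed or unclear scenes
```

The point is less the exact thresholds than that the decision becomes explicit, loggable, and testable.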

2. CodeFormer after global enhancement can amplify inconsistency

CodeFormer is useful, but it can produce faces that look slightly detached from the rest of the frame if the global pipeline has already altered texture and local contrast.

Potential issues:

face crops look cleaner than surrounding skin/neck/hair
restored face sharpness conflicts with depth blur/sharpen later
multiple faces in one frame may get uneven treatment

Things to consider:

apply CodeFormer only when face size exceeds a threshold
use a lower-strength/fidelity profile depending on scene
skip CodeFormer for distant faces
log face count and face bounding-box size into metadata

That would make the workflow easier to debug when faces look “too AI.”
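
The face-size threshold could be as simple as the following sketch. The bounding-box format and the 2% area fraction are assumptions for illustration:

```python
# Hypothetical face gate for CodeFormer: restore only faces whose bounding box
# covers enough of the frame; return the rest so they can be logged as skipped.

def codeformer_faces(faces, frame_w, frame_h, min_frac=0.02):
    """Split (x, y, w, h) face boxes into (restore, skip) by area fraction."""
    frame_area = frame_w * frame_h
    keep, skipped = [], []
    for box in faces:
        _, _, w, h = box
        (keep if w * h / frame_area >= min_frac else skipped).append(box)
    return keep, skipped
```

Logging both lists into the sidecar metadata gives you exactly the debug trail described above.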

3. Scene classification using 8 CLIP prompts is clever but brittle

This is a nice lightweight idea, but it is likely the weakest decision point in the pipeline because eight prompts force coarse categorization.

Possible failure cases:

beach sunset might oscillate between beach, golden_hour, and landscape
indoor portraits near a window may flip to portrait or indoor
urban night scenes may misclassify between street and night
cloudy mountain lake might be overcast vs landscape

Because your grade profile changes exposure/contrast/saturation/detail/denoise, a wrong label can materially alter the image.

Better approach:

store the full prompt score distribution, not just argmax
use top-2 or top-3 labels
blend profiles based on confidence instead of hard-switching

For example:

60% landscape + 40% golden_hour
instead of forcing one profile

That would reduce sudden profile mistakes.
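
A minimal blending sketch, assuming the profiles are flat parameter dicts and the CLIP outputs are raw similarity scores (the profile names, values, and softmax temperature are all made up for illustration):

```python
import math

# Illustrative grade profiles; the real ones live in AdaptivePhotoGrade.
PROFILES = {
    "landscape":   {"exposure": 0.10, "saturation": 1.15, "contrast": 1.10},
    "golden_hour": {"exposure": 0.05, "saturation": 1.25, "contrast": 1.05},
    "overcast":    {"exposure": 0.15, "saturation": 1.05, "contrast": 1.15},
}

def blend_profiles(clip_scores, top_k=2, temperature=0.1):
    """Softmax-normalize the top-k CLIP similarities, then average profiles."""
    top = sorted(clip_scores.items(), key=lambda kv: -kv[1])[:top_k]
    exps = [math.exp(s / temperature) for _, s in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    blended = {}
    for (label, _), w in zip(top, weights):
        for param, value in PROFILES[label].items():
            blended[param] = blended.get(param, 0.0) + w * value
    return blended
```

Because the weights sum to 1, every blended parameter stays inside the range spanned by the contributing profiles, so a near-tie between two labels can no longer flip the whole grade.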

4. CPU image ops at 4K are fine, but not yet optimized as a pipeline

Your CPU-bound stages are sensible, but there are some efficiency concerns:

guidedFilter and morphology/blur passes at full 4K are not trivial
ImageScaleBy 16K → 4K on CPU may be heavier than it looks
repeated color-space conversions and full-frame copies can become memory-bandwidth bound
if you later parallelize multiple photos, CPU becomes the bottleneck before GPU memory does

This matters because your per-photo runtime is already 40–50s, and if you batch more aggressively you may saturate the host CPU.

I would especially watch:

OpenCV allocations
Python ↔ tensor conversion overhead inside custom nodes
whether large intermediate tensors are duplicated unnecessarily

5. Polling /history/&lt;prompt_id&gt; every 2s is workable but not ideal

It is acceptable, but it is a weak point operationally.

Risks:

stale/incomplete history states
long-run prompt ambiguity if ComfyUI restarts
polling delay adds latency
harder recovery when output partially exists but metadata doesn’t

If ComfyUI or your wrapper supports websocket progress or event-driven status, that would be better. If not, I would at least strengthen state validation:

ensure expected output files exist and are complete
ensure metadata JSON corresponds to the same prefix
distinguish timeout from partial success

Biggest architectural improvement opportunities
1. Add conditional routing, not one fixed pipeline for every photo

Right now the graph is elegant, but it is still mostly single-path.

A more robust system would route based on detected attributes:

no faces → skip CodeFormer
little/no sky → skip SkyEnhance
low confidence scene label → use default conservative grade
low-detail or noisy photo → reduce or skip ESRGAN
already high-contrast/high-saturation image → apply weaker grade

That would reduce over-processing and save time.
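
The routing rules above can be sketched as one policy function. Every stat name and threshold here is an assumption; the shape is what matters:

```python
# Hypothetical routing policy: measured attributes in, per-stage decisions out.

def build_policy(stats: dict) -> dict:
    return {
        "codeformer":  stats.get("face_count", 0) > 0,          # no faces -> skip
        "sky_enhance": stats.get("sky_coverage_pct", 0.0) >= 5.0,  # little sky -> skip
        "grade":       "default" if stats.get("scene_confidence", 0.0) < 0.4
                       else "scene",                             # low confidence -> conservative
        "esrgan":      not (stats.get("noise_sigma", 0.0) > 8.0
                            or stats.get("edge_density", 1.0) < 0.02),  # noisy/flat -> skip
    }
```

Writing the returned dict into the sidecar metadata also documents, per photo, why each stage did or did not run.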

2. Move from hardcoded profiles to measured image statistics

Your scene profiles are sensible, but they are still hand-tuned guesses.

A stronger next step would be to incorporate measured stats such as:

luminance histogram
highlight clipping ratio
shadow floor occupancy
saturation percentile
edge density
noise estimate
face area percentage
sky coverage

Then use those stats to modulate:

exposure
saturation
detail multiplier
denoise
background blur

That would make the pipeline more adaptive and less prompt-dependent.
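
One way to wire stats into the profiles without discarding the hand-tuned values is to treat them as a baseline and modulate. A sketch, with invented stat names and scaling rules:

```python
# Illustrative modulation: hand-tuned grade params in, stat-adjusted params out.

def modulate_grade(base: dict, stats: dict) -> dict:
    out = dict(base)
    # back off the saturation boost when the image is already saturated
    if stats["saturation_p90"] > 0.8:
        out["saturation"] = 1.0 + (base["saturation"] - 1.0) * 0.5
    # never lift exposure further into clipped highlights
    if stats["highlight_clip_ratio"] > 0.01:
        out["exposure"] = min(base["exposure"], 0.0)
    # scale the detail multiplier down as the noise estimate rises
    out["detail"] = base["detail"] / (1.0 + stats["noise_sigma"] / 10.0)
    return out
```

The profile stays the starting point, so a wrong scene label degrades gracefully instead of compounding.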

3. Preserve and restore metadata more deliberately

You correctly bake orientation before upload because ComfyUI strips EXIF. Good.

But converting final PNG to JPEG without explicit metadata handling means you may be losing:

original EXIF fields
capture time
lens/camera info
ICC profile
GPS if present
copyright/author data

That may be fine, but if the intent is “enhanced derivative of original photo,” I would consider:

copying selected EXIF fields from source to final JPEG
preserving or explicitly assigning ICC profile
adding software tag / processing note
optionally stripping privacy-sensitive fields by choice, not by accident

Color profile handling is especially important. “No color corrections” is not the same as “color managed.”
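
Library-agnostic sketch of the selection step (the actual copy would go through piexif, exiftool, or similar; the tag names follow standard EXIF naming, and the `Software` string is a placeholder):

```python
# Whitelist-based EXIF selection: copy identity/provenance fields, drop GPS
# unless explicitly requested, and tag the file as a processed derivative.

KEEP = {"DateTimeOriginal", "Make", "Model", "LensModel", "Artist", "Copyright"}
GPS_FIELDS = {"GPSLatitude", "GPSLongitude", "GPSAltitude"}

def select_exif(source_tags: dict, keep_gps: bool = False) -> dict:
    kept = {k: v for k, v in source_tags.items() if k in KEEP}
    if keep_gps:
        kept.update({k: v for k, v in source_tags.items() if k in GPS_FIELDS})
    kept["Software"] = "photo-enhance pipeline"  # mark the derivative
    return kept
```

The whitelist makes the privacy decision deliberate: GPS is dropped by default and survives only by explicit choice, which is the opposite of the current accidental-strip behavior.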

4. Add resumability per stage, not just per photo

Your manifest marks a photo done after full completion, which is good, but partial reruns still require redoing all remote processing for failed photos.

You could get stronger resilience with stage-aware artifacts:

oriented temp exists
upload completed
prompt submitted
output downloaded
JPEG written
metadata written

That might be too much overhead for a personal workflow, but even just logging prompt_id per source photo would help a lot with crash recovery.

5. Treat JPEG as an output format decision, not a fixed end state

JPEG quality 92 is reasonable, but for some images:

foliage
gradients in skies
deep edits after enhancement

JPEG may reintroduce artifacts after all that expensive work.

Consider:

archival output as PNG or TIFF
delivery output as JPEG
optional WebP/AVIF for web usage

Even if you keep JPEG as primary, having a “master enhanced output” option would be useful.

Specific comments on the custom stages
AdaptivePhotoGrade

This is the most promising custom logic in the workflow.

Good:

exposure in linear light
contrast and saturation as explicit steps
detail/base decomposition
per-scene profiles

Concerns:

gamma 2.2 approximation is simple, but true sRGB transfer is not exactly 2.2
clipping highlights at 1.0 can lose recoverable rolloff smoothness
HSV saturation edits can behave poorly in skin tones and near highlights
fixed midpoint contrast around 0.5 is simple but not content-aware

If you keep evolving it, the next quality wins will likely come from:

proper sRGB transfer functions
luminance-aware saturation
highlight/shadow selective controls
local contrast constrained by noise estimate
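
On the first of those: the exact piecewise sRGB transfer (IEC 61966-2-1) is a drop-in replacement for the 2.2 approximation. Scalar form shown for clarity; in the node you would vectorize it with numpy:

```python
# Exact sRGB electro-optical transfer and its inverse. The linear segment
# near black is where a plain 2.2 gamma deviates the most.

def srgb_to_linear(c: float) -> float:
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c: float) -> float:
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
```

Mid-gray 0.5 maps to ~0.214 linear (the 2.2 approximation gives ~0.218), and the functions are exact inverses, so repeated round trips through the grade stage no longer drift shadows.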

SkyEnhance

Clever and cheap. Good for a CPU stage.

Risks:

blue clothing, windows, water, reflective buildings, and tinted glass can get caught
sunset banding or haloing near trees/buildings
vertical prior helps, but can still fail on mountains or upside-weighted compositions

I would recommend logging:

sky coverage %
mean mask confidence
whether sky enhancement was effectively skipped

And maybe auto-disable when coverage is too low or too fragmented.
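
The coverage/confidence gate could look like this sketch, which takes the mask as a flat list of per-pixel sky probabilities (thresholds are guesses; a real fragmentation check would use connected components):

```python
# Hypothetical SkyEnhance gate: require both enough sky area and a confident
# mask before applying the enhancement.

def sky_enhance_enabled(mask_probs, min_coverage=0.05, min_confidence=0.6):
    sky = [p for p in mask_probs if p >= 0.5]
    coverage = len(sky) / len(mask_probs)
    if coverage < min_coverage:
        return False                      # too little sky to be worth touching
    mean_conf = sum(sky) / len(sky)
    return mean_conf >= min_confidence    # hesitant masks tend to halo
```

Both numbers (coverage and mean confidence) are worth logging into the sidecar regardless of the decision.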

DepthSelectiveSharpen

This is an interesting stage, but also easy to overdo.

Pros:

more photographic than simple global sharpening
can add subject separation

Risks:

relative depth is not segmentation
hair, glasses, transparent objects, fences, and fine branches can create messy transitions
background blur on an already naturally focused image may look synthetic
blur-plus-sharpen in one stage can produce “smartphone portrait mode” artifacts

I would strongly consider making this more conservative:

lower default blur
maybe sharpen foreground only, without explicit background blur
or gate blur by scene type and depth confidence

For many photos, foreground sharpening alone may be enough.

Performance review

Your breakdown is believable.

The biggest performance cost drivers are probably:

ESRGAN 4× inference
memory movement around the 16K intermediate
downscale from 16K to 4K
Depth Anything inference

This means the obvious speed/quality tradeoff lever is:

reducing or conditionally skipping the 4× path

That one decision could cut runtime materially.

If you want better throughput later, likely gains are:

batch submission queue with bounded concurrency
reuse loaded models across jobs, which ComfyUI already helps with
avoid oversized intermediates when not needed
possibly move some CPU image ops to GPU if they become limiting

But honestly, for 45 photos, the current runtime is already acceptable.

Operational review

This is better than average for reliability, but I would still tighten a few things:

Add stronger failure modes

Include distinct handling for:

upload success but prompt submission failure
prompt accepted but no output file
PNG written but metadata JSON missing
metadata JSON malformed
partial download
ComfyUI restart mid-run

Record more run provenance

In the metadata or manifest, store:

prompt_id
workflow hash
custom node git commit or file hash
model file hashes
processing duration per stage if possible

That will make future tuning much easier.

Be careful with cache busting

The random prefix solves ComfyUI cache problems, which is practical. But it also removes some chance of safe reuse. A better long-term version is:

cache key derived from source hash + workflow hash + relevant params

Then you can intentionally reuse identical results.
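
A minimal sketch of such a key, assuming the source and workflow hashes are computed elsewhere and the relevant params fit in a JSON-serializable dict:

```python
import hashlib
import json

# Deterministic cache key: identical source + workflow + params always yield
# the same key, so reuse becomes an intentional lookup rather than an accident.

def cache_key(source_sha256: str, workflow_sha256: str, params: dict) -> str:
    payload = json.dumps(
        {"src": source_sha256, "wf": workflow_sha256, "params": params},
        sort_keys=True,  # dict insertion order must not change the hash
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

Using the key as the ComfyUI filename prefix keeps the current cache-busting behavior for anything that actually changed, while letting unchanged work be skipped.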

Best improvements to do next

If I were prioritizing, I would do these in order:

make ESRGAN conditional
store CLIP confidence distribution and blend scene profiles
make depth stage more conservative
preserve ICC/selected EXIF metadata
improve failure recovery and provenance logging

Those would give the best quality/reliability gains without rewriting the whole system.

Bottom line

This is a well-designed, thoughtful workflow with a good systems mindset behind it. The main weakness is not architecture — it is over-processing risk from always-on enhancement stages, especially:

full-image ESRGAN
hard scene-class selection
depth-based blur/sharpen

So my verdict is:

architecture: strong
operational design: strong
quality strategy: promising, but should become more conditional and confidence-aware
performance: acceptable for the workload
next maturity step: adaptive routing and stronger provenance

If you want, I can turn this into a more formal engineering review with sections for “keep / change / risks / recommended roadmap,” or I can propose a v2 pipeline that should improve both quality and speed.

yes:

# Photo Enhancement Workflow Review

## Executive summary

This workflow is well designed. It has a strong separation between orchestration, inference, deterministic image processing, and human review. The system is practical, understandable, and already close to a small production pipeline for batch photo enhancement.

The main issue is not the infrastructure or code shape. The main issue is **quality control under an always-on enhancement strategy**. Several expensive stages are applied to every image, even though their benefit is scene-dependent and sometimes negative. The biggest gains now will come from making the workflow **conditional, confidence-aware, and slightly more conservative**.

---

# What should stay

## 1. Ruby as the control plane

This is a good choice.

It gives you:

* clean batch orchestration
* simple manifest handling
* file lifecycle control
* easy VM lifecycle integration
* a place to keep business logic out of ComfyUI

## 2. ComfyUI as the execution graph

Also a good choice.

It gives you:

* model reuse
* visual graph structure
* easy injection of runtime parameters
* modular custom node expansion

## 3. Metadata sidecar generation

This is one of the strongest parts of the system.

The `_e.md` and JSON sidecars make the workflow:

* debuggable
* reviewable
* reproducible
* easier to tune later

## 4. Human review tool

The comparison tool is exactly the right final step. Enhancement pipelines often fail because they assume “processed” means “better.” Yours does not.

## 5. EXIF orientation bake before upload

Correct and necessary. Good defensive engineering.

---

# What should change

## 1. Stop treating enhancement as a single fixed path

Right now the graph is elegant, but too uniform. The workflow should become a **decision tree**, not a single mandatory sequence.

Some stages should be optional:

* Real-ESRGAN
* CodeFormer
* SkyEnhance
* DepthSelectiveSharpen
* grading strength inside AdaptivePhotoGrade

## 2. Make Real-ESRGAN conditional

This is the highest-priority change.

Current risks:

* synthetic texture
* over-crisp foliage
* JPEG artifact amplification
* invented microdetail
* unnatural skin/hair

### Recommendation

Use ESRGAN only for:

* high-detail scenes (landscape, architecture)
* images with strong edge density
* images with visible softness or compression artifacts

Avoid or weaken for:

* portraits
* night/high ISO
* already sharp JPEGs

## 3. Replace hard scene labels with blended grading

Current approach uses argmax from CLIP.

Problem: scenes are often mixed.

### Recommendation

* keep top 2–3 scene scores
* normalize
* blend profile parameters

Example:

* 0.55 landscape
* 0.35 golden_hour
* 0.10 overcast

Blend exposure, contrast, saturation, detail, denoise.

## 4. Make depth processing more conservative

Default behavior should be:

* foreground sharpening only
* no background blur by default

Enable blur only when:

* strong subject separation
* portrait-like composition

## 5. Preserve metadata intentionally

Current pipeline likely loses:

* EXIF
* ICC profile

### Recommendation

Preserve or explicitly manage:

* capture timestamp
* camera/lens info
* ICC profile
* add processing metadata

---

# Main risks

## Quality risks

### Over-processing

Stacked enhancements can accumulate into a synthetic look.

### Face inconsistency

CodeFormer may produce faces that mismatch the surrounding regions.

### Masking errors

Sky and depth masks may:

* misclassify regions
* create halos

## Operational risks

### Partial success ambiguity

Need stronger validation for:

* missing metadata
* partial downloads

### Weak provenance

Should log:

* prompt_id
* workflow hash
* model versions

### CPU bottleneck

Potential hotspots:

* large rescaling
* guided filtering
* morphology operations

---

# Performance review

## Current state

~40–50s/photo is acceptable.

## Main optimization lever

Make ESRGAN conditional.

## Secondary lever

Skip unnecessary stages when not needed.

---

# Recommended v2 architecture

## Goal

Make the workflow adaptive.

## Pipeline

### Stage 0 — Preflight analysis

Compute:

* brightness histogram
* saturation
* edge density
* noise estimate
* face stats
* sky coverage
* CLIP scores

### Stage 1 — Policy selection

Decide:

* ESRGAN mode
* CodeFormer usage
* grading blend
* sky enhance on/off
* depth mode

### Stage 2 — Enhancement

Run only selected stages.

### Stage 3 — Output + metadata

Include:

* policy decisions
* confidence scores
* timings

---

# Example metadata (v2)

```json
{
  "workflow_version": "photo-enhance-v2",
  "analysis": {
    "scene_scores": {
      "landscape": 0.51,
      "golden_hour": 0.28
    },
    "face_count": 1,
    "sky_coverage_pct": 23.4
  },
  "policy": {
    "esrgan_mode": "weak",
    "depth_mode": "sharpen_only"
  }
}
```

---

# Roadmap

## Phase 1

* conditional ESRGAN
* blended scene grading
* disable background blur default
* preserve metadata

## Phase 2

* preflight analysis
* gating logic for faces and sky
* improved logging

## Phase 3

* better color handling (true sRGB)
* noise-aware detail
* improved saturation logic

---

# Final verdict

## Strengths

* strong architecture
* practical workflow
* good separation of concerns

## Weakness

* over-processing risk from always-on stages

## Key improvement

Move from fixed pipeline → adaptive pipeline

This will improve both quality and performance significantly.