Abstract
This thesis presents original research on deep learning applied to photo editing.
Our work is guided by several goals that distinguish it from the rich body of
contemporaneous research in the same field.
We focus on democratizing photo editing: our methods should not require a steep
learning curve from end users, and they should run on consumer hardware. We
strive to make this last requirement compatible with arbitrary-resolution
editing, so that users do not have to sacrifice quality for speed or
accessibility. Furthermore, we respect the work of photographers as artists and
professionals, and develop lossless parametric methods whenever possible: like
traditional filters, they adjust the captured pixels rather than replacing them
with generated ones.
Our first contribution is FilterNet, an automatic enhancement method that
predicts the parameters of the filters to apply to a photo in order to improve
it, rather than predicting the edited image directly. The predicted filters can
be applied to photos of arbitrary resolution, are easy to explain because they
map to photographic concepts such as exposure or color temperature, and can be
adjusted by the user at will. However, they act globally, because the system is
trained on whole images.
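The idea of predicting filter parameters instead of pixels can be illustrated with a minimal sketch; it is not FilterNet's implementation. The function `apply_filters`, the parameter names `exposure_ev` and `temperature`, and the crude red/blue gain used as a stand-in for a real color-temperature model are all hypothetical. Because the model's output is a handful of numbers rather than an image, the same values could be estimated from a small thumbnail and then applied, pixel by pixel, to the full-resolution original.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[float, float, float]  # linear RGB, each channel in [0, 1]


def clamp(x: float) -> float:
    """Keep a channel value inside the displayable [0, 1] range."""
    return min(1.0, max(0.0, x))


def apply_filters(image: List[List[Pixel]], params: Dict[str, float]) -> List[List[Pixel]]:
    """Apply global parametric filters to an image of any size.

    Hypothetical parameters (illustrative only):
      exposure_ev: exposure change in stops; +1 doubles linear intensity.
      temperature: simplified warm/cool shift; positive values boost red and cut blue.
    Missing parameters default to 0, which leaves the image unchanged.
    """
    gain = 2.0 ** params.get("exposure_ev", 0.0)
    t = params.get("temperature", 0.0)
    red_gain, blue_gain = 1.0 + t, 1.0 - t  # crude stand-in for a real white-balance model
    return [
        [
            (clamp(r * gain * red_gain), clamp(g * gain), clamp(b * gain * blue_gain))
            for (r, g, b) in row
        ]
        for row in image
    ]


if __name__ == "__main__":
    # A model would predict these few numbers once, e.g. from a low-resolution thumbnail.
    predicted = {"exposure_ev": 1.0, "temperature": 0.1}
    photo = [[(0.2, 0.2, 0.2), (0.4, 0.3, 0.1)]]  # a tiny 1x2 "full-resolution" photo
    print(apply_filters(photo, predicted))
```

Since the parameters live apart from the image, a user can inspect or tweak them (for example, lowering `exposure_ev`) and re-render from the untouched original.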
As an intermediate step toward extending FilterNet to local edits, we explore
how segmentation methods can be conditioned on text instructions, so that users
can intuitively indicate the subject they are interested in. Linking visual
and textual representations is an active research topic; in particular,
generative methods such as diffusion models achieve impressive creative results
in this area. Unlike these methods, we do not attempt to produce final versions
of edited photos; instead, we show that such generative models can be leveraged
to predict segmentation masks. Our prototype, CocoGold, is our second
contribution.
Combining FilterNet with CocoGold, we create COCONET, a pipeline that predicts
filter parameters like FilterNet but applies them locally, within the
segmentation masks produced by CocoGold.
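The abstract does not detail how COCONET restricts a filter to a mask. One common approach, assumed here purely for illustration, is to filter a copy of the whole image and blend it with the original using a soft mask whose values lie in [0, 1]. The functions `exposure` and `apply_local` and the nested-list mask format are hypothetical.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[float, float, float]  # linear RGB, each channel in [0, 1]
Image = List[List[Pixel]]
Mask = List[List[float]]  # per-pixel weight in [0, 1], e.g. from a text-conditioned segmenter


def exposure(image: Image, ev: float) -> Image:
    """Global exposure change in stops, clipped at 1.0 (illustrative filter)."""
    gain = 2.0 ** ev
    return [[tuple(min(1.0, c * gain) for c in px) for px in row] for row in image]


def apply_local(image: Image, mask: Mask, filt: Callable[[Image], Image]) -> Image:
    """Blend a globally filtered copy back into the original using a soft mask.

    out = m * filtered + (1 - m) * original, per pixel and per channel,
    so pixels where the mask is 0 keep their original values.
    """
    filtered = filt(image)
    return [
        [
            tuple(m * f + (1.0 - m) * o for f, o in zip(fp, op))
            for fp, op, m in zip(frow, orow, mrow)
        ]
        for frow, orow, mrow in zip(filtered, image, mask)
    ]


if __name__ == "__main__":
    photo = [[(0.2, 0.2, 0.2), (0.2, 0.2, 0.2), (0.2, 0.2, 0.2)]]
    mask = [[1.0, 0.0, 0.5]]  # subject, background, soft edge
    print(apply_local(photo, mask, lambda img: exposure(img, 1.0)))
```

A soft mask lets the edit fade out gradually at subject boundaries instead of producing a hard seam.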
No research is ever complete. Our progress in the areas we set out to explore
opens new directions, which we are excited to pursue. We end this thesis by
acknowledging the limitations of our prototypes and outlining the next steps
that can bring us closer to our vision.
Publisher
Universidad Rey Juan Carlos
Description
Doctoral thesis defended at Universidad Rey Juan Carlos, Madrid, in 2025.
Supervisors
José María Cañas Plaza
Jesús Fernández Conde