Improving pose estimation on art collections using style transfer

Student:	Tristan Verheecke
Program:	Master of Science in Industrial Sciences: Informatics
Abstract:	Digitization gives museums the ability to analyze their art collections more efficiently. Important connections between artworks can be uncovered this way, which is useful for classification and retrieval. Museums put a great deal of effort into this process, but doing it manually is very labor-intensive. To address this, they have sought to automate these tasks using computer vision methods. Computer vision offers a rich body of research on image classification, semantic segmentation, object detection and 2D/3D human pose estimation (HPE). However, these algorithms turn out to be poorly suited to art collections because they were trained on photographs. This thesis deals with the HPE problem and with methods that can improve performance on art collections. Two shortcomings are identified: incomplete keypoint prediction and incorrect pose association. To address them, this thesis proposes a method that fine-tunes state-of-the-art (SOTA) HPE models on a combination of stylized COCO datasets. Three datasets representing baroque, renaissance and impressionism were created from the WikiArt dataset. From these genres, a selection of figurative paintings is made using content-based image retrieval. For each style transfer model, CycleGAN and AdaIN, two stylized COCO datasets are then created: one from a mixture of the genres and one from impressionism only. Next, the SWAHR and ViTPose pose estimation models are fine-tuned either on the COCO dataset combined with a stylized COCO dataset or on the stylized COCO dataset alone. This yields a total of 16 models, whose evaluation showed a consistent improvement in pose estimation.
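The count of 16 models follows from crossing the four choices named in the abstract: two style transfer models, two style selections, two pose estimation models and two training-data settings. The sketch below only enumerates that grid to make the arithmetic explicit. The variable names, the dictionary keys and the label strings are illustrative assumptions and are not taken from the thesis code.

```python
from itertools import product

# Factors as described in the abstract; the label strings are illustrative.
STYLE_TRANSFER_MODELS = ["CycleGAN", "AdaIN"]
STYLE_SETS = ["mixed genres", "impressionism only"]
POSE_MODELS = ["SWAHR", "ViTPose"]
TRAINING_DATA = ["COCO + stylized COCO", "stylized COCO only"]


def experiment_grid():
    """Return one configuration dict per fine-tuned model in the grid."""
    return [
        {
            "style_transfer": style_transfer,
            "style_set": style_set,
            "pose_model": pose_model,
            "training_data": training_data,
        }
        for style_transfer, style_set, pose_model, training_data in product(
            STYLE_TRANSFER_MODELS, STYLE_SETS, POSE_MODELS, TRAINING_DATA
        )
    ]


if __name__ == "__main__":
    grid = experiment_grid()
    # 2 style transfer models x 2 style sets x 2 pose models x 2 data settings
    print(f"{len(grid)} configurations")
    for config in grid:
        print(config)
```

Under this reading, the first two factors alone give the four stylized COCO datasets, and each of those is used to fine-tune both pose estimation models in both training-data settings.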