Title:Controllable speech-driven gesture generation with selective activation of weakly supervised controls
Authors:Crnek, Karlo (Author)
Rojc, Matej (Author)
Files:.pdf applsci-15-09467-v2.pdf (1,63 MB)
MD5: C4928D336842855F4B0DD151617041C0
 
Language:English
Work type:Article
Typology:1.01 - Original Scientific Article
Organization:FERI - Faculty of Electrical Engineering and Computer Science
Abstract:Generating realistic and contextually appropriate gestures is crucial for creating engaging embodied conversational agents. Although speech is the primary input for gesture generation, adding controls like gesture velocity, hand height, and emotion is essential for generating more natural, human-like gestures. However, current approaches to controllable gesture generation often utilize a limited number of control parameters and lack the ability to activate/deactivate them selectively. Therefore, in this work, we propose the Cont-Gest model, a Transformer-based gesture generation model that enables selective control activation through masked training and a control fusion strategy. Furthermore, to better support the development of such models, we propose a novel evaluation-driven development (EDD) workflow, which combines several iterative tasks: automatic control signal extraction, control specification, visual (subjective) feedback, and objective evaluation. This workflow enables continuous monitoring of model performance and facilitates iterative refinement through feedback-driven development cycles. For objective evaluation, we use the validated Kinetic–Hellinger distance, an objective metric that correlates strongly with the human perception of gesture quality. We evaluated multiple model configurations and control dynamics strategies within the proposed workflow. Experimental results show that Feature-wise Linear Modulation (FiLM) conditioning, combined with single-mask training and voice activity scaling, achieves the best balance between gesture quality and adherence to control inputs.
Keywords:gesture generation, objective evaluation, selective control activation, transformers, weakly supervised learning
Publication status:Published
Publication version:Version of Record
Submitted for review:10.08.2025
Article acceptance date:26.08.2025
Publication date:28.08.2025
Publisher:MDPI
Year of publishing:2025
Number of pages:21 pp.
Numbering:Vol. 15, iss. 17, [article no.] 9467
PID:20.500.12556/DKUM-95219
UDC:004.8
ISSN on article:2076-3417
COBISS.SI-ID:248164099
DOI:10.3390/app15179467
Copyright:© 2025 by the authors
Publication date in DKUM:09.09.2025
Views:0
Downloads:4
Categories:Misc.


Record is a part of a journal

Title:Applied sciences
Shortened title:Appl. sci.
Publisher:MDPI
ISSN:2076-3417
COBISS.SI-ID:522979353

Document is financed by a project

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:P2-0069-2018
Name:Advanced Methods of Interaction in Telecommunications (Napredne metode interakcij v telekomunikacijah)

Licences

License:CC BY 4.0, Creative Commons Attribution 4.0 International
Link:http://creativecommons.org/licenses/by/4.0/
Description:This is the standard Creative Commons license that gives others maximum freedom to do what they want with the work as long as they credit the author.

Secondary language

Language:Slovenian
Keywords:ustvarjanje gest, objektivno vrednotenje, transformatorji, slabo nadzorovano učenje

