| Title: | Controllable speech-driven gesture generation with selective activation of weakly supervised controls |
|---|
| Authors: | Crnek, Karlo (Author); Rojc, Matej (Author) |
| Files: | applsci-15-09467-v2.pdf (1.63 MB), MD5: C4928D336842855F4B0DD151617041C0 |
|---|
| Language: | English |
|---|
| Work type: | Article |
|---|
| Typology: | 1.01 - Original Scientific Article |
|---|
| Organization: | FERI - Faculty of Electrical Engineering and Computer Science |
|---|
| Abstract: | Generating realistic and contextually appropriate gestures is crucial for creating engaging embodied conversational agents. Although speech is the primary input for gesture generation, adding controls such as gesture velocity, hand height, and emotion is essential for generating more natural, human-like gestures. However, current approaches to controllable gesture generation often utilize a limited number of control parameters and lack the ability to selectively activate or deactivate them. Therefore, in this work, we propose the Cont-Gest model, a Transformer-based gesture generation model that enables selective control activation through masked training and a control fusion strategy. Furthermore, to better support the development of such models, we propose a novel evaluation-driven development (EDD) workflow, which combines several iterative tasks: automatic control signal extraction, control specification, visual (subjective) feedback, and objective evaluation. This workflow enables continuous monitoring of model performance and facilitates iterative refinement through feedback-driven development cycles. For objective evaluation, we use the Kinetic–Hellinger distance, a validated metric that correlates strongly with human perception of gesture quality. We evaluated multiple model configurations and control dynamics strategies within the proposed workflow. Experimental results show that Feature-wise Linear Modulation (FiLM) conditioning, combined with single-mask training and voice activity scaling, achieves the best balance between gesture quality and adherence to control inputs. |
|---|
| Keywords: | gesture generation, objective evaluation, selective control activation, transformers, weakly supervised learning |
|---|
| Publication status: | Published |
|---|
| Publication version: | Version of Record |
|---|
| Submitted for review: | 10.08.2025 |
|---|
| Article acceptance date: | 26.08.2025 |
|---|
| Publication date: | 28.08.2025 |
|---|
| Publisher: | MDPI |
|---|
| Year of publishing: | 2025 |
|---|
| Number of pages: | 21 pages |
|---|
| Numbering: | Vol. 15, iss. 17, [article no.] 9467 |
|---|
| PID: | 20.500.12556/DKUM-95219  |
|---|
| UDC: | 004.8 |
|---|
| ISSN on article: | 2076-3417 |
|---|
| COBISS.SI-ID: | 248164099  |
|---|
| DOI: | 10.3390/app15179467  |
|---|
| Copyright: | © 2025 by the authors |
|---|
| Publication date in DKUM: | 09.09.2025 |
|---|
| Views: | 0 |
|---|
| Downloads: | 4 |
|---|
| Categories: | Misc. |
|---|
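The abstract describes FiLM conditioning combined with masked training so that individual controls (e.g., gesture velocity, hand height, emotion) can be selectively activated or deactivated. The snippet below is a minimal, hypothetical PyTorch sketch of that general idea; it is not the authors' code, and the class name FiLMControlFusion, the layer sizes, and the (controls, mask) conditioning scheme are assumptions made purely for illustration.

```python
# Illustrative sketch (not the authors' implementation): FiLM conditioning of
# speech-derived features on a control vector whose entries can be selectively
# deactivated with a binary mask, loosely following the idea in the abstract.
import torch
import torch.nn as nn


class FiLMControlFusion(nn.Module):
    """Applies feature-wise linear modulation (FiLM) to a sequence of
    speech-derived features, conditioned on masked control parameters."""

    def __init__(self, feat_dim: int, n_controls: int, hidden: int = 64):
        super().__init__()
        # Small MLP mapping (masked controls, mask) to per-feature scale/shift.
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_controls, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * feat_dim),
        )

    def forward(self, feats, controls, mask):
        # feats:    (batch, time, feat_dim) speech/encoder features
        # controls: (batch, n_controls)     weakly supervised control values
        # mask:     (batch, n_controls)     1 = control active, 0 = deactivated
        cond = torch.cat([controls * mask, mask], dim=-1)
        gamma, beta = self.mlp(cond).chunk(2, dim=-1)
        # Broadcast the (batch, feat_dim) scale and shift over the time axis.
        return gamma.unsqueeze(1) * feats + beta.unsqueeze(1)


if __name__ == "__main__":
    fusion = FiLMControlFusion(feat_dim=256, n_controls=3)
    feats = torch.randn(2, 100, 256)            # dummy speech features
    controls = torch.tensor([[0.8, 0.2, 0.5],   # velocity, height, emotion
                             [0.1, 0.9, 0.0]])
    mask = torch.tensor([[1.0, 0.0, 1.0],       # deactivate hand height
                         [1.0, 1.0, 0.0]])      # deactivate emotion
    out = fusion(feats, controls, mask)
    print(out.shape)  # torch.Size([2, 100, 256])
```

Concatenating the mask with the masked control values is one simple way to let the network distinguish "control set to zero" from "control deactivated"; the paper's actual fusion strategy and masking scheme may differ.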