Pythagoras II – AIIA LAB

Pythagoras II – Sub-project 31: Effective organization, navigation and information retrieval in multimedia.

The research conducted in this sub-project aimed to develop hybrid techniques for organizing and retrieving multimedia information. These techniques exploit the informative content of text, audio, and video. In total, five objectives concerning the analysis of audiovisual data were achieved:

Emotional speech clustering. The use of self-organizing maps is investigated.

Dialogue detection in movies. Either the audio signal or the visual signal is exploited for movie dialogue detection. The effectiveness of various classifiers is assessed, including multilayer perceptrons, radial basis function neural networks, support vector machines, and boosting meta-classifiers such as AdaBoost. Speaker clustering is achieved using SIFT features. The expected duration of an utterance exchanged between two persons participating in a movie dialogue is estimated. Speech detection algorithms that exploit the image sequences of the visual signal are proposed.

Speaker segmentation and clustering. A review of milestone methods is carried out and new methods are proposed both for speaker segmentation and for speaker clustering.

Musical instrument classification. The technique of non-negative matrix factorization (NMF) for musical instrument classification is studied and various improvements and heuristics are proposed.

Emotion processing. Techniques for statistically based facial expression synthesis are developed.

The developed techniques support multimedia annotation and are part of the research conducted on the MPEG-7 prototype and the semantic web, thereby facilitating information retrieval. Moreover, they can be exploited by the film industry, computer animation companies, the mass media (television and radio broadcasts), and the Internet (for example, in Web 2.0 services). They can also be employed in anthropocentric human-computer interaction.

Journal Papers

[1] V. Moschou, D. Ververidis, and C. Kotropoulos, “Assessment of self-organizing map variants for clustering with application to redistribution of emotional speech patterns,” Neurocomputing, Special Issue Advances of Neural Networks for Audio and Speech Processing, vol. 71, nos. 1-3, pp. 147-156, December 2007.

[2] M. Kotti, E. Benetos, C. Kotropoulos, and I. Pitas, “A neural network approach for audio-assisted movie dialogue detection,” Neurocomputing, Special Issue Advances of Neural Networks for Audio and Speech Processing, vol. 71, nos. 1-3, pp. 157-166, December 2007.

[3] M. Kotti, V. Moschou, and C. Kotropoulos, “Speaker segmentation and clustering,” Signal Processing, vol. 88, no. 5, pp. 1091-1124, May 2008.

[4] E. Benetos, S. Siatras, C. Kotropoulos, N. Nikolaidis, and I. Pitas, “Movie analysis with emphasis to dialogue and action scene detection,” in Multimodal Processing and Interaction: Audio, Video, and Text (P. Maragos, A. Potamianos, and P. Gros, Eds.), Springer, 2008.

[5] S. Krinidis and I. Pitas, “Statistical Analysis of Human Facial Expressions,” IEEE Trans. Circuits and Systems for Video Technology, under review.

[6] S. Siatras, N. Nikolaidis, and I. Pitas, “Visual lip activity detection and speaker detection using mouth region intensities,” IEEE Trans. Circuits and Systems for Video Technology, in revision.

[7] P. Antonopoulos, N. Nikolaidis, and I. Pitas, “Video Face Clustering Using SIFT Features,” IEEE Trans. Circuits and Systems for Video Technology, under review.

Conference Papers

[8] E. Benetos, M. Kotti, and C. Kotropoulos, “Application of non-negative matrix factorization to musical instrument classification,” in Proc. 2006 2nd Int. Symp. Communications, Control, and Signal Processing (ISCCSP2006), Marrakech, Morocco, March 13-15, 2006.

[9] E. Benetos, M. Kotti, and C. Kotropoulos, “Musical instrument classification using non-negative matrix factorization algorithms,” in Proc. 2006 IEEE Int. Symp. Circuits and Systems, May 21-24, Kos, Greece.

[10] E. Benetos, M. Kotti, and C. Kotropoulos, “Musical instrument classification using non-negative matrix factorization and subset feature selection,” in Proc. 2006 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. V, pp. 221-224, May 14-18, Toulouse, France.

[11] E. Benetos, M. Kotti, and C. Kotropoulos, “Applying supervised classifiers based on non-negative matrix factorization to musical instrument classification,” in Proc. 2006 IEEE Int. Conf. Multimedia and Expo, pp. 2105-2108, July 2006, Toronto, Canada.

[12] V. Moschou, D. Ververidis, and C. Kotropoulos, “On the variants of the self-organizing map that are based on order statistics,” in Lecture Notes in Computer Science, Artificial Neural Networks – ICANN 2006, vol. 4131, pp. 426-434, Athens, September 11-13, 2006.

[13] S. Siatras, N. Nikolaidis, and I. Pitas, “Visual speech detection using mouth region intensities,” in Proc. 2006 European Signal Processing Conf. (EUSIPCO 2006), Florence, Italy, 4-8 September 2006.

[14] S. Krinidis and I. Pitas, “Facial Expression Synthesis through facial expressions statistical analysis,” in Proc. 2006 European Signal Processing Conf. (EUSIPCO 2006), Florence, Italy, 4-8 September 2006.