EEIS: Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo

SAITO Daisuke Associate Professor

Hongo Campus

Media, Intelligence & Computation
Perceptual information processing
Intelligent informatics
Intelligent robotics
Kansei informatics

Real-data-oriented Speech Information Processing and Media Information Processing

Saito Laboratory studies and develops speech information processing techniques and conducts research on multimedia information processing based on them. In recent years in particular, we have been working on complex phenomena such as singing by multiple singers, and on analyzing the relationship between the appearance of robots and their voices. As a research stance, we aim to create new technologies grounded in mathematics and to handle a wide range of media.

Research field 1

Voice design suitable for the appearance of the agent

We are studying techniques for designing voices that suit the appearance of voice agents and robots. Specifically, we are developing criteria for selecting an appropriate base speaker and techniques for giving the voice an artificial quality.

Research field 2

Chorus information engineering and singing voice information processing

In recent years, singing voice information processing for single singers has advanced considerably. However, when the harmony of multiple singers is treated as a unified system and chorus is handled as a phenomenon in its own right, simply summing individual singing voices is not enough. We are researching and developing technologies for chorus, in which multiple people sing at the same time.

Research field 3

Aiming for speech information processing based on new principles

The paradigm shift driven by large-scale data and deep learning has also reached speech information processing. However, the technologies in use tend to become fixed, and progress tends to rely solely on increasing the amount of data. Saito Laboratory scrutinizes the progression from classical to recent techniques and aims to explore speech information processing based on new principles, for example by simplifying the self-attention mechanism in the Transformer and by analyzing speech synthesis models through simplification of their input information.