EEIS, Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo

MINEMATSU Nobuaki Professor

Hongo Campus

Media, Intelligence & Computation
Cognitive science
Perceptual information processing
Intelligent informatics
Kansei informatics
Learning support system

Assistive technology for speech communication using computers that can talk with, listen to, and support users

Speech-to-text (speech recognition) and text-to-speech (speech synthesis) now work well even on smartphones. In our laboratory, we use these speech technologies to develop frameworks that help people achieve high-quality speech communication with other people or with machines. By drawing on knowledge from acoustic phonetics, cognitive science, linguistics, and brain science as well as speech technology, we aim to improve the quality of life (QoL) of individuals who communicate orally with others.

Research field 1

Monitoring listening behaviors and its application to realize high-quality speech communication

When incoming speech is in a foreign language or is acoustically contaminated, listeners often have trouble understanding it. In this project, we are proposing techniques to monitor listening disfluency without using any brain-sensing techniques. With these techniques we have developed, for example, a method that can drastically enhance language learners' listening skills. Since 2023, our method has been used in English education at the School of Engineering, the University of Tokyo.
Research field 2

Monitoring speaking behaviors and its application to realize high-quality speech communication

Have you ever had trouble communicating when talking to others, for example in a foreign language or in a noisy environment? In this project, we are proposing techniques to monitor speaking behaviors and compare them with ideal behaviors, which can help individuals speak more intelligibly. With these techniques we have developed, for example, a method that can drastically enhance language learners' speaking skills. Since 2023, our method has been used in English education at the School of Engineering, the University of Tokyo.
Research field 3

Analysis and modeling of international speech communication with World Englishes

In international contexts, everybody communicates in English, but speakers' pronunciation is diverse due to non-native and/or regional accents. Listeners' listening skills are also diverse, depending on their language backgrounds. International speech communication is therefore a place where speaker diversity meets listener diversity. In this project, we are analyzing and modeling this diversity x diversity in international speech communication, and developing a technical framework that helps individuals survive it.
Research field 4

Acoustic embodiment of information in speech, such as speech synthesis, emotion synthesis, voice conversion, accent conversion, etc.

We transmit various types of information by encoding them in speech acoustics, and we extract them by decoding them from speech acoustics. In this project, we aim to realize this "encoding" ability on computers. So far, high-performance speech synthesis, emotion synthesis, voice conversion, accent conversion, etc. have been realized and put into practice. What is still missing? By identifying what has not yet been realized technically, we aim to implement the missing "encoding" abilities.
Research field 5

Information extraction from speech acoustics, such as speech recognition, speaker recognition, emotion recognition, speech assessment, etc.

Real-time Computing: Low latency and advanced computation capabilities are essential in fields that require real-time processing. Accordingly, we are considering next-generation communication standards, such as B5G and 6G, to achieve low latency and computation speed comparable to that of high-performance GPU servers, even with weak devices, and we are driving a research project on edge computing that utilizes post-5G networks.

Time-series Data Analysis: Sophisticated cyber-physical systems require high-performance inference mechanisms. In our research lab, we specialize in analyzing time-series data, such as sensor values and logs that are continuously generated by sensors and devices. More specifically, we are developing analysis techniques for predicting faults in wind turbines and solar panels, real-time pluvial flood prediction based on knowledge distillation, and analyzing sales activity logs.

Digital Twin: Digital Twin is a technology that collects information from the real world and reproduces the real-world environment in cyberspace based on the collected information. It is an indispensable technology for efficiently monitoring and simulating the real environment. Currently, we are promoting research and development of environmental measurement technology and data interpolation technology for radio environment Digital Twin and advancing its practical application to radio wave environment assessment.