Sight is a novel wearable visual-sonification device that gives the user a distinctive "seeing by ear" perceptual experience by converting visual information, captured from a high-resolution camera, into a synthesized audio stream using computer vision and machine learning techniques. It combines real-world sensing and analysis based on machine learning with an information presentation technique based on real-time sound synthesis.
The developers of Sight are verifying the practicality of the system through user testing. Currently, users can distinguish simple object shapes such as bricks, and can grasp the structure of the surrounding space consisting of walls, pillars, and a ceiling. The system will eventually enable users to distinguish people and a range of common objects, making it more practical in daily scenes.
Sight could expand human spatial perception, drawing on knowledge from neuroscience and machine learning. It is expected to be used not only in assistive applications but also in entertainment and human augmentation.
Sight provides users with a novel experience of "seeing by ear". The current version of Sight extracts visual features from a scene using cameras: what you are looking at and where it is located. Users then hear a corresponding sound constructed from this information. Sight is still under development and is being refined into a more practical system.
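The core pipeline described above (detected objects in, spatialized audio out) can be illustrated with a minimal sketch. The mapping below is an assumption for illustration only, not Sight's actual synthesis method: each detected object's horizontal position controls stereo panning and its vertical position controls pitch, and the tones are mixed into one stereo buffer.

```python
import numpy as np

SAMPLE_RATE = 44100  # audio samples per second


def sonify_objects(objects, duration=0.5, sample_rate=SAMPLE_RATE):
    """Render detected objects as a stereo tone mixture.

    `objects` is a list of (x, y) positions normalized to [0, 1],
    where x runs left-to-right and y bottom-to-top in the camera frame.
    This x->pan, y->pitch mapping is a hypothetical example, not the
    mapping used by the real Sight device.
    """
    t = np.arange(int(duration * sample_rate)) / sample_rate
    left = np.zeros_like(t)
    right = np.zeros_like(t)
    for x, y in objects:
        # Higher objects produce higher tones: 220 Hz at the bottom,
        # two octaves up (880 Hz) at the top of the frame.
        freq = 220.0 * 2.0 ** (2.0 * y)
        tone = np.sin(2 * np.pi * freq * t)
        left += (1.0 - x) * tone   # objects on the left are louder on the left
        right += x * tone          # and vice versa
    stereo = np.stack([left, right], axis=1)
    peak = np.max(np.abs(stereo))
    if peak > 0:
        stereo /= peak  # normalize to avoid clipping
    return stereo


# Example: one object on the left side of the frame, at mid height.
buffer = sonify_objects([(0.1, 0.5)])
```

In a real system the object list would come from a detection model running on the camera stream, and the buffer would be streamed continuously to headphones rather than rendered in fixed chunks.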