• Title/Summary/Keyword: Language of vision

Robot vision interface (로보트와 Vision System Interface)

  • 김선일;여인택;박찬웅
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 1987.10b
    • /
    • pp.101-104
    • /
    • 1987
  • This paper presents a robot-vision system consisting of a robot, a vision system, a single-board computer, and an IBM-PC. The IBM-PC-based system offers great flexibility in expanding the vision system interface, and its benefits include easy human interfacing and strong computational ability. Interfacing among the components was carried out, and the calibration between the two coordinate systems is studied. The robot language for the robot-vision system was written in C, and users can also write job programs in C using the robot- and vision-related functions that reside in the library. (A rough sketch of such a coordinate calibration follows this entry.)

  • PDF
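
The abstract above mentions calibrating between the vision and robot coordinate systems. As a minimal sketch of that idea (written in Python rather than the paper's C, with made-up point values and function names), the following estimates a 2D rigid transform from corresponding points measured in both frames using the standard Kabsch least-squares method; it is not the paper's actual procedure.

```python
# Hypothetical sketch: estimate the rigid transform (rotation R, translation t)
# that maps vision-system coordinates to robot coordinates from matched points.
# Point values and names are illustrative assumptions, not from the paper.
import numpy as np

def calibrate_rigid_2d(vision_pts, robot_pts):
    """Least-squares rigid transform (Kabsch method) between two 2D point sets."""
    vision_pts = np.asarray(vision_pts, dtype=float)
    robot_pts = np.asarray(robot_pts, dtype=float)
    v_mean, r_mean = vision_pts.mean(axis=0), robot_pts.mean(axis=0)
    H = (vision_pts - v_mean).T @ (robot_pts - r_mean)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))                # guard against reflection
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = r_mean - R @ v_mean
    return R, t

# Example: calibration points seen by the camera and touched by the robot.
vision_pts = [(10.0, 12.0), (40.0, 12.0), (40.0, 30.0), (10.0, 30.0)]
robot_pts  = [(205.1, 98.7), (235.0, 99.0), (234.8, 117.2), (204.9, 117.0)]
R, t = calibrate_rigid_2d(vision_pts, robot_pts)
print("Rotation:\n", R, "\nTranslation:", t)
```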

Development of a PC-Based Robot Controller with Vision Capability (PC를 기반으로한 VISION 기능을 갖는 ROBOT 제어기의 개발)

  • 서일홍;김재현;정중기;노병옥
    • Proceedings of the Korean Society of Precision Engineering Conference
    • /
    • 1994.10a
    • /
    • pp.537-542
    • /
    • 1994
  • In this study, widely known vision functions are adapted to the robot controller, and, so that users who are familiar with common windowing environments can handle the robot more easily, these basic functions are implemented as a vision language on X Windows. By implementing the vision language together with the existing basic motion language and I/O language, a vision-based intelligent robot system is built. (An illustrative sketch of such a combined command interface follows this entry.)

  • PDF
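
To illustrate the idea of combining a motion language, an I/O language, and a vision language in one controller, here is a deliberately tiny, hypothetical Python sketch of a command dispatcher; all command names and behaviors are assumptions for illustration and do not come from the paper.

```python
# Hypothetical sketch only: a minimal interpreter that dispatches motion, I/O,
# and vision commands from one script, echoing the integration of motion,
# I/O, and vision languages described above.
def move_to(x, y, z):        print(f"MOVE   -> ({x}, {y}, {z})")
def set_output(port, value): print(f"IO     -> port {port} = {value}")
def find_object(name):       print(f"VISION -> locate '{name}'"); return (120.0, 45.0, 0.0)

COMMANDS = {"MOVE": move_to, "OUT": set_output, "FIND": find_object}

def run_script(lines):
    """Execute each line of a simple robot-language script."""
    for line in lines:
        op, *args = line.split()
        COMMANDS[op](*[float(a) if a.replace('.', '', 1).isdigit() else a for a in args])

run_script(["FIND bolt", "MOVE 120 45 0", "OUT 3 1"])
```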

A Study on the Web Building Assistant System Using GUI Object Detection and Large Language Model (웹 구축 보조 시스템에 대한 GUI 객체 감지 및 대규모 언어 모델 활용 연구)

  • Hyun-Cheol Jang;Hyungkuk Jang
    • Annual Conference of KIPS
    • /
    • 2024.05a
    • /
    • pp.830-833
    • /
    • 2024
  • As Large Language Models (LLMs) such as OpenAI's ChatGPT[1] continue to grow in popularity, new applications and services are expected to emerge. This paper introduces an experimental study on a smart web-builder application assistance system that combines computer-vision-based GUI object recognition with ChatGPT (an LLM). The research strategy employs computer vision technology in conjunction with the design strategy of Microsoft's "ChatGPT for Robotics: Design Principles and Model Abilities"[2]. In addition, the research explores the capabilities of Large Language Models like ChatGPT in various application design tasks, specifically in assisting with web-builder tasks. The study examines ChatGPT's ability to synthesize code through both directed prompts and free-form conversation strategies, and also explores its ability to perform various tasks within the builder domain, including functions, closed-loop inferences, and basic logical and mathematical reasoning. Overall, this research proposes an efficient way to perform various application system tasks by combining natural language commands with computer vision technology and an LLM (ChatGPT), allowing users to interact through natural language commands while building applications.
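
The general pattern described above (detect GUI objects, then prompt an LLM to synthesize code) could look roughly like the Python sketch below. The GUI detector is a hypothetical stub, the prompt wording and the model name are assumptions, and only the OpenAI chat-completions call reflects a real API.

```python
# Sketch under assumptions: detect_gui_objects is a stand-in for whatever detector the
# paper uses, and "gpt-4o-mini" is only an example model name. Shows the directed-prompt
# pattern: detected GUI objects -> prompt -> generated markup.
from openai import OpenAI

def detect_gui_objects(screenshot_path):
    """Hypothetical stand-in for a GUI object detector; returns labeled boxes."""
    return [{"label": "input",  "text": "Email",   "box": (40, 60, 300, 100)},
            {"label": "button", "text": "Sign up", "box": (40, 120, 160, 160)}]

def ask_llm_for_markup(objects):
    """Build a directed prompt from detected objects and request HTML from the LLM."""
    prompt = ("Generate semantic HTML for a web page containing these GUI elements:\n" +
              "\n".join(f"- {o['label']} '{o['text']}' at {o['box']}" for o in objects))
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name, not taken from the paper
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask_llm_for_markup(detect_gui_objects("builder_screenshot.png")))
```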

Text-To-Vision Player - Converting Text to Vision Based on TVML Technology -

  • Hayashi, Masaki
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2009.01a
    • /
    • pp.799-802
    • /
    • 2009
  • We have been studying the next generation of video creation solutions based on TVML (TV program Making Language) technology. TVML is a well-known scripting language for computer animation, and a TVML Player interprets the script to create video content using real-time 3DCG and synthesized voices. TVML has a long history, having been proposed by NHK back in 1996; however, for years the only available player was the one made by NHK. We have developed a new TVML Player from scratch and named it the T2V (Text-To-Vision) Player. Because it was developed from scratch, the code is compact, light, fast, extendable, and portable. Moreover, the new T2V Player performs not only playback of TVML scripts but also Text-To-Vision conversion from input written in XML format, or even mere plain text, into video by using 'text filters' that can be added as plug-ins to the Player. We plan to release it as freeware in early 2009 in order to stimulate user-generated content and various kinds of services on the Internet and in the media industry. We believe that our T2V Player will be a key technology for this upcoming movement. (A hypothetical sketch of a text-filter plug-in follows this entry.)

  • PDF
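
As a hedged illustration of the text-filter idea, the Python sketch below turns plain text into a simple script that a TVML-like player could interpret. The script syntax shown is only an approximation for illustration and is not guaranteed to match real TVML or the T2V Player's plug-in interface.

```python
# Hypothetical sketch of a 'text filter' in the spirit of the T2V Player's plug-ins:
# it converts plain text into a simple TVML-like script, one speech event per sentence.
def plain_text_filter(text, character="A"):
    """Convert each sentence of plain text into one scripted speech event."""
    sentences = [s.strip() for s in text.replace("?", ".").split(".") if s.strip()]
    script = ["set: setcamera(name=cam1)"]          # approximate, illustrative syntax
    for s in sentences:
        script.append(f'character: talk(name={character}, text="{s}.")')
    return "\n".join(script)

print(plain_text_filter("TVML was proposed by NHK in 1996. The T2V Player extends it."))
```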

Robot Vision to Audio Description Based on Deep Learning for Effective Human-Robot Interaction (효과적인 인간-로봇 상호작용을 위한 딥러닝 기반 로봇 비전 자연어 설명문 생성 및 발화 기술)

  • Park, Dongkeon;Kang, Kyeong-Min;Bae, Jin-Woo;Han, Ji-Hyeong
    • The Journal of Korea Robotics Society
    • /
    • v.14 no.1
    • /
    • pp.22-30
    • /
    • 2019
  • For effective human-robot interaction, a robot needs not only to understand the current situational context well but also to convey its understanding to the human participant in an efficient way. The most convenient way to deliver the robot's understanding is for the robot to express it using voice and natural language. Recently, artificial intelligence for video understanding and natural language processing has developed very rapidly, especially based on deep learning. Thus, this paper proposes a deep-learning-based method for turning robot vision into spoken audio descriptions. The applied model is a pipeline of two deep learning models: one generates a natural language sentence from the robot's vision, and the other generates voice from the generated sentence. We also conduct a real-robot experiment to show the effectiveness of our method in human-robot interaction.
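
The vision-to-sentence-to-voice pipeline described above can be sketched with generic off-the-shelf components. The Hugging Face image-captioning pipeline and the pyttsx3 TTS engine used below are substitutes chosen for illustration; they are not the models used in the paper.

```python
# Sketch under assumptions: the paper's own captioning and speech models are replaced
# here with a generic Hugging Face image-to-text pipeline and the pyttsx3 TTS engine,
# just to illustrate the vision -> sentence -> voice pipeline.
from transformers import pipeline
import pyttsx3

def describe_and_speak(image_path):
    # Step 1: generate a natural-language sentence from the robot's camera image.
    captioner = pipeline("image-to-text")           # default captioning model
    caption = captioner(image_path)[0]["generated_text"]
    # Step 2: synthesize voice from the generated sentence.
    engine = pyttsx3.init()
    engine.say(caption)
    engine.runAndWait()
    return caption

print(describe_and_speak("robot_camera_frame.jpg"))
```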

Domain Adaptive Fruit Detection Method based on a Vision-Language Model for Harvest Automation (작물 수확 자동화를 위한 시각 언어 모델 기반의 환경적응형 과수 검출 기술)

  • Changwoo Nam;Jimin Song;Yongsik Jin;Sang Jun Lee
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.19 no.2
    • /
    • pp.73-81
    • /
    • 2024
  • Recently, mobile manipulators have been utilized in the agriculture industry for weed removal and harvest automation. This paper proposes a domain-adaptive fruit detection method for harvest automation that utilizes OWL-ViT, an open-vocabulary object detection model. Because the vision-language model detects objects based on text prompts, it can be extended to objects of undefined categories. In developing deep learning models for real-world problems, constructing a large-scale labeled dataset is time-consuming and relies heavily on human effort. To reduce this labor-intensive workload, we used a large-scale public dataset as source-domain data and employed a domain adaptation method: adversarial learning between a domain discriminator and the feature extractor reduces the gap between the distributions of feature vectors from the source domain and our target-domain data. We collected a target-domain dataset in a realistic environment and conducted experiments to demonstrate the effectiveness of the proposed method. In the experiments, the domain adaptation method improved the AP50 metric from 38.88% to 78.59% for objects within a range of 2 m, and we achieved a manipulation success rate of 81.7%.
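
A common way to realize the adversarial learning between a feature extractor and a domain discriminator mentioned above is a gradient reversal layer (the DANN recipe). Whether the paper uses exactly this mechanism is an assumption; the PyTorch sketch below only illustrates the pattern, and the layer sizes and feature dimension are placeholders.

```python
# Minimal sketch of adversarial feature alignment with a gradient reversal layer.
# Feature dimension and discriminator architecture are illustrative assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None   # reverse gradients into the extractor

class DomainDiscriminator(nn.Module):
    def __init__(self, feat_dim=768):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))
    def forward(self, feats, lambd=1.0):
        return self.net(GradReverse.apply(feats, lambd))

# Usage inside a training step (features would come from the detector's backbone):
disc = DomainDiscriminator()
bce = nn.BCEWithLogitsLoss()
src_feats, tgt_feats = torch.randn(8, 768), torch.randn(8, 768)   # placeholder features
logits = disc(torch.cat([src_feats, tgt_feats]))
domain_labels = torch.cat([torch.zeros(8, 1), torch.ones(8, 1)])  # 0 = source, 1 = target
domain_loss = bce(logits, domain_labels)   # add to the detection loss before backward()
```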

James Joyce's 'The Dead' Revisited

  • Kim, Donguk
    • Journal of English Language & Literature
    • /
    • v.55 no.3
    • /
    • pp.429-440
    • /
    • 2009
  • This paper does not follow the well-known critical practice of Charles Perke, Edward Brandabur and Phillip Herring who, regarding 'The Dead', James Joyce's earliest masterpiece, as the conclusion of Dubliners, classify Gabriel as one of the dead. Instead it concurs with such critics as William York Tindal, Kenneth Burke and Allen Tate who, interpreting 'The Dead' as a story of Gabriel's spiritual maturation, discuss the famous snow vision at the end of the story as a signifier of his rebirth experience. A new reading of 'The Dead', which is the aim of this paper, examines the very processes which produce both form and content, thereby demonstrating that 'The Dead' is a story of Gabriel's spiritual growth and that the supreme snow vision is prepared for his spirit, which progresses towards a richer synthesis of life and death, a higher altitude of flight and wider horizons.

Vision-Based Finger Spelling Recognition for Korean Sign Language

  • Park Jun;Lee Dae-hyun
    • Journal of Korea Multimedia Society
    • /
    • v.8 no.6
    • /
    • pp.768-775
    • /
    • 2005
  • Since sign languages are the main means of communication among hearing-impaired people, there are communication difficulties between speech-oriented people and sign-language-oriented people. Automated sign-language recognition may resolve these communication problems. In sign languages, finger spelling is used to spell names and words that are not listed in the dictionary. There has been research on gesture and posture recognition using glove-based devices; however, these devices are often expensive, cumbersome, and inadequate for recognizing elaborate finger spelling, and the use of colored patches or gloves also causes discomfort. In this paper, a vision-based finger spelling recognition system is introduced. In our method, captured hand-region images are separated from the background using a skin detection algorithm, assuming that there are no skin-colored objects in the background. Hand postures are then recognized using a two-dimensional grid analysis method. Our recognition system is not sensitive to the size or rotation of the input posture images. By optimizing the weights of the posture features using a genetic algorithm, our system achieved high accuracy that matches other systems using devices or colored gloves. We applied our posture recognition system to Korean Sign Language finger spelling and achieved better than 93% accuracy. (A rough sketch of the skin segmentation and grid features follows this entry.)

  • PDF
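
The skin detection and two-dimensional grid analysis described above could be sketched as follows in Python with OpenCV. The YCrCb thresholds and the 8x8 grid size are common heuristics assumed for illustration, not values reported by the paper.

```python
# Illustrative sketch: skin-color segmentation followed by a simple 2D grid
# occupancy feature, echoing the skin detection + grid analysis described above.
import cv2
import numpy as np

def hand_grid_feature(image_path, grid=8):
    img = cv2.imread(image_path)
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    # Threshold the Cr/Cb channels to obtain a binary skin mask (heuristic range).
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    h, w = mask.shape
    feature = np.zeros((grid, grid))
    for r in range(grid):
        for c in range(grid):
            cell = mask[r * h // grid:(r + 1) * h // grid,
                        c * w // grid:(c + 1) * w // grid]
            feature[r, c] = cell.mean() / 255.0   # fraction of skin pixels per cell
    return feature.flatten()                      # grid*grid posture descriptor

print(hand_grid_feature("hand_posture.jpg").shape)
```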

Hybrid Learning for Vision-and-Language Navigation Agents (시각-언어 이동 에이전트를 위한 복합 학습)

  • Oh, Suntaek;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.9
    • /
    • pp.281-290
    • /
    • 2020
  • The Vision-and-Language Navigation (VLN) task is a complex intelligence problem that requires both visual and language comprehension skills. In this paper, we propose a new learning model for vision-and-language navigation agents. The model adopts hybrid learning that combines imitation learning based on demonstration data with reinforcement learning based on action rewards. It can therefore mitigate both the bias toward demonstration data in imitation learning and the relatively low data efficiency of reinforcement learning. In addition, the proposed model uses a novel path-based reward function designed to address the shortcomings of existing goal-based reward functions. We demonstrate the high performance of the proposed model through various experiments using the Matterport3D simulation environment and the R2R benchmark dataset.
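
As a minimal sketch of such a hybrid objective, the PyTorch snippet below combines an imitation (cross-entropy) loss on demonstrated actions with a REINFORCE-style loss weighted by a scalar path-based reward. The mixing weight, the reward definition, and the toy tensors are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of a hybrid imitation + reinforcement objective for a navigation policy.
import torch
import torch.nn.functional as F

def hybrid_loss(logits, demo_actions, sampled_actions, path_reward, alpha=0.5):
    """logits: (T, num_actions) policy outputs over one navigation episode."""
    # Imitation learning term: match the demonstrator's actions.
    il_loss = F.cross_entropy(logits, demo_actions)
    # Reinforcement learning term: REINFORCE with a scalar path-based reward,
    # e.g. how closely the traversed path follows the reference path.
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, sampled_actions.unsqueeze(1)).squeeze(1)
    rl_loss = -(path_reward * chosen).mean()
    return alpha * il_loss + (1.0 - alpha) * rl_loss

# Toy usage with random tensors standing in for one short episode.
T, A = 6, 4
loss = hybrid_loss(torch.randn(T, A, requires_grad=True),
                   torch.randint(A, (T,)), torch.randint(A, (T,)),
                   path_reward=torch.tensor(0.8))
loss.backward()
```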