Search | Korea Science

Design and Implementation Automatic Character Set Encoding Recognition Method for Document File (문서 파일의 문자 인코딩 자동 인식 기법의 설계 및 구현)

Seo, Min-Ji;Kim, Myung-Ho
- Proceedings of the Korea Information Processing Society Conference / 2015.10a / pp.95-98 / 2015
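Character encoding is a method of converting a document to binary form for storage on a computer or transmission over a network. Because each character encoding binarizes a document using its own character code table, decoding a document with an encoding other than the one applied to it produces output that differs from the original, making the document unreadable. To read a document, the character encoding applied to it must therefore be identified. This paper presents a method for automatically determining the character encoding of a document. The proposed method identifies the encoding in several stages: detection using escape characters, detection by the range of code values appearing in the document, detection by characteristics of the code values appearing in the document, and detection using a word database. Because the proposed method classifies documents by language before determining the character encoding, it achieves a high encoding recognition rate.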
https://doi.org/10.3745/PKIPS.y2015m10a.95
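As a rough illustration of the first two stages named in the abstract above (escape characters, then code value ranges), the following minimal sketch checks for ISO-2022 designator sequences and then for pure-ASCII or valid UTF-8 byte patterns. The function names, the choice of escape sequences, and the fallback value are assumptions made for this example; the paper's actual rules are not given in the abstract.

```python
from typing import Optional

# ISO-2022 designator escape sequences (a small, assumed subset).
ISO2022_ESCAPES = {
    b"\x1b$)C": "iso2022_kr",  # designates KS X 1001 (used by ISO-2022-KR)
    b"\x1b$B": "iso2022_jp",   # designates JIS X 0208-1983
    b"\x1b$@": "iso2022_jp",   # designates JIS X 0208-1978
}


def detect_by_escape(data: bytes) -> Optional[str]:
    """Stage 1 (hypothetical): look for an ISO-2022 escape sequence."""
    for sequence, name in ISO2022_ESCAPES.items():
        if sequence in data:
            return name
    return None


def detect_by_range(data: bytes) -> Optional[str]:
    """Stage 2 (hypothetical): classify by the byte values that occur."""
    if all(byte < 0x80 for byte in data):
        return "ascii"
    try:
        data.decode("utf-8")  # strict decoding validates the UTF-8 byte ranges
        return "utf-8"
    except UnicodeDecodeError:
        return None


def detect(data: bytes) -> str:
    """Run the stages in order; later stages would be needed for 'unknown'."""
    return detect_by_escape(data) or detect_by_range(data) or "unknown"
```

Inputs that fall through both stages (for example EUC-KR or Shift_JIS bytes) are where the later stages described in the abstract would have to take over.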

Encoding and language detection of text document using Deep learning algorithm (딥러닝 알고리즘을 이용한 문서의 인코딩 및 언어 판별)

Kim, Seonbeom;Bae, Junwoo;Park, Heejin
- The Journal of Korean Institute of Next Generation Computing / v.13 no.5 / pp.124-130 / 2017
Character encoding is the method used to represent characters or symbols on a computer, and many encoding detection tools exist. For the widely used encoding detection software "uchardet", the encoding detection accuracy on unmodified, normal text documents is 91.39%, but the language detection accuracy is only 32.09%. Moreover, if a text document is encrypted by substitution, the encoding detection accuracy is 3.55% and the language detection accuracy is 0.06%. Therefore, in this paper, we propose encoding and language detection for text documents using the deep learning algorithm LSTM (Long Short-Term Memory). LSTM outperforms "uchardet": on normal text documents its encoding detection accuracy is 99.89% and its language detection accuracy is 99.92%, and on substitution-encrypted documents its encoding detection accuracy is 99.26% and its language detection accuracy is 99.77%.

Improvement of Encoding Detection Algorithm for Multi-byte Encoded Data with Errors (오류가 발생한 멀티바이트 인코딩 데이터의 인코딩 기법 판별 알고리즘 개선)

Bae, Junwoo;Kim, Seonbeom;Park, Heejin
- The Journal of Korean Institute of Next Generation Computing / v.13 no.2 / pp.18-25 / 2017
In computer science, an encoding is a standardized way of converting information, such as audio, video, or text, into a particular format. The encoding of a piece of data must therefore be known in order to open and read it, and there are algorithms that detect the encoding of data. However, some of the data can be lost through packet loss during network transmission, especially when the data is captured by packet sniffing or eavesdropping on wireless communications. In this paper, we improve the performance of the encoding detection algorithm of the 'uchardet' program for multi-byte encoded data with errors, based on a bit-shift algorithm. To evaluate performance, we generated Korean and Japanese text data with errors by removing random bits at random positions. The detection algorithms were then tested on this data, and 'uchardet-bitshift' performed better than 'uchardet'. For Korean text, 'uchardet' detected the encoding perfectly with ≤0.005% errors but had a 0% detection rate with ≥1% errors, whereas 'uchardet-bitshift' detected it perfectly with ≤0.05% errors and still produced some correct detections with ≥1% errors. Japanese text with errors tends to be misreported as a Chinese encoding because Japanese text contains many Chinese characters. As a result, we improved the encoding detection algorithm by applying a bit-shift operation.
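The intuition behind a bit-shift approach can be sketched as follows: once a single bit is lost, every later byte boundary is off by one bit, so trying each of the eight possible bit offsets and keeping the one that decodes most cleanly can realign the stream. The sketch below is only an assumed illustration of that idea. It uses Python's built-in UTF-8 decoder as the validity check rather than uchardet, handles a single lost bit, and the helper names (`drop_bit`, `shift_stream`, `best_shift`) are hypothetical.

```python
def _to_bits(data: bytes) -> str:
    """Render bytes as a string of '0'/'1' characters, MSB first."""
    return "".join(f"{byte:08b}" for byte in data)


def _from_bits(bits: str) -> bytes:
    """Regroup a bit string into bytes, discarding an incomplete last byte."""
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def drop_bit(data: bytes, position: int) -> bytes:
    """Simulate a transmission error by removing one bit from the stream."""
    bits = _to_bits(data)
    return _from_bits(bits[:position] + bits[position + 1:])


def shift_stream(data: bytes, offset: int) -> bytes:
    """Discard the first `offset` bits (0..7) and regroup the rest into bytes."""
    return _from_bits(_to_bits(data)[offset:])


def replacement_count(data: bytes, encoding: str) -> int:
    """Count the U+FFFD characters produced by lenient decoding."""
    return data.decode(encoding, errors="replace").count("\ufffd")


def best_shift(data: bytes, encoding: str) -> int:
    """Return the bit offset whose realigned stream decodes most cleanly."""
    return min(
        range(8),
        key=lambda k: replacement_count(shift_stream(data, k), encoding),
    )
```

For data that lost one bit inside its first byte, the cleanest offset is the one that discards the rest of that byte, after which the remaining bytes line up with the original again. Handling many random losses, as in the paper's experiments, would require repeating this search locally along the stream.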

Distributed Encoding Scheme for N-Screen Service in Cloud Computing (클라우드 컴퓨팅에서 N-스크린 서비스를 위한 분산 인코딩 기법)

Lim, Heon-Yong;Kim, Chang-Hyeon;Lee, Won-Joo;Jeon, Chang-Ho
- Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.16-17 / 2012
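This paper proposes a distributed encoding scheme for video content for N-screen services in a cloud computing environment. Built on Hadoop, the scheme runs encoding jobs in a distributed fashion across multiple virtual machines and assigns each virtual machine a different workload according to its performance. This performance-based differential allocation minimizes virtual machine idle time, which shortens the total encoding time and can also raise resource utilization. Experiments show that the proposed scheme completes encoding in less time than an equal-partitioning approach. Because an N-screen service must stream the same video at several resolutions to suit the characteristics of different devices, shortening the encoding time can be expected to improve the performance of the service.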
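The difference between equal partitioning and performance-based allocation can be illustrated with a small sketch. It assumes that the work is divided into equally sized video segments and that each virtual machine has a known relative throughput score; the function names, the largest-remainder rounding, and the scores are illustrative assumptions, not the paper's Hadoop implementation.

```python
def allocate_segments(num_segments: int, vm_scores: dict) -> dict:
    """Split segments across VMs in proportion to their throughput scores."""
    total_score = sum(vm_scores.values())
    shares = {vm: num_segments * score / total_score
              for vm, score in vm_scores.items()}
    allocation = {vm: int(share) for vm, share in shares.items()}
    # Hand out segments lost to rounding, largest fractional part first.
    leftover = num_segments - sum(allocation.values())
    by_remainder = sorted(shares, key=lambda vm: shares[vm] - allocation[vm],
                          reverse=True)
    for vm in by_remainder[:leftover]:
        allocation[vm] += 1
    return allocation


def makespan(allocation: dict, vm_scores: dict) -> float:
    """Time until the slowest VM finishes (segments divided by throughput)."""
    return max(allocation[vm] / vm_scores[vm] for vm in allocation)


# Hypothetical cluster: vm-c is three times as fast as vm-a.
scores = {"vm-a": 1.0, "vm-b": 2.0, "vm-c": 3.0}
weighted = allocate_segments(60, scores)
equal = {vm: 20 for vm in scores}
```

With these assumed scores, the weighted split lets all three machines finish at the same time, while the equal split leaves the faster machines idle waiting for `vm-a`.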

Efficient Video Signal Processing Method on Dual Processor of RISC and DSP (RISC와 DSP의 듀얼 프로세서에서의 효율적인 비디오 신호 처리 방법)

김범호;마평수
- Proceedings of the Korean Information Science Society Conference / 2003.10c / pp.676-678 / 2003
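Single-chip processors that combine a RISC processor and a DSP have recently emerged as processors for 2.5G and 3G mobile terminals, making it possible to implement applications with a wide range of multimedia features. Accordingly, we propose a video encoder/decoder architecture that improves the processing speed of video encoding/decoding on a dual-processor architecture. Previous work was designed so that the DSP handles the entire video encoding/decoding process, but in the parts that require many bit operations the DSP is less efficient than the RISC chip. To solve this problem, this paper separates the modules that make up video encoding/decoding and runs each on the DSP or the RISC processor according to the characteristics of each, with the aim of improving efficiency.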

A Method for Automatic Detection of Character Encoding of Multi Language Document File (다중 언어로 작성된 문서 파일에 적용된 문자 인코딩 자동 인식 기법)

Seo, Min Ji;Kim, Myung Ho
- KIISE Transactions on Computing Practices / v.22 no.4 / pp.170-177 / 2016
Character encoding is a method for converting a document into a binary document file, using a code table, for storage on a computer. To decode a binary document file so that it can be read, one must know the code table applied to the file at the encoding stage in order to recover the original document. Identifying the code table used to encode the file is thus an essential part of decoding. In this paper, we propose a method for automatically detecting the character code of a given binary document file. The method combines several techniques to increase the detection rate: character code range detection, escape character detection, character code characteristic detection, and commonly used word detection. The commonly used word detection step uses multiple word databases, so the method can achieve a much higher detection rate for multi-language files than other methods. When a language accounts for less than 20% of the document, the conventional method recognizes the encoding in only about 50% of cases. The proposed method achieves up to 96% encoding recognition regardless of the proportion of each language.
https://doi.org/10.5626/KTCP.2016.22.4.170
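The commonly used word detection step described above can be approximated with a small sketch: decode the bytes with each candidate encoding and keep the candidate whose decoded text contains the most words from per-language word lists. The word lists, the candidate set, and the scoring rule below are tiny illustrative assumptions, not the databases or rules used in the paper.

```python
from typing import Optional

# Hypothetical miniature word databases, one per language.
COMMON_WORDS = {
    "ko": {"그리고", "있다", "하는", "문서"},
    "ja": {"です", "ます", "こと", "文書"},
    "en": {"the", "and", "of"},
}

# Candidate encodings to try, using Python's standard codec names.
CANDIDATES = ["utf-8", "euc_kr", "shift_jis", "euc_jp"]


def word_score(text: str) -> int:
    """Count occurrences of all known common words, across every language."""
    return sum(text.count(word)
               for words in COMMON_WORDS.values()
               for word in words)


def detect_by_words(data: bytes) -> Optional[str]:
    """Return the candidate encoding whose decoding yields the most word hits."""
    best_encoding, best_score = None, 0
    for encoding in CANDIDATES:
        try:
            text = data.decode(encoding)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding at all
        score = word_score(text)
        if score > best_score:
            best_encoding, best_score = encoding, score
    return best_encoding
```

Because the score sums hits over every language's word list, a document that mixes languages still accumulates evidence for the correct encoding, which is the property the abstract attributes to using multiple word databases.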

Design of degree distribution of distributed LT codes using discrete Fourier transform (Discrete Fourier transform을 이용한 distributed LT codes의 degree distribution 설계)

Suh, Young-Kil;Heo, Jun
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2010.07a / pp.112-115 / 2010
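This paper considers a network environment such as the one in Figure 1, in which two senders use LT codes as error-correcting codes, and examines the relationship between the degree distributions of the encoding symbols generated by the two senders and the degree distribution of the encoding symbols received by the receiver. When belief propagation is used to decode an LT code, decoding succeeds with the highest probability relative to overhead when the degree distribution of the received encoding symbols follows the robust soliton distribution (RSD). In the network environment of Figure 1, however, if both senders generate and transmit encoding symbols according to the RSD, the symbols received at the receiver do not follow the RSD. Given the degree distribution of the encoding symbols generated by one sender ($S_1$), this paper presents a method for finding the degree distribution at the other sender ($S_2$) such that the distribution of encoding symbols at the receiver approximately follows the RSD.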
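For reference, the robust soliton distribution mentioned above can be computed as in the following sketch, which follows the standard definition from Luby's LT codes: the ideal soliton distribution plus an extra term with a spike near $k/R$, normalized to sum to one. The parameter values `c` and `delta` and the rounding of the spike position are assumptions chosen for illustration; the abstract does not state which values the paper uses, and the sketch does not attempt the paper's derivation of the $S_2$ distribution.

```python
import math


def ideal_soliton(k: int) -> list:
    """rho(1) = 1/k and rho(d) = 1/(d(d-1)) for d = 2..k; index 0 is unused."""
    rho = [0.0] * (k + 1)
    rho[1] = 1.0 / k
    for d in range(2, k + 1):
        rho[d] = 1.0 / (d * (d - 1))
    return rho


def robust_soliton(k: int, c: float = 0.1, delta: float = 0.5) -> list:
    """Return mu(0..k), the robust soliton distribution over degrees 1..k."""
    R = c * math.log(k / delta) * math.sqrt(k)
    spike = int(round(k / R))  # assumed rounding of the spike position k/R
    rho = ideal_soliton(k)
    tau = [0.0] * (k + 1)
    for d in range(1, min(spike, k + 1)):
        tau[d] = R / (d * k)
    if 1 <= spike <= k:
        tau[spike] = R * math.log(R / delta) / k
    beta = sum(rho) + sum(tau)  # normalization constant
    return [(rho[d] + tau[d]) / beta for d in range(k + 1)]


# Example with k = 1000 source symbols and the assumed default parameters.
mu = robust_soliton(1000)
```

With these assumed parameters the distribution keeps most of its mass on low degrees and shows a visible spike at the degree nearest $k/R$.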

A New SAT Encoding for Solving Sudoku (수도쿠 풀이를 위한 새로운 SAT 인코딩)

Park, Jun-Kil;Choi, Jin-Young
- Proceedings of the Korean Information Science Society Conference / 2007.06b / pp.487-492 / 2007
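Solving Sudoku is interesting not only as entertainment but also as a computational problem. Sudoku can be translated into SAT using the minimal and extended encodings, and a solution can be found in polynomial time by repeatedly applying inference techniques rather than search. The minimal and extended encodings are intuitive but are not sufficient for solving higher-order Sudoku ($16\times16$ and larger). This paper proposes a block encoding that improves on the extended encoding. Experiments show that, when applied to sets of $16\times16$ and $25\times25$ puzzles, the block encoding solves 1% to 12% more puzzles than the extended encoding, depending on the inference technique.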
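To make the idea of a SAT encoding concrete, the sketch below generates the clauses of the minimal encoding as it is commonly defined in the Sudoku-as-SAT literature: every cell holds at least one value, and no value appears twice in any row, column, or block. This definition is taken from that literature rather than from the abstract, the variable numbering is an assumption, the unit clauses for a puzzle's given digits are omitted, and the block encoding proposed in the paper is not reproduced here.

```python
from itertools import combinations


def var(row: int, col: int, value: int, n: int) -> int:
    """DIMACS-style variable id for 'cell (row, col) holds value' (1..n^3)."""
    return row * n * n + col * n + value


def minimal_encoding(block: int = 3) -> list:
    """Return CNF clauses (lists of signed ints) for an n x n grid, n = block^2."""
    n = block * block
    clauses = []

    # Each cell contains at least one value.
    for row in range(n):
        for col in range(n):
            clauses.append([var(row, col, v, n) for v in range(1, n + 1)])

    # Cell groups that must not repeat a value: rows, columns, and blocks.
    rows = [[(r, c) for c in range(n)] for r in range(n)]
    cols = [[(r, c) for r in range(n)] for c in range(n)]
    blocks = [[(br * block + i, bc * block + j)
               for i in range(block) for j in range(block)]
              for br in range(block) for bc in range(block)]

    # Each value appears at most once in every group.
    for group in rows + cols + blocks:
        for v in range(1, n + 1):
            for (r1, c1), (r2, c2) in combinations(group, 2):
                clauses.append([-var(r1, c1, v, n), -var(r2, c2, v, n)])
    return clauses
```

The pairwise at-most-once clauses dominate the clause count and grow quickly with grid size, which gives some sense of why larger grids are harder for the basic encodings.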

A New Rate Control Scheme for H.264/AVC Video Using Pseudo Encoding Model (근사 인코딩 기법을 이용한 H.264/AVC 비트율 제어 알고리즘)

Lee, Rok-Kyu;Jeon, Gwang-Gil;Jeong, Je-Chang
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2008.11a / pp.139-142 / 2008
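This paper proposes a rate control algorithm for the H.264/AVC video codec that uses a pseudo encoding technique. H.264 delivers far better compression performance than earlier video compression standards, but because of its structural complexity, the rate control algorithms previously proposed for H.264 have fallen short of expectations. The proposed algorithm uses pseudo encoding to predict, before the actual H.264 encoding takes place, the number of bits that encoding will produce, and it performs well at predicting frame complexity, which is highly important in rate control. The pseudo encoding technique also has an advantage in computational cost because of its simple structure. The paper uses image characteristics obtained by analyzing the number of zeros in each frame in the DCT domain for rate control. Experimental results show that the proposed algorithm outperforms existing algorithms under JM12.2, the most recent version of the H.264 reference software.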

Design and Evaluation of Cache Structure for Semi-packed Instruction (부분 압축 명령어를 위한 캐쉬 구조의 설계 및 평가)

Hong, Won-Gi;Lee, Seung-Yeop;Kim, Sin-Deok
- Journal of KIISE:Computer Systems and Theory / v.28 no.5 / pp.245-258 / 2001
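In VLIW, the work of parallelizing program code is done entirely by the compiler. The operations to be executed in parallel must therefore be indicated explicitly, and two instruction encoding schemes have been used for this purpose: unpacked encoding and packed encoding. Each scheme requires a different cache structure for loading and fetching instructions: an uncompressed cache is used with unpacked encoding and a compressed cache with packed encoding. However, these suffer, respectively, from reduced memory utilization caused by null operations and from increased instruction fetch overhead caused by the decompression step. This paper proposes a split cache structure that uses semi-packed encoding, which keeps the instruction length partially constant, to raise memory utilization while also reducing instruction fetch overhead. By calculating the chip area required to implement each cache structure, we confirmed that the split cache is a relatively cost-effective cache structure. Simulation-based measurements of memory utilization showed that, even when the increase in hardware cost is taken into account, the split cache achieves up to about three times the memory utilization of the uncompressed cache. Performance measurements of VLIW systems that use each cache structure as the first-level cache showed that the system using TCSC (a block-concentrated split cache) was the best in terms of performance relative to cost.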