Code
A code is a system of rules for converting information, such as a letter, word, or gesture, into another form or representation, sometimes shortened or secret, for communication through a channel or storage in a medium. Codes are used in diverse fields, including computing, where they take the form of programming instructions; cryptography, for secure messaging; biology, as in the genetic code; and mathematics, as in Gödel numbering. This article explores the concept of code across these and other domains.

In computing, code usually refers to instructions written in a programming language that direct a computer to perform tasks, from simple calculations to complex applications. Its development began in the mid-20th century with machine code and assembly language, evolving to high-level languages such as Fortran in 1957. [1] Code operates at various levels of abstraction, is processed by compilers or interpreters, and spans paradigms such as procedural, object-oriented, and functional programming. As of 2025, programming code underpins digital technologies, with over 8,000 documented languages, though Python, JavaScript, and SQL remain dominant due to their versatility. [2] Challenges in maintainability, security, and AI-generated code continue to shape the field.

Definitions and Fundamentals

General Definition

In information theory and coding theory, a code is a systematic mapping from a source alphabet of symbols or sequences to a target alphabet of code symbols, facilitating the representation, transmission, or storage of information. [3] The mapping transforms data from its original form into a format better suited to a specific purpose, such as compression or reliable communication, while preserving the essential content. [4] Key properties of a code include injectivity, ensuring that distinct source symbols map to distinct codewords so that decoding is unambiguous; unique decodability, which guarantees that every possible encoded sequence corresponds to at most one source sequence; and completeness, meaning the code assigns a codeword to every symbol in the source alphabet. [5] Mathematically, a code can be expressed as a function $C: A \to B^*$, where $A$ is the finite source alphabet, $B$ is the finite code alphabet, and $B^*$ is the Kleene star of $B$, the set of all finite-length strings over $B$. [3] The code itself is the static set of rules defining the mapping, distinct from the processes of encoding (applying the code to input data) and decoding (reversing the process to recover the original information). [6] A simple example is the International Morse code, a substitution code that assigns unique sequences of dots and dashes to letters and numerals, such as "A" mapped to ".-" and "B" to "-...". [7]
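To make the mapping $C: A \to B^*$ concrete, the following minimal Python sketch applies and inverts a Morse-style substitution code. The abbreviated MORSE table and the encode/decode helpers are illustrative, not taken from any library; the space between codewords plays the role of the pause that separates letters in transmitted Morse.

```python
# A partial Morse table: an injective mapping from source symbols
# (letters) to codewords (strings of dots and dashes).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..",
    "E": ".", "O": "---", "S": "...",
}

def encode(text: str) -> str:
    """Apply the code: map each source symbol to its codeword."""
    return " ".join(MORSE[ch] for ch in text.upper() if ch in MORSE)

def decode(signal: str) -> str:
    """Invert the mapping to recover the original symbols."""
    inverse = {codeword: letter for letter, codeword in MORSE.items()}
    return "".join(inverse[cw] for cw in signal.split())

print(encode("SOS"))          # ... --- ...
print(decode("... --- ..."))  # SOS
```

Because the table is injective, the inverse dictionary is well defined and decoding recovers the source text exactly.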
Historical Development

The concept of codes traces its origins to ancient civilizations, where systematic signaling and writing systems supported communication and record-keeping. In antiquity, semaphore-like systems emerged for long-distance signaling; the Greek historian Polybius, writing in the 2nd century BCE, described a torch-based method built on a 5x5 grid, now known as the Polybius square, in which each Greek letter was signaled by the number of torches lit at two hilltop stations, corresponding to the letter's row and column in the grid. [8] Similarly, early writing systems such as cuneiform, developed by the Sumerians in Mesopotamia around 3500 BCE, represent one of the first codified scripts, using wedge-shaped impressions on clay tablets to encode language, numerals, and administrative data. [9] These innovations laid down foundational principles for encoding information efficiently across distances and media. [10]

The 19th century marked a pivotal shift toward electrical communication, driven by the invention of telegraph codes. In 1837, Samuel F. B. Morse developed Morse code, a system of dots and dashes for transmitting messages over telegraph wires, which revolutionized rapid long-distance communication by assigning shorter symbols to more frequently used letters. [11] These advances set the stage for 20th-century theoretical formalization, above all Claude Shannon's seminal work. In his 1948 paper "A Mathematical Theory of Communication," Shannon introduced information theory, quantifying the entropy of a message source as a measure of its uncertainty and showing that suitable coding can achieve transmission rates approaching channel capacity even in the presence of noise. [12] Shannon's framework measured the efficiency of a code as the ratio of information transmitted to channel capacity, and it has influenced all subsequent coding practice. [12]

Post-World War II developments accelerated the practical application of codes in computing and reliability. Error-correcting codes gained prominence as electronic systems proliferated: Richard Hamming at Bell Labs developed the first such codes in response to computational errors in early machines, and his 1950 paper described systematic binary codes capable of detecting and correcting single-bit errors through parity checks, providing a blueprint for reliable data storage and transmission in emerging digital technologies. [13] In parallel, coding in computing evolved from mechanical punch cards, pioneered by Herman Hollerith in the 1890s for the U.S. Census, where holes in stiff cards encoded demographic data for tabulating machines, to fully electronic systems such as the ENIAC, completed in 1945, which used ring counters to encode decimal digits electronically for arithmetic operations. [14] [15] By the mid-20th century, the concept extended to biological systems with the deciphering of the genetic code in 1961 by Marshall Nirenberg and Heinrich Matthaei, who identified RNA triplets encoding amino acids through in vitro experiments. [16]

Coding Theory

Variable-Length Codes

Variable-length codes are a class of source codes in which symbols from an alphabet are assigned codewords of varying lengths, with shorter codewords typically given to more probable symbols, so as to represent data efficiently and minimize the average number of bits per symbol. This approach contrasts with fixed-length codes by exploiting symbol probabilities to reduce redundancy, which makes it particularly useful for lossless data compression, where the goal is to approach the entropy of the source. The seminal work on such codes is David A. Huffman's 1952 paper, which formalized a method for constructing optimal prefix codes, codes satisfying the prefix condition that no codeword is a prefix of another, so that decoding is instantaneous and unambiguous. [17]

A fundamental concept underpinning variable-length codes is the prefix condition, which ensures unique decodability. The condition is characterized by the Kraft inequality: a binary prefix code with codeword lengths $l_1, l_2, \dots, l_m$ for $m$ symbols exists if and only if

$$\sum_{i=1}^{m} 2^{-l_i} \leq 1.$$

The inequality was proven in L. G. Kraft's 1949 master's thesis and extended by B. McMillan in 1956 to all uniquely decodable codes. For a given set of symbol probabilities $p_i$, the optimal variable-length code minimizes the expected code length $\sum_i p_i l_i$, which is bounded below by the source entropy $H = -\sum_i p_i \log_2 p_i$; Huffman codes come within 1 bit per symbol of this bound on average. [18]

The Huffman algorithm constructs an optimal variable-length prefix code through a bottom-up tree-building process, sketched in code below. It begins with a list of symbols weighted by their probabilities $p_i$, repeatedly merges the two lowest-probability nodes into a parent node carrying their combined probability, labeling the two branches 0 and 1, and stops when a single root remains; each codeword is the sequence of branch labels on the path from the root to the corresponding leaf. This greedy method yields codeword lengths that satisfy the Kraft inequality with equality, and it is widely used for its simplicity and its optimality for discrete memoryless sources. [17]
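Here is a minimal sketch of this construction, using Python's standard heapq module as the priority queue; the function name huffman_code and the integer tiebreakers are illustrative choices rather than any standard API.

```python
import heapq
from collections import Counter

def huffman_code(weights: dict[str, int]) -> dict[str, str]:
    """Build an optimal prefix code from symbol weights (counts or
    probabilities) by repeatedly merging the two lightest nodes."""
    # Heap entries are (weight, tiebreaker, symbol-or-subtree); the unique
    # integer tiebreaker keeps the heap from comparing subtrees directly.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)    # lightest node
        w1, _, right = heapq.heappop(heap)   # second lightest
        heapq.heappush(heap, (w0 + w1, next_id, (left, right)))
        next_id += 1
    # Read codewords off the tree: 0 for left branches, 1 for right.
    codes: dict[str, str] = {}
    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"      # degenerate one-symbol case
    walk(heap[0][2], "")
    return codes

codes = huffman_code(Counter("abracadabra"))
print(codes)  # {'a': '0', 'c': '100', 'd': '101', 'b': '110', 'r': '111'}
# For a Huffman code the lengths meet the Kraft inequality with equality.
assert sum(2.0 ** -len(c) for c in codes.values()) == 1.0
```

On "abracadabra" the most frequent symbol receives a 1-bit codeword and the rare symbols 3-bit codewords, for an expected length of 23/11, about 2.09 bits per symbol, against a source entropy of about 2.04 bits, consistent with the within-1-bit guarantee above.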
In applications, variable-length codes, particularly Huffman variants, form the basis of data compression in file formats such as ZIP, where the DEFLATE algorithm combines Huffman coding with LZ77 dictionary methods to encode literals and match distances efficiently, achieving significant reductions in file size for text and binary data. The primary advantage is the minimization of average code length for given symbol probabilities, which can yield 20-30% better compression than fixed-length schemes for typical data distributions, though it requires either knowing symbol frequencies in advance or updating them adaptively. Variable-length codes thus offer a practical means of approaching the theoretical limits of lossless compression established by the source coding theorems. [19]

Error-Correcting Codes

Error-correcting codes are a class of codes that introduce redundancy into data in order to detect and correct errors introduced during transmission or storage. By embedding additional check symbols, these codes enable a receiver to identify and repair corrupted bits or symbols without requiring retransmission. This reliability is crucial over noisy channels such as wireless links and data storage media. [20]

The foundational metric for error-correcting codes is the Hamming distance, the number of positions in which two equal-length words differ; a code's minimum distance $d$ is the smallest Hamming distance between any two distinct codewords. This distance determines the code's error resilience: a code with minimum distance $d$ can detect up to $d - 1$ errors and correct up to $t = \lfloor (d-1)/2 \rfloor$ errors, since a word corrupted in at most $t$ positions remains closer to its original codeword than to any other, so nearest-neighbor decoding maps it back uniquely. [20] [21]

Linear error-correcting codes, a promine
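Returning to the Hamming-distance definitions above, here is a minimal sketch using only the Python standard library; the two-codeword repetition code is a toy example chosen for illustration.

```python
from itertools import combinations

def hamming_distance(x: str, y: str) -> int:
    """Number of positions in which two equal-length words differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

# Toy example: the binary triple-repetition code {000, 111}.
code = ["000", "111"]
d = min(hamming_distance(u, v) for u, v in combinations(code, 2))
print(d)             # minimum distance d = 3
print(d - 1)         # detects up to d - 1 = 2 errors
print((d - 1) // 2)  # corrects up to t = floor((d - 1) / 2) = 1 error

# Nearest-neighbor decoding: a word with at most t corrupted positions
# is still closest to the codeword it came from.
received = "101"  # "111" with its middle bit flipped
print(min(code, key=lambda c: hamming_distance(received, c)))  # 111
```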