Global Trend Radar
Web: arstechnica.com US web_search 2026-05-06 13:19

A plain-language explanation of how AI large language models work

Original title: A jargon-free explanation of how AI large language models work


Analysis results

Category
AI
Importance
72
Trend score
36
Summary
AI large language models work by training on vast amounts of text data and learning the patterns of language, which allows them to generate responses that fit the context. The models learn the relationships between words and phrases and can predict which word comes next, enabling natural conversation and text generation.
Article text
When ChatGPT was introduced last fall, it sent shockwaves through the technology industry and the larger world. Machine learning researchers had been experimenting with large language models (LLMs) for a few years by that point, but the general public had not been paying close attention and didn't realize how powerful they had become.

Today, almost everyone has heard about LLMs, and tens of millions of people have tried them out. But not very many people understand how they work. If you know anything about this subject, you've probably heard that LLMs are trained to "predict the next word" and that they require huge amounts of text to do this. But that tends to be where the explanation stops. The details of how they predict the next word are often treated as a deep mystery.

One reason for this is the unusual way these systems were developed. Conventional software is created by human programmers, who give computers explicit, step-by-step instructions. By contrast, ChatGPT is built on a neural network that was trained using billions of words of ordinary language. As a result, no one on Earth fully understands the inner workings of LLMs. Researchers are working to gain a better understanding, but this is a slow process that will take years, perhaps decades, to complete.

Still, there's a lot that experts do understand about how these systems work. The goal of this article is to make a lot of this knowledge accessible to a broad audience. We'll aim to explain what's known about the inner workings of these models without resorting to technical jargon or advanced math.

We'll start by explaining word vectors, the surprising way language models represent and reason about language. Then we'll dive deep into the transformer, the basic building block for systems like ChatGPT. Finally, we'll explain how these models are trained and explore why good performance requires such phenomenally large quantities of data.

Word vectors

To understand how language models work, you first need to understand how they represent words. Humans represent English words with a sequence of letters, like C-A-T for "cat." Language models use a long list of numbers called a "word vector." For example, here's one way to represent cat as a vector:

[0.0074, 0.0030, -0.0105, 0.0742, 0.0765, -0.0011, 0.0265, 0.0106, 0.0191, 0.0038, -0.0468, -0.0212, 0.0091, 0.0030, -0.0563, -0.0396, -0.0998, -0.0796, …, 0.0002]

(The full vector is 300 numbers long.)

Why use such a baroque notation? Here's an analogy. Washington, DC, is located at 38.9 degrees north and 77 degrees west. We can represent this using a vector notation:

Washington, DC, is at [38.9, 77]
New York is at [40.7, 74]
London is at [51.5, 0.1]
Paris is at [48.9, -2.4]

This is useful for reasoning about spatial relationships. You can tell New York is close to Washington, DC, because 38.9 is close to 40.7 and 77 is close to 74. By the same token, Paris is close to London. But Paris is far from Washington, DC.
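To make the "closeness" argument concrete, here is a minimal Python sketch using the coordinate vectors above. The `distance` helper and the printed values are illustrative only and are not part of the article.

```python
import math

# Coordinate-style vectors from the analogy above
# (west longitude written as a positive number, east as negative).
cities = {
    "Washington, DC": (38.9, 77.0),
    "New York": (40.7, 74.0),
    "London": (51.5, 0.1),
    "Paris": (48.9, -2.4),
}

def distance(a, b):
    """Euclidean distance between two 2-D vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Nearby cities have nearby vectors; faraway cities do not.
print(distance(cities["Washington, DC"], cities["New York"]))  # ~3.5 (close)
print(distance(cities["London"], cities["Paris"]))             # ~3.6 (close)
print(distance(cities["Washington, DC"], cities["Paris"]))     # ~80 (far)
```

Word vectors work the same way, just with many more dimensions: nearness in the space stands in for similarity in meaning.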
Language models take a similar approach: Each word vector represents a point in an imaginary "word space," and words with more similar meanings are placed closer together (technically, LLMs operate on fragments of words called tokens, but we'll ignore this implementation detail to keep this article a manageable length). For example, the words closest to cat in vector space include dog, kitten, and pet. A key advantage of representing words with vectors of real numbers (as opposed to a string of letters, like C-A-T) is that numbers enable operations that letters don't.

Words are too complex to represent in only two dimensions, so language models use vector spaces with hundreds or even thousands of dimensions. The human mind can't envision a space with that many dimensions, but computers are perfectly capable of reasoning about them and producing useful results.

Researchers have been experimenting with word vectors for decades, but the concept really took off when Google announced its word2vec project in 2013. Google analyzed millions of documents harvested from Google News to figure out which words tend to appear in similar sentences. Over time, a neural network trained to predict which words co-occur with other words learned to place similar words (like dog and cat) close together in vector space.

Google's word vectors had another intriguing property: You could "reason" about words using vector arithmetic. For example, Google researchers took the vector for "biggest," subtracted "big," and added "small." The word closest to the resulting vector was "smallest." You can use vector arithmetic to draw analogies! In this case, big is to biggest as small is to smallest. Google's word vectors captured a lot of other relationships:

Swiss is to Switzerland as Cambodian is to Cambodia (nationalities)
Paris is to France as Berlin is to Germany (capitals)
Unethical is to ethical as possibly is to impossibly (opposites)
Mouse is to mice as dollar is to dollars (plurals)
Man is to woman as king is to queen (gender roles)

Because these vectors are built from the way humans use words, they end up reflecting many of the biases that are present in human language. For example, in some word vector models, "doctor minus man plus woman" yields "nurse." Mitigating biases like this is an area of active research.

Nevertheless, word vectors are a useful building block for language models because they encode subtle but important information about the relationships between words. If a language model learns something about a cat (for example, it sometimes goes to the vet), the same thing is likely to be true of a kitten or a dog. If a model learns something about the relationship between Paris and France (for example, they share a language), there's a good chance that the same will be true for Berlin and Germany and for Rome and Italy.
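The analogy arithmetic described above can be tried directly with pretrained word2vec vectors. Below is a rough sketch, assuming the gensim library and its downloadable "word2vec-google-news-300" model (a large download on first use); the exact neighbours printed depend on the pretrained vectors, and none of this code comes from the article.

```python
import gensim.downloader as api

# Pretrained 300-dimensional word2vec vectors trained on Google News.
vectors = api.load("word2vec-google-news-300")

# Words near "cat" in vector space are semantically similar words.
print(vectors.most_similar("cat", topn=3))

# Analogy arithmetic: biggest - big + small should land near "smallest".
print(vectors.most_similar(positive=["biggest", "small"], negative=["big"], topn=1))

# The same trick captures other relationships, e.g. king - man + woman ≈ queen.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```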
Word meaning depends on context

A simple word vector scheme like this doesn't capture an important fact about natural language: Words often have multiple meanings. For example, the word "bank" can refer to a financial institution or to the land next to a river. Or consider the following sentences:

John picks up a magazine.
Susan works for a magazine.

The meanings of magazine in these sentences are related but subtly different. John picks up a physical magazine, while Susan works for an organization that publishes physical magazines. When a word has two unrelated meanings, as with bank, linguists call them homonyms. When a word has two closely related meanings, as with magazine, linguists call it polysemy.

LLMs like ChatGPT are able to represent the same word with different vectors depending on the context in which that word appears. There's a vector for bank (financial institution) and a different vector for bank (of a river). There's a vector for magazine (physical publication) and another for magazine (organization). As you might expect, LLMs use more similar vectors for polysemous meanings than homonymous ones.

So far, we haven't said anything about how language models do this; we'll get into that shortly. But we're belaboring these vector representations because they're fundamental to understanding how language models work. Traditional software is designed to operate on data that's unambiguous. If you ask a computer to compute "2 + 3," there's no ambiguity about what 2, +, or 3 mean. But natural language is full of ambiguities that go beyond homonyms and polysemy:

In "the customer asked the mechanic to fix his car," does "his" refer to the customer or the mechanic?
In "the professor urged the student to do her homework," does "her" refer to the professor or the student?
In "fruit flies like a banana," is "flies" a verb (referring to fruit soaring across the sky) or a noun (referring to banana-loving insects)?

People resolve ambiguities like this based on context, but there are no simple or deterministic rules for doing this. Rather, it requires understanding facts about the world. You need to know that mechanics typically fix customers' cars, that students typically do their own homework, and that fruit typically doesn't fly. Word vectors provide a flexible way for language models to represent each word's precise meaning in the context of a particular passage. Now let's look at how they do that.

Transforming word vectors into word predictions

GPT-3, a 2020 predecessor to the language models that power ChatGPT, is organized into dozens of layers. Each layer takes a sequence of vectors as inputs, one vector for each word in the input text, and adds information to help clarify the meaning of that word and better predict which word might come next. Let's start by looking at a stylized example.

[Diagram credit: Timothy B. Lee / Understanding AI]

Each layer of an LLM is a transformer, a neural network architecture that was first introduced by Google in a landmark 2017 paper. The model's input, shown at the bottom of the diagram, is the partial sentence "John wants his bank to cash the." These words, represented as word2vec-style vectors, are fed into the first transformer. The transformer figures out that wants and cash are both verbs (both words can also be nouns). We've represented this added context as red text in parentheses, but in reality, the model would store it by modifying the word vectors in ways that are difficult for humans to interpret. These new vectors, known as a hidden state, are passed to the next transformer in the stack. The second transformer adds two other bits of context: It clarifies that "bank" refers to a financial institution rather than a river bank, and that "his" is a pronoun that refers to John.
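The diagram itself cannot be reproduced here, but the two claims it illustrates, that the vector a model assigns to a word shifts with its context and that each layer hands a stack of hidden-state vectors to the next, can be checked against an open model. Below is a minimal sketch, assuming the Hugging Face transformers library and bert-base-uncased as a stand-in for GPT-3 (which is not publicly inspectable); the sentences and the expected similarity ordering are illustrative only.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

def vector_for(sentence, word):
    """Return the final-layer hidden-state vector for `word` inside `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.hidden_states is a tuple with one tensor per layer (plus the input
    # embeddings), each of shape (batch, sequence_length, hidden_size).
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return outputs.hidden_states[-1][0, tokens.index(word)]

river = vector_for("john sat on the bank of the river", "bank")
money = vector_for("john deposited money at the bank", "bank")
cash = vector_for("john wants his bank to cash the check", "bank")

cos = torch.nn.functional.cosine_similarity
# The two financial uses of "bank" should be closer to each other than to the river use.
print(cos(money, cash, dim=0).item(), cos(money, river, dim=0).item())
```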

Similar articles (vector neighbors)