Global Trend Radar
Web: www.ibm.com US web_search 2026-05-06 13:19

What are large language models (LLMs)?

Original title: What are large language models (LLMs)? - IBM

Open the original article →

Analysis results

Category
AI
Importance
78
Trend score
42
Summary
Large language models (LLMs) are AI models trained on vast amounts of text data, and they play a central role in natural language processing. These models can perform diverse tasks such as text generation, translation and summarization, and are used to improve interactions with users. LLMs can understand context and generate appropriate responses, and they are applied across a wide range of applications.
Extracted article text
What are large language models (LLMs)?
By Cole Stryker

What are LLMs?

Large language models (LLMs) are a category of deep learning models trained on immense amounts of data, making them capable of understanding and generating natural language and other types of content to perform a wide range of tasks. LLMs are built on a type of neural network architecture called a transformer, which excels at handling sequences of words and capturing patterns in text.

LLMs work as giant statistical prediction machines that repeatedly predict the next word in a sequence. They learn patterns in their training text and generate language that follows those patterns.

LLMs represent a major leap in how humans interact with technology because they are the first AI systems that can handle unstructured human language at scale, allowing for natural communication with machines. Where traditional search engines and other programmed systems used algorithms to match keywords, LLMs capture deeper context, nuance and reasoning. Once trained, LLMs can adapt to many applications that involve interpreting text, like summarizing an article, debugging code or drafting a legal clause. When given agentic capabilities, LLMs can perform, with varying degrees of autonomy, various tasks that would otherwise be performed by humans.

LLMs are the culmination of decades of progress in natural language processing (NLP) and machine learning research, and their development is largely responsible for the explosion of artificial intelligence advancements across the late 2010s and 2020s. Popular LLMs have become household names, bringing generative AI to the forefront of public interest. LLMs are also used widely in enterprises, with organizations investing heavily across numerous business functions and use cases.
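The "statistical prediction machine" idea above can be illustrated with a deliberately tiny toy: a bigram model that repeatedly predicts the most frequent next word. This is not how an LLM is built (real models condition on long contexts with a transformer), only a minimal sketch of next-word prediction; the corpus and names here are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus (hypothetical). A bigram model counts which word follows which.
corpus = "the dog chased the cat and the cat chased the mouse".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent successor of `word` in the toy corpus."""
    return following[word].most_common(1)[0][0]

# Generate text by repeatedly predicting the next word, as LLMs do at scale.
word, generated = "the", ["the"]
for _ in range(4):
    word = predict_next(word)
    generated.append(word)

print(" ".join(generated))
```

An LLM does the same predict-append-repeat loop, but its "counts" are replaced by billions of learned parameters conditioning on the entire preceding context.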
LLMs are easily accessible to the public through interfaces like Anthropic’s Claude, OpenAI’s ChatGPT, Microsoft’s Copilot, Meta’s Llama models and Google’s Gemini assistant, along with its BERT and PaLM models. IBM maintains a Granite model series on watsonx.ai, which has become the generative AI backbone for other IBM products like watsonx Assistant and watsonx Orchestrate.

Pretraining large language models

Training starts with a massive amount of data: billions or trillions of words from books, articles, websites, code and other text sources. Data scientists oversee cleaning and pre-processing to remove errors, duplication and undesirable content. This text is broken down into smaller, machine-readable units called “tokens” during a process of “tokenization.” Tokens are smaller units such as words, subwords or characters. This standardizes the language so rare and novel words can be handled consistently.

LLMs are initially trained with self-supervised learning, a machine learning technique that uses unlabeled data for supervised learning. Self-supervised learning doesn’t require labeled datasets, but it’s closely related to supervised learning in that it optimizes performance against a “ground truth.” In self-supervised learning, tasks are designed such that the ground truth can be inferred from the unlabeled data itself. Instead of being told what the “correct output” is for each input, as in supervised learning, the model tries to find patterns, structures or relationships in the data on its own.

Self-attention

The model passes the tokens through a transformer network.
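Before any attention is computed, the tokenization step described above splits text into known units. A minimal sketch of the idea, assuming a tiny hypothetical vocabulary and greedy longest-match segmentation (real tokenizers such as BPE are trained on the corpus, not hand-written like this):

```python
# Hypothetical tiny subword vocabulary; real vocabularies have tens of
# thousands of entries learned from data.
VOCAB = {"un", "believ", "able", "token", "ization", "s"}

def tokenize(word):
    """Greedy longest-match segmentation of `word` into vocabulary units,
    falling back to single characters for anything unknown."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try longest substring first
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])          # unknown character token
            i += 1
    return tokens

print(tokenize("unbelievable"))    # rare word built from known subwords
print(tokenize("tokenizations"))
```

The fallback path is what lets rare or novel words be handled consistently: any string decomposes into some sequence of known units.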
Transformer models, introduced in 2017, are useful due to their self-attention mechanism, which allows them to “pay attention to” different tokens at different moments. This technique is the centerpiece of the transformer and its prime innovation. Self-attention is useful in part because it allows the AI model to calculate the relationships and dependencies between tokens, especially ones that are distant from one another in the text. Transformer architectures also allow for parallelization, making the process much more efficient than previous methods. These qualities allowed LLMs to handle unprecedentedly large datasets.

Once text is split into tokens, each token is mapped to a vector of numbers called an embedding. Neural networks consist of layers of artificial neurons, where each neuron performs a mathematical operation. Transformers consist of many of these layers, and at each, the embeddings are slightly adjusted, becoming richer contextual representations from layer to layer. The goal in this process is for the model to learn semantic associations between words, so that words like “bark” and “dog” appear closer together in vector space in an essay about dogs than “bark” and “tree” would, based on the surrounding dog-related words in the essay. Transformers also add positional encodings, which give each token information about its place in the sequence.

To compute attention, each embedding is projected into three distinct vectors using learned weight matrices: a query, a key and a value. The query represents what a given token is “seeking,” the key represents the information that each token contains, and the value “returns” the information from each key vector, scaled by its respective attention weight. Alignment scores are then computed as the similarity between queries and keys. These scores, once normalized into attention weights, determine how much of each value vector flows into the representation of the current token.
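The query-key-value computation just described can be sketched as single-head scaled dot-product attention. The embeddings and projection matrices below are random placeholders standing in for learned parameters; sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8

X = rng.normal(size=(seq_len, d_model))      # one embedding per token
W_q = rng.normal(size=(d_model, d_head))     # learned weight matrices
W_k = rng.normal(size=(d_model, d_head))     # (random here for illustration)
W_v = rng.normal(size=(d_model, d_head))

# Project each embedding into query, key and value vectors.
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Alignment scores: query-key similarity, scaled for numerical stability.
scores = Q @ K.T / np.sqrt(d_head)

# Softmax turns scores into attention weights that sum to 1 for each token.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Each output row mixes the value vectors, weighted by attention.
output = weights @ V
print(output.shape)
```

Row i of `weights` says how much token i draws from every other token; `output` is the richer contextual representation passed to the next layer.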
This process allows the model to flexibly focus on relevant context while ignoring less important tokens (like “tree”). Self-attention thus creates “weighted” connections between all tokens more efficiently than earlier architectures could.

The model assigns weights to each relationship between the tokens. LLMs can have billions or trillions of these weights, which are one type of LLM parameter, the internal configuration variables of a machine learning model that control how it processes data and makes predictions. The number of parameters refers to how many of these variables exist in a model. So-called small language models are smaller in scale and scope, with comparatively few parameters, making them suitable for deployment on smaller devices or in resource-constrained environments.

During training, the model makes predictions across millions of examples drawn from its training data, and a loss function quantifies the error of each prediction. Through an iterative cycle of making predictions and then updating model weights through backpropagation and gradient descent, the model “learns” the weights in the layers that produce the query, key and value vectors. Once those weights are sufficiently optimized, they can take any token’s original vector embedding and produce query, key and value vectors for it that, interacting with the vectors generated for all the other tokens, yield better alignment scores and, in turn, attention weights that help the model produce better outputs. The end result is a model that has learned patterns in grammar, facts, reasoning structures, writing styles and more.

Fine-tuning large language models

After training (or “pretraining,” when further training follows), LLMs can be fine-tuned to make them more useful in certain contexts.
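The predict-then-update cycle described in the pretraining passage above can be sketched on a toy next-token task: one linear layer plus softmax, trained by gradient descent on cross-entropy loss. Everything here (vocabulary size, learning rate, the single training example) is an invented minimal setup, not a real LLM training loop.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab, d = 5, 8
W = rng.normal(scale=0.1, size=(d, vocab))   # the model's weights
x = rng.normal(size=d)                       # a context representation
target = 3                                   # index of the true next token

def loss_and_grad(W):
    logits = x @ W
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    loss = -np.log(probs[target])            # cross-entropy: prediction error
    grad_logits = probs.copy()
    grad_logits[target] -= 1.0               # d(loss)/d(logits)
    return loss, np.outer(x, grad_logits)    # d(loss)/dW via the chain rule

lr, losses = 0.1, []
for _ in range(50):                          # iterative predict-and-update
    loss, grad = loss_and_grad(W)
    W -= lr * grad                           # gradient descent step
    losses.append(loss)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

A real model repeats this across millions of examples and billions of weights, with backpropagation computing the gradients through many stacked layers rather than one.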
For example, a foundational model trained on a large dataset of general knowledge can be fine-tuned on a corpus of legal Q&As in order to create a chatbot for the legal field. Here are some of the most common forms of fine-tuning. Practitioners may use one method or a combination of several.

Supervised fine-tuning

Fine-tuning most often happens in a supervised context with a much smaller, labeled dataset. The model updates its weights to better match the new ground truth (in this case, labeled data). While pretraining is intended to give the model broad general knowledge, fine-tuning adapts a general-purpose model to specific tasks like summarization, classification or customer support. These functional adaptations represent new types of tasks. Supervised fine-tuning produces outputs closer to the human-provided examples, requiring far fewer resources than training from scratch. It is also useful for domain-specific customization, such as training a model on medical documents so it can answer healthcare-related questions.

Reinforcement learning from human feedback

To further refine models, data scientists often use reinforcement learning from human feedback (RLHF), a form of fine-tuning in which humans rank model outputs and the model is trained to prefer outputs that humans rank higher. RLHF is often used in alignment, a process of making LLM outputs useful, safe and consistent with human values. RLHF is also particularly useful for stylistic alignment, where an LLM can be adjusted to respond in a way that’s more casual, humorous or brand-consistent. Stylistic alignment involves training for the same types of tasks, but producing outputs in a specific style.

Reasoning models

Purely supervised fine-tuning teaches a model to imitate examples, but it doesn’t necessarily encourage better reasoning, which involves abstract, multi-step processes.
Such tasks don’t always have abundant labeled data, so reinforcement learning is often used in the creation of reasoning models, LLMs that have been fine-tuned to break complex problems into smaller steps, often called “reasoning traces,” prior to generating a final output. Increasingly sophisticated means of training models give them chain-of-thought reasoning and other multi-step decision-making strategies. In
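Stepping back to the RLHF passage above: one common way its human rankings are used is to train a reward model with a pairwise preference loss, -log(sigmoid(r_chosen - r_rejected)), so that preferred outputs score higher. The reward values below are illustrative numbers, not outputs of any real model, and this is one standard formulation rather than the only one.

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Pairwise preference loss: small when the human-preferred output
    already gets the higher reward, large when the ranking is inverted."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Model agrees with the human ranking: low loss.
print(round(preference_loss(2.0, -1.0), 3))

# Model prefers the rejected output: high loss, pushing rewards to flip.
print(round(preference_loss(-1.0, 2.0), 3))
```

Minimizing this loss over many ranked pairs yields a reward signal that reinforcement learning can then optimize the LLM against.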

Similar articles (vector neighbors)