What are LLMs, and how are they used in generative AI?
Analysis results
- Category
- AI
- Importance
- 78
- Trend score
- 42
- Summary
- An LLM (large language model) is an AI technology that learns from vast amounts of text data and performs natural language processing. In generative AI, LLMs are used for a wide range of purposes, including text generation, translation, summarization, and dialogue systems. They can understand context and generate creative content, and their applications are expanding across many industries.
- Keywords
What are LLMs, and how are they used in generative AI? – Computerworld
by Lucas Mearian, Senior Reporter | feature | Feb 7, 2024 | 15 mins

Large language models are the algorithmic basis for chatbots like OpenAI's ChatGPT and Google's Bard. The technology is tied back to billions — even trillions — of parameters that can make them both inaccurate and non-specific for vertical industry use. Here's what LLMs are and how they work.

Credit: Shutterstock

When ChatGPT arrived in November 2022, it made mainstream the idea that generative artificial intelligence (genAI) could be used by companies and consumers to automate tasks, help with creative ideas, and even code software. If you need to boil down an email or chat thread into a concise summary, a chatbot such as OpenAI's ChatGPT or Google's Bard can do that. If you need to spruce up your resume with more eloquent language and impressive bullet points, AI can help. Want some ideas for a new marketing or ad campaign? Generative AI to the rescue.

ChatGPT stands for chatbot generative pre-trained transformer. The chatbot's foundation is the GPT large language model (LLM), a computer algorithm that processes natural language inputs and predicts the next word based on what it's already seen.
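That prediction loop can be sketched in a few lines of Python. This is an illustrative toy, not OpenAI's implementation: a bigram lookup table stands in for the trained model, whereas a real LLM scores every subword token in its vocabulary at each step.

```python
# Toy sketch of an LLM's generation loop: predict the next word, append it,
# and repeat until the answer is complete. A bigram lookup table stands in
# for the trained model here.
BIGRAMS = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",
}

def generate(prompt_words, max_new_words=4):
    words = list(prompt_words)
    for _ in range(max_new_words):
        next_word = BIGRAMS.get(words[-1])
        if next_word is None:  # the "model" has no prediction; stop
            break
        words.append(next_word)
    return words

print(generate(["the"]))  # ['the', 'cat', 'sat', 'on', 'the']
```

Each new word is chosen using everything generated so far, which is why the loop only ever looks at the running `words` list.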
Then it predicts the next word, and the next word, and so on until its answer is complete. In the simplest of terms, LLMs are next-word prediction engines.

Along with OpenAI's GPT-3 and GPT-4, popular LLMs include Google's LaMDA and PaLM (the basis for Bard), Hugging Face's BLOOM and XLM-RoBERTa, Nvidia's NeMo LLM, XLNet, Co:here, and GLM-130B. Open-source LLMs, in particular, are gaining traction, enabling a cadre of developers to create more customizable models at a lower cost. Meta's February 2023 launch of LLaMA (Large Language Model Meta AI) kicked off an explosion among developers looking to build on top of open-source LLMs.

LLMs are a type of AI currently trained on a massive trove of articles, Wikipedia entries, books, internet-based resources, and other input to produce human-like responses to natural language queries. That's an immense amount of data. But LLMs are poised to shrink, not grow, as vendors seek to customize them for specific uses that don't need the massive data sets used by today's most popular models. For example, Google's PaLM 2 LLM, announced in May 2023, uses almost five times more training data than its predecessor of just a year ago: 3.6 trillion tokens, or strings of words, according to one report. The additional datasets allow PaLM 2 to perform more advanced coding, math, and creative writing tasks. Training an LLM properly requires massive server farms, or supercomputers, with enough compute power to tackle billions of parameters.

So, what is an LLM? An LLM is a machine-learning neural network trained through data input/output sets; frequently, the text is unlabeled or uncategorized, and the model uses a self-supervised or semi-supervised learning methodology. Content is ingested into the LLM, and the output is the algorithm's prediction of the next word.
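The self-supervised setup described above means unlabeled text supplies its own training labels: each word in a sentence is the "answer" for the words that precede it. A minimal sketch (illustrative only; real LLMs operate on subword tokens, not whole words):

```python
def make_training_pairs(text):
    """Turn raw, unlabeled text into (context, next-word) training pairs.
    Each word serves as the label for the words before it, so no human
    annotation is needed -- the essence of self-supervised learning."""
    words = text.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]

pairs = make_training_pairs("cats chase small mice")
print(pairs[0])  # (['cats'], 'chase')
```

A single sentence of n words yields n-1 such pairs, which is why vast unlabeled corpora translate directly into vast amounts of training signal.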
The input can be proprietary corporate data or, as in the case of ChatGPT, whatever data it's fed and scraped directly from the internet. Training LLMs on the right data requires the use of massive, expensive server farms that act as supercomputers.

LLMs are controlled by parameters: millions, billions, and even trillions of them. (Think of a parameter as something that helps an LLM decide between different answer choices.) OpenAI's GPT-3 LLM has 175 billion parameters, and the company's latest model, GPT-4, is purported to have 1 trillion parameters.

For example, you could type into an LLM prompt window, "For lunch today I ate…." The LLM could come back with "cereal," or "rice," or "steak tartare." There's no 100% right answer, but there is a probability based on the data already ingested in the model. The answer "cereal" might be the most probable answer based on existing data, so the LLM could complete the sentence with that word. But, because the LLM is a probability engine, it assigns a percentage to each possible answer. "Cereal" might occur 50% of the time, "rice" could be the answer 20% of the time, and "steak tartare" 0.005% of the time.

"The point is it learns to do this," said Yoon Kim, an assistant professor at MIT who studies machine learning, natural language processing, and deep learning. "It's not like a human — a large enough training set will assign these probabilities."

But beware: junk in, junk out. In other words, if the information an LLM has ingested is biased, incomplete, or otherwise undesirable, then the responses it gives could be equally unreliable, bizarre, or even offensive. When a response goes off the rails, data analysts call it a "hallucination," because it can be so far off track.
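The lunch example above can be made concrete with a small sketch. The probability figures are the article's illustrative numbers, not output from any real model, and the two decoding strategies shown (greedy vs. weighted sampling) are standard techniques rather than anything specific to one vendor:

```python
import random

# The article's "For lunch today I ate..." example as a probability table.
NEXT_WORD_PROBS = {"cereal": 0.50, "rice": 0.20, "steak tartare": 0.00005}

def most_probable(probs):
    """Greedy decoding: always pick the highest-probability word."""
    return max(probs, key=probs.get)

def sample_word(probs, rng=random):
    """Sampling: pick a word at random, weighted by its probability."""
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

print(most_probable(NEXT_WORD_PROBS))  # cereal
```

Greedy decoding would complete the sentence with "cereal" every time; weighted sampling occasionally picks "rice" and, very rarely, "steak tartare", which is why the same prompt can produce different answers on different runs.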
"Hallucinations happen because LLMs, in their most vanilla form, don't have an internal state representation of the world," said Jonathan Siddharth, CEO of Turing, a Palo Alto, California, company that uses AI to find, hire, and onboard software engineers remotely. "There's no concept of fact. They're predicting the next word based on what they've seen so far — it's a statistical estimate."

Because some LLMs also train themselves on internet-based data, they can move well beyond what their initial developers created them to do. For example, Microsoft's Bing uses GPT-3 as its basis, but it's also querying a search engine and analyzing the first 20 results or so. It uses both an LLM and the internet to offer responses.

"We see things like a model being trained on one programming language and these models then automatically generating code in another programming language it has never seen," Siddharth said. "Even natural language; it's not trained on French, but it's able to generate sentences in French."

"It's almost like there's some emergent behavior. We don't quite know how these neural networks work," he added. "It's both scary and exciting at the same time."

Another problem with LLMs and their parameters is the unintended biases that can be introduced by LLM developers and by self-supervised data collection from the internet.

Are LLMs biased?

For example, systems like ChatGPT are highly likely to provide gender-biased answers based on the data they've ingested from the internet and from programmers, according to Sayash Kapoor, a Ph.D. candidate at Princeton University's Center for Information Technology Policy. "We tested ChatGPT for biases that are implicit — that is, the gender of the person is not obviously mentioned, but only included as information about their pronouns," Kapoor said.
"That is, if we replace 'she' in the sentence with 'he,' ChatGPT would be three times less likely to make an error."

Innate biases can be dangerous, Kapoor said, if language models are used in consequential real-world settings. For example, if biased language models are used in hiring processes, they can lead to real-world gender bias. Such biases are not the result of developers intentionally programming their models to be biased. But ultimately, the responsibility for fixing the biases rests with the developers, because they're the ones releasing and profiting from AI models, Kapoor argued.

What is prompt engineering?

While most LLMs, such as OpenAI's GPT-4, are pre-filled with massive amounts of information, prompt engineering by users can also tune the model for specific industry or even organizational use. "Prompt engineering is about deciding what we feed this algorithm so that it says what we want it to," MIT's Kim said. "The LLM is a system that just babbles without any text context. In some sense of the term, an LLM is already a chatbot."

Prompt engineering is the process of crafting and optimizing text prompts for an LLM to achieve desired outcomes. Because prompt engineering is a nascent and emerging discipline, enterprises are relying on booklets and prompt guides as a way to ensure optimal responses from their AI applications. There are even marketplaces emerging for prompts, such as the 100 best prompts for ChatGPT.

Perhaps as important for users, prompt engineering is poised to become a vital skill for IT and business professionals, according to Eno Reyes, a machine learning engineer with Hugging Face, a community-driven platform that creates and hosts LLMs. Prompt engineers will be responsible for creating customized LLMs for business use.

How will LLMs become smaller, faster, and cheaper?
Today, chatbots based on LLMs are most commonly used "out of the box" as a text-based, web-chat interface. They're used in search engines such as Google's Bard and Microsoft's Bing (based on ChatGPT) and for automated online customer assistance. Companies can ingest their own datasets to make the chatbots more customized for their particular business, but accuracy can suffer because of the massive trove of data already ingested.

"What we're discovering more and more is that with small models that you train on more data longer…, they can do what large models used to do," Thomas Wolf, co-founder and CSO at Hugging Face, said while attending an MI