Gemini
Original title: Gemini
Analysis
- Category: AI
- Importance: 60
- Trend score: 24
- Summary: Gemini is an American multimodal large language model developed by Google DeepMind, noted for its text processing and reasoning capabilities.
- Keywords:
Gemini — Grokipedia (fact-checked by Grok)

''Gemini'' is an American multimodal large language model developed by Google DeepMind, known for its ability to process and reason over text, images, audio, video, and code. [1] [2] Announced in December 2023, Gemini was introduced as Google's most capable and general AI model at the time, initially available in three sizes (Ultra, Pro, and Nano) for applications ranging from complex reasoning to on-device use. [1] The Nano variant provides lightweight on-device AI for tasks such as summarization and smart features on Android devices; it is free and built into supported hardware, with no cloud dependency. [3]

The model demonstrated strong performance across benchmarks, surpassing prior state-of-the-art results on several multimodal and reasoning tasks. Subsequent versions, including Gemini 1.5 with significantly expanded context windows, the 2.0 family, and the 3.0 series, have continued to improve its capabilities and efficiency.

Within the Gemini 2.5 series, the Gemini 2.5 Flash model is available in stable and preview versions. The stable version (gemini-2.5-flash) is fixed and unchanging and is recommended for production applications because of its reliability. Preview versions (e.g., gemini-2.5-flash-preview-09-2025) and specialized previews (such as the Live preview for real-time audio streaming and the TTS preview for controllable text-to-speech) incorporate the latest improvements: better instruction following, fewer output tokens for cost and latency savings, improved tool use (including support for function calling in the Gemini API, with parallel and compositional calls), broader multimodal capabilities, and features such as real-time audio or text-to-speech. For example, parallel function calling lets the model invoke multiple independent tools in a single turn, such as powering a disco ball, starting music, and dimming the lights simultaneously.
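The parallel-calling flow above can be sketched as the dispatch loop an application runs when the model returns several independent function calls in one turn. This is a local simulation under stated assumptions: the tool names and the fake model response are illustrative, not real Gemini API output or the real client library.

```python
# Hypothetical tools matching the "disco party" example from the text.
def power_disco_ball(on: bool) -> dict:
    return {"disco_ball": "on" if on else "off"}

def start_music(genre: str) -> dict:
    return {"music": f"playing {genre}"}

def dim_lights(level: float) -> dict:
    return {"lights": f"dimmed to {level:.0%}"}

TOOLS = {
    "power_disco_ball": power_disco_ball,
    "start_music": start_music,
    "dim_lights": dim_lights,
}

# Pretend the model answered a single turn with three parallel calls
# (in the real API these arrive together in one model response):
model_turn = [
    {"name": "power_disco_ball", "args": {"on": True}},
    {"name": "start_music", "args": {"genre": "disco"}},
    {"name": "dim_lights", "args": {"level": 0.3}},
]

# Execute every call from the turn and collect the tool responses that
# would be sent back to the model in the next request.
responses = [TOOLS[call["name"]](**call["args"]) for call in model_turn]
print(responses)
```

Because the calls are independent, the order of execution does not matter; an application could equally run them concurrently.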
Compositional function calling allows chaining multiple calls sequentially. For instance, in code examples using frameworks like LlamaIndex where tools are defined for addition and multiplication, the model can answer a query like "(121 + 2) * 5?" by first calling add(121, 2) to get 123, then multiply(123, 5) to return 615. These capabilities are standard features of the Gemini API and are not specific to a "Live" variant. [4] [5]

Although previews may offer better quality and speed, they are subject to change, typically have more restrictive rate limits, and are deprecated with at least two weeks' notice. As of February 2026, the stable version is the preferred choice for consistent use, while previews are suitable for testing the latest enhancements. [4]

On February 19, 2026, Google announced and began rolling out Gemini 3.1 Pro Preview across consumer apps, developer tools, and enterprise platforms as a major upgrade focused on complex reasoning and problem-solving. Gemini 3.1 Pro Preview achieved verified scores of 77.1% on ARC-AGI-2 (more than double Gemini 3 Pro's score) and 80.6% on SWE-Bench Verified. These gains reflect significant progress in core reasoning capabilities and have been positively received, underscoring Google's strong position in the AI space. [6] [7]

Building on the SVG-generation improvements demonstrated by models like Gemini 3 Pro in 2025, Gemini 3.1 Pro can generate website-ready, animated SVGs directly from text prompts, including explainer diagrams, interactive data visualizations (e.g., animated charts), workflow diagrams, and other visuals with features like sequential reveals and code-based animations. [6]

On December 17, 2025, Google launched Gemini 3 Flash in public preview (model ID: gemini-3-flash-preview), positioning it as a fast, cost-effective model with frontier intelligence for everyday tasks.
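The compositional add/multiply chain described above for "(121 + 2) * 5?" can be sketched as a local simulation. The add and multiply tools mirror the LlamaIndex example in the text; the chaining loop here stands in for the real model/agent loop, which would decide each next call from the previous tool result.

```python
# Two simple tools, as in the addition/multiplication example.
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

TOOLS = {"add": add, "multiply": multiply}

# The sequence of calls the model would emit across turns; a later call
# consumes the result of an earlier one (None marks that slot).
plan = [
    ("add", {"a": 121, "b": 2}),        # first turn
    ("multiply", {"a": None, "b": 5}),  # 'a' filled from the previous result
]

result = None
for name, args in plan:
    # Feed the previous result into the next call, as the model does
    # when it composes tools sequentially.
    args = {k: (result if v is None else v) for k, v in args.items()}
    result = TOOLS[name](**args)

print(result)  # 615
```

The key difference from parallel calling is the data dependency: multiply cannot be issued until add's result is known, so the calls must span separate turns.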
It became the new default model in the Gemini app (replacing Gemini 2.5 Flash for "Fast" mode), offering PhD-level reasoning, significant multimodal improvements (text, image, audio, video, and PDF inputs; text outputs), and modes like "Fast" for quick answers and "Thinking" for complex problems. The model rolled out across the Gemini API, Google AI Studio, Vertex AI, Gemini CLI, Gemini Code Assist (VS Code/IntelliJ, in preview as of March 13, 2026), and integrations such as Gmail and Document AI. It features a 1 million token context window, a knowledge cutoff around January 2025, strong tool use, and high efficiency (e.g., lower token consumption and faster inference than some prior models).

As of late March 2026, it remains in public preview without promotion to stable GA (there is no fixed model ID such as gemini-3-flash), yet it is widely used in production-like settings, with reports of high-volume processing. This positions it ahead of the June 17, 2026 deprecation of the Gemini 2.5 Flash stable model. Gemini 3 Flash follows Google's usual pattern for Flash models: a quick preview-to-default rollout, with stable GA typically weeks to months later.

Separately, Nano Banana Pro, built on Gemini 3 Pro, specializes in advanced image generation and editing with high-quality outputs, consistent likenesses, and text rendering; it offers limited free access (with watermarks and quotas) via the Gemini app, and paid upgrades for higher limits and resolutions. [2] [8]

Gemini Live was initially exclusive to Gemini Advanced subscribers, but Google began rolling it out to free users in September 2024, with phased availability on Android and eventually iOS. Gemini Live primarily uses the Flash series of models, which are engineered for high speed and low latency. While the standard chat interface often lets users toggle between versions (such as Gemini Advanced or Gemini 1.5 Pro), Gemini Live is designed for real-time, fluid conversation where response time is critical.
Current Model Architecture

As of 2026, the specific versions powering the "Live" experience include Gemini 3.1 Flash Live, the latest iteration used for high-quality audio-to-audio (A2A) interactions. It is optimized for sub-second latency, allowing the AI to listen and respond almost instantly.

Gemini builds on Google's prior work with models like PaLM and the Pathways system, using native multimodal training rather than combining unimodal components. It has been integrated into products such as the Gemini app (gemini.google.com), Google Workspace, developer APIs, and Google AI Studio (ai.google.dev).

YouTube has adapted Gemini to develop Large Recommender Models (LRMs) that enhance its video recommendation system. LRMs use Semantic IDs to tokenize videos into a learnable "language" based on content features, enabling better semantic understanding of video content and more personalized, precise suggestions as an evolutionary improvement to YouTube's AI-driven recommendation architecture. [9] [10]

Gemini has also been integrated into tools like Google Flow, an AI filmmaking application for creating cinematic video clips and stories. Flow uses Gemini models for intuitive prompting alongside Veo for video generation and Imagen for image assets, and requires a Google AI Pro or Ultra subscription. [11]

As of February 2026, the Gemini app also supports music generation using the Lyria 3 model, enabling users aged 18 and older in supported languages to create 30-second tracks from text prompts or uploaded images and videos. Free accounts can generate up to 10 tracks per day, with higher limits for paid subscribers (Google AI Plus: 20 tracks per day; Pro: 50; Ultra: 100). [12] [13]

As of February 2026, Google Gemini Apps support uploading PDFs and most other document file types, with limits of up to 100 MB per non-video file and up to 10 files per prompt.
There is no official support for (or mention of) KML file uploads in Gemini AI documentation; KML is not listed among supported formats, though Gemini can generate KML text for use elsewhere (e.g., in Google Earth or Google Maps). [14]

Within Google Workspace, Gemini can generate podcast-style "Audio Overview" summaries from PDFs in Google Drive. The process takes a few minutes, after which users receive an email notification that the file is ready (e.g., "your file is ready"), and the audio file is saved to an "Audio overviews" folder in Drive. [15] [16]

Google AI Studio provides access to Gemini models with large context windows, typically 1 million tokens or more (e.g., in Gemini 3.1 Pro and 2.5 Pro), enabling effective handling of long documents, extended conversations, and complex tasks via the API. [17] In contrast, the Gemini Advanced subscription tier, which provides access to advanced models through the Gemini web and app interface, has a more limited effective context window for ongoing conversations. In 2025-2026, Google advertised large context windows for Gemini Advanced models, such as 1 million tokens for Gemini 2.5 Pro (with 2 million planned) and up to 10 million tokens for Gemini 3 Pro. Numerous user reports, however, indicate that the effective context window in the Gemini Advanced web/app interface is significantly smaller, often capped or truncated to around 32,000–128,000 tokens in practice, with severe recall degradation ("context amnesia") beyond that, causing loss of earlier context in long chats even though API access supports the full advertised sizes. [13] [18] [19] [20]

Gemini models are also accessible via third-party services such as OpenRouter.
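The gap between an advertised window and an effective one matters when feeding long documents. A minimal sketch of staying under an assumed effective budget, using the common 4-characters-per-token rule of thumb: both the 32,000-token budget and the ratio are assumptions for illustration, not Gemini tokenizer guarantees (real code should count tokens via the API's token-counting endpoint).

```python
# Assumed app-side effective window (from the user reports cited above)
EFFECTIVE_BUDGET_TOKENS = 32_000
# Crude heuristic, not a real tokenizer
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def chunk_for_budget(text: str, budget_tokens: int = EFFECTIVE_BUDGET_TOKENS):
    """Split text into pieces that each fit the assumed token budget."""
    max_chars = budget_tokens * CHARS_PER_TOKEN
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# A 300,000-character document is ~75k estimated tokens: far under the
# advertised 1M-token API window, but well over an assumed 32k budget.
doc = "x" * 300_000
chunks = chunk_for_budget(doc)
print(len(chunks), [estimate_tokens(c) for c in chunks])
```

Chunking like this trades away cross-chunk context, which is exactly the degradation users describe; via the API, the same document could be sent in one request.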
On OpenRouter, Gemini 2.5 Flash-Lite, released in June 2025, offers the highest cost-effectiveness in the Gemini series, priced at $0.10 per million input tokens and $0.40 per million output tokens. This lightweight reasoning model is optimized for ultra-low latency and cost efficiency, making it suitable for everyday tasks, coding, and other applications.
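A back-of-envelope cost check for the Flash-Lite prices quoted above ($0.10 per million input tokens, $0.40 per million output tokens); the request sizes in the example are made up for illustration.

```python
# Per-million-token prices from the text
INPUT_PER_M = 0.10   # USD per 1M input tokens
OUTPUT_PER_M = 0.40  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the quoted Flash-Lite rates."""
    return (input_tokens / 1_000_000) * INPUT_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PER_M

# e.g. a hypothetical request with 50k tokens in and 2k tokens out:
cost = request_cost(50_000, 2_000)
print(f"${cost:.6f}")  # $0.005800
```

At these rates, even a million such requests per day would cost on the order of a few thousand dollars, which is why the Lite tier targets high-volume, latency-sensitive workloads.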