Web: grokipedia.com US web_search 2026-05-07 00:28

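Correlation

Original title: Correlation

Analysis results

Category: AI
Importance: 54
Trend score: 18
Summary: In statistics, correlation is a measure of the strength and direction of the linear relationship between two continuous variables. It indicates how closely the variables vary together, and it can be positive or negative.
Keywords: correlation linear cov coefficient between association pearson relationships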

Correlation
Fact-checked by Grok 3 months ago

In statistics, correlation refers to a measure of the strength and direction of the linear relationship between two continuous variables, quantified by a correlation coefficient that ranges from -1 to +1: values near +1 indicate a strong positive association, values near -1 a strong negative association, and values near 0 little or no linear association. [1] The concept originated in the late 19th century through the work of Francis Galton, who developed the idea of the correlation coefficient to quantify consistent linear relationships between numeric variables, such as the relationship between the heights of parents and their children in his studies of heredity. [2] Karl Pearson later formalized the mathematical formula for the Pearson product-moment correlation coefficient in 1895, establishing it as a cornerstone of modern statistical analysis. [3]

The most common form, Pearson's correlation coefficient (denoted $r$ for samples and $\rho$ for populations), measures linear relationships; its standard significance tests assume approximately normally distributed data. Positive values indicate that as one variable increases the other tends to increase, and negative values indicate the opposite. [1] For non-normal or ordinal data, alternatives such as Spearman's rank correlation coefficient ($\rho_s$) are used; these assess monotonic relationships by ranking the observations and are more robust to outliers. [1] Other variants, such as Kendall's tau, evaluate ordinal association from counts of concordant and discordant pairs, providing another measure of rank correlation. [4]

Key properties of correlation coefficients include their dimensionless nature, symmetry (the correlation between $X$ and $Y$ equals that between $Y$ and $X$), and invariance to changes of location and positive scale in either variable (for example, a change of units), making them versatile for comparing relationships across datasets. [1]
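The symmetry and scale-invariance properties just listed can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the helper `pearson_r` and the temperature and sales figures are hypothetical, introduced only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))   # sum of cross-products of deviations
    sxx = sum(a * a for a in dx)               # sum of squared deviations of x
    syy = sum(b * b for b in dy)               # sum of squared deviations of y
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements (illustrative values only).
temps_c = [12.0, 15.5, 19.0, 22.5, 26.0, 30.0]
sales = [110.0, 135.0, 150.0, 190.0, 205.0, 260.0]

r = pearson_r(temps_c, sales)
r_swapped = pearson_r(sales, temps_c)                 # symmetry: swap the arguments
temps_f = [c * 9 / 5 + 32 for c in temps_c]           # change of units (Celsius to Fahrenheit)
r_rescaled = pearson_r(temps_f, sales)
r_flipped = pearson_r([-c for c in temps_c], sales)   # negative scaling flips the sign

print(f"r                = {r:.4f}")
print(f"r (swapped)      = {r_swapped:.4f}")
print(f"r (Fahrenheit)   = {r_rescaled:.4f}")
print(f"r (negated temp) = {r_flipped:.4f}")
```

Negating one variable flips the sign of the coefficient while leaving its magnitude unchanged, which is why the invariance holds only for positive rescaling.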
However, correlation does not imply causation: associations may arise from confounding factors, chance, or indirect influences, a limitation emphasized since the concept's early development to prevent misinterpretation in fields such as medicine and the social sciences. [1] Correlation coefficients also capture only linear or monotonic patterns, so they can understate nonlinear relationships, and Pearson's coefficient in particular is sensitive to outliers. [5]

Applications of correlation span numerous disciplines, including psychology, economics, biology, and environmental science, where associations are often visualized with scatterplots before any formal computation. [6] In research, correlation serves as a preliminary tool for hypothesis generation, informing regression analysis or experimental design, but it requires cautious interpretation alongside significance testing (e.g., p-values) to evaluate reliability. [7]

Fundamentals of Correlation

Definition and Interpretation

Correlation is a statistical measure that quantifies the strength and direction of the linear relationship between two variables, standardized to range from -1 to +1. A coefficient of +1 represents a perfect positive linear association, where one variable increases linearly with the other; 0 indicates no linear association; and -1 signifies a perfect negative linear association, where one variable decreases linearly as the other increases. [8] This measure focuses exclusively on linear dependence; it does not capture nonlinear relationships and does not imply causation. [9]

The statistical use of the term "correlation" was introduced by the British scientist Francis Galton in 1888, during his studies of regression and biological inheritance, to describe the tendency of traits to vary together. Galton's ideas were expanded by the statistician Karl Pearson in 1895, who developed a mathematical framework for quantifying this association, laying the foundation for modern correlational analysis.
[10] [3]

Interpreting the correlation coefficient involves assessing both its sign (positive or negative direction) and its magnitude (strength of the linear link). Values close to 0 suggest a weak association; common guidelines classify $|r| < 0.3$ as weak, $0.3$–$0.7$ as moderate, and $|r| > 0.7$ as strong. These thresholds are subjective and context-dependent, varying across fields such as psychology or economics. [9] For instance, a correlation of 0.8 might be regarded as a robust linear relationship in the social sciences but call for cautious interpretation in physics, where expectations for effect sizes differ. [8]

Scatterplots are the essential visual aid for interpreting correlation: paired observations are plotted as points on a coordinate plane to reveal patterns. A high positive correlation appears as points tightly clustered along an upward-sloping line, a high negative correlation as points along a downward-sloping line, and a low correlation as a diffuse cloud with no clear linear trend. The plot allows an intuitive assessment of both strength and potential outliers. [11]

Correlation and Independence

In probability theory, two random variables $X$ and $Y$ are defined as uncorrelated if their covariance is zero, that is, $\operatorname{Cov}(X, Y) = 0$, or equivalently $E[(X - \mu_X)(Y - \mu_Y)] = 0$, where $\mu_X = E[X]$ and $\mu_Y = E[Y]$. [12] This condition means that there is no linear relationship between the deviations of $X$ and $Y$ from their respective means. [13]

Independence of $X$ and $Y$ always implies that they are uncorrelated, provided the covariance exists: the joint expectation factors under independence, $E[XY] = E[X]E[Y]$, and since $\operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y]$, it follows that $\operatorname{Cov}(X, Y) = 0$.
[13]

However, the converse does not hold in general: zero correlation does not imply statistical independence. [14] A classic counterexample takes $X$ uniformly distributed on $[-1, 1]$ and $Y = X^2$. Here $E[X] = 0$ and $E[XY] = E[X^3] = 0$ (since $x^3$ is an odd function and the distribution of $X$ is symmetric about zero), so $\operatorname{Cov}(X, Y) = 0$, confirming that the variables are uncorrelated. [15] Yet $X$ and $Y$ are dependent: given $X = 0$, $Y$ equals 0 with certainty, whereas the marginal distribution of $Y$ is spread over $[0, 1]$ with density $f_Y(y) = 1/(2\sqrt{y})$. [15]

An important exception occurs for jointly normal distributions. If $X$ and $Y$ follow a bivariate normal distribution, then zero correlation ($\rho_{X,Y} = 0$) is equivalent to independence. [16] This equivalence arises because the joint density factors into the product of the marginal normal densities precisely when the off-diagonal covariance term vanishes. [17] Full details on this property are discussed in the context of bivariate normal distributions.

In practice, tests of zero correlation, such as those based on the Pearson correlation coefficient, can assess independence only when the joint normality assumption holds; otherwise, they merely detect the absence of linear dependence and may miss nonlinear relationships. [18]
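The counterexample $Y = X^2$ can be reproduced numerically. This sketch assumes an evenly spaced, symmetric grid on $[-1, 1]$ as a deterministic stand-in for the uniform distribution; the helper `pearson_r` is a hypothetical name, not a library function.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Evenly spaced, symmetric grid on [-1, 1] (assumed stand-in for Uniform[-1, 1]).
xs = [i / 100 for i in range(-100, 101)]
ys = [x * x for x in xs]   # Y is a deterministic function of X

# The odd symmetry of x^3 cancels the cross-products, so this is ~0.
r_linear = pearson_r(xs, ys)

# Folding X onto [0, 1] removes the symmetry and exposes the dependence.
r_abs = pearson_r([abs(x) for x in xs], ys)

print(f"corr(X, X^2)   = {r_linear:.6f}")
print(f"corr(|X|, X^2) = {r_abs:.6f}")
```

The first coefficient is zero up to floating-point error even though $Y$ is completely determined by $X$, while the second is close to 1, illustrating that a near-zero Pearson coefficient rules out only linear dependence.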
Pearson's Product-Moment Correlation

Mathematical Definition

The Pearson product-moment correlation coefficient for two random variables $X$ and $Y$, denoted $\rho_{X,Y}$, is defined as the covariance between $X$ and $Y$ divided by the product of their standard deviations:

$$\rho_{X,Y} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X \sigma_Y},$$

where $\operatorname{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]$, $\mu_X = E[X]$ and $\mu_Y = E[Y]$ are the expected values, $\sigma_X = \sqrt{\operatorname{Var}(X)}$, and $\sigma_Y = \sqrt{\operatorname{Var}(Y)}$. [19] [20] This formulation, introduced by Karl Pearson in 1895, quantifies the strength and direction of the linear relationship between the variables, assuming finite, nonzero variances. [21]

The coefficient can be derived from the covariance of standardized variables. Let $Z_X = (X - \mu_X)/\sigma_X$ and $Z_Y = (Y - \mu_Y)/\sigma_Y$ be the standardized versions of $X$ and $Y$, each with mean zero and variance one. Then

$$\rho_{X,Y} = E[Z_X Z_Y] = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y},$$

which normalizes the covariance to lie within a bounded range, facilitating comparison across different scales. [3]
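As a small worked check that the covariance form and the standardized-variable form agree, the sketch below evaluates both for a discrete distribution that puts equal probability on five $(x, y)$ pairs. The data values and variable names are assumptions made for illustration; with equal probabilities, each expectation is a plain average over the pairs.

```python
import math

# Hypothetical equally likely outcomes (x, y); expectations are simple averages.
pairs = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
n = len(pairs)

mu_x = sum(x for x, _ in pairs) / n
mu_y = sum(y for _, y in pairs) / n

# Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)] and sigma = sqrt(E[(X - mu)^2])
cov_xy = sum((x - mu_x) * (y - mu_y) for x, y in pairs) / n
sigma_x = math.sqrt(sum((x - mu_x) ** 2 for x, _ in pairs) / n)
sigma_y = math.sqrt(sum((y - mu_y) ** 2 for _, y in pairs) / n)

# Definition: covariance divided by the product of the standard deviations.
rho_from_cov = cov_xy / (sigma_x * sigma_y)

# Equivalent form: expectation of the product of the standardized variables.
rho_from_z = sum(
    ((x - mu_x) / sigma_x) * ((y - mu_y) / sigma_y) for x, y in pairs
) / n

print(f"rho via covariance: {rho_from_cov:.4f}")
print(f"rho via z-scores:   {rho_from_z:.4f}")
```

For these values the covariance is 1.6 and both variances are 2, so both lines print 0.8000.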
Geometrically, $\rho_{X,Y}$ is the cosine of the angle $\theta$ between the centered variables $X - \mu_X$ and $Y - \mu_Y$, viewed as vectors in the $L^2$ space of square-integrable random variables, where the inner product of two elements is the expectation of their product:

$$\rho_{X,Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sqrt{E[(X - \mu_X)^2]\, E[(Y - \mu_Y)^2]}} = \cos \theta.$$

This interpretation highlights the coefficient as a measure of directional alignment in a vector space framework. [3]

The value of $\rho_{X,Y}$ satisfies $-1 \leq \rho_{X,Y} \leq 1$, a consequence of the Cauchy-Schwarz inequality applied to the inner product $E[(X - \mu_X)(Y - \mu_Y)]$. Equality holds at $\rho_{X,Y} = 1$ if and only if $Y = aX + b$ for some $a > 0$ and constant $b$ (perfect positive linear relationship), and at ρ X
