Symptoms Measurement, Community Sampling and the Zero-Problem: A Case for Two-Part Modeling

The difference between mental health and mental ability measurement hinges on a single concept: zero. Dysfunctional mental health is manifested by symptoms, defined as self-reported feelings of unpleasantness due to pathological causes. Symptoms can be meaningfully reported as present or absent, whereas mental abilities are generally considered to be ever present in some positive amount. Absence of symptoms creates a population zero class with unknown membership and proportion. Inadvertent mixture of zero and non-zero classes, as often occurs in community samples, biases symptom estimates of means, variances, and covariances for the non-zero class, resulting in what is herein referred to as the zero-problem. Two-part modeling is proposed as a means of circumventing the zero-problem. In Part I, zero-class sample members are identified and deleted. Part II provides users a symptoms research paradigm based on a multiplicative measurement model. Data are logarithmically transformed, and the log-normal distribution assumed. The hypothesis that symptom statements are unidimensional is tested by confirmatory factor analysis (CFA). If accepted, statements are combined into a weighted pathology score. Pathology scores can be correlated, corrected for attenuation, and used as input to multivariate statistical applications. Computer routines are provided as a user service.


Introduction
Symptoms as self-reported feelings of abnormality play a critical role in psychopathological diagnosis and assessment. Symptoms considered as overt manifestations of an underlying pathological state [1] differ from traits in that symptoms can be meaningfully reported as present or absent, whereas traits are generally considered to be ever present in some positive amount [2]. This distinction is central to an understanding of the inherent difference between human health and human ability measurement.

The Zero-problem and its ramifications
Zero as a real representation of nothing has historically fascinated mankind [3]. Measurement-wise, zero can be used to represent the total absence of a construct amount, as in absolute zero temperature, or to represent a categorical distinction of kind such as presence or absence of a disease. The problem is that zero cannot simultaneously be ascribed both a categorical and a dimensional representation within a generative experiment. Readers are encouraged to refer to [2] for a discussion of experiment as a generative process. If zero is used to designate a class, the distinction between zero and non-zero is qualitative. On the other hand, if zero is intended to distinguish absence of a construct amount from its presence, the distinction is quantitative. Of course, the digit 0 as a scale origin can always be arbitrarily assigned to an observable event, such as the freezing point of water.
Community sampling, defined as the selection of experimental subjects from a community setting [4], is particularly susceptible to the zero-problem.

Part one
The most expeditious way to identify asymptomatic individuals is to use the collection of symptom descriptive statements that generate the community sample data. In the absence of a zero benchmark, symptom-free sample members can reasonably be expected to choose the scale benchmark indicative of the least symptom amount, generally designated by the integer 1. By this logic, all sampled individuals with a p x 1 response profile of 1s can reasonably be considered asymptomatic, where p is the number of items comprising the scale. To compensate for the possibility that the criterion may be too stringent, the definition is expanded to include all sampled individuals with a response profile containing at least p - 1 unitary responses. All such individuals are defined as asymptomatic and consequently deleted from further analytic consideration.
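The Part I screening rule can be sketched in a few lines. The following is a minimal Python illustration rather than the article's SAS® IML routine; the function name `split_asymptomatic` and the sample profiles are hypothetical, and the lowest benchmark is assumed to be coded as 1.

```python
def split_asymptomatic(profiles, floor=1):
    """Split response profiles into (symptomatic, asymptomatic) lists.

    A profile of p integer responses is treated as asymptomatic when at
    least p - 1 of its responses equal the lowest benchmark (`floor`).
    """
    symptomatic, asymptomatic = [], []
    for profile in profiles:
        p = len(profile)
        n_floor = sum(1 for response in profile if response == floor)
        if n_floor >= p - 1:
            asymptomatic.append(profile)
        else:
            symptomatic.append(profile)
    return symptomatic, asymptomatic


# Hypothetical 5-item responses on a 1-5 scale with no zero benchmark.
sample = [
    [1, 1, 1, 1, 1],  # all 1s: asymptomatic
    [1, 1, 2, 1, 1],  # p - 1 ones: asymptomatic under the relaxed rule
    [1, 2, 2, 1, 1],  # only three 1s: symptomatic
    [3, 4, 2, 5, 3],  # symptomatic
]
symptomatic, asymptomatic = split_asymptomatic(sample)
print(len(symptomatic), "symptomatic;", len(asymptomatic), "asymptomatic")
```

Only the symptomatic list would be carried forward to Part II.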

Part two: Symptom measurement model
Symptoms as feelings of unpleasantness can be decomposed into true ($T_S$) and error ($E_S$) latent components. The error component is considered to represent random measurement error that is independent of the true symptom component. True and error symptom components are combined multiplicatively and expressed as $Y \equiv T_S E_S$, where Y is an observed continuous symptom response variable, and ≡ is interpreted as "defined as". The implication is that Y is a derived variable caused by the conjoint effects of $T_S$ and $E_S$. The symptom measurement model differs from the classical test score model in that true and error symptom components are combined multiplicatively rather than additively as in the neo-classic model [2, Tenet 1]. Tenet 1 refers to the first of 14 numbered tenets contained in Citation 2.
To be useful, symptoms must be symptomatic of an underlying pathology defined as the anatomic or dysfunctional manifestations of a disease or disorder denoted by a latent random variable P.
The pathology measure P is considered to be decomposable into a
Community samples are susceptible to the zero-problem because they are likely to contain an admixture of asymptomatic and symptomatic individuals. Class membership as well as class proportions are usually unknown and must be estimated from sample data. Symptom presence is generally scaled on intensity, severity, frequency, or duration, with five or more integer benchmarks arranged from left to right in ascending order. Seldom is a scale benchmark provided to account for symptom absence. In the absence of an explicit zero benchmark, asymptomatic individuals are more likely to choose the leftmost benchmark indicating the least amount, resulting in positively skewed response distributions.
The zero-problem has ramifications for the computation of sample mean, variance, and covariance. The inclusion of asymptomatic individuals in a sample will bias the estimates for symptomatic individuals regardless of whether respondents are allowed to self-report symptom absence. Reported symptom absence on paired scales contributes to their covariance, leading to the dubious assertion that joint symptom absence can be construed as association. The susceptibility to bias argues against use of conventional covariance structure analysis [5] in symptoms research on participants drawn from a community sample.

Purpose and organization
The intent of this article is to put forth a modeling procedure that circumvents the zero-problem in symptoms research. To this end, a proposed procedure must: (a) screen out asymptomatic individuals from a community sample; (b) quantize a continuous symptom measure to integer scale benchmarks; (c) correct symptom covariance for scale coarseness; (d) estimate latent pathology true and error model parameters; (e) allow for skewed response distributions; and (f) compute and store pathology scores for further use. Finally, a computational procedure must be readily available in the form of an open source computer utility.

ACTA PSYCHOPATHOLOGICA ISSN 2469-6676
The true symptom component is linked to the pathology measure as $T_S \equiv e^{\mu + \lambda f}$, where $e$ is the base of the natural logarithms and $\mu + \lambda f$ is a pathology true score measure with $E(P) = \mu$, $\lambda = \sigma_P$, and $f$ the standardization of the P variable. The purpose is to establish a causal linkage between an underlying pathology and psychological feelings of unpleasantness. The linkage between error in the pathology measurement and error in symptom measurement is similarly depicted as $E_S \equiv e^{\varepsilon}$. Upon substitution for $T_S$ and $E_S$, the continuous symptom response variable Y can be expressed as $Y = e^{\mu + \lambda f} e^{\varepsilon} = e^{\mu + \lambda f + \varepsilon}$, where $\mu + \lambda f$ and $\varepsilon$ are as previously defined. In the absence of measurement error in the pathology variable P, $\varepsilon = 0$ and the symptom response variable Y is unaffected, as $e^{0} = 1$. When $\varepsilon < 0$, the symptom response variable Y is attenuated, and conversely inflated when $\varepsilon > 0$, as should be expected. As the

Logarithmic transformation
The logarithm has been referred to as the most useful arithmetic concept in science [9]. Its use in biology has been catalogued by Koch [10,11]. In psychology, logarithms form the basis of Fitts's, Hick's, and the Weber-Fechner Laws [12,13]. The log transformation applied to the multiplicative symptom response model is represented as $\ln(Y) = \ln(T_S) + \ln(E_S)$.
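A small numerical check may help make the transformation concrete. The Python sketch below assumes the exponential linkage used in this section, $T_S = e^{\mu + \lambda f}$ and $E_S = e^{\varepsilon}$; the function name and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def symptom_response(mu, lam, f, eps):
    """Multiplicative model Y = T_S * E_S.

    Assumes T_S = exp(mu + lam * f) and E_S = exp(eps).
    """
    true_part = math.exp(mu + lam * f)
    error_part = math.exp(eps)
    return true_part * error_part


# Hypothetical parameter values.
mu, lam, f, eps = 0.8, 0.5, 1.2, -0.1
y = symptom_response(mu, lam, f, eps)

# Taking logs turns the product into the additive form mu + lam*f + eps.
log_y = math.log(y)
additive_form = mu + lam * f + eps
print(round(log_y, 6), round(additive_form, 6))

# With eps = 0 the error factor is exp(0) = 1, so Y equals the true component.
y_no_error = symptom_response(mu, lam, f, 0.0)
```

A negative `eps` shrinks Y relative to `y_no_error`, and a positive `eps` inflates it, which mirrors the attenuation and inflation described for the error component.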

Scalar quantization
The continuous symptom response variable ln(Y) draws values from the real number line. However, symptom responses as conventionally scaled are restricted to integer values generally ranging from 1 to 5 or 7. This situation bears a remarkable similarity to the "analog" to "digital" conversion in signal theory.
The process whereby an interval of analog signals is assigned a single scalar value is known in audio coding as scalar quantization, which emphasizes the role of q(ln(y)) as add-on noise.

Scale coarseness
Coarseness in symptom scaling is a function of the number of scale benchmarks used to categorize the ln(Y) variable. The more benchmarks provided, the smaller the quantization error.
Conversely, fewer benchmarks coarsen the scale and increase quantization noise. The effect of quantization noise is to introduce nonlinear and systematic error that serves to attenuate estimates of population covariance and correlation [15]. Because quantization noise is systematic, it should not be confused with measurement error, which by definition is unsystematic variation.
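The attenuating effect of coarseness can be illustrated by simulation. The Python sketch below is not the article's coarseness correction; it only quantizes two correlated continuous variables onto 3, 5, and 7 benchmarks and compares the resulting correlations with the continuous one. Equal-width bins on [-3, 3], the sample size, and the 0.8/0.6 weights are all assumptions made for the example.

```python
import math
import random


def quantize(value, lo, hi, levels):
    """Assign a continuous value to an integer benchmark 1..levels.

    Equal-width bins on [lo, hi] are an assumption of this sketch;
    values outside the range are clipped to the end benchmarks.
    """
    width = (hi - lo) / levels
    index = int((value - lo) // width) + 1
    return max(1, min(levels, index))


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)
# Two hypothetical continuous responses sharing a common factor.
common = [random.gauss(0.0, 1.0) for _ in range(5000)]
a = [0.8 * f + 0.6 * random.gauss(0.0, 1.0) for f in common]
b = [0.8 * f + 0.6 * random.gauss(0.0, 1.0) for f in common]

r_continuous = pearson(a, b)
r_by_levels = {}
for levels in (3, 5, 7):
    qa = [quantize(v, -3.0, 3.0, levels) for v in a]
    qb = [quantize(v, -3.0, 3.0, levels) for v in b]
    r_by_levels[levels] = pearson(qa, qb)

print(round(r_continuous, 3), {k: round(v, 3) for k, v in r_by_levels.items()})
```

In runs of this kind the correlation computed from the coarsest scale falls furthest below the continuous value, which is the attenuation the correction for scale coarseness is meant to undo.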

The log-normal distribution
As previously argued, the symptom measurement model can be expressed as $\ln Y = \mu + \lambda f + \varepsilon$, which follows neo-classical test theory [2] in form with the exception that the observed variable Y is logarithmically transformed. As $f$ and $\varepsilon$ in the pathology measurement model are defined as independent normally distributed random variables, $\ln Y$ as a weighted sum of normally distributed variables is itself normally distributed. Thus, Y can be said to be log-normally distributed with a two-parameter probability density function $g(y; \mu, \sigma) = \frac{1}{y \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln y - \mu)^2}{2\sigma^2}\right)$, $y > 0$, where μ is the location parameter and σ the scale parameter on a logarithmic scale. For a sample of size n, the location parameter is estimated as $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} \ln(y_i)$, with $\hat{\sigma}$ estimated from the spread of the $\ln(y_i)$ about $\hat{\mu}$ [16]. Sample parameter estimates are on a logarithmic scale as they are functions of the transformed values $\ln(y_i)$, where i = 1, 2, …, n. Depending upon parameter values, the log-normal distribution can range in shape from near normal to skewed. Location and scale parameter estimates on the logarithmic scale can also be obtained directly from the mean and variance of assumed normally distributed sample data according to the standard log-normal moment relations.
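Both estimation routes can be sketched briefly in Python. The first function averages the logged data and uses a divisor of n for the variance, which is the maximum-likelihood convention and an assumption here, since the source's exact estimator is not legible. The second uses the standard log-normal moment identities, mean = exp(μ + σ²/2) and variance = (exp(σ²) - 1)·mean², solved for μ and σ. Function names and data are hypothetical.

```python
import math


def lognormal_from_logs(values):
    """Estimate (mu, sigma) from the logs of strictly positive data.

    Uses a divisor of n for the variance (an assumption of this sketch).
    """
    logs = [math.log(v) for v in values]
    n = len(logs)
    mu_hat = sum(logs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in logs) / n
    return mu_hat, math.sqrt(sigma2_hat)


def lognormal_from_moments(mean, variance):
    """Recover (mu, sigma) on the log scale from the raw-scale mean and variance."""
    sigma2 = math.log(1.0 + variance / mean ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


# Route 1: hypothetical data whose logs are exactly 0, 1, 2.
mu_a, sigma_a = lognormal_from_logs([1.0, math.e, math.e ** 2])

# Route 2: start from known log-scale parameters, form the raw-scale
# moments, and check that the identities recover the parameters.
mu_true, sigma_true = 0.5, 0.4
raw_mean = math.exp(mu_true + 0.5 * sigma_true ** 2)
raw_var = (math.exp(sigma_true ** 2) - 1.0) * raw_mean ** 2
mu_b, sigma_b = lognormal_from_moments(raw_mean, raw_var)
print(round(mu_a, 4), round(sigma_a, 4), round(mu_b, 4), round(sigma_b, 4))
```

The two routes agree only when the data really are log-normal; on integer scale data they will generally differ somewhat.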

Visual graphics as a decision tool
The most observable distinction between the neo-classical true score and the symptom measurement model is the proposed form of the data distribution. The neo-classical model assumes that the non-transformed data follow a normal distribution, whereas the symptom model assumes a log-normal distribution.
The extent of comparative fit can be visually examined by fitting a normal and a log-normal distribution to the histogram of the non-transformed Part II data and displaying the result as a graphical plot. A comparative fit index (CFI) can be computed for each distributional form from $q_i(\text{est})$, the estimated value for the 1%, 5%, 10%, 25%, 50%, 75%, 90%, 95%, and 99% quantiles of the normal or log-normal distribution, and $q_i(\text{actual})$, the integer scale score corresponding to each quantile. The distribution with the lower CFI is judged to be the better fit. If the log-normal distribution is not the better fit, the hypothesis of a multiplicative measurement model is problematic. To log transform the original symptom scale data simply as a corrective for skewed data is subject to the scenario of misuses and misinterpretations enumerated by Feng, Wang, Lu, and Tu [18]. Because the graphics procedure must be dynamic and able to operate in near real time as well as serving the modeler's immediate needs and interests, it has been described as dynamic-interactive by some authors [19].
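The quantile comparison can be sketched in Python as follows. The article's own CFI formula is not reproduced here; the `quantile_discrepancy` function below is a stand-in that uses the mean absolute difference between estimated and actual quantiles, which is an assumption of this sketch. The fitted parameter values and the "actual" quantile scores are likewise hypothetical.

```python
import math
from statistics import NormalDist

# The nine quantile levels named in the text.
PROBS = (0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99)


def normal_quantiles(mean, sd, probs=PROBS):
    """Quantiles of a normal distribution fitted on the raw scale."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf(p) for p in probs]


def lognormal_quantiles(mu, sigma, probs=PROBS):
    """Quantiles of a log-normal distribution with log-scale mu and sigma."""
    z = NormalDist()
    return [math.exp(mu + sigma * z.inv_cdf(p)) for p in probs]


def quantile_discrepancy(estimated, actual):
    """ASSUMED index: mean absolute difference between quantile sets.

    This is a placeholder for illustration, not the article's CFI.
    """
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


# Hypothetical integer scale scores at the nine quantiles of Part II data.
actual = [1, 1, 1, 2, 2, 3, 4, 5, 5]

# Hypothetical fitted parameters for each distributional form.
index_normal = quantile_discrepancy(normal_quantiles(2.6, 1.3), actual)
index_lognormal = quantile_discrepancy(lognormal_quantiles(0.8, 0.55), actual)
print(round(index_normal, 3), round(index_lognormal, 3))
```

Under this stand-in index, as with the article's CFI, the distributional form with the smaller value would be read as the better fit.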

Symptom scalability
A set of p symptom statements descriptive of unpleasant feelings associated with a target pathology is said to be scalable if there exists a standardized pathology measure $f$ that is common to all p symptom statements. If so, then the log transformation of each symptom statement can be expressed as $\ln(Y_i) = \mu_i + \lambda_i f + \varepsilon_i$, i = 1, 2, …, p.

A comparative analysis
The distinction between human ability and human health measurement is most apparent when examining the comparative meaning of classical true score and true symptom. As both neo-classical true score and true symptom can be considered as mappings of an experimental probability space to real numbers [2], their difference must reside in the nature of the underlying generative experiment. For a more comprehensive discussion, the reader is referred to [2, Tenet 1]. Ability experiments as organized activity are designed to produce quantitative outcomes that differ in amount across a subject population. All subjects are assumed to possess this ability in some positive amount. Abilities as latent traits are relatively enduring over time. True implies that the latent trait scores are free of unsystematic measurement error.
Symptoms, in contrast, are self-reports of the presence of unpleasant states-of-being that impair human psychological and physiological functioning [20]. To be useful in a diagnostic and treatment capacity, symptoms must be symptomatic of some underlying biomedical causal agent, generally the presence of biological pathogens, inherent weakness, organ malfunctioning, or environmental stressors. When applied to mental functioning, epistemology in the United States generally takes the form of a biomedical model that posits that mental disorders are diseases of the brain amenable to pharmacological treatment [21]. This is not the case for traits, which are posited to have a genetic causal framework making them relatively immutable to treatment [22].

Symptom pathology score
The initial step in symptoms research is to identify a target pathology of research interest. Measurement requires that p benchmarked statements considered as descriptive of manifest feelings emanating from the target pathology be developed and submitted to two-part modeling. A log-normal distribution is fitted to the histogram of Part II untransformed data for each statement and compared with the fit of a normal distribution.

If the log-normal is judged to be the better fit, Part II data are log transformed and a p x p correlation matrix computed and corrected for scale coarseness. A single-factor CFA is performed on the corrected correlation matrix. If the hypothesis of unidimensionality is sustained, the p statements can be said to be scalable and a single pathology score estimate $\hat{f}_j$ is assigned to the j-th symptomatic sample subject. The reliability of the standardized latent pathology score $\hat{f}_j$ is defined as the squared canonical correlation between a maximally-weighted sum of p observed log transformed symptom scores and the standardized common pathology score variable $f$. This squared correlation is termed $R_{Max}$, as it is the maximum squared correlation that can be obtained by choice of observed symptom weights.
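For readers who want to see the quantities involved, the Python sketch below computes Bartlett-type weights and the maximum squared correlation for a one-factor model on standardized variables. It relies on standard one-factor results, with uniqueness taken as 1 minus the squared loading, and is offered as an illustration rather than as the formulas used by the Appendix A routine. The loadings and the subject profile are hypothetical.

```python
def bartlett_weights(loadings):
    """Bartlett weights and maximum squared correlation for one factor.

    Assumes standardized variables, so each uniqueness is 1 - loading**2.
    """
    ratios = [lam / (1.0 - lam ** 2) for lam in loadings]
    # w is the quadratic form: sum of loading**2 / uniqueness.
    w = sum(lam * ratio for lam, ratio in zip(loadings, ratios))
    weights = [ratio / w for ratio in ratios]
    r_max = w / (1.0 + w)
    return weights, r_max


def pathology_score(weights, z_profile):
    """Weighted sum of one subject's standardized log symptom scores."""
    return sum(b * z for b, z in zip(weights, z_profile))


# Hypothetical standardized loadings for five symptom statements.
loadings = [0.84, 0.70, 0.60, 0.55, 0.45]
weights, r_max = bartlett_weights(loadings)

# Hypothetical standardized profile for one symptomatic subject.
z_profile = [1.2, 0.4, -0.3, 0.8, 0.1]
score = pathology_score(weights, z_profile)
print([round(b, 3) for b in weights], round(r_max, 3), round(score, 3))
```

In this formulation the statement with the largest loading receives the largest weight, and `r_max` always exceeds the largest squared loading, so the remaining statements add to the reliability of the combined score.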

A computer routine for two-part modeling
Due to the computational requirements of two-part modeling and the incorporation of computer graphics, a customized integrated execution routine is presented in Appendix A. The routine is written in the SAS® IML language. The routine accepts as input a SAS® data file containing only the numeric responses for p symptom statements scaled on a 5-point scale with no zero benchmark for a sample of N individuals. No other character or ID variables are permitted. Missing values are not allowed and if present must be imputed with an integer scale value prior to running. The routine accepts data files containing 4 to 10 symptom statements as variables. Users are asked to input the name of the SAS® library housing the data set; the assigned name for the SAS® data set; the number of variables contained in the specified data set; formatting notation related to number of variables; the choice as to whether or not to create data histograms; the choice as to normal or log-normal distributional form; the choice as to whether to save computed pathology scores; and the file name where pathology score estimates are to be saved. User input is checked for accuracy and the program terminated if an entry inconsistency is encountered.
Given no input inconsistencies, the routine begins by sorting sample observations into those who meet the asymptomatic criterion (having a p-item profile containing at least p - 1 responses of 1) and those who do not, considered as symptomatic.
The asymptomatic subsample is deleted from further analytic consideration. Given the recommended starting options to create a histogram and a normal distribution, the routine runs SAS® Proc Univariate on each of the p symptomatic subsample variables and plots both a normal and a log-normal distributional fit on a single graph for visual comparison. Additionally, goodness-of-fit data are presented for nine quantiles. A utility for computation of a fit index based on quantiles is presented in Appendix B. Users are responsible for provision of quantile data to the fit index routine.
The Appendix A routine must be rerun with the "no histogram" option and log-normal distributional specification for each pathology to be scaled. As a result, the original data are log transformed. A p x p correlation matrix is computed, corrected for scale coarseness, and a CFA performed using SAS® Proc Calis.
The routine outputs the distributional type; the symptomatic sample size; the asymptomatic sample size; CFA fit statistics; CFA parameters; $R_{Max}$; pathology score mean; and pathology score uncorrected and corrected variance.

Exemplary Application of Two-part Modeling

Sampling procedure
The sample selected for exemplary two-part analysis is drawn from the National Survey of the Vietnam Generation (NSVG) Public Use Analysis File that contained the analysis variables from the National Vietnam Veterans Readjustment Study (NVVRS) [24]. The data source was selected because it represents the most comprehensive and documented sample of military veterans' health outcomes ever assembled. In the NVVRS, study cohorts were selected via probability sampling by a two-stage national household design. An initial sample of 1187 male veterans who had served in the Vietnam theater was drawn from the NSVG Public Use Analysis File. Vietnam theater group veterans were targeted because they had the most direct combat experience.
Males were targeted because females during the Vietnam War were prohibited from combat duty. Ten of the 1187 veterans each had more than five missing data values and were deleted, producing a final analysis sample of 1177 male Vietnam theater veterans. The final sample is considered a community sample, with community being defined as those male veterans who had served in the Vietnam theater of operations. As with community sampling in general, the proportion of the final sample suffering from PTSD as a dysfunctional pathology was unknown.
This article is available from: www.psychopathology.imedpub.com

Sample data
PTSD symptom statements were drawn from the 35 items of the Mississippi Scale for Combat-Related Post-Traumatic Stress Disorder (M-PTSD) [25]. The M-PTSD items were scored on a 5-point Likert-type scale, adjusted for analytic purposes such that higher benchmark values always corresponded to greater dysfunctionality. No allowance was made for symptom absence. Item symptom statements were uniquely assigned to four pathologies identified from previous research as: Re-experiencing and Situational Avoidance; Withdrawal and Numbing; Arousal and Lack of Behavioral Control; Self-Persecution/Survivor Guilt [Schlenger WE (2014) PTSD symptoms research: A third generation approach]. Each of the pathologies and their associated symptom statements was hypothesized to constitute a unidimensional symptom measurement model.
Prior to analysis, 23 missing values were imputed using the "hot deck" procedure [26], which has the advantage of imputing scale scores as whole numbers.

Scale purification
Each of the four hypothesized pathology scales was subject to tetrad-based purification using a six-step process suggested by Drewes [27]. As a result of purification, six items were deleted from the first pathology measurement model, five from the second, three from the third, and one from the fourth. The first pathology measurement model was renamed Re-experiencing and contained five subsidiary symptom statements. The second was renamed Withdrawal and contained six subsidiary symptom statements. The third was renamed Arousal and contained five subsidiary symptom statements. The fourth was renamed Self-Persecution and contained four subsidiary symptom statements.
The four purified pathology models and their twenty component subsidiary statements are shown in Table 1.

Part I results
The component statements assigned to Re-experiencing, Withdrawal, and Arousal, respectively, can be combined into a weighted sum as each statement measures the same underlying pathology.
Component statements assigned to Self-persecution cannot be combined, as the CFA results do not support the hypothesis of a common pathology underlying each of the four component symptom statements.
The results designated as factor loadings in Statistical Outputs A-D represent the relative importance of each observed symptom statement. The range of manifest symptom reliabilities, (0.712-0.203) for Re-experiencing, (0.427-0.126) for Withdrawal, and (0.580-0.081) for Arousal, attests to the multi-faceted complexity of PTSD pathologies [30]. No single symptom statement has sufficient reliability to serve as a sole proxy for the common pathology.
Yet each has the potential to make a unique contribution. A promising approach is to use a weighted combination of all constituent manifest symptoms to predict a pathology score. The square of the multiple-regression coefficient is $R_{Max}$, as previously defined. Estimated $R_{Max}$ values are given in Statistical Outputs A-D as 0.825 for Re-experiencing, 0.726 for Withdrawal, and 0.720 for Arousal.

The difference between $R_{Max}$ and the square of the largest component factor loading represents the contribution of the remaining symptom statements to latent pathology prediction. For the Re-experiencing component, the remaining four statements account for (0.825-0.712) x 100% = 11.3% of the pathology score variance; for Withdrawal, the remaining five statements account for (0.726-0.427) x 100% = 29.9% of Withdrawal variance; and for Arousal, the remaining four statements account for an additional (0.720-0.580) x 100% = 14% of Arousal variance. The larger residual contribution for Withdrawal is probably due to an additional symptom statement. The evidence-based conclusion is that the remaining statements in each component measurement model make a significant contribution to pathology score measurement and should be retained in the model. No conclusions can be drawn for Self-persecution, as the measurement model failed the CFA test for unidimensionality.
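The arithmetic behind these percentages can be reproduced directly from the values reported in the text. The short Python sketch below uses only those reported figures; the dictionary and variable names are illustrative.

```python
# Values reported in the text for the three scalable pathologies.
r_max = {"Re-experiencing": 0.825, "Withdrawal": 0.726, "Arousal": 0.720}
top_reliability = {"Re-experiencing": 0.712, "Withdrawal": 0.427, "Arousal": 0.580}

# Percentage of pathology score variance attributable to the statements
# other than the single most reliable one.
remaining_pct = {
    name: round(100.0 * (r_max[name] - top_reliability[name]), 1)
    for name in r_max
}
print(remaining_pct)
```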
A standardized beta coefficient $\hat{b}_i^*$ is estimated for the i-th manifest symptom statement, i = 1, 2, …, p. A latent pathology score $\hat{f}_j$ is then estimated for the j-th subject as a weighted sum of that subject's standardized log transformed symptom scores $Z_{ij}$. Note that the weighted score interpretation is equivalent to the Bartlett factor score as previously defined. Depending upon user discretion, the Appendix A routine can compute individual subject latent pathology scores and save them as a SAS® file in a designated library.

Statistical analyses with latent pathology scores
For analysis purposes, latent pathology scores can be treated as if they were empirical variables. Means, variances, covariances, and correlations can be computed using conventional formulae. This allows researchers to establish inter-correlations among a set of differential pathologies; to use pathology scores as predictors of external health-related criteria variables or conversely external health-related variables as predictors of single or multiple pathology scores; or to perform cluster or factor analyses to determine the dimensionality of a correlation matrix of selected pathologies.
There is, however, an important caveat to bear in mind. Latent pathology scores, more generally known as Bartlett factor scores, are weighted summations of standardized log transformed observed scale scores. Accordingly, the variance of pathology scores contains a contribution due to residual error variance. This contribution serves to inflate the computed variance estimate, resulting in an upward bias. Fortunately, the population variance of standardized Bartlett factor scores can be expressed in terms of the model parameters, so the inflation can be corrected. As previously discussed, $\mathrm{Corr}(f_1, f_2)$ is the disattenuated factor score correlation. By incorporating the definition of correlation, the pair-wise factor score correction for attenuation can be rewritten as $\mathrm{Corr}(f_1, f_2) = \mathrm{Corr}(\hat{f}_1, \hat{f}_2) / \sqrt{R_{Max,1} R_{Max,2}}$, which is the factor score extension of Spearman's correction for attenuation [31].
Sample size is at issue as the number of symptomatic subjects varies by component pathology. For the PTSD example, the symptomatic sample size was 846 for Re-experiencing, 1133 for Withdrawal, and 980 for Arousal. The Appendix A routine designates asymptomatic subjects as missing values in the computation and storage of Bartlett factor scores. This has the advantage of maintaining a constant sample size across component factors (N = 1177 for the PTSD example).
Remaining at issue is how to handle missing values. The simplest approach is to compute pair-wise correlations using only those observations with non-missing factor scores on both pathology factors. While simple, the pair-wise approach has the disadvantage that Bartlett score inter-correlations may be based on different subjects depending upon the selected factor pair. Correlations based on different sample sizes and subject composition pose serious statistical problems for analyses requiring a k x k (k>2) dimensional correlation matrix. Consequently, a recommended solution is to keep all observations that have no missing values on the k Bartlett scores and to drop all others. By so doing, all k(k-1)/2 pair-wise correlations are based on the same sample and can be estimated by conventional means. More sophisticated missing value imputation is unsuitable either due to non-random missing value assignment or lack of external covariates to permit missing value prediction.
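The recommended complete-case approach can be sketched in Python as follows. This is an illustration only and not the SAS® merge step used in the article; missing Bartlett scores are represented by `None`, and the score values and function names are hypothetical.

```python
import math


def complete_cases(rows):
    """Keep only subjects with a non-missing score on every pathology."""
    return [row for row in rows if all(value is not None for value in row)]


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(rows):
    """k x k correlation matrix computed on the same set of subjects."""
    columns = list(zip(*rows))
    k = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]


# Hypothetical Bartlett scores on k = 3 pathologies; None marks a subject
# who was asymptomatic (and therefore missing) on that pathology.
scores = [
    (0.4, 0.1, -0.3),
    (None, 0.8, 0.5),
    (1.2, 0.9, 1.0),
    (-0.7, None, -0.2),
    (-0.5, -0.6, -0.9),
    (0.3, 0.2, None),
    (-1.1, -0.4, -0.6),
]
kept = complete_cases(scores)
matrix = correlation_matrix(kept)
print(len(kept), [[round(r, 3) for r in row] for row in matrix])
```

Because every entry of `matrix` is computed from the same retained subjects, the k(k-1)/2 correlations are mutually consistent, at the cost of discarding anyone missing on at least one pathology.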
Implementation of the recommended procedure requires that the k Bartlett score files created by sequential running of the Appendix A utility for each component factor be merged into a single file. The k factors in the merged file must then be inter-correlated with the requirement that all retained sample observations contain no missing values. In SAS® Version 9. File names are those used in analysis of the PTSD data. File naming conventions are at the user's discretion but must be the same as those used in each of the separate analyses. Variables must be renamed sequentially in the merged file, as by default the Bartlett score variable in each merged file is named v1.
Running the above program produces attenuated correlations. Correlations are corrected for attenuation by dividing each off-diagonal attenuated correlation by the square root of the product of the $R_{Max}$ for each contributing factor. In the following matrix, uncorrected correlations for the three PTSD pathologies are shown above the diagonal and corrected correlations below the diagonal.
Each correlation is based on a sample size of 766, which is the number of observations with no missing values on all three pathology scores. For the PTSD example, 766 out of 1177 subjects qualified as symptomatic on each of the three PTSD pathology factors. Withdrawal and Arousal exhibit the highest and Withdrawal and Re-experiencing the lowest pair-wise correlations.
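The correction described above amounts to one division per factor pair. The Python sketch below uses the $R_{Max}$ values reported in the text for the three scalable pathologies; the uncorrected correlation of 0.50 is a made-up placeholder, not a result from the article, and the function name is illustrative.

```python
import math


def disattenuate(r_observed, r_max_a, r_max_b):
    """Divide an attenuated correlation by sqrt(R_Max_a * R_Max_b)."""
    return r_observed / math.sqrt(r_max_a * r_max_b)


# R_Max values reported in the text.
r_max = {"Re-experiencing": 0.825, "Withdrawal": 0.726, "Arousal": 0.720}

# Hypothetical uncorrected correlation between two pathology scores.
r_uncorrected = 0.50
r_corrected = disattenuate(
    r_uncorrected, r_max["Re-experiencing"], r_max["Withdrawal"]
)
print(round(r_corrected, 3))
```

Since each $R_{Max}$ is below 1, the corrected value is always larger in magnitude than the uncorrected one; a corrected value above 1 would signal a problem with the estimates rather than a stronger association.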

Symptoms Research -A Sequential Approach
Symptoms as latent variables. The premise of this paper is that neo-true score theory [2] can be meaningfully applied to psychopathological symptoms measurement. At the core is the axiom that an observed symptom score can be decomposed into a true-score and an error-score component. True and error symptom score components are each non-observable and assigned the status of latent variables. Error symptom latent scores are independent of true symptom scores, with the implication that true symptom scores are measured error free.
The definitions of symptom true and error scores differ from those used in neo-classic test score theory. For mental ability test scores X, true and error score components are combined additively. For symptom scores Y, the components are combined multiplicatively, with $Y = \exp(P) = \exp(\mu + \lambda f + \varepsilon) = \exp(\mu + \lambda f)\exp(\varepsilon)$ under the assumption that the pathology P score components are additively combined. Consequently, ln(Y) = P under the multiplicative combination of true and error symptom components so defined.
If the pathology variable P is normally distributed, then by definitional equivalence, so must ln(Y) be. The distributional form wherein the log of a variable is normally distributed is known as a log-normal distribution. The normal distribution has the property that if two independent variables are each normally distributed, then their sum is also normally distributed. The log-normal distribution has the corresponding property under multiplication: if true and error components are independent and each log-normally distributed, then their product is also log-normally distributed. This closure property, under addition for the normal and under multiplication for the log-normal, is essential to the modeling that follows.
This leaves symptoms researchers with two options as to measurement models: additive or multiplicative. There is recognition in the literature that symptoms may be multiplicative [33]. Fortunately, each type has a recognizable visual signature.
For the normal bell curve, observations cluster symmetrically around the mean. The odds of a symptom score being more than one standard deviation below the mean are approximately 1 in 6.3 and equal the odds of a symptom score being more than one standard deviation above the mean. The odds increase exponentially as the distance below or above the mean increases [34]. In contrast, for the log-normal distribution, the mass of the probability density function is disproportionally grouped at the lower or higher end of the scale, resulting in a skewed distribution. Skewness is visually illustrated in Figure 1b, the second symptom statement for the Re-experiencing pathology. For a normal distribution fit to that data, the probability of a scale value less than 2 is 0.510, as contrasted with 0.611 for the log-normal distribution fit to the same data. The increase in probability is due to clumping of observations at lower scale values.
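The "1 in 6.3" figure for the normal curve can be checked with the Python standard library. The sketch below uses the standard normal distribution, which is sufficient because the statement is expressed in standard deviation units; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Probability of falling more than one standard deviation below the mean,
# and, by symmetry, more than one standard deviation above it.
p_below = standard_normal.cdf(-1.0)
p_above = 1.0 - standard_normal.cdf(1.0)

one_in = 1.0 / p_below
print(round(p_below, 4), round(one_in, 1))
```

Replacing -1.0 with -2.0 or -3.0 shows how quickly the tail probability shrinks as the distance from the mean grows.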
The hypothesis that symptom true and error components are best considered to be multiplicatively combined is tested by the multivariate CFI presented in Appendix C. A lower CFI for the log-normal distribution is confirmatory evidence for the multiplicative assertion. A lower CFI for the normal distribution suggests that Part II multivariate symptom data fail to support a multiplicative measurement model.

Modeling in symptoms research.For symptoms researchers, a legitimate question is: Why model?Conventional practice tends to treat individual symptom statements as bona fide symptom entities.What is often overlooked is that symptoms as reported unpleasant feelings occur within the broader context of individual subjective experience [33].Symptoms are communicated by self-reports of those experiencing them.As such, they are subject to the vagaries of social communication [35].Meaning assigned to physical sensations may well vary by demographic as well as individual factors.Variation introduced by systematic demographic characteristics can be controlled by population sampling.Individual factors, however, encompass non-systematic variability due to temporal situational states of mind [1].Non-systematic effects are considered to occur randomly and are collectively referred to as measurement error.
If individual symptom statements are to have clinical utility, they must measure the systematic effects of causal pathologies. These systematic effects are referred to as true scores in classic measurement theory. True score contribution varies across symptom statements as reflected by differential statement reliability. It is quite possible for individual symptom statements to measure mainly random error, contradicting the prevalent supposition that statements are error free.
True scores as a measure of systematic effects vary in degree of uniqueness. At one extreme, each symptom statement has a unique true score, indicating that each statement measures a different pathological entity. In this case, there are as many pathologies as there are symptom statements. The counter condition is that all symptom statements in the modeling domain measure a common latent pathology, a condition herein referred to as unidimensionality. Observable symptom statements are herein defined as containing both true and error measurement effects multiplicatively combined.
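The practical consequence of a multiplicative combination is that a logarithmic transformation turns the product of true and error components into a sum, which is the form assumed by classic additive measurement theory. The sketch below illustrates this with hypothetical component values; the function name and numbers are assumptions made for illustration.

```python
import math

def observed_score(true_component, error_component):
    """Multiplicative measurement model: observed = true * error."""
    return true_component * error_component

true_component = 2.5    # hypothetical true symptom component (> 0)
error_component = 1.2   # hypothetical multiplicative error (> 0)

y = observed_score(true_component, error_component)

# On the log scale the product becomes a sum of a true part and an error part.
log_y = math.log(y)
log_sum = math.log(true_component) + math.log(error_component)
```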
The classic means of dealing with error-prone measurement is summation over a large number of observed score replications [36]. The supporting rationale is that measurement error, being random, can be expected to be self-canceling over a large number of measurement instances. Whereas the logic with respect to error has stood the test of time, the meaning of a sum of error-free true symptom components is not so clear-cut. If each true variable is unique to a single symptom statement, the sum over p statements is a mishmash of unrelated pathological effects defying a meaningful interpretation. It is only when each symptom statement assigns an identical standardized true score to an individual subject, i.e., a defining property of unidimensionality, that meaning emerges.
The unidimensionality of a set of p symptom statements is empirically verifiable by running a single-factor confirmatory factor analysis (CFA) on the p x p correlation matrix. If the hypothesis of unidimensionality is sustained, symptoms researchers are faced with attaching meaning to the common variable. Standardized true score is a possibility but emphasizes mainly the error-free nature. Factor is a tempting choice but focuses more on dimensionality than function. Syndrome as a collection of signs and symptoms characteristic of a known pathological condition [37] appears a better option but shifts attention to the collectivity and away from the function served.
The convention endorsed in this paper is to emphasize the core measurement function: mapping the amount of quantified symptom unpleasantness due to causal pathologies to the real number line. For a more detailed discussion, please refer to [2, Tenet 1], which differs in application only to the extent that mental traits as opposed to mental states are addressed.
Unreliability of individual symptom statements attenuates the measurement of felt unpleasantness due to pathological conditions. The remedy is to use the p observed symptom statements as independent variables in a multiple regression model to predict pathology scores. The square of the multiple correlation coefficient is referred to as R Max and is the maximum correlation that can be obtained by weighting individual statements. Under the supposition that all p statements have some reliability, R Max will exceed the reliability of any symptom statement considered in isolation. This gain epitomizes the system adage that the whole is greater than the sum of its parts and constitutes the major rationale for symptoms modeling.
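The gain can be illustrated numerically. The sketch below assumes a single-factor model with standardized loadings, for which item error variance is one minus the squared loading and the squared multiple correlation of the common factor on the items is D/(1 + D), where D is the sum of squared loadings divided by error variances. The loadings are hypothetical, and item reliability is taken to be the squared loading under this assumed model.

```python
def r_max(loadings):
    """Squared multiple correlation of the common factor on the items,
    assuming a single-factor model with standardized loadings."""
    d = sum(l ** 2 / (1.0 - l ** 2) for l in loadings)
    return d / (1.0 + d)

loadings = [0.5, 0.6, 0.7, 0.4, 0.55]            # hypothetical values
item_reliabilities = [l ** 2 for l in loadings]  # lambda_i^2 under this model
scale_r_max = r_max(loadings)
```

With these values the most reliable single statement has a reliability of 0.49, whereas the weighted combination reaches roughly 0.71.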

Summary Remarks
A case for a two-part symptoms modeling procedure as a means of dealing with the zero-problem inherent in community sampling has been made. The procedure assumes the pre-existence of a pool of p statements descriptive of adverse feelings considered to be caused by a target pathology. Statements may be drawn from existing scales or constructed to reflect contemporary research results, theories, or clinical intuition. Statements are self-rated on a five-point scale, preferably with a severity, duration, or frequency metric common to all statements. Whatever the metric, the leftmost benchmark is generally indicative of minimal unpleasantness.
Central to the zero-problem is the not-so-remote possibility that a portion of a community sample may be asymptomatic, thereby constituting a zero class. As there is generally no scale provision for symptom absence, it is reasonable to suspect that zero-class members will likely gravitate to the leftmost response category, thereby creating a skewed distribution.
The measurement model is judged to be unidimensional by virtue of running a single-factor, uncorrelated error CFA.
Furthermore, suppose that a standardized Bartlett factor score is computed according to the formula

$$f_B = (\Lambda' \Theta^{-1} \Lambda)^{-1} \Lambda' \Theta^{-1} Z',$$

where Z is a 1 x p vector of standardized items, Θ is a p x p diagonal matrix of item error variances, and Λ is a p x 1 vector of standardized item loadings.
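For a single factor with diagonal Θ, the Bartlett score reduces to a weighted sum of the standardized items, with weights λ_i/θ_i, divided by the sum of λ_i²/θ_i. A minimal sketch of that scalar form follows; the function name, loadings, and item scores are hypothetical and are not output from the computer routines described in this paper.

```python
def bartlett_score(z, loadings, error_variances):
    """Single-factor Bartlett score: sum(l_i * z_i / t_i) / sum(l_i^2 / t_i)."""
    num = sum(l * zi / t for l, zi, t in zip(loadings, z, error_variances))
    den = sum(l * l / t for l, t in zip(loadings, error_variances))
    return num / den

loadings = [0.5, 0.6, 0.7]                         # hypothetical Lambda
error_variances = [1.0 - l * l for l in loadings]  # diagonal of Theta

# Error-free items for a subject whose factor value is 1: z_i = lambda_i * 1.
z = [0.5, 0.6, 0.7]
score = bartlett_score(z, loadings, error_variances)
```

Because the illustrative items contain no error, the score recovers the factor value of 1 exactly.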

Unbiased true score estimator
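Bartlett factor scores are unbiased, which is equivalent to claiming that

$$E(f_B \mid f = k) = k,$$

where $f_B \mid f = k$ is a Bartlett factor score conditional on the common factor $f$ assuming the value $k$. To prove this assertion requires that the expected value of a conditional Bartlett factor score be expressed as

$$E(f_B \mid f = k) = \frac{\sum_{i=1}^{p} \lambda_i \, E(Z_i \mid f = k) / \theta_i}{\sum_{i=1}^{p} \lambda_i^2 / \theta_i},$$

where $\lambda_i$ is the standardized loading and $\theta_i$ the error variance of the i-th item. Substituting $E(Z_i \mid f = k) = \lambda_i k$ into the above formulation gives

$$E(f_B \mid f = k) = \frac{k \sum_{i=1}^{p} \lambda_i^2 / \theta_i}{\sum_{i=1}^{p} \lambda_i^2 / \theta_i} = k.$$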
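The unbiasedness claim can also be examined by simulation. The sketch below assumes the single-factor model Z_i = λ_i f + ε_i with normally distributed errors of variance 1 − λ_i²; the loadings, the factor value, the number of draws, and the function names are all hypothetical choices made for illustration.

```python
import random

def bartlett_score(z, loadings, error_variances):
    """Single-factor Bartlett score for one subject."""
    num = sum(l * zi / t for l, zi, t in zip(loadings, z, error_variances))
    den = sum(l * l / t for l, t in zip(loadings, error_variances))
    return num / den

def mean_conditional_score(k, loadings, n_draws=20000, seed=0):
    """Average Bartlett score over simulated subjects whose factor value is k."""
    rng = random.Random(seed)
    theta = [1.0 - l * l for l in loadings]
    total = 0.0
    for _ in range(n_draws):
        # Z_i = lambda_i * k + error_i, with error variance theta_i.
        z = [l * k + rng.gauss(0.0, t ** 0.5) for l, t in zip(loadings, theta)]
        total += bartlett_score(z, loadings, theta)
    return total / n_draws

loadings = [0.5, 0.6, 0.7]   # hypothetical standardized loadings
estimate = mean_conditional_score(1.5, loadings)
```

The simulated average should lie close to the conditioning value of 1.5, differing only by sampling error.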

Standardized variance adjustment
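A procedure for adjusting the variance of a standardized Bartlett factor score by multiplying the variance estimate by the estimated R Max for the scale is offered. A proof that this variance adjustment results in an unbiased estimate remains to be developed. To do so requires beginning with the expected value formulation for the variance of a variable X as $\mathrm{Var}(X) = E(X^2) - E(X)^2$. As $E(f_B) = 0$ for standardized items, the variance of a standardized Bartlett factor variable is

$$\mathrm{Var}(f_B) = E\left[\left(\frac{\sum_{i} \lambda_i Z_i / \theta_i}{\sum_{i} \lambda_i^2 / \theta_i}\right)^{2}\right].$$

Carrying out the squaring in the numerator and taking expected values gives us terms in $E(Z_i^2)$, each of which equals $\lambda_i^2 + \theta_i = 1$ as $E(f \varepsilon_i) = 0$ by assumption. Again by assumption, $E(Z_i Z_j) = \rho_{ij}$, where $\rho_{ij}$ is the correlation between standardized items $Z_i$ and $Z_j$; under the single-factor, uncorrelated error model, $\rho_{ij} = \lambda_i \lambda_j$ for $i \neq j$. Substituting back into the variance formulation and multiplying gives

$$\mathrm{Var}(f_B) = \frac{\sum_{i} \lambda_i^2 (\lambda_i^2 + \theta_i) / \theta_i^2 + \sum_{i} \sum_{j \neq i} \lambda_i^2 \lambda_j^2 / (\theta_i \theta_j)}{\left(\sum_{i} \lambda_i^2 / \theta_i\right)^{2}}.$$

Distributing the summation and rearranging terms allows simplification as

$$\mathrm{Var}(f_B) = \frac{\left(\sum_{i} \lambda_i^2 / \theta_i\right)^{2} + \sum_{i} \lambda_i^2 / \theta_i}{\left(\sum_{i} \lambda_i^2 / \theta_i\right)^{2}}.$$

This formulation can be further simplified as

$$\mathrm{Var}(f_B) = 1 + \frac{1}{\sum_{i} \lambda_i^2 / \theta_i} = \frac{1}{R_{Max}},$$

because, for the single-factor model, $R_{Max} = \sum_{i} (\lambda_i^2 / \theta_i) \big/ \left(1 + \sum_{i} \lambda_i^2 / \theta_i\right)$. From this, we can deduce that $\mathrm{Var}(f_B) = 1$ only when R Max = 1. As R Max decreases, the variance of $f_B$ increases accordingly to reflect the contribution of error variance in the items.
Multiplication of $\mathrm{Var}(f_B)$ by R Max yields unity, which is the unbiased estimate as claimed.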
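The relationship between the variance of the Bartlett score and R Max can be verified numerically. The sketch below assumes a single-factor, uncorrelated error model, so that the implied correlation between two distinct items is the product of their loadings and R Max equals D/(1 + D), with D the sum of λ_i²/θ_i. The loadings and function name are hypothetical.

```python
def bartlett_variance_and_r_max(loadings):
    """Return (Var(f_B), R Max) implied by a single-factor model."""
    p = len(loadings)
    theta = [1.0 - l * l for l in loadings]
    weights = [l / t for l, t in zip(loadings, theta)]
    d = sum(l * l / t for l, t in zip(loadings, theta))
    total = 0.0
    for i in range(p):
        for j in range(p):
            # Implied correlations: 1 on the diagonal, lambda_i * lambda_j elsewhere.
            rho = 1.0 if i == j else loadings[i] * loadings[j]
            total += weights[i] * weights[j] * rho
    return total / d ** 2, d / (1.0 + d)

var_fb, r_max = bartlett_variance_and_r_max([0.5, 0.6, 0.7, 0.4])
adjusted_variance = var_fb * r_max
```

The unadjusted variance exceeds one, and the product with R Max returns unity up to floating-point error.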

Covariance estimation
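Suppose that Bartlett factor scores are estimated separately for two measurement models, one with p standardized items and the other with q standardized items. Each measurement model has been judged to be unidimensional by running a separate single-factor, uncorrelated error CFA. The goal is to determine the covariance between Bartlett factor variables $f_{B1}$ and $f_{B2}$ for the two measurement models. Covariance is defined as $\mathrm{Cov}(f_{B1}, f_{B2}) = E(f_{B1} f_{B2}) - E(f_{B1}) E(f_{B2})$. From the above work, it is known that $E(f_{B1}) = E(f_{B2}) = 0$ when observed scores are standardized. As before, let $Z_{i1}$ be the z-score variable for the i-th item in the first measurement model and $Z_{j2}$ the z-score variable for the j-th item in the second measurement model. The covariance between Bartlett factor variables can now be expressed as

$$\mathrm{Cov}(f_{B1}, f_{B2}) = \frac{\sum_{i=1}^{p} \sum_{j=1}^{q} (\lambda_{i1} / \theta_{i1}) (\lambda_{j2} / \theta_{j2}) \, E(Z_{i1} Z_{j2})}{\left(\sum_{i=1}^{p} \lambda_{i1}^2 / \theta_{i1}\right) \left(\sum_{j=1}^{q} \lambda_{j2}^2 / \theta_{j2}\right)},$$

where $\lambda_{i1}$ is the standardized factor loading for the i-th item in the first measurement model, $\lambda_{j2}$ is the standardized factor loading for the j-th item in the second measurement model, and $\theta_{i1}$ and $\theta_{j2}$ are the corresponding item error variances. Writing $\phi_{12}$ for the covariance between the two common factors and taking errors to be uncorrelated across models, $E(Z_{i1} Z_{j2}) = \lambda_{i1} \lambda_{j2} \phi_{12}$. This formulation can be further simplified as

$$\mathrm{Cov}(f_{B1}, f_{B2}) = \frac{\phi_{12} \left(\sum_{i=1}^{p} \lambda_{i1}^2 / \theta_{i1}\right) \left(\sum_{j=1}^{q} \lambda_{j2}^2 / \theta_{j2}\right)}{\left(\sum_{i=1}^{p} \lambda_{i1}^2 / \theta_{i1}\right) \left(\sum_{j=1}^{q} \lambda_{j2}^2 / \theta_{j2}\right)} = \phi_{12}.$$

From this, we can conclude that the covariance of Bartlett measurement model scores computed using standardized item scores is an unbiased estimate of the true covariance between measurement model common factors.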
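The covariance result can be checked with a short numerical sketch. It assumes that each model is single-factor with standardized loadings and that the correlation between an item in the first model and an item in the second equals the product of their loadings and the covariance between the two common factors. The loadings, the factor covariance of 0.4, and the function names are hypothetical.

```python
def bartlett_weights(loadings):
    """Return Bartlett numerator weights lambda/theta and their normalizer."""
    theta = [1.0 - l * l for l in loadings]
    weights = [l / t for l, t in zip(loadings, theta)]
    normalizer = sum(l * l / t for l, t in zip(loadings, theta))
    return weights, normalizer

def bartlett_covariance(loadings1, loadings2, factor_cov):
    """Covariance of two Bartlett scores implied by the assumed model."""
    w1, d1 = bartlett_weights(loadings1)
    w2, d2 = bartlett_weights(loadings2)
    total = 0.0
    for wi, li in zip(w1, loadings1):
        for wj, lj in zip(w2, loadings2):
            # Assumed cross-model item correlation: lambda_i1 * lambda_j2 * phi.
            total += wi * wj * li * lj * factor_cov
    return total / (d1 * d2)

cov = bartlett_covariance([0.5, 0.6, 0.7], [0.4, 0.8], factor_cov=0.4)
```

The computed covariance reproduces the assumed factor covariance regardless of how many items each model contains.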

with a neo-classical latent measurement model [2, Tenet 3]. The pathology true component and the symptom true component are linked by the exponential function

[14]. Applied to symptoms measurement, experimental subjects select the integer value corresponding to the interval of ln(Y) containing their continuous ln(y) scores. The representation of a continuous symptom score ln(y) by an integer symptom scale representation $\hat{y}(\ln(y))$ results in loss of information defined as

$$e(\ln(y)) = \hat{y}(\ln(y)) - \ln(y)$$

and referred to as quantization distortion or noise [14]. The rationale for the noise designation can be illustrated by rewriting the above equation as

$$\hat{y}(\ln(y)) = \ln(y) + e(\ln(y)).$$
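The idea of quantization noise can be sketched in a few lines. The quantizer below is an assumption made for illustration: it maps a continuous ln(y) to the nearest of the five levels ln(1), ..., ln(5), which need not match the interval boundaries used in the procedure described here. The function names and the example score are hypothetical.

```python
import math

# Assumed reproduction levels for a five-point scale on the log metric.
LEVELS = [math.log(k) for k in range(1, 6)]

def quantize(log_y):
    """Map a continuous ln(y) to the nearest assumed reproduction level."""
    return min(LEVELS, key=lambda level: abs(level - log_y))

def quantization_noise(log_y):
    """Noise = quantized value minus the continuous value."""
    return quantize(log_y) - log_y

log_y = math.log(2.2)              # hypothetical continuous symptom score
q = quantize(log_y)                # nearest level is ln(2)
noise = quantization_noise(log_y)  # negative here: the score was rounded down
```

The quantized value equals the continuous value plus the noise term, which is the sense in which the integer scale response behaves like an error-contaminated version of the underlying score.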

µ and s are the mean and standard deviation, respectively, of the non-transformed original sample data [17].

y is the standardized log-transformed observed score for the j-th individual on the i-th symptom statement, and $\hat{\lambda}_i$ and $\mathrm{var}(\varepsilon_i)$ are parameter estimates for the i-th symptom statement obtained from performing an acceptable fitting CFA on the p x p correlation matrix of transformed Part II symptom data. In factor analytic terminology, this formulation is known as a Bartlett factor score [23].

Figure 1a Histogram of a normal and a log-normal distribution for re-experiencing Item #1. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 1b Histogram of a normal and a log-normal distribution for re-experiencing Item #2. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 1c Histogram of a normal and a log-normal distribution for re-experiencing Item #3. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 1d Histogram of a normal and a log-normal distribution for re-experiencing Item #4. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 1e Histogram of a normal and a log-normal distribution for re-experiencing Item #5. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 2a Histogram of a normal and a log-normal distribution for withdrawal Item #1. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 2b Histogram of a normal and a log-normal distribution for withdrawal Item #2. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 2c Histogram of a normal and a log-normal distribution for withdrawal Item #3. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 2d Histogram of a normal and a log-normal distribution for withdrawal Item #4. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 2e Histogram of a normal and a log-normal distribution for withdrawal Item #5. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 2f Histogram of a normal and a log-normal distribution for withdrawal Item #6. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 3a Histogram of a normal and a log-normal distribution for arousal Item #1. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 3c Histogram of a normal and a log-normal distribution for arousal Item #3. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 3d Histogram of a normal and a log-normal distribution for arousal Item #4. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 3e Histogram of a normal and a log-normal distribution for arousal Item #5. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 4a Histogram of a normal and a log-normal distribution for self-persecution Item #1. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 4b Histogram of a normal and a log-normal distribution for self-persecution Item #2. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 4c Histogram of a normal and a log-normal distribution for self-persecution Item #3. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

Figure 4d Histogram of a normal and a log-normal distribution for self-persecution Item #4. The lower box shows mean (Mu) and standard deviation (Sigma) for the raw data and normal and zeta and sigma for the transformed data, respectively. Summary data for the non-transformed data are shown in the upper side box.

$\lambda_i$ is the estimated standardized factor loading for the i-th manifest symptom statement. Estimated standardized beta coefficients for the five symptom statements for Re-experiencing are B_R = {0.366 0.624 0.149 0.138 0.120}; for the six Withdrawal statements B_W = {0.291 0.153 0.393 0.431 0.237 0.274}; and B_A = {0.284 0.706 0.121 0.350 0.182} for the five Arousal statements. Marker variables have the highest beta weights for each of the three PTSD components.
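The statement with the largest beta weight in each component can be located mechanically. The sketch below reads the fused leading entries in each reported set as two separate three-decimal values, which is an assumption, and treats the largest weight only as a candidate marker; the variable names are hypothetical.

```python
# Reported standardized beta coefficients, with fused entries split (assumed).
beta = {
    "Re-experiencing": [0.366, 0.624, 0.149, 0.138, 0.120],
    "Withdrawal": [0.291, 0.153, 0.393, 0.431, 0.237, 0.274],
    "Arousal": [0.284, 0.706, 0.121, 0.350, 0.182],
}

# 1-based position of the statement with the largest weight in each component.
top_statement = {
    name: max(range(len(weights)), key=lambda i: weights[i]) + 1
    for name, weights in beta.items()
}
```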

The statistics in Table 1 are univariate and hence fail to take intra-pathology symptom correlation into account. A multivariate comparative index was computed based on the fact that the Mahalanobis d-squared statistic for a multi-normal distribution is chi-squared distributed [28]. The squared distance between estimated and observed vectors of 1%, 5%, 10%, 25%, 50%, 75%, 90% and 95% quantiles was computed for the original and log transformed symptomatic sub-sample data. Comparative results show 2.797 for the normal vs. 1.879 for the log transformation for Re-experiencing; 4.608 for the normal vs. 0.833 for the log transformation for Withdrawal; 2.252 for the normal vs. 0.861 for the log transformation for Arousal; and 6.256 for the normal vs. 0.954 for the log transformation.

To facilitate visual comparison, the computerized procedure fits both a normal and a log-normal distribution to the symptomatic sample histogram for each of the p symptom statements. A univariate and a multivariate comparative fit statistic measuring the agreement between observed and predicted quantiles are computed from procedural output. Acceptance of the multiplicative hypothesis is conditional on the log-normal being the better fit. If the multiplicative hypothesis is sustained, sample data are logarithmically transformed prior to Part II analysis.

Part II analyses consist of testing the unidimensionality hypothesis by running and reporting CFA results. Prior to analysis, the p x p correlation matrix is corrected for attenuation due to scale coarseness. Scalar quantization is offered as the means of analog-to-digital conversion. Standard fit indices and heuristics for their interpretation are provided to aid users in ascertaining CFA model fit. The R Max coefficient is computed. Users should be reminded that its use is conditional on the unidimensionality hypothesis being sustained. Standardized Bartlett pathology scores are computed, with users having the option of storing them for future use. Bartlett scores have the advantage of incorporating the collective measurement contribution of p manifest symptom statements into a single pathology score. Bartlett scores can be used as if they were empirically obtained, with the singular exception that their sample variance is inflated due to the presence of measurement error in the component manifest statements.