Data, Its Collection, and Related Issues

Data can be collected in a variety of ways, in different settings, and from different sources. Data collection methods include face-to-face interviews, telephone interviews, and computer-assisted interviews; questionnaires that are personally administered, sent through the mail, or electronically administered; observation of individuals and events with or without videotaping or audio recording; and a variety of other motivational techniques such as projective tests.

Data sources can be primary and/or secondary. Individuals, focus groups, and a panel of respondents specifically set up by the researcher whose opinions may be sought on specific issues from time to time are examples of primary data sources. Data can also be obtained from secondary sources, as for example, company records or archives, government publications, industry analysis offered by the media, and so on.

**Face-to-face interviews** provide rich data, offer the opportunity
to establish rapport with the interviewees, and help to explore and understand
complex issues. Many ideas that are ordinarily difficult to articulate
can also be surfaced and discussed during such interviews. On the negative
side, face-to-face interviews have the potential for introducing interviewer
bias and can be expensive if a big sample of subjects is to be personally
interviewed.

**Telephone interviews** help to contact subjects dispersed over
various geographic regions and obtain responses from them immediately on
contact. This is an efficient way of collecting data when one has specific
questions to ask, needs the responses quickly, and has the sample spread
over a wide geographic area. On the negative side, the interviewer cannot
observe the nonverbal responses of the respondents, and the interviewee
can block a call.

**Personally administering questionnaires to groups of individuals**
helps to establish rapport with the respondents while introducing the survey,
provide clarifications sought by the respondents on the spot, and collect
the questionnaires immediately after they are completed. On the negative
side, administering questionnaires personally is expensive, especially
if the sample is geographically dispersed.

**Mailed questionnaires** are advantageous when responses to many
questions have to be obtained from a sample that is geographically dispersed,
and when conducting telephone interviews to obtain the same data is difficult,
more expensive, or not feasible. On the negative side, mailed questionnaires
usually have a low response rate, and one cannot be sure whether the data obtained
are biased, because the nonrespondents may be different from those who
did respond.

**Observational studies** help to comprehend complex issues through
direct observation (either as a participant- or a nonparticipant-observer)
and then, if possible, asking questions to seek clarifications on certain
issues. On the negative side, they are expensive since long periods of
observation are required, and observer bias may well be present in the
data.

It is important to ask questions in a way that ensures the least bias in the response. For example, "Tell me how you experience your job" is a better question than "Boy, the work you do must be really boring; let me hear how you experience it." The latter question is "loaded" in terms of the interviewer's own perceptions of the job. A loaded question might influence the types of answers the respondent gives. Bias could also be introduced by emphasizing certain words, by tone and voice inflections, and through inappropriate suggestions.

Because almost all data-collection methods have some biases associated with them, collecting data through multiple methods and from multiple sources lends rigor to research. For instance, if the responses collected through interviews, questionnaires, and observations are strongly correlated with one another, then we will have more confidence in the goodness of the data being collected. If there are discrepancies between how a respondent answers the same question in an interview and how he or she answers it in a questionnaire, then we would be inclined to discard the data as being biased.

Likewise, if data obtained from several sources are highly similar, we would have more faith in the goodness of the data. For example, if an employee rates his performance as four on a 5-point scale, and his supervisor rates him the same way, we may be inclined to think that he is perhaps a better-than-average worker. In contrast, if he gives himself a five on the 5-point scale and his supervisor gives him a rating of two, then we will not know to what extent there is a bias and from which source. Therefore, high correlation among data obtained regarding the same variable from different sources lends more credibility to the research instruments and to the data obtained through them. Good researchers try to obtain data from multiple sources and through multiple data-collection methods. Such research, though, is more costly and time consuming.
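The agreement between two data sources can be quantified with a correlation coefficient. Below is a minimal sketch that computes a Pearson correlation by hand using only the standard library; the self- and supervisor ratings are hypothetical illustrations, not data from the text.

```python
# Sketch: checking agreement between two data sources (self vs. supervisor
# ratings) by computing a Pearson correlation by hand. Hypothetical data.
from statistics import mean
from math import sqrt

def pearson(x, y):
    # r = cov(x, y) / (sd(x) * sd(y)), computed from raw deviations.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

# Ratings on a 5-point scale for six employees (hypothetical).
self_rating = [4, 3, 5, 2, 4, 3]
supervisor  = [4, 3, 4, 2, 5, 3]
r = pearson(self_rating, supervisor)
# A high r suggests the two sources agree, lending credibility to the data.
```

A value of r near 1 would support pooling the two sources; a low or negative r would flag the kind of self/supervisor discrepancy described above.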

The sample size is relatively important. First of all, it is related to the size of the entire population: usually, the closer the sample size is to the population size, the better. On the other hand, the randomness of the sample's elements is also a very important factor, in particular in the case of large populations. Also very important is the selection of the sample, i.e., the bias factor in the selection of its elements.

In our case, if the population size is, say, 6,000, a sample of 5,000 elements, even if chosen with some bias, is probably as good a unit of analysis as a sample of 500 elements would be, though unbiasedly selected. If the population size is 1,000,000, then it will make little difference whether the sample has 5,000 or 500 elements. The randomness factor will be the key in this latter case.

Other issues related to sampling are *precision*, *confidence*, and
*efficiency*. The factors affecting decisions on the sample
size are therefore: (1) the extent of precision desired (the confidence interval);
(2) the amount of risk allowable in predicting that level of precision
(the confidence level); (3) the amount of variability in the population itself;
(4) the cost and time constraints; and, in some cases, (5) the size of
the population itself. As a rule of thumb, sample sizes between 30 and
500 can be effective, depending on the type of research questions investigated.
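Factors (1)–(3) above can be combined in the usual normal-approximation formula for estimating a population mean, n = (z·σ/e)², where z comes from the desired confidence level, σ is the population variability, and e is the precision (half-width of the confidence interval). A minimal sketch, with hypothetical numbers:

```python
# Sketch: rule-of-thumb sample size for estimating a population mean,
# assuming the normal-approximation formula n = (z * sigma / e)^2.
from math import ceil
from statistics import NormalDist

def sample_size(sigma, margin_of_error, confidence=0.95):
    # z-score for the two-sided confidence level (the allowable risk).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Precision desired: the mean should fall within +/- margin_of_error.
    return ceil((z * sigma / margin_of_error) ** 2)

# Hypothetical example: std. dev. 15, want the mean within +/-3 units at 95%.
n = sample_size(sigma=15, margin_of_error=3, confidence=0.95)  # n = 97
```

Note how the result lands inside the 30–500 rule-of-thumb range quoted above; tightening the margin of error or raising the confidence level pushes n up quickly.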

First of all, there are a couple of steps that need to be taken when we
collect data on the effects of treatment in experimental designs: *editing
the data*, i.e., the process of going over the data and ensuring that
they are complete and acceptable for data analysis; *handling blank responses*;
*coding*, i.e., converting non-numerical data into a numerical format
that supports the statistics applied to the model; and *data analysis
and interpretation*, i.e., the explanation of the results derived from
the data-collection step. During this latter step, basic statistics such
as *descriptive statistics* (frequencies, histograms, bar charts, etc.)
and *measures of central tendency and dispersion* (mean, median,
mode, variance, etc.) are used. These elements form the foundation of
*inferential statistics*, such as the *Pearson correlation*, which
would be the most appropriate statistical test when we collect data on
the effects of treatment in experimental designs. This type of statistical
test allows for *hypothesis testing* using specialized tests such
as the **t-test** (a statistical test that establishes whether there is a significant difference
in a variable between two groups), **ANOVA** (Analysis of Variance,
which tests for significant mean differences in a variable among multiple
groups), the **Chi-square test** (a nonparametric test that establishes
the independence, or otherwise, of two nominal variables), and **multiple
regression analysis** (a statistical technique to predict the variance
in the dependent variable by regressing the independent variables against
the dependent variable).
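The first of these tests, the t-test for two groups, can be sketched by hand with only the standard library. This is a pooled-variance two-sample t statistic; the treatment and control scores below are hypothetical.

```python
# Sketch: two-sample (pooled-variance) t-test by hand. Hypothetical data.
from statistics import mean, variance
from math import sqrt

def two_sample_t(a, b):
    na, nb = len(a), len(b)
    # Pooled variance assumes the two groups share a common variance.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2  # t statistic and degrees of freedom

# Hypothetical scores for a treatment group and a control group.
treated = [4.1, 4.5, 4.3, 4.8, 4.6]
control = [3.2, 3.6, 3.4, 3.1, 3.5]
t, df = two_sample_t(treated, control)
```

The t statistic is then compared against the t distribution with `df` degrees of freedom to decide whether the group means differ significantly; in practice one would hand this off to a statistics package rather than read a table.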

After the data have been collected from a representative sample of the population, the next step is to analyze the data so that the research hypotheses can be tested. Before we can do this, however, some preliminary steps need to be completed. These steps help to prepare the data for analysis, ensure that the data obtained are reasonably good, and allow the results to be meaningfully interpreted. These four steps are: (1) getting the data ready for analysis, (2) getting a feel for the data, (3) testing the goodness of the data, and (4) testing the hypotheses.

Some respondent biases could affect the goodness of the data, and the researcher may have no control over them. The validity and the replicability of the study could thus be impaired. Technology is now available by which corrections to responses can be taken care of automatically by the computer during data collection. However, until the collection of data through computers and the automatic editing of data become more feasible, the data collected will have to be carefully edited manually.

Also, sometimes, where items have a "do not know" response, they can be treated as a missing value and ignored in the analysis. If many of the respondents have answered "do not know" to a particular item or items, however, it might be worth further investigation to find out whether the question was unclear or whether something else is happening in the organization that needs further probing.
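This treatment of "do not know" answers can be sketched in a few lines: drop them when averaging the item, but track their rate so that a suspiciously high value flags the item for probing. The responses below are hypothetical.

```python
# Sketch: treat "do not know" answers as missing values, ignore them when
# averaging an item, but track how often they occur. Hypothetical data.
from statistics import mean

responses = [4, 5, "do not know", 3, "do not know", 4]

# Keep only the substantive answers for analysis.
valid = [r for r in responses if r != "do not know"]
item_mean = mean(valid)                                   # mean over valid answers only
dk_rate = (len(responses) - len(valid)) / len(responses)  # share of "do not know"
# A high dk_rate would suggest the question was unclear and needs probing.
```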

**Multiple regression analysis** is a statistical technique to predict
the variance in the dependent variable by regressing the independent variables
against the dependent variable. Whereas the correlation coefficient indicates
the strength of the relationship between two variables, it gives us no idea
of how much of the variance in the dependent variable will be explained
when several independent variables are theorized to *simultaneously* influence
it. For example, when the variance in a dependent variable X (say, performance)
is expected to be explained by four independent variables, A, B, C, and
D (say, pay, task difficulty, supervisory support, and organizational culture),
it should be noted that not only are the four independent variables correlated
with the dependent variable in varying degrees, but they might also be intercorrelated
(i.e., among themselves). For example, task difficulty is likely to be
related to supervisory support, pay might be correlated with task difficulty,
and all three might influence the organizational culture. When these variables
are jointly regressed against the dependent variable in an effort to explain
the variance in X, the individual correlations get collapsed into what is
called a **multiple r**, or multiple correlation. The square of multiple
r, R-square or R² as it is commonly known, is the
amount of variance explained in the dependent variable by the predictors.
Such analysis, where more than one predictor is jointly regressed against
the criterion variable, is known as **multiple regression** analysis.
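The mechanics above (jointly regressing several predictors and reading off R²) can be sketched with the normal equations, using only the standard library. This is an illustrative toy, not production code; the performance/pay/support data are hypothetical and constructed to follow an exact linear relation.

```python
# Sketch: multiple regression via the normal equations (X'X)b = X'y,
# solved with Gaussian elimination. Standard library only; toy data.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def solve(a, rhs):
    # Gaussian elimination with partial pivoting on the augmented matrix.
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= f * m[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def multiple_regression(xs, y):
    # xs: rows of predictor values; prepend 1.0 for the intercept term.
    X = [[1.0] + row for row in xs]
    Xt = transpose(X)
    beta = solve(matmul(Xt, X),
                 [sum(Xt[i][k] * y[k] for k in range(len(y)))
                  for i in range(len(Xt))])
    # R-square: share of variance in y explained by the predictors.
    yhat = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot

# Hypothetical data: performance driven by pay and supervisory support.
pay = [1, 2, 3, 4, 5, 6]
support = [2, 1, 4, 3, 6, 5]
perf = [3 + 2 * p + 1 * s for p, s in zip(pay, support)]  # exact relation
beta, r2 = multiple_regression([[p, s] for p, s in zip(pay, support)], perf)
```

Because the toy data follow the relation exactly, the recovered coefficients match it and R² is 1; with real survey data, R² below 1 reflects the unexplained variance, and the intercorrelation among predictors discussed above is exactly why the individual correlations "collapse" into one multiple r.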

In sum, multiple regression analysis helps us to understand how much of the variance in the dependent variable is explained by a set of predictors. If we want to know which among the set of predictors is the most important in explaining the variance, which is the next most important, and so on, a stepwise multiple regression analysis can be done. If we want to know whether a set of job-related variables (e.g., job challenge, job variety, and job stress) would significantly add to the variance explained in the dependent variable (say, job satisfaction) by a set of organizational factors (e.g., participation in decision making, communication, supervisory relationship), a hierarchical regression analysis can be done.

Generalizability refers to the scope of applicability of the research findings in one organizational setting to other settings. Obviously, the wider the range of applicability of the solutions generated by research, the more useful the research is to the users of such research knowledge. For instance, if a researcher's finding that participation in decision making enhances organizational commitment is found to be true in a variety of manufacturing, industrial, and service organizations, and not merely in the one organization studied by the researcher, then the generalizability of the findings to other organizational settings is widened. The more generalizable the research, the greater its usefulness and value.

Of course, not many research findings can be generalized to all other settings, situations, or organizations. For wider generalizability, the research sampling design has to be logically developed, and a number of other meticulous details in the data-collection methods need to be followed. However, a more elaborate sampling design, though it would increase the generalizability of the results, would also increase research costs. Most applied research is confined to the particular area where the problem arises, and the results, at best, are generalizable only to other identical situations and settings. Though such limited applicability does not decrease its scientific value (if the research is properly conducted), its generalizability is restricted.