Performance of prematurely born children in eight domains: a longitudinal study
Kevin C. H. Parker
Kingston General Hospital
Longitudinal data were collected on 89 prematurely born children from 3 months to 3 years of age. The children were tested from one to ten times (mean = 3.5) using the DISC (Amdur, Mainland, & Parker, 1984) as part of a clinical screening project in southwestern Ontario. A mathematical model was developed for each DISC scale using the test's original normative data, which had been collected in the same geographic area. The models describe the relationship between age and test performance by using an adaptation of item response theory. A number of mathematical models of the longitudinal development of 32 of the premature children are presented. The most powerful model is one which uses the deviation of the premature children's scores from the normative predicted values as the dependent variable. The predictive power of a number of independent variables was measured: prematurity, gender, APGAR scores, birth weight, and age. Only gestational age was retained after deletion of nonsignificant predictors. The differential responsiveness of the eight domains measured by the DISC was also evaluated, but there was no significant difference among the scales. The problem of correction for prematurity in the DISC and in more general normative testing is briefly addressed.
Children who are born before full term are at increased risk for a variety of developmental, neurological, and health-related problems. Vision, hearing, language skills, motor skills, and cognitive skills have all been shown to be impaired in some prematurely born children. While the problems of prematurity are ancient, the problem of measuring and predicting such problems with precision is more recent. One fundamental measurement question is whether the performance of a premature child can be predicted without bias simply by measuring age from due date rather than from birth date.
A second question is whether premature children catch up with full-term peers because of their common environmental experience or remain delayed because of their premature birth. In one approach to this problem, Palisano (1986) distinguished between delay associated with nervous system damage and delay associated with biological immaturity. He focused on premature children with no apparent nervous system problems. When motor development was evaluated at 12, 15, and 18 months, low-risk prematurely born children maintained a lag in motor development that could be accounted for by measuring ages from due dates rather than birth dates. On the other hand, Miller, Dubowitz, and Palmer (1984) focused on the impact of correction for prematurity on children at high risk. They found that the prematurity-corrected measures tended to overestimate the performance of premature children, particularly at the youngest measurement age (6 months) but also at 9, 12, and 18 months. The net effect of correction was to mask the problems of those children ultimately found to have cerebral palsy, dystonia, and motor delay based on a paediatric assessment at one year of age.
These two papers share a model of longitudinal assessment of children on a number of domains, and an attempt to develop a model of development as a function of age, prematurity, and other predictive factors. The present study uses a data set derived from a group of 89 prematurely born children tested from one to ten times using the DISC (Diagnostic Inventory for Screening Children). The DISC is a norm-based, individually administered developmental screening test that assesses children in eight domains: Fine Motor, Receptive Language, Expressive Language, Gross Motor, Auditory Attention and Memory, Visual Attention and Memory, Self Help Skills, and Social Skills.
Method
Subjects. Data for this study came from two sources.
The first set of data was taken from the sample originally used to produce norms for the DISC (Amdur, Mainland, & Parker, 1984). These data are described in detail in the DISC manual. They were derived from a census-matched stratified sample of 570 children ranging from birth to five years. Each child was given the full DISC by a psychometrist trained according to the criteria defined in the DISC manual.
The second set of data was collected as part of a government-funded clinical intervention programme for high-risk infants in the Kitchener-Waterloo region of Ontario. Children were referred from three local hospitals to a programme run by the Public Health Unit. Reasons for referral included low birth weight, low APGAR scores, respiratory distress syndrome, prematurity, and low blood sugar. Demographic data on families are sparse, but the majority of families had intact marriages and were middle class. The data set was obtained courtesy of the Waterloo Region Public Health Unit. The data were taken from the files and forwarded to us in a form that maintained the longitudinal continuity of a child's testing but did not identify the child.
The programme called for Public Health Nurses to visit the children's homes and test the children using the standard DISC procedures at designated ages of 3, 6, 9, 15, 24, 36, and 48 months. However, there was substantial variation in actual test ages when the programme was implemented. Nurses provided support to the family, counselling to the family on developmental issues, referral to community agencies, and liaison services between the family and the infant's physician. These data were then retained on clinical case files.
Eighty-nine children were enrolled in the programme and tested at least once. However, the nature of the longitudinal analysis used with the data set led to the exclusion of more than half of the subjects once children with missing data or missing testings were eliminated.
The residual sample consisted of 32 children who had been tested at 5-7 months, 8-10 months, and 14-17 months and for whom age, DISC, and gestational age data were complete. Weight, gestational age, and APGAR characteristics of the full and reduced samples are listed in Table 1. Simple t-tests were used to compare the retained subjects to those eliminated. Although gestational age and 5-minute APGAR were not significantly different, the retained subjects had significantly higher birth weights (t = [illegible].3, df = 86, p < .001), indicating a bias against low birth weight.
Procedures. The analysis of the data proceeded in two stages. In the first stage, the normative data set from the development of the DISC was used to produce a mathematical model of the expected value for each of the eight scores as a function of age in months. In the second stage, the performance of the prematurely born children on the DISC was transformed to the difference between their actual performance and the expected value, and a series of analyses was done on this residual score. A positive residual indicates performance above expectation.
Results
Stage One: Normative expected values. The DISC manual provides the user with normative tables broken down in three-month blocks for younger children and six-month blocks for older children. Given access to the DISC normative data, it seemed worthwhile to develop a mathematical model of the expected value of each scale as a more precise function of age. A number of models were tried involving polynomial regression and various nonlinear techniques, but none was particularly satisfying. We decided to go back to first principles and exploit some techniques of item analysis instead. A model of each of the 216 items on the DISC was prepared relating probability of passing the item to age in a logistic regression equation mimicking item response theory models of latent ability.
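One way such an item model might be written is the two-parameter logistic form below. This is only a sketch: the exact parameterization used for the DISC items is not given in the text, and the symbols $P_i$, $a_i$, $b_i$, and $t$ are introduced here purely for illustration.

```latex
P_i(t) = \frac{1}{1 + \exp\left[-a_i\,(t - b_i)\right]}
```

Here $t$ is the child's age in months and $a_i$ and $b_i$ are constants estimated separately for item $i$; at $t = b_i$ the probability of passing is exactly one half.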
The logistic equation describes the probability of passing as a function of the difficulty of the item (defined as the age at which 50% of children are likely to pass the item) and the discrimination of the item (roughly the slope of the curve plotting probability of passing against age). The 2 or 3 easiest items on each scale showed too few failures to allow computation of a good model. The sum across all 27 items in a scale of the probability of passing each item is the expected value for the scale. For those items that had been too easy to compute the logistic equation, the probability of passing was arbitrarily set to 1.0. Thus each scale had a predicted value derived from a 25-, 26-, or 27-term equation with age as the only unknown. The equation for Fine Motor scores is shown in Table 2. The initial value of 3 indicates that the three easiest items lack good equations. Each term is a model of a different Fine Motor item.
The correlation between predicted and observed values was determined for each scale and ranged from .958 (Social Skills) to .984 (Fine Motor) with a median of .980. The sum of DISC scales correlated .992 with the sum of the predicted scores. With a sample size of 570, all correlations are obviously significantly different from zero.
Figure 1 is a plot of the sum over the 8 scales of the residual differences between observed and predicted values. The curve plotted through the points is the result of a LOWESS smoothing as implemented in SYGRAPH (Wilkinson, 1990). Note the sharp departure from the residual = 0 line at about 5 months. This arises because it was not possible to estimate the logistic regression equations for the 2 or 3 easiest items on each scale and we decided to treat these easiest items as always passed, regardless of age. The scores of children under about five months of age are therefore overestimated when this model is used, and it must be considered invalid below age 5 months.
Stage Two: Models of residual scores.
In Stage Two, the data from the prematurely born children were examined in the context of the expected values generated from the normative data set used in Stage One. This was not a simple problem of analysis. The nature of the problem required three different kinds of stepwise progressions through the data relationships. One stepwise progression involved the identification of the most appropriate between-subject independent variables. A second progression involved the identification of the most appropriate within-subject variables, and the third involved the identification of the most appropriate scaling and correction procedures for the dependent variables. Because of the overestimation problem with the youngest children, only data from testings after four months of age were retained. Children had to have been tested at least three times, at around 6 months (Testing 1), 9 months (Testing 2), and 15 months (Testing 3). Any missing data disqualified the subject. This left 32 out of the original 89 subjects.
In Table 3, all eight DISC scales are used as dependent variables for three separate MANOVAs. The first column reports the results of a MANOVA on the raw DISC scores. The second reports results from data corrected by subtracting the expected value for the child's age, based on the equations from the first part of the study. The third column reports MANOVA results from data where DISC scores were corrected for age which had itself been corrected for gestational age. Note that there are no main effects or interactions involving the differences among the eight scales. The main effect for gestational age which appears in the raw scores disappears with correction for age and appears to diminish further with correction for gestational age. Sex and weight are not significant in any effect. APGAR is significant in interaction with Testing (i.e., first, second, or third testing) only. The main effect for testings is significant.
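The three scorings compared in Table 3 (raw score, residual from the expected value at chronological age, and residual from the expected value at age counted from the due date) can be illustrated with a minimal Python sketch. It is not the code used in the study. The item parameters are hypothetical rather than DISC values, and the 40-week full-term convention and the weeks-to-months conversion are assumptions introduced for the example.

```python
import math

FULL_TERM_WEEKS = 40.0               # assumption: conventional full-term gestation
WEEKS_PER_MONTH = 365.25 / 12 / 7    # assumption: average month length in weeks


def pass_probability(age_months, difficulty, discrimination):
    """Logistic probability of passing one item at a given age.

    difficulty is the age (months) at which the probability is 0.5;
    discrimination controls how steeply the curve rises with age.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (age_months - difficulty)))


def expected_scale_score(age_months, items, n_always_passed=0):
    """Sum of item pass probabilities, plus items treated as always passed."""
    return n_always_passed + sum(
        pass_probability(age_months, difficulty, discrimination)
        for difficulty, discrimination in items
    )


def corrected_age(age_months, gestational_age_weeks):
    """Age counted from the due date instead of the birth date."""
    weeks_early = max(0.0, FULL_TERM_WEEKS - gestational_age_weeks)
    return age_months - weeks_early / WEEKS_PER_MONTH


def residuals(observed, age_months, gestational_age_weeks, items, n_always_passed=0):
    """Return (age-corrected, gestational-age-corrected) residual scores."""
    by_age = observed - expected_scale_score(age_months, items, n_always_passed)
    adjusted = corrected_age(age_months, gestational_age_weeks)
    by_gestation = observed - expected_scale_score(adjusted, items, n_always_passed)
    return by_age, by_gestation


# Hypothetical (difficulty in months, discrimination) pairs, not DISC values.
ITEMS = [(6.0, 0.8), (9.0, 0.7), (12.0, 0.6), (18.0, 0.5)]

by_age, by_gestation = residuals(
    observed=4.5, age_months=9.0, gestational_age_weeks=32.0,
    items=ITEMS, n_always_passed=3,
)
print(f"age-corrected residual: {by_age:.2f}")
print(f"gestational-age-corrected residual: {by_gestation:.2f}")
```

Because the corrected age of a premature child is younger than the chronological age, the expected value falls and the residual rises under the second correction, which is the direction of shift the text describes for the corrected scores.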
Table 4 shows the same array of MANOVAs following deletion of the between-subjects independent variables which showed no main effects (i.e., Sex, Weight, and APGAR). The main effect of gestational age is much more evident in these analyses, reflecting the substantial multicollinearity of gestational age with the other variables. The differences among scales remain nonsignificant. Testings are significantly different, and so is the Testing by gestational age interaction.
Table 5 shows the same array of independent variables as Table 4, in a repeated measures ANOVA following the summing of scores across the eight DISC scales to produce a single aggregate value for each testing and each subject. Note that the results of the MANOVA are almost identical to those with the scales retained as separate scores. This set of analyses was chosen as the reference set of analyses for discussion of the results. Note that the main effect of gestational age is unaffected by correction for age, but removed completely by correction for gestational age. Note also that the effects of Testing and the interaction of Testing by gestational age are retained through both corrections, and enhanced more than masked by the corrections.
The results reported in Table 5 are presented graphically in Figures 2, 3, and 4. Figure 2 is a box plot of the residuals at each testing (i.e., the main effect of Testing). A positive residual indicates that the observed score is higher than the predicted score, indicating performance above expectation. The figure shows above-par performance for Testings 1 and 2 and at-par performance (i.e., residual near 0.0) for Testing 3. Figure 3 shows age-corrected residuals plotted against gestational age with a linear smoothing for each of the first, second, and third testings. The steepest line shows the results for the first testing, with lowest levels of performance for the most premature children.
The second testing shows results close to the zero residual, with a slope near zero, indicating quite close estimation. The third testing shows the furthest below-par performance. Most points involve negative residuals, indicating that the premature children are performing below the level of their age peers. Figure 4 shows the same results as Figure 3 but with further correction for gestational age. The effect of this correction is to shift the origin of the Residual axis and rotate the plotted points clockwise. Although the origin is shifted, the scale is the same. Most points now show positive residuals, indicating that the children are doing better than expected under this correction. Testing 1 shows a small residual increasing with gestational age, Testing 2 shows a large residual decreasing somewhat with gestational age, and Testing 3 shows estimation very close to observed values regardless of gestational age.
Discussion
In Part One of the study we developed a set of functions allowing prediction of the expected value of the DISC scales as a continuous function of age. The expected values matched observed values very well above the age of four months. In Part Two of the study, a progression of MANOVAs led to a simplification of the analysis to its most basic terms. A MANOVA using all 8 scales as dependent measures failed to show a significant impact of between-scale differences or any predictive benefit from including independent variables other than the gestational age of the children. The third set of repeated measures ANOVAs (in Table 5) is the reference set for this discussion. The correction for age does not remove the main effect for gestational age, the main effect for Testing, or the interaction of Testing with gestational age.
When gestational age is used to correct chronological age, which is then used to correct the raw scores, the main effect of gestational age disappears, but the main effect of Testing and the interaction of Testing with gestational age remain significant.
The premature children in this study tended to perform at levels lower than expected of full-term age peers. This effect increases as a function of prematurity. Correction for gestational age eliminates the simple relationship with gestational age, but leaves an interaction between gestational age and testing. It also produces a net overcorrection (i.e., a positive residual). In general, our results are more consistent with those of Miller et al. (1984), who found that prematurity-corrected measures tend to overestimate the performance of high-risk premature children, than with those of Palisano (1986), who studied low-risk children. For the children in our sample, correction for prematurity resulted in performances above expectation for most children at the first two testings. However, by the third testing (about age 15 months) the correction for gestational age led to very small residuals.
These results can hardly be considered definitive. The MANOVA sample of 32 was significantly biased against (at least) low birth weight in the context of the 89 subjects for whom there are data available. The sample of 32 is small and leaves the power low enough that failure to find between-scale differences indicates very little.
If we look at the study as a pilot study rather than a final study, a number of recommendations can be made. The sample size ought to be larger (on the order of 100). The interaction between gestational age and age at testing is important to assess. Sampling bias and risk factors ought to be assessed in the development of the sample. Separation of low-risk from high-risk children would assist the development of expected values for prematurely born children as a function of age and gestational age.
The next stage in this research would be to expand the database in size and to include data about the neurological risk factors.
References
Amdur, J. R., Mainland, M. K., & Parker, K. C. H. (1984). Diagnostic Inventory for Screening Children (DISC). Kitchener-Waterloo Hospital, Kitchener, Ontario.
Table 1.
Table 2. Prediction score on FM scale = 3+
Table 3.
Note: * p < .05, ** p < .01
Table 4.
Note: * p < .05, ** p < .01
Table 5. ANOVA source tables: using sums of the eight scales as the dependent variable, gestational age as the independent variable.
Note: * p < .05, ** p < .01, *** p < .001
