Relative validation of a food frequency questionnaire to estimate food intake in an adult population

ABSTRACT Background: Scientifically valid descriptions of dietary intake at population level are crucial for investigating diet effects on health and disease. Food frequency questionnaires (FFQs) are the most common dietary tools used in large epidemiological studies. Objective: To examine the relative validity of a newly developed FFQ to be used as a dietary assessment tool in epidemiological studies. Design: Validity was evaluated by comparing the FFQ and a 4-day weighed food record (4-d FR) at nutrient and food group levels using Spearman’s correlations, Bland–Altman analysis and Wilcoxon rank sum tests. Fifty-six participants completed a paper format FFQ and a 4-d FR within 4 weeks. Results: Corrected correlations between the two instruments ranged from 0.27 (carbohydrates) to 0.55 (protein) at nutrient level, and from 0.09 (soup) to 0.92 (alcohol) at food group level. Nine out of 25 food groups showed correlations > 0.5, indicating moderate validity. More than half the food groups were overestimated in the FFQ, especially vegetables (82.8%) and fruits (56.3%). Water, tea and coffee were underestimated (–14.0%). Conclusions: The FFQ showed moderate relative validity for protein and the food groups fruits, egg, meat, sausage, nuts, salty snacks and beverages. This study supports the use of the FFQ as an acceptable tool for assessing nutrition as a health determinant in large epidemiological studies.


Introduction
Dietary intake and, therefore, questions on dietary assessment for nutritional epidemiology play an important role in the worldwide discussion on chronic disease and general public health issues [1][2][3][4][5]. Among environmental and life-style determinants, nutritional behaviour represents a major target for the prevention of several non-communicable diseases, such as cancer, cardiovascular diseases, diabetes, chronic obstructive pulmonary disease and other chronic diseases [6][7][8][9][10][11]. A number of methods have been used to assess usual dietary intake at the population level [12]. However, the accuracy and reliability of measuring diet still presents an ongoing challenge [12][13][14]. Although weighed food records and 24-hour recalls have been widely used, their substantial burden on respondents and their economic constraints make them inapplicable for most large epidemiological studies. Food frequency questionnaires (FFQs) are relatively inexpensive, put less burden on the respondents, and do not require trained interviewers [15,16]. Therefore, they represent the most commonly used tools in epidemiological studies [17]. However, due to lower accuracy, the information collected by FFQs needs to be compared with information collected by a more accurate dietary assessment method. This will be a measure of the relative validity of the FFQ in comparison with the reference method, i.e. to which degree the method captures what it is designed to measure [18]. Several approaches for the validation of FFQs exist. Because of their dissimilar error structures, weighed food records represent the gold standard as a reference method in FFQ validation studies [19].
A comparable online FFQ has been validated with a 4-day weighed food record (4-d FR) among adolescents, focusing on both the energy and macronutrient intake and validation at the food group level [20]. The results of this validation study showed good agreement for the energy and macronutrient intake except for protein, and a good agreement for frequently consumed foods at the food group level.
In the present study, we validated an FFQ in paper format to assess the dietary intake of adults against a 4-d FR. In addition to the energy and macronutrient intake (carbohydrates, protein, fat and fibre), the food group intake was also examined.
The FFQ was designed to be implemented in a randomized Swiss population, the Swiss Cohort Study on Air Pollution and Lung and Heart Diseases in Adults (SAPALDIA). This population is diverse and consists of German-, French- and Italian-speaking participants, all representing different eating cultures. To depict eating patterns with one instrument (in all three national Swiss languages), a robust tool is needed that can compile data in a valid and reproducible manner. To validate the tool, we chose an environment that mimics similarly challenging circumstances in order to establish the robustness and usability of the instrument. We therefore chose a German-speaking, randomized sample which included all age groups representing the target population of the SAPALDIA cohort.

Study population and design
In October 2012, study participants were recruited through advertisements and via email, telephone and word of mouth in the area of Jena, Germany. Sixty subjects were enrolled in the validation study, which took place between November 2012 and January 2013. For inclusion in the study, subjects were required to be at least 18 years of age, without chronic diseases requiring medication and not pregnant or breastfeeding. Written informed consent was obtained from all subjects for participation in this validation study. Participants completed both an FFQ in paper format and a 4-d FR as the reference method, within a period of 4 weeks. Both methods are described in detail below. The subjects participating in the study were not reimbursed apart from being allowed to keep the scales at the end of the assessment. In addition, there was a raffle for eight vouchers, each with a value of 25 euros.

Dietary assessment
The 4-d FR (reference method): At the beginning of the study, participants filled in a 4-d FR. The 4 days had to consist of both weekdays and weekend days. The study population was randomized into two groups of 30 subjects: one group filled in the 4-d FR continuously from Wednesday to Saturday and the other from Sunday to Wednesday. A paper template was handed out to each participant, consisting of 8 pages: 2 pages for each day. Each sheet was sub-divided into four columns in which the food and beverages consumed were recorded as: amount in grams or millilitres, specified food, type of meal (breakfast, lunch, dinner, snacks) and general comments. The participants were asked to weigh each food item or meal prior to its consumption and to record the leftovers. They were instructed to use the scales for each meal, including out-of-home consumption, e.g. in restaurants and canteens (cafeterias). The participants returned the completed 4-d FR within a period of 1-3 weeks.
Paper form FFQ (test method): Subsequently, the 127-itemed, semi-quantitative paper form FFQ was handed out and filled in self-administered. The FFQ covered the period of the previous 4 weeks, and thus covered the time of the weighed food record. The FFQ was designed at the ZHAW (Zurich University of Applied Sciences) to assess the habitual food intake of adults and collected consumption information for 127 food items (www.ernaehrungserhebung.ch). The 127 food items were selected according to the most typically consumed food products in Switzerland and, in addition, complemented the findings of the MONICA study, the CoLaus study and household budget data [21][22][23]. The portion size of each food item was defined according to the data described in the MONICA study, including a standard portion size of ± 30% for a small and a big portion size, respectively, as in the National Nutritional Survey II in the Federal Republic of Germany [24,25]. Subjects were asked to indicate, on average, the frequency, portion size and number of portions of each food item (out of 127) they consumed during the previous 4 weeks. The frequency was asked in nine categories ranging from 'never' to 'daily'. If a food item was eaten several times a day, participants were asked to take this into account indicating the number of portions. The participants indicated the portion size in the three categories 'small', 'pre-set' and 'big' (specified by pictures placed next to each food item to make the indication of portion sizes comparable among the participants). For each category, a metric amount in grams or decilitres/centilitres was assigned.
Additional information collected included preparation and cooking methods, use of specific types of oil, butter and margarine, and the take-out foods consumed. The FFQ also collected information on the frequency of use of dietary supplements. The FFQ was pretested on several user groups. In addition, a 'users data sheet' was handed out (together with the food record and the FFQ paper form) to collect demographic information (age, sex, height, weight, educational level, job position, residential area), as well as additional information on the current diet (e.g. weight reduction diet), physical activity, household size and smoking habits.

Statistical analysis
Data pre-processing: Prior to data entry and food coding, the FFQ paper form and the 4-d FR were checked for completeness and possible errors. Two out of a total of 60 subjects did not return the questionnaires. Participants who completed fewer than 4 days of the 4-d FR were excluded, i.e. two out of the remaining 58 participants (each completed only 3 days). After scanning the FFQ paper forms, each questionnaire was checked for completeness, missing values and structurally impossible answers (e.g. two boxes checked where only one should be checked). The subsequent data management procedures covered the sections on frequency, the number of portions and the portion size. If neither frequency nor portion size nor number of portions was indicated, the frequency information 'never' was assigned to that food item. If at least one of frequency, portion size or number of portions was indicated, the following strategy was applied: missing values of frequency or number of portions were replaced by the mean value of the frequency or number of portions relating to that food item, and missing values of portion size were replaced by the pre-set standard portion size. From a total of 58 questionnaires (58 × 127 × 3 = 22,098 possible entries), 43 (74%) FFQs showed missing information in the mentioned categories. However, 32 of these 43 questionnaires (74.4%) had fewer than five missing entries each. The most frequent missing values were found in the number of portions (N = 93 over all questionnaires). Previous studies showed that there are in general fewer missing values in more frequently consumed foods [26].
To check for implausible energy intakes and to avoid a bias from wrongly reported food habits in the FFQ, the distribution of the total energy intake computed from the FFQ reports was examined. A cut-off was defined at the 75th percentile plus 1.5 times the interquartile range (3553.3 kcal) and the 25th percentile minus 1.5 times the interquartile range (190.9 kcal) [27]. This led to the exclusion of two FFQs with an over-reporting of energy intake (4250.3 kcal, 5414.3 kcal). The corresponding energy values in the 4-d FR for both excluded FFQs were well within the plausible range (2099.7 kcal, 2119.9 kcal).
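The cut-off rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the percentile definition (Python's "inclusive" interpolation) is an assumption because the text does not say which one was used, and the sample values are invented apart from the two excluded intakes quoted above.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) cut-offs: Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: 'inclusive' quartile interpolation; the study does not
    # state which percentile definition was applied.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_implausible(values, k=1.5):
    """Return the values lying outside the two fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical FFQ energy intakes in kcal/day; only the last two values
# (4250.3 and 5414.3) are taken from the text, the rest are made up.
energy_kcal = [1500.0, 1700.0, 1850.0, 1900.0, 2000.0,
               2100.0, 2200.0, 2400.0, 4250.3, 5414.3]

print(tukey_fences(energy_kcal))
print(flag_implausible(energy_kcal))
```

Because the fences depend on the quartiles of the observed sample, the numeric cut-offs from this toy data will not reproduce the 190.9 kcal and 3553.3 kcal reported for the study sample.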
Data post-processing: Based on the similarity of type of food and nutrient composition, the 127 food items listed in the FFQ were grouped into 25 predefined food groups (see the first column in Table 3).
The categorization corresponded to a similar grouping already used in the National Nutritional Survey II in the Federal Republic of Germany [25]. The mean intake of each food item per day was calculated using frequency, portion size and number of portions: Frequency × [number of portions × 100] × portion size / 28. To obtain the nutrient intakes per day, the calculated food data were linked to the Swiss Food Composition Database (www.naehrwertdaten.ch) and, where necessary, completed using the German Nutrient Data Base (www.bls.nvs2.de). The 4-d FR data were entered in an online input mask that was designed at the ZHAW (www.ernaehrungserhebung.ch). In this way, each food item from the 4-d FR was matched to the corresponding FFQ food item.

Statistical methods
Correlations between the macronutrient and food group intakes of the 4-d FR and the FFQ were assessed with Spearman's rho, since some of the macronutrients showed clear deviations from normally distributed residuals in a linear model (assessed through the Shapiro-Wilk test, the Kolmogorov-Smirnov test and Q-Q plots).
Descriptive statistics for energy, nutrient and food group intakes are presented as means, medians and interquartile ranges. To evaluate the agreement between the FFQ and 4-d FR, the mean difference and percentage difference were calculated as the mean of all individual differences between the FFQ and 4-d FR. For the examination of relative validity, the Spearman's correlations were corrected for the day-to-day within-person variation using the de-attenuation method [19]. The corrected correlation, $r_c$, was calculated using the following formula:
$$r_c = r_o \sqrt{1 + \frac{S_w^2 / S_b^2}{n}}$$
where $r_o$ is the observed correlation, $S_w^2 / S_b^2$ is the ratio of the within- and between-person variances and $n$ is the number of replicates per person for the given variable. Within-person and between-person variation were calculated from the replicated 4-d FR.
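As a rough illustration of the de-attenuation step, the sketch below assumes the standard form of the correction, r_c = r_o × sqrt(1 + (S²w/S²b)/n), and estimates the variance ratio with a simple one-way variance-components approach. Both function names are hypothetical, and the variance-components estimator is an assumption: the text does not state how the within- and between-person variances were derived from the replicate days.

```python
import math
import statistics


def variance_ratio(records):
    """Estimate S2_w / S2_b from replicate days.

    records: one list of daily intakes per person, all of equal length.
    Returns None when the between-person variance estimate is not positive.
    """
    n = len(records[0])
    person_means = [statistics.fmean(days) for days in records]
    # Within-person variance: average of each person's day-to-day variance.
    s2_w = statistics.fmean([statistics.variance(days) for days in records])
    # Between-person variance: variance of the person means, minus the part
    # that is due to averaging n noisy days (assumed estimator).
    s2_b = statistics.variance(person_means) - s2_w / n
    if s2_b <= 0:
        return None
    return s2_w / s2_b


def deattenuate(r_obs, ratio, n):
    """Correct an observed correlation for within-person variation."""
    return r_obs * math.sqrt(1 + ratio / n)


# Hypothetical example: observed rho of 0.50, variance ratio of 1.79
# (the upper end of the range reported in the Results), 4 recorded days.
print(round(deattenuate(0.50, 1.79, 4), 3))
```

The correction can only raise the magnitude of the correlation, and its size shrinks as the number of recorded days grows, which is in line with corrected coefficients being similar to or slightly higher than the crude ones.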
For visualization, Bland-Altman diagrams and box-and-whisker plots were drawn. The Wilcoxon rank-sum test was used to examine reporting behaviour between participant groups. The statistical analysis was performed using R version 3.0.1, SAS version 9.4 (2012-2012 SAS Institute Inc., Cary NC, USA) and Microsoft® Excel 2007. P values less than 0.05 were considered significant, and all tests were two-sided.
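The quantities behind a Bland-Altman diagram can be sketched as follows. The function name and the sample data are hypothetical, and the limits of agreement at ±1.96 standard deviations follow the usual convention rather than anything stated in the text.

```python
import statistics


def bland_altman(test, reference):
    """Return the per-subject means and differences plus the summary lines."""
    diffs = [t - r for t, r in zip(test, reference)]
    means = [(t + r) / 2 for t, r in zip(test, reference)]
    bias = statistics.fmean(diffs)   # mean difference (test minus reference)
    sd = statistics.stdev(diffs)     # sample SD of the differences
    return {
        "means": means,              # x-axis of the diagram
        "diffs": diffs,              # y-axis of the diagram
        "bias": bias,
        "lower": bias - 1.96 * sd,   # conventional lower limit of agreement
        "upper": bias + 1.96 * sd,   # conventional upper limit of agreement
    }


# Hypothetical energy intakes (kcal/day) for four subjects.
ffq_kcal = [2000.0, 2100.0, 1900.0, 2500.0]
fr_kcal = [2100.0, 2000.0, 2000.0, 2600.0]

result = bland_altman(ffq_kcal, fr_kcal)
print(result["bias"], result["lower"], result["upper"])
```

Plotting `diffs` against `means` then shows whether the disagreement grows with the level of intake.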

Results
The characteristics of the 56 study participants are given in Table 1. The mean age was 40 years, ranging from 22 to 85 years and 60.7% were women. The mean height was 172.5 cm and the mean weight 72.3 kg. The mean body mass index was 24.2, ranging from 19.8 to 32.0.
The energy and macronutrient intake as reported in the FFQ was compared to that of the 4-d FR. Table 2 shows the means, medians and interquartile ranges for both instruments. Their mean and percentage differences are also given, as well as the correlations (Spearman's rho) between the two methods, including the variance ratio and the de-attenuated (corrected) correlation coefficients. The final analysis included 54 subjects. The mean differences between FFQ and 4-d FR for carbohydrates, fibre and protein intake were positive, and negative for energy and fat intake. The correlations of intake derived from FFQ versus 4-d FR ranged between 0.27 (for carbohydrates) and 0.55 (for protein). Except for carbohydrates, all correlations were statistically significant.
The ratio of within- and between-person variance calculated from the 4-d FR was between 0.64 and 1.79, and the de-attenuated (corrected) correlation coefficients were similar to or slightly higher than the crude correlations (Table 2).
To examine the agreement in energy intake between the 4-d FR and FFQ, a Bland-Altman plot is presented in Figure 1. On average, the energy intake in the FFQ was slightly lower (50.2 kcal) than reported in the 4-d FR. A slight tendency for larger (absolute) differences between the instruments with increasing energy intake was observed for both men and women. Reporting behaviour between men and women did not differ (P = 0.90, Wilcoxon rank sum test), even though male participants reported higher energy intakes with both instruments (P < 0.0001, Wilcoxon rank sum test). Table 3 shows the comparison of the food group intake as reported in the FFQ and 4-d FR, overall and stratified by gender, sorted by the magnitude of Spearman's rho.
The corrected Spearman correlation coefficients ranged from 0.92 (alcohol) to 0.09 (soup). All correlations were significant except those for dessert, cheese, preparation fats and savoury spreads, composite foods, sauces, legumes and soups. Those food groups with a lower or non-significant correlation tended to include less frequently consumed foods, e.g. legumes and sauces. The correlations of 18 (72%) out of a total of 25 food groups were significant.
The mean difference between FFQ and 4-d FR varied among intakes, and there were almost as many foods that were underestimated (n = 12) as overestimated (n = 13) when compared with the reference method (Table 3). In general, frequently consumed foods such as bread, meat, fruits, vegetables, dairy products, cheese and sweet spreads were overestimated in the FFQ in comparison to the intakes assessed by the 4-d FR. No gender differences were observed for these food groups except for dairy products and dessert, which showed an underestimation in the FFQ for women compared to men (−0.6 g v. 20.4 g, −4.5 g v. 17.2 g). Vegetable and fruit intake were particularly overestimated by the FFQ, by 138.1 g (82.8%) and 102.4 g (56.3%), respectively. Food intakes that were underestimated in the FFQ comprised beverages (water, tea and coffee, soft drinks with and without sugar, alcoholic beverages), soup, sauce, preparation fats and savoury spreads, salty snacks, meat alternatives, eggs and cereals and grains. The lowest degree of underestimation was observed for water, tea and coffee with −200.4 ml (−14.0%).
Regarding gender, differences were found only for meat alternatives and soft drinks without sugar. Women, on average, underestimated their intake of soft drinks without sugar (−28.2 g v. 5.8 g in men), while men, on average, underestimated their consumption of meat alternatives in the FFQ (−3.2 g v. 1.1 g in women).
In addition, the relative deviations of FFQ and 4-d FR are shown for each food group (Figure 2). In order to obtain comparability among the food groups, differences between FFQ and 4-d FR were divided by the mean reported intake value of the corresponding food group in the 4-d FR. The ordering of the food groups on the x-axis is according to decreasing magnitudes of Spearman's rho (see Table 3).
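The scaling used for the relative deviations can be illustrated with a short sketch. The function name and the intake values are hypothetical; the only step taken from the text is dividing each individual FFQ minus 4-d FR difference by the mean 4-d FR intake of that food group.

```python
import statistics


def relative_deviations(ffq, fr):
    """Scale each FFQ - FR difference by the mean FR intake of the group."""
    mean_fr = statistics.fmean(fr)
    if mean_fr == 0:
        # No reference intake at all: the relative deviation is undefined.
        raise ValueError("mean reference intake is zero")
    return [(f - r) / mean_fr for f, r in zip(ffq, fr)]


# Hypothetical intakes (g/day) of one food group for three subjects.
ffq_g = [150.0, 250.0, 100.0]
fr_g = [100.0, 100.0, 100.0]

print(relative_deviations(ffq_g, fr_g))
```

Dividing by the group mean rather than by each person's own intake keeps the measure defined for subjects who recorded none of that food group during the four days.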

Discussion
This study focused on assessing the relative validity of a paper form FFQ against a 4-d FR. The validity was assessed at both the macronutrient and food group levels. From the 60 eligible participants, 58 completed the 4-d FR and FFQ according to the experimental design, but 56 subjects were considered for the analysis. The relative validity of the FFQ, compared to the 4-d FR, varied among intakes of energy, macronutrients and food groups (Tables 2 and 3). The FFQ overestimated as well as underestimated the absolute intake of various nutrients and foods, which was comparable to other validation studies [29,30]. We observed that in general, frequently consumed foods tended to be overestimated in the FFQ compared to the 4-d FR, in particular vegetables and fruit intake, as reported in other FFQ validation studies [13,31]. Food items consumed daily (e.g. bread, dairy products) are better estimated by the FFQ, as described in other studies [32,33]. These food groups may represent in general more frequently consumed foods for this study population, as they reflect common dietary habits. In contrast, food groups such as soup, sauce, preparation fats and savoury spreads and meat alternatives were underestimated in the FFQ when compared with the 4-d FR. On the one hand, these items may include rather rarely consumed foods; on the other hand, they may include food groups for which portion size is difficult to estimate and which therefore tend to be ignored (e.g. sauce and preparation fats and savoury spreads). Furthermore, it should be considered that information on some of the food items was collected in a predefined manner in the FFQ, in contrast to the open-ended 4-d FR, where food items were weighed right at the time of consumption. For example, preparation fats and savoury spreads may not have been reported in the FFQ.
The application of correlation coefficients to assess relative validity in FFQ validation studies is still under debate, but there is a common agreement that correlations above 0.5 are moderate or good, and that correlations below 0.4 indicate a low degree of linear correlation [18,34]. Therefore, nine out of 25 food group intakes can be considered to have an acceptable validity for assessing intakes on a group level (all statistically significant, Table 3).
The correlation coefficients for energy and macronutrient intakes showed in general similar or lower values than those observed in other studies [9,13,29,31]. Protein and fibre intakes exhibited good correlations with values of 0.55 and 0.44. The lowest degree of linear association was found for carbohydrates (r = 0.27), which was also apparent at the food group level for legumes (r = 0.16), vegetables (r = 0.35) and desserts (r = 0.32). This finding may be related to the fact that some of the foods contributing to carbohydrate intake are consumed less frequently than weekly or only by a limited number of persons. For example, only five persons reported the consumption of legumes in the 4-d FR, and several persons reported legume intake only once a month in the FFQ. The FFQ retrospectively assesses the diet covering the previous 4 weeks.
Results obtained through the Bland-Altman method for energy intake showed slightly lower intakes on average for the FFQ than reported in the 4-d FR (50.2 kcal), with a slight tendency for larger (absolute) differences between the instruments with increasing energy intakes. This result could be partly explained by a higher tendency of underreporting in the FFQ for calorie-dense foods compared with the 4-d FR. Similar findings were reported in another study [33].
The results of this study point to relevant differences in reporting food intake between men and women. Compared to men, women reported a significantly higher intake of meat, fruits, sweet spreads and cheese in the FFQ compared with the 4-d FR. In response to social desirability, it is well known that women may be more likely to over-report food items related to a positive health image, e.g. fruits and vegetables, whereas sweets and cakes are usually associated with a rather negative health image and thus tend to be underreported [35]. In addition, the FFQ used for this study included a list of several fruits (n = 15) that also could lead to an over-reporting of fruit intake, as discussed elsewhere [36]. This poses a challenge to participants in estimating the overall fruit consumption [36]. Similar findings were observed for meat (n = 8) and cheese (n = 7) in this study. Additionally, the order of requested food items in the FFQ (e.g. meat is asked at the first position) could explain the significant differences between the two instruments.
For cereals and grains, women reported a significantly lower intake in the FFQ than in the 4-d FR, compared to men. Irrespective of the gender difference, reporting the portion sizes of these food items in the FFQ (e.g. noodles, rice, corn) could have been a challenge due to difficulties in the volume estimation by means of the food pictures.
There are some limitations in our study. First, the study participants from Jena, Germany may not be representative of the target Swiss population, for which the FFQ was designed. Therefore, this fact has to be kept in mind when citing this validation.
As previously discussed, the applied assessment tools contain several limitations. Despite the fact that the weighed food record (FR) is often denoted as the gold standard, it might cause a bias that has to be considered. On the one hand it is an invasive instrument that can induce changes in dietary habits, on the other hand it may not capture longer-term dietary patterns well. The FFQ in contrast, even though aiming at capturing food intake over longer time periods, faces the challenge of recall and difficulties in estimating portion size [19].
Due to the short sequence of data collection between the 4-d FR and FFQ, the awareness of an individual's food habits could potentially affect the way the FFQ is filled in and therefore might also result in inflated correlations. A solution to this problem could be to let half of the group fill out the FFQ first and the other half to fill out the 4-d FR first.
Both instruments are time consuming for participants. While the FFQ is only filled in once and takes about 30-45 minutes, the time investment related to the FR is higher. It is an open-ended tool used several times per day for a fixed period of time, and weighing and recording food intake thereby puts a higher burden on daily life. To minimize the respondents' burden, the use of emerging technologies, e.g. internet-based assessment tools, presents a promising approach [37].
As already mentioned, an additional limitation of the FFQ could be the large number of listed food items within the food groups (from 1 for legumes to 22 for vegetables). This leads to a high variety of level of detail in the different food groups. Food groups including more items may lead to a cumulative effect and a tendency for over-reporting regarding that specific food group (e.g. fruits). Conversely, food groups containing only one item (e.g. egg) may lead to an underreporting effect due to the aggregation of foods (e.g. scrambled egg, fried egg, etc.) to the main group. This presents a challenge in the estimation of food intake.
In addition, the seasonality aspect must be taken into account. Due to the assessment period in the winter season, only a selected number of season-specific foods were reported in the 4-d FR, whereas the FFQ consists of a fixed food list and the study participants have to estimate their intake under consideration of the respective season. Another limitation of the study was the small sample size. A sample size of a minimum of 50 subjects, but preferably 100 or more, is recommended for validation studies [18]. Post hoc power calculations indicated that with a minimum sample size of 50, the power to detect significant correlations of 0.35, 0.40 and 0.45 would be 0.74, 0.85 and 0.94, respectively (two-tailed, alpha = 0.05).
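A power figure of this kind can be approximated with the Fisher z-transformation, as sketched below. This is an assumption about method: the text does not say how the power values were computed, the approximation is strictly meant for Pearson correlations, and the function name is hypothetical, so the results may differ somewhat from the reported figures.

```python
import math
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate two-sided power to detect a true correlation r with n subjects."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be larger than 3")
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Fisher z of the true correlation, scaled by its standard error 1/sqrt(n-3).
    shift = math.atanh(r) * math.sqrt(n - 3)
    # Probability of exceeding the critical value in either tail.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


for rho in (0.35, 0.40, 0.45):
    print(rho, round(correlation_power(rho, 50), 2))
```

Rerunning the function with larger values of `n` shows how quickly the power for moderate correlations improves as the sample approaches the preferred 100 subjects.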
Further, we did not use biomarkers or other objective reference measures to assess validity, which presents a major limitation of this study. The FFQ assessed dietary intake over a period of four weeks and inclusion of concentration biomarkers in plasma or in adipose tissue would have added valuable information about its validity [38][39][40]. Nonetheless, there is a lack of biomarkers to reflect wider aspects of dietary intake, and the use of biomarkers for validation of dietary assessment methods is costly.
In conclusion, the self-administered 127-item FFQ showed moderate relative validity for protein and various foods such as fruits, egg, meat, sausage, nuts, salty snacks and beverages (water, tea and coffee, soft drinks with sugar and alcoholic beverages), thus showing comparable results with other FFQ validation studies, and acceptable validity for the other macronutrients and frequently consumed food groups. Therefore, it can be considered an appropriate tool to assess and characterize the usual dietary intake of adults in epidemiological studies. In such studies, however, the observed gender differences in under- and over-reporting of specific food items and groups may need to be considered when interpreting gender differences in the association between nutrition and health.