Pilot study
To identify suitable physiological and behavioural parameters for the assessment of emotional states in dogs, an a priori approach was adopted to select scenarios that generated emotional states within each of the four core affect quadrants. A pilot study utilizing 20 adult dogs (8 males, 12 females; 5 Labrador Retrievers, 5 Beagles, 5 Norfolk Terriers, and 5 Petit Basset Griffon Vendéens) was conducted. Dogs were housed in pairs or groups of three within kennels located at the Waltham Petcare Science Institute (Leicestershire, UK), which allowed for free access to both an indoor and outdoor environment. Throughout the study, all dogs experienced comprehensive training and socialization programs as per the Institute’s standard animal care requirements. Additionally, dogs were habituated to all testing environments and associated equipment prior to testing.
The dogs were exposed to four scenarios anticipated to induce positive valence emotions: provision of a long-lasting chew (chew); calm petting by a familiar handler (petting); engaging in play with a toy (toy); engaging in a game throwing treats (treat). Additionally, video footage from previous research exploring five different scenarios anticipated to induce negative valence emotions was reviewed: confinement to the inside portion of their home enclosure whilst isolated from conspecifics (baseline); social isolation in a familiar room (separation); housed in a kennel in a vet suite (kennel); a veterinary examination (consult); and car travel (car). Video recordings of dogs experiencing these scenarios were scored by two trained dog behaviour coders on a scale of one to seven for valence (1—very negative, 7—very positive) and arousal (1—no arousal, 7—high arousal) in order to assess if the emotional state induced fell within the required quadrant for a majority of dogs tested. In instances where a scenario exceeded 10 min, only the first 10 min were scored. Four scenarios (two utilizing a food reward and two utilizing a social reward) resulting in emotional arousal and valence characteristic of Q1 and Q2 were selected. Additionally, two scenarios suitable for eliciting emotional responses consistent with Q3 and Q4 were also identified. These scenarios were selected based on the percentage of dogs that fell within the defined emotional quadrant. Further considerations resulted in selection of sessions that ensured the highest level of separation between the different quadrants (Fig. 2).
The chew and treat scenarios were selected to induce emotions in the presence of food within Q2 (positive valence/low arousal) and Q1 (positive valence/high arousal), respectively, while the toy and petting scenarios were chosen to induce Q1 and Q2 emotions, respectively, without the presence of food. Emotions in Q3 (negative valence/low arousal) and Q4 (negative valence/high arousal) were induced using the separation and car scenarios, respectively.
Statistical powering
The sample size for this study was determined through a priori power analysis by simulation, for each primary measure of interest (Cortisol, HR, HRV-RMSSD, both QBA component scores). Plausible effect sizes and within- and between-animal variance components were estimated and/or extrapolated from a subset of existing data (control diet, pre- and post- first exposure) collected in previous research measuring the same parameters in negative emotion settings only. These values were used to simulate 1000 datasets in the proposed experimental design (2 × 2 crossover), for each primary measure and at each of a range of potential sample sizes. Each simulated dataset was analyzed according to the planned statistical approach and pairwise contrast design for the main study (see below), and the proportion of simulations in which induced pairwise effects of interest were detected was recorded for each measure as an empirical power estimate. Based on the results of these analyses a sample size of 60 dogs was chosen for the main study in order to achieve power exceeding 80% to detect a difference relating to valence and/or arousal for each of the five primary measures. This assumed that residual variability and effect sizes of interest would not be substantially greater or smaller in positive valence conditions, respectively—an assumption that was generally borne out in our subsequent data, entailing no major concerns around statistical power for this study.
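The simulation-based power estimate described above can be illustrated with a minimal sketch. It is not the analysis used in the study: it replaces the planned mixed model with a simple paired contrast between two conditions, uses a normal critical value in place of the t distribution, and every numeric input (effect size, variance components, significance level) is a hypothetical placeholder.

```python
import math
import random
import statistics


def simulate_power(n_dogs, effect, sd_between, sd_within,
                   n_sims=1000, z_crit=2.576, seed=1):
    """Proportion of simulated datasets in which a within-dog effect is detected.

    z_crit=2.576 is the two-sided normal critical value for alpha = 0.01
    (an assumed significance level, used here as an approximation to a t test).
    """
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_sims):
        diffs = []
        for _ in range(n_dogs):
            dog = rng.gauss(0.0, sd_between)          # between-animal component
            cond_a = dog + rng.gauss(0.0, sd_within)  # reference condition
            cond_b = dog + effect + rng.gauss(0.0, sd_within)
            diffs.append(cond_b - cond_a)
        se = statistics.stdev(diffs) / math.sqrt(n_dogs)
        if abs(statistics.mean(diffs) / se) > z_crit:
            detected += 1
    return detected / n_sims


# Empirical power across a range of candidate sample sizes (hypothetical inputs).
for n in (20, 40, 60):
    print(n, simulate_power(n, effect=0.6, sd_between=1.0, sd_within=1.0))
```

Repeating the call for each primary measure, with its own assumed effect size and variance components, gives one empirical power curve per measure.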
Animals and husbandry
Upon completion of the pilot study, 60 healthy, adult dogs consisting of 31 males (4 entire) and 29 females (7 entire) from three breeds (30 Labrador Retrievers, 12 Beagles, 18 Norfolk Terriers) participated in the main study. Dogs ranged in age from 0.9 to 6.1 years old (mean age = 3.2) at the start of the study. Dogs varied in their experience with the selected scenarios based on their training history and previous participation in research studies. Dogs for the main study were housed, managed, and habituated in the same manner as the pilot dogs. Additional training to facilitate sample collection (i.e., blood draws, mouth and ear handling, wearing multi-parameter harnesses) was also conducted prior to testing. All dogs visited the room used for test sessions a minimum of two times prior to the start of the study, with additional visits provided if the dog showed signs of negative emotional reactions (e.g., fear, anxiety) or high arousal positive emotional reactions (e.g., excitement, anticipation). Visits were combined with the dog’s usual daily exercise and included being let off-lead in the test room for a few minutes. Dogs were able to freely investigate the area, and the handlers were instructed not to encourage the dogs using food, toys, or play in order to minimize the dogs developing strong positive or negative associations with the space. Further, all dogs were taken to the test room for at least one recovery session (with additional sessions provided if negative emotional reactions were observed) in between test sessions in order to minimize the impact of previous scenarios on the dogs’ responses to entering the room. All dogs were trained to walk up or onto a ramp or box (based on the dog’s individual preference) to enter the car and into a crate fixed inside the car. The number of training sessions provided to dogs was based on the emotional reaction and training progression of the individual dog. All dogs were required to be comfortable and willing to enter the crate in the car without strong positive or negative emotional reactions (e.g., extremely excited or nervous) prior to their car test session.
Dogs were excluded from the study if they failed to adequately habituate to these sample collections or test areas prior to the start of the study, and replacement dogs were selected. Additionally, for the purposes of dog and human safety, dogs were excluded based on previous observations of excessive destructive behaviour, or resource guarding, as well as any dietary restrictions that would not allow the consumption of the treats used during testing.
This study was approved by the Waltham Animal Welfare and Ethical Review Body (WAL 102424) and conducted under the authority of the UK Animals (Scientific Procedures) Act 1986. All methods were performed in accordance with relevant guidelines and regulations and are reported in accordance with ARRIVE guidelines.
During each test session dogs were closely monitored by means of live-feed CCTV cameras (Dahua 4K IR Turret Network Camera; Dahua Technology, Leeds, UK). Dogs were monitored for signs of distress and/or safety concerns based upon predefined end-point criteria. These included hyperventilation, extreme hypersalivation, excessive barking or whining, cowering, repeated performance of vigorous escape attempts, and behaviours that had the potential to result in self-harm and/or the ingestion of a foreign body. No dogs had to be removed from the study due to signs of distress; however, one dog’s car scenario was terminated early due to unrelated mechanical issues with the car. Additionally, one male Labrador was removed from the study (and therefore any subsequent analyses) after being diagnosed with atypical Addison’s disease.
Study design
Each dog was exposed to each of the six selected scenarios over a period of 19 weeks using a cross-over study design with order randomized based on a balanced Latin square. Test sessions were scheduled 3 weeks apart with some exceptions due to scheduling conflicts (min. 12 days). In order to minimize the potential impact of routine vaccinations and/or certain medications on the immunological parameters collected, dogs skipped sessions within 4 weeks of the administration of these substances, resulting in up to 6 weeks between sessions. To maintain balanced ordering, missed sessions were rescheduled to the next time-slot and remaining sessions pushed backwards in turn, with the end of the study slightly delayed for these animals. Due to the nature of the test sessions used, the handlers and experimenters were not able to be blinded.
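As an illustration of the ordering scheme, a balanced Latin square for an even number of treatments can be built with the Williams construction, in which every scenario occupies every position once and follows every other scenario exactly once. The text does not state which construction was used or how dogs were allocated to sequences, so the function name, scenario labels, and allocation implied here are assumptions.

```python
def williams_square(n):
    """Balanced Latin square (Williams design) for an even number of treatments."""
    if n % 2 != 0:
        raise ValueError("this construction requires an even number of treatments")
    # First sequence interleaves low and high indices: 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Each further sequence shifts every treatment index by one (mod n).
    return [[(t + shift) % n for t in first] for shift in range(n)]


scenarios = ["treat", "toy", "chew", "petting", "car", "separation"]
for sequence in williams_square(len(scenarios)):
    print([scenarios[i] for i in sequence])
```

With six sequences, dogs could for example be allocated at random in equal numbers to each sequence, which keeps both position and first-order carry-over balanced across scenarios.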
All scenarios lasted 10 min and, with the exception of car travel, occurred within a test room (5.23 m × 3.68 m) which the dogs had been previously habituated to. The test room contained multiple resting areas (two pieces of vet bed on the floor and a piece of vet bed on an elevated platform) and fresh water (water bowl that was emptied and re-filled at the beginning of each test session). To mask potentially inconsistent background noises that might distract dogs during the scenarios, a radio was played either directly outside the test room or through the car speakers, set to a consistent volume and radio station. To minimize the effect of external temperature all testing and sampling areas were maintained at 18 ± 2 °C. Throughout all testing and sampling procedures each dog was handled by an individual who regularly worked with and trained that dog. This resulted in different handlers being used for different dogs, as appropriate. The research team, including authors (S.L.M), oversaw the test sessions, but were not directly involved in handling the dogs.
The six scenarios utilized in the main study to elicit various emotional states in dogs are outlined in detail below:
Positive valence/high arousal/with food—treat throwing
The dog was taken into the test room by a familiar handler, the lead removed, and given 2 min to acclimate and explore. The handler then retrieved a container of pre-prepared treats (CRAVE™ Protein Chunks; Mars Petcare, Slough, UK) from a shelf and sat on a chair located in a corner of the room. The number of treats prepared for this scenario was determined based on the individual weight of the dog being tested, with dogs over 25 kg receiving 18 chunks, dogs between 10 and 25 kg receiving 13 chunks, and dogs under 10 kg receiving 8 chunks. These chunks were then cut into smaller pieces so that each dog had a total of 72 treat pieces available for throwing. The handler took single treat pieces from the container and threw them in random directions and distances, utilizing the entire test room. Treat pieces were thrown approximately once every 5–10 s. The handler could speak to the dog as required to engage them in the game. After 10 min the handler re-attached the lead and walked the dog to an adjacent room for post-test sampling.
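The weight-based allocation can be written as a small lookup, shown here as a sketch. The text does not say how dogs weighing exactly 10 kg or 25 kg were treated, so assigning both boundaries to the middle band is an assumption, as is the function name.

```python
SESSION_SECONDS = 10 * 60   # each scenario lasted 10 min
PIECES_PER_DOG = 72         # every dog had 72 treat pieces available


def chunks_for_weight(weight_kg):
    """Number of whole treat chunks prepared for a dog of the given weight."""
    if weight_kg > 25:
        return 18
    if weight_kg >= 10:      # assumption: 10 kg and 25 kg fall in the middle band
        return 13
    return 8


# If all pieces are thrown at an even pace, the mean gap between throws is
# session length divided by piece count.
mean_interval_s = SESSION_SECONDS / PIECES_PER_DOG
print(chunks_for_weight(30), chunks_for_weight(15), chunks_for_weight(8))
print(round(mean_interval_s, 1))
```

An even pace over the full session works out to roughly one piece every 8 s, which sits inside the stated 5–10 s throwing interval.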
Positive valence/high arousal/without food—toy play
The dog was taken into the test room by a familiar handler, the lead removed, and given 2 min to acclimate and explore. The handler then engaged the dog in play with a selection of toys for 10 min. All dogs had exposure to a range of toys prior to testing, and their top two preferred toys were used during testing. Handlers were instructed to engage each dog in its preferred style of play so as to maximise engagement and excitement. This could include fetch, tug or chase style games. After 10 min the handler retrieved the toys, re-attached the lead, and led the dog to an adjacent room for post-test sampling.
Positive valence/low arousal/with food—long lasting chew
Positive valence/low arousal/without food—petting
The dog was taken into the test room by a familiar handler, the lead removed, and given 2 min to acclimate and explore. The handler then sat on vet bedding, which was placed on the floor, and gently encouraged the dog to come close. The handler then stroked or scratched the dog in a calming or soothing manner, based on the dog’s individual preferences. Handlers were instructed to halt and/or alter their approach if the dog showed signs of excessive excitement or discomfort (e.g., yawning, panting, moving away). If dogs became disengaged from the handler and moved out of reach, the handler periodically encouraged them to return, but the dogs were otherwise allowed free choice whether to continue the interaction. After 10 min the handler re-attached the lead and led the dog to an adjacent room for post-test sampling.
Negative valence/high arousal—car travel
Dogs were walked on lead by their handlers to a minivan (Ford S-MAX; Ford Motor Company Ltd., Essex, UK) parked outside the post-test sampling room. Dogs entered the rear of the car via a ramp or platform (depending on the dog’s predetermined preference) and were closed within a crate secured within the car boot. The size of the crate used was dependent on the size of the dog (small crate: 76 × 48 × 54 cm, medium crate: 78 × 54 × 62 cm, large crate: 90 × 58 × 66 cm, XL crate: 106 × 71 × 70 cm), and each crate contained a piece of non-slip vet bedding. The car then underwent a standardized 10-min car journey consisting of a range of maneuvers including a sharp U-turn and a three-point-turn. The speed of the car never exceeded 10 mph due to being in a private enclosed car park area. Upon completion of the route, the handler opened the car boot and crate, re-attached the lead, and led the dog out of the car via the ramp or platform and into the building for post-test sampling.
Negative valence/low arousal—separation
The dog was taken into the test room by a familiar handler, the lead removed, and given 2 min to acclimate and explore. The handler then left the room, and the dog was left alone for a period of 10 min while being monitored by a researcher in an adjacent room via a CCTV system. After 10 min the handler returned, re-attached the lead, and led the dog to an adjacent room for post-test sampling.
Data collection and processing
A range of behavioural and physiological parameters were captured during and after testing to determine which parameters, or combination of parameters, could be successfully utilized to differentiate between different emotional states. These parameters included data generated during the test sessions from wearable devices worn by the dog, and behavioural data coded from video footage. After test sessions, dogs were taken from the testing area to a room for post-test sampling. Prior to entry to the sampling room, infra-red videos were collected for measurement of surface body temperature of key areas of the dog. Upon entry to the sampling room, tympanic temperatures were collected, followed by blood samples for measurement of cortisol, serotonin and ACTH, and saliva samples for measurement of sIgA. Further details related to the collection and processing of these parameters are outlined below.
Wearable technology parameters
Two different wearable technologies were used to measure a range of parameters during test sessions. These included activity monitors (Whistle™ FIT accelerometer; Mars Petcare, McLean, VA, USA) which have been previously validated for collection of activity data, and multi-parameter harnesses (Dinbeat UNO; Dindog Tech, S.L., Barcelona, Spain) which have been previously validated for collection of HR and HRV data and also provided readings for body position (unvalidated).
The activity monitors were attached to the dog’s collar and worn throughout testing. One minute Activity Points generated by the activity monitor indicative of duration and intensity of activity during that time period were matched to the test session times and summarized to determine mean Activity Points during the test session.
For the multi-parameter harnesses, on the day prior to testing, dogs had their fur clipped in three specific areas on the sides of their chest (one area on either side of their rib cage about an inch from their arm pit and one area on their right side towards the end of their rib cage) to allow for the application of electrocardiogram (ECG) electrodes. On the day of testing, dogs were equipped with the multi-parameter harness which was worn throughout testing.
Following testing, data were downloaded from the devices, which consisted of HR (bpm) and categorical position readings (standing, sitting, lying sternal, lying left lateral, lying right lateral, supine, on two legs) provided 24 times per second. RR intervals (ms) based on continuous ECG data were also obtained. These data were matched to the test session times and summarized to determine mean HR and proportion of time spent in each position during the test session. Additionally, HRV was calculated as the root mean square of successive RR interval differences (RMSSD) as well as the standard deviation of the RR intervals (SDRR). A single HRV value was generated for both RMSSD and SDRR for each 10-min test session. As HR readings occasionally dropped out when ECG nodes moved, or the device lost connection, any sessions with more than 50% missing readings for HR or RR interval were excluded from analysis (n = 32). Furthermore, a total of five dogs did not wear the multi-parameter harness due to failure to successfully habituate to the device, as demonstrated by alterations to their normal behaviour. Also, 40 videos (11.4%) were randomly selected to be coded by a trained dog behaviour coder and used to assess agreement between the harness readings and manual coding (Table 1). The appropriate number of videos to assess reliability was determined from a review of literature on sample size requirements for reliability analyses based on assumed moderate to good agreement (ICC ~ 0.60)42,43,44. However, three videos could not be compared to corresponding Dinbeat harness readings due to data not being available from the harnesses for that session. 
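The HRV summaries and the missing-data rule described above can be sketched as follows. RMSSD follows the definition given in the text; the use of the sample (n − 1) standard deviation for SDRR and the representation of dropped readings as `None` are assumptions, and the function names are illustrative only.

```python
import math
import statistics


def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences (ms)."""
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def sdrr(rr_ms):
    """Standard deviation of the RR intervals (ms); sample SD is assumed."""
    return statistics.stdev(rr_ms)


def session_usable(readings, max_missing=0.50):
    """False when more than 50% of the readings in a session are missing."""
    missing = sum(r is None for r in readings)
    return missing / len(readings) <= max_missing


rr = [800, 810, 790, 805]                # hypothetical RR intervals in ms
print(rmssd(rr), sdrr(rr))
print(session_usable([72, None, None]))  # two of three readings missing
```

Applied to the RR series of a whole 10-min session, each function returns the single per-session value described in the text.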
For the purposes of comparison and analysis, lying sternal, left lateral, right lateral, and supine as measured by the multi-parameter harness were combined for a total proportion of time lying, sitting was used to determine proportion of time sitting, and standing and on two legs were combined for a total proportion of time standing. Meanwhile, the video coded behaviours of lateral lie down and sternal lie down were combined for a total proportion of time lying, sit was used to determine proportion of time sitting, and stand, walking, trotting and vigorous activity were combined for a total proportion of time standing.
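The collapsing of harness categories into three postures can be expressed as a mapping, sketched below. The category strings follow the list given for the harness output, but their exact spelling in the exported data is an assumption, as are the names used here.

```python
from collections import Counter

# Harness category -> combined posture, following the grouping described above.
HARNESS_TO_POSTURE = {
    "lying sternal": "lying",
    "lying left lateral": "lying",
    "lying right lateral": "lying",
    "supine": "lying",
    "sitting": "sitting",
    "standing": "standing",
    "on two legs": "standing",
}


def posture_proportions(readings):
    """Proportion of readings spent lying, sitting and standing."""
    counts = Counter(HARNESS_TO_POSTURE[r] for r in readings)
    return {p: counts.get(p, 0) / len(readings)
            for p in ("lying", "sitting", "standing")}


# Hypothetical stream of categorical position readings from one session.
example = ["standing", "standing", "on two legs", "sitting"] + ["supine"] * 4
print(posture_proportions(example))
```

Because the harness reports positions at a fixed rate, the share of readings in each category equals the share of session time, provided readings are not missing unevenly across postures.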
Video parameters
Video footage for coding of dog behaviour data was collected via four CCTV video cameras mounted in each corner of the room for scenarios conducted within the test room. During the car scenario, video footage was recorded using two Logitech 922 webcams (Logitech, Lausanne, Switzerland) which were mounted with a view of the front (on car center console) and rear (on rear car window) of the crate.
Videos from each 10-min test session were coded for a number of behaviours anticipated to vary based on emotional state using a detailed ethogram (Table 2). One trained dog behaviour coder scored all videos using ‘The Observer XT 15’ (Noldus, Netherlands, Europe). Further, a random selection of 10 videos (2.9%) was re-coded by the same coder for a total of three repetitions, with repeats randomly distributed throughout the course of data collection, to assess intra-rater reliability. The appropriate number of videos to assess intra-rater reliability was determined from a review of literature on sample size requirements for reliability analyses based on assumed good to excellent agreement (ICC ~ 0.80)42,43,44. Video names were encoded so that the coder was blind to which videos were repetitions. To account for minor differences in video length, state behaviours were analyzed as a proportion of time spent performing the behaviour by dividing the duration of the behaviour by the total video length. Further, because the mouth of the dog was not always visible from the available camera angles, the proportion of time spent panting was calculated by dividing the duration of panting by the duration of the video in which the mouth was visible. Videos (n = 18) where the mouth was not visible for more than 25% of the video duration were not included in the analysis of panting behaviour.
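The two normalisations described above can be sketched as follows. The function and argument names are hypothetical, and durations are assumed to be in seconds.

```python
def state_proportion(behaviour_s, video_s):
    """Proportion of the video spent performing a state behaviour."""
    return behaviour_s / video_s


def panting_proportion(panting_s, video_s, mouth_hidden_s, max_hidden=0.25):
    """Panting time relative to the time the mouth was visible.

    Returns None when the mouth was hidden for more than 25% of the video,
    in which case the video is left out of the panting analysis.
    """
    if mouth_hidden_s / video_s > max_hidden:
        return None
    return panting_s / (video_s - mouth_hidden_s)


# Hypothetical 600 s video: 120 s of panting, mouth hidden for 100 s.
print(panting_proportion(120, 600, 100))   # 120 s over 500 s visible
print(panting_proportion(120, 600, 200))   # excluded: mouth hidden > 25%
```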
Additionally, three trained dog behaviour coders provided QBA scores on all videos collected during this study, using a list of terms (Table 3) modified from previous research assessing dog emotional states in different settings22,49. New terms (‘agitated’, ‘calm’, ‘confident’ and ‘happy’) were added to ensure inclusion of terms covering a range of emotional states from across the four emotion quadrants. After watching each 10-min video, coders provided one score per term. Terms were scored using a visual analog scale, where a score of 0 was given when the dog was expressing a total lack, or a negligible amount, of the emotion indicated by the term, and a score of 124 was given when the dog was strongly expressing the emotion indicated by the term. A random selection of 10 videos (2.9%) was re-coded by all three coders for a total of three repetitions, with repeats being randomly distributed throughout the course of data collection, to assess intra-rater reliability. As with the video coding, the number of videos was selected based on a review of literature42,43,44 and video names were encoded in order to blind the raters to which videos were repetitions.
At the same time as providing QBA scores, coders were also instructed to score the emotional valence (i.e., how emotionally positive or negative they perceived the dog to be) and arousal (i.e., the intensity they perceived the dog’s emotional state to be) of the dog. In order to allow for more granularity in the response of the coders a visual analog scale ranging from 0 to 124 was used in place of the 1 to 7 scale implemented in the pilot study. The left-hand side of the scale (score 0) was defined as a very negative emotional state, or very low arousal/calm emotional state. The right-hand side of the scale (score 124) was defined as a very positive emotional state, or very highly aroused/excited emotional state. These scores were used to confirm the dogs responded to the scenarios as anticipated but were not otherwise used in the data analysis.
Temperature parameters
A portable infra-red camera (FLIR T840, FLIR, OR, USA) was used to capture infra-red videos for measurement of the surface temperature of the eye and nose of the dog. The infra-red camera had a thermal range of − 20 to 150 °C and a resolution of 464 × 348 pixels. Additionally, the camera had an accuracy of ± 2 °C or ± 2% of reading and a sensitivity sufficient to detect temperature differences within a frame of < 30 mK. During video recordings the value of emissivity was set at 1 as per manufacturer guidelines. Dogs were recorded in a climate-controlled hallway immediately after the end of the test session prior to entry to the post-test sampling room. The camera was positioned on a tripod approximately 1 m away from the dog, with the lens parallel to the floor and in line with the dog’s head. To minimize the effect of external temperature on infra-red readings all testing and sampling areas were maintained at 18 ± 2 °C. The temperature of the test room (or car for car test sessions), hallway, sampling room, and outside temperature were monitored and recorded at the end of every test session using digital thermohygrometers (Doqaus, Shenzhen, China).
Following infra-red video recording, dogs proceeded into the sampling room, where tympanic temperature of both the right and left ear was measured using an infra-red thermometer (Braun Thermoscan 7 IRT6520; Frankfurt, Germany) with probe covers inserted into the dog’s ear canal. The thermometer has a reported accuracy of ± 0.2 °C. The difference between left and right ear temperature was then calculated by subtracting the right ear temperature from the left ear temperature.
Video footage from the infra-red camera was analysed using FLIR Tools software (FLIR, OR, USA). The frame in which the dog directly faced the camera and was most in focus was selected for temperature capture. Mean left and right eye temperature were collected through the use of an ellipse drawn within the anterior surface region of each eye. Mean nose temperature was collected from an ellipse drawn encompassing the anterior surface of the nose (Fig. 3). The difference between left and right eye temperature was then calculated by subtracting the right eye temperature from the left eye temperature.
Blood parameters
Following collection of body temperature parameters, blood samples were collected in order to measure serum cortisol, serum serotonin, and plasma ACTH. Prior to sampling, a small patch of hair was shaved from the injection site on the dog’s neck. A disinfectant wipe (Vetasept; Animalcare Ltd, York, UK) and topical anesthetic (Ethycalm Plus; Invicta, West Sussex, UK) were then applied to the area before a 3.2 mL blood sample was collected from the jugular vein by a qualified technician. In order to minimize the impact of collection stress on these parameters, blood collection was terminated if not completed within five minutes of the end of the scenario.
Blood samples for cortisol and serotonin analysis were collected into serum gel tubes and left to stand for 30 min before being transported to the onsite laboratory. These samples were spun down using a centrifuge set at 2000g for 10 min at ambient temperature within two hours of collection before being aliquoted and stored at − 80 °C in preparation for later analyses. Blood samples for ACTH were collected into EDTA tubes, inverted 10 times and immediately stored on ice for transportation to the onsite laboratory for further processing. These samples were spun down using a centrifuge at 2000g for 10 min at 4 °C within an hour of collection before being aliquoted and stored at − 80 °C until analysis.
Cortisol analysis was performed in-house using the R&D Systems, Parameter Cortisol Immunoassay (bio-techne, Minneapolis, USA) following the manufacturer’s protocol with an intra-assay variation of < 10%. Serotonin and ACTH samples were shipped on dry ice to an external laboratory (Nationwide Specialist Laboratories, Cambridge, UK) for analysis. There, serotonin was analysed using the Enzo LifeSciences Serotonin ELISA (Enzo Life Science, Lausen, Switzerland) while ACTH was analysed using the Biomerica ACTH ELISA Kit (Biomerica, Irvine, USA). Both tests were performed in accordance with the manufacturers’ protocols.
Salivary parameter
Following blood collection, saliva samples for analysis of sIgA were collected using Salimetrics Children’s Saliva Swabs (Salimetrics, LLC, Carlsbad, California, USA). Food was withheld from dogs for 20 min prior to saliva collection, with the exception of the food provided during food-based interventions, to minimize potential contamination of the sample. One end of the swab was inserted into the dog's buccal cavity, targeting the lower gum line behind the end molar where saliva pooled, and held in position for 30 s. The end of the swab was then removed and placed into the collection tube before the unused end was used to collect saliva from the other side of the dog's mouth. Both swab tips were placed within the same collection tube, which was immediately placed on ice until transported to the onsite laboratory. In order to minimize the impact of collection stress on this parameter, saliva collection was terminated if not completed within 15 min of the end of the scenario.
At the onsite laboratory, saliva swabs were spun down in a centrifuge at 4 °C sequentially at 1000g for 5 min, followed by 2000g for 5 min and finally 5000g for 10 min. Samples were then stored at − 80 °C until analysis. Salivary sIgA was analysed in-house using the Abcam IgA Dog ELISA Kit following the manufacturer’s protocol with an intra-assay variation of < 10%.
Statistical analysis
All analyses were performed using R Statistical Software version 4.2.2. Inter- and intra-rater reliability of the QBA scores and behavioural coding was assessed using Intraclass Correlation Coefficients (ICCs) from a two-way mixed effects model using the R package ‘irr’. Consistency agreement was used for inter-rater reliability, and absolute agreement was used for intra-rater reliability. These values were interpreted as poor (ICC < 0.50), moderate (ICC: 0.50–0.75), good (ICC: 0.75–0.90) or excellent (ICC > 0.90). Consistency agreement from ICCs was also used to determine the agreement between manual coding and the multi-parameter harness for position data.
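The difference between the consistency and absolute-agreement definitions can be seen in a small sketch that computes both single-rating ICCs from two-way ANOVA mean squares. The study used the R package ‘irr’; this Python version is only an illustration, and the choice of single-rating (rather than average-rating) coefficients and the handling of values falling exactly on 0.75 or 0.90 are assumptions.

```python
def icc_two_way(scores):
    """Single-rating ICCs from a table with one row per video, one column per rating.

    Returns (consistency, absolute_agreement).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    agreement = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    return consistency, agreement


def interpret_icc(icc):
    """Verbal label using the thresholds stated in the text."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:      # assumption: 0.75 and 0.90 are both labelled "good"
        return "good"
    return "excellent"


# Hypothetical scores: the second rater is always exactly 10 points higher.
ratings = [[10, 20], [20, 30], [30, 40], [40, 50]]
consistency, agreement = icc_two_way(ratings)
print(consistency, interpret_icc(consistency))
print(agreement, interpret_icc(agreement))
```

In this example a constant offset between raters leaves consistency perfect but lowers absolute agreement, which is why the stricter definition suits repeated scoring by the same coder.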
QBA terms with poor (ICC < 0.50) inter-rater reliability, or poor intra-rater reliability for multiple coders, were excluded from further analyses. The remaining QBA terms were then summarized using a principal components analysis (PCA) via the ‘FactoMineR’ R package. Prior to the PCA being conducted, the suitability of data for inclusion was tested using the ‘performance’ R package. Data met the requirements of a Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy with KMO values > 0.50 (overall KMO = 0.91) and a significant Bartlett’s test of sphericity (p < 0.001). When retained PCA components were interpreted, terms with absolute loadings ≥ 0.50 were considered to be salient. Component scores were generated using each term’s weighting on the key components. Inter- and intra-rater reliability of the component scores was assessed using ICCs as described above.
To understand the relationships between each collected outcome parameter and the emotional quadrants of valence and arousal, data from the scenarios without food (i.e., separation, car, petting, toy) were fitted separately to linear mixed effects models for each parameter (via ‘nlme’ R package), with the respective parameter as the response variable, valence (negative vs positive) and arousal (low vs high) as categorical fixed effects (negative valence and low arousal as the reference categories), plus the two-way interaction between valence and arousal, and animal nested within breed as the random effects structure (intercept-only). Variance weights by arousal level were also incorporated into the models to compensate for heteroscedasticity between high and low arousal scenarios. Outdoor temperature was included as an additional (continuous) fixed effect within models exploring temperature parameters (with the exception of models pertaining to lateralised differences in temperature). Model residuals were plotted and assessed by visual inspection, and parameters were log-transformed if judged to violate model assumptions. The estimated means (back-transformed where appropriate) and 95% confidence intervals (95% CI) were extracted from the model and plotted via the R package ‘ggplot2’. The significance of the fixed effects was assessed using Wald tests via the R package ‘car’. Pairwise planned comparisons were also performed, between valence levels within each arousal category, between arousal levels within each valence category, and for the two-way interaction (i.e., the difference in the differences), and multiplicity adjusted p-values were reported. Family-wise error rate (FWER) adjustment was made using the ‘single-step’ approach of the R package ‘multcomp’ (according to the multivariate t distribution), to control for α-inflation across comparisons within each model.
Further, a Bonferroni adjusted α criterion for significance of α = 0.01 was used, based on analysis of five primary parameters (i.e., Cortisol, HR, HRV-RMSSD, QBA PC1_Valence, QBA PC2_Arousal). Secondary analyses of additional parameters applied the same α criterion, to maintain a consistent Type 1 error rate across all measures.
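The structure of the planned comparisons and the significance criterion can be illustrated with simple arithmetic on four estimated cell means. This sketch covers only the point estimates of the contrasts, not the mixed model, its standard errors, or the single-step adjustment; the cell means are hypothetical, and the unadjusted level of 0.05 is inferred from the stated criterion rather than given in the text.

```python
def planned_contrasts(means):
    """Contrasts from cell means keyed by (valence, arousal)."""
    neg_low = means[("negative", "low")]
    neg_high = means[("negative", "high")]
    pos_low = means[("positive", "low")]
    pos_high = means[("positive", "high")]
    valence_low = pos_low - neg_low       # valence effect within low arousal
    valence_high = pos_high - neg_high    # valence effect within high arousal
    return {
        "valence | low arousal": valence_low,
        "valence | high arousal": valence_high,
        "arousal | negative valence": neg_high - neg_low,
        "arousal | positive valence": pos_high - pos_low,
        # Interaction: the difference in the differences.
        "interaction": valence_high - valence_low,
    }


# Bonferroni criterion across the five primary parameters
# (assumed unadjusted alpha of 0.05).
alpha_adjusted = 0.05 / 5

cell_means = {                            # hypothetical estimated means
    ("negative", "low"): 10.0, ("negative", "high"): 14.0,
    ("positive", "low"): 12.0, ("positive", "high"): 20.0,
}
print(planned_contrasts(cell_means))
print(alpha_adjusted)
```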
Infrequent behaviours of shake and whining (occurring in < 50% of observations) were analyzed as present/absent for occurrence using binomial generalized linear mixed-effects models (via ‘lme4’ R package), using the same model and pairwise contrast structure as specified above, with two variations. First, variance weighting was excluded, as it is inappropriate for binary logistic models. Second, due to the absence of whining behaviours in the positive valence conditions, this parameter was analyzed within the negative valence conditions only, with the sole categorical fixed effect of arousal and corresponding pairwise contrast between levels low/high. The estimated probabilities of the dogs performing the behaviour and 95% CIs were extracted from the model and plotted. The behaviours of barking, yawning, and howling were not analyzed due to rare occurrence (< 10% of observations).
To understand the influence of food on the collected parameters, and how this may interact with arousal, data from the four scenarios anticipated to elicit positive emotional states (i.e., petting, toy, chew, treat) were fitted to further mixed effects models. The same model and pairwise contrast structure as defined above for assessment of emotional quadrants was used, with food (absent vs present) replacing valence as a categorical factor in the design.