COVID-19 and the Ethics of Democratizing Voice Computing
By Ankita Moss
Image courtesy of Pixabay
NeuroLex is a company whose mission is to use voice analysis techniques to detect health conditions early, reducing healthcare costs and improving patient outcomes. The company has created a product called SurveyLex, the “first voice-enabled survey platform,” that can collect voice samples across a wide range of digital devices. In a recent interview hosted by Microsoft for Startups, former NeuroLex CEO and current Vice President of Data and Research at Sonde Health, Jim Schwoebel, discusses the founding history of NeuroLex and its growth and utility during the COVID-19 pandemic. Mr. Schwoebel also answered questions for The Neuroethics Blog Team, addressing how and why neural data is useful, how voice data fits under the umbrella of neural data, how voice data relates to COVID-19, and how he manages ethical decision-making in this work. It is important to note that this interview took place prior to the acquisition of SurveyLex by Sonde Health.
Neural Data and Biomarkers
The concept of “voice data” is relatively new. Jim Schwoebel defines neural data as “any sort of data that is related to the brain or nervous system in general.” “Keeping the definition of neural data broad” and including voice data under this umbrella will be imperative to new waves of medicine, specifically telemedicine, in the future.
Voice data can be categorized as neural data because it is a predictor of neural function. Speech is part of everyday life for most individuals; therefore, it can be measured more readily and rapidly than other forms of data. This efficiency and ease will be important when diagnosing patients remotely. Moreover, speech is an incredibly nuanced phenomenon and a “good proxy for things like brain function and brain region change.” Schwoebel states that “testosterone, as a hormone, regulates the fundamental frequency of the voice [and beyond that,] neural function.” According to the team at NeuroLex, voice is an easily obtained biomarker under the umbrella of neuroscience that may lead to better patient diagnoses and outcomes. A vocal biomarker “is [essentially] a signature that can be useful to detect symptoms or a disease.” Voice is readily available and can be used to make earlier diagnoses possible for a host of medical conditions. NeuroLex traditionally analyzes vocal biomarkers for Alzheimer’s Disease or dementia in general, both of which can result in changes in the brain and the voice.
Image courtesy of Pixabay
NeuroLex has now ventured into the COVID-19 diagnostic testing space. Jim Schwoebel and his team state that, “by analyzing the voice beyond what the human ear can hear, we could unveil dedicated vocal biomarkers that will enable the healthcare community to get insights on the symptoms and onset of the COVID-19 virus.” While the more obvious symptoms are fever and shortness of breath, the NeuroLex team believes that multiple voice characteristics might be signatures for COVID-19. While researchers around the globe are working on antibody testing to find solutions for the pandemic, the NeuroLex team is working on new ways to diagnose COVID-19 early using voice data. COVID-19 symptoms might be caught early on using a speech test. For example, “speech tasks like ‘count up from 1 at a normal pace as much as you can without taking a breath’ are good proxies for respiratory function, specifically around the duration of the session and power of the vocalizations.”
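To make the two quantities named in that quote concrete, the following is a minimal Python sketch of how the duration and power of a recorded counting task might be measured. It is an illustration only, not NeuroLex’s method: the function name `duration_and_rms` is hypothetical, root-mean-square amplitude is assumed here as the measure of “power,” and a synthetic tone stands in for a real voice recording.

```python
import math

def duration_and_rms(samples, sample_rate):
    """Return (duration in seconds, RMS amplitude) for a mono signal.

    Hypothetical helper for illustration; RMS is one assumed way to
    summarize the "power" of a vocalization.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    duration = len(samples) / sample_rate
    if not samples:
        return duration, 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return duration, rms

# Synthetic stand-in for a recording: 2 seconds of a 220 Hz tone
# sampled at 8 kHz with peak amplitude 0.5. Not real voice data.
sample_rate = 8000
tone = [
    0.5 * math.sin(2 * math.pi * 220 * n / sample_rate)
    for n in range(2 * sample_rate)
]

duration, rms = duration_and_rms(tone, sample_rate)
print(f"duration: {duration:.2f} s, RMS amplitude: {rms:.4f}")
```

In practice, a shorter session or a weaker signal on a task like this would be only one input among many, and any threshold for what counts as “short” or “weak” would have to come from validated clinical data.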
Inspired by 23andMe’s large repository of genetic information, the NeuroLex team developed a Voiceome study to identify key voice biomarkers. NeuroLex then extended this partnership to include Biogen, with the goal of creating the largest normative voice data repository to identify key indicators and health traits. The partnership is aiming for 10,000 participants for the voice study. Participation involves voice tasks, such as answering questions. The dataset will demonstrate how attributes such as age, sex, and ethnicity can impact speech. For COVID-19 in particular, the study includes a tailored questionnaire. NeuroLex has also partnered with Voicemed, which uses NeuroLex’s SurveyLex software to collect voice samples. The goal of these two partnerships is to gather a large amount of data to test for COVID-19 while mitigating biases (such as those with regard to gender, race, ethnicity, identity, etc.).
To Schwoebel, it seems inevitable that one day voice data might be a large part of transforming medicine. However, a rapid turnaround and prioritization of efficiency can spark ethical concern. For example, “there may be less burden of proof to get this [type of] vocal diagnostic test to patients.” Down the road, “this can create ethical dilemmas around whether or not… [to] scale this tech prematurely to achieve revenue.” Voice data, as a biomarker, can create footprints that have the potential to result in unwanted biases. Therefore, there are some risks in using voice data as a proxy for neural function. Individuals may be perceived a certain way or denied certain benefits because their voice data indicates a risk of some condition. A smaller dataset in particular can be skewed toward one gender, ethnicity, age, etc., and is therefore not conducive to creating accurate voice diagnosis algorithms. Recognition data, so far, has generally included information from white males and has not been remotely representative of what is needed to benefit humanity. One can see how many factors can easily contribute to bias in a growing space. This is an example of the dangers that can arise if the deployed tech faces a lower burden of proof in a rapidly changing healthcare climate.
The NeuroLex team typically works with large datasets and “segments voice data by age, race, microphone type, and gender.” Jim Schwoebel stresses that it is “important to test [any] algorithm with these biases in mind.” NeuroLex currently has “relatively normally distributed datasets in the USA population. With cohorts 8,000-10,000+ in size with the Voiceome study, it is slightly biased toward a younger population but relatively representative in terms of ethnicity and individuals with other chronic health conditions.” Jim Schwoebel states that, to help prevent biases, “self-reported survey information is used for the labels of the identities of participants in our clinical trials.” For COVID-19 in particular, “the datasets [NeuroLex has] curated are relatively small (<100 positive COVID-19 patients),” as this work is a new proof of concept driven by the current pandemic. They are “working to curate a larger dataset of over 1000 patients to scale across various languages, dialects, ages, genders, and microphones.” Schwoebel is cautious of making any claims before ensuring the data set is large and the algorithm is generalizable.
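One way to picture what testing an algorithm “with these biases in mind” could involve is to report accuracy separately for each segment rather than as a single overall number. The Python sketch below assumes this approach; it is not a description of NeuroLex’s pipeline. The function `accuracy_by_group`, the field names, and the four toy records are all hypothetical and invented purely for illustration.

```python
from collections import defaultdict

def accuracy_by_group(records, group_key):
    """Return {group value: accuracy} for predictions split by one attribute.

    Hypothetical helper: each record is a dict holding a true "label",
    a model "predicted" value, and self-reported attributes.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record["predicted"] == record["label"]:
            correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}

# Invented toy records, not real study data.
records = [
    {"age_band": "18-39", "microphone": "phone", "label": 1, "predicted": 1},
    {"age_band": "18-39", "microphone": "laptop", "label": 0, "predicted": 0},
    {"age_band": "60+", "microphone": "phone", "label": 1, "predicted": 0},
    {"age_band": "60+", "microphone": "laptop", "label": 0, "predicted": 0},
]

by_age = accuracy_by_group(records, "age_band")
by_mic = accuracy_by_group(records, "microphone")
print(by_age)
print(by_mic)
```

A large gap between segments, such as the one between the two age bands in this toy data, would be a signal that the model has not generalized and that more data from the weaker segment is needed before making claims.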
Even if an algorithm is generalizable and a large amount of data is gathered, the process of data collection itself poses ethical concerns. Many of the companies NeuroLex can be compared to, like 23andMe, have been scrutinized over their collection, ownership, and sale of data. As voice tech companies grow, we must learn from the genetic test kit hype and ensure that sending over a voice sample will not be as risky as sending a saliva sample. For example, 23andMe has been criticized over data privacy. Instead of simply using data for “research purposes,” which evokes a benefit to humanity at large and nonprofit work, 23andMe shared customer data with for-profit pharmaceutical giants. What’s more, 23andMe benefits monetarily from these partnerships. Even though 23andMe gathers this data after customers have consented, the implications are not clear to users. This lack of clarity has to change within all subsectors of biological data collection. Such implications provide a call to action in the new voice data space.
To address concerns of privacy and bias, NeuroLex uses a robust voice security framework that is HIPAA compliant to ensure the safety and confidentiality of patients and research volunteers. In addition to working to collect large amounts of data to mitigate biases, Jim Schwoebel has also created a Voice Ethics Framework to mitigate privacy and consent issues (Schwoebel, p. 374). He has featured this framework in his book Introduction to Voice Computing in Python, in which he encourages the novel voice computing community to prioritize user privacy and ethical data collection and distribution. Schwoebel mentions that users should be notified whenever they are being recorded and should have explicit knowledge of where their data is going and how it is being used.
Jim Schwoebel, Introduction to Voice Computing in Python, page 374
He also discusses loopholes in the Americans with Disabilities Act of 1990, which prohibits discrimination based on disability (or toward individuals with health conditions). Schwoebel cautions that, in the 1990 act, “there is no such mention of ‘screening for disabilities’ by employers with built in machine learning models or other such things that could be used for discrimination” (Schwoebel, p. 375). This is an example of how policy has not caught up to technological innovation. One can see how the simple act of screening can introduce an underlying bias or create the potential to act on such biases. To close such loopholes, Schwoebel calls for the creation of a “working panel of people around these topics to help ensure protections to people who may have disabilities and/or be screened with disabilities through voice or language models” (Schwoebel, p. 376). Schwoebel’s mention of a working group comes on the heels of the creation of neuroethics working groups within various brain initiatives.
NeuroLex’s team members have also published scholarship on voice data and ethics within The American Journal of Bioethics Neuroscience. In “‘Sorry I Didn’t Hear You.’ The Ethics of Voice Computing and AI in High Risk Mental Health Populations,” Dr. Chris Villongco and Fazal Khan discuss the importance of ethical considerations with the increasing development of voice computing, AI, and machine learning. Villongco and Khan specifically dissect the policy and ethical concerns regarding screening for mental health conditions in low-income and minority populations. Ultimately, the NeuroLex team “urges researchers developing AI tools for vulnerable populations to consider the full ethical, legal, and social impact of their work.”
These efforts are examples of how young yet growing spaces, like voice tech, can mitigate ethical concerns early on and prioritize the end user. To ensure that voice tech is beneficial to humanity in the long run, the companies that produce the technology should themselves treat users fairly throughout all steps of the process: innovation, distribution, and data collection.
As the need for telemedicine and telehealth solutions skyrockets during the COVID-19 pandemic, we must ensure that rapid-turnaround tech will be ethically sound years from now, when life hopefully goes back to normal. Yes, voice data could be enormously useful in diagnosing individuals. Yes, voice data could also be misused, much like genetic data from testing kits. It is up to the voice community to decide, early on, whether or not to adopt sound ethical practices upon which to innovate, distribute tech, and collect user data.
Want to cite this post?
Moss, A. (2020). COVID-19 and the Ethics of Democratizing Voice Computing. The Neuroethics Blog. Retrieved on , from http://www.theneuroethicsblog.com/2020/09/covid-19-and-ethics-of-democratizing.html