Ewan Dunbar
ˈju ən ˈdʌn ˌbɑɹ
Speech · Phonology · Learning
Modelling speech perception and learning · Sound inventories · Low resource speech technology · Decoding neural network representations · Controllable speech synthesis · Open and replicable science
Use the link bar above to navigate between my active research projects.
Papers referenced below:
Team members on this work:
  • Amelia Kimball (post-doctoral researcher, Université de Paris)
  • Aixiu An (PhD student, Université de Paris)
  • Juliette Millet (PhD student, Université de Paris/Inria)
  • Nika Jurov (now PhD student, University of Maryland)
  • Emmanuel Dupoux (Inria/École Normale Supérieure/Facebook Research)
  • Sébastien Gadioux (now master's student, Université de Paris)
  • Adèle Richard (now master's student, Université de Paris)
  • Nicolas Brasset (now master's student, Université de Paris)
  • Clara Delacourt (master's student, Université de Paris)
  • Antoine Hédier (now master's student, Université de Paris)

Dunbar and Dupoux (2016) established that the sound inventories of human languages show three curious typological properties: their abstract geometries tend to be regular in three ways.

Each of these hypercubes shows the inventory of consonants or vowels in some (real or hypothetical) language; each of the three or four dimensions is a different phonological feature.

  • The pair on the left illustrates (feature) economy, a property already documented by Clements (2003) and by Mackie and Mielke (2011): if two inventories have the same number of sounds, those sounds tend, all things being equal, to require relatively few phonological features to characterize them. Inventory (b) would therefore be more likely than inventory (a), all things being equal (though all things are not equal in that case, as languages tend to have nasal sounds).
  • The middle pair illustrates local symmetry: if two inventories have the same number of sounds, using the same number of features, those sounds tend to make relatively many “minimal oppositions”, pairs of sounds which differ in only one feature. Inventory (b) has more such pairs (like [u] and [o]), and this geometry is more typologically frequent.
  • The right-hand pair illustrates global symmetry: if two inventories are matched geometrically on the other two properties, they tend to have sounds that are more symmetrically distributed: few features have far more sounds at one of their two values than at the other (for example, far more front vowels than back vowels, or vice versa). Inventory (b) is more globally symmetrical, and more typical.
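
To make these notions concrete, the sketch below shows one way the three properties can be quantified once each sound is encoded as a vector of binary feature values. This is a minimal illustration, not the GEOMPHON code: the toy vowel inventory, the feature names, and the particular scores (sounds per contrastive feature for economy, the count of minimal oppositions for local symmetry, and per-feature imbalance for global symmetry) are all assumptions made for the example.

```python
# Minimal sketch (not the GEOMPHON code): quantifying economy, local symmetry,
# and global symmetry for a toy inventory of sounds given as binary feature vectors.
from itertools import combinations

# Hypothetical five-vowel inventory over three invented features (high, back, round).
inventory = {
    "i": (1, 0, 0),
    "e": (0, 0, 0),
    "u": (1, 1, 1),
    "o": (0, 1, 1),
    "a": (0, 1, 0),
}

def economy(inv):
    """Feature economy (in the spirit of Clements 2003): sounds per contrastive feature.
    A feature counts as contrastive if the inventory uses both of its values."""
    vectors = list(inv.values())
    n_features = sum(
        len({v[f] for v in vectors}) > 1 for f in range(len(vectors[0]))
    )
    return len(vectors) / n_features if n_features else float("inf")

def minimal_oppositions(inv):
    """Local symmetry proxy: number of sound pairs differing in exactly one feature."""
    return sum(
        sum(a != b for a, b in zip(x, y)) == 1
        for x, y in combinations(inv.values(), 2)
    )

def global_imbalance(inv):
    """Global symmetry proxy: total imbalance between the two values of each feature
    (0 = perfectly balanced; larger values = more lopsided)."""
    vectors = list(inv.values())
    return sum(
        abs(2 * sum(v[f] for v in vectors) - len(vectors))
        for f in range(len(vectors[0]))
    )

print(economy(inventory), minimal_oppositions(inventory), global_imbalance(inventory))
```

On this toy inventory, the script reports an economy of 5/3 (five sounds over three contrastive features), four minimal oppositions (including [u]–[o]), and a total imbalance of 3.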

The idea that there are geometric tendencies in sound inventories goes back at least to Martinet and Trubetzkoy, but, as yet, there is no good explanation for why inventories have the geometries they do. One of the principal goals of the GEOMPHON project (funded by the Agence Nationale de la Recherche, €299k, 2018–2021) is to test explanations for these tendencies. The guiding idea is that historical change is more likely to add some sounds to an inventory than others, depending on whether the resulting inventory is relatively geometrically regular or irregular. If so, psycholinguistic experiments probing some of the conditions that would need to hold for such historical change to take place should show active psychological evidence of these preferences.

For example, although Dutch has [p t k b d], an almost complete set of voiced and voiceless stops, it has historically lacked [ɡ]. This is an odd inventory geometry: adding [ɡ] would make the language more economical, more locally symmetrical, and more globally symmetrical. With respect to the Dutch stop inventory, then, [ɡ] sits in a geometrically privileged position, and, according to the general logic, is very likely to enter the language. Indeed, today, [ɡ] has entered the language through loanwords: most people say [ɡuɡəl] for Google, rather than the [ɣuɣəl] or [xuxəl] that would be dictated by Dutch orthography.

In my group, we are currently testing a speech perception explanation in a pre-registered study (see Open and replicable science). The hypothesis is that sounds which are not part of a language’s inventory, but which would give rise to a geometrically more regular inventory, are easier for native listeners to perceive. For Dutch speakers, [ɡ] should be relatively easy to distinguish from acoustically similar sounds (like [d] and [ɣ]), whereas a sound in a less geometrically privileged position, such as the ejective [kʼ], which reduces the economy of the system because it requires a new featural contrast, should be more difficult to distinguish from the sounds of Dutch. The hypothesis predicts positive effects of the geometric properties on performance in a speech discrimination task, even when effects of acoustic similarity are controlled for.
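
Concretely, the prediction amounts to a positive coefficient for a geometric-regularity predictor in a model of trial-level discrimination performance that also includes acoustic similarity as a control. The sketch below illustrates that analysis logic only; it is not the pre-registered analysis. The data file, the column names, and the plain logistic regression are assumptions (a real analysis would at least add by-participant and by-item random effects).

```python
# Hypothetical illustration of the predicted pattern, not the pre-registered analysis.
import pandas as pd
import statsmodels.formula.api as smf

# Invented trial-level data: one row per discrimination trial, with a binary
# "correct" response, an acoustic distance between the sounds in the trial, and
# a score for how much geometric regularity the non-native sound would add to
# the native inventory.
trials = pd.read_csv("trials.csv")

# Logistic regression: accuracy as a function of geometric regularity,
# controlling for acoustic distance.
model = smf.logit(
    "correct ~ geometric_regularity + acoustic_distance", data=trials
).fit()
print(model.summary())

# The hypothesis predicts a reliably positive coefficient for
# geometric_regularity over and above any effect of acoustic_distance.
```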

Additional hypotheses about these tendencies, complementary to the perception hypothesis, will be tested later: artificial language experiments will assess whether there is an effect of geometric privilege when participants learn to group arbitrary classes of sounds.

References

Becker, S., and Hinton, G. (1992). Self-organizing neural network that discovers surfaces in random-dot stereograms. Nature 355(6356): 161.

Benkí, J. R. (2001). Place of articulation and first formant transition pattern both affect perception of voicing in English. Journal of Phonetics 29:1–22.

Best, C. T. (1994). The emergence of native-language phonological influences in infants: A perceptual assimilation model. In The development of speech perception: The transition from speech sounds to spoken words, 167–224.

Chaabouni, R., Dunbar, E., Zeghidour, N., and Dupoux, E. (2017). Learning weakly supervised multimodal phoneme embeddings. In Proc. INTERSPEECH 2017.

Chen, H., Leung, C. C., Xie, L., Ma, B., & Li, H. (2015). Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: A feasibility study. In Proc. INTERSPEECH 2015.

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.

Clements, G. N. (2003). Feature economy in sound systems. Phonology, 20(3), 287–333.

Dale, R., & Duran, N. D. (2011). The cognitive dynamics of negated sentence verification. Cognitive Science 35(5): 983–996.

Dillon, B., Dunbar, E., and Idsardi, W. (2013). A single-stage approach to learning phonological categories. Cognitive Science 37(2): 344–377.

Dillon, B., and Wagers, M. (2019). Approaching gradience in acceptability with the tools of signal detection theory. Ms. https://people.umass.edu/bwdillon/publication/dillonwagers_inprep/dillonwagers_inprep.pdf

Dunbar, E. (2019). Generative grammar, neural networks, and the implementational mapping problem: Response to Pater. Language 95(1): e87–e98.

Dunbar, E., Algayres, R., Karadayi, J., Bernard, M., Benjumea, M., Cao, X-N., Miskic, L., Dugrain, C., Ondel, L., Black, A. W., Besacier, L., Sakti, S., and Dupoux, E. (2019). The Zero Resource Speech Challenge 2019: TTS without T. INTERSPEECH 2019: 20th Annual Conference of the International Speech Communication Association.

Dunbar, E., Cao, X-N., Benjumea, J., Karadayi, J., Bernard, M., Besacier, L., Anguera, X., and Dupoux, E. (2017). The Zero Resource Speech Challenge 2017. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

Dunbar, E., and Dupoux, E. (2016). Geometric constraints on human speech sound inventories. Frontiers in Psychology: Language Sciences 7, article 1061.

Dunbar, E., Synnaeve, G., and Dupoux, E. (2015). Quantitative methods for comparing featural representations. In Proceedings of ICPhS.

Dyer, C., Kuncoro, A., Ballesteros, M., & Smith, N. (2016). Recurrent neural network grammars. Preprint: arXiv:1602.07776.

Feldman, N., Griffiths, T., Goldwater, S., & Morgan, J. (2013). A role for the developing lexicon in phonetic category acquisition. Psychological Review 120(4): 751–778.

Fodor, J., & Pylyshyn, Z. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition 28(1-2): 3–71.

Gelman, A. (2015). The connection between varying treatment effects and the crisis of unreplicable research: A Bayesian perspective. Journal of Management, 41, 632–643.

Guenther, F., and Gjaja, M. (1996). The perceptual magnet effect as an emergent property of neural map formation. The Journal of the Acoustical Society of America 100(2): 1111–1121.

Gulordava, K., Bojanowski, P., Grave, E., Linzen, T., and Baroni, M. (2018). Colorless green recurrent networks dream hierarchically. Preprint: arXiv:1803.11138.

Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.

Le Godais, G., Linzen, T., and Dupoux, E. (2017). Comparing character-level neural language models using a lexical decision task. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers: 125–130.

Li, B., and Zen, H. (2016). Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN-based Statistical Parametric Speech Synthesis. Proc. INTERSPEECH 2016.

Liberman, A. M. (1963). A motor theory of speech perception. In Proceedings of the Speech Communication Seminar, Stockholm. Speech Transmission Lab.

Linzen, T., Dupoux, E., and Goldberg, Y. (2016). Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics 4: 521–535.

Mackie, S., & Mielke, J. (2011). Feature economy in natural, random, and synthetic inventories. In Clements, G. N., and Ridouane, R. (eds.), Where Do Phonological Features Come From? Cognitive, physical and developmental bases of distinctive speech categories. John Benjamins Publishing, 43–63.

Maldonado, M., Dunbar, E., and Chemla, E. (2019). Mouse tracking as a window into decision making. Behavior Research Methods 51(3): 1085–1101.

Marcus, G. (2001). The algebraic mind. Cambridge: MIT Press.

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. Cambridge: MIT Press.

Maye, J., Werker, J., and Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition 82: B101–B111.

McCoy, R. T., Linzen, T., Dunbar, E., and Smolensky, P. (2019). RNNs implicitly represent tensor product representations. ICLR (International Conference on Learning Representations).

Mikolov, T., Yih, W. T., and Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 746–751.

Millet, J., Jurov, N., and Dunbar, E. (2019). Comparing unsupervised speech learning directly to human performance in speech perception. In Proceedings of the 41st Annual Meeting of the Cognitive Science Society (Cog Sci 2019).

Palangi, H., Smolensky, P., He, X., & Deng, L. (2018). Question-answering with grammatically-interpretable representations. In Thirty-Second AAAI Conference on Artificial Intelligence.

Pashler, H., & Wagenmakers, E. J. (2012). Editors’ introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science, 7(6), 528–530.

Riad, R., Dancette, C., Karadayi, J., Zeghidour, N., Schatz, T., & Dupoux, E. (2018). Sampling strategies in Siamese Networks for unsupervised speech representation learning. Proc. INTERSPEECH 2018.

Riochet, R., Castro, M. Y., Bernard, M., Lerer, A., Fergus, R., Izard, V., & Dupoux, E. (2018). IntPhys: A framework and benchmark for visual intuitive physics reasoning. Preprint: arXiv:1803.07616.

Schatz, T., Feldman, N., Goldwater, S., Cao, X-N., and Dupoux, E. (2019). Early phonetic learning without phonetic categories: Insights from machine learning. Preprint: https://psyarxiv.com/fc4wh/

Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1-2): 159–216.

Synnaeve, G., Schatz, T., and Dupoux, E. (2014). Phonetics embedding learning with side information. In 2014 IEEE Spoken Language Technology Workshop (SLT), 106–111.

Thiolliere, R., Dunbar, E., Synnaeve, G., Versteegh, M., and Dupoux, E. (2015). A hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modeling. Proc. INTERSPEECH 2015.

Turing, A. M. (1950). Computing machinery and intelligence. Mind 59(236): 433–460.

Vallabha, G., McClelland, J. L., Pons, F., Werker, J., and Amano, S. (2007). Unsupervised learning of vowel categories from infant-directed speech. Proceedings of the National Academy of Sciences 104(33): 13273–13278.

Versteegh, M., Thiolliere, R., Schatz, T., Cao, X. N., Anguera, X., Jansen, A., & Dupoux, E. (2015). The Zero Resource Speech Challenge 2015. Proc. INTERSPEECH 2015.

Warstadt, A., Singh, A., & Bowman, S. R. (2019). Neural Network Acceptability Judgments. Transactions of the Association for Computational Linguistics 7: 625–641.

Yeung, H., and Werker, J. (2009). Learning words’ sounds before learning how words sound: 9-month-olds use distinct objects as cues to categorize speech information. Cognition 113(2): 234–243.