A course taught at the 34th European Summer School in Logic, Language and Information (ESSLLI 2023).

General Information

Lecturers Timothée Bernard and Pascal Amsili
Area Language and Computation (LaCo)
Level Foundational
Abstract This course provides an introduction to the fields of formal grammars and syntactic parsing, with a focus on their application to natural language. We introduce the concepts of formal language, formal grammar and automaton, and the notion of complexity reflected in the Chomsky-Schützenberger hierarchy. We show how natural language and popular syntactic formalisms fit into this picture. We review at length the evolution of parsing algorithms for natural language, from the classic chart-based paradigm to contemporary shift-reduce parsers, graph-based algorithms and CCG parsing. We discuss the impact of advances in machine learning on the field and introduce some key aspects of neural parsers, in particular their use of the vector representations of linguistic units produced by language models.
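As a taste of the chart-based paradigm mentioned above, the following is a minimal sketch of a CKY recognizer for a context-free grammar in Chomsky normal form; the grammar, lexicon and sentence are illustrative assumptions, not course material.

```python
def cky_recognize(words, lexicon, rules):
    """Return True iff `words` is derivable from the start symbol 'S'.

    lexicon: dict mapping a word to its set of preterminals
    rules:   dict mapping a pair (B, C) to the set of nonterminals A
             such that the grammar contains the rule A -> B C
    """
    n = len(words)
    # chart[i][j] holds the nonterminals deriving the span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # fill spans of length 1 from the lexicon
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexicon.get(w, ()))
    # combine adjacent spans, shortest first
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for B in chart[i][k]:
                    for C in chart[k][j]:
                        chart[i][j] |= rules.get((B, C), set())
    return 'S' in chart[0][n]

# Toy grammar: S -> NP VP, VP -> V NP
lexicon = {'she': {'NP'}, 'saw': {'V'}, 'stars': {'NP'}}
rules = {('NP', 'VP'): {'S'}, ('V', 'NP'): {'VP'}}
print(cky_recognize(['she', 'saw', 'stars'], lexicon, rules))  # True
```

The three nested span loops make the worst-case running time cubic in the sentence length, the classic complexity bound for chart parsing of context-free grammars.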

Content

Additional or selected references
Day 1 Formal languages and syntactic complexity. (Partee et al., 2012), (Sipser, 2006)
Day 2 The complexity of natural language. (Pullum & Gazdar, 1982), (Shieber, 1985), (Joshi, 1985)
Day 3 Historic algorithms for parsing. For readers of French: (Yvon & Demaille, 2016)
Day 4 Modern approaches to parsing. (Haruta et al., 2022), (Steedman & Baldridge, 2011)
Day 5 Neural networks and error propagation.
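The shift-reduce parsers covered on Day 4 can be sketched with the arc-standard transition system for dependency parsing. The transition policy below is a hypothetical toy (it attaches every word to the word before it) whose only purpose is to drive the loop; real parsers choose transitions with a trained classifier, as in Chen & Manning (2014).

```python
def arc_standard_parse(words, choose_transition):
    """Run the arc-standard system; return arcs as (head, dependent) index pairs."""
    stack = [0]                          # 0 is the artificial ROOT token
    buffer = list(range(1, len(words) + 1))
    arcs = []
    while buffer or len(stack) > 1:
        t = choose_transition(stack, buffer)
        if t == 'SHIFT':                 # move the next word onto the stack
            stack.append(buffer.pop(0))
        elif t == 'LEFT-ARC':            # top of stack heads the item below it
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif t == 'RIGHT-ARC':           # item below the top heads the top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

def greedy_chain(stack, buffer):
    # toy policy: shift everything, then reduce right to left,
    # yielding a head-initial chain rooted at ROOT
    return 'SHIFT' if buffer else 'RIGHT-ARC'

print(arc_standard_parse(['a', 'b', 'c'], greedy_chain))
# [(2, 3), (1, 2), (0, 1)]
```

Each word is shifted once and reduced once, so the parse runs in linear time; the price is that a single wrong transition can never be undone, which is the error-propagation problem taken up on Day 5.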

References

  • Bangalore, Srinivas & Aravind K. Joshi. 1999. Supertagging: An Approach to Almost Parsing. Computational Linguistics 25(2). 237–265. https://aclanthology.org/J99-2004.
  • Bar-Hillel, Yehoshua, Micha Perles & Eliahu Shamir. 1961. On formal properties of simple phrase structure grammars. STUF-Language Typology and Universals 14(1-4). 143–172.
  • Baucom, Eric, Levi King & Sandra Kübler. 2013. Domain Adaptation for Parsing. In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013, 56–64. Hissar, Bulgaria: INCOMA Ltd. Shoumen, BULGARIA. https://aclanthology.org/R13-1008.
  • Bojanowski, Piotr, Edouard Grave, Armand Joulin & Tomas Mikolov. 2017. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics 5. 135–146. https://doi.org/10.1162/tacl_a_00051.
  • Chen, Danqi & Christopher Manning. 2014. A Fast and Accurate Dependency Parser using Neural Networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 740–750. Doha, Qatar: Association for Computational Linguistics. http://www.aclweb.org/anthology/D14-1082.
  • Chomsky, Noam. 1995. The minimalist program. Cambridge, Mass.: MIT Press.
  • Collins, Michael. 2002. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), 1–8. Association for Computational Linguistics. https://doi.org/10.3115/1118693.1118694.
  • Collins, Michael & Brian Roark. 2004. Incremental Parsing with the Perceptron Algorithm. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), 111–118. Barcelona, Spain. https://doi.org/10.3115/1218955.1218970.
  • Devlin, Jacob, Ming-Wei Chang, Kenton Lee & Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186. Minneapolis, Minnesota: Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423.
  • Dozat, Timothy & Christopher D. Manning. 2017. Deep Biaffine Attention for Neural Dependency Parsing. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. https://openreview.net/forum?id=Hk95PK9le.
  • Gaddy, David, Mitchell Stern & Dan Klein. 2018. What’s Going On in Neural Constituency Parsers? An Analysis. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 999–1010. New Orleans, Louisiana: Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1091.
  • Gibson, Edward & James Thomas. 1999. Memory limitations and structural forgetting: the perception of complex ungrammatical sentences as grammatical. Language and Cognitive Processes 14(3). 225–248.
  • Goldberg, Yoav & Joakim Nivre. 2012. A Dynamic Oracle for Arc-Eager Dependency Parsing. In Proceedings of COLING 2012, 959–976. Mumbai, India: The COLING 2012 Organizing Committee. https://www.aclweb.org/anthology/C12-1059.
  • Gupta, Abhijeet, Gemma Boleda, Marco Baroni & Sebastian Padó. 2015. Distributional vectors encode referential attributes. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 12–21. Lisbon, Portugal: Association for Computational Linguistics. https://doi.org/10.18653/v1/D15-1002.
  • Haruta, Izumi, Koji Mineshima & Daisuke Bekki. 2022. Implementing Natural Language Inference for comparatives. Journal of Language Modelling 10(1). 139–191. https://doi.org/10.15398/jlm.v10i1.294.
  • Hockenmaier, Julia & Mark Steedman. 2007. CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank. Computational Linguistics 33(3). 355–396. https://doi.org/10.1162/coli.2007.33.3.355.
  • Huang, Liang, Suphan Fayong & Yang Guo. 2012. Structured Perceptron with Inexact Search. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 142–151. Montréal, Canada: Association for Computational Linguistics. https://aclanthology.org/N12-1015.
  • Joshi, Aravind K. 1985. Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions? In David Dowty, Lauri Karttunen & Arnold Zwicky (eds.), Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives (Studies in Natural Language Processing), 206–250. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511597855.007.
  • Köhn, Arne. 2015. What’s in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2067–2073. Lisbon, Portugal: Association for Computational Linguistics. https://doi.org/10.18653/v1/D15-1246.
  • Kudo, Taku & Yuji Matsumoto. 2002. Japanese Dependency Analysis using Cascaded Chunking. In COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002). https://aclanthology.org/W02-2016.
  • Lê, Minh & Antske Fokkens. 2017. Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, 677–687. Valencia, Spain: Association for Computational Linguistics. https://www.aclweb.org/anthology/E17-1064/.
  • Lewis, Mike & Mark Steedman. 2014. A* CCG Parsing with a Supertag-factored Model. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 990–1000. Doha, Qatar: Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1107.
  • Lowerre, Bruce T. 1976. The HARPY Speech Recognition System. Pittsburgh, PA, USA: Carnegie Mellon University Ph.D. Thesis. https://apps.dtic.mil/docs/citations/ADA035146.
  • Marcus, Mitchell P., Beatrice Santorini & Mary Ann Marcinkiewicz. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19(2). 313–330. https://www.aclweb.org/anthology/J93-2004.
  • de Marneffe, Marie-Catherine, Christopher D. Manning, Joakim Nivre & Daniel Zeman. 2021. Universal Dependencies. Computational Linguistics 47(2). 255–308. https://doi.org/10.1162/coli_a_00402.
  • McDonald, Ryan, Fernando Pereira, Kiril Ribarov & Jan Hajič. 2005. Non-Projective Dependency Parsing using Spanning Tree Algorithms. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, 523–530. Vancouver, British Columbia, Canada: Association for Computational Linguistics. https://aclanthology.org/H05-1066.
  • Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S Corrado & Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani & K. Q. Weinberger (eds.), Advances in Neural Information Processing Systems 26, 3111–3119. Curran Associates, Inc. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf.
  • Mrini, Khalil, Franck Dernoncourt, Quan Hung Tran, Trung Bui, Walter Chang & Ndapa Nakashole. 2020. Rethinking Self-Attention: Towards Interpretability in Neural Parsing. In Findings of the Association for Computational Linguistics: EMNLP 2020, 731–742. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.65.
  • Nivre, Joakim. 2005. Dependency Grammar and Dependency Parsing. Tech. rep. MSI 05133. Växjö University, School of Mathematics & Systems Engineering. http://stp.lingfil.uu.se/~nivre/docs/05133.pdf.
  • Nivre, Joakim. 2010. Dependency Parsing. Language and Linguistics Compass 4(3). 138–152. https://doi.org/10.1111/j.1749-818X.2010.00187.x.
  • Nivre, Joakim, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty & Daniel Zeman. 2016. Universal Dependencies v1: A Multilingual Treebank Collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), 1659–1666. Portorož, Slovenia: European Language Resources Association (ELRA). https://aclanthology.org/L16-1262.
  • Partee, Barbara H., Alice G. ter Meulen & Robert E. Wall. 2012. Mathematical Methods in Linguistics (Studies in Linguistics and Philosophy 30). Springer Science & Business Media.
  • Pennington, Jeffrey, Richard Socher & Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Empirical Methods in Natural Language Processing (EMNLP), 1532–1543. http://www.aclweb.org/anthology/D14-1162/.
  • Peters, Matthew E., Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee & Luke Zettlemoyer. 2018. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2227–2237. Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1202.
  • Pullum, Geoffrey K. & Gerald Gazdar. 1982. Natural languages and context-free languages. Linguistics and Philosophy 4. 471–504.
  • Sagae, Kenji & Alon Lavie. 2005. A Classifier-Based Parser with Linear Run-Time Complexity. In Proceedings of the Ninth International Workshop on Parsing Technology, 125–132. Vancouver, British Columbia: Association for Computational Linguistics. https://www.aclweb.org/anthology/W05-1513.
  • Shieber, Stuart M. 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy 8(3). 333–343.
  • Sipser, Michael. 2006. Introduction to the Theory of Computation. 2nd edn. Boston: Thomson Course Technology.
  • Stabler, Edward P. 2011. Computational Perspectives on Minimalism. In Cedric Boeckx (ed.), The Oxford Handbook of Linguistic Minimalism. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199549368.013.0027.
  • Sutton, Richard S. & Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. 2nd edn. (Adaptive Computation and Machine Learning series). MIT Press. https://mitpress.mit.edu/books/reinforcement-learning-second-edition.
  • Steedman, Mark & Jason Baldridge. 2011. Combinatory Categorial Grammar. In Robert D. Borsley & Kersti Börjars (eds.), Non-Transformational Syntax, 181–224. Wiley-Blackwell. https://doi.org/10.1002/9781444395037.ch5.
  • Steedman, Mark. 2000. The syntactic process. Vol. 24. MIT Press.
  • Tenney, Ian, Dipanjan Das & Ellie Pavlick. 2019. BERT Rediscovers the Classical NLP Pipeline. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 4593–4601. Florence, Italy: Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1452.
  • Tenney, Ian, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das & Ellie Pavlick. 2019. What do you learn from context? Probing for sentence structure in contextualized word representations. In International Conference on Learning Representations. https://openreview.net/forum?id=SJzSgnRcKX.
  • Tian, Yuanhe, Yan Song, Fei Xia & Tong Zhang. 2020. Improving Constituency Parsing with Span Attention. In Findings of the Association for Computational Linguistics: EMNLP 2020, 1691–1703. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.153.
  • Vijay-Shanker, K. & David J. Weir. 1994. The equivalence of four extensions of context-free grammars. Mathematical Systems Theory 27. 511–546.
  • Yvon, François & Akim Demaille. 2016. Théorie des langages [Theory of languages]. Lecture notes, in French. Link.