LING78100/73800: Methods in Computational Linguistics I

Fall 2022

CUNY Graduate Center

Instructor: Prof. Kyle Gorman
Practicum leader: Cameron Gibson
Lecture: Tuesday 4:15-6:15, GC 3207
Practicum: Friday 4:15-6:15, GC 7314
Office hours: Monday 1-3, GC 7400.05 and by request

Synopsis

This course is the first in a two-semester series introducing modern software development. The intended audience is students interested in speech and language processing technologies, though the materials will be beneficial to all language researchers.

Objectives

Using the Python programming language, students will be able to write programs which count the frequencies of various linguistic phenomena in text. They will be able to process text stored in various structured data formats. They will come to understand how computers encode multilingual text. They will learn the basic principles of command-line design and master regular expressions.

Materials

Some readings will be assigned. Students are strongly encouraged to bring a laptop computer to the lecture and practicum. Students are also welcome to use the Computational Linguistics Laboratory (7400.13) for practice and assignments.

Assignments

Assignments will take the form of small software development projects, each accompanied by a write-up describing the general approach taken and any challenges encountered. Students will usually be able to verify the technical correctness of their code by running a provided unit test. Students will also be graded on the readability of their code and the quality of the write-up. We will use GitHub Classroom for assignment turn-in.

The final assignment will be an open-ended project which will involve collecting basic statistics (e.g., counts) of some linguistic phenomenon from either raw text or structured data. Students are encouraged to conceive of projects relevant to their research interests. Students should discuss project plans with the instructor during office hours to confirm that the project is both feasible and of appropriate scope. Because of the open-ended nature of the final assignment, unit tests will not be provided.

Grading

Eighty percent of each student's grade will be derived from the assignments; the remaining 20% will be reserved for participation and attendance. Assignments must be submitted on time or they will receive a grade of 0 (barring a documented emergency).

Accommodations

The instructor will attempt to provide all reasonable accommodations to students upon request. If you believe you are covered under the Americans with Disabilities Act, please direct accommodation requests to Vice President for Student Affairs Matthew G. Schoengood.

Attendance

Students are expected to attend all lectures and practica. However, students who have reason to believe they may be contagious for COVID-19 or other infectious diseases should attend the course online after contacting the instructor. Other absences will not be excused, and the instructor reserves the right to tie grades to attendance records. The instructor and practicum leader are not responsible for reviewing materials missed due to absence.

Integrity

In line with the Student Handbook policies on plagiarism, students are expected to complete their own work. However, a student is permitted to collaborate with another student during the coding phase of an assignment so long as they: do not share lines of code with each other, mutually disclose their collaboration in their write-ups, and do not collaborate at all on their write-ups.

The instructor reserves the right to refer violations to the Academic Integrity Officer.

Respect

For the sake of privacy, students are asked not to record lectures. Students are expected to be considerate of their peers and to treat them with respect during class discussions.

Schedule

(Please note that this is subject to change.)

T 8/30  Lecture: Syllabus and motivations. Materials: Notes, Lecture. Readings: Bird et al. §1, Joyner §1, Shaw preface.
F 9/2  No class. Materials: Notes.
T 9/6  Lecture: Literals; variables; operators. Materials: Notebook, Lecture. Readings: Joyner §2, Shaw §1-14.
F 9/9  Practicum. Materials: Notebook, Practicum.
T 9/13  HW1 due [solution]. Lecture: Control flow. Materials: Notebook. Readings: Joyner §3-3.3, Shaw §27-33.
F 9/16  Practicum. Materials: Notebook, Practicum.
T 9/20  HW2 due [solution]. Lecture: Indexing. Materials: Notebook, Lecture. Readings: Joyner §4.2-4.3, Shaw §34, Shaw §36-38.
F 9/23  Practicum. Materials: Notebook, Practicum.
T 9/27  No class.
F 9/30  "Practicum" (Kyle): Functions. Materials: Notebook, Practicum. Readings: Joyner §3.4, Shaw §18-19, Shaw §21.
T 10/4  No class.
F 10/7  Practicum. Materials: Notebook, Practicum.
T 10/11  HW3 due [solution]. Lecture: File I/O; generators; imports. Materials: Notebook, Slides, Practicum. Readings: Bird et al. §3-3.2, Joyner §4.4, Shaw §15-17.
F 10/14  Practicum. Materials: Notebook, Practicum.
T 10/18  HW4 due [solution]. Lecture: Text encoding. Materials: Slides, Notebook, Lecture. Readings: Bird et al. §3.3, Gorman, Spolsky, chardet, unicodedata.
F 10/21  Practicum. Materials: Notebook, Practicum. Readings: random.
T 10/25  Lecture: Hash-backed containers; comprehensions. Materials: Notebook, Lecture. Readings: Kuchling, collections, Joyner §4.5, Shaw §39.
F 10/28  Practicum. Materials: Notebook, Practicum.
T 11/1  HW5 due [solution]. Lecture: Sorting; searching. Materials: Notebook, Lecture. Readings: Joyner §5.2, Sorting HOWTO.
F 11/4  Practicum. Materials: Notebook, Practicum.
T 11/8  HW6 due [solution] [style notes]. Lecture: Regular expressions. Materials: Slides, Notebook, Lecture. Readings: Bird et al. §3.4, Regular expression HOWTO, re.
F 11/11  Practicum. Materials: Notebook, Practicum.
T 11/15  Lecture: Module layout; command-line design. Materials: Slides, Lecture. Readings: argparse, Church.
F 11/18  Practicum. Materials: Notes, Notebook, Practicum.
T 11/22  HW7 due [solution]. Lecture: CSV; TSV; JSON; YAML. Materials: Slides, Notebooks 1 2 3 4, Lecture. Readings: csv, json, yaml.
F 11/25  No class.
T 11/29  Term paper idea [specification]. Lecture: Classes. Materials: Notebook, Lecture. Readings: Joyner §5.1, Shaw §40-44.
F 12/2  Practicum. Materials: Notebook, Practicum.
T 12/6  HW8 due [solution]. Lecture: Unit testing. Materials: Slides, Notebook, Lecture. Readings: unittest.
F 12/9  Practicum. Materials: Notebook, Practicum.
T 12/13  Lecture: NLTK. Materials: Slides 1 2, Lecture. Readings: Bird et al. §5.
W 12/14  Reading Day.
M 12/26  Term paper due.

References