Applying NLP Techniques to TC39’s Meeting Notes


This report details a coding project that uses Natural Language Processing (NLP) techniques to extract and analyze information from TC39 meeting notes, a crucial part of the ECMAScript standardization process. The background provides context on ECMAScript, TC39, and the significance of the meeting notes in shaping web development. The report then surveys a range of NLP techniques, from sentiment analysis to semantic role labeling, and describes the implementation: text extraction followed by sentiment analysis, semantic similarity, and keyword extraction. The resulting JSON data structure offers a clear representation of the extracted information, while sentiment graphs visually depict the emotional dynamics within proposals. The project aligns with a broader goal of enhancing transparency and collaboration within the ECMAScript standardization process, giving developers nuanced insights into language changes and committee discussions.


The ECMAScript standard stands as a foundational element, providing the guidelines upon which the scripting language JavaScript is built. The continuous improvement of ECMAScript is steered by the TC39 committee, a vital body within Ecma International. This committee, composed of representatives from diverse organizations, plays a pivotal role in shaping the standard, ensuring its consistency and adaptability across various implementations.

The focus of this coding project is to explore Natural Language Processing (NLP) techniques for extracting and analyzing information embedded within the meeting notes of the TC39 committee. The meeting notes serve as a comprehensive record of discussions, decisions, and proposals, reflecting the dynamic nature of the ECMAScript standard evolution.

This report details the background of ECMAScript and the role of the TC39 committee, emphasizing the significance of its meeting notes in tracking the development of the language. It then introduces several NLP techniques, ranging from sentiment analysis to semantic role labeling, and highlights their relevance to understanding the nuances of textual content.

The implementation section provides a step-by-step walkthrough of the script’s design, explaining how text extraction, timestamp identification, proposal section recognition, and utterance processing are performed. The use of frameworks such as TextBlob, Universal Sentence Encoder, Yake, and others is detailed, showcasing a first-draft approach to information extraction and analysis.

Visual representation in the form of sentiment graphs enhances the interpretability of the data, allowing for a deeper understanding of sentiment dynamics within each proposal. Additionally, the JSON format output provides a structured and readable overview of the processed data, facilitating further analysis or sharing of results.

This project aims to create a tool that applies NLP techniques to TC39’s meeting notes. The goal is to extract valuable information from the corpus that gives a deeper understanding of how a proposal is discussed, of participants’ attitudes towards it, and of how these change over time.



ECMA-262 is the scripting language specification that serves as the standard upon which JavaScript is based. It is developed and maintained by Ecma International, a standards organization. ECMA-262 provides the rules and guidelines that a scripting language must follow to be considered ECMAScript-compliant.

JavaScript is the most well-known implementation of ECMAScript, but other languages like JScript and ActionScript also adhere to the ECMAScript standard. The goal of ECMA-262 is to standardize the scripting language to ensure interoperability and consistency across different web browsers and environments.

The ECMAScript specification evolves over time, with new features and improvements added to meet the demands of developers and the evolving landscape of web development. Each version of ECMAScript introduces new features, enhancements, and bug fixes. Developers refer to the versions either by edition number or by year. For example, ECMAScript 6 (ES6), also known as ECMAScript 2015, brought significant enhancements to the language. Subsequent editions, such as ECMAScript 2016 and ECMAScript 2017, have continued to build upon the standard. At the time of writing, the most recent edition is the 14th, ECMAScript 2023.

TC39 committee

The TC39 (Technical Committee 39) is a committee within Ecma International responsible for the standardization of the ECMAScript programming language. The primary goal of TC39 is to develop, maintain, and evolve the ECMAScript standard.

TC39 is composed of representatives from various organizations, including browser vendors, language designers, interested parties from the software development community, and academia. The committee collaborates to propose and discuss new features, improvements, and changes to ECMAScript.

The process of introducing a new feature or modifying an existing one typically involves several stages within TC39:

  1. Stage 0: An initial idea or proposal is presented as a strawman. This is an informal stage to get feedback and initial thoughts from the committee.

  2. Stage 1: The proposal is formalized, and its high-level design and motivation are presented to the committee. If accepted, it moves to the next stage.

  3. Stage 2: The proposal is further refined, and a preliminary specification is created. This stage involves more detailed discussions and collaboration on the proposed feature.

  4. Stage 3: The proposal is considered feature-complete, and a complete specification is provided. At this stage, it is ready for initial testing and feedback from implementers.

  5. Stage 4: The proposal has received feedback, has been tested, and is ready to be included in the ECMAScript standard. Once the committee reaches consensus, the feature is added to the standard.

The TC39 committee plays a crucial role in the ongoing development and improvement of ECMAScript, ensuring that the language evolves to meet the needs of developers and the changing landscape of web development. The committee’s work has a direct impact on the features and capabilities available to developers when writing JavaScript or other languages based on ECMAScript.

Meeting Notes

The TC39 meeting notes are documents that summarize the discussions, decisions, and outcomes of the committee’s meetings. These notes provide a detailed record of what was discussed during a particular meeting, including proposed language features, changes to the ECMAScript standard, and any other relevant topics.

Here are some key points about these meeting notes:

  1. Agenda and Topics: Meeting notes typically include an agenda that outlines the topics to be discussed during the meeting. This could include specific proposals for new language features, updates on existing proposals, discussions about language design principles, and more.

  2. Attendees: The notes often list the participants who attended the meeting, including representatives from various organizations, language designers, and interested parties. This provides transparency about who is contributing to the discussions.

  3. Discussion and Decisions: For each agenda item, the notes summarize the discussions that took place. This includes the viewpoints expressed by different participants, potential concerns, and any decisions or outcomes reached by the committee. It provides insight into the reasoning behind the decisions made during the meeting.

  4. Proposal Updates: If there are updates on specific language proposals (features being considered for inclusion in ECMAScript), the meeting notes will highlight these updates. This could include advancements to a higher stage in the proposal process or changes based on feedback received.

  5. Actions and Next Steps: The notes often include action items and next steps that arise from the discussions. These could involve further research, addressing concerns, or preparing materials for the next meeting.

  6. Links to Materials: Meeting notes may include links to additional materials, such as presentation slides, documents, or external references that were discussed during the meeting.

By reviewing these meeting notes, developers, implementers, and other interested parties can stay informed about the ongoing work of the TC39 committee. The notes allow the broader community to understand the rationale behind language changes, track the progress of specific proposals, and provide feedback on the evolving ECMAScript standard.

Markdown Files

At the root level of the TC39 notes repository on GitHub, there are various files and folders related to the TC39 project. Among them is a designated folder called “meetings”, where the meeting notes are stored.

This folder contains one subfolder for each month in which a meeting has been held, which amounts to roughly one folder for every two months since May 2012. Each of these folders contains the meeting notes, as well as other files relevant to the meeting.

However, the only files relevant to this project are the meeting notes themselves.

Formatting of the meeting notes

The meeting notes are formatted so that each utterance can be tied to a specific person. This makes each person’s contribution to the current proposal easy to distinguish from those of the other meeting participants. Each line can be broken down as follows:

  1. Speaker’s Acronym: The three-letter acronym at the beginning of each line represents the identifier of the person speaking. These acronyms are usually unique to each participant and are used consistently throughout the meeting notes.

  2. Colon (:) Separator: The colon serves as a separator between the speaker’s acronym and the content of their utterance. It visually distinguishes the speaker from their comment.

  3. Utterance Content: Following the colon, the actual content of what the person is saying is presented. This is the substance of the participant’s contribution to the discussion, and it could include statements, questions, proposals, concerns, or any other relevant remarks.

Here’s an example from feb-01.md in the 2023-01 folder:

ABC: Just a note to SYG to follow up with offline and to everyone interested in implementing this and trying implementation...

DEF: Ephemeron collection.

ABC: Thank you. I was trying to remember the word. By doing the transpose thing, the case that needs to be cheap becomes cheap.

DEF: So there are a couple different implementation strategies. Trade off, the big O notation of the run, the get, or the wrap. (...)

In this example, ABC and DEF are three-letter acronyms representing different participants. After the colon, each line presents the content of the participant’s utterance or comment.
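The speaker format described above lends itself to a simple regular expression. The following is a sketch of a hypothetical helper; parse_utterances is not taken from the project’s code. It assumes that each utterance starts on its own line with an uppercase acronym followed by a colon. To be lenient, it accepts acronyms of two to four letters, and it ignores continuation lines that lack a speaker prefix.

```python
import re

# Assumed format: "ABC: text" at the start of a line. Two to four uppercase
# letters are accepted to tolerate acronyms that are not exactly three letters.
SPEAKER_LINE = re.compile(r"^([A-Z]{2,4}):\s*(.+)$")


def parse_utterances(markdown: str) -> list[tuple[str, str]]:
    """Return (speaker, text) pairs for every line that starts with a speaker acronym."""
    utterances = []
    for line in markdown.splitlines():
        match = SPEAKER_LINE.match(line.strip())
        if match:  # lines without a speaker prefix are skipped in this sketch
            utterances.append((match.group(1), match.group(2)))
    return utterances


example = """ABC: Just a note to SYG to follow up with offline.

DEF: Ephemeron collection.

ABC: Thank you. I was trying to remember the word.
"""

for speaker, text in parse_utterances(example):
    print(f"{speaker} -> {text}")
```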

An Overview of NLP techniques

  1. Sentiment Analysis is a natural language processing technique designed to discern and quantify the emotional tone expressed in a piece of text, typically categorized as positive, negative, or neutral sentiment. This process involves the use of machine learning algorithms to analyze words and phrases within a context, considering linguistic nuances and variations. Sentiment analysis is particularly valuable in the business realm for gauging customer satisfaction through reviews and social media comments. Additionally, it aids in monitoring public sentiment towards products, services, or brands, helping organizations make informed decisions based on the prevailing attitudes within the target audience. (Devopedia. 2022.)

  2. Named Entity Recognition (NER) is a crucial component of information extraction in natural language processing. It involves identifying and classifying entities such as names of people, organizations, locations, dates, and other specific terms within a given text. NER systems employ machine learning algorithms that are trained on annotated datasets to accurately locate and categorize these entities. Applications of NER range from extracting structured information from unstructured text, improving search engine capabilities, to facilitating question-answering systems by identifying key entities within a document. (Devopedia. 2020.)

  3. Semantic Role Labeling (SRL) is a semantic parsing task that focuses on understanding the relationships between different elements in a sentence by assigning specific roles to words or phrases, such as identifying the agent, patient, or beneficiary in a given action. This technique goes beyond traditional syntactic parsing to capture the deeper meaning and roles of each component within a sentence. SRL is instrumental in tasks requiring a nuanced understanding of natural language, including machine translation, question answering, and sentiment analysis, where discerning the roles of entities is crucial for accurate interpretation. (Devopedia. 2020.)

  4. Part-of-Speech Tagging (POS tagging) assigns part-of-speech tags to words. It tackles ambiguity in natural language processing by resolving words with multiple possible meanings based on their context. Early POS taggers were rule-based and built on linguistic knowledge, but later taggers moved to statistical approaches, with models achieving over 97% accuracy. This pre-processing step is fundamental in NLP, supporting applications such as information retrieval, named entity recognition, and text-to-speech systems. (Devopedia. 2019.)

  5. Text Summarization is a text processing technique that aims to distill the essential information from a document while preserving its core meaning. There are two main types of summarization: extractive, which selects and combines existing sentences, and abstractive, which generates new sentences to convey the summarized content. Summarization finds applications in news articles, research papers, and document management, providing a concise overview of lengthy texts and aiding in information retrieval and decision-making processes. (Devopedia. 2020.)

  6. Semantic Similarity quantifies the likeness between two pieces of text based on their meaning rather than relying solely on lexical or syntactic similarity. These measures take into account the context, semantics, and relationships between words, enabling a more nuanced understanding of similarity. Semantic similarity is applied in various NLP tasks, including duplicate detection, document clustering, and recommendation systems. By capturing the underlying meaning of text, semantic similarity enhances the accuracy and relevance of systems that require matching or grouping textual information. (Harispe et al., 2015)

  7. Keyword Extraction involves identifying and extracting the most relevant and significant words or phrases from a given text. This process helps to distill the key themes, topics, or concepts within a document, enabling a more concise representation of its content. Keyword extraction algorithms use statistical, linguistic, and machine learning techniques to determine the importance of words based on their frequency, context, and relationships within the text. Ultimately, keyword extraction aids in summarizing and understanding the essential information contained in a body of text. (Beliga et al., 2015)
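Semantic similarity between two texts is commonly computed by embedding each text as a vector and then taking the cosine similarity of the two vectors. The following plain-Python sketch covers only the cosine step and assumes the embeddings are already available as lists of floats. The toy three-dimensional vectors stand in for real sentence embeddings, which typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


# Toy vectors standing in for sentence embeddings.
print(cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0]))  # parallel vectors
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal vectors
```

Cosine similarity ignores vector length and compares only direction. This is why it is a common choice for comparing embeddings of texts with different lengths.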


In this project, I applied sentiment analysis, semantic similarity, and keyword extraction to the meeting notes corpus. These three techniques seemed sufficient for extracting relevant data and producing plots from the meeting notes.

For this entire project, I chose Python to implement the techniques. This is mainly due to the vast number of libraries available, but also because Python code is readable to most developers. In addition, it is the language I’m most confident in.

The code for the implementation is available in my GitHub repository, which is listed in the references section (Engelsen, 2023).

Text extraction

  1. File Selection: The script uses the glob module, a Python module that finds path names matching a specified pattern, to identify markdown files within the specified directory, excluding certain files like “” or “”. This ensures that only relevant files are considered for processing, narrowing down the scope of analysis.

  2. Reading Markdown Files: For each markdown file, the script opens and reads its content using Python’s built-in open function and read method. The custom process_markdown_file function encapsulates this operation, facilitating the extraction of text from individual markdown files.

  3. Timestamp Extraction: The code extracts timestamps from markdown file names by using regular expressions to recognize date patterns in the names. It maps these patterns to the corresponding month numbers and the current year, generating accurate timestamps for each proposal section.

  4. Proposal Section Extraction: Within each markdown file, the script identifies proposal sections by applying a regular expression (proposal_section_pattern). The re.findall function is used to extract titles of these proposal sections, forming a list of titles for subsequent processing.

  5. Utterance and Text Extraction: For each proposal section, the script iterates through its titles, extracting the corresponding text. It filters out irrelevant sections based on predefined criteria using the isDumbTitle function. Relevant text is then extracted by slicing the content based on the position of the title in the markdown text.

  6. Text Cleaning: Extracted proposal text undergoes cleaning through regular expressions, removing undesirable information such as presenter details and slide references. Patterns like presenter names and slide references are identified and eliminated using the re.sub function, ensuring the text is focused on the core content.

  7. Utterance and Sentence Processing: The script splits the proposal text into utterances using a regular expression (utterance_pattern) that identifies speaker contributions. Each utterance is further divided into sentences using the re.split function. Sentences are processed individually, and keyword extraction is performed using the Yake library.
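The steps above can be sketched as follows. This is a hypothetical reconstruction, not the project’s code. The following are all assumptions:

- the file-name format (e.g. feb-01.md inside a 2023-01 folder), with the year taken from the folder name;
- the section heading pattern (## Title);
- the presenter and slide patterns;
- all function names.

The excluded file names are left as a parameter, and the isDumbTitle filter is omitted. The sketch uses re.finditer instead of re.findall because finditer returns each title together with its position, which is what the slicing step needs.

```python
import glob
import os
import re
from datetime import date
from pathlib import Path
from typing import Optional

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# Hypothetical patterns; the project's actual regular expressions may differ.
FILE_NAME_PATTERN = re.compile(r"^([a-z]{3})-(\d{2})\.md$")
PROPOSAL_SECTION_PATTERN = re.compile(r"^## (.+)$", re.MULTILINE)
PRESENTER_PATTERN = re.compile(r"^Presenter:.*$", re.MULTILINE)
SLIDES_PATTERN = re.compile(r"^- \[slides\]\(.*?\)$", re.MULTILINE)


def find_note_files(meetings_dir: str, excluded: frozenset[str] = frozenset()) -> list[str]:
    """Step 1: list the markdown files in every meeting folder, minus excluded names."""
    paths = glob.glob(os.path.join(meetings_dir, "*", "*.md"))
    return sorted(p for p in paths if os.path.basename(p) not in excluded)


def timestamp_from_path(path: str) -> Optional[date]:
    """Step 3: turn e.g. 'meetings/2023-01/feb-01.md' into date(2023, 2, 1)."""
    p = Path(path)
    match = FILE_NAME_PATTERN.match(p.name)
    if not match or match.group(1) not in MONTHS:
        return None  # not a meeting-notes file name
    year = int(p.parent.name.split("-")[0])  # year taken from the folder name
    return date(year, MONTHS[match.group(1)], int(match.group(2)))


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Steps 4-5: pair each section title with the text up to the next title."""
    matches = list(PROPOSAL_SECTION_PATTERN.finditer(markdown))
    sections = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        sections.append((match.group(1), markdown[match.end():end].strip()))
    return sections


def clean_text(text: str) -> str:
    """Step 6: drop presenter lines and slide links."""
    text = PRESENTER_PATTERN.sub("", text)
    text = SLIDES_PATTERN.sub("", text)
    return text.strip()


def split_sentences(utterance: str) -> list[str]:
    """Step 7 (simplified): split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", utterance.strip()) if s]


# Toy input in the assumed format.
notes = """## Array grouping
Presenter: Jane Doe
- [slides](https://example.com/slides)

ABC: I like it. Does it need a polyfill?
"""

print(timestamp_from_path("meetings/2023-01/feb-01.md"))
for title, body in split_sections(notes):
    print(title, split_sentences(clean_text(body)))
```

The sentence splitter is deliberately naive. For example, it would split incorrectly after abbreviations such as “e.g.”, which is one reason a dedicated tokenizer may be preferable.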

Proposal Dictionary Object

The proposal dictionary encapsulates essential information about a specific proposal extracted from the meeting notes. It has the following properties:

  • title: This field stores the title of the proposal, providing a concise identifier for the proposal’s subject matter.

  • timestamp: Represents the timestamp associated with the proposal, typically extracted from the markdown file name.

  • utterances: A list of individual utterance dictionary objects. Each utterance corresponds to a section of the proposal where a distinct speaker contributes.

  • full text: The entire content of the proposal is stored here, facilitating comprehensive analysis and comparisons.

Utterance Dictionary Object

The utterance object represents a single speaker’s contribution to the discussion of a proposal. It has the following properties:

  • utterance_number: The chronological number of the utterance within the proposal.

  • timestamp: The timestamp associated with the proposal. This corresponds to the date of the meeting, derived from the name of the markdown file.

  • sentences: A list of individual sentence dictionary objects, each representing a sentence within the utterance.

  • polarity: Represents the overall sentiment polarity of the entire utterance.

  • subjectivity: Reflects the subjectivity of the utterance as a whole.

  • keywords: Stores keywords extracted from the utterance using the Yake library, providing insights into the main topics.

Sentence Dictionary Object

The sentence object represents an individual sentence within an utterance. It has the following properties:

  • sentence_number: A unique identifier for each sentence within an utterance.

  • text: A string containing the contents of the individual sentence.

  • polarity: The sentiment polarity of the sentence.

  • subjectivity: The subjectivity of the sentence.
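The three objects map naturally onto nested Python dictionaries. The following sketch describes them with typing.TypedDict. The following details are assumptions:

- the key full_text for the “full text” field;
- ISO date strings for timestamps, so that the object serializes to JSON directly;
- keywords stored as a plain list of strings.

The polarity and subjectivity values are placeholder zeros, not computed scores. TextBlob reports subjectivity on a scale from 0.0 (objective) to 1.0 (subjective).

```python
import json
from typing import TypedDict


class Sentence(TypedDict):
    sentence_number: int
    text: str
    polarity: float       # -1.0 (negative) to 1.0 (positive)
    subjectivity: float   # 0.0 (objective) to 1.0 (subjective)


class Utterance(TypedDict):
    utterance_number: int
    timestamp: str        # assumed ISO date string, e.g. "2023-02-01"
    sentences: list[Sentence]
    polarity: float
    subjectivity: float
    keywords: list[str]   # assumed: keyword strings only, without Yake scores


class Proposal(TypedDict):
    title: str
    timestamp: str
    utterances: list[Utterance]
    full_text: str        # "full text" in the description; key name assumed


# Placeholder values for illustration only.
sentence: Sentence = {"sentence_number": 1, "text": "Ephemeron collection.",
                      "polarity": 0.0, "subjectivity": 0.0}
utterance: Utterance = {"utterance_number": 1, "timestamp": "2023-02-01",
                        "sentences": [sentence], "polarity": 0.0,
                        "subjectivity": 0.0, "keywords": ["ephemeron collection"]}
proposal: Proposal = {"title": "Example proposal", "timestamp": "2023-02-01",
                      "utterances": [utterance],
                      "full_text": "DEF: Ephemeron collection."}

print(json.dumps(proposal, indent=2))
```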

Frameworks Used

  • TextBlob: TextBlob is a library that simplifies common natural language processing tasks. In this script, it is used for basic sentiment analysis, determining the sentiment polarity and subjectivity of both individual sentences and entire utterances. (Loria, 2020)

  • Universal Sentence Encoder: The Universal Sentence Encoder (USE), developed by Google and distributed through TensorFlow Hub, is a pre-trained model that converts text into high-dimensional vectors. This script uses USE to generate sentence embeddings, enabling the calculation of semantic similarity between different texts. (Cer et al., 2018)

  • Yake: Yake is a keyword extraction library that identifies significant keywords within a given piece of text. In this script, Yake is used to extract keywords from each utterance, aiding in the understanding of the main topics discussed. (Campos et al., 2020)

  • Matplotlib: Matplotlib is a plotting library for Python. In this script, it is used to create sentiment analysis plots, which visually represent how sentiment changes over the course of the utterances in a proposal. (Hunter, 2007)

  • Regular Expressions: Regular expressions, provided by Python’s re module, are used for pattern matching and extraction. In this context, they help identify specific sections of markdown files and clean proposal texts by removing irrelevant information such as presenter details and slide references. (Python Software Foundation, 2023)

  • TensorFlow: TensorFlow is an open-source machine learning framework. In this script, it is used to load the pre-trained Universal Sentence Encoder model, which encodes sentences into meaningful vectors. (Abadi et al., 2015)

Sentiment Graph

The sentiment graph is generated using Matplotlib and visually depicts the sentiment dynamics within a proposal. The graph has the following properties:

  • X-axis: Represents individual utterances within the proposal.

  • Y-axis: Depicts the sentiment polarity on a scale from -1.0 to 1.0, showing shifts between positive and negative sentiment. A value of -1.0 means completely negative, 0 means neutral, and 1.0 means completely positive.

  • Highlights: Points on the graph highlight utterances with particularly high positive or negative sentiment, providing a quick overview of sentiment peaks and troughs. The utterances that cause these peaks are written to a JSON file.

For each proposal in the proposals list, a sentiment graph is plotted. It visualizes the sentiment of each utterance on that proposal and how the sentiment changes over the course of the utterances.
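The highlighted utterances can be selected independently of the plotting. The following is a minimal sketch that assumes a hypothetical cut-off of |polarity| ≥ 0.5 and 1-based utterance numbers. The function names are illustrative, and the polarity values are toy numbers, not results from the project. The graph itself could then be drawn with Matplotlib, for example by calling plt.plot on the utterance numbers and polarities.

```python
import json


def find_sentiment_peaks(polarities: list[float], threshold: float = 0.5) -> list[int]:
    """Indices of utterances whose polarity is at least `threshold` away from neutral (0)."""
    return [i for i, p in enumerate(polarities) if abs(p) >= threshold]


def peaks_to_json(texts: list[str], polarities: list[float], threshold: float = 0.5) -> str:
    """Describe each peak utterance (1-based number, polarity, text) as a JSON string."""
    peaks = [
        {"utterance_number": i + 1, "polarity": polarities[i], "text": texts[i]}
        for i in find_sentiment_peaks(polarities, threshold)
    ]
    return json.dumps(peaks, indent=2)


# Toy polarity values for four utterances (not real results).
texts = ["First remark.", "Strong support.", "Serious concern.", "Neutral note."]
polarities = [0.1, 0.8, -0.6, 0.0]
print(peaks_to_json(texts, polarities))
```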

Proposal Data Structure in JSON Format

At the conclusion of the script, the data structure of each proposal is printed in JSON format. This output provides a detailed view of the processed data, including titles, timestamps, utterances, and full text. JSON format is chosen for its readability and ease of inspection, making it convenient for further analysis or sharing of results.

Figure 1: Example of a sentiment graph of a proposal object.

Stanford NLP (Stanza)

Stanford NLP, now known as Stanza, is a robust natural language processing library developed by the Stanford NLP Group. It provides a suite of state-of-the-art tools for various language processing tasks, including tokenization, part-of-speech tagging, named entity recognition, and dependency parsing. Stanza offers pre-trained models for multiple languages, enabling users to perform complex linguistic analyses with ease. One of its key strengths is its close integration with deep learning techniques, resulting in high accuracy and efficiency across a range of NLP tasks. Its focus on multilingual support makes it a versatile choice for researchers and developers working with diverse linguistic datasets. (Peng et al., 2020)

spaCy NLP

spaCy is a popular open-source natural language processing library designed for efficiency and ease of use. It excels at providing fast and accurate linguistic annotations, including tokenization, part-of-speech tagging, named entity recognition, and dependency parsing. spaCy’s streamlined API and pre-trained models make it user-friendly for both beginners and experienced developers. Its speed makes it suitable for real-time applications in various contexts. spaCy also supports custom model training, enabling users to adapt it to domain-specific language patterns. Overall, spaCy is a versatile tool for NLP tasks, striking a balance between performance and simplicity. (Honnibal et al., 2020)

Hugging Face Sentence Transformer (all-MiniLM-L6-v2)

The Sentence Transformers library builds on Hugging Face’s broader Transformers library. Its models, including “all-MiniLM-L6-v2”, are hosted on Hugging Face, a platform with a vast collection of pre-trained models for natural language processing tasks. Sentence Transformer models excel at creating embeddings for sentences or text snippets, making them valuable for tasks such as semantic similarity and information retrieval. “all-MiniLM-L6-v2” refers to the specific architecture and version of the MiniLM model used in this implementation. The Hugging Face Transformers library simplifies the integration of advanced transformer models into various NLP applications, fostering accessibility and innovation in the field. (Hugging Face, n.d.)

Conclusion & Future Work

The main conclusion I draw from this project is that the best approach to extracting usable data from the meeting notes is to use an ensemble of different NLP libraries and techniques. Relying on a single pretrained model is inadequate for the purpose of this project.

All the plots produced by the implementation in this project seem to have a positive bias. Judged qualitatively, the average sentiment across all the plots can be estimated to lie between 0 and 0.5. While there are some peaks in both the negative and positive directions, the sentiment analysis estimates most utterances to be either neutral or slightly positive.

It should be acknowledged, however, that the greatest challenge of this project has been finding a way to accurately estimate whether two or more discussions are about the same proposal. The implementation in this project is not as nuanced as it could be. As a result, the utterances measured in each proposal might not fully reflect the true evolution of sentiment for that proposal.


A natural next step would be to reduce the runtime of the implementation. Currently, on an average desktop, it takes between 36 and 48 hours to create sentiment graphs for all proposals mentioned in the meeting notes dating back to 2016. The runtime increases as the proposals list grows, because unique proposals accumulate in the list and each new section is compared against them.

Initially, my idea was to concatenate the full text of each proposal that meets the similarity threshold, to increase the accuracy of the semantic similarity calculation. However, the script eventually stopped because it ran out of available RAM.

A potential avenue of investigation could be to create summaries of the full text of each proposal and concatenate those instead. This would keep the compared texts shorter while still allowing an accurate comparison.

Unique identifier for each proposal

The greatest challenge of this project was determining whether two sections in the meeting notes are actually discussing the same proposal. My assumption was that if the semantic similarity between two sections is above a certain threshold, they must be about the same thing. However, this is not necessarily the case. Consider the following scenario. A unique proposal is discussed in 2017. Later, in 2019, a different proposal is discussed. The later proposal, however, is completely dependent on the earlier one, so its discussion contains many references to and discussions about the earlier proposal. By evaluating only semantic similarity, these two sections would be deemed part of the same proposal, which is not the case.

Because of this type of dilemma, an improved way of estimating similarity is necessary to accurately determine whether two sections are discussions pertaining to the same proposal.
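The dilemma can be reproduced with a simplified, hypothetical grouping rule. Each section joins the first existing proposal whose first section’s embedding meets a cosine-similarity threshold; otherwise, it starts a new proposal. The threshold of 0.8, the function names, and the three-dimensional toy embeddings are assumptions for illustration. They are not the project’s actual values.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def group_sections(
    sections: list[tuple[str, list[float]]], threshold: float = 0.8
) -> list[list[str]]:
    """Greedily attach each section to the first sufficiently similar proposal."""
    proposals: list[tuple[list[float], list[str]]] = []  # (first embedding, titles)
    for title, embedding in sections:
        for representative, members in proposals:
            if cosine_similarity(representative, embedding) >= threshold:
                members.append(title)
                break
        else:  # no existing proposal was similar enough: start a new one
            proposals.append((embedding, [title]))
    return [members for _, members in proposals]


# Toy embeddings: the 2019 section mostly talks about the 2017 proposal it depends on.
sections = [
    ("Proposal A (2017)", [1.0, 0.0, 0.0]),
    ("Proposal B, builds on A (2019)", [0.9, 0.3, 0.0]),
    ("Unrelated proposal C", [0.0, 0.0, 1.0]),
]
print(group_sections(sections))
```

With these toy vectors, the 2019 section is grouped with the 2017 proposal it depends on, which is the misclassification described above. The inner loop also compares every new section against every existing proposal, so the amount of work grows as the list of proposals grows.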

My first suggestion is to give each unique proposal discussed at the meetings a unique identifier, e.g. “AYD245”. Every time the same proposal is discussed, the same identifier is applied to the section in the meeting notes. This way, the estimation no longer depends on meeting a threshold, but rather on verifying the unique identifier.

My second suggestion is to use an ensemble of different NLP techniques to create a composite score of similarity. This way, there is more nuance involved in the estimation of similarity.
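A composite score could, for instance, combine three signals:

- the cosine similarity of the sections’ embeddings;
- the overlap (Jaccard index) of their extracted keywords;
- the string similarity of their titles, computed with Python’s difflib.

The choice of signals, the weights, and the function names below are hypothetical. The embedding similarity is passed in as a number so that the sketch stays self-contained, and the example inputs are toy values.

```python
from difflib import SequenceMatcher


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of keywords the two sets have in common: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def composite_similarity(
    embedding_cosine: float,
    keywords_a: set[str],
    keywords_b: set[str],
    title_a: str,
    title_b: str,
    weights: tuple[float, float, float] = (0.5, 0.3, 0.2),
) -> float:
    """Weighted average of three similarity signals, each in [0, 1]."""
    signals = (
        max(0.0, embedding_cosine),  # cosine may be negative; keep only [0, 1]
        jaccard(keywords_a, keywords_b),
        SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio(),
    )
    return sum(w * s for w, s in zip(weights, signals)) / sum(weights)


# Toy inputs: similar embeddings but different keywords and titles.
score = composite_similarity(
    0.92,
    {"proposal a", "api design"},
    {"proposal a", "extension"},
    "Proposal A",
    "Proposal B extends A",
)
print(round(score, 3))
```

In practice, the weights and the decision threshold would need to be tuned against sections whose proposal is known, for example sections labeled with the unique identifiers suggested above.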


