Top Ranked Issues:
All daily proceedings recorded by Hansard were used to determine frequently mentioned words and phrases as well as top issues. Daily proceedings include government orders, member statements and, of course, question period. This was done in order to capture issues driven not only by the government through government orders, but also by the opposition through member statements and question period.
​
Government departments determined the general categories within which each issue fell. For example, there is an “International Trade Diversification” department, an “Employment, Workforce and Labour” department, an “Environment and Climate Change” department, and so on. A full listing of current government departments can be found at https://www.tpsgc-pwgsc.gc.ca/recgen/pceaf-gwcoa/1718/txt/rg-3-num-eng.html
​
Government department websites were scraped to identify common words and phrases used within a department as well as issues covered in these respective departments.
​
Once categories were defined and key words/phrases were identified, stop words (e.g. "the", "to", "for", etc.) and other commonly used but non-informative words in the House of Commons (e.g. "Canadians", "member", "parliamentary secretary", etc.) were removed from the analysis. Words were not "stemmed" (i.e. reduced to their base form).
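The filtering step above can be illustrated with a minimal Python sketch. It is not the original pipeline, and the word lists are short placeholders (STOP_WORDS, HOUSE_WORDS and HOUSE_PHRASES are hypothetical names with illustrative contents); the lists actually used in the analysis are not given here.

```python
import re

# Illustrative lists only; the lists used in the original analysis are not given.
STOP_WORDS = {"the", "to", "for", "a", "of", "and", "in", "is"}
HOUSE_WORDS = {"canadians", "member", "speaker"}
HOUSE_PHRASES = ["parliamentary secretary"]

def clean_tokens(text):
    """Lowercase, drop multi-word House phrases, then drop stop words.

    Tokens are deliberately left unstemmed, so "taxes" stays "taxes".
    """
    text = text.lower()
    for phrase in HOUSE_PHRASES:
        text = text.replace(phrase, " ")
    # Words made of letters, optionally joined by hyphens or apostrophes.
    tokens = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text)
    return [t for t in tokens if t not in STOP_WORDS and t not in HOUSE_WORDS]

sample = ("The parliamentary secretary said Canadians need jobs "
          "and the member agreed on taxes.")
print(clean_tokens(sample))
```

Multi-word phrases are removed before tokenizing because a word-by-word filter would not catch "parliamentary secretary" as a unit.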
​
Text mining then provided counts for all words used by political parties during the parliamentary session. Words most frequently mentioned by each political party were identified through text mining packages tm and corpus. The text count feature counted the frequency with which words were mentioned.
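As a rough illustration of per-party word counts, the following Python sketch uses the standard library in place of the tm and corpus packages named above. The input format (pairs of party name and already-cleaned tokens), the party labels and the function name count_by_party are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical input: (party, already-cleaned tokens) pairs.
speeches = [
    ("Party A", ["trade", "tax", "trade", "jobs"]),
    ("Party B", ["tax", "indigenous", "tax"]),
    ("Party A", ["trade", "tariffs"]),
]

def count_by_party(speeches):
    """Return a mapping of party -> Counter of word frequencies."""
    counts = defaultdict(Counter)
    for party, tokens in speeches:
        counts[party].update(tokens)
    return counts

party_counts = count_by_party(speeches)

# Most frequently mentioned word for each party.
for party, counter in party_counts.items():
    print(party, counter.most_common(1))
```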
​
Top mentioned words were then organized into their government department categories, as determined through website scrapes. “Trade” was categorized under “International Trade Diversification”, “Tax” under “Finance”, “Indigenous” under “Indigenous Services”, and so on.
Due to the ambiguous nature of some frequently mentioned words that were otherwise informative, these were removed from consideration for top mentioned words as well as from visualizations of top mentioned words (they do, however, remain in the raw data files, which are available for download). This was done to avoid over/underestimating the importance of an issue.
For example, consider the word "work", which could be used either to indicate jobs/employment or to indicate that something is operating as it should. The former is an issue word; the latter is not. Assuming that a political party mentioned the word "work" entirely as it relates to jobs would result in overestimating jobs as an issue. On the other hand, excluding the word entirely would underestimate the importance of the issue.
Random samples of passages containing ambiguous words were taken for additional context. Again using "work" as an example: where "work" does refer to jobs, the random samples show that the term is often preceded by words like "retain", "need", "find" and "meaningful". Hansard was then re-mined, counting all occurrences of "retain work", "need work", "find work", "meaningful work" and so on. This disentangles work as it relates to jobs from work as it relates to the effective operation of something.
Ambiguous words were then organized into their appropriate government department categories.
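The re-mining step for an ambiguous word can be sketched in Python as follows. This is an illustration, not the original code: the context words come from the "work" example above, while the function name count_job_work and the sample sentences are made up.

```python
import re
from collections import Counter

# Context words taken from the example in the text; the full list is not given.
JOB_CONTEXT = ("retain", "need", "find", "meaningful")

def count_job_work(text):
    """Count occurrences of '<context word> work', ignoring case."""
    pattern = re.compile(
        r"\b(" + "|".join(JOB_CONTEXT) + r")\s+work\b", re.IGNORECASE
    )
    return Counter(m.group(1).lower() + " work" for m in pattern.finditer(text))

# Made-up sample text with both senses of "work".
sample = ("Young people need work. The program does not work. "
          "Families want to find work, meaningful work, close to home.")

counts = count_job_work(sample)
all_mentions = len(re.findall(r"\bwork\b", sample, re.IGNORECASE))
print(counts, "job-related:", sum(counts.values()), "of", all_mentions)
```

Only the mentions preceded by a job-related context word are counted, so "does not work" contributes nothing to the jobs issue.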
Key phrases were determined using text counts of ordered sequences of two to seven words. As with counts of single words, a count of each ordered sequence provided the frequency with which the phrase was used.
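Counting ordered sequences of two to seven words amounts to counting n-grams. A minimal Python sketch follows, with ngram_counts as a hypothetical helper and a made-up token list; it assumes phrases are counted on text that still contains connecting words such as "and", which is not stated explicitly above.

```python
from collections import Counter

def ngram_counts(tokens, n_min=2, n_max=7):
    """Count every ordered sequence of n_min..n_max consecutive tokens."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Made-up example text.
tokens = ("base erosion and profit shifting hurts revenue "
          "base erosion and profit shifting").split()

counts = ngram_counts(tokens)
print(counts["base erosion and profit shifting"])
```

Because every window length from two to seven is counted, a long phrase also contributes to the counts of the shorter phrases it contains, which is worth keeping in mind when ranking phrases of different lengths.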
Key phrases were also organized according to their government department categories. In cases where a government department category potentially covered more than one issue, key phrases were further coded to key issues. For example, the category of “Finance” covers taxes, the budget, the economy and more. A phrase like “debt-to-GDP” would fall under the category of “Finance” and the issue of the economy, whereas the phrase “base erosion and profit shifting” would fall under the category of “Finance” and the issue of taxes.
A final mining of words outside the top mentioned words was conducted, and these words were organized into their appropriate categories and issues. For example, “trade”, which was a top-ten mentioned word for all parties, falls under the “International Trade Diversification” category. Terms related to trade that do not explicitly use the word “trade” include the names of negotiated trade agreements, such as the new NAFTA and CPTPP, as well as more commonly used trade words like “imports”, “exports” and “tariffs”.
Once all words and phrases were organized into their appropriate categories and issues, their counts were summed and ranked to determine top ranking categories and issues for the parliamentary session.
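The final summing and ranking can be sketched in Python as below. The coding table, the counts and the function name rank_categories are all hypothetical; the category and issue names are taken from the examples in this section, and terms without a coding (such as an unresolved ambiguous word) are simply skipped.

```python
from collections import defaultdict

# Hypothetical coding table: term -> (category, issue).
CODING = {
    "trade": ("International Trade Diversification", "trade"),
    "tariffs": ("International Trade Diversification", "trade"),
    "tax": ("Finance", "taxes"),
    "base erosion and profit shifting": ("Finance", "taxes"),
    "debt-to-gdp": ("Finance", "economy"),
    "indigenous": ("Indigenous Services", "Indigenous Services"),
}

# Made-up counts for illustration; "work" has no coding and is ignored.
term_counts = {
    "trade": 120, "tariffs": 30, "tax": 90,
    "base erosion and profit shifting": 5,
    "debt-to-gdp": 40, "indigenous": 70, "work": 200,
}

def rank_categories(term_counts, coding):
    """Sum counts per category and per (category, issue), highest first."""
    by_category = defaultdict(int)
    by_issue = defaultdict(int)
    for term, n in term_counts.items():
        if term not in coding:
            continue
        category, issue = coding[term]
        by_category[category] += n
        by_issue[(category, issue)] += n

    def ranked(totals):
        return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

    return ranked(by_category), ranked(by_issue)

categories, issues = rank_categories(term_counts, CODING)
print(categories)
print(issues)
```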