Updated model 2020: As outlined in the poster, the test set for this model was the votes of the winter session 2019 of the National Council. When making predictions for the spring and summer sessions 2020, it was found that a bias had been introduced into the original Lasso model. This possibility was described as a limitation in the poster. All information about a vote in the National Council only becomes available after the vote has taken place, so the documents may already reveal its outcome. When developing a model, the information about the outcome of the vote therefore has to be removed from the data set to ensure an unbiased prediction.
Removing a bias: It was found that the word “erledigt” (done/completed), which was originally included in the Lasso model, has a significant negative association with outcomes. This word has now been removed from the data set in the same way as the previously removed words “accepted”, “rejected”, “written off” etc. All suggestive words from the final protocol have been removed. It cannot be completely excluded that other confounders remain in the model, and this remains a limitation of this study.
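As an illustration of this clean-up step, the following is a minimal sketch of how outcome-revealing words could be stripped from a document before tokenization. It is not the code used in the study: the function name `remove_leaky_terms` and the list `LEAKY_TERMS` are hypothetical, and the list only contains the examples named above.

```python
import re

# Hypothetical list of outcome-revealing terms. Only the examples named
# in the text are included; the list used in the study is not known here.
LEAKY_TERMS = ["erledigt", "accepted", "rejected", "written off"]


def remove_leaky_terms(text, terms=LEAKY_TERMS):
    """Delete every whole-word occurrence of a leaky term (case-insensitive)."""
    # Longest terms first, so multi-word phrases are matched before their parts.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    cleaned = pattern.sub(" ", text)
    # Collapse the whitespace left behind by the removed terms.
    return re.sub(r"\s+", " ", cleaned).strip()


example = "Motion erledigt: the request was rejected by the council"
cleaned_example = remove_leaky_terms(example)
```

A fixed word list like this only catches terms that are already known to leak the outcome, which is why further confounders cannot be ruled out.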
Question: What exactly is the input data?
Answer: It depends on the type of political affair and on the course it has taken through the parliament. A motion (example) consists of a rather structured outline of the request including an explanation. A motion can be rejected as such. In this case all available documents related to the motion are included. If the motion is accepted, the Federal Council has to give an answer, which can be opposed by one of the chambers. For these votes the statement of the Federal Council and all amended reports from commissions and affair summaries are taken into account as input data.
For parliamentary initiatives (example) an initial vote on entering the debate takes place, and all related documents are taken into account. If the affair is entered, an extensive legislative process with votes on small changes usually follows. These intermediate steps and votes are not included in the model. Only the final vote is included, based on the “matter of the final vote” (in German: Schlussabstimmungstext), which contains the final legislation, together with the amended reports of the commissions, the affair summaries and the statement of the Federal Council.
In some cases, reports from external experts are also included.
The idea was to train the model on a broad background of information and to put it in the same position as a politician (so to speak), who when voting takes into account not only the specific legislation but also background information.
The legislative proceedings described above are still a gross simplification. More detailed flowcharts can be found on the site of the Swiss parliament:
The schema for motions in the Swiss parliament.
The schema for parliamentary initiatives.
Question: Is this model based on NLP?
Answer: First of all, what is Natural Language Processing (Wikipedia: NLP)?
NLP is a specialized branch of machine learning that analyzes, classifies and creates texts.
In this study standard NLP techniques have been applied. All documents were collected and a so-called tokenizer was applied to extract roughly 46,000 unique words from the whole corpus of text. In the updated model from summer 2020, individual tokens as well as sequences of two and three words were included. The text was then transformed into a (huge, sparse) one-hot matrix that indicates for every record whether a particular term is used in that record or not.
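The transformation described above can be sketched in a few lines of plain Python. This is an illustrative sketch under assumptions, not the pipeline used in the study: the tokenizer is a simple regular expression, and the names `tokenize`, `ngrams` and `binary_term_matrix` as well as the two sample sentences are hypothetical. Each row stores only the column indices of the terms that occur in the record, which is one common way to represent a sparse binary matrix.

```python
import re


def tokenize(text):
    """Split a text into lowercase word tokens (umlauts are kept)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, max_n=3):
    """Return all sequences of 1 to max_n consecutive tokens as strings."""
    terms = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            terms.append(" ".join(tokens[i:i + n]))
    return terms


def binary_term_matrix(documents, max_n=3):
    """Build a vocabulary and a sparse binary term-presence matrix."""
    doc_terms = [set(ngrams(tokenize(d), max_n)) for d in documents]
    vocabulary = sorted(set().union(*doc_terms))
    index = {term: j for j, term in enumerate(vocabulary)}
    # Sparse rows: only the column indices whose entry would be 1.
    rows = [sorted(index[t] for t in terms) for terms in doc_terms]
    return vocabulary, rows


# Two made-up records, used only to show the shape of the result.
docs = ["Die Motion wird angenommen", "Die Initiative wird geprüft"]
vocabulary, rows = binary_term_matrix(docs)
```

With real data the vocabulary grows quickly once two- and three-word sequences are added, which is why a sparse representation is needed.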
Remark: In the updated version the original Lasso model described in the poster has been replaced by a Gradient Boosting Classifier, as described in the tag “Corrigendum”.
The challenge: As mentioned in the poster, many votes on incremental adaptations of articles are excluded from the data set. This means that the model has not been trained on these votes and therefore cannot predict their outcome either. It also meant excluding the main type of affairs handled in the parliament, the “Affairs of the Federal Council” (Geschäfte des Bundesrats). In these cases, the parliament cannot reject the substance as a whole but can only improve it until it is finally released as new legislation.
The remaining question: How could a machine learning model or possibly a reinforcement learning model be designed to include the votes on small adaptations at the level of articles while still considering the whole background of the political matter?
Any input on how to address this question would be highly appreciated.