Predictions (winter session 2019 until autumn session 2020): The updated Gradient Boosting Classifier (GBC) model predicts the votes from the winter session 2019 and the spring, summer and autumn sessions 2020 with an accuracy of 83%, as shown in the graph on the right. The GBC model has replaced the original Lasso model described in the poster. Please also refer to the section ‘update’.
Remark: Due to Covid-19, the spring, summer and autumn sessions 2020 were unusual in that the large majority of votes were taken on ‘Affairs of the Federal Council’ (415 out of 484 in the summer session alone). These types of votes have been excluded because such affairs are always released after a final approval (i.e. ‘yes’). Their outcome is therefore too predictable to be worth modeling (please also refer to the poster and the selection of the types of votes as well as to the section ‘my question’). As a result, across the winter 2019 and the spring, summer and autumn 2020 sessions, only 76 votes (unique per subject) remained to be included in the test set. The details of the individual predictions and outcomes of the votes are shown in the table on the right.
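The selection described in this remark can be sketched as follows. This is a minimal illustration only: the field names `subject_id` and `affair_type`, the function `build_test_set` and the sample records are hypothetical and are not taken from the actual data set.

```python
# Hypothetical label for the excluded vote type; the real data may name it differently.
EXCLUDED_TYPES = {"Affair of the Federal Council"}

def build_test_set(votes):
    """Drop excluded vote types and keep only one vote per subject."""
    seen_subjects = set()
    kept = []
    for vote in votes:
        if vote["affair_type"] in EXCLUDED_TYPES:
            continue  # outcome is always 'yes', so the vote is not modeled
        if vote["subject_id"] in seen_subjects:
            continue  # keep only the first vote seen for each subject
        seen_subjects.add(vote["subject_id"])
        kept.append(vote)
    return kept

# Made-up records for illustration
sample_votes = [
    {"subject_id": "s1", "affair_type": "Affair of the Federal Council"},
    {"subject_id": "s2", "affair_type": "Motion"},
    {"subject_id": "s2", "affair_type": "Motion"},
    {"subject_id": "s3", "affair_type": "Parliamentary initiative"},
]
test_set = build_test_set(sample_votes)
```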
The meaning of yes and no in the votes and predictions: It is important to note that the true outcome of the vote displayed in the table above doesn’t always correspond to the effective outcome of the vote taken by the parliament. Very often the parliament does not vote directly on the matter at hand but on a recommendation of a parliamentary commission or of the Federal Council. For instance, if a commission recommends rejecting a matter and the outcome of the parliamentary vote is “no”, the matter as such is accepted. Accordingly, the model has been trained to predict whether a matter as such is accepted (yes) or rejected (no), ignoring the recommendations.
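The relabeling described above can be sketched as a small function. This is an illustration under assumptions: the function name `matter_accepted` and the string values are hypothetical, and only the case "recommendation to reject, vote no" is stated in the text; the other three cases are assumed by symmetry.

```python
def matter_accepted(recommendation: str, vote: str) -> bool:
    """Return True if the matter as such is accepted.

    recommendation: "accept" or "reject" (proposal of a commission or the Federal Council)
    vote: "yes" or "no" (whether the parliament follows that recommendation)
    """
    if recommendation not in ("accept", "reject"):
        raise ValueError(f"unknown recommendation: {recommendation!r}")
    if vote not in ("yes", "no"):
        raise ValueError(f"unknown vote: {vote!r}")
    follows_recommendation = vote == "yes"
    recommends_acceptance = recommendation == "accept"
    # The matter is accepted when a recommendation to accept is followed,
    # or when a recommendation to reject is voted down.
    return follows_recommendation == recommends_acceptance
```

With this convention, `matter_accepted("reject", "no")` is True, which matches the example given in the text.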
Updated model 2020: As outlined in the poster, the test set for this model consisted of the votes of the winter session 2019 of the National Council. When making predictions for the spring and summer sessions 2020 it was found that a bias had been introduced into the original Lasso model. This possibility was described as a limitation in the poster. All information about votes to be taken in the National Council only becomes available after the vote has taken place. When developing a model, the information about the outcome of the vote therefore has to be removed from the data set to ensure an unbiased prediction.
Removing a bias: It was found that the word “erledigt” (done/completed), which was originally included in the Lasso model, has a significant negative association with outcomes. This word has now been removed from the data set in the same way as the previously removed words “accepted”, “rejected”, “written off” etc. All suggestive words from the final protocol have been removed. It cannot be completely ruled out that other confounders remain in the model, and this remains a limitation of this study.
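A minimal sketch of how such outcome-revealing words could be stripped from a document before training is shown below. The term list contains only the words named above (three of them in their English translation), and the names `LEAK_TERMS` and `strip_leak_terms` are hypothetical; the actual preprocessing used in the study is not documented here.

```python
import re

# Terms named in the text; the real list is longer and in the original language.
LEAK_TERMS = ["erledigt", "accepted", "rejected", "written off"]

# Longer terms first so that multi-word terms are matched as a whole.
_LEAK_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(t) for t in sorted(LEAK_TERMS, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)

def strip_leak_terms(text: str) -> str:
    """Remove outcome-revealing terms and collapse the leftover whitespace."""
    cleaned = _LEAK_PATTERN.sub(" ", text)
    return " ".join(cleaned.split())
```

Matching on word boundaries keeps unrelated words that merely contain one of the terms intact.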
Question: What exactly is the input data?
Answer: It depends on the type of political affair and on the course it has taken through the parliament. A motion (example) consists of a rather structured outline of the request including an explanation. A motion can be rejected as such. In this case all available documents related to the motion are included. If the motion is accepted, the Federal Council has to give an answer that can be opposed by one of the chambers. For these votes the statement of the Federal Council and all amended reports from commissions and affair summaries are taken into account as input data.
For parliamentary initiatives (example) an initial vote on entering the debate takes place and all related documents are taken into account. If the affair is entered, an extensive legislative process with votes taken on small changes usually follows. These intermediate steps/votes are not included in the model. Only the final vote on the “matter of the final vote” (in German: Schlussabstimmungstext) is included; this text contains the final legislation and is used together with the amended reports of the commissions, the affair summaries and the statement of the Federal Council.
In some cases, reports from external experts are also included.
The idea was to train the model on a broad background of information and to put it in the same position as a politician (so to speak) who, when voting, takes into account not only the specific legislation but also background information.
The legislative proceedings described above are still a gross simplification; a more detailed flowchart can be found on the site of the Swiss parliament:
Answer: First of all, what is Natural Language Processing (Wikipedia: NLP)?
NLP is a specialized branch of machine learning that analyzes, classifies and creates texts.
In this study standard NLP techniques have been applied. All documents were collected and a so-called tokenizer was applied to extract roughly 46,000 unique words from the whole corpus of text. In the updated model from summer 2020, individual tokens as well as sequences of two and three words were included. The text was then transformed into a (huge, sparse) one-hot matrix with one row per record, indicating whether a particular term is used in that record or not.
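The transformation described above can be sketched in plain Python. This is a toy illustration, not the code used in the study: the function names and the simple regular-expression tokenizer are assumptions, and the matrix is built as dense lists for readability, whereas a real corpus of this size would require a sparse representation.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def ngrams(tokens, max_n=3):
    """Return all sequences of 1 to max_n consecutive tokens as strings."""
    terms = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            terms.append(" ".join(tokens[i:i + n]))
    return terms

def binary_term_matrix(docs, max_n=3):
    """Build a vocabulary and a 0/1 matrix with one row per document."""
    doc_terms = [set(ngrams(tokenize(d), max_n)) for d in docs]
    vocab = sorted(set().union(*doc_terms))
    index = {term: i for i, term in enumerate(vocab)}
    rows = []
    for terms in doc_terms:
        row = [0] * len(vocab)
        for term in terms:
            row[index[term]] = 1  # 1 = term occurs in this document
        rows.append(row)
    return vocab, rows

# Made-up example documents
vocab, rows = binary_term_matrix(["Die Motion wird angenommen", "Die Motion"])
```

In scikit-learn, `CountVectorizer(ngram_range=(1, 3), binary=True)` produces a comparable sparse matrix; whether that class was used in this study is not stated.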
Remark: In the updated version the original Lasso model described in the poster has been replaced by a Gradient Boosting Classifier as described in the tag ‘Corrigendum’.
The challenge: As mentioned in the poster, many votes on incremental adaptations of articles are excluded from the data set. This means that the model has not been trained on these votes and therefore cannot predict their outcome either. It also meant excluding the main type of affairs handled in the parliament, the “Affairs of the Federal Council” (Geschäfte des Bundesrats). In these cases, the parliament cannot reject the substance as a whole but can only improve it until it’s finally released as new legislation.
The remaining question: How could a machine learning model or possibly a reinforcement learning model be designed to include the votes on small adaptations at the level of articles while still considering the whole background of the political matter?
Any input on how to address this question would be highly appreciated.