Answer: It depends on the type of political affair and on the course it has taken through the parliament. A motion (example) consists of a rather structured outline of the request, including an explanation. A motion can be rejected as such. In this case all documents related to the motion are included, as far as they are available. If the motion is accepted, the Federal Council has to give an answer, which can be opposed by one of the chambers. For these votes the statement of the Federal Council and all amended reports from commissions and affair summaries are taken into account as input data.
For parliamentary initiatives (example) an initial vote on entering the debate takes place, and all related documents are taken into account. If the debate is entered, an extensive legislative process usually follows, with votes taken on small changes. These intermediate votes are not included in the model. Only the final vote on the „matter of the final vote“ (in German: Schlussabstimmungstext) is included; this text contains the final legislation and is used together with the amended reports of the commissions, the affair summaries and the statement of the Federal Council.
In some cases reports from external experts are also included.
The idea was to train the model based on a broad background of information and to put the model into the same spot as a politician (so to speak) who when voting not only takes into account the specific legislation but also background information.
The legislative proceedings described above are still a gross simplification. A more detailed flowchart can be found on the site of the Swiss parliament:
Answer: First of all, what is Natural Language Processing (Wikipedia: NLP)?
A broad definition could be that as soon as a computer handles text it’s NLP.
Or NLP can be the very specialized branch of machine learning that analyzes, classifies and even creates texts applying the latest technology.
I would rather stick with the first definition. So NLP was applied in this study, at a relatively basic level. Once all documents were collected, a so-called tokenizer was applied to extract roughly 46,000 unique words from the whole corpus of text. The text was then transformed into a (huge, sparse) one-hot matrix with one row per record, indicating whether a particular term is used in that record or not. The model was then trained by gradient descent to weight each term so that it predicts the outcome of the vote correctly. To avoid overfitting on particular terms, which would restrict the model's ability to generalize to new data, so-called lasso regularization was applied. This procedure reduced the initial 46,000 terms to 789. This turned out to be the best performing model in terms of the resulting accuracy on the validation and test set.
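The steps described above (tokenizing, building a binary term matrix, fitting term weights by gradient descent under a lasso penalty) can be illustrated with a small, dependency-free sketch. It is not the code used in the study: the four toy documents, the labels, all function names and the hyperparameters `lam`, `lr` and `epochs` are assumptions made for illustration, and the lasso penalty is handled here with a soft-thresholding (proximal) step after each gradient update, which is one common way to obtain exact zeros.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (deliberately simple)."""
    return re.findall(r"\w+", text.lower())

def build_vocabulary(docs):
    """Map every unique term of the corpus to a column index."""
    terms = sorted({t for d in docs for t in tokenize(d)})
    return {term: i for i, term in enumerate(terms)}

def binarize(doc, vocab):
    """Sparse binary row: the set of column indices whose term occurs in doc."""
    return {vocab[t] for t in tokenize(doc) if t in vocab}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(w, t):
    """Proximal step of the L1 penalty: shrink w towards zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def train_lasso_logreg(rows, labels, n_features, lam=0.05, lr=0.5, epochs=500):
    """Logistic regression with L1 penalty, fitted by proximal gradient descent."""
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, y in zip(rows, labels):
            p = sigmoid(bias + sum(weights[j] for j in row))
            err = (p - y) / n          # gradient of the mean log-loss
            grad_b += err
            for j in row:              # only terms present in the record
                grad_w[j] += err
        bias -= lr * grad_b            # the bias is not penalized
        weights = [soft_threshold(w - lr * g, lr * lam)
                   for w, g in zip(weights, grad_w)]
    return weights, bias

def predict(row, weights, bias):
    return 1 if sigmoid(bias + sum(weights[j] for j in row)) >= 0.5 else 0

# Hypothetical toy corpus: 1 = accepted, 0 = rejected.
docs = [
    "the council supports the motion on railway funding",
    "the commission supports the initiative on railway safety",
    "the council opposes the motion on tax cuts",
    "the commission opposes the initiative on tax relief",
]
labels = [1, 1, 0, 0]

vocab = build_vocabulary(docs)
rows = [binarize(d, vocab) for d in docs]
weights, bias = train_lasso_logreg(rows, labels, len(vocab))

# Terms that survive the lasso penalty with a non-zero weight.
selected = sorted(t for t, i in vocab.items() if weights[i] != 0.0)
print(len(vocab), "terms in the vocabulary,", len(selected), "selected:", selected)
```

On this toy corpus only a few of the terms keep a non-zero weight, which mirrors on a tiny scale how the penalty reduces a large vocabulary to a much smaller set of predictive terms. A larger `lam` removes more terms; a value of zero turns the model back into plain logistic regression.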
These models have proven to work very well on popular votes. In this study, however, they did not excel: accuracy did not exceed approximately 90% on the validation set and 75% on the test set. As a side note, it remains an open question how well these models would perform in the long run on unseen data.
This is why it was decided to go back to the very basic logistic regression classifier which was just meant to be a baseline model. By simplifying this model even more (using 1-grams instead of 1-3-grams, using lasso instead of L2 penalty etc.) the final results were obtained.
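To make the difference between 1-grams and 1-3-grams concrete, here is a minimal sketch of how the two feature sets can be generated from the same token sequence. The helper `ngrams` and the example sentence are hypothetical and only serve as an illustration.

```python
def ngrams(tokens, n_min=1, n_max=3):
    """Return all contiguous n-grams with n_min <= n <= n_max, joined by spaces."""
    out = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out

tokens = "the federal council rejects the motion".split()

unigrams = ngrams(tokens, 1, 1)      # single words only
one_to_three = ngrams(tokens, 1, 3)  # single words, pairs and triples

print(len(unigrams), "features with 1-grams")
print(len(one_to_three), "features with 1-3-grams")
```

Restricting the features to 1-grams keeps the vocabulary, and with it the number of weights to be fitted, considerably smaller. The price is that word order and multi-word expressions such as "federal council" are no longer visible to the model.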
Issue: As mentioned in the poster, many votes on incremental adaptations of articles are excluded from the data set. This means that the model has not been trained on these votes and therefore cannot predict their outcome either. It also meant excluding the main type of affairs handled in the parliament, the „Affairs of the Federal Council“ (Geschäfte des Bundesrats). In these cases the parliament cannot reject the substance as a whole but can only improve it until it is finally released as new legislation.
The remaining question: How could a machine learning model or possibly a reinforcement learning model be designed to include the votes on small adaptations at the level of articles while still considering the whole background of the political matter?
Any input on how to address this question would be highly appreciated.