Effectiveness of proactive password checker based on Markov models
Taneski, Viktor (Author)
Brumen, Boštjan (Mentor)
Jolevski, Ilija (Co-mentor)
Keywords: passwords, password analysis, password security, password problems, password strength, systematic literature review, Markov models
In this doctoral dissertation we focus on the most common method of authentication, the username-password combination. This mechanism is used so frequently because it is simple and cheap to implement. Useful as they are, passwords have many problems. Almost four decades ago, Morris and Thompson were the first to find that textual passwords are a weak point in the security of information systems. They concluded that users are one of the biggest threats to an information system’s security. We have faced these problems on a daily basis ever since. Users do not behave in the ways needed to stay safe and secure, even though they are aware of the security issues. Because security experts have been dealing with this research area for a long time, in this dissertation we wanted to identify the problems related to textual passwords and the solutions that have been proposed. To this end, we first performed a systematic literature review on textual passwords and their security. In doing so, we wanted to evaluate the current status of passwords in terms of their strength and the ways they are managed, and to determine whether users are still the “weakest link”. We found that one of the less researched solutions is proactive password checking. A proactive password checker could filter out passwords that are easy to guess and let through only those that are harder to guess. For proactive password checking to be more effective, the checker must be able to estimate the probability that a user will select a given password. For this purpose, the better password checkers usually use certain tools to calculate password probability, i.e., password strength.
To find out which method is most suitable for calculating password strength, we examined earlier solutions of this kind. We found that Markov models are one of the most common methods used for password strength estimation, although they can suffer from problems such as sparsity and over-fitting. Our review also showed that Markov models are mostly trained on only one dataset. This could limit how well a model identifies bad or very strong passwords. Since training datasets are central to the development of Markov models, they clearly have some effect on the final assessment of a password’s strength. In our dissertation we explore how important this effect is for the final password strength estimation. In particular, we focus on the effect of different but similar datasets on password strength estimation. For our study, we analysed publicly available sets of “common passwords” and processed them with respect to the frequency distribution of the letters contained in these passwords. We built different Markov models based on these datasets and frequency distributions. This helped us determine whether one Markov model is sufficient or whether several models are needed to estimate password strength effectively for a wide range of passwords. The results showed statistically significant differences between the models. In more detail, we found that:
- different Markov models (trained on different datasets) showed statistically different results when tested on the same dataset,
- more diverse datasets are needed to be able to calculate the strength of as many passwords as possible, since one “universal” model trained on one “universal” dataset is less effective at classifying passwords into different categories (i.e., weak, medium, strong),
- in most cases, Markov models of 1st and 2nd order do not give statistically different outputs,
- overall, Markov models can be used as a basis for constructing a more effective password checker that uses multiple different and specific Markov models, which could be more effective if we want to cover a wider range of passwords.

[V. Taneski], 2019; 2019-06-13 11:17:59; Doctoral thesis; 73782; Maribor; sl
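
To make the approach described in the abstract more concrete, the following is a minimal sketch of a character-level Markov model used as a password strength estimator, and of a proactive checker that consults several such models. It is an illustration under stated assumptions, not the implementation from the dissertation: the class `MarkovModel`, the functions `classify` and `check`, the add-one smoothing (one common way to soften the sparsity problem), the bit thresholds, the rule of taking the lowest estimate across models, and the two toy training lists are all hypothetical.

```python
import math
from collections import Counter, defaultdict

START = "\x02"  # assumed padding symbol marking the start of a password
END = "\x03"    # assumed symbol marking the end of a password


class MarkovModel:
    """Character-level Markov model of a given order with additive smoothing."""

    def __init__(self, order=1, smoothing=1.0):
        self.order = order
        self.smoothing = smoothing
        self.counts = defaultdict(Counter)  # context -> counts of next character
        self.alphabet = {END}

    def train(self, passwords):
        for pw in passwords:
            padded = START * self.order + pw + END
            self.alphabet.update(pw)
            for i in range(self.order, len(padded)):
                self.counts[padded[i - self.order:i]][padded[i]] += 1

    def strength_bits(self, pw):
        """Return -log2 P(pw); a higher value means a less likely password."""
        padded = START * self.order + pw + END
        # One extra slot is reserved for any character never seen in training.
        vocab = len(self.alphabet) + 1
        bits = 0.0
        for i in range(self.order, len(padded)):
            seen = self.counts.get(padded[i - self.order:i], Counter())
            num = seen[padded[i]] + self.smoothing
            den = sum(seen.values()) + self.smoothing * vocab
            bits -= math.log2(num / den)
        return bits


def classify(bits, weak_below=25.0, strong_from=40.0):
    """Map an estimate to a category; the thresholds are illustrative only."""
    if bits < weak_below:
        return "weak"
    return "medium" if bits < strong_from else "strong"


def check(password, models, min_bits=25.0):
    """Accept a password only if no model finds it easy to guess.

    Taking the minimum over the models is an assumption of this sketch.
    """
    bits = min(m.strength_bits(password) for m in models)
    return bits >= min_bits, bits


# Two tiny, made-up "common password" lists standing in for real datasets.
dataset_a = ["password", "password1", "passw0rd", "letmein", "welcome"]
dataset_b = ["123456", "12345678", "111111", "123123", "654321"]

model_a = MarkovModel(order=1)
model_a.train(dataset_a)
model_b = MarkovModel(order=1)
model_b.train(dataset_b)
model_a2 = MarkovModel(order=2)  # 2nd-order variant trained on the same data
model_a2.train(dataset_a)

for candidate in ["password", "123456", "xQ7#vT9!pL2z"]:
    accepted, bits = check(candidate, [model_a, model_b])
    print(candidate, round(bits, 1), classify(bits), "accepted" if accepted else "rejected")
```

In this sketch each model rates passwords that resemble its own training data as weaker than the other model does, which is the kind of dataset dependence the findings above describe. A real checker would need far larger training sets and empirically chosen thresholds.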