Building Risk-Based Authentication for Shibboleth in Two Weeks


I just revolutionized how people log in to Shibboleth by building modern security defenses into it, using a neural network combined with heuristics. This was my group’s submission for Michael Roman’s assignment in my cybersecurity class.

Shibboleth is an identity provider that gives you one account to log in to your university systems. It’s one of the most important pieces of software you use, and there’s a good chance you’ve never even heard of it. Whether it’s Canvas or your email, we all log in to our portals daily without even thinking about it.

Shibboleth has many security defenses built in already, but they mostly relate to the back channel. The front channel, which is where the login page you see lives, is not as well protected. Modern login portals use defenses such as impossible-travel detection and behavioral analysis that signals when a bot is trying to compromise your account. These exist on many login pages, such as Microsoft’s or Google’s, but not on Shibboleth’s by default.

I worked with four other AI and cybersecurity students at Duke to bring these modern security measures to Shibboleth. As soon as you visit a Shibboleth login page, it starts collecting behavioral data points. This data, along with additional information about the login, is sent to an external server. The server immediately determines whether the IP address trying to log in is high risk, and it passes the behavioral data to a neural network we trained to detect suspicious user behavior. The two scores are then ensembled into a final score that predicts how risky the login is. The higher the number, the riskier the login.
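To make the ensembling step concrete, here is a minimal sketch of blending the IP score and the behavioral score into one number. The weighted average, the `ip_weight` value, and the names `RiskInputs` and `ensemble_risk` are assumptions for illustration, not the exact formula we shipped.

```python
from dataclasses import dataclass


@dataclass
class RiskInputs:
    ip_risk: float        # 0.0 (clean) to 1.0 (high risk), from the IP check
    behavior_risk: float  # 0.0 to 1.0, from the behavioral neural network


def ensemble_risk(inputs: RiskInputs, ip_weight: float = 0.4) -> float:
    """Blend the two scores into one value in [0, 1]; higher means riskier.

    The weighted average and the default weight are illustrative assumptions.
    """
    if not 0.0 <= ip_weight <= 1.0:
        raise ValueError("ip_weight must be between 0 and 1")
    for value in (inputs.ip_risk, inputs.behavior_risk):
        if not 0.0 <= value <= 1.0:
            raise ValueError("risk scores must be between 0 and 1")
    return ip_weight * inputs.ip_risk + (1.0 - ip_weight) * inputs.behavior_risk


if __name__ == "__main__":
    # A clean IP with slightly unusual behavior stays on the low end.
    print(ensemble_risk(RiskInputs(ip_risk=0.05, behavior_risk=0.30)))
```

A weighted average keeps the final score in the same 0 to 1 range as its inputs, so "the higher the number, the riskier the login" still holds after combining.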

Intercepting the Shibboleth login flow so that our risk-based authentication analysis could run was tedious and took nearly a week of development time. There were so many XML files to modify, and since I wasn’t familiar with Shibboleth prior to this project, I had to learn everything from scratch.

As for the neural network, we initially took the naïve approach of finding an existing Kaggle dataset of login data and training on that. However, this flopped hard. The neural network was learning to assess risk based on browsers that were 4 to 5 years out of date. It consistently scored my own logins as 55% likely to be malicious. To be completely transparent, that is outrageously bad. However, seeing your initial solution crash and burn is part of the process.

Instead, we started with zero data and collected our own. We collected behavioral data from the login page and bootstrapped the initial risk scores with statistics. These statistics were personalized for each user and would learn “on average, you spend this long on the page, click this many times, type this fast” and so on.
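The statistical bootstrap can be sketched roughly as follows: keep a running mean and variance per user and per feature, then score a new login by how far it sits from that user's own average. The feature names, the use of Welford's online algorithm, and the mapping from z-scores to a 0 to 1 risk are all assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Mapping


class RunningStat:
    """Welford's online mean and variance for a single feature."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x: float) -> float:
        if self.n < 2:
            return 0.0  # not enough history to judge this feature yet
        std = max(math.sqrt(self.m2 / (self.n - 1)), 1e-9)
        return (x - self.mean) / std


class UserBaseline:
    """Per-user averages such as time on page, click count, typing speed."""

    def __init__(self) -> None:
        self.stats = defaultdict(RunningStat)

    def update(self, features: Mapping[str, float]) -> None:
        for name, value in features.items():
            self.stats[name].update(value)

    def risk(self, features: Mapping[str, float]) -> float:
        """Map the mean absolute z-score into [0, 1]; 0 means typical."""
        if not features:
            return 0.0
        total = sum(abs(self.stats[n].zscore(v)) for n, v in features.items())
        return 1.0 - math.exp(-(total / len(features)) / 2.0)


# One baseline per username (hypothetical in-memory store).
baselines = defaultdict(UserBaseline)
```

Scoring a login before calling `update` keeps the current attempt from diluting its own anomaly score.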

We applied the same idea of personalization to the neural network itself. There is an embedding layer for the username, which allows the neural network to learn how each user logs in and create a personalized risk score. The best part is that an embedding layer is computationally cheap. The decision to make this a personalized score for each user was a no-brainer, and a pillar of this project.
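To show why a username embedding is cheap, here is a dependency-free sketch of the forward pass only: each username maps to a small learned vector that is concatenated with the behavioral features before scoring. The class name `PersonalizedScorer`, the embedding size, the single logistic output, and the random untrained weights are assumptions for illustration; a real model would be built and trained in a deep learning framework.

```python
import math
import random
from typing import Dict, List, Sequence


class PersonalizedScorer:
    """Forward pass only: username embedding + behavioral features -> risk."""

    def __init__(self, n_features: int, embed_dim: int = 4, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.n_features = n_features
        self.embed_dim = embed_dim
        self.user_index: Dict[str, int] = {}
        self.embeddings: List[List[float]] = []  # one row per known username
        # One weight per input (embedding dims, then features), plus a bias.
        self.weights = [
            self.rng.gauss(0.0, 0.1) for _ in range(embed_dim + n_features)
        ]
        self.bias = 0.0

    def embedding_for(self, username: str) -> List[float]:
        """Look up the user's row, creating a small random one on first sight."""
        if username not in self.user_index:
            self.user_index[username] = len(self.embeddings)
            self.embeddings.append(
                [self.rng.gauss(0.0, 0.1) for _ in range(self.embed_dim)]
            )
        return self.embeddings[self.user_index[username]]

    def score(self, username: str, features: Sequence[float]) -> float:
        """Return a risk value in (0, 1) from a logistic output unit."""
        if len(features) != self.n_features:
            raise ValueError("unexpected number of features")
        inputs = self.embedding_for(username) + list(features)
        logit = self.bias + sum(w * x for w, x in zip(self.weights, inputs))
        return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    scorer = PersonalizedScorer(n_features=3)
    print(scorer.score("alice", [0.1, 0.2, 0.3]))
```

The lookup is a single table index, and the added parameter count is only the number of users times `embed_dim`, which is why personalization costs so little here.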

After all our hard work, I’m very pleased to say the neural network now scores my logins as only 10-20% likely to be malicious. This project has strengthened my understanding of how data can exist but not be useful. I learned how to start from zero with the data collection process and go all the way to a working model. We found a use case and truly integrated it all seamlessly.

This was an incredible project to develop in just two short weeks. I also want to thank Jenessa Lu, Lalit Lakamsani, Michael Saju, and Sera Tan for collaborating on the project, giving moral support at 2 AM, and logging in to our Shibboleth instance nearly a thousand times to collect data.

Copyright © 2025 Sam Packer.