Project Description

USING INTELLIGENT ALGORITHMS TO FIND COMPANIES SUBJECT TO THE PENSION SCHEME OBLIGATION

IMPROVING THE SUPERVISION OF PENSION FUNDS

BRIEF

In many countries, and in the Netherlands in particular, businesses are required to contribute to their employees’ pension funds. Imagine the negative impact on pension fund companies if these businesses are registered under the wrong industry category at the Chamber of Commerce! Such a misclassification can leave a pension fund having to pay pensions to individuals whose employers never contributed premiums.

Using web scraping and machine-learning techniques, we helped one of our clients, a pension fund company, to automatically identify businesses subject to the obligation. We did this by linking as many company registration numbers as possible to the right company websites. Our client could then investigate each company by searching its website for specific keywords related to the industry. The more relevant the keywords found, the more likely the company belongs to that industry. No more missed premiums: say hello to efficiency, increased revenue and improved processes!

THE CONCEPT

The amount of pension premium that businesses have to pay depends on the type of industry in which they operate. Today, due to industry blurring, many companies are registered with the Chamber of Commerce under the wrong business category. As a result, pension fund companies run the risk of having to pay pensions to employees for whom premiums have never been paid. Moreover, they run the risk of missing out on premiums of around €4,000 – €5,000 per employee. So the question was: “How can we help the client to automatically identify the companies that fall under the obligation category?”

CUSTOMER CHALLENGE

During this project, we had to link as many Chamber of Commerce numbers as possible to the correct company websites. This way we could automatically check each company’s website for specific keywords that show what industry the company is in. The more these keywords matched the industry, the more confidently we could say the company belonged to that industry. To validate and refine the accuracy of this tool, we employed a machine learning model trained on a dataset comprising 2,000 company websites. Given the intensive computational demands of web scraping and machine learning processes, especially when dealing with a large volume of data, we opted to leverage cloud computing services to ensure fast performance and scalability.
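The keyword check described above can be illustrated with a minimal Python sketch. It is not the tool built for the client: the keyword list, the weights, and the function names are all hypothetical, and the score is a simple weighted keyword frequency rather than the trained model mentioned above.

```python
import re
from collections import Counter

# Hypothetical keywords and weights for one industry.
# The real keyword list is not given in this case study.
INDUSTRY_KEYWORDS = {"scaffolding": 3.0, "roofing": 2.0, "construction": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_score(page_text, keywords):
    """Return the weighted keyword count divided by the total word count.

    A higher score means the page text matches the industry keywords
    more strongly. An empty page scores 0.0.
    """
    counts = Counter(tokenize(page_text))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    weighted = sum(weight * counts[word] for word, weight in keywords.items())
    return weighted / total
```

In this sketch, a page that mentions "roofing" and "scaffolding" scores higher than a page about an unrelated topic, which mirrors the idea that stronger keyword matches give more confidence in the industry assignment.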

TECHNOLOGY STACK

During this project, our team used the following methods and technologies:

  • Web scraping

  • Machine learning model and algorithms

  • Microsoft Azure

RESULTS

Previously, our client faced significant challenges in identifying companies that did not pay premiums. They had to conduct manual research, which was tedious, slow, and highly prone to human error, and which left the outcome uncertain. Thanks to the web scraping tool, our client now has an intelligent and generic database, from which the tool automatically generates a report sorted by relevance. As a result, only the truly relevant companies need to be investigated, namely the top 1% of companies by score, which significantly improves work efficiency. The tool also helps our client avoid missing out on premiums of around €4,000 – €5,000 per employee. Last but not least, it helps identify potential customers faster, resulting in an overall increase in revenue.
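As a rough illustration of a report sorted by relevance and limited to the highest-scoring companies, the following Python sketch ranks companies by score and keeps a top fraction. The function name, the input format (a mapping from registration number to score), and the rounding rule are assumptions made for this example, not details of the client's tool.

```python
import math


def top_fraction(scores, fraction=0.01):
    """Return (registration_number, score) pairs sorted by score, highest first,
    limited to the given top fraction of companies.

    `scores` is a hypothetical mapping from registration number to relevance score.
    The cut-off is rounded up, so at least one company is returned for non-empty input.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]
```

With the default `fraction=0.01`, only the top 1% of scored companies end up in the list handed to a human reviewer.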
