Estimating the selling price of a property helps with important decisions, such as buying a home or investing in real estate. It is also a valuable tool for a real estate agency, which can use it to estimate the sale value of the properties that make up its assets.

Having an estimate of a property's value strengthens the negotiating position of both the buyer and the seller. It also serves as a benchmark for comparing growth projections across residential areas.

At DataSource.ai we ran a data science competition in which participants could download a dataset of past sales and try to estimate the value of a new property for sale.

The objective of the competition was to build a machine learning model that predicts apartment prices in Argentina and Colombia. The same approach can be applied to other countries or cities, given the main variables that describe a property, such as the area, the number of bathrooms, and the location.

In this case study we will go through the following points:

Defining the problem
Acquiring the data (dataset)
Configuration and requirements of the competition
Choosing a winning model
Deploying an API in production
Deploying the model in a visual app (Streamlit)

Definition of the problem

As discussed in the introduction, being able to predict property prices gives a competitive advantage. The problem we face here is a regression task: we have a list of properties sold in the past, each described by features such as the area, the number of bathrooms, and the location, and we must predict a continuous value, the price.
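To make "predict a continuous value" concrete, here is a minimal sketch of a one-feature least-squares regression in plain Python. The numbers are invented toy values, not rows from the competition dataset, and the helper names fit_line and predict are hypothetical. The competition solutions used many more features and stronger models.

```python
# Toy illustration of a regression task: fit price = slope * area + intercept
# by ordinary least squares. All numbers are invented for illustration only.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past sales: total area in m2 and sale price in USD.
areas = [40.0, 60.0, 80.0, 100.0]
prices = [80_000.0, 120_000.0, 160_000.0, 200_000.0]

slope, intercept = fit_line(areas, prices)

def predict(area):
    """Predict a continuous price for a new property from its area."""
    return slope * area + intercept

print(round(predict(70.0)))
```

The output is a number on a continuous scale rather than a category, which is what separates regression from classification.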

Gathering datasets

The main dataset contains information about apartments for sale in Argentina and Colombia during 2019 and 2020. The data was provided by Properati.

If you want to see the datasets that this platform provides, you can find them here:

https://www.properati.com.co/data/

The data is open, so we thank Properati for taking the trouble to publish it!

DataSource.ai cleaned and prepared the data for the competition.

We defined the following features:

ID = Unique property identifier

country = Country where the property is located

city = City where the property is located

province_department = Province or department where the property is located

property_type = Type of property (in our case, always an apartment)

operation_type = Type of operation (sale)

rooms = Number of rooms

bedrooms = Number of bedrooms

surface_total = Total area in m²

currency = Currency of the price (USD)

price = Price of the property

Note: For Colombia, prices were converted to USD at a rate of 3,633 COP per USD.
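As a quick illustration of that conversion, here is a minimal sketch. The function name cop_to_usd and the example price are hypothetical; only the rate comes from the note above.

```python
# Conversion rate stated in the note: 3,633 COP per 1 USD.
COP_PER_USD = 3633

def cop_to_usd(price_cop):
    """Convert a price in Colombian pesos to US dollars at the fixed rate."""
    return price_cop / COP_PER_USD

# Hypothetical listing price of 363,300,000 COP.
print(cop_to_usd(363_300_000))  # 100000.0
```

Because a single fixed rate was used, the USD prices for Colombia do not reflect exchange-rate movements during the 2019-2020 period.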

The dataset was divided as follows:

train_apartmentos.csv contains 80% of the data. Use this set to train the machine learning model.

test_apartments.csv contains the remaining 20% of the data. Use this set to predict the price column. Unlike the training set, the test set does not include the values of the price column; you must predict them with your model.
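A split like this can be sketched in a few lines. This is only an illustration under our own assumptions (a random shuffle with a fixed seed, toy rows, and the hypothetical helper split_train_test), not the exact procedure used to prepare the competition files.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows, split them, and withhold the target from the test rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test_rows = shuffled[:n_test]
    train_rows = shuffled[n_test:]
    # Competitors receive the test features without the price column.
    test_features = [
        {key: value for key, value in row.items() if key != "price"}
        for row in test_rows
    ]
    # The organizers keep the true prices to score submissions later.
    answers = {row["ID"]: row["price"] for row in test_rows}
    return train_rows, test_features, answers

# Toy rows using column names from the feature list; the values are invented.
rows = [
    {"ID": i, "surface_total": 50 + i, "price": 1000 * (50 + i)}
    for i in range(10)
]
train, test, answers = split_train_test(rows)
print(len(train), len(test))
```

Keeping the true test prices on the organizers' side is what makes it possible to score submissions without leaking the target.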

Competition configuration and requirements

To set up a competition, we need the following:
Competition start date.
End date of the public phase: in this period we evaluate the models on a first subdivision of the test data.
End date of the private phase: in this period we evaluate the models on a second and final subdivision of the test data, so that a model overfitted to the first subdivision is not rewarded.
The description of the details of the competition.
The rules of the competition.
The evaluation metric for the models. The metric depends on the type of problem to be solved.
The prizes to be awarded to the competitors who achieve the best scores.
The data in CSV format, with the corresponding subdivisions.
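The public and private phases can be sketched as scoring the same submission on two disjoint subsets of the test answers. The metric below is root mean squared error (RMSE), which is our assumption for illustration; the article does not name the metric used. The function names and toy numbers are also hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def score_submission(answers, submission, public_ids):
    """Return (public_score, private_score) for one submission.

    answers and submission map property ID -> price; public_ids is the
    set of IDs scored during the public phase. All other IDs are private.
    """
    public = [i for i in answers if i in public_ids]
    private = [i for i in answers if i not in public_ids]

    def score(ids):
        return rmse([answers[i] for i in ids], [submission[i] for i in ids])

    return score(public), score(private)

# Invented toy values: true prices and one competitor's predictions.
answers = {1: 100.0, 2: 200.0, 3: 300.0, 4: 400.0}
submission = {1: 110.0, 2: 190.0, 3: 300.0, 4: 420.0}
public_score, private_score = score_submission(answers, submission, public_ids={1, 2})
print(public_score, private_score)
```

Since competitors only see the public score while the competition is open, tuning a model to that score alone does not guarantee a good private score.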

Selection of a winning model

At the end of the competition, these were the top 5 winners

Each of them sent us the source code of their solution, and we chose one of the solutions to deploy in production.

Deploying an API in production

This is a screenshot of the code we used to deploy an API on our own servers so that it can serve HTTP requests.
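For readers who want a feel for what such a service involves, here is a minimal sketch of a prediction endpoint using only Python's standard library. It is not the code we deployed: the placeholder model, the field name surface_total in the request, the response key predicted_price_usd, and the port are all assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def placeholder_model(features):
    """Stand-in for the trained model; returns a made-up price, not a real estimate."""
    return 1000.0 * float(features["surface_total"])

def handle_predict(raw_body):
    """Turn a JSON request body into (status_code, JSON response bytes)."""
    try:
        features = json.loads(raw_body)
        price = placeholder_model(features)
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing field, or a non-numeric value.
        return 400, json.dumps({"error": "invalid input"}).encode("utf-8")
    return 200, json.dumps({"predicted_price_usd": price}).encode("utf-8")

class PredictHandler(BaseHTTPRequestHandler):
    """Serves POST /predict and answers with a JSON prediction."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def run(port=8000):
    """Start the server; blocks until the process is interrupted."""
    HTTPServer(("", port), PredictHandler).serve_forever()
```

Calling run() starts the server locally. In a real deployment, placeholder_model would be replaced by loading the winning model and calling its prediction method.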

If the code does not mean much to you, don't worry: our engineers handle the whole process of deploying the winning models to production. The important thing is what comes next.

Suppose your company has a web platform that you want to connect to these predictions. Every time you need a prediction, you enter the characteristics of the property in a form and get the result. That is the purpose of this API: the form on your web page communicates automatically with our server via the API, and the server immediately returns the predicted sale price.

Let's perform the simulation using Postman.

Here we can see that at the top we are sending the data to a URL with the following structure: https://mlendpoints.com/real-state-price-forecast/predict

This is the endpoint where we have deployed the code shown above. In the Body we send the characteristics of the property whose sale price we want to predict: it is located in Argentina, has a total of 4 rooms, 2 bedrooms and 3 bathrooms, and has an area of 137 m². Given these characteristics and using one of the winning models, we obtain the result at the bottom: an estimated sale value of $402,473 USD.
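The same request can be made from code instead of Postman. The sketch below uses Python's standard library; the URL and the property values come from the example above, while the JSON field names and the shape of the response are assumptions, since the exact schema is not shown here.

```python
import json
import urllib.request

ENDPOINT = "https://mlendpoints.com/real-state-price-forecast/predict"

def build_request(features):
    """Build a POST request carrying the property features as JSON."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(features, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(features), timeout=timeout) as response:
        return json.loads(response.read())

# Values from the example above; the field names are assumed.
features = {
    "country": "Argentina",
    "rooms": 4,
    "bedrooms": 2,
    "bathrooms": 3,
    "surface_total": 137,
}
request = build_request(features)
print(request.get_method(), request.full_url)
```

Calling predict(features) would send the request over the network. A web form on your platform would do the equivalent in the background each time a user submits it.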

This is where the power of machine learning lies: with a model deployed to production, we can make predictions on the fly!

Deploying the model in a visual app (Streamlit)

As a final, additional step, we can simulate a visual platform where the prediction results are obtained by filling in a form. In this case we are using Streamlit. Let's see the result. You can access the following link and play a little with the prediction parameters.

In the left sidebar we have the model parameters used to make the predictions, such as country, rooms, bathrooms, etc. In the central part we have the score of the model, the parameters specified on the left, and the result of the price prediction. For these characteristics, for example, the final price is $314,011 USD (the parameters differ from those in the previous example, which is why the price changed).

Conclusion

As you can see, our competitions cover the whole machine learning process, and we work hand in hand with you to solve a specific problem with data science.

If you want to learn more about competitions, you can read this article, and you can also schedule a free consultation with our data science experts.

Data Science Study Case - Real Estate Price Forecast

Daniel Morales
