Hello everyone, I'm Daniel Morales, co-founder of DataSource.ai. Today I want to share some very good news, along with the progress we are making toward building an excellent global community of data scientists.

New CEO

This is perhaps the most important news of all: we have a new CEO, who has spent the last 3 months working at DataSource.ai and getting to know our internal processes, the competitions, the community and the technology we work with. This is a huge achievement for us, since he has extensive experience in tech companies and startups, with more than 20 years working for Fortune 500 companies like IBM, Cisco and AT&T, and he is based in San Francisco, CA, in the heart of Silicon Valley. His name is Dimitry Kushelevsky, and you can contact him on LinkedIn or by email at dimitry@datasource.ai

Our main goal with Dimitry is to fulfill our mission of democratizing Artificial Intelligence for small and medium businesses. That also means creating a great culture, having a non-technical leader to grow the team, and bringing in sponsors for data science competitions. Sponsors get value from the results of the Machine Learning models sent by our community, and as a result the whole community gets prize money, constantly and consistently. Welcome, Dimitry!

Competitions

So far we have held 6 competitions, and we are in the middle of the seventh. In a journey of more than a year we have learned tons about competitions: how they work in detail, how to host them, how to evaluate them, how to automate tasks, and much more. At the same time we have learned from you: from those who have won competitions and from those who have filled out our feedback forms. We would like to take this opportunity to thank you for this!

Based on this knowledge we have made a number of changes that are worth sharing with you.

Discussions within competitions

This is a feature that you asked for many times in the surveys, so it is now enabled for all competitions. If you have questions or comments you can post them here.

Maximum of 50 submissions per day

Once a competitor has made 50 submissions in a day, the submit button is disabled until the next day, because the daily maximum has been reached.

A competitor who sends this many models per day is probably doing it automatically, trying to overfit Test.csv, which is good neither for that competitor nor for the others.

Remember to always choose your best models to send, so you don't have to wait until the next day. As an additional tip, we recommend that you make several different splits of Train.csv and keep part of each split aside as a hold-out test set. That way you can simulate new, unseen data and run the competition's evaluation metric on it, and you will be more confident about the likely result when you send the csv file to our platform.
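The hold-out tip above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the toy rows stand in for Train.csv, the mean-predicting "model" stands in for your real model, and RMSE is an assumed metric (use whatever metric your competition actually specifies).

```python
import random


def holdout_splits(rows, n_splits=5, holdout_fraction=0.2, seed=42):
    """Yield (train_rows, holdout_rows) pairs from repeated random splits."""
    rng = random.Random(seed)
    n_holdout = max(1, int(len(rows) * holdout_fraction))
    for _ in range(n_splits):
        shuffled = list(rows)
        rng.shuffle(shuffled)
        # The first n_holdout rows are set aside; the rest are used to train.
        yield shuffled[n_holdout:], shuffled[:n_holdout]


def rmse(y_true, y_pred):
    """Root mean squared error (assumed metric, for illustration only)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return (sum(squared) / len(squared)) ** 0.5


def fit_mean_baseline(train_rows):
    """Stand-in model: always predicts the mean target seen in training."""
    mean_target = sum(target for _, target in train_rows) / len(train_rows)
    return lambda feature: mean_target


# Toy (feature, target) rows standing in for the contents of Train.csv.
rows = [(x, 2.0 * x + 1.0) for x in range(100)]

scores = []
for train_rows, holdout_rows in holdout_splits(rows):
    model = fit_mean_baseline(train_rows)
    y_true = [target for _, target in holdout_rows]
    y_pred = [model(feature) for feature, _ in holdout_rows]
    scores.append(rmse(y_true, y_pred))

print("hold-out score per split:", [round(s, 2) for s in scores])
```

If the scores vary a lot from split to split, that is a hint that a single leaderboard score may not tell you much about how the model behaves on new data.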

Competition completion process

This is perhaps the most important change we have made within the platform, so pay close attention.

The normal process of participation within the competition is as follows:

You download the Train.csv dataset.
You do exploratory data analysis (EDA) and build a baseline model.
You run .predict on the Test.csv dataset.
You create a csv following the guidelines of the SampleSumbission.csv file.
You upload the csv to our platform to obtain the score.
You appear on the public leaderboard.
You continue to work on the model with advanced techniques and test different models.
You repeat the submission process.
You get different scores (and you improve them).
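As a rough sketch of the "create a csv" step, the snippet below copies the header and row order from a sample submission and fills in your predictions. The column names (`id`, `target`) and the in-memory files are assumptions made for illustration; always follow the sample file of the competition you are in.

```python
import csv
import io


def build_submission(sample_file, predictions, out_file):
    """Write predictions using the header and row order of the sample file.

    `predictions` maps each row id (as a string) to its predicted value.
    """
    reader = csv.reader(sample_file)
    header = next(reader)  # e.g. ["id", "target"] (hypothetical columns)
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        row_id = row[0]
        if row_id not in predictions:
            # Fail early instead of uploading an incomplete file.
            raise KeyError(f"missing prediction for id {row_id}")
        writer.writerow([row_id, predictions[row_id]])


# In-memory stand-ins for the sample submission file and your output file.
sample = io.StringIO("id,target\n1,0\n2,0\n3,0\n")
predictions = {"1": 0.91, "2": 0.13, "3": 0.55}
submission = io.StringIO()

build_submission(sample, predictions, submission)
print(submission.getvalue())
```

Reading the header and ids from the sample file, instead of typing them by hand, helps avoid a submission being rejected because of a misspelled column or a missing row.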

This is the normal process, but it has a problem: overfitting. The model with the best score may well have been overfitted to the data given in Test.csv. That is why we have decided to introduce a new dataset, released at the end of the competition, which acts as a "real life" dataset on which no model has been "overfitted". We call this dataset FinalTest.csv, and the process to send the final model is as follows:

Once the date is reached (see the competition timeline), the dataset called FinalTest.csv is enabled.
You download it to your environment.
You choose your best model (the one that has given you the best score so far with Test.csv).
You run .predict on FinalTest.csv.
Be careful, because you will only have ONE chance to send this last model, so choose well.
You create the csv following the guidelines of the SampleSumbission.csv file. In the final form (opened from the Submit Final Model button) you must include:
1. The csv to obtain the score
2. The .ipynb (Notebook)
Because the Notebook is uploaded in this form, you will no longer need to send it to our email address.
You will see your final score on the screen, but it will not be immediately reflected in the private leaderboard.
You will have a period of one week to make this submission.
At the end of this week, all scores and the private leaderboard will be revealed.
The private leaderboard is the one we will use for points, gifts and/or cash prizes.
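To see why a separate FinalTest.csv is useful, here is a small simulation, offered only as an illustration and not as a description of how our scoring works. It assumes 50 submissions that all have the same true accuracy (0.70), a public test set of 200 rows and a much larger fresh set; every number here is made up.

```python
import random


def simulate(n_submissions=50, test_size=200, final_size=5000,
             true_accuracy=0.70, seed=0):
    """Compare the best public score with a score on fresh, unseen rows.

    Every simulated model has the same true accuracy, so any spread in the
    public scores is pure noise from the finite public test sample.
    """
    rng = random.Random(seed)

    def observed_accuracy(n_rows):
        hits = sum(rng.random() < true_accuracy for _ in range(n_rows))
        return hits / n_rows

    public_scores = [observed_accuracy(test_size) for _ in range(n_submissions)]
    best_public = max(public_scores)
    average_public = sum(public_scores) / n_submissions
    # Score the "winning" model on rows that played no part in choosing it.
    final_score = observed_accuracy(final_size)
    return best_public, average_public, final_score


best_public, average_public, final_score = simulate()
print(f"best public score:     {best_public:.3f}")
print(f"average public score:  {average_public:.3f}")
print(f"score on fresh data:   {final_score:.3f}")
```

Picking the maximum of many noisy scores tends to give a number that looks better than the model really is, and the score on fresh data is usually closer to the truth. That is the gap the private leaderboard is meant to expose.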

Timeline of completion

Following the previous example, as you can see in this image, the competition (a fictitious one) started on March 22, its public phase ended on April 14, and it will be completed on April 21. That means:

Until April 14 you can use the Test.csv dataset to make your predictions and appear on the public leaderboard.
The final submission window opens the following day and lasts one week, closing definitively on April 21. During this period you must send your predictions on the FinalTest.csv dataset.
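The timeline of this fictitious competition can also be written as a small date check. The year (2021) and the phase labels are assumptions added for the sketch; only the month and day values come from the example above.

```python
from datetime import date

# Dates from the fictitious example; the year is an assumption.
START = date(2021, 3, 22)
PUBLIC_END = date(2021, 4, 14)   # last day for Test.csv submissions
FINAL_END = date(2021, 4, 21)    # last day for the FinalTest.csv submission


def phase(day):
    """Return which part of the competition a given day falls in."""
    if day < START:
        return "not started"
    if day <= PUBLIC_END:
        return "public leaderboard (Test.csv)"
    if day <= FINAL_END:
        return "final submission (FinalTest.csv)"
    return "closed"


for day in (date(2021, 4, 14), date(2021, 4, 15), date(2021, 4, 22)):
    print(day.isoformat(), "->", phase(day))
```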

Late submissions

Some competitions will allow late participation for academic and learning purposes. If the competition is already over, the respective prizes (points, prizes in kind or money) have already been assigned, so late participants cannot win them.

But you can still download the datasets, play with them, have fun, learn, send your results, get your scores and appear on the public leaderboard. This is a good way to keep practicing and demonstrating your data science skills!

Certificate of participation in the competitions

Under Dashboard > My Profile > Certificates you can find the certificates for your participations. You are awarded a certificate if you finish in the top 10 places of the private leaderboard, and certificates are assigned automatically once the final date is reached. They can also be added to your LinkedIn profile as "certificates" automatically:

Click on “Add to profile” and the following will appear

And then you can share your achievements with recruiters!

Here is an example of the certificate in PDF

Your public profile

We have also updated the public profiles to show your participation in competitions, so you can show it to recruiters and the community.

We are currently working on other great opportunities within the competitions, so expect news soon!

Most Related Articles

Business

What Are the Expected Results of a Data Science Project?

From a company perspective, data science projects should always be viewed as experiments. Remember that we are talking about science, and science bases many of its theories on the results of a series of experiments. From here, many companies start with the wrong assumptions, thinking that the results are exact sciences, of which there must be a single, true answer. The reality is that many data science projects fail because of the lack of iteration that is needed once the first results are obtained, and because they do not adopt a scientific approach to the process, that is, an experimental approach. But what are the expected results? Well, the most common results expected from a data science project are:

APIs
Integrations
Applications and/or Platforms
Reports
Presentations

1- APIs

APIs are a set of subroutines, functions and procedures (or methods, in object-oriented programming) provided by a certain library to be used by other software as an abstraction layer. This sounds a bit confusing, but it's easier than it sounds. This is code that can be shared between machines. Here the requests are not made by a user from a browser, but by one program to another program, and the result is chunks of code that can then be read and processed. APIs are the most common form of expected outcome for a company that intends to develop a data science project, specifically talking about predictive models, as they allow the solution to be easily integrated within the company's current internal programs, without the need to worry about deep integration, or the incompatibility of programming languages, devices or internal systems. This is how a company that has a strong web, IoT, or mobile presence can immediately use a data science solution without the need to invest large resources.

The requirements for an API, as an expected outcome of a data science experiment, are:

Well-written, understandable and reproducible help pages or documentation
The code must be well documented
The code must be version-controlled

Also read: What Is A Data Science Tournament?

2- Integrations

Integrations are a bit more complex from a technical point of view, since they involve integrating a solution within the company's current systems. So, for example, if all the current web development was done in Java, and the machine learning solution was delivered in Python, the software engineering team will have to figure out how to integrate both languages into the stack, either through internal microservices or other application coupling techniques. Let alone if the company has a monolithic stack. If the latter is the case, you should opt for an API.

3- Applications and Platforms

Another common solution is to create an external service, completely different from the company's main service. Here the aim is to host the service in a different domain, where the objective is to access only to obtain the results of the predictive modeling. This is common for companies that use engineering as marketing, making predictive programs to add leads to their pipeline of prospects. The requirements for these applications and web pages, as an expected outcome of a data science experiment, are:

Ease of use of the tool
Help pages or documentation
The code must be well documented
The code must be version-controlled

4- Reports

Here we would no longer be talking about predictive models, but about data analysis in general. The most common and expected result is a report or series of reports where it is expected to understand the historical data, the reason for the historical results and the conclusions of the same. They are usually full of statistical data useful for decision making. They are useful for management, marketing or human resources committees.

There are many formats for this, but the ideal would be not only to present the report, but to have the opportunity to make a presentation and tell a story about the data. Ideally, the reports provided to you should be:

Clearly written
Include a narrative around the data.
Creation of an analytical dataset
Analysis
Clear and even interactive graphics
Concise conclusions
Omitting unnecessary details
Reproducible

Also Read: How Poker Can Teach Data Science Fundamentals

5- Presentations

Presentations are where data scientists tell stories with data. A detailed but conclusive report on historical data is expected for decision making. This helps any area of the company, and can be presented at any business committee. The exact same criteria apply for presentations:

Clarity
Include a narrative around the data.
Creation of an analytical dataset
Analysis
Concise conclusions
Clear and even interactive graphics
Omitting unnecessary details
Reproducible

Conclusion

As we can see, there are several types of results when we expect to run a data science project. We see the importance of having clear objectives, what we can achieve, but most important of all: take the project as an experiment, not as a magic solution to all our problems.

Also read: What Is Open Innovation In Data Science?

Daniel Morales

Apr 16, 2021

Business

Data Democratization and AI in the Financial Sector - Podcast

In this blog post we will talk about the democratization of data in the financial sector. The format will be a bit different than usual, as it is an interview with our CEO Dimitry Kushelevsky given to PrivacyLabs.ai. The interview was given in Podcast format, and the original audio can be found here: https://www.buzzsprout.com/1769590/8683204-data-democratization-and-ai-in-the-financial-sector-with-dimitry-kushelevsky

You can also find the PrivacyLabs.ai post here: https://privacylabs.ai/data-democratization-and-ai-in-the-financial-sector/

Paul Starrett

Hello, everybody. Welcome to another podcast by PrivacyLabs. My name is Paul Starrett. I am the founder of PrivacyLabs. Remember, PrivacyLabs is one word. And this podcast today is going to be with Dimitry Kushelevsky. And this is in a series of podcasts on privacy preservation, and democratization of data, which is the focus of this podcast and similar technology specifically, generally within the area of machine learning and artificial intelligence. Just a little bit of background on Dimitry and myself, we had the pleasure of meeting through an investment group about three months ago, we both are advisors in various capacities for a company called Ealax.com, a company that specializes in synthetic data for financial crime. But since then Dimitry and I have had many conversations around this topic. And I thought it would be wonderful to tap his brain for this area in democratization since his company datasource.ai specializes in that. And his background is really perfect for this topic. So we’ll be talking with him about that. And I think without further ado, Dimitry, if you introduce yourself and your company, and then we’ll just dive right in.

Dimitry Kushelevsky

That’s great. Well, well, thanks again, for involving me in your podcast. It’s, it’s an honor, and I am most happy to continue our conversation, which has been very productive and very engaging so far. So let’s see. So where do I begin?
So, as you mentioned, I am the CEO and co founder in datasource.ai, a startup that we were started with the sole purpose of democratizing AI, more specifically, data science in the form of machine learning, and making its incredible capabilities available to the entire world. Right now, it really is what I loosely describe as a 1% problem, 1% versus 99%. It seems that many people, many business organizations, many individuals in tech, are already very familiar with the concept of AI and what it can bring the specific benefits that it can bring, as far as improving their operations as far as bringing additional revenues and boosting their potential leads to boost their profits. The bottom line, if you will, however, very few companies out there really can boast that they’ve actually taken a serious strategic approach to deploying AI algorithms in their software infrastructure and their software stack. And, you know, it is, like I said, it’s more of a 1% ai problem where a handful of the visionary companies with typically with big budgets, they’re typically, you know, multinational global corporations, they realize that there is a great deal to be gained with very low potential risk at the same time. So they seemed perfectly comfortable spending some money on developing a data science team and making their you know, I should say, becoming an early adopter of AI when it comes to actual implementation of various AI algorithms, as well as data science tools overall in their in their operating infrastructure. Meanwhile, the mainstream of the business organizations out there are still very much left on the outside looking in. 
So far, if if a company wanted to deploy any serious AI capabilities in their software infrastructure, that pretty much by default, required that they hire and either hire an in house data science team, and acquire an actual infrastructure engineering team that would develop a physical as well as base software infrastructure to run data science and AI algorithms. And that of course, costs quite a bit of money. And it does require a considerable amount of expertise, which today is still in a great deal of deficit. It’s still fairly hard to come by. And the schools of course, the power universities across the globe are producing data scientists as quickly as they can. But there is still a pretty significant deficit for that area for that specialization. So that, where does that leave the, basically the 99%, as I refer to them, so far, most of them simply have not been able to, to even seriously play around with AI and machine learning capabilities. And they’ve basically been doing what they’ve been doing for the last 20-30 years. Most of them, you know, who, who did want to, who did want to do some sort of a decision making implement some sort of a automated decision making in their software stack, they typically use rules based software, that is, of course, very limited, because it’s not, it’s not based on the dynamics of the immediate situation in the immediate scenario at hand. So to use a very common example, if you have an ecommerce store it, it, of course, can have some basic rules, rule sets, but built in baked into a script, that would, that would tell the machine or the controller to perform a certain task, whenever a visitor comes to, you know, looking for a specific recommendation, or looking for, you know, looking to do something in their store or purchase something in their store. That’s great. 
But of course, that if you have a rules based algorithm, that’s not based, that’s not using AI, in essence, you’re trying to trying to serve as this potential client by looking in the rearview mirror. And, of course, there’s only so much that you can do, of course, you know, the really cool part of machine learning and AI, is that you can actually have a machine or an algorithm monitor all the real time details surrounding this particular visit, or in my fictitious example of an ecommerce store. And based on what it’s seeing, it can make a real time decision that is a lot more likely going to result in in the in the purchase or in the customer being delighted, because he or she managed to get a great recommendation, when perhaps they least expected it. So anyway, the long story short, and that is, by deploying AI by by utilizing the toolkits that are available with machine learning data science, and other affiliated technologies in that space. Very few people today argue that there is nothing nothing to be gained. However, very at the same time, very few people, especially the smaller and medium sized businesses with typically tighter budgets and, and more limiting real human resources constraints, they’re typically locked out, you know, just costs too much. And they just don’t have that kind of those kinds of resources and expertise to, you know, to throw into data science or AI or machine learning. So that’s basically where we come in, we are trying to bring the both the price points associated with AI and machine learning down to a point where a typical, you know, middle of the road, SMB business, should be able to afford it. And at the same time we are performing, we have implemented a number of unique features, such as automation, that would make it very easy for that type of a user that type of client to actually implement elements, functional elements of AI and machine learning in their infrastructure. 
Without that requirement that I mentioned before, without requiring that the they hire onboard data scientists or spend a lot of money on a data science infrastructure to complement their existing operational infrastructure. So that’s, in essence, what we’re trying to do and we’re hoping that ultimately we can deliver a tidal wave of benefits to a very large number of of people and businesses that otherwise until now have been unable to, to access them.

Paul Starrett

Great, no, I and that’s a great lead in actually, I think you stated the the existing state where things are the 1%, and then the lockout, if you will, of the remaining 99%. And I think it’d be helpful to get down under the hood a bit more into what datasource.ai does. If listeners aren’t familiar, there’s a company called Kaggle, which was recently, I guess they were purchased by Google. And Kaggle, what they do is they put out a challenge or a problem, and they ask for people to submit to kaggle solutions. And if they are, if their solution is chosen, they’re given a cash reward. Often that’s, you know, 50,000, 100,000, it’s quite a bit of money. But the idea is to get all of these contributors who are competing for that prize. And in so doing, they’re getting really this very high quality very sort of, well, sort of the competition has drawn out the best of the, the those who are contributing what we call sort of the crowdsourcing. And what you’re doing datasource.ai is taking the concept and making it much more available, kind of the Henry Ford, if you will, you’re, you’re allowing it to come to the masses. And so you have a smaller sometimes, you know, the cash prize, if you will, could be 5000, it could be free, really depends. But the idea is that this the SMB, the small to medium sized business, then has access, they put up a cash mount, like $5,000, I’m just picking names out of that are numbers out of a hat, you they then come to you, and then you get this competition.
And I think that let me know if I haven’t stated that properly, but also need you to state of I think you’ve got quite a few projects, going.

Dimitry Kushelevsky

We got it, we are definitely turning some heads and attracting, frankly, a lot of heavyweights in the data science community who, as we’ve already demonstrated, who are happy to contribute their skills and the energy and creativity to, you know, to help us become successful. Yeah, we’ve done a number of projects, as you mentioned, that, in essence, our data science competitions, but so far, or most of them, were did not have a cash prize associated with them, we just wanted to, you know, to try out our, our platform to make sure that the features and automation and other capabilities are working as, as planned. And at the same time, we wanted to test just the general assumption behind our business model, which is, you know, there is a very committed very high energy, very vibrant community supporting data science, as well as implementations of AI and machine learning in the mainstream businesses and other organizations. So, so far, we’ve been very, very pleased with what we observed, we are actually beginning to monetize our, our platform now. So it’s very exciting time as well, because I want to to offer actual cash prizes, to, to the winners of the of the most successful algorithms that our contestants have got submitted. And also what we’re doing, you know, thank you for bringing up Kaggle. While the concept behind crowdsourcing AI or machine learning algorithms is actually quite similar between what we do and what Kaggle does. But there are certainly a number of unique capabilities, starting from the differential between the markets, the target markets that they focus their offerings toward, versus what we’re trying to do.
So as I mentioned earlier, we’re really looking to bring it down to both a very low price point, as well as a very low requirement of, of the expertise and other dedicated resources that a any given client would have to have on board in order to use our system. But in order to ultimately, you know, develop a high quality machine learning algorithm and implemented in the, in their software infrastructure. Typically Kaggle project still would require data scientists onboard those data scientists will typically come with the project, you know, the customer, the client would be expected to bring him in the cash prizes with Kaggle are significantly greater, I’d say typically on the order of magnitude greater versus our target cash prize values. So by doing so, once again, we’re trying to really bring all these great benefits of AI and machine learning and data science into the global mainstream. So obviously, we’re you know that that entails that we would try to turn it into a very much a high volume, low barrier to entry type business model, and want to have lots and lots of businesses, you know, who could, who could, you know, realize very quickly that, hey, I can actually for for very little money. And without having to go and hire dedicated data scientists, to my team, I can actually go and develop one or more machine learning algorithms that are going to be high quality, they’re going to be designed by humans, by expert humans. And they are extremely likely based on that indicators that we’ve seen from the earlier deployments, they’re extremely likely to improve our business and grow our bottom line, which is ultimately what we’re trying to do. I mean, you know, ultimately, as far as our purpose goes, that we behind our company, behind both of our, but myself and my co founder, Daniel, we are really trying to, you know, we are passionate, obviously, we’re passionate about AI and data science and machine learning. 
And we are really focused on bringing all those great capabilities, all those great, fairly easily attainable benefits that the, you know, that customers can utilize, right down to the average business, the average organization around the globe, no matter what their budget, no matter what their size, no matter what their, what their ability is to, you know, to hire on board expertise and other resources. So that’s obviously because of that deep desire that Daniel and I share, and have shared from the very beginning, we have developed and launched a platform that is highly automated already. Although, of course, without question, as we progress as we grow. And we have additional developer resources, of course, we’re going to continue to, to enhance it. And, you know, and to add additional features and capabilities that that are only planning today. And the ultimate benefit is that as we get more and more clients, utilizing our platform to crowdsource high value, high capability, high quality machine learning algorithms, as they deploy those algorithms, they will undoubtedly be getting very impressive results based on everything we’ve seen in all the studies we’ve read so far, they are really setting themselves up for a great deal of additional success, even if they’re a successful company already. So that, of course, is why Daniel and I are very excited to be doing what we’re doing. And we’re even more. So more. So we’re even more excited about the future that, you know, that this technology holds that we could potentially bring to the mainstream business customers around the globe as we grow as a company.

Paul Starrett

Yes, and that’s, that’s great.
I it it let’s it leads me to think of the the crowdsourcing, it’s not only does the individual company get the benefit of the, of your platform and your expertise between you and your co founder, in addition to all of the teams that are competing, to satisfy some goal that the competition so to speak, is put to, there’s also, this is going to lead into, I think the part here where we’re gonna get into the challenges that come with this, that what you can do is you can have, let’s say different companies that are perhaps in the same vertical the same domain, share your information, to gain the synergy across their different insights, learn from the machine learning efforts. The problem is, especially in highly regulated industries, with if getting the data is the big problem. And the one of the biggest barriers there, of course, is privacy regulation and data protection laws. And the idea there is that there are techniques, there are a solutions that allow you to essentially create a different data set that’s called there’s various things here now, it’s a big, it’s a fairly large topic. We cover this, I just finished a podcast with Patricia Thaine, which you’ll find on our website which discusses privacy preservation technologies in the grand scheme. But for right now here with machine learning, we’re going to focus on synthetic data. What that is, is is a method by which an algorithm will take the original data that contains private sensitive data. And it replicates it. But it leaves behind any remnants of the sensitivity, or of the privacy of the underlying data, thereby kind of lifting it up and out of those concerns. So now you can share it it’s not a panacea, there’s a thing called the privacy budget, which says that the more that you remove the privacy and sensitive information, the less valuable your data becomes to a machine or machine learning algorithm. And it’s not a it’s not a simple process, but it’s very doable. 
And so Dimitry, I think, you know, Ealax company mentioned earlier, they do this, and be able to do it for things like a banking and financial services. And I know, Dimitry, you personally have quite a, quite a bit of background in this area of financial services. What is your perspective on the promise of synthetic data and your thoughts on what it is and, and, and how we expect to see that utilized not only for a company to do it just for the internal purposes, but then perhaps to share it with other?

Dimitry Kushelevsky

Yeah, absolutely. So without question financial, the financial vertical financial industry is one of the one of the verticals, that is really, really well positioned to take advantage of AI and the power of the capabilities that that it can bring to them, again, with, you know, with the help of a company like ours, for a very low cost and a very low resource requirement. And, again, it seems that, I guess, because the financial industry is so close to business, and so, so close to recognizing the the material aspect of what this kind of technology these technologies can bring, they they’re getting it, you know, they clearly they’re, they’re sensing that this is not just a fad, AI is here to stay. And, again, there’s they’re seeing like the smaller local institutions are seeing that the the larger brands in their industry are deploying AI either I would say the, you know, the larger financial vertical representatives are among those early adopters who, you know, who have done some strategic early deployments, and they actually have benefited from them pretty significantly. So, you know, what, what does what does does the future hold or what does what kind of capabilities, what kind of benefits does does it hold for for Finance?
Well, there are so many great applications, right, I normally start looking at any business opportunity or even a use case scenario by by examining the what what the customer’s needs are, and in this case, in the financial vertical, the customer’s needs are quite extensive, right, they are the most of the banking institutions and financial institutions already have considerable amount of data that they have been collecting about their customers just as a part of their day to day operations. And of course, because they are required to do so by law, right. So, for one, they already have a great important ingredient that many representatives of other verticals may or may not always have. So, they have the data, they also have very specific means such as they want to remain competitive, they want to, they want to be able to offer new services, they want to target their, their marketing and other customer focused materials better. And ultimately, they of course, they want to save on their operations as well. Another another huge opportunity for the financial industry across the board, of course, is something that we discussed earlier. Is, is the fraud and, you know, criminal activity prevention. So AI, of course, I’m you know, I’m I’m very excited, you know, banner waving, waving, you know, person in the AI ecosystem. So yes, I do admit that I might be a bit biased here. But AI, I really would, would strongly submit that AI provides a tremendous opportunity, perhaps much more powerful than any other source of tools available today, to address all of these use case scenarios, and they’re really exciting part to me here is that we would be, by developing AI algorithms and other AI based solutions, we could directly and very positively impact you know, those customers and meet their needs. You know. So that’s, that’s really exciting part, ultimately, everything has to, you know, begin and end with the customer. 
So anytime that we have a customer who already has a demonstrated set of needs that can directly impact their business in a very positive manner, of course, any business person will be very excited to offer their platform or their solution to help their users get exactly that effect. So, yeah, there’s a lot to do, a lot of opportunity. But, as always, there is a challenge. And the challenge, which is quite significant in the financial space, has to do with regulation. It has to do with the severe privacy protection regulations that virtually all financial institutions have to abide by across the globe. So that is one big challenge, and unless we find a way to solve it as an industry, I think AI, machine learning and data science will be extremely limited in terms of the depth and breadth of the benefits that we can deliver. So having companies like Ealax around, producing very close proxies for the customer’s actual original data, however without disclosing any of the private, personal or confidential information associated with the bank, its customers, or its institutional customers, may very well be the difference between all those institutions being able to take advantage of these great business benefits and not being able to do so. So it’s really quite a big development.

Paul Starrett

Yes, I agree. And I wanted to sort of slip in an elevator pitch that I have, to kind of encapsulate what you said about how data is becoming much more vexing even for midsize and small companies. Because, as we know, the amount of data that companies generate is growing exponentially every year, and the only way to really wrangle it is with machine learning. That’s all you’re left with. So it becomes the new normal; it becomes the best practice.
I think one of the unique things that we can share with our listeners is that synthetic data does allow us to drop out the sensitive or private information. Again, though, I want to emphasize it’s not a panacea. There are some knobs to turn, and there is some loss of insight. But there’s often no free lunch, right? Exactly, exactly. So with privacy budgets, you’ve got to pay somewhere. But I think generally it’s very much a net gain. And there’s an upside to that as well: with synthetic data, you can actually gain more insights from the underlying data, above and beyond what you’d expect to build into a machine learning model from that data, because the synthetic data can generate new types of transactions and new types of scenarios that a machine learning algorithm can then use. Some other issues around regulation have to do with the explainability of machine learning. How is it working? Do we know what the machine learning model is doing? You can add into the synthetic data metrics and other information that help you establish explainability, which is a very big piece of the privacy regulations and so forth. GDPR has specific requirements around that, as do most laws. And just for a pitch of my own, to blow my own horn here and pay some bills, that’s what PrivacyLabs does. I have a background in machine learning and law, and so I’m able to help bring things together, get the explainability in there, and make sure that the compliance professionals understand the technology and what’s happening, so that it all comes together in a profitable and compliant way. So that’s kind of our role in this. And I, of course, look forward to working with you, Ealax and other companies to bring this to the market.
I think that really the goal here is the democratization of data, and I think maybe we can finish on this topic. We’ve basically covered the idea that the individual institution, whether small or midsize, is really where the need is most vexing. The data is getting bigger, faster and more complex, and machine learning really is the best way to save money, reduce risk and so forth. But there is also the ability to make a better world, and, Dimitry, this is a big piece of... Absolutely, it’s in your heart. Again, could we have, let’s say, financial services institutions share all of their data together to build, for example, a fraud machine learning model that is sort of a superset of all of the intelligence that has come from all of them? I think that when we get into things like synthetic data and other things, that becomes much more realistic. And you have this sort of crowdsourcing in its own right, in that regard.

Dimitry Kushelevsky

And you get to use the wisdom of the crowd to solve some of the biggest challenges that were dogging the entire industry across the globe. So yeah, this is one of the many excellent value points behind the entire technology.

Paul Starrett

Yes, yes. For those who are maybe a little more savvy in the direction of data science, there is a thing called transfer learning, where, in the typical case of deep learning neural networks, you’re able to take the prior models that have been built and then leverage that background. Transformers are a typical example. But again, that’s just sort of an aside, mentioned for those of us who are a little bit more into data science.
I think that kind of rounds it up. Again, the idea here was the democratization of data sharing: being able to leverage crowdsourcing of information around a specific problem for a company, such that they can enter the market and remain competitive by having access to machine learning, but also the ability to share domain information for the common good. So I think that we’ve done a great job, frankly, in what is roughly half an hour.

Dimitry Kushelevsky

There’s a lot of ground to cover. For someone like yourself, as I’m sure you know, there’s a great temptation to get into the weeds, because there are so many great use cases, so many great applications and, ultimately, so many incredible business and personal benefits that we can deliver to literally billions of people out there with this type of technology. That, of course, is very, very exciting. And, frankly, that’s very much a part of our future. I just read a PwC-sourced study recently, where they claim that by the year 2030 they expect that AI is going to add a little over $15 trillion, that’s 15 trillion, yeah, one five, 15 trillion dollars, to the global economy. It’s just absolutely incredible. Frankly, even today, closer to home, so to speak, or closer to our timeframe, the machine learning industry is measured at somewhere between 9 and 10 billion dollars. Obviously COVID kind of played with those numbers, like with any other numbers, but I believe that’s still more or less where we are today. But the really exciting news, and I believe this study, if I’m not mistaken, came out of McKinsey, is that they are actually forecasting a 39% year-over-year compound growth rate for the foreseeable future. I believe by the year ’26 or ’27, they’re expecting this number to grow to around 120 to 127 billion.
So, I mean, these are astronomical numbers. And you mentioned earlier that, yes, there are certainly multiple entrants into the AI and machine learning sourcing space, and I’m certain there will be more. I don’t think it’s that big a reach to forecast that the AI industry is going to get bigger and more densely populated. But the way I see it, there’s so much great potential that it’s truly an ideal, textbook case of a “plenty for all” mentality. It’s something we can build new solutions within, to deliver a tremendous amount of added value to literally millions, if not billions, of customers. So there’s plenty of good to be done, and that’s a really, really exciting part. For everyone who is already in this space or is considering entering it, including the folks who are potentially going to be our future customers, we welcome them to come and check us out. We offer a free consultation for anybody who’s interested in exploring what we offer and how it may be able to benefit their business or their operations, or help them overcome any other challenges that they might be facing.

Paul Starrett

Yes, yes. And I did want to sneak in one more comment here, and then I’m going to ask you for your closing thoughts on what you think we haven’t covered or something you think needs to be emphasized. One of the things that we keep talking about is synthetic data. And I just want to reiterate that the reason we say that is because Gartner has predicted that 60% of machine learning will be based on synthetic data by 2024. That’s right around the corner.
So I think that kind of gives us a sense of it. There’s also an area, and I’ll make this brief because it’s a technical area: the software development lifecycle has really moved to what they call an agile framework, which requires very quick turnaround. And that is the new normal for the development of any kind of software or any solution that’s being used by enterprises. And the problem is that getting the data takes a long time; contracts and laws and other things require months. And you don’t have that time when you have an agile process in software development that requires a daily kind of turnaround. So synthetic data allows you to generate that data much more quickly and get to pay dirt. I just wanted to mention that. It’s a very new, hot topic that we’ve kind of tripped over here from other discussions. So other than that, I’m going to finish here. Dimitry, we’ve got a few minutes here. Is there anything you think we should know that we haven’t discussed, or anything you want to emphasize?

Dimitry Kushelevsky

Yeah, well, one of the most interesting challenges that we are up against right now is that we, rather obviously, don’t want to boil the ocean, if you know what I mean. There are so many great use case scenarios, so many great applications for AI, for machine learning, for quite literally running data science competitions, that we have to be very judicious as far as which ones we pursue. There was a great temptation between both founders to try and just go after every interesting opportunity, every challenge that has a real business need and real data behind it, data that the customer, or a potential customer, may already have.
But we find ourselves deliberately keeping ourselves disciplined, in that we’re trying to validate our major assumptions, which will, rather obviously, provide us with our go-to-market and our business evolution trajectory for the foreseeable future. So yeah, it’s a great problem to have. And with that in mind, I want to welcome anybody who’s interested in playing in this space, even just checking us out and discussing with one of our experts, or one of us directly, what we can do and how, in specific terms, AI and machine learning can help them overcome their challenges, grow their business, bolster their bottom line or take better care of their customers. So once again, I would love to welcome additional people who are either as excited about AI as we are, or perhaps just intrigued, and who, if nothing else, want to say, hey, let’s talk and let’s see what this technology may potentially have in store for them and their business. So again, I welcome people who are listening to this, or who are intrigued about the potential benefits that they can gain with AI, data science and machine learning, to come visit us. If they are intrigued by what you and I just discussed, or by the content that we’ve posted on our web page, I, of course, would love to chat with them. They can just click on the free consultation button and schedule a few minutes to chat with us. Every single conversation is very interesting to us because, again, it helps us to triangulate the most promising opportunities for us to deliver maximum value.
So we don’t wind up boiling the ocean, but ultimately wind up meeting the requirements of our mission and helping businesses accomplish their goals for success, hopefully better than any other alternatives out there in the marketplace, which I do strongly believe that we can. So thank you for the opportunity.

Paul Starrett

Yes, no, my pleasure. And just so people know, the website is datasource.ai. It’s all one word, no hyphens, no dots or anything: datasource.ai. And I believe it is dimitry@datasource.ai?

Dimitry Kushelevsky

Yeah, dimitry@datasource.ai. Believe me, just having my first name in this email address is a blessing, as you know, because I have a long Ukrainian last name that would confuse anybody. So, yes, I, of course, would welcome anyone who wants to reach out and connect with me directly.

Paul Starrett

Great, or they can go to your website, as you indicated. Great. Well, listen, I’m just going to close out here with some thoughts on PrivacyLabs’ role in this. The process of bringing artificial intelligence or machine learning into your enterprise infrastructure, in one form or another, is kind of a horizontally active topic. And that’s where we can help to look at the security requirements and the compliance. I have an attorney who’s kind of specialized in compliance law. I’m much more technical, but I can help discuss the topics with the compliance folks and help sort of scope things. And one thing we do in PrivacyLabs is we work with partner companies like OneTrust, BigID and TrustArc, and another, one of my favorites, is Centrl. We can use those tools to help kind of herd the cats and bring everything together.
We specialize in machine learning, automation and audit, so that we can make sure that everything’s going the way it would be expected, either by a regulator or to make sure you’re covered legally at some level. So that’s kind of what we do. And again, Dimitry, thank you so much. I think we’ll close out here, and I’m sure...

Dimitry Kushelevsky

I wanted to give you a quick plug, Paul, because I deeply appreciate what you do as far as opening the gates for potentially a very large number of business owners and business executives who, because of you and your work, will be able to take advantage of what we offer. So I really appreciate having met you and having had a bunch of really productive conversations already. And I look forward to continuing very much along the same lines.

Paul Starrett

Thank you. Those are kind words, and I wouldn’t disagree with you, if I say so myself. I think we’ve really positioned ourselves, and it’s usually with my guidance directly, personally. Yes, we’re sort of the concierge, if you will, to help people get in and cover all the bases horizontally and peripherally. So great. With that said, we will close ourselves out here. And Dimitry, we will have another podcast soon, probably one of the updates or some other vertical or something. But thank you again. And thank you, listeners. I hope that you learned a lot, and watch for future podcasts from us. Thanks. Thanks, all.

Daniel Morales

Apr 16, 2021

Business

Separating Hype From Value In Artificial Intelligence

You've probably heard a lot about data science, artificial intelligence and big data. Frankly, there has been a lot of hype around these areas, and it has inflated expectations about what data science and data can actually accomplish. Overall, this has been negative for the field of data science and for big data. It is useful to think a bit about the questions that can be asked to separate the hype of data science from the reality of data science.

The first question is always "What is the question you're trying to answer with the data?" If someone comes to talk to you about a big data project, an artificial intelligence project, or a data science project, and they start talking about the newest technology they can use to do distributed computing and analyze data with machine learning, and they throw a bunch of buzzwords at you, the first question you should ask is "What is the question you're trying to answer with the data?" That really narrows down the discussion and filters out a lot of hype around the tools and technologies that people are using, which can often be very interesting and fun to talk about. We like to talk about them too, but they're not really going to add value to your organization on their own.

Also Read: Data Democratization and AI in the Financial Sector

The second question to ask yourself, once you've identified the question you're trying to answer with the data, is "Do you have the data to actually answer that question?" So often the question you want to answer and the data you have to answer it with are not really compatible with each other. So you have to ask yourself, "Can we get the data in such a way that we can answer the question we want to answer?" Sometimes the answer is simply no, in which case you have to give up (for now).
Bottom line, if you want to decide whether a project is hype or reality, you have to decide whether the data people are trying to use is actually relevant to the question they are trying to answer.

The third thing to ask yourself is, "If you could answer the question with the data you have, could you even use the answer in a meaningful way?" This question goes back to that idea from the Netflix competition, where there was a solution to the problem of predicting what videos people would like to watch. It was a very, very good solution, but it wasn't one that could be implemented with the computing resources that Netflix had in a way that was financially expedient. Even though they could answer the question, even though they had the right data, even though they were answering a specific question, they couldn't actually implement the results of what they found out.

If you ask yourself these three questions, you will be able to decipher very quickly whether a data science project is all hype or whether it is a real contribution that can actually move your organization forward.

How do you determine the success of a data science project?

Small businesses rarely use cutting-edge technology, simply because it is not within their budgets, expertise or resources. However, almost all are called upon to experiment with such technology, because if they don't, someone else will, and ultimately whoever does will gain in competitiveness, cost or profitability.

Defining the success of an AI project (which is technically called data science or machine learning) is a crucial part of managing a data science experiment. Of course, success is often context-specific. However, some aspects of success are general enough to merit discussion.
My list of hallmarks of success includes:

* The creation of new knowledge.
* Decisions or policies are made based on the outcome of the experiment.
* A report, presentation or app with impact is created.
* You learn that the data cannot answer the question you are asking.

Also Read: What Are the Expected Results of a Data Science Project?

Some more negative outcomes are: that decisions are made that ignore clear evidence from the data, that the results are equivocal and do not shed light in one direction or another, that uncertainty prevents the creation of new knowledge.

Let's talk first about some of the positive outcomes.

New knowledge seems ideal to me. However, new knowledge does not necessarily mean that it is important. If it produces actionable decisions or policies, even better. (Wouldn't it be great if there were evidence-based policy, like the evidence-based medicine movement that has transformed medicine?) Having our data science products have a big (positive) impact is, of course, the ideal. Creating reusable code or applications is a great way to increase the impact of a project.

Finally, the last point is perhaps the most controversial. I consider a data science project to be successful if we can demonstrate that the data cannot answer the questions being asked. I remember a friend telling a story about the company he worked for. They hired many expensive data science consultants to help use their data to inform pricing. However, the prediction results were not helping. They could see that the data could not answer the hypothesis being studied. There was too much noise and the measurements were not accurately measuring what was needed. Sure, the result was not optimal, as they still needed to know how to price things, but it did save money on consultants.
Since then, I have heard this story repeated almost identically by friends in different industries.

Also Read:

* How the Biggest Companies in the World Design Machine Learning Applications
* What Is Open Innovation In Data Science?

Daniel Morales

Apr 16, 2021

Business

How to Build Your Data Analytics Team

Peer reviewed by Kat Holmes, Data Director, ITV

As businesses recognize the decisive power of data to achieve business goals, most are hoping to put data in the driver’s seat of their business and product strategies. This entails putting together a strong data team which can effectively propagate its insights across different areas of the business. Unfortunately, this is no easy task.

To be truly data driven, companies need to build three capabilities: data strategy, data governance and data analytics.

3 pillars for data-driven companies (image from Pitch)

Strategy: Data strategy is your organization’s roadmap for using data to achieve its goals. It requires a clear understanding of the data needs inherent to the business strategy. Why are you collecting data? Are you trying to make money, save money, manage risk, deliver exceptional customer experience, all the above?

Governance: Data governance is a collection of processes, roles, policies, standards, and metrics that ensure the efficient use of information in enabling your organization to achieve its goals. A well-crafted data governance strategy ensures that data in your company is trusted, accurate and available.

Analytics: The term data analytics refers to the process of analyzing raw data to draw conclusions about the information they contain. Typically, those involved with data analytics in an organization are data engineers, data analysts and data scientists.

Ultimately, your ability to leverage data will depend on these three pillars. If you’re reading this and realizing that your organization possesses none of these, don’t worry. That’s why we’re here. A good place to start is to build a strong analytics team, one that is closely tied with the strategic goals of your business.
It is the first pillar of your data organization, and the focus of the article.

When building a data analytics team, heads of data typically grapple with the following questions:

* How big should this team be?
* How many data engineers, data analysts, data scientists?
* How does the team interact with the rest of the organization?
* Which structure for the data team? Centralized or embedded?

They rightly do so; having a strong data team is not a luxury anymore, but essential to the very survival of a company today.

Let’s start with the basics though.

Where are you in your data journey?

Before building a data team, it’s important that you realize where you are in your “data journey”, because this will directly affect the structure of your team. This part is thus dedicated to a simplified data maturity assessment. Beware, company size and data maturity are two different things. Your organization can be large but immature on a data level.

Data maturity is the journey towards seeing tangible value from your data assets. We propose a simple framework of data maturity assessment, in which you measure your ability to understand your past, know your present and predict your future. What do I mean by this?

Well, in most companies each department has its own set of KPIs that support the execution of the corporate strategy. It’s not enough just to define them, they must also be clearly tracked, and you must also have the ability to predict future outcomes against these KPIs. This ability rests on a clear knowledge of your present, which, in turn, builds on a strong understanding of the past. Do this, and you have found a simple way to assess your data maturity. For example, if you’re unable to identify the revenue drivers for your company (your past), it means you need to work on your data maturity by bringing visibility to your business before you seek to predict future outcomes. We don’t recommend skipping steps.
It’s like Maslow’s hierarchy of needs, but for data.

Read Also: What Are the Expected Results of a Data Science Project?

Data hierarchy of needs (image by Louise de Leyritz)

Let’s look at a couple of practical examples:

* Marketing ROI. Define your ROI across multiple channels, using an identified attribution model. Then understand its evolution over the previous 12 months, and especially its drivers: performing channels, time of the year, product, and so on (past). Then track its evolution on a daily, weekly or monthly basis with a reporting tool you trust (present). Forecast your marketing budget based on predictive models (future).
* Customer Satisfaction. Define your customer satisfaction measure. Is it NPS, CSAT? Everyone in your company should share a common understanding of how it is computed. As with the previous example, compute its evolution over the previous 12 months and find its drivers (past). Then track the satisfaction of your customers daily with trusted dashboards, and identify actions to take today to increase it (present). Your understanding of the past and present state of customer satisfaction will allow you to predict churn efficiently (future).

Understanding your past and present is commonly referred to as performing descriptive analytics. Descriptive analytics helps an organization understand its performance by providing context to help key stakeholders interpret information. This context is usually in the form of data visualization, including graphs, dashboards, reports and charts. When you are analysing data to forecast the future, you’re engaging in predictive analytics. The idea with predictive analytics is to take historical data and feed it into a machine learning model that captures key patterns. You then apply this model to current data, and hope that it will forecast the future.
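The marketing ROI example and the descriptive/predictive split described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method used by the authors: the channel names, the spend and revenue figures, and the helper names `roi` and `linear_forecast` are all hypothetical, ROI is assumed to be (revenue - spend) / spend, and an ordinary least-squares trend line stands in for a real machine learning model.

```python
from statistics import mean

# Hypothetical monthly figures per channel: (spend, attributed revenue).
history = {
    "search": [(100.0, 150.0), (100.0, 160.0), (100.0, 170.0), (100.0, 180.0)],
    "social": [(200.0, 220.0), (200.0, 210.0), (200.0, 200.0), (200.0, 190.0)],
}


def roi(spend, revenue):
    """Return on investment, assumed here to be (revenue - spend) / spend."""
    return (revenue - spend) / spend


def linear_forecast(values):
    """Fit a least-squares line to equally spaced values and extrapolate one step."""
    n = len(values)
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n  # predicted value for the next period


# Descriptive analytics (past and present): ROI per channel per month.
roi_by_channel = {
    channel: [roi(spend, revenue) for spend, revenue in months]
    for channel, months in history.items()
}

# Predictive analytics (future): extrapolate next month's ROI from the trend.
forecast = {
    channel: linear_forecast(series) for channel, series in roi_by_channel.items()
}

for channel, series in roi_by_channel.items():
    print(channel, [round(r, 2) for r in series], "next:", round(forecast[channel], 2))
```

The descriptive step only summarizes what already happened, while the forecasting step extrapolates from it, which is why a trusted view of the past and present has to come first. In practice the trend line would be replaced by a model validated on held-out data.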
We’ll use the terms descriptive and predictive analytics throughout the article to refer to understanding the past and present, and to predicting the future, respectively.

If you realize that your organization is not fully mature (i.e. you don’t have a clear understanding of your past and present), here are our recommendations for what should be the next steps of your data team.

Key players on a data analytics team

A data analytics team is usually composed of four core functions and a head of data analytics, which are detailed below.

* Data engineer: They are responsible for designing, building, and maintaining datasets that can be leveraged in data projects. As such, data engineers closely work with both data scientists and data analysts. We also include the new role of analytics engineer here, although, in practice, this role lies between analytics and engineering.
* Data scientist: They use advanced mathematics and statistics, and programming tools to build predictive models. The roles of data scientists and data analysts are pretty similar, but data scientists focus more on predictive analytics than descriptive analytics.
* Data analyst: They use data to perform reporting and direct analysis. Whereas data scientists and engineers typically interact with data in its raw or unrefined states, analysts work with data that’s already been cleaned and transformed into more user-friendly formats.
* Business analyst/ops analyst: They help the organization improve its processes and systems. They focus on dashboarding, answer business questions and propose their interpretation. They are agile and straddle the line between IT and the business to help bridge the gap and improve efficiency. They frequently work with a specific business area such as marketing or finance, and their SQL literacy can range from basic dashboarding to advanced analysis.
* Head of data analytics: They provide strategic oversight to the data team.
Their goal is to create an environment that allows all different parties to access the data they need painlessly, build the skills of the business to draw meaningful insights from the data, and ensure data governance. They also act as a bridge between the data team and the main business unit, acting both as a visionary and a technical lead.

Read Also: What Is Open Innovation In Data Science?

How large should the team be?

Different companies will build data teams of different sizes; no one size fits all. We have studied the data team structure of 300+ companies in the 300–1000 employee range and derived the following insights:

As a general rule, you should aim for 5–10% of the employees in your company to be data analysis savvy. Some companies such as Amazon or Facebook are training a huge portion of their employees, but we have excluded them from our analysis.

The first hires of a brand-new data team are often a data engineer and a data analyst. With just these two roles, organizations can already engage in some basic descriptive analytics.

When building a larger team, think in terms of the skillset you need. A typical data project requires the following skills: database, software development, machine learning, visualization, collaboration, and communication skills. It is very rare to find individuals who possess all these skills. You should thus be aware of which skill each candidate brings to the table. Regardless of how many people you decide to hire, your team should ideally cover this skill set.

Where you are in your data journey also impacts who you hire and at which stage. Generally, data analysts focus on understanding the past. That is, they take the data you have and try to understand the drivers of growth and other metrics. Business analysts/ops analysts are oriented towards the present (dashboarding). Finally, data scientists focus on predicting future outcomes.
So, if you have trouble understanding your past, hire a data analyst ahead of a data scientist.

What should ultimately guide the size of your data team is the number of business problem statements and the complexity of the most serious problems. Look at the size of your roadmap and establish how many people you need to complete your data projects within a reasonable amount of time. If you realize it would take more than a year for your data team to complete its projects, then it’s probably time to expand the team.

We also encourage you to look at your run vs build ratio. Members of your data team ‘run’ when they work on daily business operations, focusing on the present performance of the organization. They ‘build’ when they work on long-term projects, such as adding new features to the product. Your data team should be running 2/3 of the time and building 1/3 of the time. If your data team spends all its time focusing on day-to-day needs, you are jeopardising the future of your company, and it is probably time to expand the team.

Finally, you might have to make some project-specific hires. If you’re a fintech conducting a project on fraud detection, or a company specialising in dispatching for logistics, you might want to hire someone who knows the specifics of your industry.

How does the data team integrate with the company?

There is no perfect structure for an analytics team, and your structure is likely to change many times. If your data team structure hasn’t changed for the past 2 years, then it’s likely to be a sub-optimal structure. Why? Because the data needs of your company are evolving rapidly, calling for an adaptation of your data team’s structure. Also keep in mind that the more static your organization, the harder the next change will be.
For this reason, we don’t prescribe a given structure, but rather present the most common models and how they can be suited to different types of businesses.

The very first step to take when structuring your data team is to find the data people that already exist in your organization. They might not just be the people with the term “data” in their title; they could be any employee who is not afraid of data analysis or already has SQL skills, such as business analysts/ops analysts. If you don’t take the time to locate pre-existing data people carefully, you are likely to end up with an unplanned data team structure that is unlikely to fit your business needs.

Centralized model

Centralized model for data teams - Image by Louise de Leyritz

The centralized model is the most straightforward structure to implement, and it is usually the first step for companies that aim to be data driven. There are, however, a few drawbacks to this model, which are listed below. This structure usually leads to a centralized data “platform”, where the data team has access to all the data and serves the whole organization on a variety of projects. All data engineers, analysts and scientists within this team are managed directly by the head of data. With this structure, the data team reports in a dotted line to data stakeholders based in business units, in a consultant/client-type relationship.

This flexible model is adaptable to the continuously evolving needs of a growing business. If you’re at the beginning of your data journey, that is, you still struggle to have a clear vision of your past and present, this is the structure we recommend. The data team’s first projects will seek to bring visibility to the business, ensuring all departments in your organization have KPIs and dashboards they can trust.
This kind of structure is particularly good for analytics where reusability and data governance are important.

Advantages

✅ The data team can help with other teams’ projects while working on its own agenda.
✅ The team can prioritise projects across the company.
✅ There are more opportunities for talent and skill set development in a centralized team. The data team works on a broader variety of projects, and data engineers, scientists and analysts can benefit from their peers’ insights.
✅ The head of data has a centralized view of the company’s strategy and can assign data people to the projects best suited to their capabilities.
✅ Encourages career growth, as data engineers, scientists and analysts have a clear view of the path to senior roles.

Drawbacks

❌ High chance of disconnect between the data analytics team and other business units. In this model, data engineers and data scientists are not immersed in the day-to-day activities of other teams, making it difficult for them to identify the most relevant problems to tackle.
❌ Risk of the analytics group being reduced to a “support” function, with other departments not taking their responsibilities.
❌ As the data team serves the rest of the business, other business units might feel like their needs are not properly addressed, or that the planning process is too bureaucratic and slow.

Decentralized/Embedded model

Decentralized model for data teams - Image by Louise de Leyritz

In a decentralized model, each department hires its “own” data people, with a centralized data platform. In this model, data analysts and scientists focus on the problems faced by their specific business unit, with little interaction with data people from other areas of the company.
With this structure, data analysts report directly to the head of their respective business unit.

Advantages

✅ Embedded teams of data people are agile and responsive, because they are dedicated to their respective business functions and have good domain knowledge.
✅ Product managers can assign data tasks to the people most qualified to work on them.
✅ Business data teams don’t have to fight for resources to build their data projects, because the resources sit in the teams.

Drawbacks

❌ Lack of a single source of truth and duplication of data content.
❌ Data people end up working on redundant issues due to a lack of communication between different teams.
❌ The creation of silos leads to productivity erosion, since data people can’t draw on their colleagues’ expertise as they do in the centralized model.
❌ This model makes it harder to optimally staff data people on different projects.
❌ Business managers, usually lacking technical backgrounds, will find it hard to manage data people and understand the quality of their work.

Federated model/Centre of excellence

A federated model is most suited to companies that have reached data maturity, have a clear data strategy and engage in predictive analytics.

Centre of excellence model - Image by Louise de Leyritz

In the Centre of Excellence (COE) model, data people are embedded in business units, but a centralized group that provides leadership, support and training remains. Even if data analysts and scientists are deployed across business departments, you still have a data leader (or a core of data leaders, depending on company size) who prioritizes and supervises data projects. This ensures that the most beneficial data projects are tackled first.

This strategy is most suited to larger, enterprise-scale companies with a clear data roadmap. The centre of excellence model entails a larger data team, as you need data scientists both in the COE and in the different business branches.
If you are a small or medium company, your needs might not require a data team of this size.

This approach retains the advantages of both the centralized and the embedded model. It is a more balanced structure in which the data team’s actions are coordinated while the data experts remain embedded in business units.

Again, it’s extremely important that you know who your data people are. When building a centralized team at the beginning of your data journey, make sure you don’t have business analysts/ops analysts embedded in other departments. Otherwise, you will end up with an unwanted mixed model, creating complete chaos in your organization. When creating a COE, you need to ensure it’s wanted and planned.

Advantages

✅ The Centre of Excellence model provides the advantages of both the centralized and the embedded models.

It still presents some drawbacks, though:

Drawbacks

❌ This model requires an additional layer of coordination and communication to ensure alignment between the COE and business units.
❌ Not fit for purpose for small to medium-sized organizations, which may not be able to tap into the benefits that come with this hub-and-spoke model.

Final words

Building a strong analytics team is a key pillar if your company is to become data-driven. The extent to which you will extract business value from data ultimately depends on the strength of this team, and how symbiotic it is with the rest of your business. There is no made-to-order advice for the size, composition and structure of your data team. That’s why you need to understand the data maturity level of your organization, so that you can build a data team suited to your business’ needs and aligned with your business strategy.

At Castor, we write about all the processes involved when leveraging data assets: from the modern data stack, to data team composition, to data governance.
Our blog covers the technical and the less technical aspects of creating tangible value from data.

At Castor, we are building a data documentation tool for the Notion, Figma, Slack generation. Or, data-wise, for the Fivetran, Looker, Snowflake, DBT aficionados. We designed our catalog to be easy to use, delightful and friendly.

Want to check it out? Reach out to us and we will show you a demo.

Originally published at https://www.castordoc.com.

Daniel Morales

Apr 16, 2021

What's new in DataSource.ai?

Contents Outline

New CEO

Discussions within competitions

Maximum of 50 submissions per day

Competition completion process

Timeline of completion

Late submissions

Certificate of participation in the competitions

Your public profile
