Houston, we have a question, E4: Yosef Akhtman, remote sensing & hyperspectral imaging expert
Our CEO Eugenia talked with Yosef about current issues around the lack of high-quality data in AI and machine learning, and the need for data-centric systems and analyses. Yosef founded Gamaya and grew it into one of the top players in the smart farming industry, producing AI-based digital agriculture solutions aimed at enhancing the efficiency and sustainability of farming.
We recorded this interview more than a month ago, during the main holiday season, so we held off publishing it because we wanted more people to watch and read it. We believe it was a wonderfully insightful and topical interview, covering technology in agriculture, sustainability and, for a major part of it, Earth Observation (EO). Most importantly, it does not look at EO from the glamorous side everyone is used to, but exposes the real problem in the EO industry that many experts overlook or decide not to talk about for their own reasons.
This episode's guest is Yosef Akhtman. We've known him personally and professionally for a long time, and it's hard to find a person who truly cares about sustainability, technology, and the intersection of the two more than Yosef. He is a scientist and an expert in remote sensing, EO, hyperspectral imaging, AI and machine learning. Yosef founded Gamaya and grew it to be one of the most prominent players in the smart farming industry, producing AI-based digital agriculture solutions aimed at enhancing the efficiency and sustainability of farming. He's now working on several other projects around Earth Observation and sustainability - his cloud removal work demonstrated incredible results.
On one side we have a lot of sustainability initiatives and a lot of technologies which can help us to be sustainable. But often there is a lot of inertia, because we have to change people's habits, educate, etc. Is inertia a roadblock to pushing technological sustainability forward?
First, I think that these are two very different categories. So I wouldn't compare them directly one against the other. Technology is a tool, or in our case, it's a stack of very sophisticated, advanced tools that have been developed to achieve a certain goal. And using technology, you can achieve anything.
You can annihilate all life on Earth using technology and you can use it to achieve sustainability. Now sustainability is a kind of, I would call it dynamic state. It's where you do something, you live your life, you build things. But at the same time, you don't undermine the environment. And you don't destroy your chances and the chances of your children to survive and continue to live their life. Now you can definitely use technology to improve sustainability.
For a very long time we focused as a society on growth, and we used technology to achieve growth, some say uncontrollable growth, while pretty much ignoring sustainability. Now we will have to somehow reckon with this situation, and we will have to shift our focus, or maybe not, and then we will disappear. We are now at a crossroads where we'll need to decide what to do next, and technology will definitely play a very big part in that. Now, in terms of challenges, really sophisticated technology is always very difficult to develop.
The development is always riddled with challenges and roadblocks of all sorts. I think right now the innovation and development of technology is going at breakneck speed, it's faster than ever. But it doesn't change the fact that every step is very difficult and challenging.
There's a particular challenge where technology meets sustainability: historically, it's been very difficult to mobilize resources for technology that targets sustainability, because sustainability has a much lower short-term return on investment compared to growth. Just by definition growth is growth, so it's very easy to sell this to potential investors. Sustainability is different. Even today, I would say, without being too critical, that most of the investors who invest in sustainability mostly see it as a gimmick.
Or it has to be sustainable, but still have a huge return on investment, which very often doesn't work well. So it still remains a challenge, although obviously there has been considerable improvement in this.
And this inertia, you don't think it’s a roadblock? So it's rather, the unit economics, right? So the delayed return on investment, rather than the actual inertia of people using it?
Particularly in agriculture actually, I don't feel that there is. Obviously there is some inertia, there's no denying that there is some inertia. But to a large extent, inertia is not the biggest factor.
The biggest factor is trying to deploy immature innovation and technologies. Farmers don't like that. Farmers are very technically savvy, they're very smart, but they don't like gimmicks.
So all of these techies like myself, when we come up with all kinds of fancy, gimmicky computer screen solutions - the farmers are interested, they are curious, but then they try once, and they try twice and it doesn't really work the way it should.
And this boundary between sophisticated technology and the agricultural field is very difficult to cross, because the technology needs to be rock solid. It needs to be a tractor. Then there is no problem. Once farmers see the real benefit, there are even plenty of farmers who would go to great lengths to improve sustainability for the sake of sustainability - people understand that. But they don't like gimmicks, and so you need to come to farmers with a mature solution to a specific, clearly defined problem. This is very much lacking in this industry, because people come from the technical side, too enthusiastic about their computer skills, and don't understand the real agricultural issue at hand sufficiently well. This is the main challenge, not the inertia of the farmer.
It’s interesting, because in the tech world, we are always taught or pushed to release a product when it's an MVP, when it lacks functionality. And for this market, it doesn't work, right? In this market the product should be mature, properly debugged, fully functioning and full-featured.
I think in many industries, a lot of people that use apps, the younger generation, they use apps because they like to use apps, they don't really expect any significant efficiency increase or sustainability increase out of using the apps.
The tolerance for bugs is very high because they just enjoy using their smartphone and that's it. If it doesn't do what it's supposed to once in a while - it's not a big deal. But the situation is very different in the industrial world, where people need to get their job done, right? They don't use these tools for the sake of enjoying these tools, they want their job to be done. And this is the boundary that a lot of the techie companies under-appreciate.
And really, it is about the level of maturity of some of these products. This has been, I think, a huge challenge for the drone industry, where this boundary between industrial use and a gimmick or a toy has been very much under-appreciated.
Well, probably the same can be said of the space sector, which brings me to the next question. Earth Observation (EO) and the hype around it - what is the future of it according to you? We have seen huge hype build up around EO data in recent years. Everyone says we're going to use it for everything. Whatever you need - the solution is observation. At Dotphoton we deal a lot with it: we work with AI, we work with quantum, with EO, autonomous vehicles, you name it. It’s not hype for us, it’s our daily life. What is the real technological and purposeful use of EO data in the near future?
Earth Observation is obviously a huge industry, a very particular industry, with its own kind of quirks. For decades the Earth Observation industry has been mainly funded by the defence industry. This gave the industry a certain structure: it has been funded by governments and defence budgets, and obviously there was some substantial funding coming from scientific and research budgets, but most of it was defence.
Now, as you mentioned, everybody is absolutely certain and clear, and I’m convinced that it is true, that there is enormous potential for economic benefits that can come from Earth Observation technology. The reality is that there are very few real commercial applications that exist today.
This is indeed to a large extent a result of the legacy of an industry structure that has been funded by defence budgets. This means that the data, the usable data, is very expensive, and is structured in a way that is suitable for these defence applications. The economics of defence use cases are dramatically different from those of commercial applications. You have analysts, human analysts, who look at these maps and make certain observations. This is something that obviously you cannot afford for any practical commercial application.
So what will need to happen before any of this hype comes to fruition is that the commercial applications of Earth Observation need to be fully automated. So this is an ideal. You can't possibly think of a better use for artificial intelligence and machine learning than automated interpretation of Earth Observation data.
But there is a very big ‘but’: the data that is supplied by the Earth Observation industry today is not suitable for automated interpretation by machine learning algorithms. It's fragmented, it's incoherent, there are all the atmospheric effects, there are several dozen different satellite systems. They supply an enormous amount of some sort of digital numbers, but computers don't understand this. There is a tremendous gap in terms of consolidating this data and turning it into some kind of data that is comprehensible for automated processing pipelines targeting certain commercial applications.
Sorry to interrupt you, but could you please also speak about the quality of the data? Bruno, my colleague and co-founder, who’s deeply into the topic, says that the quality of the data is just as important as the quality of the algorithm. And we focus a lot not only on the quality of algorithms, but on the quality of the data. Does that apply to Earth Observation too?
It's absolutely applicable to Earth Observation data, and it's definitely applicable to all machine learning data, Earth Observation in particular. And since you're a photographer, maybe this metaphor will resonate with you. What is important? The photographer? Or the camera? Or the optics? Or the scene? Or the lighting? There are all these components that need to come together to recreate the magic of photography, right?
Now to really create a magical moment, this magical photograph, you need to have all of these components, obviously starting with the photographer himself. You need to have them top-notch. Every single component needs to be at a matching level, right? Then you will come up with a magical result. To a significant extent, this is the same with machine learning.
You need to have all of these components in place. If we are really stretching this metaphor a little bit, I would compare the algorithm, the network architecture, to the camera. You may want to have a bigger, fancier camera, but there is a certain limit, there are diminishing returns.
Out there, there are already very smart algorithms and machine learning networks that can do amazing work. You need to have data, which I would compare to optics. What if you have an amazing camera, but poor optics? If you have optics that distort your scene - then there is nothing your camera can do to solve this. And this is somewhat similar in machine learning. I think it's even worse. But most important is the data science. The data scientist always comes first.
I feel that machine learning is a little bit more of an art than a science, because it's the data scientist who needs to know and have the intuition of how to clean and consolidate the data, how to choose the right network, how to put all these components together, how to test, and how to choose the objective function.
So the data scientist always comes first, but obviously, he needs to make sure that the data is coherent and suitable for the target application. Just to finish this and bring it back to what we discussed: this is definitely not the case today with Earth Observation, because the data that you get from all these different satellites is not properly calibrated and has all kinds of artefacts and distortions. Not enough work has been done to consolidate this data and make it suitable for automated machine learning applications.
What would be the main definitions of perfect data, for Earth Observation data? For example, it should have perfect interoperability: we can have data coming from different satellites and still use it without additional preparation just to harmonize and homogenize it somehow. Or should it be faster? Or should it be smaller? Should it be better? Should it have fewer artefacts? What would be the three main characteristics that would drastically change the situation?
If you allow me, I would use another metaphor. There has been a debate about whether algorithms such as super-resolution add information to the data. You have an algorithm, you have input data, and then some magic happens. And how is it possible that the output has more information content than the input? This has been a big dilemma.
In my view, it's very simple. You go to the train station, and you see a big clock. And you see it's 11am. What does it tell you? It tells you very different things depending on whether you know the schedule or not. If you have the timetable for the train - it tells you a lot, right? If you know that at 11:10am there is a train, or even if you know the train was at 10:55am and you missed it, right? This is a big deal. If you don't know the timetable - then it only tells you the time of day, not that much.
So this is exactly the same, it's more than a metaphor, it's an exact situation. It's a situation where you see a data point, and depending on your prior knowledge, you can extract a different amount of information from the same data point.
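As a rough illustration of the clock example, the sketch below measures how uncertain a traveller is about the wait for the next train, with and without a timetable. The timetable entries beyond the 10:55 and 11:10 trains mentioned above, the "some train within the hour, every minute equally likely" belief, and the function names are all assumptions made for the example, not something stated in the interview.

```python
import math

def wait_uncertainty_bits(possible_waits):
    """Entropy in bits of a uniform belief over the candidate waiting times."""
    return math.log2(len(possible_waits))

def minutes_to_next_train(now, departures):
    """Minutes until the next departure; times are minutes after midnight."""
    upcoming = [d for d in departures if d >= now]
    return min(upcoming) - now if upcoming else None

now = 11 * 60  # the station clock reads 11:00

# No timetable (assumed belief): a train leaves at some minute within
# the next hour, and every minute is equally likely.
no_prior_bits = wait_uncertainty_bits(range(60))

# With a timetable (hypothetical entries: 10:55, 11:10, 12:10),
# the same clock reading pins the wait down to a single value.
timetable = [10 * 60 + 55, 11 * 60 + 10, 12 * 60 + 10]
wait = minutes_to_next_train(now, timetable)
with_prior_bits = wait_uncertainty_bits([wait])

print(f"wait: {wait} min")
print(f"uncertainty without timetable: {no_prior_bits:.2f} bits")
print(f"uncertainty with timetable: {with_prior_bits:.2f} bits")
```

The clock reading is identical in both cases; only the prior differs, and with it the amount of uncertainty the reading removes.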
So now the question is, would it help you if you had a timetable from France and you're in Switzerland? Not that much, right? This is an exact situation, not a metaphor, exact situation with any machine learning.
If your prior doesn't exactly correspond to the data point that you observe, then your prior is useless. Your machine learning is useless if the statistics of your model, of your training data, do not match the data point. The only difference between the timetable and machine learning is that here, you would have to actually spend several days standing on this platform and recording the arrivals of the trains. This is the training process.
However, if you did it in Paris, and then you are standing in a train station in Zurich - it’s not very helpful. This is exactly the situation with Earth Observation data, or any data in machine learning for that matter.
So having the statistical distribution exactly correspond to the data that you are about to observe is absolutely key to the performance of any machine learning algorithm. You can absolutely add information, you can make magic with these machine learning algorithms, if indeed your training data perfectly matches your observed scenario.
Now there is another challenge, in particular in Earth Observation: what if you have a timetable for Zurich, but it's from last year? Then maybe it can help, because you know that in the morning the trains arrive once an hour, and it's somewhat similar, and you can derive some information.
But it's still nowhere near as good as having an up-to-date timetable. This is exactly the situation with Earth Observation: if you trained in this region, but on data that was collected several years ago - it's exactly the same. So it's not an exact answer to your question, but I think this is the one thing that is absolutely key.
Your training data has to be perfectly matched to your observed data, which sometimes is an enormous challenge to do. Particularly in Earth Observation, because in most cases, if you have data for what you observe, you don't need to observe it. It is a real challenge to find the training data that would perfectly fit your observed data. But if you have this, you can do real magic.
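The timetable argument can be sketched in a few lines: "training" means standing on the platform for a few days and averaging the observed departure minutes, and the learned timetable is then scored against today's departures. All the numbers below are invented for illustration, and `learn_timetable` and `mean_abs_error` are hypothetical helper names, so this is a toy under stated assumptions rather than a real Earth Observation pipeline.

```python
from statistics import mean

def learn_timetable(observed_days):
    """'Training': average the departure minute seen in each slot across days."""
    return [round(mean(slot)) for slot in zip(*observed_days)]

def mean_abs_error(predicted, actual):
    """Average number of minutes by which the predictions miss."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))

# Invented observations: minutes past the hour of four trains, three days each.
zurich_this_year = [[2, 17, 32, 47], [3, 17, 31, 47], [2, 18, 32, 46]]
zurich_last_year = [[5, 20, 35, 50], [6, 20, 34, 50], [5, 21, 35, 49]]
paris_this_year = [[10, 40, 55, 58], [11, 40, 54, 58], [10, 41, 55, 59]]

# What actually happens in Zurich today (also invented).
zurich_today = [2, 17, 32, 47]

err_matched = mean_abs_error(learn_timetable(zurich_this_year), zurich_today)
err_stale = mean_abs_error(learn_timetable(zurich_last_year), zurich_today)
err_wrong_place = mean_abs_error(learn_timetable(paris_this_year), zurich_today)

print("matched training data:", err_matched)
print("stale training data:", err_stale)
print("wrong-station training data:", err_wrong_place)
```

With these made-up numbers the matched timetable predicts best, the stale one is somewhat off, and the wrong-station one is the worst, which mirrors the ordering described above.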
In your current work with cloud detection and elimination, you’re trying to address some of these challenges - could you tell us more? What's the purpose of it? And what's your interest in it?
Well, working in the precision agriculture industry and trying to develop some of these commercial applications, I came to the realization that there is this enormous gap between the data that is available and the data that would be necessary to solve some of these challenges.
I decided to focus on bridging this gap. I’m working on developing a stack of technologies, it's not just one tool, that would consolidate data, and would calibrate the data and make it suitable for machine learning applications.
This is something that, to the best of my knowledge, has been overlooked by both. Obviously, there is a lot of work that has been done, particularly by academia. But very little of it has been properly transferred to the industry.
There's an open gap there, even if you look at all of these maps, right? Of all the industry players, there is not even such a category. There are companies that collect data, and companies that organize data, so all these marketplaces where all this data is indexed and organized. And there are companies that try to develop commercial applications, and in most cases don't succeed very much.
But there is no category of people that try to actually calibrate and consolidate this data. So that's how I see my mission, this is what I’m trying to do.
This sounds really cool. I know that there is no such category, because we do part of the job. It is very hard to place yourself in this market because indeed you are in the middle. I do believe it's absolutely essential that someone takes care of it, because otherwise the true potential of the data is not used.
If you allow, I want to loop it back to the beginning of the discussion. Here we are again: all the efforts that are being made right now in Earth Observation are about growth. They are about the efficacy and the amount of data, and very little has been done in terms of the sustainability and efficiency of this industry.
The amount of capital that has been invested in launching new satellites, if compared to the amount of money that has been spent on developing commercial applications of this satellite imaging data, is maybe ten thousand to one - I don't know the exact number, but it's several orders of magnitude.
The investors are somehow convinced that once you launch a piece of hardware into space - it will start streaming money into their bank accounts directly. And this is not exactly the case.
I think it's just about time: more focus should be put on the sustainability of the Earth Observation industry, versus just the growth, i.e. more satellites, more data, more digital numbers. There's too much, and the balance is wrong.
So there are a lot of these challenges which are not being addressed, are overlooked, or are taken as a given. I hear a lot: “storage is cheap”. Well, it is not. For this market it is not. These kinds of questions are not addressed, because they're considered done. They are solved, but solved for different markets, not for this one. And the price of the infrastructure a satellite needs once it starts producing data is at least 3-4 times more than the price of the device itself. That's something which is not being addressed, because it's less fancy.
To say "we launched a satellite" sounds really cool. But to say "we organized data management for all these spots for this satellite" - well, it's a nice tool. So I think that unfortunately, part of the hype is that people focus on beautiful things they understand, or like, or that are nice to showcase, but not on the real job, which is what it's actually all about afterwards.
Absolutely, I 100% agree.
I have one question left, which is a bonus question we ask everyone: "If you had to pick a word to define society in the next few years - what would that be?" What do you think will be a priority? Before it was growth, now we need to stop because it’s starting to hurt us. What’s next?
I won't take it upon myself to say what will define the next 10 years. I can only say what I wish would be the major theme, and that would be ‘reckoning’, some form of reckoning. Some form of equalization. Because as you mentioned, we made great strides in terms of growth, and innovation has been developing in leaps and bounds. It has been amazing, but we now need to somehow think about the implications of this growth and make sure that we can enjoy the fruits of this growth for some time, us and perhaps even generations to come. So definitely, I wish that we would enter some kind of phase of equalization and reckoning, where we'll refocus more.
Earth Observation is a great example of this. We just need to start thinking a little bit deeper about the ecosystem and the needs of this industry, beyond the shiny gimmick. Just dig deeper. I would really hope that the people who are responsible for funding these things and investing in this would start giving it just a touch more thought beyond the shiny new gimmicks. They will need to start thinking about how to make this industry sustainable, so it can also generate some tools for society to become sustainable as a whole.
And again, as I said, I have absolutely no doubt that Earth Observation technology holds enormous potential, particularly to play a role in terms of sustainability, because it can dramatically improve our ability to understand our environment and to react, to be a little bit collectively intelligent about our interaction with our wonderful environment. So Earth Observation can play a very important part, but we will need to be a little bit wiser about how we support and build this industry going forward.
Dotphoton
Dotphoton provides innovative image compression solutions for big image data. Dotphoton’s unique set of algorithms and cutting-edge approach to camera calibration enable file size reduction by a factor of 6 to 10, while preserving the raw quality of images. Dotphoton's deep understanding of the latest insights from the quantum information field ensures it stays ahead as a highly reliable partner, trusted by the European Space Agency, Bosch, and leading biomedical centres across the world.