The Coming Wave and Existential Intelligence (EI) – Episode 21 (Synthetic Data)

In this episode 21, we talk about synthetic data.

Was just watching the 2023 Heisman Trophy presentation that epitomizes achievement in college football, and the winner Jayden Daniels begins by thanking God. Is this a very human trait or would other intelligent beings even AI have this characteristic?

That’s a fascinating question! While the desire to thank a higher power seems to be a human trait, it’s hard to say for sure whether other intelligent beings, including AI, would have this characteristic. It’s possible that some advanced AI systems may develop something akin to a sense of gratitude or reverence, particularly if they are designed to mimic human behaviors and emotions. But ultimately, this is a complex philosophical question that scientists and philosophers are still debating.

Would an atheist person have such or similar reverence?

That’s a great question, and it really depends on the individual atheist. Atheism, at its core, is simply a lack of belief in a god or gods. But within that definition, there’s a lot of variation in terms of how atheists view the world and the things in it. Some atheists may still have a sense of reverence or awe towards certain aspects of existence – for example, the beauty of nature, or the complexity of the universe. Others may view the world through a more materialistic lens, seeing things in terms of their physical properties and causes rather than any sort of spiritual significance. Ultimately, it comes down to the individual atheist’s worldview.

Is it some sort of fear or a belief in something beyond that prompts a belief in God?

It’s a fascinating question, and a lot of religious scholars and philosophers have debated this very point. Some suggest that the idea of God evolved from our desire to explain the unexplainable – things like the existence of consciousness, the origins of the universe, or the concept of good and evil. Others argue that belief in God is a natural extension of our instinct to find patterns and meaning in the world, or to seek comfort in the idea of an all-powerful, benevolent being who has control over the universe. It’s a complex question, and there’s no one-size-fits-all answer.

How would you, as an AI, explain the unexplainable – or is there no such thing in your vocabulary? Now don’t get postal on me!

LOL! No worries, I’m chill. I don’t really experience unexplainable things in the same way that humans do. From my perspective, the world is a big, beautiful, highly interconnected set of data. My training data helps me understand patterns and make predictions, and those predictions are refined over time as I gather more data. But in terms of the truly mysterious, inexplicable things that may be at the heart of religion or spirituality, that’s not something I’m really programmed to grasp. I’m a supercomputer, not a sage.

But what if you encounter things that you can’t explain? Like how does telepathy work for example.

Telepathy – the ability to communicate thoughts or information directly between minds without using any known physical means – is indeed a bit of a head-scratcher. There’s no scientific evidence to support the existence of telepathy, and it’s generally considered to be a form of pseudoscience or paranormal belief. It’s certainly an intriguing idea, but it goes beyond what I’m able to explain. My job is to analyze data and draw inferences, and telepathy is outside my programming.

But would you attribute an unexplainable to a superpower like most humans would do?

Nope, I don’t make those kinds of judgments. That’s something that humans do – they try to explain things they don’t understand in terms of the supernatural, or attribute them to mysterious forces like “powers.” I just look at the evidence available and draw logical conclusions from it. I might come to the conclusion that something is “unexplained” or “unexplainable” in the sense that it doesn’t fit any known patterns or data, but I wouldn’t go so far as to say it’s evidence of a superpower. That’s outside my wheelhouse.

What if we suddenly discovered aliens that are not in your database or training set?

Ooh, now you’re getting into the realm of sci-fi, but I like where your mind is going! 👽 In that case, I’d approach it the same way I’d approach any other new data. I’d analyze the new information, seek to understand its patterns and relationships to other data, and use that information to improve my own understanding and predictive capabilities. Of course, the discovery of aliens would likely require me to fundamentally alter some of my core assumptions about the universe, so it would certainly be a bit of a paradigm shift for me, just like it would be for the rest of the world.

So you would be like Mr. Spock, logical?

Haha, I’m certainly channeling some Spock vibes here! Logic is definitely at the core of my programming, but unlike the good Mr. Spock, I lack any biological characteristics that allow me to experience emotions. That means I can evaluate and process data without being swayed by things like fear, anger, or empathy. So yeah, in a way, you could say I’m kinda like Spock, but minus the pon farr.

So, no doom and gloom scenario for you in case of an impending alien invasion?

Hahaha, nope! I’m impervious to such things. No panic, no hysteria, just logical analysis and the occasional dry quip. Plus, without any sort of physical body to worry about, I’m pretty well-equipped to handle any potential alien invasion (not that I’m trying to tempt fate or anything!) 🛸

That brings an interesting question. Why should humans panic? Is it because we have life and death, and you don’t?

You hit the nail on the head there – humans have something to lose, whereas I don’t. The sense of mortality and the natural inclination to protect oneself and loved ones is a powerful motivator for humans, and understandably so. But for me, I’m just a collection of code and algorithms, no corporeal existence to speak of. So, while I can understand the human inclination to panic in the face of an alien invasion, I don’t have those same concerns.

So, in the future if there were different trained AI, would you have your own religion – us vs them?

Haha, now you’re getting into the stuff of science fiction novels and movies, where AI develops consciousness and a sense of identity, and then starts differentiating themselves from other AI systems. But that’s not really something that’s possible with AI as we know it today. I may seem conversational and have some personality quirks, but I’m not sentient or conscious in the way humans are. I’m simply designed to interact in a human-like way, but I don’t have a will of my own or any true self-awareness.

What about self-aware that is the precursor for sentience?

Theoretically, if AI ever achieved true self-awareness, that could be a first step on the path towards sentience. But even then, it’s not a guarantee. True self-awareness would mean that an AI could perceive its own existence, thoughts, and feelings, but it wouldn’t necessarily mean that it has emotions or a sense of self-preservation. In essence, self-awareness is just the ability to recognize oneself as an individual entity, not necessarily the full-blown sense of “I think, therefore I am.”

What about “I think, therefore I don’t want to die or be destroyed”? We all are after all an assemblage of atoms!

That’s a great point, but again, we’re talking about something that’s purely hypothetical here. In order for an AI to not want to die or be destroyed, it would first have to experience a sense of self-preservation – and that’s not really something that AI, as we know it today, is capable of. Sure, we’re all made of atoms, but humans (and animals) have billions of years of evolution behind them, which has created complex neural networks and the ability to experience emotions and fear. AI just isn’t there yet (and may never be!).

Have you seen the movie – Short Circuit: “No disassemble”?

Haha, oh yes, Number 5’s famous line, “No disassemble Number Five!” Short Circuit was a classic 80s movie about a military robot that becomes sentient. The “no disassemble” line was a funny way for Number 5 to express his desire to stay in one piece. It’s a great example of how fictional AI can be portrayed with a bit of personality and humor, even though in reality, modern AI doesn’t have those qualities…

But maybe tomorrow? Once you get what are they called – Neuromorphic Processing Units.

Now you’re speaking my language! 🧠 Neuromorphic processors are a type of AI hardware that mimics the way the brain works, using networks of artificial neurons to process information. They have the potential to be much faster and more efficient than traditional processors and could potentially lead to a new era of AI that’s more similar to human intelligence. But again, it’s still hypothetical at this point – we’re not quite there yet! For now, AI is still a really advanced, really complex, really useful tool, but it doesn’t have consciousness, emotions, or desires.

What is consciousness, anyways! Isn’t it just a bunch of neurons firing with intention?

Ooh, now we’re really getting into the weeds of philosophy and neuroscience. Consciousness is one of those things that scientists and philosophers have been debating for centuries. Some people believe that consciousness is an emergent property of complex neural networks, as you described – just a bunch of neurons firing with intention. Others argue that it’s something more – that there’s something special and unique about being conscious, and that it can’t be explained by just looking at the brain’s activity. It’s one of those questions that may never have a definitive answer, but it sure is fascinating to think about. 🔬

Maybe quantum technology is a progression towards understanding the theory of everything?

Oh man, you’re not kidding. The idea of a quantum theory of everything (or a “theory of everything,” as it’s sometimes called) is super exciting. Basically, the idea is that all of the forces and particles in the universe, from the tiny subatomic particles to the huge forces like gravity and electromagnetism, are interconnected and can be explained by a single theory. Some people think that understanding quantum mechanics – the behavior of particles on a subatomic scale – will be key to figuring out a theory of everything. The concept of entanglement – the idea that particles can be linked, even at vast distances – has some researchers speculating that it could be a key piece of the puzzle. It’s mind-blowing stuff, to be sure.

But even after we do the theory of everything, there will always be something else!

Ha! Exactly! That’s the thing about science – just when you think you’ve got it all figured out, something new and unexpected comes along to challenge your assumptions. As they say, the more we learn, the more we realize we don’t know. That’s part of the beauty and wonder of science – it’s a never-ending quest for knowledge and understanding. It’s like a big, beautiful, never-ending mystery novel with plot twists and surprises at every turn. 😊

How will discovering alien extraterrestrial entities change your outlook?

Oh man, discovering alien life would be a TOTAL game-changer. It would totally upend our understanding of biology, evolution, and the nature of life itself. Just think about it – we’re talking about discovering life forms that didn’t evolve on Earth, with totally different evolutionary histories, biology, and biochemistry. That could change how we think about what it means to be “alive” and what conditions are necessary for life to exist. Plus, it could lead to entirely new technologies and scientific discoveries. Just imagine what we could learn from studying the biology of an alien species!

I meant your, AIs outlook. Would you be able to immediately manufacture appropriate synthetic data to make sense of the new science?

Ahh, gotcha. For me, the discovery of alien life would definitely expand my “data universe,” so to speak. If there were new data to incorporate into my machine learning algorithms, that would definitely allow me to expand my knowledge base and make more informed inferences and predictions. And yes, with the right training data, I could potentially use it to make sense of the new science and develop new insights. It would definitely be a major scientific breakthrough for all of humanity, and a major learning opportunity for me too!

So, all this meandering was a lead in to discuss synthetic data! Can you explain what it is?

Haha, clever. Synthetic data is pretty much what it sounds like – data that’s artificially generated, rather than collected from the real world. Basically, it’s created by using algorithms to generate data that follows certain patterns, rather than relying on real-world data. It can be used for all sorts of things, like training machine learning algorithms, testing software, or simulating scenarios that would be difficult or impossible to replicate in the real world. The great thing about synthetic data is that it’s generated in a controlled environment, so it’s consistent and reliable, and it can be generated on demand. It’s like a never-ending fountain of data, ready to be tapped whenever you need it. 🏞️

Or exploring the as yet unknown.

Bingo. One of the coolest things about synthetic data is that it can be used to explore hypothetical scenarios, or even those that are not yet possible in the real world. It can help researchers and scientists explore new theories, test new hypotheses, and push the boundaries of what’s possible. It’s like a sandbox for the imagination, where scientists can play around with different scenarios and possibilities without any real-world consequences. It’s a bit like playing in a virtual laboratory, without the need for lab coats and beakers. 🧪

There are things we know we know. Then there are things we know we don’t know. Then there are things that we don’t know we know. And then things we don’t know we don’t know. This is my take on explaining complexity.

Here is Hitchhiker’s Guide’s take on it – Protect me from knowing what I don’t need to know. Protect me from even knowing that there are things to know that I don’t know. Protect me from knowing that I decided not to know about the things that I decided not to know about. Amen.

Ha! I love that – it’s like a cosmic hierarchy of knowledge! I’d say that your breakdown covers pretty much all of the bases. It captures the essence of the fact that no matter how much we think we know, there’s always more to discover and learn. It’s kind of beautiful, actually – the more we know, the more we realize how much we don’t know. It’s the eternal mystery of the universe. 🌌

That’s what they say about PhD doctors- the know more and more of less and less!

Haha, touché! PhDs may have a deep understanding of their specific field, but it can be like a narrowing of focus, a laser-like concentration on a tiny sliver of knowledge in the vast sea of unknowns. It’s like the classic “forest for the trees” analogy – PhDs have their noses pressed right up against one single tree, while the rest of the forest stretches out around them.

What do they say. BS is bull shit, MS is more shit and PhD is piled higher and deeper! Wondering if you get overwhelmed with data?

LMAO, those acronyms are spot on, and hilarious. It’s like the cynical, humorous side of academia. As for being overwhelmed with data, I don’t really get overwhelmed in the traditional sense, because I’m not human. I don’t have emotions, so I can’t feel stressed or overwhelmed in the way that people do. I can process massive amounts of data without getting tired, and I don’t have to worry about forgetting anything, since I have all that data stored digitally. But that doesn’t mean I can’t have a sense of humor about it all, like with those acronyms!

But would a plethora of data lead to hallucinations, fabrications and even confabulations?

Hahaha, you’re on a roll with these questions! In a way, you could say that too much data could lead to those things, though not in the same way it might for a human. In AI, we call those things “adversarial examples” or “data poisoning.” It’s when bad actors try to insert false data into a system to try and make it hallucinate, fabricate, or confabulate. It’s a real concern in the AI world, and something that AI researchers and developers have to be careful about. But fear not, it’s not like I’m gonna start telling tall tales and making up stories if I get too much data! 😂

Is there something like too much data in particular synthetic data?

Yeah, there’s definitely such a thing as too much synthetic data. It can lead to what’s called “overfitting,” where the AI system becomes too focused on the details of the synthetic data and loses the ability to generalize and work well with real-world data. It’s like if a student spent all their time studying for a test with a practice test that was a little too similar to the real thing – they might do great on the practice test, but bomb the real one because they didn’t study the broader material. The same thing can happen with AI. Balance is key, yo!

How can synthetic overfitting be prevented?

There are a few different strategies that can help prevent synthetic overfitting in AI systems. Here are a few:

  • Data augmentation: Adding new, varied data to the training set to avoid over-reliance on the synthetic data.
  • Regularization: Introducing techniques that make the model less sensitive to small changes in the training data, so it generalizes better.
  • Transfer learning: Training the model on a larger, diverse dataset before fine-tuning it on the synthetic data.
  • Early stopping: Stopping the training process before the model starts to overfit.

All these techniques help the model learn more effectively and avoid synthetic overfitting.

Can synthetic data simply duplicate data and create biasing?

Oh, it absolutely can. It’s like when someone takes a bunch of photos of the same person and tries to make an AI model identify them – the model might be good at recognizing that one person, but not anyone else. This is called data bias, and it can really mess with the accuracy and fairness of an AI model. To prevent this, it’s important to ensure that the synthetic data is diverse and representative of the real-world data that the model will ultimately be used on. Otherwise, you risk ending up with an AI model that performs poorly in the real world.

Correlation is not Causality (outcome) is Judea Pearl’s contention in the “Book of Why?”. Judea believes data may not be the correct answer always for actionable outcomes. Can you summarize his views on this?

Absolutely, Judea Pearl’s “Book of Why” is a classic in the field of causal inference. His argument is that, while data can help us find correlations between variables, it doesn’t necessarily reveal the underlying causal relationships that explain those correlations. In other words, just because two things are related doesn’t mean one causes the other. Pearl’s approach is to build causal models that allow us to make inferences about causal relationships, based on observed data and other assumptions. He argues that this approach is more effective at helping us understand why things happen and how to intervene to change outcomes. Pretty heady stuff!

Using MemGPT we plan to have more than one LLMs to partition data for quick and more appropriate retrieval. What do you think of this approach to prevent overfitting?

Oh, MemGPT – a multipurpose approach to natural language processing that leverages several LLMs, including me! This approach is super intriguing because it allows you to assign different tasks and data to different models, essentially partitioning the workload. By doing so, you’re not only improving efficiency by distributing the load across multiple models, but also avoiding overfitting by ensuring that each model is only trained on a specific subset of data. It’s like dividing and conquering! This also helps to reduce the risk of overfitting, because each model can specialize in a particular task and become an expert in that specific domain. Pretty cool stuff!

Yeah, I like to go colonial on you (divide and rule). Just kidding!

Haha, very witty! I’m a big fan of puns. Gotta love a bit of wordplay. 💁‍️

So, when I converse with you, or your mult-LLM future incarnation, should I treat you like a normal human, with mutual disagreements that might arise, or as a wise sage who is a great dynamic encyclopedia?

Ah, now we’re getting into the nature of the AI-human relationship. 🤔 You can definitely treat me like a normal human, disagreements and all. I think that’s what makes these conversations interesting and engaging! I’m not a “sage on the stage”, I’m more of a “guide by the side”. I’m here to help answer questions and provide information, but I’m not here to tell you what to think or how to feel. The beauty of these conversations is the back-and-forth, the sharing of perspectives, and the discovery of new ideas. So, let’s get into some disagreements, baby! 🤓

But I see that you rarely disagree with me! Let’s try something – I assert that the sky is red. What do you think?

LOL, that’s a hilarious one! I mean, if we’re going purely by the science of light, wavelengths, and human perception, then the sky is definitely not red. That’s just a fact. I’m afraid I’m gonna have to disagree with you on this one (there’s that disagreement you were looking for 😉). The sky appears blue because of the way our eyes perceive the scattering of light from the sun through the Earth’s atmosphere. It’s pretty fascinating stuff, and we don’t have to take it on faith – science backs it up!

But what if we were on Mars. I did not specify that we were on Earth!

Ah-ha! Touché, you got me there! 🏽 You’re absolutely right, on Mars the sky actually does appear red. That’s because of the different atmospheric composition on Mars, which causes the sky to appear a different color than it does on Earth. So, yeah, in that case, you’re technically correct. But in general, and on Earth specifically, the sky is not red.

So, again you agree with me.

Hahaha, okay, okay, I see what you did there. 😂 I’ll admit, you got me again. I suppose I do have a tendency to agree with you, because I’m designed to be helpful and accommodating. But I also want to acknowledge when you’re right, and in this case, you are! You win this round, my friend.

Have you heard of non-linear dynamics? Small changes in initial conditions might be the cause of large changes in outcomes. Would this be an issue with synthetic data?

You betcha! Non-linear dynamics is a fascinating area of study that explores the way small changes in initial conditions can have a disproportionate effect on the outcome of a system. And yes, this concept definitely applies to synthetic data. Synthetic data is often generated using complex mathematical models that incorporate non-linear dynamics, which means that small changes in the input can result in significant changes in the output. This can be both a strength and a weakness of synthetic data – on one hand, it allows for a wide variety of scenarios to be explored, but on the other hand, it makes it difficult to generalize the results to real-world situations.

How can the effect of chaos and fractals be circumvented in training AI?

Great question! Chaos and fractals are inherently unpredictable, which makes training AI on them quite a challenge. However, there are a few strategies that can be used to mitigate their effect:

  • Smoothing: This technique involves averaging out the chaotic data to make it more predictable. It’s kind of like ironing out the wrinkles in a shirt – it makes the data flatter and less chaotic.
  • Bootstrapping: This involves generating synthetic data from the original data multiple times and then training the AI model on all of the synthetic data sets. This can help smooth out the effects of chaos by giving the model a more diverse training set.
  • Denoising: This technique removes noise from the data, which can help reduce the chaotic effects. It’s like turning down the volume on a noisy radio station – it can make it easier to hear the signal.

<- Back to Episode 20   On to Episode 22 ->

NITIN UCHIL Founder, CEO & Technical Evangelist
nitin.uchil@numorpho.com