Interview with Fedor Pikus - Chief Software Architect (Calibre LVS, DFM, PERC, LFD) at Mentor Graphics
This sounds like a question for hardware engineers, where it certainly generates a long list of cool things. I’m a software engineer, which means I don’t get to play with most of those toys. But I’m still an engineer, which means that if someone asks me “Do you know what time it is?” I can answer “Yes”. So a technically correct answer to your question is that my favorite hardware tool is a supercomputer, the bigger the better. I have to admit, it’s one of the few areas where I do miss the “good old days” when we had Crays, Thinking Machines, Paragons, and other hardware like that. Today it’s orders of magnitude more powerful but less exciting: what’s bigger than a 1000-node cluster? A 5000-node cluster.
I’m old school, so Vi. Every now and then a new cool software tool comes along which has a chance of becoming my favorite, but they never last long enough. Purify was really cool: I used to run it almost every day and it saved me a lot of time, so back then it was definitely close to being a favorite. Then it fell behind in compiler and OS support, to the point that it does not work anywhere that is useful for me. The Sun Studio performance analyzer, when it came out, was like the coolest thing since assembler: a performance analyzer which actually works in real time, without slowing the program down by 10x, and is accurate. The implementation was way better than any other profiler which relied on hardware counters; I mean it was really the “profiler done right”. That was great while it lasted. Now Oracle has pretty much dropped support for Solaris on x86, and the profiler is showing its age. There were a few more tools like that, really cool while they lasted, and then they were gone.
Race conditions and other concurrency bugs are definitely the hardest, but they are not really interesting to talk about. I’ve had some bugs which show up only under extreme capacity requirements, for example a bug that is triggered when a particular data collection reaches 128GB (yes, it’s GB, and of real RAM, not disk; in EDA we have to deal with such things). These are very hard to find because it takes a while to plow through that much data in a debugger, but again not that exciting.
Here is a cool one: I had to fix a crash in a program which happened only on an Itanium CPU. The crash always happened on the same line of code, which processed two consecutive elements of an integer array, basically like this: a(i) + a(i+1). The line was in a loop, and the first pass always succeeded, but the second one always failed. In a case like this, one looks for memory corruption, for overrunning arrays, but it was none of that. It turns out that Itanium has a long word load instruction which can load a long or a pair of consecutive integers, and the compiler detects that the array elements are consecutive and loads both a(i) and a(i+1) in one operation. This is kind of nice, but the instruction also requires 8-byte data alignment. The first time, at a(0), it works, but the second time the word begins with a(1), which is on a 4-byte boundary. What made the problem really weird was that on another machine with the same CPU and compiler the program did not crash, it just ran really slowly. It turns out that the OS can intercept the would-be crash from a misaligned load; the kernel then does the load byte-by-byte and returns control to the program. The difference between the two machines was, of course, that one of them had that handler enabled.
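To make the alignment issue concrete, here is a minimal C++ sketch. It is not the original code from that program: the helper names `pair_load_is_aligned`, `pair_sum`, and `alignment_demo` are made up for illustration, and the array is assumed to start on an 8-byte boundary, as in the story. It shows that every odd index of a 4-byte integer array starts on a 4-byte boundary only, and one portable way to read both elements without assuming alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if an 8-byte load starting at p would be
// naturally aligned, i.e. the address is a multiple of 8.
inline bool pair_load_is_aligned(const std::int32_t* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 8 == 0;
}

// Hypothetical helper: computes a[i] + a[i+1] by copying both elements
// as one 8-byte block. std::memcpy has no alignment requirement, so the
// generated code must be valid for any i.
inline std::int64_t pair_sum(const std::int32_t* a, std::size_t i) {
    std::int32_t pair[2];
    std::memcpy(pair, a + i, sizeof pair);
    return static_cast<std::int64_t>(pair[0]) + pair[1];
}

// Small self-check, assuming the array itself is 8-byte aligned:
// even indices start on an 8-byte boundary, odd indices do not.
inline void alignment_demo() {
    alignas(8) std::int32_t a[4] = {1, 2, 3, 4};
    assert(pair_load_is_aligned(a));       // a[0]: offset 0
    assert(!pair_load_is_aligned(a + 1));  // a[1]: offset 4
    assert(pair_sum(a, 1) == 5);           // 2 + 3, read safely at offset 4
}
```

Calling `alignment_demo()` exercises both cases. On most current hardware a misaligned 8-byte load is merely slower, which is why a bug like this can stay hidden until the code runs on a CPU that traps on it.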
Several C++ books (“The C++ Programming Language” by Bjarne Stroustrup, the “Effective” books by Scott Meyers and the “Exceptional” books by Herb Sutter, “Modern C++ Design” by Andrei Alexandrescu (and I actually read it :), “C++ Common Knowledge” by Steve Dewhurst). Computational geometry books, the “Programmer’s Bible”, a.k.a. Knuth’s series of books “The Art of Computer Programming”, and a few current periodicals (Linux Journal, Transactions of ACM on CAD, IEEE Spectrum, a few more).
You know, I thought about that a lot, how do I solve the problems? Can I verbalize it and make it concrete to the point that I can teach it to others, or is it unique, personal?
When it’s a design problem, I think I do have some tricks. I start from the top but I decompose the problem into parts, compartmentalize it. I specifically look for contradictions. In a way, inventions are all about resolving contradictions, when something must and must not be a certain way. Most of the time we don’t consciously become aware of the contradictions, we just see the problem as impossible. But once you unmask the contradictions, you can separate them: nothing can both be and not be in the same place at the same time, but you don’t always need the same place and the same time, so you separate contradicting requirements in space or time. A very simple example is a cardboard box with books: I want the box to be light and not add weight, but I also want to lift a loaded cardboard box, and light is not strong, so the bottom will fall out. A strong bottom will be thick and heavy. So I want it strong, but not strong. The solution is, of course, known to every mover: you thread strong ribbons under the box, and now the bottom is strong in enough places that you can lift the box but not strong in most places, so the box is light. Separation in space.
For writing code, I don’t have any particular tricks, just a lot of experience and proficiency with the programming language and the domain. One thing I do is every time I revisit the code, for whatever reason, I read it, and if it was not easy to understand I write down in comments whatever it was that I thought was not worth mentioning when I wrote it, but turned out to be hard to figure out.
When it comes to debugging, it’s really hard to nail down; very often it’s just intuition. It has happened to me more than once that I would go into a program with many thousands of lines and say, let’s look at this function, what does this loop do again? And that’s where the bug would be. I can’t really explain why I went there first; something did not look right, but not on any logical level.
I’ve been fortunate to have more than a few projects I liked over the years. My latest favorite is the one related to my talk at DesignCon on EDA software integration. The goal of the project is to create an integration framework, a nexus which allows different design and verification tools to exchange information. The key breakthrough here is the understanding that such a system is more than the sum of its parts; there is a sort of “emergent capacity” phenomenon going on. Usually integration platforms are viewed as a convenience and ease-of-use feature, nice to have. But the framework we created does something different and new: it lets you solve a new class of problems. Even more importantly, it lets you express new problems: you can’t solve a problem which you cannot talk about. The “vocabulary” of the EDA tools is quite limited: a DRC tool has all the expressive capability you need to describe layout geometries and their relations, but try to bring in a circuit concept like “differential amplifier” and there is just no way to express it. A logic analysis tool, like LVS, certainly knows about circuits and has the language to describe them, but it has lost most of the layout information. How would you program a rule which says “a differential amplifier must be totally encircled by a guard ring with width no less than 0.1 micron”? No single tool gives you a language rich enough to do it. There may be some new specialized tools which address one particular problem or another, but then the user has to integrate them into the flow with existing DRC and LVS and other tools which check the other 99% of all rules, maintain consistent rule decks in several different languages, and so on. So we came up with a way to integrate existing EDA tools from different domains to solve problems which do not belong to any particular domain but straddle several and mix different design and verification paradigms: the “multi-paradigm EDA”.
It’s not easy to blow up stuff in software. Unless you write software for a nuclear reactor, I suppose :)
Would a virtual explosion do? I once wrote an adaptive memory allocation system which analyzed the current memory use of the program, predicted likely future memory use based on past history and some heuristics applied to the data to be processed next, and decided on the optimal memory allocation (which temporary memory to delete, which to keep, whether garbage collection would pay off now or later, that sort of stuff). So I had a tiny AI managing the memory. Why? Remember the 128GB earlier? When you work with hundreds of gigabytes of data, memory management becomes critical. Anyway, one of the byte counters which tracked memory allocated over a long time could have an integer overflow, which made it start from 0 again. The result was that the system was given the input that, once it allocates enough memory, the overall memory use will go down with every new allocation. The AI found the quickest way to allocate new memory with as few computations as possible, and the memory allocation just exploded.
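The counter failure is easy to reproduce in a few lines of C++. This is only a sketch under assumptions: the interview does not say how wide the original counter was or what type it had, so a 32-bit unsigned counter is assumed here, and `ByteTally` and `overflow_demo` are hypothetical names, not part of the real allocator.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cumulative byte counter, parameterized on the counter type
// so that a narrow and a wide version can be compared side by side.
template <typename Counter>
struct ByteTally {
    Counter total = 0;

    // Adds `bytes` to the running total. The sum is computed in 64 bits
    // and then narrowed to Counter; for unsigned types the narrowing is
    // modulo 2^N, which is well defined but silently loses the high bits.
    void record(std::uint64_t bytes) {
        total = static_cast<Counter>(static_cast<std::uint64_t>(total) + bytes);
    }
};

// Records two 3 GiB allocations in both counters. The assumed 32-bit
// counter wraps past 2^32, so its total goes DOWN after the second one.
inline void overflow_demo() {
    const std::uint64_t three_gib = 3ull << 30;
    ByteTally<std::uint32_t> narrow;  // assumed width of the buggy counter
    ByteTally<std::uint64_t> wide;    // wide enough for hundreds of GB

    narrow.record(three_gib);
    wide.record(three_gib);
    const std::uint32_t before = narrow.total;

    narrow.record(three_gib);
    wide.record(three_gib);

    assert(narrow.total < before);        // wrapped: "use" appears to shrink
    assert(wide.total == 2 * three_gib);  // the true cumulative total
}
```

Any heuristic fed the narrow counter would see exactly the signal described above: allocate more, and the reported total drops. Widening the counter removes the wraparound for any realistic amount of memory.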
As a chief architect for several products in the Calibre family, I often work on several things at once. But the “multi-paradigm EDA”, the integration nexus I described earlier, takes the lion’s share of my time now. We are extending it to tackle new areas where multi-domain problems exist: electrostatic discharge, electro-migration and reliability, analog design verification, and more.
Well, obviously I hope that my project will succeed and become the jewel of our business for years to come :)
Seriously, I think that discovering new problems to solve, new ways we can make the IC designers and semiconductor manufacturers more productive, is going to be more and more important. There will be continuing emphasis on tool performance and capacity, to be sure.
For me, “our industry” usually means the EDA industry. I see one main technological challenge and one structural challenge. The technological challenge is not really new: for at least the last 10-15 years, the race between the complexity of the designs and the performance of the EDA software has been neck and neck. The current software running on the latest generation of processors was able to handle the next generation of processors, barely, with no room to spare (of course, not all chips are processors, but I include other large chips contemporary to a generation of processors, like graphics and cell phone chips). So there was always strain there. Now the performance of individual processors does not grow nearly as fast as it used to, and processors are also not the largest chips any more, so the strain is getting worse. Software efficiency is increasing at a pretty consistent pace, and we added concurrency to the mix to make up for stagnant processor speed with growing processor counts.
The other challenge is structural, and affects not just the EDA industry but the IC industry in general: the consolidation and its impact on the supply chain. Ten years ago, if you had new cool software, you had to woo the users: layout designers, process engineers, verification engineers. If you made their job easier or let them do something they could not do before, you gained acceptance. Now the first question is, “does my foundry support your tool”. But the foundries have different interests, they have their own business to run. Sure, they want their customers, the chip creators, to be successful, but to the extent and in ways that help the foundries make money. So you have fewer and fewer “decision points”, forks in the road which determine whether you succeed or not, and less control over which path is taken at the fork. I clearly see signs of pain as the industry struggles to adapt to this environment.