Most of these guidelines are taken from Prof. Edward A. Lee's guidelines for projects in his Specification and Modeling of Reactive Real-Time Systems course at the University of California at Berkeley:
Project - 60% of final grade
15% - Topic research - Literature survey
15% - Results
15% - Project Presentations - 30-minute presentation during finals week
15% - Project Reports
Grading requirements
I'm kidding, though your project will heavily determine your final grade. If you do an outstanding project, you will get an A. If you do a mediocre project, you will not get an A. I realize that you can do a lot of work and perform "good" research and not get "good" results. If this is the case (and I will be able to tell how much thought and work you put into your project), then you will get an A.
The course will give you an overview of several areas in embedded systems. Your project will be an in-depth study of one of these areas. I expect that the project will lead to a publication at a workshop or conference. Therefore, I expect the final report and presentation to be of conference quality. To achieve this goal, you must start early. There will be three project reports to ensure that you do start early.
The first report is a literature survey of the area that you wish to study. It requires an extensive search of current and past research related to your topic. The report is a maximum of two pages and briefly discusses the relevant works in the area. Additionally, it describes how your work will differ from these works, i.e., the novelty of your project. The literature survey should be similar in style to the "Related Work" sections found in most conference and journal papers. It will be due February 4.
The second report is a progress report. The report should be organized into sections as follows:
- Background: the motivation for your project
- Project Goal: what the goal is and what is novel about your project
- Approach: how you plan to tackle the problem
- Work: what you have done so far
Please try to be clear and schematic.
Maximum length: 3 pages
The final report is due during finals week (exact date TBD). It should be of conference quality. That means it is formatted in the two-column IEEE conference proceedings format with a 10 to 12 point font, and contains an abstract, introduction, related work, project overview, background (the motivation for your project), project goal (what the goal is and what is novel about your project), the approach you followed, the results you obtained, a conclusion, and a bibliography. In the results, please stress the difficulties you encountered and the limitations of your approach.
It is important that you pick a topic that interests you. If you can relate your current research to some embedded system topic, that is even better. However, your project must be new work; you cannot solely use results from old work.
Expanding Template Generation and Matching using Predicated Execution
The first area involves techniques to expand the functionality of the versatile parameterizable blocks (VPBs). We are restricted by control flow, as the VPBs must execute atomically. However, if we could use predicated execution, i.e., speculatively perform part of the operations of the VPB, then we might be able to increase the complexity and the number of occurrences of the VPBs. This would lead to gains in computation speed and power consumption, as we would perform fewer operations on the slow, power-hungry fine-grain reconfigurable fabric.
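As a rough illustration of the predication idea described above, the following Python sketch contrasts a branching computation with an if-converted version in which both arms are always computed and a predicate selects the result. The function names and the arithmetic are assumptions chosen for illustration; they are not taken from the VPB work itself.

```python
def branching(a, b, c):
    """Original form: the branch splits the computation into two blocks."""
    if a > b:
        return a + c
    return b - c


def predicated(a, b, c):
    """If-converted form: one straight-line block with no branch.

    Both arms are computed regardless of the predicate, and the predicate
    then selects which value is kept, much like a multiplexer in hardware.
    """
    p = int(a > b)        # predicate: 1 if the "then" arm is taken, else 0
    t_true = a + c        # computed even when p == 0
    t_false = b - c       # computed even when p == 1
    return p * t_true + (1 - p) * t_false


if __name__ == "__main__":
    print(branching(5, 3, 2), predicated(5, 3, 2))
    print(branching(1, 4, 2), predicated(1, 4, 2))
```

The predicated version performs more operations in total, but because it contains no control flow it could, in principle, be treated as a single atomic block.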
Region Formation for Hybrid Reconfigurable Architectures
Another area of future research is the formation of regions, i.e., portions of the control data flow graph. The formation of regions has been studied for Very Long Instruction Word (VLIW) architectures. Yet, it has never been looked at in the context of hardware compilation for reconfigurable architectures. There are many fundamental differences between VLIW architectures and reconfigurable architectures. The main difference is that reconfigurable architectures allow different functional units at various times during the running of an application, and possibly a different set of functional units for different applications, whereas a VLIW architecture is permanently fixed. I believe that this changes the problem in a basic manner and that the VLIW region formation algorithms must be revisited for reconfigurable architectures.
Speculative Scheduling in Control Data Flow Graphs
Speculative execution executes an operation before control flow dictates that it executes. It helps increase parallelism, which in turn increases the hardware’s efficiency. There are many factors to consider, including the critical path, the average number of operations to each of the exits, and the increase in the number of operations in the application code. Once again, reconfigurable architectures present different challenges from other types of architectures, making this a novel research area.
Predicated Execution and Hyperblock Formation for Hardware Synthesis
Speculative execution of operations and the creation of hyperblocks have been studied for VLIW computers, but they have not been looked at for hardware synthesis. You could identify the differences between these two areas and develop algorithms for hyperblock formation. Additionally, you could look at different methods of predicated execution. Should there be a predicate bit for each register, as is done in VLIW processors, or is a different scheme better in hardware synthesis?
Application Partitioning/Estimation for System-on-chip
The majority of the research on hardware/software partitioning assumes a single processor (software) and an ASIC (hardware) on the same chip. Future computing systems will have many different types of computing elements: ASICs, many different processors, FPGAs, and other programmable hardware. We need partitioning methods to determine the computing element on which each portion of the application should run. In order to perform this partitioning quickly and accurately, we need an estimation engine for each of the computing elements. The estimation engine takes a portion of the application and determines different metrics (e.g., power consumption, speed, die area, utilization) for that computing element.
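To make the role of the estimation engine concrete, here is a minimal Python sketch in which a table of estimates drives a simple partitioning decision. Everything in it is an assumption for illustration: the task names, the computing elements, the numbers in `ESTIMATES`, and the weighted-sum cost function. A real estimator would derive such figures from the application code, and a real partitioner would also account for communication and resource limits.

```python
# Hypothetical estimates: (execution time in ms, power in mW) for each task
# on each computing element that can implement it.
ESTIMATES = {
    "fir_filter": {"cpu": (8.0, 120.0), "fpga": (2.0, 300.0), "asic": (1.0, 80.0)},
    "parser":     {"cpu": (3.0, 100.0), "fpga": (6.0, 250.0)},
    "fft":        {"cpu": (12.0, 150.0), "fpga": (3.0, 320.0)},
}


def cost(time_ms, power_mw, w_time=1.0, w_power=0.01):
    """Weighted sum of the estimated metrics (the weights are arbitrary)."""
    return w_time * time_ms + w_power * power_mw


def partition(estimates, **weights):
    """Assign each task to the element with the lowest estimated cost."""
    assignment = {}
    for task, options in estimates.items():
        assignment[task] = min(
            options, key=lambda elem: cost(*options[elem], **weights)
        )
    return assignment


if __name__ == "__main__":
    print(partition(ESTIMATES))
    # Weighting power more heavily can move a task to a different element.
    print(partition(ESTIMATES, w_power=0.1))
```

With the default weights, this toy table sends `fir_filter` to the ASIC, `parser` to the CPU, and `fft` to the FPGA; raising `w_power` to 0.1 moves `fft` to the CPU.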
Memory Synthesis and Management for System-on-chip
The performance and location of the system’s memory components play an enormous role in the throughput, latency, power, area, and other performance characteristics of a system. Often, the components of the system have embedded local memory elements, e.g., the embedded RAM of an FPGA. Efficient use of these elements is paramount to the performance of the application on the device. Additionally, different memory hierarchies can affect the area and timing characteristics of the system. A memory hierarchy that suits one application may not be suitable for another. Synthesis techniques that take into account the distinct features of the application are an interesting and powerful way to optimize system performance. [Liao96] is an example of previous work in this area.
Hybrid Local/Global Controllers
A global controller that determines the execution of the hardware resources is beneficial, as the circuit is directed from only one location and one controller. Yet, the centralized nature of a global controller has the drawback of having to connect to every resource of the circuit. This may cause routing and signal delay problems depending on the size of the circuit. A set of local controllers alleviates this problem, yet introduces the problem of synchronization among the controllers. A meet-in-the-middle approach would have a small number of “global” controllers that direct local controllers. This could reduce the routing complexity without the need for a large amount of synchronization. The hierarchy could be extended even further to multiple levels of global and local controllers, where synchronization is done within each level and control passes between adjacent levels of the hierarchy. An automatic push-button synthesis for hybrid local/global controllers would be interesting and beneficial to the design of digital systems.
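A back-of-the-envelope way to see the routing argument above is to compare controller fan-out. The Python sketch below assumes a deliberately simple model that is not part of the original description: fan-out (the number of things a single controller must connect to) stands in for routing difficulty, and every local controller drives at most a fixed number of resources.

```python
import math


def flat_fanout(num_resources):
    """A single global controller connects to every resource."""
    return num_resources


def hybrid_fanout(num_resources, resources_per_local):
    """Two-level scheme: one global controller directs local controllers.

    Returns (worst-case fan-out of any controller, number of local controllers).
    """
    num_local = math.ceil(num_resources / resources_per_local)
    local_fanout = min(resources_per_local, num_resources)
    return max(num_local, local_fanout), num_local


if __name__ == "__main__":
    print(flat_fanout(64))        # one controller wired to all 64 resources
    print(hybrid_fanout(64, 8))   # 8 local controllers with 8 resources each
```

Under this model the worst-case fan-out for 64 resources drops from 64 to 8, at the price of having eight local controllers to keep synchronized.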
Partitioning to Minimize Data Communication
The partitioning would group control flow graph nodes into regions, which are executed on different parts of the system (e.g., each region could be implemented on a DSP, ASIC, or FPGA). By choosing regions that minimize the data communication between them, we reduce the number of memory transfers in the system.
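The objective described above can be sketched as a minimum-cut problem. In the following Python example, the graph, its edge weights (taken to be the amount of data passed between two nodes), and the requirement of two equal-sized regions are all assumptions for illustration. The search is exhaustive, so it is only practical for tiny graphs; a real tool would use a heuristic such as Kernighan-Lin.

```python
from itertools import combinations

# Hypothetical graph: each edge weight is the number of data words
# transferred between the two nodes.
EDGES = {
    ("a", "b"): 10, ("b", "c"): 8, ("a", "c"): 6,
    ("c", "d"): 1,
    ("d", "e"): 9, ("e", "f"): 7, ("d", "f"): 5,
}


def cut_cost(edges, region):
    """Total data crossing the boundary between `region` and the rest."""
    return sum(w for (u, v), w in edges.items() if (u in region) != (v in region))


def best_bipartition(edges):
    """Try every equal-sized split and keep the one with the least traffic."""
    nodes = sorted({n for edge in edges for n in edge})
    half = len(nodes) // 2
    best = min(combinations(nodes, half), key=lambda r: cut_cost(edges, set(r)))
    return set(best), cut_cost(edges, set(best))


if __name__ == "__main__":
    region, cost = best_bipartition(EDGES)
    print(sorted(region), cost)
```

For this toy graph the best split places `a`, `b`, and `c` in one region and `d`, `e`, and `f` in the other, so only the single word on the `c`-`d` edge crosses between regions.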
Pipelining Instruction Generation
The instruction generation problem focuses on finding regularity at the instruction level. The generation of instructions creates "super" instructions, i.e., sets of parallel or sequential operations. The sequential nodes are prime candidates for pipelining. Since the instructions are executed frequently (hopefully frequently enough to keep a pipeline full), pipelining the generated instructions will increase the performance of the system.
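The benefit of keeping the pipeline full can be estimated with a simple cycle count. The Python sketch below assumes an idealized model that the text does not specify: each sequential operation in a generated instruction takes one cycle and becomes one pipeline stage, and there are no stalls between back-to-back invocations.

```python
def unpipelined_cycles(num_ops, invocations):
    """Each invocation runs all of its operations before the next one starts."""
    return num_ops * invocations


def pipelined_cycles(num_ops, invocations):
    """One stage per operation: fill the pipeline once, then finish one
    invocation per cycle."""
    if invocations == 0:
        return 0
    return num_ops + invocations - 1


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, unpipelined_cycles(4, n), pipelined_cycles(4, n))
```

For a four-operation instruction, a single invocation gains nothing (4 cycles either way), while 100 back-to-back invocations take 103 cycles instead of 400. This is why the frequency of execution matters.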