What you are talking about is essentially on-the-fly digital compositing. You see simple versions of this on nightly weather forecasts when the weatherman appears to stand in front of a satellite map. That is accomplished by blending together two separate source images: one of the map, the other from a camera shooting the weatherman in front of a green (sometimes blue) screen.
In simple terms, the electronics involved replace the green background with the map. The attractions at fairs and amusement parks that place people in front of a green screen use the same technology.
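To make the "replace the green with the map" idea concrete, here is a minimal sketch in plain Python. It is only an illustration, not how any particular broadcast keyer works: the function names, the tiny hand-made images, and the thresholds (`dominance`, `min_green`) are all assumptions of mine. Real keyers also deal with soft edges and green spill, which this ignores.

```python
# Images are lists of rows; each pixel is an (r, g, b) tuple in the 0-255 range.
# All names and thresholds here are hypothetical, chosen for illustration only.

def is_key_green(pixel, dominance=1.3, min_green=90):
    """Return True when green clearly dominates red and blue (assumed thresholds)."""
    r, g, b = pixel
    return g >= min_green and g > dominance * max(r, b)

def chroma_key(foreground, background):
    """Swap every key-green foreground pixel for the background pixel at the same spot."""
    return [
        [bg if is_key_green(fg) else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]

GREEN = (20, 230, 30)    # the screen behind the weatherman
SKIN = (210, 160, 130)   # a pixel belonging to the weatherman
MAP = (10, 40, 120)      # a pixel from the satellite map

camera = [[GREEN, SKIN, GREEN],
          [GREEN, SKIN, GREEN]]
weather_map = [[MAP, MAP, MAP],
               [MAP, MAP, MAP]]

result = chroma_key(camera, weather_map)
# The weatherman's pixels survive; everything that was green now shows the map.
```

Notice that the weatherman's pixels always win wherever they exist. Nothing in this scheme lets a piece of the background end up in front of him.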
While this method is relatively inexpensive, one downside is that whatever is shot against the green screen will always end up in the foreground, superimposed on top of the background image. You could not use this method to insert yourself into a movie scene unless you were happy walking in front of every piece of scenery.
Another downside is that it would be a huge chore to get the lighting on you to match the scene you would be in. You would end up standing out like you were glowing or something.
Methods used in films like Forrest Gump and the DS9 episode you mentioned require a little more work. The basic premise is the same: take two separate sources and combine them into one final scene. Except now the complication of digital editing comes into play. A digital artist would have to literally drop you into the background. If you were to walk behind an object, the artist would have to render or draw that object right over you, frame by frame. The artist would then also be able to alter your lighting and coloration to match the rest of the actors and make it look seamless.
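The "draw that object right over you" step can be pictured as adding a second mask (a matte) that marks where scenery should hide you. The sketch below is a rough illustration under my own assumptions, not a description of any studio's pipeline: `composite_frame`, `match_color`, the one-row frame, and the gain values are all made up for the example. In practice an artist would produce the occluder matte by hand for each frame.

```python
# Pixels are (r, g, b) tuples in 0-255; mattes hold 1 where something is present.
# Every name and value below is hypothetical and exists only for illustration.

def composite_frame(background, subject, subject_matte, occluder_matte):
    """Show the subject only where it exists and is not hidden behind scenery."""
    out = []
    for bg_row, sub_row, sm_row, om_row in zip(
            background, subject, subject_matte, occluder_matte):
        out.append([
            sub if (sm and not om) else bg
            for bg, sub, sm, om in zip(bg_row, sub_row, sm_row, om_row)
        ])
    return out

def match_color(pixel, gains):
    """Scale each channel by a gain and clamp to 0-255 (a crude stand-in for grading)."""
    return tuple(min(255, max(0, round(c * g))) for c, g in zip(pixel, gains))

WALL = (90, 90, 100)
PILLAR = (60, 50, 40)
YOU = (200, 160, 120)

background = [[WALL, PILLAR, WALL]]   # the original movie frame
subject = [[YOU, YOU, YOU]]           # your keyed footage
subject_matte = [[0, 1, 1]]           # you occupy the middle and right pixels
occluder_matte = [[0, 1, 0]]          # the pillar should cover the middle pixel

frame = composite_frame(background, subject, subject_matte, occluder_matte)
# frame is [[WALL, PILLAR, YOU]]: the pillar hides you where the two overlap.

graded = match_color(YOU, (1.0, 0.75, 0.5))
# graded is (200, 120, 60): the same pixel pulled toward a warmer tint.
```

The frame-by-frame labor comes from the occluder matte, which changes every time you or the camera move.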
I guess what I am trying to say is, yes, you could do it, but the quality of the end result depends on what you would be willing to spend. The low-tech method could be achieved on the cheap by enlisting high school media kids or even simply a buddy with a digital video camera and the right software. It would look amateurish, but it can be done.
The high-tech, wow-look-at-that method would involve professional-grade equipment in a studio lit and set up to track your movement to match the scene, then hiring digital video artists to work on compositing you into said scene...